Memory Mechanisms

LLM's Short-term Memory (Context)

The "short-term memory" of large language models is their context window, containing all information in the current conversation.

Characteristics

  1. Limited Capacity: Constrained by context window size (e.g., 4K, 32K, 200K tokens)
  2. Session-dependent: Only valid in the current conversation
  3. Dynamic Updates: Continuously changes as conversation progresses
  4. Cost-sensitive: Longer context means higher usage costs

Example

Conversation start:
User: My name is Zhang San
AI: Hello Zhang San!

After multiple rounds of conversation:
User: What's my name?
AI: Your name is Zhang San. (Remembered from context)

New conversation:
User: What's my name?
AI: I don't know your name. (New conversation, no previous information)
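The dialogs above can be sketched in code: a conversation's "short-term memory" is nothing more than the list of messages resent to the model on every turn. This is a minimal simulation in plain Python; no real LLM API is called, and `send` is a hypothetical helper that just exposes what the model would see.

```python
# Short-term memory is the message history resent on every turn.
conversation = []

def send(history, user_message):
    """Append the user turn and return the context the model would see."""
    history.append({"role": "user", "content": user_message})
    # A real model generates its reply from everything in `history`.
    return " ".join(m["content"] for m in history)

send(conversation, "My name is Zhang San")
in_context = "Zhang San" in send(conversation, "What's my name?")   # still in context

new_conversation = []  # a fresh session starts with an empty history
gone = "Zhang San" in send(new_conversation, "What's my name?")     # context is gone
```

Starting a new conversation is equivalent to replacing the history list with an empty one, which is why the model "forgets" the name.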

LLM's Long-term Memory (Training Data)

The "long-term memory" of large language models is the knowledge learned during training, encoded in the model's parameters.

Characteristics

  1. Huge Capacity: Contains all knowledge from training data
  2. Persistent: Doesn't disappear when conversation ends
  3. Static and Fixed: Fixed after training completes (unless retrained)
  4. Difficult to Update: Updating requires retraining or fine-tuning

Example

User: What is Python?

AI: Python is a high-level programming language created by Guido van Rossum in 1991...
(This is long-term memory learned from training data)

User: What's the birth date of Python's creator?

AI: Python's creator Guido van Rossum was born on January 31, 1956...
(This is also long-term memory learned from training data)

Memory Capacity Limitations

Short-term Memory Limitations

Problem:

  • Context window is limited (e.g., 4K, 32K, 200K tokens)
  • Information beyond the window is "forgotten"
  • Early information in long conversations may be lost

Example:

Long conversation scenario:
Round 1: [Important information A]
Round 2: [Information B]
...
Round 100: [Information Z]

Problem: Early important information A may have been "forgotten"

Solutions:

  1. Periodically summarize key information
  2. Only keep necessary information
  3. Use external memory systems
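One common way to implement solution 2 ("only keep necessary information") is to trim the history from the oldest end until it fits the window. The sketch below uses a naive word count as a stand-in for a real tokenizer; `trim_context` and the budget numbers are illustrative, not any library's API.

```python
def trim_context(messages, max_tokens,
                 count_tokens=lambda m: len(m["content"].split())):
    """Keep the most recent messages that fit within max_tokens.
    Word count stands in for a real tokenizer here."""
    kept, total = [], 0
    for msg in reversed(messages):          # walk from newest to oldest
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))

history = [{"role": "user", "content": f"message number {i}"} for i in range(100)]
trimmed = trim_context(history, max_tokens=30)
# Only the most recent rounds survive; "important information A" from
# round 1 is exactly what gets dropped.
```

This makes the failure mode concrete: truncation always discards the oldest turns first, which is why early key facts need to be summarized or stored externally before they fall out of the window.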

Long-term Memory Limitations

Problem:

  • Training data has a cutoff date
  • Cannot know new information after training
  • May contain outdated information

Example:

User: Who is the US President in 2024?

AI: [Answer based on 2023 data, possibly inaccurate]
(Because training data cutoff is 2023)

Solutions:

  1. Use online search functionality
  2. Combine with external knowledge bases
  3. Periodically update models

Memory Decay Problem

What is Memory Decay

Memory decay refers to the phenomenon where, as a conversation grows longer, early information carries less and less weight in the model's attention, so the model effectively "forgets" it.

Example:

Conversation start:
User: My name is Zhang San, I'm a programmer, living in Shanghai...
AI: Okay, I've noted it.

After 50 rounds:
User: What's my profession?
AI: I'm not quite sure, did you mention it before?
(Early information has decayed)

Impact of Memory Decay

  1. Information Loss: Early information may be forgotten
  2. Inconsistency: May give inconsistent answers
  3. Context Confusion: May confuse different information

Solutions

  1. Periodic Summarization
Prompt: "Let's summarize the key information so far..."
  2. Repeat Key Information
Prompt: "Remember, my name is Zhang San, profession is programmer, living in Shanghai."
  3. Use Structured Format
Prompt: "My information:
- Name: Zhang San
- Profession: Programmer
- Address: Shanghai"
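The periodic-summarization strategy can be automated: replace everything except the last few turns with a single summary message. In this sketch, `fake_summarize` stands in for an actual LLM call ("Summarize the key information so far..."); both function names are hypothetical.

```python
def compress_history(messages, summarize, keep_recent=4):
    """Replace all but the last `keep_recent` messages with one summary
    message, so old key facts stay in context at a fraction of the cost."""
    if len(messages) <= keep_recent:
        return list(messages)
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "system", "content": summarize(old)}
    return [summary] + recent

def fake_summarize(msgs):
    # A real system would ask the model; here we just keep "Key: value" facts.
    facts = [m["content"] for m in msgs if ":" in m["content"]]
    return "Key facts: " + "; ".join(facts)

history = [{"role": "user", "content": "Name: Zhang San"},
           {"role": "user", "content": "Profession: Programmer"}]
history += [{"role": "user", "content": f"small talk {i}"} for i in range(50)]
compact = compress_history(history, fake_summarize)
```

After compression, "Zhang San" and "Programmer" survive inside the summary message even though the original turns that carried them are gone.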

External Memory Mechanisms (RAG)

What is RAG

RAG (Retrieval-Augmented Generation) is a technique that combines external knowledge bases, enhancing model capabilities by retrieving relevant information.

How RAG Works

Step 1: User asks question
Step 2: Retrieve relevant information from external knowledge base
Step 3: Input retrieved information along with user question to model
Step 4: Model generates answer based on retrieved information

Advantages of RAG

  1. Extended Memory: Not limited by context window
  2. Real-time Updates: Knowledge base can be updated anytime
  3. High Accuracy: Based on reliable information sources
  4. Traceable: Answers can cite their information sources

RAG Application Scenarios

  1. Enterprise Knowledge Base: Company documents, processes, FAQs
  2. Technical Documentation: API docs, development guides
  3. Academic Research: Papers, research reports
  4. Legal Consultation: Laws, cases

RAG Implementation Example

Scenario: Answering questions based on company documents

System components:
1. Vector database: Stores vector representations of documents
2. Retrieval system: Retrieves relevant documents based on questions
3. LLM: Generates answers based on retrieved information

Workflow:
User question: "What is the company's leave request process?"

Retrieval: Retrieve relevant documents from vector database

Context: [Retrieved document content]

Generation: LLM generates answer based on context

Answer: "According to company documents, the leave request process is as follows..."
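The workflow above can be sketched end to end. To stay self-contained, this toy version uses bag-of-words vectors and cosine similarity in place of a real embedding model and vector database; the documents, function names, and the final prompt template are all illustrative assumptions.

```python
import math
from collections import Counter

documents = [
    "Leave request process: submit the form, then get manager approval.",
    "Expense policy: receipts are required for all reimbursements.",
]

def embed(text):
    # Toy embedding: word-frequency vector (a real system uses a model).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    # Step 2: rank documents by similarity to the question.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

question = "What is the leave request process?"
context = retrieve(question, documents)           # Step 2: retrieval
prompt = (f"Answer using this context:\n{context[0]}\n\n"  # Step 3: build input
          f"Question: {question}")
# Step 4: the prompt would now be sent to the LLM to generate the answer.
```

Swapping `embed` for a real embedding model and the list for a vector database turns this sketch into the architecture described above without changing its shape.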

Memory and Forgetting

Why Forgetting is Needed

  1. Avoid Interference: Irrelevant information may interfere with current task
  2. Save Resources: Retaining all information is costly
  3. Adapt to Changes: Need to update information when environment changes

Active Forgetting Strategies

  1. Clear Irrelevant Information
Prompt: "Let's start a new topic, forget previous discussion about X."
  2. Start New Conversation
New conversation clears all previous context
  3. Selective Retention
Prompt: "Remember these key pieces of information: [key info], forget other details."

Practical Application Cases

Case 1: Long Conversation Management

Scenario: Continuous technical consultation

Problem: Conversation is very long, early information may be forgotten

Solution:

1. Periodic summarization
Prompt: "Summarize the key technical points discussed so far"

2. Structured storage
Prompt: "Project information:
- Tech stack: [tech stack]
- Architecture: [architecture]
- Issues: [issues]"

3. Staged processing
Prompt: "Let's complete this stage, summarize key points, then move to next stage"

Case 2: Knowledge Base Q&A

Scenario: Answering questions based on company documents

Solution: Use RAG

1. Build vector database
- Convert documents to vectors
- Store in vector database

2. Implement retrieval system
- Retrieve relevant documents based on question
- Return most relevant document fragments

3. Combine LLM to generate answers
- Use retrieved information as context
- Let LLM generate answer based on context

Case 3: Personalized Assistant

Scenario: Remember user preferences

Solution:

1. User profile
- Store user preferences, history, etc.
- Use database for persistent storage

2. Context injection
- Inject user profile at conversation start
- Let AI remember key information

3. Dynamic updates
- Update user profile based on conversation
- Maintain information timeliness
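The three steps above can be sketched as follows. A JSON file stands in for a real database, and `inject_profile` builds the system message injected at session start; all names and the file path are illustrative assumptions.

```python
import json
from pathlib import Path

PROFILE_PATH = Path("user_profile.json")  # a real system would use a database

def load_profile(path=PROFILE_PATH):
    """Step 1: load the persisted user profile (empty if none exists)."""
    return json.loads(path.read_text()) if path.exists() else {}

def save_profile(profile, path=PROFILE_PATH):
    """Step 3: persist updates learned during the conversation."""
    path.write_text(json.dumps(profile, ensure_ascii=False))

def inject_profile(profile):
    """Step 2: turn stored preferences into a session-start system message."""
    facts = "; ".join(f"{k}: {v}" for k, v in profile.items())
    return {"role": "system", "content": f"Known user info: {facts}"}

profile = load_profile()
profile.update({"name": "Zhang San", "profession": "Programmer", "city": "Shanghai"})
save_profile(profile)
system_msg = inject_profile(load_profile())
```

Because the profile lives outside the context window, a brand-new conversation can start with the user's key facts already in place.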

Summary

Memory mechanisms are an important component of AI systems:

Key Points:

  • ✅ LLMs have short-term memory (context) and long-term memory (training data)
  • ✅ Memory capacity has limitations
  • ✅ Memory decays over time
  • ✅ External memory (RAG) can extend memory capabilities
  • ✅ Need to manage memory and forgetting

Best Practices:

  1. Periodically summarize key information
  2. Use structured formats to store information
  3. Reasonably use external memory systems
  4. Manage context window
  5. Actively forget when necessary

Remember:

  • AI memory is not real "memory"
  • Short-term memory has limitations
  • Long-term memory is static and fixed
  • External memory can extend capabilities

Understanding memory mechanisms helps better use AI, especially when handling long conversations and complex tasks.
