Memory Mechanisms
LLM's Short-term Memory (Context)
The "short-term memory" of a large language model is its context window: the prompt plus everything exchanged so far in the current conversation.
Characteristics
- Limited Capacity: Constrained by context window size (e.g., 4K, 32K, 200K tokens)
- Session-dependent: Only valid in the current conversation
- Dynamic Updates: Continuously changes as conversation progresses
- Cost-sensitive: Longer context means higher usage costs
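The session-dependent point can be made concrete with a toy sketch: an LLM API call is stateless, so the only "memory" is the message list the client resends with every request. The `chat` function here is a hypothetical stand-in for a real API call, not an actual library:

```python
# Toy stand-in for an LLM call: it can only "remember" what appears in
# the messages it receives, because the API itself is stateless.
def chat(messages):
    """Answers "What's my name?" using nothing but the supplied context."""
    context = " ".join(m["content"] for m in messages)
    if "Zhang San" in context:
        return "Your name is Zhang San."
    return "I don't know your name."

# Ongoing conversation: the client resends the history, so the fact survives.
history = [
    {"role": "user", "content": "My name is Zhang San"},
    {"role": "user", "content": "What's my name?"},
]
print(chat(history))  # Your name is Zhang San.

# New conversation: the history is gone, and so is the "memory".
print(chat([{"role": "user", "content": "What's my name?"}]))  # I don't know your name.
```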
Example
Conversation start:
User: My name is Zhang San
AI: Hello Zhang San!
After multiple rounds of conversation:
User: What's my name?
AI: Your name is Zhang San. (Remembered from context)
New conversation:
User: What's my name?
AI: I don't know your name. (New conversation, no previous information)
LLM's Long-term Memory (Training Data)
The "long-term memory" of large language models is the knowledge learned during training, encoded in the model's parameters.
Characteristics
- Huge Capacity: Contains all knowledge from training data
- Persistent: Doesn't disappear when conversation ends
- Static and Fixed: Fixed after training completes (unless retrained)
- Difficult to Update: Updating requires retraining or fine-tuning
Example
User: What is Python?
AI: Python is a high-level programming language created by Guido van Rossum in 1991...
(This is long-term memory learned from training data)
User: What's the birth date of Python's creator?
AI: Python's creator Guido van Rossum was born on January 31, 1956...
(This is also long-term memory learned from training data)
Memory Capacity Limitations
Short-term Memory Limitations
Problem:
- Context window is limited (e.g., 4K, 32K, 200K tokens)
- Information beyond the window is "forgotten"
- Early information in long conversations may be lost
Example:
Long conversation scenario:
Round 1: [Important information A]
Round 2: [Information B]
...
Round 100: [Information Z]
Problem: Early important information A may have been "forgotten"
Solutions:
- Periodically summarize key information
- Only keep necessary information
- Use external memory systems
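The "only keep necessary information" strategy can be sketched as a token-budget trim of the message history. Word count stands in for a real tokenizer here, so this is an approximation, not a production implementation:

```python
# Sketch: drop the oldest messages once the history exceeds a token budget.
def trim_history(messages, max_tokens=50):
    """Keep the most recent messages that fit within max_tokens."""
    kept, total = [], 0
    for msg in reversed(messages):          # walk newest-first
        cost = len(msg["content"].split())  # crude token estimate: word count
        if total + cost > max_tokens:
            break                           # budget exhausted: stop keeping
        kept.append(msg)
        total += cost
    return list(reversed(kept))             # restore chronological order

# 10 messages of ~23 "tokens" each; only the newest ones fit the budget.
history = [{"role": "user", "content": f"message number {i} " + "filler " * 20}
           for i in range(10)]
trimmed = trim_history(history, max_tokens=50)
print(len(trimmed))  # 2 — only the most recent messages fit
```

A real system would count tokens with the model's own tokenizer, since billing and truncation both happen at the token level, not the word level.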
Long-term Memory Limitations
Problem:
- Training data has a cutoff date
- Cannot know new information after training
- May contain outdated information
Example:
User: Who is the US President in 2024?
AI: [Answer based on 2023 data, possibly inaccurate]
(Because training data cutoff is 2023)
Solutions:
- Use online search functionality
- Combine with external knowledge bases
- Periodically update models
Memory Decay Problem
What is Memory Decay
Memory decay refers to the phenomenon where, as a conversation grows longer, early information becomes less and less influential: it may be pushed out of the context window entirely, or simply carry less effective weight amid many competing tokens, causing the model to "forget" it.
Example:
Conversation start:
User: My name is Zhang San, I'm a programmer, living in Shanghai...
AI: Okay, I've noted it.
After 50 rounds:
User: What's my profession?
AI: I'm not quite sure, did you mention it before?
(Early information has decayed)
Impact of Memory Decay
- Information Loss: Early information may be forgotten
- Inconsistency: May give inconsistent answers
- Context Confusion: May confuse different information
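One common mitigation is to fold older turns into a running summary so key facts stay inside recent context. A minimal sketch, using a naive `summarize` stand-in (a real system would ask the LLM itself to produce the summary):

```python
def summarize(messages):
    # Naive stand-in: keep the first sentence of each old message.
    facts = [m["content"].split(".")[0].strip() for m in messages]
    return "Summary of earlier conversation: " + "; ".join(facts)

def compact(messages, keep_recent=2):
    """Replace all but the most recent turns with a single summary message."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [{"role": "system", "content": summarize(old)}] + recent

history = [
    {"role": "user", "content": "My name is Zhang San. Nice to meet you."},
    {"role": "user", "content": "I am a programmer. I work on backends."},
    {"role": "user", "content": "What frameworks should I learn?"},
    {"role": "user", "content": "Thanks for the advice."},
]
compacted = compact(history)
print(compacted[0]["content"])
# Summary of earlier conversation: My name is Zhang San; I am a programmer
```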
Solutions
- Periodic Summarization
Prompt: "Let's summarize the key information so far..."
- Repeat Key Information
Prompt: "Remember, my name is Zhang San, my profession is programmer, and I live in Shanghai."
- Use Structured Format
Prompt: "My information:
- Name: Zhang San
- Profession: Programmer
- Address: Shanghai"
External Memory Mechanisms (RAG)
What is RAG
RAG (Retrieval-Augmented Generation) is a technique that retrieves relevant information from an external knowledge base and supplies it to the model as context, letting the model answer from sources it was never trained on.
How RAG Works
Step 1: User asks question
Step 2: Retrieve relevant information from external knowledge base
Step 3: Input retrieved information along with user question to model
Step 4: Model generates answer based on retrieved information
Advantages of RAG
- Extended Memory: Not limited by context window
- Real-time Updates: Knowledge base can be updated anytime
- Higher Accuracy: Answers are grounded in retrieved sources rather than the model's recollection
- Traceable: Answers can cite the sources they came from
RAG Application Scenarios
- Enterprise Knowledge Base: Company documents, processes, FAQs
- Technical Documentation: API docs, development guides
- Academic Research: Papers, research reports
- Legal Consultation: Laws, cases
RAG Implementation Example
Scenario: Answering questions based on company documents
System components:
1. Vector database: Stores vector representations of documents
2. Retrieval system: Retrieves relevant documents based on questions
3. LLM: Generates answers based on retrieved information
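A toy sketch of these three components, assuming word-count vectors in place of a real embedding model and a prompt-assembly step in place of the actual LLM call:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a word-count vector (a real system uses an embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# 1. Vector "database": documents stored alongside their vectors.
documents = [
    "Leave request process: submit the leave form to your manager for approval.",
    "Expense policy: receipts must be filed within 30 days.",
]
index = [(doc, embed(doc)) for doc in documents]

# 2. Retrieval system: rank documents by similarity to the question.
def retrieve(question, k=1):
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# 3. Generation: retrieved text becomes the context in the LLM prompt.
def build_prompt(question):
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is the leave request process?"))
```

Real deployments swap in a proper embedding model and a vector database, but the shape of the pipeline stays the same: embed, retrieve, then generate with the retrieved text as context.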
Workflow:
User question: "What is the company's leave request process?"
↓
Retrieval: Retrieve relevant documents from vector database
↓
Context: [Retrieved document content]
↓
Generation: LLM generates answer based on context
↓
Answer: "According to company documents, the leave request process is as follows..."
Memory and Forgetting
Why Forgetting is Needed
- Avoid Interference: Irrelevant information may interfere with current task
- Save Resources: Retaining all information is costly
- Adapt to Changes: Need to update information when environment changes
Active Forgetting Strategies
- Clear Irrelevant Information
Prompt: "Let's start a new topic, forget previous discussion about X."
- Start New Conversation
Starting a new conversation clears all previous context.
- Selective Retention
Prompt: "Remember these key pieces of information: [key info], forget other details."
Practical Application Cases
Case 1: Long Conversation Management
Scenario: Continuous technical consultation
Problem: Conversation is very long, early information may be forgotten
Solution:
1. Periodic summarization
Prompt: "Summarize the key technical points discussed so far"
2. Structured storage
Prompt: "Project information:
- Tech stack: [tech stack]
- Architecture: [architecture]
- Issues: [issues]"
3. Staged processing
Prompt: "Let's complete this stage, summarize the key points, then move to the next stage"
Case 2: Knowledge Base Q&A
Scenario: Answering questions based on company documents
Solution: Use RAG
1. Build vector database
- Convert documents to vectors
- Store in vector database
2. Implement retrieval system
- Retrieve relevant documents based on question
- Return most relevant document fragments
3. Combine LLM to generate answers
- Use retrieved information as context
- Let LLM generate answer based on context
Case 3: Personalized Assistant
Scenario: Remember user preferences
Solution:
1. User profile
- Store user preferences, history, etc.
- Use database for persistent storage
2. Context injection
- Inject user profile at conversation start
- Let AI remember key information
3. Dynamic updates
- Update user profile based on conversation
- Maintain information timeliness
Summary
Memory mechanisms are an important component of AI systems:
Key Points:
- ✅ LLMs have short-term memory (context) and long-term memory (training data)
- ✅ Memory capacity has limitations
- ✅ Memory decays over time
- ✅ External memory (RAG) can extend memory capabilities
- ✅ Need to manage memory and forgetting
Best Practices:
- Periodically summarize key information
- Use structured formats to store information
- Reasonably use external memory systems
- Manage context window
- Actively forget when necessary
Remember:
- AI memory is not real "memory"
- Short-term memory has limitations
- Long-term memory is static and fixed
- External memory can extend capabilities
Understanding memory mechanisms helps better use AI, especially when handling long conversations and complex tasks.
Next Steps
- What is an Agent - Learn about Agent concepts
- Agent Architecture - Learn about Agent architecture design