Probabilistic Prediction
LLM is Essentially a Probabilistic Prediction Model
At its core, a large language model (LLM) is a probabilistic prediction model. Its working principle: given the preceding text, predict the most likely next word (token).
Simple Understanding: an LLM is like a super "fill-in-the-blank" expert, predicting the most suitable next word based on the preceding content.
Next Token Prediction Mechanism
Working Principle
- Input Processing: Convert input text to tokens
- Context Understanding: Understand the context and semantics of input text
- Probability Calculation: Calculate the probability of each possible next word
- Selection Output: Select the next word based on the probability distribution
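The steps above can be sketched in a few lines of Python. The probability table here is invented for illustration only; a real model would compute it from billions of parameters.

```python
def pick_next_word(probs: dict) -> str:
    """Greedy selection: return the word with the highest probability."""
    return max(probs, key=probs.get)

# Hypothetical distribution a model might assign after "The weather today is really"
next_word_probs = {"good": 0.4, "great": 0.3, "nice": 0.2, "bad": 0.1}
print(pick_next_word(next_word_probs))  # → good
```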
Example:
Input: "The weather today is really"
Possible next words and probabilities:
- good: 0.4
- great: 0.3
- nice: 0.2
- bad: 0.1
Selection: "good" (highest probability)
Output: "The weather today is really good"
Multi-step Generation
Generating complete text requires multiple predictions:
Step 1: Input "The weather today is really" → Predict "good"
Step 2: Input "The weather today is really good" → Predict ","
Step 3: Input "The weather today is really good," → Predict "perfect"
Step 4: Input "The weather today is really good, perfect" → Predict "for"
Step 5: Input "The weather today is really good, perfect for" → Predict "going"
Final Output: "The weather today is really good, perfect for going out"
Role of Temperature Parameter
The temperature parameter controls the randomness of model output by reshaping the probability distribution: lower temperatures sharpen it toward the most likely tokens, while higher temperatures flatten it, spreading probability across more candidates.
Impact of Temperature Values
| Temperature | Effect | Application Scenarios |
|---|---|---|
| 0.0-0.3 | High output determinism, strong repeatability | Tasks requiring precise answers |
| 0.4-0.7 | Balance between determinism and creativity | General conversation and writing |
| 0.8-1.0 | High output creativity, large diversity | Creative writing and brainstorming |
| 1.0+ | Very random output, possibly incoherent | Experimental exploration |
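The rescaling can be sketched as follows. This applies the usual logit-rescaling in probability form (raise each probability to the power 1/T and renormalize); the numbers match the illustrative example, not any real model.

```python
def apply_temperature(probs: dict, temperature: float) -> dict:
    """Rescale a probability distribution: p_i ** (1/T), then renormalize.
    T < 1 sharpens the distribution, T > 1 flattens it, and T -> 0 approaches argmax."""
    scaled = {w: p ** (1.0 / temperature) for w, p in probs.items()}
    total = sum(scaled.values())
    return {w: s / total for w, s in scaled.items()}

probs = {"good": 0.4, "great": 0.3, "nice": 0.2, "bad": 0.1}
print(apply_temperature(probs, 0.5))  # "good" rises above 0.5
print(apply_temperature(probs, 1.5))  # distribution flattens toward uniform
```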
Temperature's Effect on Probability
Example:
Original probabilities:
- good: 0.4
- great: 0.3
- nice: 0.2
- bad: 0.1
Temperature=0.5 (more deterministic):
- good: 0.5
- great: 0.3
- nice: 0.15
- bad: 0.05
Temperature=1.5 (more random):
- good: 0.3
- great: 0.25
- nice: 0.25
- bad: 0.2
Temperature Usage Recommendations
Low Temperature (0.0-0.3):
- Code generation
- Technical documentation
- Translation tasks
- Factual Q&A
Medium Temperature (0.4-0.7):
- Daily conversation
- Article writing
- Email drafting
- Learning assistance
High Temperature (0.8-1.0+):
- Creative writing
- Brainstorming
- Story creation
- Idea generation
Top-p and Top-k Sampling
Besides temperature, there are other methods to control output randomness.
Top-k Sampling
Sample only from the k words with the highest probability.
Example:
Original probabilities:
- good: 0.4
- great: 0.3
- nice: 0.2
- bad: 0.1
Top-k=2:
Only select from "good" and "great"
Advantages:
- Filters out low-probability words
- Improves output quality
Disadvantages:
- May miss reasonable low-probability words
- k value needs adjustment
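A minimal sketch of Top-k filtering, using the illustrative distribution from the example above (keep the k highest-probability words and renormalize over the survivors):

```python
def top_k_filter(probs: dict, k: int) -> dict:
    """Keep only the k highest-probability words, then renormalize."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {w: p / total for w, p in top}

probs = {"good": 0.4, "great": 0.3, "nice": 0.2, "bad": 0.1}
print(top_k_filter(probs, 2))  # only "good" and "great" remain
```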
Top-p Sampling
Sample from the smallest set of words whose cumulative probability reaches p (the "nucleus").
Example:
Original probabilities:
- good: 0.4
- great: 0.3
- nice: 0.2
- bad: 0.1
Top-p=0.7:
Select "good" (0.4) + "great" (0.3) = 0.7
Select from "good" and "great"
Advantages:
- Adaptive selection range
- More flexible
Disadvantages:
- Slightly more complex computation
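A minimal sketch of Top-p (nucleus) filtering, again using the illustrative distribution (walk down the sorted words until the cumulative probability reaches p, then renormalize over that set):

```python
def top_p_filter(probs: dict, p: float) -> dict:
    """Keep the smallest set of words whose cumulative probability reaches p,
    then renormalize over that set (nucleus sampling)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {word: prob / total for word, prob in kept}

probs = {"good": 0.4, "great": 0.3, "nice": 0.2, "bad": 0.1}
print(top_p_filter(probs, 0.7))  # "good" and "great" cover 0.7
```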
Combined Use
In practice, temperature, Top-k, and Top-p are often combined:
Temperature=0.7
Top-k=50
Top-p=0.9
Deterministic vs Random Output
Deterministic Output
Characteristics:
- Same input produces same output
- Suitable for scenarios requiring consistency
Applications:
- Code generation
- Technical documentation
- Test case generation
Settings:
Temperature=0
Top-k=1
Random Output
Characteristics:
- Same input may produce different outputs
- Suitable for scenarios requiring diversity
Applications:
- Creative writing
- Brainstorming
- Story creation
Settings:
Temperature=0.8-1.0
Top-p=0.9-1.0
Practical Application Cases
Case 1: Code Generation
Scenario: Generate a function
Prompt:
Write a quicksort function in Python
Low Temperature (0.2) Output:
```python
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)
```
High Temperature (0.9) Output:
```python
def quick_sort(arr):
    """
    Sort an array using the quicksort algorithm.
    """
    if len(arr) <= 1:
        return arr
    # Choose pivot element
    pivot = arr[len(arr) // 2]
    # Partition
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    # Recursively sort and merge
    return quick_sort(left) + middle + quick_sort(right)
```
Case 2: Creative Writing
Scenario: Write a story opening
Prompt:
Write the opening of a sci-fi story
Low Temperature (0.3) Output:
In 2050, humanity finally achieved interstellar travel. The first interstellar spacecraft "Hope" departed from Earth, heading for Alpha Centauri.
High Temperature (0.9) Output:
Neon lights flickered in the rainy Tokyo night, reflecting off the cyborg mechanic's metal arm. He had just completed an illegal neural implant surgery, and his client, a mysterious android, was watching him with glowing eyes...
Case 3: Technical Documentation
Scenario: Explain a concept
Prompt:
Explain what a Transformer is
Low Temperature (0.2) Output:
The Transformer is a deep learning architecture proposed by Google in 2017. It uses a self-attention mechanism to process sequence data and is the foundation of modern large language models.
High Temperature (0.8) Output:
Imagine you're reading a book and can focus on all of its content simultaneously rather than word by word. This is the core idea of the Transformer: the self-attention mechanism. It allows AI models to "see" the entire context, and thus better understand and generate text.
Summary
Probabilistic prediction is the core mechanism of large language models:
Key Points:
- ✅ LLM is essentially a probabilistic prediction model
- ✅ It predicts the most likely next word
- ✅ Temperature parameter controls output randomness
- ✅ Top-p and Top-k sampling are also important control methods
Best Practices:
- Choose appropriate temperature based on task
- Use low temperature for code and documentation
- Use high temperature for creative tasks
- Combine multiple sampling methods
- Test effects of different parameters
Parameter Selection Guide:
- Code generation: Temperature 0-0.3
- Technical documentation: Temperature 0.2-0.5
- Daily conversation: Temperature 0.5-0.7
- Article writing: Temperature 0.6-0.8
- Creative tasks: Temperature 0.8-1.0+
Understanding probabilistic prediction helps better control AI output and achieve desired results.
Next Steps
- Hallucination Phenomenon - Learn why AI "fabricates" information
- Reasoning Capabilities - Learn about AI's reasoning mechanisms