Probabilistic Prediction

LLM is Essentially a Probabilistic Prediction Model

The core of a large language model (LLM) is probabilistic prediction. Its working principle is: given the preceding text, predict the most likely next word.

Simple Understanding: an LLM is like a super "fill-in-the-blank" expert, predicting the most suitable next word based on the preceding content.

Next Token Prediction Mechanism

Working Principle

  1. Input Processing: Convert input text to tokens
  2. Context Understanding: Understand the context and semantics of input text
  3. Probability Calculation: Calculate the probability of each possible next word
  4. Selection Output: Select the next word based on the computed probabilities

Example:

Input: "The weather today is really"
Possible next words and probabilities:
- good: 0.4
- great: 0.3
- nice: 0.2
- bad: 0.1

Selection: "good" (highest probability)
Output: "The weather today is really good"
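
The selection step can be sketched in a few lines of Python; the word list and probabilities are the illustrative ones from the example, not real model output:

```python
# Illustrative probabilities from the example above (not real model output).
probs = {"good": 0.4, "great": 0.3, "nice": 0.2, "bad": 0.1}

# Greedy selection: pick the single highest-probability word.
next_word = max(probs, key=probs.get)
print("The weather today is really " + next_word)
```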

Multi-step Generation

Generating complete text requires multiple predictions:

Step 1: Input "The weather today is really" → Predict "good"
Step 2: Input "The weather today is really good" → Predict ","
Step 3: Input "The weather today is really good," → Predict "perfect"
Step 4: Input "The weather today is really good, perfect" → Predict "for"
Step 5: Input "The weather today is really good, perfect for" → Predict "going"

Final Output: "The weather today is really good, perfect for going out"
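
The loop above can be sketched as follows; `predict_next` is a hypothetical stub that replays the example sequence, whereas a real LLM would compute a probability distribution over tokens given the context:

```python
# Minimal sketch of autoregressive generation. The stub below replays the
# example sequence; a real model would predict each word from `context`.
stream = iter(["good", ",", "perfect", "for", "going", "out"])

def predict_next(context):
    # Hypothetical stand-in for a real model call.
    return next(stream)

text = "The weather today is really"
for _ in range(6):
    word = predict_next(text)
    # Append with a space, except before punctuation.
    text += word if word in ",.!?" else " " + word
print(text)
```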

Role of Temperature Parameter

The temperature parameter controls the randomness of the model's output by reshaping the probability distribution: lower temperatures sharpen it, higher temperatures flatten it.

Impact of Temperature Values

Temperature   Effect                                            Application Scenarios
0.0-0.3       Highly deterministic, strongly repeatable output  Tasks requiring precise answers
0.4-0.7       Balance between determinism and creativity        General conversation and writing
0.8-1.0       Highly creative, diverse output                   Creative writing and brainstorming
1.0+          Very random output, possibly incoherent           Experimental exploration

Temperature's Effect on Probability

Example:

Original probabilities:
- good: 0.4
- great: 0.3
- nice: 0.2
- bad: 0.1

Temperature=0.5 (more deterministic):
- good: 0.5
- great: 0.3
- nice: 0.15
- bad: 0.05

Temperature=1.5 (more random):
- good: 0.3
- great: 0.25
- nice: 0.25
- bad: 0.2
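
One common way to apply temperature is to raise each probability to the power 1/T and renormalize (equivalent to dividing the logits by T before the softmax). A minimal sketch using the example distribution (the exact renormalized values differ slightly from the illustrative numbers above):

```python
def apply_temperature(probs, T):
    # Raise each probability to 1/T and renormalize: T < 1 sharpens the
    # distribution, T > 1 flattens it toward uniform.
    scaled = {w: p ** (1.0 / T) for w, p in probs.items()}
    total = sum(scaled.values())
    return {w: v / total for w, v in scaled.items()}

probs = {"good": 0.4, "great": 0.3, "nice": 0.2, "bad": 0.1}
print(apply_temperature(probs, 0.5))   # sharper: "good" gains probability
print(apply_temperature(probs, 1.5))   # flatter: "bad" gains probability
```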

Temperature Usage Recommendations

Low Temperature (0.0-0.3):

  • Code generation
  • Technical documentation
  • Translation tasks
  • Factual Q&A

Medium Temperature (0.4-0.7):

  • Daily conversation
  • Article writing
  • Email drafting
  • Learning assistance

High Temperature (0.8-1.0+):

  • Creative writing
  • Brainstorming
  • Story creation
  • Idea generation

Top-p and Top-k Sampling

Besides temperature, there are other methods to control output randomness.

Top-k Sampling

Sample only from the k highest-probability words.

Example:

Original probabilities:
- good: 0.4
- great: 0.3
- nice: 0.2
- bad: 0.1

Top-k=2:
Only select from "good" and "great"

Advantages:

  • Filters out low-probability words
  • Improves output quality

Disadvantages:

  • May miss reasonable low-probability words
  • k value needs adjustment
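
Top-k filtering can be sketched as follows, using the example distribution:

```python
def top_k_filter(probs, k):
    # Keep only the k highest-probability words, then renormalize.
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {w: p / total for w, p in kept}

probs = {"good": 0.4, "great": 0.3, "nice": 0.2, "bad": 0.1}
print(top_k_filter(probs, 2))  # only "good" and "great" remain
```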

Top-p Sampling

Sample only from the smallest set of top-ranked words whose cumulative probability reaches p (also known as nucleus sampling).

Example:

Original probabilities:
- good: 0.4
- great: 0.3
- nice: 0.2
- bad: 0.1

Top-p=0.7:
Select "good" (0.4) + "great" (0.3) = 0.7
Select from "good" and "great"

Advantages:

  • Adaptive selection range
  • More flexible

Disadvantages:

  • Slightly more complex computation
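
Top-p filtering can be sketched the same way: sort by probability, accumulate until the mass reaches p, then renormalize:

```python
def top_p_filter(probs, p):
    # Keep the smallest set of top-ranked words whose cumulative
    # probability reaches p, then renormalize.
    kept, cum = {}, 0.0
    for w, pr in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[w] = pr
        cum += pr
        if cum >= p:
            break
    total = sum(kept.values())
    return {w: pr / total for w, pr in kept.items()}

probs = {"good": 0.4, "great": 0.3, "nice": 0.2, "bad": 0.1}
print(top_p_filter(probs, 0.7))  # "good" + "great" reach 0.7
```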

Combined Use

In practice, temperature, Top-k, and Top-p are usually combined, for example:

Temperature=0.7
Top-k=50
Top-p=0.9
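
A combined pipeline might look like the sketch below; note that the exact order of operations (temperature, then Top-k, then Top-p, then sampling) varies across implementations:

```python
import random

def sample(probs, temperature=0.7, top_k=50, top_p=0.9):
    # Sketch of one common order of operations (implementations vary):
    # 1) apply temperature, 2) keep top-k, 3) keep the top-p nucleus, 4) sample.
    scaled = {w: p ** (1.0 / temperature) for w, p in probs.items()}
    ranked = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(v for _, v in ranked)
    kept, cum = [], 0.0
    for w, v in ranked:
        kept.append((w, v / total))
        cum += v / total
        if cum >= top_p:
            break
    words, weights = zip(*kept)
    return random.choices(words, weights=weights)[0]

probs = {"good": 0.4, "great": 0.3, "nice": 0.2, "bad": 0.1}
print(sample(probs))
```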

Deterministic vs Random Output

Deterministic Output

Characteristics:

  • Same input produces same output
  • Suitable for scenarios requiring consistency

Applications:

  • Code generation
  • Technical documentation
  • Test case generation

Settings:

Temperature=0
Top-k=1

Random Output

Characteristics:

  • Same input may produce different outputs
  • Suitable for scenarios requiring diversity

Applications:

  • Creative writing
  • Brainstorming
  • Story creation

Settings:

Temperature=0.8-1.0
Top-p=0.9-1.0

Practical Application Cases

Case 1: Code Generation

Scenario: Generate a function

Prompt:

Write a quicksort function in Python

Low Temperature (0.2) Output:

```python
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)
```

High Temperature (0.9) Output:

```python
def quick_sort(arr):
    """
    Sort an array using the quicksort algorithm
    """
    if len(arr) <= 1:
        return arr

    # Choose pivot element
    pivot = arr[len(arr) // 2]

    # Partition
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]

    # Recursively sort and merge
    return quick_sort(left) + middle + quick_sort(right)
```

Case 2: Creative Writing

Scenario: Write a story opening

Prompt:

Write the opening of a sci-fi story

Low Temperature (0.3) Output:

In 2050, humanity finally achieved interstellar travel. The first interstellar spacecraft "Hope" departed from Earth, heading for Alpha Centauri.

High Temperature (0.9) Output:

Neon lights flickered in the rainy Tokyo night, reflecting off the cyborg mechanic's metal arm. He had just completed an illegal neural implant surgery, and his client, a mysterious android, was watching him with glowing eyes...

Case 3: Technical Documentation

Scenario: Explain a concept

Prompt:

Explain what a Transformer is

Low Temperature (0.2) Output:

The Transformer is a deep learning architecture proposed by Google in 2017. It uses the self-attention mechanism to process sequence data and is the foundation of modern large language models.

High Temperature (0.8) Output:

Imagine you're reading a book and can focus on all its content simultaneously rather than word by word. This is the core idea of the Transformer: the self-attention mechanism. It allows AI models to "see" the entire context, thus better understanding and generating text.

Summary

Probabilistic prediction is the core mechanism of large language models:

Key Points:

  • ✅ LLM is essentially a probabilistic prediction model
  • ✅ It predicts the most likely next word
  • ✅ Temperature parameter controls output randomness
  • ✅ Top-p and Top-k sampling are also important control methods

Best Practices:

  1. Choose appropriate temperature based on task
  2. Use low temperature for code and documentation
  3. Use high temperature for creative tasks
  4. Combine multiple sampling methods
  5. Test effects of different parameters

Parameter Selection Guide:

  • Code generation: Temperature 0-0.3
  • Technical documentation: Temperature 0.2-0.5
  • Daily conversation: Temperature 0.5-0.7
  • Article writing: Temperature 0.6-0.8
  • Creative tasks: Temperature 0.8-1.0+

Understanding probabilistic prediction helps better control AI output and achieve desired results.

MIT Licensed