Probabilistic Prediction

LLM is Essentially a Probabilistic Prediction Model

The core of a large language model (LLM) is probabilistic prediction. Its working principle is: given the preceding text, predict the most likely next word.

Simple Understanding: an LLM is like a super "fill-in-the-blank" expert, predicting the most suitable next word based on the preceding content.

Next Token Prediction Mechanism

Working Principle

  1. Input Processing: Convert input text to tokens
  2. Context Understanding: Understand the context and semantics of input text
  3. Probability Calculation: Calculate the probability of each possible next word
  4. Selection Output: Select the next word based on the computed probabilities

Example:

Input: "The weather today is really"
Possible next words and probabilities:
- good: 0.4
- great: 0.3
- nice: 0.2
- bad: 0.1

Selection: "good" (highest probability)
Output: "The weather today is really good"
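
The selection step can be sketched in a few lines of Python; the word list and probabilities are the illustrative ones from the example, not real model output:

```python
# Illustrative probabilities from the example above (not real model output).
probs = {"good": 0.4, "great": 0.3, "nice": 0.2, "bad": 0.1}

# Greedy selection: pick the single highest-probability word.
next_word = max(probs, key=probs.get)
print("The weather today is really " + next_word)
```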

Multi-step Generation

Generating complete text requires multiple predictions:

Step 1: Input "The weather today is really" → Predict "good"
Step 2: Input "The weather today is really good" → Predict ","
Step 3: Input "The weather today is really good," → Predict "perfect"
Step 4: Input "The weather today is really good, perfect" → Predict "for"
Step 5: Input "The weather today is really good, perfect for" → Predict "going"

Final Output: "The weather today is really good, perfect for going out"
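
The loop above can be sketched as follows; `predict_next` is a hypothetical stub that replays the example sequence, whereas a real LLM would compute a probability distribution over tokens given the context:

```python
# Minimal sketch of autoregressive generation. The stub below replays the
# example sequence; a real model would predict each word from `context`.
stream = iter(["good", ",", "perfect", "for", "going", "out"])

def predict_next(context):
    # Hypothetical stand-in for a real model call.
    return next(stream)

text = "The weather today is really"
for _ in range(6):
    word = predict_next(text)
    # Append with a space, except before punctuation.
    text += word if word in ",.!?" else " " + word
print(text)
```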

Role of Temperature Parameter

The temperature parameter controls the randomness of the model's output by reshaping the probability distribution: lower temperatures sharpen it, higher temperatures flatten it.

Impact of Temperature Values

Temperature   Effect                                            Application Scenarios
0.0-0.3       Highly deterministic, strongly repeatable output  Tasks requiring precise answers
0.4-0.7       Balance between determinism and creativity        General conversation and writing
0.8-1.0       Highly creative, diverse output                   Creative writing and brainstorming
1.0+          Very random output, possibly incoherent           Experimental exploration

Temperature's Effect on Probability

Example:

Original probabilities:
- good: 0.4
- great: 0.3
- nice: 0.2
- bad: 0.1

Temperature=0.5 (more deterministic):
- good: 0.5
- great: 0.3
- nice: 0.15
- bad: 0.05

Temperature=1.5 (more random):
- good: 0.3
- great: 0.25
- nice: 0.25
- bad: 0.2
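
One common way to apply temperature is to raise each probability to the power 1/T and renormalize (equivalent to dividing the logits by T before the softmax). A minimal sketch using the example distribution (the exact renormalized values differ slightly from the illustrative numbers above):

```python
def apply_temperature(probs, T):
    # Raise each probability to 1/T and renormalize: T < 1 sharpens the
    # distribution, T > 1 flattens it toward uniform.
    scaled = {w: p ** (1.0 / T) for w, p in probs.items()}
    total = sum(scaled.values())
    return {w: v / total for w, v in scaled.items()}

probs = {"good": 0.4, "great": 0.3, "nice": 0.2, "bad": 0.1}
print(apply_temperature(probs, 0.5))   # sharper: "good" gains probability
print(apply_temperature(probs, 1.5))   # flatter: "bad" gains probability
```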

Temperature Usage Recommendations

Low Temperature (0.0-0.3):

  • Code generation
  • Technical documentation
  • Translation tasks
  • Factual Q&A

Medium Temperature (0.4-0.7):

  • Daily conversation
  • Article writing
  • Email drafting
  • Learning assistance

High Temperature (0.8-1.0+):

  • Creative writing
  • Brainstorming
  • Story creation
  • Idea generation

Top-p and Top-k Sampling

Besides temperature, there are other methods to control output randomness.

Top-k Sampling

Sample only from the k highest-probability words.

Example:

Original probabilities:
- good: 0.4
- great: 0.3
- nice: 0.2
- bad: 0.1

Top-k=2:
Only select from "good" and "great"

Advantages:

  • Filters out low-probability words
  • Improves output quality

Disadvantages:

  • May miss reasonable low-probability words
  • k value needs adjustment
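
Top-k filtering can be sketched as follows, using the example distribution:

```python
def top_k_filter(probs, k):
    # Keep only the k highest-probability words, then renormalize.
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {w: p / total for w, p in kept}

probs = {"good": 0.4, "great": 0.3, "nice": 0.2, "bad": 0.1}
print(top_k_filter(probs, 2))  # only "good" and "great" remain
```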

Top-p Sampling

Sample only from the smallest set of top-ranked words whose cumulative probability reaches p (also known as nucleus sampling).

Example:

Original probabilities:
- good: 0.4
- great: 0.3
- nice: 0.2
- bad: 0.1

Top-p=0.7:
Select "good" (0.4) + "great" (0.3) = 0.7
Select from "good" and "great"

Advantages:

  • Adaptive selection range
  • More flexible

Disadvantages:

  • Slightly more complex computation
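
Top-p filtering can be sketched the same way: sort by probability, accumulate until the mass reaches p, then renormalize:

```python
def top_p_filter(probs, p):
    # Keep the smallest set of top-ranked words whose cumulative
    # probability reaches p, then renormalize.
    kept, cum = {}, 0.0
    for w, pr in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[w] = pr
        cum += pr
        if cum >= p:
            break
    total = sum(kept.values())
    return {w: pr / total for w, pr in kept.items()}

probs = {"good": 0.4, "great": 0.3, "nice": 0.2, "bad": 0.1}
print(top_p_filter(probs, 0.7))  # "good" + "great" reach 0.7
```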

Combined Use

In practice, temperature, Top-k, and Top-p are usually combined, for example:

Temperature=0.7
Top-k=50
Top-p=0.9
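
A combined pipeline might look like the sketch below; note that the exact order of operations (temperature, then Top-k, then Top-p, then sampling) varies across implementations:

```python
import random

def sample(probs, temperature=0.7, top_k=50, top_p=0.9):
    # Sketch of one common order of operations (implementations vary):
    # 1) apply temperature, 2) keep top-k, 3) keep the top-p nucleus, 4) sample.
    scaled = {w: p ** (1.0 / temperature) for w, p in probs.items()}
    ranked = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(v for _, v in ranked)
    kept, cum = [], 0.0
    for w, v in ranked:
        kept.append((w, v / total))
        cum += v / total
        if cum >= top_p:
            break
    words, weights = zip(*kept)
    return random.choices(words, weights=weights)[0]

probs = {"good": 0.4, "great": 0.3, "nice": 0.2, "bad": 0.1}
print(sample(probs))
```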

Deterministic vs Random Output

Deterministic Output

Characteristics:

  • Same input produces same output
  • Suitable for scenarios requiring consistency

Applications:

  • Code generation
  • Technical documentation
  • Test case generation

Settings:

Temperature=0
Top-k=1

Random Output

Characteristics:

  • Same input may produce different outputs
  • Suitable for scenarios requiring diversity

Applications:

  • Creative writing
  • Brainstorming
  • Story creation

Settings:

Temperature=0.8-1.0
Top-p=0.9-1.0

Practical Application Cases

Case 1: Code Generation

Scenario: Generate a function

Prompt:

Write a quicksort function in Python

Low Temperature (0.2) Output:

```python
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)
```

High Temperature (0.9) Output:

```python
def quick_sort(arr):
    """
    Sort an array using the quicksort algorithm
    """
    if len(arr) <= 1:
        return arr

    # Choose pivot element
    pivot = arr[len(arr) // 2]

    # Partition
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]

    # Recursively sort and merge
    return quick_sort(left) + middle + quick_sort(right)
```

Case 2: Creative Writing

Scenario: Write a story opening

Prompt:

Write the opening of a sci-fi story

Low Temperature (0.3) Output:

In 2050, humanity finally achieved interstellar travel. The first interstellar spacecraft "Hope" departed from Earth, heading for Alpha Centauri.

High Temperature (0.9) Output:

Neon lights flickered in the rainy Tokyo night, reflecting off the cyborg mechanic's metal arm. He had just completed an illegal neural implant surgery, and his client, a mysterious android, was watching him with glowing eyes...

Case 3: Technical Documentation

Scenario: Explain a concept

Prompt:

Explain what a Transformer is

Low Temperature (0.2) Output:

The Transformer is a deep learning architecture proposed by Google in 2017. It uses the self-attention mechanism to process sequence data and is the foundation of modern large language models.

High Temperature (0.8) Output:

Imagine you're reading a book and can focus on all its content simultaneously rather than word by word. This is the core idea of the Transformer: the self-attention mechanism. It allows AI models to "see" the entire context, thus better understanding and generating text.

Summary

Probabilistic prediction is the core mechanism of large language models:

Key Points:

  • ✅ LLM is essentially a probabilistic prediction model
  • ✅ It predicts the most likely next word
  • ✅ Temperature parameter controls output randomness
  • ✅ Top-p and Top-k sampling are also important control methods

Best Practices:

  1. Choose appropriate temperature based on task
  2. Use low temperature for code and documentation
  3. Use high temperature for creative tasks
  4. Combine multiple sampling methods
  5. Test effects of different parameters

Parameter Selection Guide:

  • Code generation: Temperature 0-0.3
  • Technical documentation: Temperature 0.2-0.5
  • Daily conversation: Temperature 0.5-0.7
  • Article writing: Temperature 0.6-0.8
  • Creative tasks: Temperature 0.8-1.0+

Understanding probabilistic prediction helps better control AI output and achieve desired results.

MIT Licensed