Essential Parameters to Consider in Generative AI App Development


If you are a developer jumping into the world of building Gen AI applications, there are three parameters you need to understand first: Temperature, Top-K, and Top-P.
What is an LLM (Large Language Model)?
In simple terms, an LLM (Large Language Model) is a smart computer program that can understand your questions and respond according to your needs. It can also write text, create songs, videos, images, code, and more. However, LLMs work step by step: they generate text one token at a time, each time predicting the next token from everything that came before.
What are tokens?
Tokens are the smallest units of text an LLM works with. Every sentence or word you send to an LLM (ChatGPT, Gemini, Claude, etc.) is broken down into tokens. For example, if you ask the AI, “How are you?”, the sentence is first converted into tokens (tokenized) and then processed by the LLM. Based on these input tokens, the LLM predicts the output tokens, which are then converted back into text (detokenized) to show you the LLM's response.
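To see tokenization in action, here is a minimal sketch using OpenAI's tiktoken library (my choice for illustration; each provider actually uses its own tokenizer):

import tiktoken

# A tokenizer used by several OpenAI models; the exact splits and IDs vary by model
enc = tiktoken.get_encoding("cl100k_base")

token_ids = enc.encode("How are you?")
print(token_ids)                                 # a short list of integer token IDs
print([enc.decode([tid]) for tid in token_ids])  # roughly ['How', ' are', ' you', '?']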
Key takeaway so far: LLMs predict the next output token based on the input tokens.
With this introduction, it's clear that LLMs predict their output. As a developer, you want to ensure that users always receive the right kind of output, meaning the LLM in your application predicts reliably. At this point, the three parameters Temperature, Top-K, and Top-P become very important. These parameters alone do not determine accuracy, but tuning them is the first step.
With that motivation, let's begin our discussion with Temperature.
Temperature
The temperature parameter controls how random the generated text is. By adjusting the temperature, you change how the model picks the next token in a sequence, affecting how creative or predictable the output will be.
For example, suppose you give ChatGPT the prompt “I went to the zoo and saw...”. Before generating the next token, the model considers different options such as “lion”, “tiger”, and “hippo”, each with an associated probability. Temperature reshapes these probabilities: a higher temperature means more randomness, and a lower temperature means less randomness.
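To make this concrete, here is a minimal sketch (not from the original post) of how temperature typically reshapes next-token probabilities: the model's raw scores (logits) are divided by the temperature before the softmax, so low temperatures sharpen the distribution and high temperatures flatten it. The logit values below are made up purely for illustration.

import numpy as np

# Made-up raw scores (logits) for a few candidate next tokens
logits = {"lion": 2.0, "tiger": 1.5, "hippo": 1.0, "pencil": -1.0}

def next_token_probs(logits, temperature):
    # Divide the logits by the temperature, then apply a softmax
    scores = np.array(list(logits.values())) / temperature
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return dict(zip(logits.keys(), probs.round(3)))

print(next_token_probs(logits, temperature=0.1))  # nearly all probability on "lion"
print(next_token_probs(logits, temperature=2.0))  # much flatter, so sampling is more random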
You should set your temperature depending on the specific use case. If you are doing something creative, a higher temperature is beneficial; if you are working on tasks with a specific correct answer, a lower temperature is usually better.
| Provider | Temperature range | Default |
| --- | --- | --- |
| OpenAI | 0.0 to 2.0 | 1.0 |
| Anthropic | 0.0 to 1.0 | 1.0 |
| Gemini | 0.0 to 2.0 | 1.0 |
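For reference, here is how you would pass temperature in the OpenAI and Anthropic Python SDKs as well; the model names below are only placeholders, so swap in whichever models you actually use.

from openai import OpenAI
import anthropic

prompt = "Write a 1 day experience in a zoo of a small girl"

# OpenAI: temperature can range from 0.0 to 2.0
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
openai_response = openai_client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,
)
print(openai_response.choices[0].message.content)

# Anthropic: temperature is capped at 1.0
anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
anthropic_response = anthropic_client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=500,
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,
)
print(anthropic_response.content[0].text)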
Let us try an example:
- Temperature is set to 0.1 (less randomness, less creative)
from google import genai
from google.genai import types

client = genai.Client(api_key='YOUR_API_KEY')

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=["Write a 1 day experience in a zoo of a small girl"],
    config=types.GenerateContentConfig(
        max_output_tokens=500,  # cap on the length of the response
        temperature=0.1,        # low temperature: less randomness, less creativity
    ),
)
print(response.text)
When I set the temperature to 2.0, i.e., highly random and highly creative, the output reads very differently.
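For completeness, only the config block needs to change for the high-temperature run; here is the modified snippet (the rest of the script stays the same):

    config=types.GenerateContentConfig(
        max_output_tokens=500,
        temperature=2.0,  # high temperature: much more randomness and creativity
    ),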
I think the difference is clearly visible to you now. There is no single correct value for temperature, but you now have the idea; the rest is experimentation.
Top K and Top P
Top K is a setting supported by some LLMs that determines how many of the most likely tokens should be considered when generating a response. In other words, it fixes the number of candidate tokens the model keeps when choosing the next output token.
For example, if I say “I like to add toppings to my burger”, with a Top K of 2 the LLM would only consider the two most likely next tokens, say ketchup (0.2) and mustard (0.1). Other options, such as onion (0.05), pickles (0.04), or butter (0.02), would not be considered.
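Here is a minimal sketch (not from the original post) of what Top K filtering does with those made-up topping probabilities: keep the K most likely tokens, drop the rest, and renormalize before sampling.

# Made-up next-token probabilities for the burger-toppings example
probs = {"ketchup": 0.2, "mustard": 0.1, "onion": 0.05, "pickles": 0.04, "butter": 0.02}

def top_k_filter(probs, k):
    # Keep only the k most likely tokens, then renormalize so they sum to 1
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}

print(top_k_filter(probs, k=2))  # roughly {'ketchup': 0.67, 'mustard': 0.33}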
Top P, on the other hand, selects the top tokens whose cumulative probability does not exceed the value you provide for Top P.
A higher Top K makes the model's output more creative and varied, while a lower Top K makes the output more restrained and factual.
Values for Top P range from 0 (least creative, essentially only the most probable token) to 1 (all tokens in the vocabulary are eligible).
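A matching sketch for Top P filtering, reusing the made-up probs dictionary from the Top K example above: walk the tokens from most to least likely, keep them while the running total stays within the Top P threshold, and renormalize what is left.

def top_p_filter(probs, top_p):
    # Keep tokens in order of probability while the cumulative total stays
    # within top_p; always keep at least the single most likely token
    kept, cumulative = {}, 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        if kept and cumulative + p > top_p:
            break
        kept[token] = p
        cumulative += p
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}

print(top_p_filter(probs, top_p=0.3))  # keeps ketchup and mustard, drops the rest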
Putting it all together
If you set temperature to 0, top-K and top-P become irrelevant: the most probable token becomes the next token predicted. If you set temperature extremely high (above 1, generally into the 10s), temperature becomes irrelevant and whatever tokens make it through the top-K and/or top-P criteria are then randomly sampled to choose a next predicted token.
If you set top-K to 1, temperature and top-P become irrelevant. Only one token passes the top-K criteria, and that token is the next predicted token. If you set top-K extremely high, like to the size of the LLM’s vocabulary, any token with a nonzero probability of being the next token will meet the top-K criteria and none are selected out.
If you set top-P to 0 (or a very small value), most LLM sampling implementations will then only consider the most probable token to meet the top-P criteria, making temperature and top-K irrelevant. If you set top-P to 1, any token with a nonzero probability of being the next token will meet the top-P criteria, and none are selected out.
As a general starting point, a temperature of 0.2, top-P of 0.95, and top-K of 30 will give you relatively coherent results that can be creative but not excessively so. If you want especially creative results, try starting with a temperature of 0.9, top-P of 0.99, and top-K of 40. And if you want less creative results, try starting with a temperature of 0.1, top-P of 0.9, and top-K of 20. Finally, if your task always has a single correct answer (e.g., answering a math problem), start with a temperature of 0.
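As a sketch of that general starting point in the same Gemini SDK used earlier (the prompt and model name simply reuse the ones from the example above), all three parameters can be set together in GenerateContentConfig:

from google import genai
from google.genai import types

client = genai.Client(api_key='YOUR_API_KEY')

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=["Write a 1 day experience in a zoo of a small girl"],
    config=types.GenerateContentConfig(
        temperature=0.2,  # fairly low randomness
        top_p=0.95,       # nucleus (top-P) threshold
        top_k=30,         # consider only the 30 most likely tokens
        max_output_tokens=500,
    ),
)
print(response.text)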
With this, I will conclude here. I would be happy to see how you play with these parameters and what responses you get, and I would be glad to have your comments on the blog as well.
Written by Indranil Maiti
I am a MERN stack developer and an aspiring AI application engineer. Let's grab a cup of coffee and chat.