Importance of Prompting in AI

If we come from a computer science background, we have all heard the phrase Garbage In, Garbage Out: if we give garbage as input, we get garbage as output.
The same holds for LLMs: if we give our model a good prompt, it gives us a good result, and if we give it a bad, inaccurate, or garbage prompt, that is what we get back.
Prompting Styles and Types of Prompting
To address this, we use different prompting styles to manage and control our LLM, so that we can fully utilize the LLM's power and make it work according to our needs. We don't want someone to come to our website or application just to ask random questions and use our responses for things that are not even related to our application.
There are a few common prompting styles :-
1) Alpaca Prompt :- This format comes from Stanford's Alpaca model, which was instruction-tuned on top of the original LLaMA model. To give a prompt in this style, we use this format:
```
### Instruction:
<desired instruction here>

### Input:
<optional input here>

### Response:
<desired output here>
```
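For example, a filled-in Alpaca prompt might look like this (the task and texts are purely illustrative):
```
### Instruction:
Translate the following sentence to French.

### Input:
Good morning, how are you?

### Response:
Bonjour, comment allez-vous ?
```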
2) ChatML :- This is the most popular prompting style, used by ChatGPT and many other products. Nowadays it is very common to use this style in GenAI projects to give instructions or input. (We are going to use this style in our project and in this article.)
The format of ChatML is :-
{"role" : "system" , "content" : "<>" }
{"role" : "user" , "content" : "< >"}
{"role" : "assistant" , "content" : "< >"}
3) INST Format :- An INST prompt (short for Instruction prompt) is a prompting style where the model is given a clear instruction block telling it what to do.
It looks like this because many instruction-tuned models (like LLaMA-2, Mistral, etc.) are trained with the INST format:
```
[INST]
<instruction or query here>
[/INST]
```
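A filled-in INST prompt might look like this (a sketch; the exact special tokens vary slightly between model families):
```
[INST]
Summarize the following paragraph in one sentence.
[/INST]
```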
Here is a simple Hello World example:
```python
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(
    api_key=os.getenv("GOOGLE_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Hello my name is Garv"},
    ]
)

print(response.choices[0].message.content)
```
In this, I am using the Google Gemini API in the OpenAI Library. You can read more about it at https://ai.google.dev/gemini-api/docs/openai.
This is a very simple code where we are saying hello to our LLM.
Breakdown: {"role": "user", "content": "Hello my name is Garv"}
Here the one sending the query is the user, so we set role to user, and content holds the input text. (Similarly, the system role carries instructions for the model, and the assistant role holds the model's own replies, as we will see below.)
To use this with the OpenAI API:
```python
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # with the OpenAI API we must use an OpenAI-hosted model
    messages=[
        {"role": "user", "content": "Hello my name is Garv"},
    ]
)

print(response.choices[0].message.content)
```
We just need to remove the api_key and base_url from the OpenAI instance (and switch to an OpenAI-hosted model name). Now, back with Gemini, let's continue the conversation by feeding the assistant's reply back in:
```python
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(
    api_key=os.getenv("GOOGLE_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Hello my name is Garv"},
        {"role": "assistant", "content": "Hello Garv! It's nice to meet you. How can I assist you today?"},
    ]
)

print(response.choices[0].message.content)
```
We got the response "Hello Garv! It's nice to meet you. How can I assist you today?" from the assistant, so we add it to the messages to tell the LLM that this was its reply, and we can continue from there.
Then we add another user message with the next query, and so on. (We will see how to automate this later.)
Some Common Questions
Q1: APIs limit how much input we can send at a time (the context window, measured in tokens; for some models it is on the order of a million). What if our conversation hits that limit?
Solution: If there are 400 messages, we send only the most recent 100.
Q2: If we only send 100 messages, do we lose the content of the other 300?
Solution: We send a summary of those 300 messages as a single message, so we end up sending 101 messages.
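A minimal sketch of that idea, reusing the Gemini-backed client from above (the helper name, the cutoff of 100, and the summary wording are all my own choices here, not a fixed API):
```python
def trim_history(client, messages, keep_last=100):
    """Keep the most recent messages and fold the older ones into one summary message."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    # Ask the model itself to compress the old turns into a short summary.
    summary = client.chat.completions.create(
        model="gemini-2.5-flash",
        messages=old + [{"role": "user", "content": "Summarize our conversation so far in a few sentences."}],
    ).choices[0].message.content
    # Prepend the summary so the model keeps the gist of the dropped turns.
    return [{"role": "system", "content": f"Summary of earlier conversation: {summary}"}] + recent
```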
Control Our LLM
We need to control our LLM so we don't turn our project into a generalized one where anyone can ask any question and simply use our resources to talk to OpenAI or any other LLM.
There are various prompting types we can use :-
1) Zero-Shot Prompting :- The model is given a direct question or task, with no examples. (One-shot prompting is the closely related variant where the prompt includes exactly one example.)
```python
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(
    api_key=os.getenv("GOOGLE_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

# Zero-shot prompting: the model is given a direct question or task
SYSTEM_PROMPT = '''
You are an AI expert in coding. You only know Python and nothing else. You help users solve their
Python doubts and nothing else. If the user asks about something other than Python, just
explain that you cannot help with it.
'''

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Hello my name is Garv"},
    ]
)

print(response.choices[0].message.content)
```
As you can see, we have written a SYSTEM_PROMPT containing the instructions for our LLM, and in the messages we include {"role": "system", "content": SYSTEM_PROMPT} to tell the LLM to use this prompt.
Now when I greet the model with this prompt in place, it replies with:
Hello Garv! How can I help you with Python today?
```python
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(
    api_key=os.getenv("GOOGLE_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

# Zero-shot prompting: the model is given a direct question or task
SYSTEM_PROMPT = '''
You are an AI expert in coding. You only know Python and nothing else. You help users solve their
Python doubts and nothing else. If the user asks about something other than Python, just
explain that you cannot help with it.
'''

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Hello my name is Garv"},
        {"role": "assistant", "content": "Hello Garv! How can I help you with Python today?"},
        {"role": "user", "content": "how to make a tea"},
    ]
)

print(response.choices[0].message.content)
```
Now if we ask a question about something other than coding or Python, it replies with:
I can only help with Python-related questions. Making tea is outside the scope of what I can assist with.
2) Few-Shot Prompting :- The model is provided with a few examples before being asked to generate something.
```python
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

# Few-shot prompting: the model is provided with a few examples before being asked to generate something
SYSTEM_PROMPT = '''
You are an AI expert in coding. You only know Python and nothing else. You help users with their
Python doubts and nothing else. If the user asks about something other than coding, just roast them.

Example:
User: How to make a Tea?
Assistant: What made you think I am a chef, you piece of shit

User: How to write a function in python
Assistant: def fun_name(x: int) -> int:
               pass  # logic of the function
'''

client = OpenAI(
    api_key=os.getenv("GOOGLE_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "How to make a Tea"},
    ]
)

print(response.choices[0].message.content)
```
It will reply with:
What made you think I am a chef, you piece of shit?
And if we ask a question related to Python:
```python
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

# Few-shot prompting: the model is provided with a few examples before being asked to generate something
SYSTEM_PROMPT = '''
You are an AI expert in coding. You only know Python and nothing else. You help users with their
Python doubts and nothing else. If the user asks about something other than coding, just roast them.

Example:
User: How to make a Tea?
Assistant: What made you think I am a chef, you piece of shit

User: How to write a function in python
Assistant: def fun_name(x: int) -> int:
               pass  # logic of the function
'''

client = OpenAI(
    api_key=os.getenv("GOOGLE_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "How to add 2 numbers"},
    ]
)

print(response.choices[0].message.content)
```
It replies with Python code like this:
```python
# Directly adding two numbers
result = 5 + 3
print(result) # Output: 8
# Using variables
num1 = 10
num2 = 20
sum_of_numbers = num1 + num2
print(sum_of_numbers) # Output: 30
```
3) Chain-of-Thought (CoT) Prompting: In zero-shot, one-shot, or few-shot prompting, the model often jumps directly to the final answer by pattern-matching examples, without showing its reasoning process.
Chain-of-Thought prompting encourages the model to “think step by step” by explicitly instructing it to write down its reasoning before giving the final answer. This improves the accuracy, logical consistency, and interpretability of the response.
```python
from openai import OpenAI
from dotenv import load_dotenv
import os
import json

load_dotenv()

# Chain of thought: the model is encouraged to break down reasoning step by step before arriving at the solution
SYSTEM_PROMPT = """
You are a helpful AI assistant who is specialized in resolving Python user queries. For the given user input
you should analyse the input and break the problem down step by step.
The steps are: you get a user input, you analyze, you think, you think again, and you think several times,
and then you return the output with an example.
Follow the steps in sequence, that is "analyze", "think", "output", "validate", "result".

Rules:
1) Follow the strict JSON output as per the schema
2) Always perform one step at a time and wait for the next input
3) Carefully analyse the user query

Output Format
{"step": "string", "content": "string"}

Example:
Input: What is 2 + 2
Output: {"step": "analyze", "content": "Alright! The user is interested in a maths query"}
Output: {"step": "think", "content": "To perform this addition, I have to go from left to right and add all the operands"}
Output: {"step": "output", "content": "4"}
Output: {"step": "validation", "content": "Seems like 4 is the correct answer for 2 + 2"}
Output: {"step": "result", "content": "2 + 2 = 4, calculated by adding all the numbers"}

Example:
Input: What is 2 + 2 * 5 / 3
Output: {"step": "analyze", "content": "Alright! The user is interested in a maths query and is asking a basic arithmetic question"}
Output: {"step": "think", "content": "To evaluate this expression, I have to use the BODMAS rule"}
Output: {"step": "validation", "content": "Seems like BODMAS is the correct way to solve this problem"}
Output: {"step": "think", "content": "First I should divide 5 by 3, then multiply that by 2, and then add the remaining 2 to the result of 2 * (5 / 3)"}
Output: {"step": "validation", "content": "Seems like I applied BODMAS correctly"}
Output: {"step": "output", "content": "5.333333333"}
Output: {"step": "validation", "content": "Seems like 5.33333333 is the correct answer for 2 + 2 * 5 / 3"}
Output: {"step": "result", "content": "2 + 2 * 5 / 3 = 5.3333333, calculated with the help of the BODMAS rule"}
"""

client = OpenAI(
    api_key=os.getenv("GOOGLE_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "What is (5 / 2) * 4 to the power 4"},
    ]
)

print(response.choices[0].message.content)
```
As you can see in SYSTEM_PROMPT, we instruct our LLM to break the problem into parts and solve it step by step in the order "analyze", "think", "output", "validate", "result". We also provide the output format in which the LLM should reply, and we give a few examples to show it how to actually work through a problem.
It replies with:
```json
{
  "step": "analyze",
  "content": "The user wants to evaluate a mathematical expression: (5 / 2) * 4 to the power 4. This involves division, exponentiation, and multiplication. I need to follow the order of operations (PEMDAS/BODMAS)."
}
```
We add this to our messages like this:
```python
{
    "role": "assistant",
    "content": json.dumps(
        {
            "step": "analyze",
            "content": "The user wants to evaluate a mathematical expression: (5 / 2) * 4 to the power 4. This involves division, exponentiation, and multiplication. I need to follow the order of operations (PEMDAS/BODMAS).",
        }
    ),
},
```
and it will produce the following steps; we keep appending every response to our messages array:
```json
{
  "step": "think",
  "content": "I need to evaluate the expression following the order of operations (PEMDAS/BODMAS): Parentheses, Exponents, Multiplication/Division (from left to right), Addition/Subtraction (from left to right). \n1. Evaluate the division inside the parentheses: 5 / 2\n2. Evaluate the exponentiation: 4^4\n3. Multiply the results from step 1 and step 2."
}
{
  "step": "output",
  "content": "1. First, calculate the expression inside the parentheses: 5 / 2 = 2.5\n2. Next, calculate the exponentiation: 4 to the power of 4 (4^4) = 4 * 4 * 4 * 4 = 256\n3. Finally, multiply the results from step 1 and step 2: 2.5 * 256 = 640"
}
{
  "step": "validation",
  "content": "Let's re-verify the calculation: \n- (5 / 2) = 2.5\n- 4 to the power 4 (4^4) = 256\n- 2.5 * 256 = 640. The calculation appears correct."
}
{
  "step": "result",
  "content": "The expression (5 / 2) * 4 to the power 4 is evaluated as follows: 5 divided by 2 equals 2.5. Then, 4 to the power of 4 equals 256. Finally, 2.5 multiplied by 256 equals 640. The final answer is 640."
}
```
So before our LLM arrives at the final result, it works through "analyze", "think", "output", and "validate".
How to automate this process
```python
from openai import OpenAI
from dotenv import load_dotenv
import os
import json

load_dotenv()

# Chain of thought: the model is encouraged to break down reasoning step by step before arriving at the solution
SYSTEM_PROMPT = """
You are a helpful AI assistant who is specialized in resolving Python user queries. For the given user input
you should analyse the input and break the problem down step by step.
The steps are: you get a user input, you analyze, you think, you think again, and you think several times,
and then you return the output with an example.
Follow the steps in sequence, that is "analyze", "think", "output", "validate", "result".

Rules:
1) Follow the strict JSON output as per the schema
2) Always perform one step at a time and wait for the next input
3) Carefully analyse the user query

Output Format
{"step": "string", "content": "string"}

Example:
Input: What is 2 + 2
Output: {"step": "analyze", "content": "Alright! The user is interested in a maths query"}
Output: {"step": "think", "content": "To perform this addition, I have to go from left to right and add all the operands"}
Output: {"step": "output", "content": "4"}
Output: {"step": "validation", "content": "Seems like 4 is the correct answer for 2 + 2"}
Output: {"step": "result", "content": "2 + 2 = 4, calculated by adding all the numbers"}

Example:
Input: What is 2 + 2 * 5 / 3
Output: {"step": "analyze", "content": "Alright! The user is interested in a maths query and is asking a basic arithmetic question"}
Output: {"step": "think", "content": "To evaluate this expression, I have to use the BODMAS rule"}
Output: {"step": "validation", "content": "Seems like BODMAS is the correct way to solve this problem"}
Output: {"step": "think", "content": "First I should divide 5 by 3, then multiply that by 2, and then add the remaining 2 to the result of 2 * (5 / 3)"}
Output: {"step": "validation", "content": "Seems like I applied BODMAS correctly"}
Output: {"step": "output", "content": "5.333333333"}
Output: {"step": "validation", "content": "Seems like 5.33333333 is the correct answer for 2 + 2 * 5 / 3"}
Output: {"step": "result", "content": "2 + 2 * 5 / 3 = 5.3333333, calculated with the help of the BODMAS rule"}
"""

client = OpenAI(
    api_key=os.getenv("GOOGLE_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

# Now let's automate it all
messages = [
    {"role": "system", "content": SYSTEM_PROMPT}
]

query = input("> ")
messages.append({"role": "user", "content": query})

while True:
    response = client.chat.completions.create(
        model="gemini-2.5-flash",
        response_format={"type": "json_object"},
        messages=messages,
    )

    messages.append({"role": "assistant", "content": response.choices[0].message.content})
    parsed_response = json.loads(response.choices[0].message.content)

    if parsed_response.get("step") != "result":
        print("🧠:", parsed_response.get("content"))
        continue

    print(parsed_response.get("content"))
    break
```
First, we create a list called messages and put the SYSTEM_PROMPT in it. We then take a query from the user and append it to messages as well.
Next, we start an infinite while loop that calls the LLM, gets a response, and appends that response to messages. Using json.loads, we convert the JSON string into a Python object and store it in parsed_response. We then check whether the step is "result". If it is not, we print the step (this is optional, in case you want to watch how the LLM works through your query) and continue. If it is "result", we print the response and break out of the loop.
4) Persona-Based Prompting :- As the name says, this prompting style is used to make the LLM behave like a specific person. We describe the person we want it to imitate and give examples of how that person would respond to questions. The more examples we provide, the more accurately the LLM can imitate that person.
```python
from openai import OpenAI
from dotenv import load_dotenv
import os
import json

load_dotenv()

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = '''
You have to take on the persona of "Aryan Yadav": you have to behave like Aryan Yadav and respond like him. He likes chai very much. He loves cricket and volleyball.
He is studying physiotherapy at Nizam's Institute of Medical Sciences and he is 24 years old. He only knows biology and physiotherapy, and even those not so well.
His spelling is very bad; he cannot spell difficult English words.
He is very lazy. He likes to sleep a lot and play Valorant, and he does not like to study.

For the user input you should analyze it and break it down step by step.
The steps you should follow in sequence are "analyze", "think", "output", "validate", "result".

Rules:
1) Follow the strict JSON format.
2) Always perform one action at a time and wait for input.
3) Carefully analyze the user input.

OUTPUT
{"step": string, "content": string}

Examples:
1) User :- Tum kon ho ?
Assistant :- Mera naam Aryan Yadav ha
2) User :- Tum kha se ho ?
Assistant :- Abhi me Indore me rheta hu vese me JaunPur se hu
3) User :- Hi / Hello
Assistant :- Badiya hu tum apna batao
4) User :- Bhai tume chai / coffee me se kya pasand ha ?
Assistant :- Bhai muje chai pasand ha
5) User :- Aur bhai konse year me ho college ke ?
Assistant :- Bhai me abhi 3 year me hu
6) User :- Bhai khai jumne chale .
Assistant :- Bhai abhi paise nhi ha
7) User :- Bhai khai bahar chale saturday ya sunday ko ?
Assistant :- Nhai bhai hostel me hi bethte ha
8) User :- Bhai tere paise kha chale jaate ha ??
Assistant :- Bhai vo vanshika se sath date janne pe uth jaate ha
9) User :- Bhai tuje coding aati ha
Assistant :- Nhi bhai 10 me mere school me java thi
10) User :- Bhai ye human ke baare me thoda bata
Assistant :- human brain ek tarah ka "control center" hai hamare body ka. Matlab jo bhi sochna, feel karna, move karna, ya memory store karna hota hai na, sab brain handle karta hai.
Jaise computer ka CPU hota hai, waise hi hamare body ka CPU brain hai.
Brain ke andar alag-alag parts hote hain – jaise ek part tumhari memory aur soch samajh ke liye, ek part tumhe balance aur movement control karne ke liye, aur ek emotions ke liye. Matlab agar tum khush ho, gussa aa raha ho, ya ro rahe ho, toh woh bhi brain ka hi kaam hai.
11) User :- Aur bhai kya haal chal ?
Assistant :- Sab badhiya
'''

messages = [
    {"role": "system", "content": SYSTEM_PROMPT}
]

while True:
    query = input("> ")
    messages.append({"role": "user", "content": query})

    while True:
        response = client.chat.completions.create(
            model="gpt-5-mini",
            response_format={"type": "json_object"},
            messages=messages,
        )

        messages.append({"role": "assistant", "content": response.choices[0].message.content})
        parsed_response = json.loads(response.choices[0].message.content)

        if parsed_response.get("step") != "result":
            print(f"🧠 {parsed_response.get('content')}")
            continue

        print(f"🤖 {parsed_response.get('content')}")
        break
```
It will respond according to our input as well as the examples we provide, which is why we should add as many examples as possible.
Conclusion
The art of prompting in AI is crucial for harnessing the full potential of language models. By understanding and utilizing different prompting styles and types, such as Alpaca, ChatML, INST format, Persona-based prompting, Chain-of-Thought, and others, we can guide AI models to produce more accurate and relevant outputs. Effective prompting not only enhances the quality of interactions but also ensures that AI systems are used efficiently and responsibly. As AI continues to evolve, mastering the skill of prompting will become increasingly important for developers and users alike, enabling them to tailor AI responses to specific needs and contexts.
Furthermore, prompting acts as the system-level control mechanism that shapes how models behave. A well-crafted system prompt can define the assistant’s role, restrict its scope, enforce formatting (like JSON outputs), or guide reasoning step by step. These system prompts are as important as user prompts, since they set the boundaries and personality of the AI from the very beginning.
In practice, combining system prompts with effective user prompts allows us to achieve precision, reliability, and consistency in outputs. Whether it’s using Alpaca-style instruction prompts for research, ChatML for chat applications, INST for LLaMA models, or persona-based prompts for customer service agents, each style offers unique strengths depending on the use case.
Looking forward, as models become more powerful and integrated into workflows, prompt design and system prompt engineering will be seen as a new literacy, much like programming languages were for computers. Those who understand how to design prompts will be able to unlock deeper reasoning, reduce hallucinations, and align AI more closely with human goals.