How to Train AI to "Think": Creating Thinking Models with Chain-of-Thought


Introduction
Large Language Models (LLMs) like GPT-4, Claude, or LLaMA are incredible at producing fluent, contextually relevant text. But at their core, these models don’t truly think; they predict the next word based on patterns in the data they’ve seen.
However, through a technique called Chain-of-Thought (CoT) prompting, we can transform these “fast guessers” into step-by-step reasoners, making them appear more logical, deliberate, and intelligent.
In this article, we’ll explore:
What “thinking” means for an AI model.
What Chain-of-Thought is and why it matters.
How to apply CoT prompting effectively.
JavaScript examples showing the difference between non-thinking and thinking responses.
Thinking vs. Non-Thinking Models
A non-thinking model gives you a direct answer without explaining its reasoning.
A thinking model, on the other hand, breaks a problem into smaller steps before giving you the final answer, much like how a human might solve a math problem on paper.
Example (Math Question: 17 × 23)
Without Thinking (Non-CoT):
"391"
With Thinking (CoT):
"Let’s break it down:
17 × 23 = (17 × 20) + (17 × 3) = 340 + 51 = 391"
The second answer is not just correct; it’s explainable. This is key for tasks requiring reasoning, like problem-solving, logic puzzles, or multi-step decision-making.
What is Chain-of-Thought (CoT) Prompting?
Chain-of-Thought prompting is a technique where you explicitly tell the model to think step-by-step before giving the final answer.
Instead of jumping directly to a solution, the model produces intermediate reasoning steps, making it better at handling complex problems, arithmetic, and logical reasoning.
Think of it as teaching the model to show its work in school.
Why Chain-of-Thought Works
Under the hood, LLMs generate text token by token. Without guidance, they tend to produce short, direct answers. By asking them to “think step-by-step”, we nudge them into generating intermediate reasoning text before the answer.
This works because:
Breaking problems down reduces errors.
Intermediate steps help the model avoid logic jumps.
Human-like reasoning increases trust and interpretability.
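In code terms, the nudge can be as small as appending one instruction to the prompt. Here’s a minimal sketch (the withCoT helper name is just for illustration, not part of any library):

// Hypothetical helper: wraps any prompt with a step-by-step instruction.
function withCoT(prompt) {
  return `${prompt}\n\nThink step-by-step and explain your reasoning before giving the final answer.`;
}

console.log(withCoT("What is 17 multiplied by 23?"));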
Applying Chain-of-Thought in Practice
Here’s how a simple prompt transformation changes the behavior of an LLM.
Without CoT (Non-Thinking)
import OpenAI from "openai";

// Assumes Node 18+ with ESM (for top-level await); reads OPENAI_API_KEY from the environment.
const openai = new OpenAI();

const prompt = `What is 17 multiplied by 23?`;

const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: prompt }]
});

console.log(response.choices[0].message.content);
// Output: "391"
With CoT (Thinking)
// Same client setup as above; only the prompt changes.
const prompt = `What is 17 multiplied by 23? Think step-by-step and explain your reasoning before giving the final answer.`;

const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: prompt }]
});

console.log(response.choices[0].message.content);
// Output: "First, multiply 17 by 20 = 340.
// Then multiply 17 by 3 = 51.
// Add them together: 340 + 51 = 391."
The only difference is how you phrase the prompt — and the model shifts from “spitting answers” to “thinking out loud.”
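The variation snippets below show only the prompt strings. To run them, you can reuse the same client call from above; here’s a small hypothetical ask helper (same openai client as before) that the rest of the examples assume:

// Hypothetical convenience wrapper around the call shown above.
async function ask(prompt) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }]
  });
  return response.choices[0].message.content;
}

Each prompt that follows can then be run with console.log(await ask(prompt));.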
Variations of Chain-of-Thought Prompting
There are different ways to implement CoT prompting:
a) Zero-shot CoT - No examples, just an instruction
const prompt = `If a train travels 60 km in 1.5 hours, what is its speed? Think step-by-step.`;
Output:
"Time = 1.5 hours, distance = 60 km.
Speed = Distance ÷ Time = 60 ÷ 1.5 = 40 km/h."
b) Few-shot CoT - Provide reasoning examples before the question
const prompt = `
Q: If John has 3 apples and buys 2 more, how many apples does he have?
A: Start with 3, add 2, total = 5 apples.
Q: A car drives 120 km in 2 hours, what is its speed?
A: Speed = Distance ÷ Time = 120 ÷ 2 = 60 km/h.
Q: If a train travels 60 km in 1.5 hours, what is its speed?
A:
`;
Output:
"Speed = Distance ÷ Time = 60 ÷ 1.5 = 40 km/h."
By showing how to reason in the examples, we prime the model to do the same.
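A note on the design: because the chat API takes a list of messages, the same few-shot examples can also be passed as alternating user/assistant turns instead of one big string. A sketch, using the same client and model as above:

// Few-shot CoT expressed as chat turns rather than a single prompt string.
const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "user", content: "If John has 3 apples and buys 2 more, how many apples does he have?" },
    { role: "assistant", content: "Start with 3, add 2, total = 5 apples." },
    { role: "user", content: "A car drives 120 km in 2 hours, what is its speed?" },
    { role: "assistant", content: "Speed = Distance ÷ Time = 120 ÷ 2 = 60 km/h." },
    { role: "user", content: "If a train travels 60 km in 1.5 hours, what is its speed?" }
  ]
});

console.log(response.choices[0].message.content);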
c) Self-Ask CoT - Encourage the model to ask itself questions before answering
const prompt = `
Question: There are 12 cookies. You eat 3, and give 4 to a friend. How many are left?
Think: How many did I start with? How many were eaten? How many were given away? What’s left?
Answer:
`;
Output:
"Start with 12, eat 3 → 9 left. Give 4 away → 5 cookies left."
Benefits and Limitations
Benefits:
Higher accuracy in reasoning tasks.
More transparent decision-making.
Easier debugging of AI errors.
Limitations:
Slower responses (more text to generate; the sketch below shows how to measure this).
Not always needed for simple Q&A.
Can produce flawed reasoning steps yet still land on the correct answer, so the explanation isn’t always faithful.
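The latency cost is visible in the API response itself: the usage field reports how many tokens were generated. A quick sketch for comparing the two prompt styles (the tokenCost name is illustrative):

// Compare completion length with and without the CoT instruction.
async function tokenCost(prompt) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }]
  });
  return response.usage.completion_tokens;
}

console.log(await tokenCost("What is 17 multiplied by 23?"));
console.log(await tokenCost("What is 17 multiplied by 23? Think step-by-step."));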
When to Use CoT Prompting
Math & logic problems.
Multi-step workflows.
Legal or financial reasoning.
AI tutoring and educational tools.
For simple fact retrieval (e.g., “What’s the capital of Japan?”), CoT might be overkill. But for problem-solving, it’s a game changer.
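One practical consequence: you can gate the CoT instruction behind a simple check, so cheap lookups stay fast. A naive illustrative heuristic (not a robust classifier), reusing the hypothetical withCoT helper from earlier:

// Naive illustration: only add the CoT instruction for prompts that look
// like multi-step problems. A real system would use a better signal.
function maybeWithCoT(prompt) {
  const looksLikeReasoning = /\d|calculate|how many|step|why/i.test(prompt);
  return looksLikeReasoning ? withCoT(prompt) : prompt;
}

console.log(maybeWithCoT("What's the capital of Japan?")); // unchanged
console.log(maybeWithCoT("How many cookies are left?"));   // CoT instruction added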
Conclusion
Chain-of-Thought prompting is like switching an AI from instant answer mode to show your work mode. By adding just a few extra words to our prompts, we can make non-thinking models behave like deliberate reasoners.
This doesn't make them truly intelligent, but it makes their outputs more reliable, transparent, and human-like, and in AI development, that's often exactly what we need.