How to Train AI to "Think": Creating Thinking Models with Chain-of-Thought


Introduction
Large Language Models (LLMs) like GPT-4, Claude, or LLaMA are incredible at producing fluent, contextually relevant text. But at their core, these models don’t truly think; they predict the next word based on patterns in the data they’ve seen.
However, through a technique called Chain-of-Thought (CoT) prompting, we can transform these “fast guessers” into step-by-step reasoners, making them appear more logical, deliberate, and intelligent.
In this article, we’ll explore:
What “thinking” means for an AI model.
What Chain-of-Thought is and why it matters.
How to apply CoT prompting effectively.
JavaScript examples showing the difference between non-thinking and thinking responses.
Thinking vs. Non-Thinking Models
A non-thinking model gives you a direct answer without explaining its reasoning.
A thinking model, on the other hand, breaks a problem into smaller steps before giving you the final answer, much like how a human might solve a math problem on paper.
Example (Math Question: 17 × 23)
Without Thinking (Non-CoT):
"391"
With Thinking (CoT):
"Let’s break it down:
17 × 23 = (17 × 20) + (17 × 3) = 340 + 51 = 391"
The second answer is not just correct; it’s explainable. This is key for tasks requiring reasoning, like problem-solving, logic puzzles, or multi-step decision-making.
What is Chain-of-Thought (CoT) Prompting?
Chain-of-Thought prompting is a technique where you explicitly tell the model to think step-by-step before giving the final answer.
Instead of jumping directly to a solution, the model produces intermediate reasoning steps, making it better at handling complex problems, arithmetic, and logical reasoning.
Think of it as teaching the model to show its work in school.
Why Chain-of-Thought Works
Under the hood, LLMs generate text token by token. Without guidance, they tend to produce short, direct answers. By asking them to “think step-by-step”, we nudge them into generating intermediate reasoning text before the answer.
This works because:
Breaking problems down reduces errors.
Intermediate steps help the model avoid logic jumps.
Human-like reasoning increases trust and interpretability.
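In code terms, the nudge can be as small as appending one instruction to the prompt. Here’s a minimal sketch (the withCoT helper name is just for illustration, not part of any library):

// Hypothetical helper: wraps any prompt with a step-by-step instruction.
function withCoT(prompt) {
  return `${prompt}\n\nThink step-by-step and explain your reasoning before giving the final answer.`;
}

console.log(withCoT("What is 17 multiplied by 23?"));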
Applying Chain-of-Thought in Practice
Here’s how a simple prompt transformation changes the behavior of an LLM.
Without CoT (Non-Thinking)
import OpenAI from "openai";

// Assumes Node 18+ with ESM (for top-level await); reads OPENAI_API_KEY from the environment.
const openai = new OpenAI();

const prompt = `What is 17 multiplied by 23?`;

const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: prompt }]
});

console.log(response.choices[0].message.content);
// Output: "391"
With CoT (Thinking)
// Same client setup as above; only the prompt changes.
const prompt = `What is 17 multiplied by 23? Think step-by-step and explain your reasoning before giving the final answer.`;

const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: prompt }]
});

console.log(response.choices[0].message.content);
// Output: "First, multiply 17 by 20 = 340.
// Then multiply 17 by 3 = 51.
// Add them together: 340 + 51 = 391."
The only difference is how you phrase the prompt — and the model shifts from “spitting answers” to “thinking out loud.”
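The variation snippets below show only the prompt strings. To run them, you can reuse the same client call from above; here’s a small hypothetical ask helper (same openai client as before) that the rest of the examples assume:

// Hypothetical convenience wrapper around the call shown above.
async function ask(prompt) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }]
  });
  return response.choices[0].message.content;
}

Each prompt that follows can then be run with console.log(await ask(prompt));.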
Variations of Chain-of-Thought Prompting
There are different ways to implement CoT prompting:
a) Zero-shot CoT - No examples, just an instruction
const prompt = `If a train travels 60 km in 1.5 hours, what is its speed? Think step-by-step.`;
Output:
"Time = 1.5 hours, distance = 60 km.
Speed = Distance ÷ Time = 60 ÷ 1.5 = 40 km/h."
b) Few-shot CoT - Provide reasoning examples before the question
const prompt = `
Q: If John has 3 apples and buys 2 more, how many apples does he have?
A: Start with 3, add 2, total = 5 apples.
Q: A car drives 120 km in 2 hours, what is its speed?
A: Speed = Distance ÷ Time = 120 ÷ 2 = 60 km/h.
Q: If a train travels 60 km in 1.5 hours, what is its speed?
A:
`;
Output:
"Speed = Distance ÷ Time = 60 ÷ 1.5 = 40 km/h."
By showing how to reason in the examples, we prime the model to do the same.
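A note on the design: because the chat API takes a list of messages, the same few-shot examples can also be passed as alternating user/assistant turns instead of one big string. A sketch, using the same client and model as above:

// Few-shot CoT expressed as chat turns rather than a single prompt string.
const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "user", content: "If John has 3 apples and buys 2 more, how many apples does he have?" },
    { role: "assistant", content: "Start with 3, add 2, total = 5 apples." },
    { role: "user", content: "A car drives 120 km in 2 hours, what is its speed?" },
    { role: "assistant", content: "Speed = Distance ÷ Time = 120 ÷ 2 = 60 km/h." },
    { role: "user", content: "If a train travels 60 km in 1.5 hours, what is its speed?" }
  ]
});

console.log(response.choices[0].message.content);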
c) Self-Ask CoT - Encourage the model to ask itself questions before answering
const prompt = `
Question: There are 12 cookies. You eat 3, and give 4 to a friend. How many are left?
Think: How many did I start with? How many were eaten? How many were given away? What’s left?
Answer:
`;
Output:
"Start with 12, eat 3 → 9 left. Give 4 away → 5 cookies left."
Benefits and Limitations
Benefits:
Higher accuracy in reasoning tasks.
More transparent decision-making.
Easier debugging of AI errors.
Limitations:
Slower responses (more text to generate; the sketch below shows how to measure this).
Not always needed for simple Q&A.
Can produce flawed reasoning steps yet still land on the correct answer, so the explanation isn’t always faithful.
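The latency cost is visible in the API response itself: the usage field reports how many tokens were generated. A quick sketch for comparing the two prompt styles (the tokenCost name is illustrative):

// Compare completion length with and without the CoT instruction.
async function tokenCost(prompt) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }]
  });
  return response.usage.completion_tokens;
}

console.log(await tokenCost("What is 17 multiplied by 23?"));
console.log(await tokenCost("What is 17 multiplied by 23? Think step-by-step."));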
When to Use CoT Prompting
Math & logic problems.
Multi-step workflows.
Legal or financial reasoning.
AI tutoring and educational tools.
For simple fact retrieval (e.g., “What’s the capital of Japan?”), CoT might be overkill. But for problem-solving, it’s a game changer.
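One practical consequence: you can gate the CoT instruction behind a simple check, so cheap lookups stay fast. A naive illustrative heuristic (not a robust classifier), reusing the hypothetical withCoT helper from earlier:

// Naive illustration: only add the CoT instruction for prompts that look
// like multi-step problems. A real system would use a better signal.
function maybeWithCoT(prompt) {
  const looksLikeReasoning = /\d|calculate|how many|step|why/i.test(prompt);
  return looksLikeReasoning ? withCoT(prompt) : prompt;
}

console.log(maybeWithCoT("What's the capital of Japan?")); // unchanged
console.log(maybeWithCoT("How many cookies are left?"));   // CoT instruction added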
Conclusion
Chain-of-Thought prompting is like switching an AI from instant answer mode to show your work mode. By adding just a few extra words to our prompts, we can make non-thinking models behave like deliberate reasoners.
This doesn't make them truly intelligent, but it makes their outputs more reliable, transparent, and human-like, and in AI development, that's often exactly what we need.