Thinking Model with Chain-of-Thought


Most language models will answer quickly and confidently—but not always correctly—because they jump straight from input → output. A “thinking model” inserts a deliberate reasoning phase in the middle: input → reasoning (scratchpad) → answer. This article shows how to turn an ordinary, non-thinking model into a thinking one using Chain-of-Thought (CoT), from prompt-only techniques to training and evaluation.
1) What “Chain-of-Thought” Really Is
Chain-of-Thought (CoT) is any intermediate reasoning text the model writes for itself before giving an answer. Think of it as a scratchpad: plans, sub-steps, partial calculations, checks, tool calls, and reflections.
Non-thinking model: “What’s 37×42?” → “1554” (often a guess).
Thinking model: “Break into steps: compute 37×40 + 37×2 = 1480 + 74 = 1554.” → Answer: 1554.
Your system should use the scratchpad to reason but usually hide it from end users for safety and clarity.
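A minimal sketch of that separation, assuming only that you have some complete(prompt) function for your model of choice (the name is a placeholder, not a specific library's API): the thinking prompt asks for a scratchpad plus a delimited final answer, and the application keeps the scratchpad internal.

```python
# Sketch: direct prompting vs. scratchpad (CoT) prompting.
# `complete` is a placeholder for whatever text-completion call you use.

DIRECT_PROMPT = "What's 37*42? Reply with just the number."

THINKING_PROMPT = (
    "What's 37*42?\n"
    "First write your reasoning in a SCRATCHPAD section "
    "(e.g., 37*40 + 37*2 = 1480 + 74), then give the result on a line "
    "starting with 'ANSWER:'."
)

def split_scratchpad(completion: str) -> tuple[str, str]:
    """Separate the hidden reasoning from the user-facing answer."""
    scratchpad, _, answer = completion.rpartition("ANSWER:")
    return scratchpad.strip(), answer.strip()

def answer_with_thinking(complete, question_prompt: str) -> str:
    completion = complete(question_prompt)           # model call (your client here)
    scratchpad, answer = split_scratchpad(completion)
    # Log the scratchpad internally; show only the answer to the user.
    return answer
```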
2) Three Paths to Add “Thinking”
A. Prompt-Only (no training)
Get more reasoning from the same model:
Step-by-step cue: “Think step by step before answering.”
Few-shot CoT: Provide 2–5 worked examples (question → short reasoning → answer).
Self-consistency: Sample multiple CoT answers with temperature > 0, then vote on the final answer (see the sketch after this section).
Decomposition cues: “First list subproblems. Then solve each. Finally summarize.”
Verification cue (reflect): “Check your solution; if you find an error, correct it before answering.”
Pros: free and fast.
Cons: not as reliable; may produce long, noisy rationales.
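Self-consistency is straightforward to sketch. The snippet below assumes you supply a sample_answer(prompt) callable that queries your model at temperature > 0 and returns one completion; the function name and the 'ANSWER:' convention are illustrative choices, not a specific library's API.

```python
# Self-consistency sketch: sample several chains of thought, vote on the answer.
from collections import Counter

COT_SUFFIX = "\nThink step by step, then give the final answer after 'ANSWER:'."

def extract_answer(completion: str) -> str:
    """Take the text after the last 'ANSWER:' marker as the final answer."""
    return completion.rsplit("ANSWER:", 1)[-1].strip()

def self_consistency(sample_answer, question: str, n: int = 5) -> str:
    """Sample n CoT completions and return the majority-vote answer."""
    votes = Counter()
    for _ in range(n):
        completion = sample_answer(question + COT_SUFFIX)  # temperature > 0 upstream
        votes[extract_answer(completion)] += 1
    answer, _count = votes.most_common(1)[0]
    return answer
```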
B. Lightweight Finetuning (instruction + rationales)
Teach the model to naturally produce clean, bounded CoT:
Supervised fine-tuning (SFT) on (input, scratchpad, answer).
Use process-based supervision (reward good reasoning steps) or outcome-based supervision (reward correct final answers). Process supervision tends to reduce lucky guesses and improve robustness.
Pros: big quality jump; scratchpads become crisp.
Cons: requires data with rationales.
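As a rough illustration of what such data can look like, here is one way to serialize (input, scratchpad, answer) triples into JSONL for supervised fine-tuning; the field names and section markers are arbitrary choices and should match whatever your training pipeline expects. For process supervision you would additionally attach a label to each scratchpad step.

```python
import json

# One SFT record: the model is trained to emit the scratchpad, then the answer.
example = {
    "input": "A train travels 120 km in 1.5 hours. What is its average speed?",
    "scratchpad": [
        "Average speed = distance / time.",
        "120 km / 1.5 h = 80 km/h.",
    ],
    "answer": "80 km/h",
}

# Flatten into a single training string with explicit section markers so the
# model learns where the (hidden) reasoning ends and the final answer begins.
def to_training_text(ex: dict) -> str:
    scratch = "\n".join(ex["scratchpad"])
    return f"QUESTION: {ex['input']}\nSCRATCHPAD:\n{scratch}\nANSWER: {ex['answer']}"

with open("cot_sft.jsonl", "w") as f:
    f.write(json.dumps({"text": to_training_text(example)}) + "\n")
```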
C. Distillation & Alignment
Use a stronger “teacher” (or curated solutions) to generate rationales for many tasks and distill them into a smaller model. Then align it to:
Produce hidden scratchpads (for internal use).
Return short, user-facing answers (no rationale leakage).
Follow safety rules (don’t reveal private steps by default).
Pros: scalable and cost-effective.
Cons: needs a reliable teacher + careful filtering.
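A hedged sketch of the distillation step, assuming a teacher_solve(question) callable that returns a rationale plus a final answer (a stand-in for whatever teacher model or curated solution source you use): rationales are kept only when the teacher's final answer matches a trusted reference, and only the filtered pairs become student training data.

```python
# Distillation sketch: generate rationales with a teacher, keep only those whose
# final answer matches a trusted reference, and emit student SFT records.
import json

def distill(teacher_solve, labeled_questions, out_path="distilled.jsonl"):
    """labeled_questions: iterable of (question, reference_answer) pairs."""
    kept = 0
    with open(out_path, "w") as f:
        for question, reference in labeled_questions:
            rationale, answer = teacher_solve(question)  # teacher model call
            if answer.strip() != reference.strip():      # outcome-based filter
                continue                                 # drop unreliable rationales
            record = {"input": question, "scratchpad": rationale, "answer": answer}
            f.write(json.dumps(record) + "\n")
            kept += 1
    return kept
```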
3) The Thinking Loop: A Practical Blueprint
A reliable pattern looks like this:
Plan: “What’s the task? What sub-steps are needed?”
Decompose: Split into atomic actions (retrieve, compute, compare).
Reason: Fill a scratchpad with steps and intermediate results.
Check: Sanity-check assumptions and numbers.
Answer: Produce a concise final response.
(Optional) Verify: Use a separate verifier model or rule checks.
This is compatible with ReAct (reasoning + tool actions), Tree-of-Thoughts (branching search), Least-to-Most (start simple, add complexity), and PAL (delegate sub-steps to small programs).
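The loop above can be wired up in a few lines. The sketch below assumes a generic complete(prompt) callable for the worker model and an optional verify(question, answer) callable for a separate verifier; both names are placeholders for whatever client and checks you actually use, not a particular framework's API.

```python
# Plan -> decompose -> reason -> check -> answer, with an optional verifier pass.

def thinking_loop(complete, question: str, verify=None, max_retries: int = 1) -> str:
    # Plan & decompose: ask the model for the sub-steps.
    plan = complete(f"Task: {question}\nList the sub-steps needed, one per line.")

    # Reason: fill a scratchpad step by step, feeding earlier results forward.
    scratchpad = []
    for step in [s for s in plan.splitlines() if s.strip()]:
        context = "\n".join(scratchpad)
        result = complete(
            f"Task: {question}\nWork so far:\n{context}\nNow do this step: {step}"
        )
        scratchpad.append(f"{step} -> {result}")

    # Check: sanity-check assumptions and numbers before answering.
    check = complete(
        f"Task: {question}\nScratchpad:\n" + "\n".join(scratchpad) +
        "\nSanity-check the assumptions and numbers; note any corrections."
    )

    # Answer: concise, user-facing response only.
    answer = complete(
        f"Task: {question}\nScratchpad:\n" + "\n".join(scratchpad) +
        f"\nChecks: {check}\nGive a concise final answer only."
    )

    # Optional verification with a separate model or rule-based checks.
    if verify is not None and max_retries > 0 and not verify(question, answer):
        return thinking_loop(complete, question, verify, max_retries - 1)
    return answer
```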
Different approaches to CoT prompting
CoT prompting has multiple variants, each of which uses a different approach to getting LLMs to explain their outputs:
Auto-CoT. Automatic CoT removes the need to hand-craft worked examples: reasoning chains for a pool of sample questions are generated automatically (typically by prompting the model with a zero-shot cue such as “Let’s think step by step”), and the resulting question–rationale–answer demonstrations are then reused as few-shot examples for future interactions.
Multimodal CoT. Multimodal CoT extends the technique to models that can process inputs beyond text, such as images, audio and video. An example would be asking a model to reason step by step over an image when explaining and justifying its output.
Zero-shot CoT. With this approach, the user doesn't provide an LLM with any examples for it to reference, instead asking it to "show its work" and explain how it achieved its output. This process is efficient, but not as effective for complex inputs; zero-shot chain of thought is best suited for simpler problems.
Least-to-most CoT. With this approach, a user breaks a large problem into smaller subproblems and sends each one to the LLM sequentially. The LLM can then solve each subsequent subproblem more easily using the answers to previous subproblems for reference.
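As an illustration of the least-to-most pattern (the last item above), the sketch below again assumes a generic complete(prompt) placeholder for your model call: the model first lists subproblems, then solves them one at a time, with each later subproblem seeing the earlier answers.

```python
# Least-to-most prompting sketch: decompose, then solve subproblems in order,
# feeding earlier answers into later prompts. `complete` is a placeholder call.

def least_to_most(complete, question: str) -> str:
    decomposition = complete(
        f"Problem: {question}\n"
        "List the subproblems to solve first, from simplest to hardest, one per line."
    )
    solved = []  # (subproblem, answer) pairs accumulated so far
    for sub in [s for s in decomposition.splitlines() if s.strip()]:
        context = "\n".join(f"{q} -> {a}" for q, a in solved)
        answer = complete(
            f"Problem: {question}\nAlready solved:\n{context}\nNow solve: {sub}"
        )
        solved.append((sub, answer))
    # The answer to the final (hardest) subproblem answers the original problem.
    return solved[-1][1] if solved else complete(question)
```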
Advantages of CoT prompting
CoT prompting offers several advantages:
Better responses. LLMs can only take in a limited amount of information at one time. Breaking down complex problems into simpler subtasks helps mitigate this issue. It lets LLMs process those smaller components individually, leading to more accurate and precise model responses.
Expanded knowledge base. CoT prompting takes advantage of LLMs' extensive pool of general knowledge. LLMs are exposed to a wide array of explanations, definitions and problem-solving examples during their training on vast textual data sets, encompassing books, articles and much of the open internet.
Logical reasoning. The technique directly targets a common limitation of LLMs: difficulty with logical reasoning. Although LLMs excel at generating coherent, relevant text, they weren't primarily designed to perform multi-step reasoning or problem-solving.
Debugging. CoT prompting assists with model debugging and improvement by providing transparency in the process by which a model arrives at its answer. Because the prompts ask the model to explicitly delineate a reasoning process, they give model testers and developers better insight into how the model reached a particular conclusion. This, in turn, makes it easier to identify and correct errors when refining the model.
Fine-tuning. Developers can combine CoT prompting with fine-tuning to enhance LLM reasoning capabilities. For example, fine-tuning a model on a training data set containing curated examples of step-by-step reasoning and logical deduction can improve the effectiveness of CoT prompting.
Limitations of CoT prompting
It's important to keep in mind that CoT prompting is a technique for using an existing model more effectively, not a training method. While these prompts can help users elicit better results from pretrained LLMs, prompt engineering isn't a cure-all and can't fix model limitations that should have been handled during the training stage.
Use cases of CoT prompting
Understanding regulations
Educating new employees
Managing logistics and supply chains
Creating original content
Benefits of Chain-of-Thought Prompting
Chain-of-Thought prompting offers several benefits, particularly in enhancing the performance and reliability of language models in complex tasks.
1. Improved accuracy: working through intermediate steps gives the model more chances to catch mistakes, which typically yields more correct answers on multi-step tasks such as arithmetic and logical reasoning.
2. Enhanced interpretability: because the reasoning is written out, users and developers can see how the model reached its conclusion and spot where an argument goes wrong.
Conclusion
In this article, we have seen how Chain-of-Thought prompting represents a significant advancement in enhancing the reasoning capabilities of Large Language Models, along with some practical examples of its implementation.
Additionally, we have explored complementary techniques such as zero-shot and few-shot prompting that further enhance the model’s performance and can be combined with CoT, along with the benefits and some limitations that we cannot overlook.
You can learn more about prompt engineering through Understanding Prompt Engineering and ChatGPT Prompt Engineering for Developers.