Building a Thinking Model from a Non-Thinking Model with Chain-of-Thought


Large Language Models (LLMs) are powerful, but in their default form they often act like “non-thinking models.” They generate outputs directly from inputs without showing the reasoning process. While this works for simple tasks, it quickly breaks down for complex reasoning, math, coding, or multi-step decision making.

This is where Chain-of-Thought (CoT) comes in. It transforms a “non-thinking” model into a thinking model by guiding it to reason step-by-step before producing the final answer.


The Problem with Non-Thinking Models:

Imagine you ask a model:

Example: “A farmer has 12 apples. He gives 4 to his friend and buys 6 more. How many apples does he have now?”

A non-thinking model may answer:

  • “8” (stopping after the subtraction: 12 - 4)

  • “18” (ignoring the gift: 12 + 6)

  • “16” (an outright guess)

Why? Because it jumps straight to the answer instead of reasoning through the steps.


What is Chain-of-Thought?

Chain-of-Thought (CoT) is a prompting technique where we encourage the model to explain intermediate steps before the final answer.

Example with CoT prompt:

Prompt:
“A farmer has 12 apples. He gives 4 to his friend and buys 6 more. How many apples does he have now? Think step by step.”

Model Output (CoT):

  1. Start with 12 apples.

  2. He gives 4 away → 12 - 4 = 8.

  3. He buys 6 more → 8 + 6 = 14.

  4. Final Answer = 14.

Now the model thinks out loud, leading to a more reliable answer.
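
In code, this is just a prompt change. The sketch below is a minimal illustration, assuming a placeholder call_llm function that stands in for whatever LLM API you use (the name and signature are invented for this example); the only CoT-specific part is the reasoning cue appended to the question.

    def call_llm(prompt: str) -> str:
        """Stand-in for your LLM client; should return the model's text reply."""
        raise NotImplementedError("Replace with a real API call.")

    question = (
        "A farmer has 12 apples. He gives 4 to his friend and buys 6 more. "
        "How many apples does he have now?"
    )

    # Zero-shot CoT: append a reasoning cue to an otherwise unchanged prompt.
    cot_prompt = question + " Think step by step, then state the final answer."

    print(call_llm(cot_prompt))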


How to Build a Thinking Model from a Non-Thinking Model:

  1. Prompt Engineering (Zero-Shot CoT):

    • Add phrases like “Let’s think step by step” to encourage reasoning.

    • Works even on models not explicitly trained for reasoning.

  2. Few-Shot CoT:

    • Provide the model with examples of reasoning.

    • Example:

        Q: If you have 2 pens and buy 3 more, how many pens do you have?  
        A: 2 + 3 = 5. Answer: 5  
      
        Q: A farmer has 12 apples, gives away 4, and buys 6 more. How many apples?  
        A: 12 - 4 + 6 = 14. Answer: 14
      
    • This sets a reasoning pattern the model can imitate.

  3. Self-Consistency:

    • Instead of generating one reasoning path, ask the model to produce multiple CoT outputs and then vote on the most common answer.

    • Improves reliability in math and logical reasoning; a small sketch combining few-shot prompting with this voting step appears after this list.

  4. Externalizing Reasoning:

    • Capture the CoT outputs, store them, and analyze them for debugging.

    • Helpful for explainability in production systems.
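
The few-shot and self-consistency steps combine naturally in code. The sketch below is a rough outline, again assuming a hypothetical call_llm(prompt, temperature) stand-in for your LLM client: the worked Q/A examples become the prompt prefix, several reasoning paths are sampled at a non-zero temperature, and the most common extracted answer wins. The collected answers (and the full completions, if you keep them) are also exactly what you would log for the externalizing-reasoning step.

    import re
    from collections import Counter

    def call_llm(prompt: str, temperature: float = 0.8) -> str:
        """Stand-in for your LLM client; returns one sampled completion."""
        raise NotImplementedError("Replace with a real API call.")

    # Few-shot CoT: the worked examples become part of every prompt.
    FEW_SHOT_PROMPT = (
        "Q: If you have 2 pens and buy 3 more, how many pens do you have?\n"
        "A: 2 + 3 = 5. Answer: 5\n\n"
        "Q: A farmer has 12 apples, gives away 4, and buys 6 more. How many apples?\n"
        "A: 12 - 4 + 6 = 14. Answer: 14\n\n"
        "Q: {question}\n"
        "A:"
    )

    def extract_answer(completion: str) -> str | None:
        """Pull the number that follows 'Answer:' out of a completion."""
        match = re.search(r"Answer:\s*(-?\d+)", completion)
        return match.group(1) if match else None

    def self_consistent_answer(question: str, n_samples: int = 5) -> str | None:
        """Sample several reasoning paths and return the majority answer."""
        prompt = FEW_SHOT_PROMPT.format(question=question)
        answers = []
        for _ in range(n_samples):
            completion = call_llm(prompt, temperature=0.8)  # diverse chains
            answer = extract_answer(completion)
            if answer is not None:
                answers.append(answer)
        # Majority vote across the sampled chains of thought.
        return Counter(answers).most_common(1)[0][0] if answers else None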


Speed vs Accuracy Trade-Off:

  • Without CoT: Fast and cheap, but error-prone on complex tasks.

  • With CoT: Slower and more token-hungry, but more accurate and interpretable.

Example:

  • Customer chatbot (simple FAQs) → may not need CoT.

  • Medical reasoning assistant → must use CoT for safety.


Beyond CoT: Structured Reasoning:

CoT is just the start. Advanced techniques extend this idea:

  • Tree-of-Thought (ToT): Explore multiple reasoning branches.

  • Graph-of-Thought: Connect reasoning paths into structured graphs.

  • Program-Aided Language (PAL): Delegate reasoning steps to external code execution (sketched below).

These move models closer to human-like problem solving.
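
As a taste of the PAL idea, the rough sketch below asks the model to emit a small Python function and then runs that code instead of trusting the model's own arithmetic. call_llm is once more an invented placeholder for your LLM client, and executing model-generated code like this is only safe inside a sandbox.

    def call_llm(prompt: str) -> str:
        """Stand-in for your LLM client; should return generated Python code."""
        raise NotImplementedError("Replace with a real API call.")

    PAL_PROMPT = (
        "Write a Python function solve() that returns the answer as a number.\n"
        "Question: A farmer has 12 apples, gives away 4, and buys 6 more. "
        "How many apples does he have?\n"
    )

    generated_code = call_llm(PAL_PROMPT)

    # Illustrative only: exec-ing model output belongs in a sandboxed runtime.
    namespace = {}
    exec(generated_code, namespace)
    print(namespace["solve"]())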


Conclusion:

A non-thinking model can feel like a “guessing machine.” By applying Chain-of-Thought prompting, we give LLMs a way to reason transparently, step by step.

  • Start with “Let’s think step by step.”

  • Use few-shot reasoning examples.

  • Improve reliability with self-consistency.

CoT is a small tweak with a huge impact, turning LLMs into models that don’t just answer: they think.
