Chain-of-Thought in Action: Building Reasoning on Top of Non-Thinking Models

Apoorv
5 min read

Chain‑of‑Thought (CoT) prompting is a simple but powerful technique: instead of asking a model for just the final answer, ask it to “show the steps.” This nudges even non‑reasoning models to break problems into smaller parts, improving correctness on math, logic, and multi‑step tasks by eliciting intermediate reasoning. Research shows CoT can significantly boost performance across arithmetic, commonsense, and symbolic reasoning; adding self‑consistency on top of CoT improves it further by sampling multiple reasoning paths and selecting the most consistent answer.

What is Chain‑of‑Thought and why it works

  • Concept: Prompt the model to produce step‑by‑step reasoning before concluding, e.g., “Let’s think step by step,” or “Explain your answer in steps.”

  • Effect: Encourages decomposition of problems into sequential sub‑steps, which reduces skipped logic and errors and makes the model’s process more interpretable.

  • Evidence: CoT prompts yield large gains on reasoning benchmarks; self‑consistency (generate several CoT traces and pick the majority answer) increases accuracy further.

When to use CoT

  • Math word problems, logic puzzles, multi‑hop QA, planning, and tasks that need intermediate calculations or justifications.

  • Schema‑heavy extractions and troubleshooting, where a trail of reasoning clarifies assumptions and improves reliability.
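
To make that last case concrete, here is a minimal sketch of an extraction prompt that asks for a reasoning trail before structured output (the ticket text and JSON field names are illustrative; send the prompt with whichever client you use):

// Schema-heavy extraction with a reasoning trail (illustrative ticket and fields)
const ticket = "Checkout fails with a 502 error after applying coupon SAVE10 on mobile Safari.";
const prompt = `
Read the support ticket. Reason step by step about what is stated and what is assumed,
then output JSON with the keys "symptom", "error_code", "browser", and "coupon".
Ticket: ${ticket}
`;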

Core prompting patterns

  • Zero‑shot CoT: “Solve step by step” with no examples; quick to try and often effective.

  • Few‑shot CoT: Provide 2–5 worked examples with reasoning, then a new query; best for domain‑specific tasks and consistent formatting.

  • Self‑consistency: Sample multiple CoT answers and aggregate the final answer (e.g., majority vote), improving robustness.

Minimal CoT cue: “Let’s think step by step.” This single sentence often activates more structured, correct reasoning.
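
As a quick illustration, the cue can simply be appended to an ordinary question (the question text here is illustrative):

// Minimal zero-shot cue: append the sentence to any question
const question = "A train covers 60 km in 45 minutes. What is its average speed in km/h?";
const prompt = `${question}\nLet's think step by step.`;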


Practical JavaScript Examples

Below are vendor‑agnostic patterns you can adapt to most LLM SDKs (OpenAI‑compatible, Anthropic, Google, etc.). Replace MODEL_NAME and API calls with your provider’s methods.

1) Zero‑shot CoT: add a reasoning cue

// Zero-shot CoT: “show your work” on a math word problem
const prompt = `
Solve the problem step by step, then give a final answer.
Problem: A shop sells pens for $2 and notebooks for $5. 
If Maya bought 5 items total and spent $19, what did she buy?
`;

const response = await llm.complete({
  model: MODEL_NAME,
  prompt
});

console.log(response.text);
// Expect: a step-by-step breakdown (try cues like “Let’s think step by step.”)

Why it helps: The explicit instruction elicits intermediate calculations before the final answer, reducing guesswork.

2) Few‑shot CoT: provide worked examples to teach the pattern

// Few-shot CoT: teach the model how to reason on similar tasks
const examples = `
Example 1:
Q: There are 15 trees. Workers plant until there are 21. How many planted?
A: Start with 15. End with 21. Difference is 6. Final: 6.

Example 2:
Q: Leah had 32 chocolates and her sister had 42. They ate 35. How many left?
A: Total 32+42 = 74. 74-35 = 39. Final: 39.
`;

const question = `
Q: Michael had 58 balls, lost 23 on Tuesday and 2 on Wednesday. How many left?
A: Solve step by step, then give "Final: <number>"
`;

const prompt = `${examples}\n${question}`;

const response = await llm.complete({
  model: MODEL_NAME,
  prompt
});

console.log(response.text);
// Expect: a clear calculation trail and a “Final: 33” style answer

Why it helps: Examples transmit format, granularity, and domain language better than instructions alone.

3) Self‑consistency: sample multiple CoT paths, then majority vote

// Self-consistency: sample multiple reasoning paths and pick the majority final answer
async function solveWithSelfConsistency(userProblem, samples = 5) {
  const basePrompt = `
Solve the problem carefully. Show your reasoning step by step.
At the end, output "Final: <answer>" on a new line.
Problem: ${userProblem}
`;

  const answers = [];
  for (let i = 0; i < samples; i++) {
    const resp = await llm.complete({
      model: MODEL_NAME,
      prompt: basePrompt,
      // Temperature > 0 to diversify reasoning paths
      temperature: 0.7,
      max_tokens: 300
    });
    const text = resp.text || "";
    const match = text.match(/Final:\s*(.+)$/mi);
    if (match) answers.push(match[1].trim());
  }

  // Majority vote
  const counts = answers.reduce((acc, a) => (acc[a] = (acc[a] || 0) + 1, acc), {});
  const sorted = Object.entries(counts).sort((a, b) => b[1] - a[1]);
  const winner = sorted.length ? sorted[0][0] : "No answer";
  return { answers, winner };
}

const { answers, winner } = await solveWithSelfConsistency(
  "When I was 6 my sister was half my age. Now I’m 70, how old is my sister?"
);

console.log({ answers, winner });
// Expect multiple CoT outputs; the majority should be 67

Why it helps: Aggregating multiple reasoning traces reduces single‑pass errors and boosts accuracy on reasoning benchmarks.

4) Guardrails: ask for steps but hide them in production logs

Sometimes you want the model to reason but only output the final answer:

// Ask for reasoning, then post-process to show only the final line to end-users
const prompt = `
Think through the problem step by step, but only output the final answer.
Problem: A factory produces 120 items/day, increases by 25%. New output?
`;

const response = await llm.complete({ model: MODEL_NAME, prompt });
const finalLine = (response.text.match(/(\d+(\.\d+)?)/g) || []).pop();
console.log("Final answer:", finalLine); // 150

Note: You still get the benefits of internal reasoning while presenting a concise UX. Some providers support structured “reasoning+answer” models that keep thoughts internal.


Example Tasks That Improve with CoT

  • Multi‑step math: totals, rates, and unit conversions.

  • Multi‑hop QA: combine facts across sentences or documents (see the sketch after this list).

  • Planning: itineraries, project steps, data pipelines.

  • Troubleshooting: structured diagnosis → hypothesis → next steps.

  • Logical constraints: schedules, resource allocation, combinatorics.
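
To make the multi‑hop QA case concrete, here is a minimal sketch in the same vendor‑agnostic style as the examples above (llm and MODEL_NAME are the same placeholders; the facts and question are illustrative):

// Multi-hop QA: the answer requires chaining two facts from the context
const context = `
Fact 1: The Elbphilharmonie is a concert hall in Hamburg.
Fact 2: Hamburg lies on the river Elbe.
`;

const prompt = `
Use only the context. Reason step by step, combining facts as needed,
then output "Final: <answer>".
Context: ${context}
Question: On which river does the city that contains the Elbphilharmonie lie?
`;

const response = await llm.complete({ model: MODEL_NAME, prompt });
console.log(response.text);
// Expect: reasoning that links Fact 1 and Fact 2, then "Final: Elbe"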


Why CoT Matters

  • Higher accuracy on complex tasks: CoT reduces skipped steps and arithmetic slips by forcing intermediate reasoning.

  • Transparency and trust: Stepwise outputs show how an answer was derived, aiding review and debugging.

  • Robustness with self‑consistency: Sampling diverse chains and voting mitigates brittle single‑trace failures.

  • Low effort, high ROI: Often a single sentence (“Let’s think step by step”) yields measurable gains without model changes or fine‑tuning.


Tips and pitfalls

  • Be explicit about format: Ask for “Final: <answer>” to simplify parsing (a small helper is sketched after this list).

  • Control verbosity: For production UX, request hidden reasoning or post‑process to extract just the final answer.

  • Keep examples close to target tasks: Few‑shot CoT works best when exemplars mirror your domain.

  • Use temperature for diversity: For self‑consistency, set temperature>0 to get varied reasoning paths before voting.

  • Evaluate: A/B test zero‑shot vs few‑shot CoT on your datasets; CoT helps most where intermediate steps matter.
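
For the formatting tip above, a small helper keeps the parsing logic in one place. This is a sketch; the function name and null fallback are assumptions, not part of any SDK:

// Extract the "Final: <answer>" line from a CoT response (illustrative helper)
function extractFinal(text) {
  const match = (text || "").match(/^Final:\s*(.+)$/mi);
  return match ? match[1].trim() : null; // null means "no parsable answer"
}

// Usage with any of the prompts above:
// const answer = extractFinal(response.text);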
