Building a Thinking AI from a Non-Thinking Model Using Chain-of-Thought

Aman Kumar

Day 2 of the GenAI with JS cohort with Hitesh Choudhary sir and Piyush Garg sir was one of those “Ok… now we’re cooking!” kind of days.

Yesterday was about understanding how AI thinks.
Today was about making it think — and not just think once, but keep thinking until it’s sure.

We built something that takes a regular AI (the kind that answers instantly) and turns it into a patient, step-by-step problem solver using a method called Chain-of-Thought.


Most AI models can produce fluent answers instantly — but that doesn’t mean they’re thinking. They’re predicting what text should come next, not working through the problem like a human would.

For problems that require logical reasoning or calculations, this “instant guess” approach can lead to mistakes.

To fix this, we can force the model to slow down and work in steps. That’s the core idea behind Chain-of-Thought (CoT) prompting.


1. Why We Need a Thinking Loop

Without step-by-step reasoning:

Prompt:

What is 3 + 4 × 10 − 4 × 3?

Model (non-thinking):

31 ✅ (this time it’s correct… but often it’s not)

With enforced reasoning:

Prompt:

Let’s solve this step-by-step.

Model (thinking):

  1. Apply BODMAS — solve multiplications first.

  2. 4 × 10 = 40 → 3 + 40 − 4 × 3

  3. 4 × 3 = 12 → 3 + 40 − 12

  4. 3 + 40 = 43

  5. 43 − 12 = 31
    Answer: 31 ✅

The benefit?

  • The path to the answer is visible.

  • You can spot errors in reasoning.
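The model's step-by-step path can be mirrored in plain JavaScript to confirm the arithmetic it is expected to follow. A quick sketch of the same BODMAS walkthrough:

```javascript
// BODMAS walkthrough for 3 + 4 * 10 - 4 * 3, mirroring the model's steps.
const step1 = 4 * 10;    // multiplications first: 40
const step2 = 4 * 3;     // 12
const step3 = 3 + step1; // then addition: 43
const answer = step3 - step2; // then subtraction

console.log(answer); // 31
// Matches what direct evaluation gives, since JS follows the same precedence:
console.log(answer === 3 + 4 * 10 - 4 * 3); // true
```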


2. START–THINK–EVALUATE–OUTPUT: The Enhanced CoT Flow

Instead of just START → THINK → OUTPUT, this new version adds EVALUATE steps in between the reasoning steps.

Why add EVALUATE?

It’s like having the AI double-check its own work after each reasoning step before moving forward.

Example:

{"step": "START", "content": "The user wants me to solve 3 + 4 * 10 - 4 * 3"}
{"step": "THINK", "content": "This is a math problem using BODMAS"}
{"step": "EVALUATE", "content": "Alright, going good"}

This approach:

  • Reduces compounding errors

  • Makes each reasoning hop self-verified

  • Gives better debugging visibility
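If you want the step structure itself to be enforceable in code, one option is to validate each parsed object before accepting it. A minimal sketch (the `isValidStep` helper is hypothetical and not part of the script below):

```javascript
// Hypothetical guard: accept only well-formed step objects from the model.
const ALLOWED_STEPS = ["START", "THINK", "EVALUATE", "OUTPUT"];

function isValidStep(obj) {
  return (
    obj !== null &&
    typeof obj === "object" &&
    ALLOWED_STEPS.includes(obj.step) &&
    typeof obj.content === "string" &&
    obj.content.length > 0
  );
}

console.log(isValidStep({ step: "EVALUATE", content: "Alright, going good" })); // true
console.log(isValidStep({ step: "GUESS", content: "42" })); // false
```

A guard like this lets the loop reject malformed or off-script responses instead of silently skipping them.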


3. The Code

Here’s a Node.js script that uses Gemini with the @ai-sdk/google package to run this reasoning loop:

import "dotenv/config";
import { google } from "@ai-sdk/google";
import { generateText } from "ai";
import fs from "fs";

const historyPath = "cot_chat_history.json";

// Load history from file
function loadHistory() {
  if (fs.existsSync(historyPath)) {
    try {
      return JSON.parse(fs.readFileSync(historyPath, "utf-8"));
    } catch {
      console.error("Invalid JSON in chat history. Starting fresh.");
    }
  }
  return [{ role: "system", content: "You are a helpful AI assistant." }];
}

// Save history to file
function saveHistory(messages) {
  fs.writeFileSync(historyPath, JSON.stringify(messages, null, 2), "utf-8");
}

async function main() {
  const userPrompt = process.argv.slice(2).join(" ");
  if (!userPrompt) {
    console.log("Usage: node cot.js <your prompt>");
    process.exit(1);
  }

  const messages = loadHistory();
  messages.push({ role: "user", content: userPrompt });

  const system = `
  You are an AI assistant who works on START, THINK, EVALUATE, and OUTPUT format.
  For a given query:
    - START: Define the problem
    - THINK: Work through one reasoning step
    - EVALUATE: Check if reasoning so far is correct
    - OUTPUT: Give the final answer
  Rules:
    - One step per response
    - JSON only, no markdown
    - Never skip steps or output more than one object
  `;

  let emptyCount = 0;
  const maxEmpty = 3;

  // Keep prompting until the model emits an OUTPUT step (or gives up).
  while (true) {
    console.log("⏳ Calling Gemini API...");
    const { text } = await generateText({
      model: google("gemini-2.0-flash"),
      system,
      messages: messages.filter(m => m.content.trim() !== ""),
    });
    console.log("✅ Gemini API response received.");

    const matches = text?.match(/\{[\s\S]*?\}/g) || [];
    if (!matches.length) {
      if (++emptyCount >= maxEmpty) break;
      continue;
    }

    emptyCount = 0;
    let foundOutput = false;

    for (const objStr of matches) {
      try {
        const obj = JSON.parse(objStr);
        if (obj.step === "START") console.log(`🟢 [START] ${obj.content}`);
        if (obj.step === "THINK") console.log(`💭 [THINK] ${obj.content}`);
        if (obj.step === "EVALUATE") console.log(`🔍 [EVALUATE] ${obj.content}`);
        if (obj.step === "OUTPUT") {
          console.log(`✅ [OUTPUT] ${obj.content}`);
          foundOutput = true;
        }
      } catch {} // skip fragments that aren't valid JSON
    }

    messages.push({ role: "assistant", content: text });
    saveHistory(messages);

    if (foundOutput) break;
  }
}

main();
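The regex extraction in the loop above can be exercised on its own. A sketch of how the non-greedy pattern `\{[\s\S]*?\}` pulls flat (non-nested) JSON step objects out of a raw model response:

```javascript
// Simulated raw model response: two step objects plus stray surrounding text.
const raw = `Here you go:
{"step": "THINK", "content": "First, solve 4 * 10 = 40"}
{"step": "EVALUATE", "content": "Alright, going good"}`;

// Same non-greedy pattern used in the main loop; each match stops at the
// first closing brace, so it only handles flat (non-nested) objects.
const matches = raw.match(/\{[\s\S]*?\}/g) || [];
const steps = matches.map((s) => JSON.parse(s));

console.log(steps.length);  // 2
console.log(steps[0].step); // "THINK"
console.log(steps[1].step); // "EVALUATE"
```

Note the non-nested limitation: if the model ever emits a JSON object containing another object, the non-greedy match would cut it short. For this flat `{step, content}` schema, it is sufficient.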

4. Example Run

Command:

node cot.js "Can you solve 3 + 4 * 10 - 4 * 3"

Output:

🟢 [START] The user wants me to solve 3 + 4 * 10 - 4 * 3 maths problem
💭 [THINK] This is a typical math problem where we use BODMAS for calculation
🔍 [EVALUATE] Alright, going good
💭 [THINK] First, solve 4 * 10 = 40
🔍 [EVALUATE] Alright, going good
💭 [THINK] Equation is now 3 + 40 - 4 * 3
🔍 [EVALUATE] Alright, going good
💭 [THINK] Next, solve 4 * 3 = 12
🔍 [EVALUATE] Alright, going good
💭 [THINK] Equation is now 3 + 40 - 12
🔍 [EVALUATE] Alright, going good
💭 [THINK] Add first: 3 + 40 = 43
🔍 [EVALUATE] Alright, going good
💭 [THINK] Subtract: 43 - 12 = 31
🔍 [EVALUATE] Alright, going good
✅ [OUTPUT] 3 + 4 * 10 - 4 * 3 = 31

5. Why This Is Better Than Plain CoT

  • START: Defines the scope clearly

  • THINK: Moves one logical step at a time

  • EVALUATE: Catches mistakes early

  • OUTPUT: Final, validated result

This format essentially turns a “guessing” model into a “thinking” model that self-checks before answering.


6. Takeaway

By forcing a structured reasoning cycle — START → THINK → EVALUATE → OUTPUT — we can make any LLM (even fast, cheap ones) behave more reliably in problem-solving scenarios.

It’s a simple wrapper, but it transforms the quality of reasoning.
Instead of an instant guess, you get a clear, traceable, and self-verified thought process.

