Do LLMs actually think?

Parth Tuteja

You might have heard of tools like Gemini, Grok, Claude, ChatGPT, etc. But what are they, anyway? Do they really think for themselves, or is it just math on steroids?

Let's break it down.

But first, let's see how an LLM works:

1. Tokenization

When you ask an LLM "Hey, how are you?", do you think it really understands that?

Well, not really. A computer only understands 0s and 1s. The text you give as input gets broken into "tokens" (which are basically chunks). These tokens are then converted into numbers, and not just random numbers, but IDs from a predefined vocabulary that the model was trained on.

For example:

  • “Hey” → 1542

  • “how” → 874

  • “are” → 992

  • “you” → 671

  • “?” → 29

Now we have a sequence of numbers like:
[1542, 874, 992, 671, 29]
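
If you want to see real tokenization in action, here is a minimal sketch using OpenAI's open-source tiktoken library. This is just one tokenizer, and the actual IDs it produces will differ from the illustrative numbers above, since every model has its own vocabulary:

```python
# Minimal tokenization sketch (assumes `pip install tiktoken`).
# The IDs printed here come from the cl100k_base vocabulary and will
# differ from the illustrative numbers used in this article.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a publicly available vocabulary

token_ids = enc.encode("Hey, how are you?")
print(token_ids)                              # a list of integer token IDs
print([enc.decode([t]) for t in token_ids])   # the text chunk each ID stands for
```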

But here’s the thing — these numbers are still too dumb for the model to work with meaningfully.
That’s where vector embeddings come in.

2. Vector Embeddings

Although I have explained vector embeddings before, let me describe them briefly in this article too.

After tokenization, each token ID gets mapped to a "vector" that represents its meaning in a high-dimensional space. Tokens with similar meanings end up with vectors that sit close to each other. For example, laptop and computer (both electronics) will be mapped close to each other, and apple and banana (both fruits) will be mapped close to each other. But those two clusters will have a lot of distance between them.
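
Here is a toy sketch of that "closeness" idea using cosine similarity. The 3-number vectors below are made up purely for illustration; real models learn vectors with hundreds or thousands of dimensions during training:

```python
# Toy illustration of embedding similarity. These 3-dimensional vectors are
# invented for the example; real embeddings are learned and much larger.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

embeddings = {
    "laptop":   np.array([0.90, 0.80, 0.10]),  # hypothetical "electronics" direction
    "computer": np.array([0.85, 0.75, 0.15]),
    "apple":    np.array([0.10, 0.20, 0.90]),  # hypothetical "fruits" direction
    "banana":   np.array([0.15, 0.10, 0.95]),
}

print(cosine_similarity(embeddings["laptop"], embeddings["computer"]))  # high (close to 1)
print(cosine_similarity(embeddings["laptop"], embeddings["banana"]))    # much lower
```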

3. Transformers

Once the tokens are turned into embeddings, they pass through layers of a neural network called a Transformer.

A transformer's job is to process how each token relates to the other tokens. For example, if I say "That dog is very cute", the transformer will relate "cute" back to "dog", as shown in the figure:

(To experiment with this more, you can visit https://llm-concept.vercel.app/)
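
To make the idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside a Transformer layer. The random matrix below is only a stand-in for the learned embeddings and weights a real model would have:

```python
# Minimal self-attention sketch. The random matrix X is a placeholder for the
# real, learned token embeddings; a real Transformer also has learned
# projection weights, multiple heads, and many stacked layers.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Each token's query is compared with every token's key; the resulting
    # weights say how strongly each token "attends to" every other token.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

tokens = ["That", "dog", "is", "very", "cute"]
rng = np.random.default_rng(0)
X = rng.normal(size=(len(tokens), 8))   # toy 8-dimensional embeddings for the 5 tokens

out, weights = attention(X, X, X)       # self-attention: Q, K, V all come from X
print(weights.round(2))                 # row i = how much token i attends to each token
```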

A lot of complex processing goes into this part. And finally, it comes to output prediction.

4. Output Prediction

After processing, the model predicts the next token. Just one (the most likely word, or part of a word, to follow).
It doesn’t think like a human — it’s just extremely good at predicting the next token based on patterns it has seen in its massive training data.

If we do this prediction thousands of times in a row, we get entire sentences that feel like real conversation.
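
Here is what that loop looks like in miniature. The model_next_token_logits function is a hypothetical placeholder for a real model's forward pass; a real LLM would also stop when it predicts an end-of-text token rather than after a fixed count:

```python
# Sketch of the generation loop: predict one token, append it, repeat.
# `model_next_token_logits` is a hypothetical stand-in for running the
# actual Transformer; it just returns random scores here.
import numpy as np

VOCAB_SIZE = 50_000

def model_next_token_logits(token_ids):
    # A real LLM would run the Transformer over token_ids and return one
    # score (logit) per entry in its vocabulary.
    rng = np.random.default_rng(len(token_ids))
    return rng.normal(size=VOCAB_SIZE)

def generate(prompt_ids, max_new_tokens=20):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model_next_token_logits(ids)
        next_id = int(np.argmax(logits))  # greedy: pick the single most likely token
        ids.append(next_id)               # feed it back in and predict again
    return ids

print(generate([1542, 874, 992, 671, 29]))
```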

Conclusion

So, to answer the original question:

Do they really think?
No, they don't think or feel. It's just math on steroids. But the math is so advanced and trained on so much human-generated text that it can mimic understanding remarkably well.
