As we know, RAG combines two techniques, an LLM and information retrieval, and the retrieved context is what makes the LLM more capable.

So let me walk you through the limitations of RAG.

As we know, RAG has two phases, and it can fail in either of them:

~ Retrieval

In some cases the system does not pull the right content, or it retrieves irrelevant information.

~ Generation

The model produces wrong or irrelevant output based on the retrieved data.

Both phases have to work together to get a proper output. If retrieval fails, generation fails too, no matter how good your LLM is.

~ Where it fails

1. Bad Retrieval → Bad Output

If the retriever can’t fetch the right documents (due to poor embeddings, indexing errors, or a badly formed query), the LLM has nothing reliable to ground its answer in and is likely to hallucinate.

Example: If the query embedding doesn’t match the stored vector well, the answer may come from irrelevant docs.
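To make that example concrete, here is a minimal sketch of similarity-based retrieval. The 3-dimensional vectors and the document names are made up for illustration (real embeddings have hundreds of dimensions and come from an embedding model), and cosine similarity is assumed as the scoring function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=1):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy embeddings, NOT produced by a real model.
doc_vecs = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.0],
    "office_party": [0.0, 0.1, 0.9],
}

# A query about refunds that was embedded well lands near "refund_policy".
good_query = [0.8, 0.2, 0.0]
# The same question embedded poorly lands near an unrelated document.
bad_query = [0.1, 0.2, 0.9]

good_hit = top_k(good_query, doc_vecs)
bad_hit = top_k(bad_query, doc_vecs)
print(good_hit, bad_hit)
```

With the poorly embedded query, the retriever confidently returns the wrong document, and the LLM will try to answer a refund question from it.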

2. Low-Quality / Noisy Data

If the underlying documents are unstructured, messy, or contradictory, the model may pick misleading context.

3. Latency & Scaling Issues

With large datasets, retrieval can become slow or costly. Wrong infrastructure choices (like distant vector DB servers) can add lag and errors.

4. Hallucinations Still Happen

Even with the right docs, the LLM may fabricate connections or misinterpret text. RAG reduces hallucinations, but doesn’t eliminate them.

5. Bad Chunking

If documents are chunked randomly or without semantic awareness, important context can be split across chunks, so no single retrieved chunk contains the full answer.
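Here is a small sketch of the difference, assuming plain character-count chunking as the "random" approach and a simple sentence-boundary splitter as the "aware" one. The function names and the sample text are hypothetical, and real pipelines usually rely on a proper tokenizer or a library splitter rather than this regex.

```python
import re

def fixed_size_chunks(text, size):
    """Cut every `size` characters, ignoring sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def sentence_chunks(text, max_chars):
    """Pack whole sentences into chunks of at most roughly `max_chars`."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

text = ("Refunds are allowed within 30 days. Items must be unused. "
        "Shipping fees are not refunded.")

naive = fixed_size_chunks(text, 40)
aware = sentence_chunks(text, 60)

for chunk in naive:
    print("naive:", repr(chunk))
for chunk in aware:
    print("aware:", repr(chunk))
```

The fixed-size version cuts the second sentence in the middle of a word, so the "must be unused" condition is separated from the refund rule it belongs to. The sentence-aware version keeps each sentence intact.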

6. Poor Recall

Recall is simply how often the right document appears in the top-k retrieved results. If recall is low, the model never sees the right info → it hallucinates or answers incorrectly.
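A minimal sketch of measuring this, assuming each test query has exactly one "right" document (with several relevant documents per query the usual recall formula is a bit different). The query ids, document ids, and the `recall_at_k` helper are all hypothetical.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of queries whose right document is in the top-k results."""
    hits = 0
    for query_id, relevant_doc in relevant.items():
        if relevant_doc in retrieved.get(query_id, [])[:k]:
            hits += 1
    return hits / len(relevant)

# Ranked retriever output per query (best match first), made up for illustration.
retrieved = {
    "q1": ["d3", "d1", "d7"],
    "q2": ["d9", "d4", "d2"],
    "q3": ["d5", "d8", "d6"],
    "q4": ["d2", "d6", "d1"],
}
# The document that actually answers each query.
relevant = {"q1": "d1", "q2": "d2", "q3": "d0", "q4": "d2"}

recall_at_1 = recall_at_k(retrieved, relevant, 1)
recall_at_3 = recall_at_k(retrieved, relevant, 3)
print(recall_at_1, recall_at_3)  # 0.25 0.75
```

For "q3" the right document never shows up, so no matter how good the LLM is, it has nothing correct to work with for that query.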

These are the limitations of RAG.



Bhushan Ingole
