Where RAG Fails

Retrieval-Augmented Generation (RAG) combines retrievers (that fetch relevant information) with generators (that produce natural language responses).
But like any system, RAG is not foolproof. If not designed carefully, it can fail in subtle but frustrating ways: wrong answers, incomplete context, or even hallucinations.
1. Poor Recall (Retriever Fails to Find Relevant Chunks)
The Problem:
Sometimes the retriever doesn’t fetch the right chunks from the knowledge base. Even if the answer exists in the corpus, the retriever can miss it because:
- The query embedding doesn’t match the right document embedding.
- The semantic meaning of the query isn’t captured properly.
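
One way to catch recall misses early is to look at the retrieval scores instead of trusting whatever comes back. Here is a minimal sketch using plain cosine similarity over toy vectors; the embeddings and the `min_score` threshold are illustrative assumptions, not tuned values:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec, chunk_vecs, chunks, k=3, min_score=0.3):
    # Score every chunk against the query and keep the top-k.
    scores = [cosine(query_vec, v) for v in chunk_vecs]
    ranked = sorted(zip(scores, chunks), reverse=True)[:k]
    # Surface low-confidence retrievals instead of passing them silently to the LLM.
    hits = [(s, c) for s, c in ranked if s >= min_score]
    if not hits:
        print("Warning: no chunk cleared the similarity threshold -- likely a recall miss.")
    return hits
```

Logging the scores makes a recall failure visible as a diagnosable event rather than a silently wrong answer downstream.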
2. Bad Chunking (Losing Context at Boundaries)
The Problem:
If documents are chunked poorly (chunks that are too long, too short, or split mid-sentence), context at the chunk boundaries is lost.
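
A common mitigation is to chunk with overlap, so a sentence cut at one boundary still appears whole in the next chunk. A rough sketch follows; the `chunk_size` and `overlap` values are illustrative assumptions:

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into ~chunk_size-character chunks that overlap,
    so content near a boundary is preserved in the next chunk."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back to retain boundary context
    return chunks
```

In practice, splitting on sentence or paragraph boundaries (rather than raw character counts) reduces mid-sentence cuts even further.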
3. Query Drift (Retriever Misinterprets the Query)
The Problem:
The retriever interprets the query too broadly or too narrowly and pulls in irrelevant chunks. This usually happens when the query embedding doesn’t capture the user’s intent well.
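
One hedge against drift is multi-query retrieval: retrieve with several phrasings of the same question and merge the results. A sketch, assuming a `retrieve_fn` that returns `(score, chunk)` pairs; the rewrites here are passed in as a hard-coded list for illustration, though in practice an LLM often generates them:

```python
def multi_query_retrieve(retrieve_fn, query, rewrites, k=3):
    """Retrieve with the original query plus rewritten variants and merge."""
    seen, merged = set(), []
    for q in [query] + rewrites:
        for score, chunk in retrieve_fn(q, k=k):
            if chunk not in seen:  # deduplicate chunks returned by multiple phrasings
                seen.add(chunk)
                merged.append((score, chunk))
    # Re-rank the union so the best chunks from any phrasing rise to the top.
    return sorted(merged, reverse=True)[:k]
```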
4. Outdated Indexes (Knowledge Base Not Updated)
The Problem:
If the knowledge base is not updated regularly, RAG can fetch stale or incorrect information.
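
A lightweight way to keep the index fresh is to track file modification times and re-embed only what changed. A sketch, assuming a hypothetical `embed_and_upsert` helper that re-embeds a document and updates the vector store:

```python
import json
import os

def refresh_index(doc_paths, state_file="index_state.json"):
    """Re-embed only documents whose files changed since the last run."""
    state = {}
    if os.path.exists(state_file):
        with open(state_file) as f:
            state = json.load(f)
    for path in doc_paths:
        mtime = os.path.getmtime(path)
        if state.get(path) != mtime:   # new or modified document
            embed_and_upsert(path)     # hypothetical helper: re-embed and update the vector store
            state[path] = mtime
    with open(state_file, "w") as f:
        json.dump(state, f)
```

Running something like this on a schedule (or on document-change events) keeps the retriever from serving answers based on documents that no longer exist.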
5. Hallucinations from Weak Context
The Problem:
Even when RAG retrieves chunks, if the context is too weak or incomplete, the LLM may start “filling gaps” with hallucinations.
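
One defence is a grounding guard: skip generation entirely when retrieval scores are weak, and otherwise instruct the model to answer only from the provided context. A sketch; the threshold and prompt wording are illustrative assumptions:

```python
def build_prompt(question, scored_chunks, min_score=0.3):
    """Build a grounded prompt, or return None if the context is too weak to answer from."""
    strong = [chunk for score, chunk in scored_chunks if score >= min_score]
    if not strong:
        return None  # caller should respond "I don't know" instead of generating
    context = "\n\n".join(strong)
    return (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Refusing to generate on weak context trades a few "I don't know" responses for far fewer confident-sounding hallucinations.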