How the context window of LLMs hinders RAG apps

Farhan Naqvi
2 min read

An overview of the challenges posed by restricted context windows in Retrieval-Augmented Generation (RAG) apps.

Token Limits and the Context Window in RAG:

  • Large Language Models (LLMs): RAG pipelines rely on pre-trained LLMs for the generation step. These LLMs are trained on massive amounts of text, but they can only process a limited number of tokens (words or sub-words) at once. This limit is referred to as the "token limit."

  • Context Window: During generation, the LLM can only attend to a limited window of preceding text. This window is the context available to the generation process.
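To make the token limit concrete, here is a minimal sketch of a budget check before sending retrieved context to a model. It uses a naive word count as a stand-in for a real sub-word tokenizer (such as tiktoken for OpenAI models), and the 4096-token window is a hypothetical example, not a specific model's limit.

```python
# Illustrative sketch: a naive token-budget check before calling an LLM.
# Real tokenizers count sub-word tokens, not whitespace-separated words;
# the 4096 limit below is a hypothetical example, not any specific model's.

CONTEXT_WINDOW = 4096  # hypothetical model limit, in tokens


def approx_token_count(text: str) -> int:
    """Rough word-based proxy for a sub-word token count."""
    return len(text.split())


def fits_in_window(prompt: str, retrieved_context: str) -> bool:
    """Check whether prompt + retrieved context fit the model's window."""
    total = approx_token_count(prompt) + approx_token_count(retrieved_context)
    return total <= CONTEXT_WINDOW


print(fits_in_window("Summarize:", "word " * 5000))  # → False: context alone exceeds the window
```

In a real pipeline you would also reserve part of the budget for the model's own output tokens, which shrinks the usable context further.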

Impact of Token Limits:

  • Limited retrieval scope, leading to inaccurate results:

    • During retrieval, RAG identifies relevant documents based on the user's query.

    • However, due to token limits, even when the right documents are retrieved, only a portion of each one might fit into the context fed to the LLM.

    • As a result, even though the relevant information was retrieved, the entire set cannot be passed along, so the LLM may generate its answer from incomplete context.
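The truncation problem above can be sketched as a greedy packing step: whole documents are added to the context until the budget runs out, and whatever does not fit is silently dropped. The function and word-count proxy here are illustrative, not a real framework's API.

```python
# Sketch of the truncation problem: retrieved documents are packed greedily
# into a fixed token budget, so later (possibly relevant) documents are cut.
# Word counts stand in for real sub-word tokens; names are illustrative.

def pack_context(docs: list[str], budget: int) -> list[str]:
    """Keep whole documents in retrieval order until the budget is exhausted."""
    packed, used = [], 0
    for doc in docs:
        cost = len(doc.split())
        if used + cost > budget:
            break  # remaining retrieved documents are silently dropped
        packed.append(doc)
        used += cost
    return packed


docs = ["alpha " * 50, "beta " * 50, "gamma " * 50]  # 50 "tokens" each
kept = pack_context(docs, budget=120)
print(len(kept))  # → 2: the third document never reaches the LLM
```

Note that the drop depends on retrieval order, so a highly relevant document ranked third can lose out to two weaker ones ranked above it.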

  • Difficulty with Long-Range Dependencies:

    • LLMs with limited context windows might struggle to capture relationships and dependencies between information points that are far apart in the retrieved text.

    • In other words, if the retrieved context contains long passages whose details are interrelated and important, the model may not consider all of them together, leading to illogical connections in the generated output.
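A common mitigation for boundary effects like this (not specific to this article) is to split long retrieved text into overlapping chunks, so content near a chunk edge appears in two chunks and some cross-chunk dependencies survive. The sizes below are illustrative only.

```python
# Overlapping chunking sketch: slide a window of `size` words over the text,
# stepping by `size - overlap`, so adjacent chunks share `overlap` words.
# Chunk sizes and overlap here are illustrative, not recommended values.

def chunk_with_overlap(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Return overlapping word-level chunks covering the whole input."""
    step = size - overlap
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]


words = [f"w{i}" for i in range(10)]
for chunk in chunk_with_overlap(words, size=4, overlap=2):
    print(chunk)  # each chunk repeats the last 2 words of the previous one
```

Overlap trades extra tokens for continuity; it softens, but does not eliminate, the loss of truly long-range dependencies.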
