How the context window of LLMs hinders RAG apps

Farhan Naqvi
2 min read

An overview of the challenges posed by restricted context windows in Retrieval-Augmented Generation (RAG) apps.

Token Limits and the Context Window in RAG:

  • Large Language Models (LLMs): RAG pipelines rely on pre-trained LLMs for the generation step. These LLMs are trained on massive amounts of text, but they can only process a limited number of tokens (words or sub-words) at once. This limit is referred to as the "token limit."

  • Context Window: During generation, the LLM can only attend to a limited window of preceding text. This window is the context available to the generation process.
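To make the token limit concrete, here is a minimal sketch of a budget check before sending retrieved context to a model. It uses a naive word count as a stand-in for a real sub-word tokenizer (such as tiktoken for OpenAI models), and the 4096-token window is a hypothetical example, not a specific model's limit.

```python
# Illustrative sketch: a naive token-budget check before calling an LLM.
# Real tokenizers count sub-word tokens, not whitespace-separated words;
# the 4096 limit below is a hypothetical example, not any specific model's.

CONTEXT_WINDOW = 4096  # hypothetical model limit, in tokens


def approx_token_count(text: str) -> int:
    """Rough word-based proxy for a sub-word token count."""
    return len(text.split())


def fits_in_window(prompt: str, retrieved_context: str) -> bool:
    """Check whether prompt + retrieved context fit the model's window."""
    total = approx_token_count(prompt) + approx_token_count(retrieved_context)
    return total <= CONTEXT_WINDOW


print(fits_in_window("Summarize:", "word " * 5000))  # → False: context alone exceeds the window
```

In a real pipeline you would also reserve part of the budget for the model's own output tokens, which shrinks the usable context further.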

Impact of Token Limits:

  • Limited retrieval scope, leading to inaccurate results:

    • During retrieval, RAG identifies relevant documents based on the user's query.

    • However, due to token limits, even when the right documents are retrieved, only a portion of each one might fit into the context fed to the LLM.

    • As a result, even though the relevant information was retrieved, the entire set cannot be passed along, so the LLM may generate its answer from incomplete context.
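The truncation problem above can be sketched as a greedy packing step: whole documents are added to the context until the budget runs out, and whatever does not fit is silently dropped. The function and word-count proxy here are illustrative, not a real framework's API.

```python
# Sketch of the truncation problem: retrieved documents are packed greedily
# into a fixed token budget, so later (possibly relevant) documents are cut.
# Word counts stand in for real sub-word tokens; names are illustrative.

def pack_context(docs: list[str], budget: int) -> list[str]:
    """Keep whole documents in retrieval order until the budget is exhausted."""
    packed, used = [], 0
    for doc in docs:
        cost = len(doc.split())
        if used + cost > budget:
            break  # remaining retrieved documents are silently dropped
        packed.append(doc)
        used += cost
    return packed


docs = ["alpha " * 50, "beta " * 50, "gamma " * 50]  # 50 "tokens" each
kept = pack_context(docs, budget=120)
print(len(kept))  # → 2: the third document never reaches the LLM
```

Note that the drop depends on retrieval order, so a highly relevant document ranked third can lose out to two weaker ones ranked above it.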

  • Difficulty with Long-Range Dependencies:

    • LLMs with limited context windows might struggle to capture relationships and dependencies between information points that are far apart in the retrieved text.

    • In other words, if the retrieved context contains long passages whose details are interrelated and important, the model may not consider all of them together, leading to illogical connections in the generated output.
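A common mitigation for boundary effects like this (not specific to this article) is to split long retrieved text into overlapping chunks, so content near a chunk edge appears in two chunks and some cross-chunk dependencies survive. The sizes below are illustrative only.

```python
# Overlapping chunking sketch: slide a window of `size` words over the text,
# stepping by `size - overlap`, so adjacent chunks share `overlap` words.
# Chunk sizes and overlap here are illustrative, not recommended values.

def chunk_with_overlap(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Return overlapping word-level chunks covering the whole input."""
    step = size - overlap
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]


words = [f"w{i}" for i in range(10)]
for chunk in chunk_with_overlap(words, size=4, overlap=2):
    print(chunk)  # each chunk repeats the last 2 words of the previous one
```

Overlap trades extra tokens for continuity; it softens, but does not eliminate, the loss of truly long-range dependencies.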
