Exploring RAGs 3

So far we have used mostly simple, naive systems to implement a RAG. This will perhaps be the last post dealing with one such model.
I had been toying with the idea of a program that would let me chat with a set of documents as and when needed. This is also the closest I have come to an actual real-life use case.
Understanding why
Firstly, when a product or tool ships with a massive set of documents, it can be difficult for newcomers to guess the right keywords and find what they are looking for within the first couple of searches.
Notwithstanding what that says about how such documentation is written in the first place, I hope we can both agree that there needs to be a solution short of a complete rewrite.
One such solution could use a really fast, small-scale LLM whose job is to parse the meaning of the user's search, figure out which parts of the documentation are actually relevant (the trickiest part of this whole thing), and then summarize the content of those documents into the LLM's context, so that the user can chat with the bot about it and clear their doubts. Perhaps it could even generate some examples to help them get started.
Production-ready implementations of this idea already exist on the documentation sites of several widely used tools, most visibly in the docs of AI tools themselves, which showcase their functionality on their own documentation pages. Here, though, we will build only a very simple, local, terminal-based implementation of it, running on dummy data.
A lot of optimization is still left to be done in the code, along with all the frontend and backend plumbing needed to display the results in a graphically pleasing and easy-to-use way.
Regardless, this is a simple, solid foundation on which one can build larger, better-optimized, and more user-friendly implementations.
So let’s take each part of the problem one by one and deal with it.
Jsonify
Our data will be stored mainly in JSON (or similar formats) so that it can be kept compact and transferred efficiently back and forth between the LLM and the web servers, should we choose to deploy this somewhere.
The first step, then, is to convert all our documentation into JSON. That's where this program comes in.
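Here is a rough sketch of what that conversion could look like. The docs/ folder, the docs.json output file, and the title/path/content schema are assumptions I'm making for illustration, not necessarily the exact shape of the original program.

```python
# jsonify.py - a minimal sketch: walk a docs folder and dump every file into one JSON list.
# The "docs" directory, output filename, and record schema are illustrative assumptions.
import json
from pathlib import Path

DOCS_DIR = Path("docs")          # folder containing the raw documentation files
OUTPUT_FILE = Path("docs.json")  # single JSON file the rest of the pipeline reads

def build_index(docs_dir: Path) -> list[dict]:
    """Read every .md/.txt file and wrap it in a small JSON-friendly record."""
    records = []
    for path in sorted(docs_dir.rglob("*")):
        if path.suffix.lower() not in {".md", ".txt"}:
            continue
        records.append({
            "title": path.stem.replace("_", " "),
            "path": str(path),
            "content": path.read_text(encoding="utf-8"),
        })
    return records

if __name__ == "__main__":
    records = build_index(DOCS_DIR)
    OUTPUT_FILE.write_text(json.dumps(records, indent=2), encoding="utf-8")
    print(f"Wrote {len(records)} documents to {OUTPUT_FILE}")
```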
LLM
This is a general piece of code that will work with any program that needs an LLM interface.
The actual connection to an LLM, local or online, is implemented here, along with whatever frequently needed helper functions we might want.
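A minimal sketch of such an interface, assuming a locally running Ollama server; the endpoint, the model name, and the helper functions here are my assumptions, and any other local or hosted API could be swapped in.

```python
# llm.py - a minimal sketch of the LLM interface, assuming a local Ollama server.
# The endpoint URL, model name, and helper names are illustrative assumptions.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.2"  # any small local model will do

def ask(prompt: str, system: str = "") -> str:
    """Send a single prompt to the local model and return the full response text."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "system": system,
        "stream": False,  # get the whole answer in one JSON object
    }
    resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["response"]

def summarize(text: str, max_words: int = 200) -> str:
    """Frequently needed helper: compress a document so it fits in the chat context."""
    return ask(f"Summarize the following documentation in under {max_words} words:\n\n{text}")
```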
Chatbot
This is the actual implementation of the chatbot that chats with the user and retrieves relevant information.
- todo: the relevant files are re-checked with every single query; make sure they stay in memory unless the next query is wildly different from the previous ones.
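Below is a minimal sketch of that chat loop, tying together the JSON index and the LLM interface sketched above. The relevance-selection prompt and the file and function names are illustrative assumptions rather than the exact original code.

```python
# chatbot.py - a minimal sketch of the chat loop built on the pieces above.
# The selection prompt, file names, and helpers are illustrative assumptions.
import json
from pathlib import Path

from llm import ask, summarize  # the interface sketched above

def pick_relevant(query: str, docs: list[dict]) -> list[dict]:
    """Ask the model which document titles look relevant to the query (the tricky part)."""
    titles = "\n".join(d["title"] for d in docs)
    answer = ask(
        f"Here is a list of documentation titles:\n{titles}\n\n"
        f"List only the titles relevant to this question, one per line:\n{query}"
    )
    chosen = {line.strip().lower() for line in answer.splitlines() if line.strip()}
    return [d for d in docs if d["title"].lower() in chosen]

def main() -> None:
    docs = json.loads(Path("docs.json").read_text(encoding="utf-8"))
    while True:
        query = input("you> ").strip()
        if query in {"quit", "exit"}:
            break
        # NOTE: relevant files are re-selected on every query (see the todo above).
        relevant = pick_relevant(query, docs)
        context = "\n\n".join(summarize(d["content"]) for d in relevant)
        reply = ask(f"Using only this documentation:\n{context}\n\nAnswer: {query}")
        print(f"bot> {reply}")

if __name__ == "__main__":
    main()
```

Asking the model to pick relevant titles on every turn is the slow, naive part; it is exactly what a proper embedding search would replace later.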
Potential related use
I also tried tweaking it into a college-notes summarizer using a completely different approach, one that still doesn't use sentence transformers or vector embeddings, but it is unoptimized, slow, and impractical for most use cases until improved.
Still, done right, it could be a great help to students. So if there is enough response from you all, I will go in, clean up and optimize the code, and release a post on the usable version here sometime later.
Conclusion
So that was my (perhaps) last naive implementation of a RAG. At this point, the limiting factor has become just how slow it is to push raw walls of text into the LLM and let it decide which ones are relevant.
Later on, once the metadata is put into an embedding, it will become much faster to let vector logic do its work and let the LLM do what it's actually meant to do: process the text and generate insights.
See you soon in the next post.
(Note: I actually wrote this nearly a year ago and forgot to post it lol)