Basic RAG Architecture


What is RAG?
RAG stands for Retrieval-Augmented Generation.
The RAG technique enhances the capabilities of an LLM by integrating it with external knowledge bases.
Breaking down the term 'Retrieval-Augmented Generation':
Retrieval - When a user queries the LLM, instead of jumping straight to an answer, the system first retrieves relevant information from external sources.
Augmented - The retrieved information is inserted into the user prompt to enrich the LLM's context.
Generation - Using the user query together with the retrieved information as context, the LLM generates a more accurate response.
Process
1. Indexing
In the indexing phase, we take the full document, divide it into chunks, create a vector embedding for each chunk with an embedding model, and store the embeddings in a vector database.
2. Retrieval
In the retrieval phase, the user prompts the LLM; we create a vector embedding of the user prompt, perform a similarity search in the vector database, and obtain the relevant chunks.
3. Generation
In the generation phase, we combine the user prompt with the relevant chunks: the chunks go into a system prompt as context, and the query is executed against the LLM to get a more accurate response. A minimal sketch of the full loop follows.
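To make the three phases concrete, here is a minimal sketch of the loop in plain Python. The split_into_chunks, embed, vector_db, and llm names are hypothetical stand-ins, not any specific library's API; the LangChain/Qdrant version appears in the example below.
# Minimal RAG loop - split_into_chunks(), embed(), vector_db, and llm
# are hypothetical stand-ins, not a real library API.

# Indexing: chunk the document and store an embedding per chunk.
for chunk in split_into_chunks(document):
    vector_db.store(vector=embed(chunk), payload=chunk)

# Retrieval: embed the query and fetch the closest chunks.
query = "What does the document say about X?"
relevant_chunks = vector_db.search(vector=embed(query), top_k=4)

# Augmentation + Generation: put the chunks into the prompt and ask the LLM.
context = "\n\n".join(relevant_chunks)
answer = llm.generate(f"Answer using only this context:\n{context}\n\nQuestion: {query}")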
Example: PDF RAG Agent
Workflow Diagram
Step-by-step Guide
Step 1 - Load the PDF and divide it into pages
from langchain_community.document_loaders import PyPDFLoader

file_path = './filename.pdf'
loader = PyPDFLoader(file_path)
docs = loader.load()  # Returns one Document per page of the PDF
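Each page Document carries metadata such as the source file and page number, and the splitter in the next step copies that metadata onto every chunk, so a retrieved chunk can still be traced back to its page.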
Step 2 - Create chunks
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # Maximum characters per chunk
    chunk_overlap=200,  # Characters shared between adjacent chunks
)
split_docs = text_splitter.split_documents(documents=docs)
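The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, so retrieval does not miss it. Larger chunks carry more context per hit but make similarity matching coarser; 1000/200 is a common starting point worth tuning for your documents.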
Step 3 - Create vector embeddings of the split text
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_qdrant import QdrantVectorStore

embedder = GoogleGenerativeAIEmbeddings(
    google_api_key=API_KEY,
    model="models/text-embedding-004"
)
# Create the collection first, then upload the chunk embeddings.
vector_store = QdrantVectorStore.from_documents(
    documents=[],
    url="http://localhost:6333",
    collection_name="pdf_rag_agent",
    embedding=embedder
)
vector_store.add_documents(documents=split_docs)
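Indexing only needs to run once per PDF. On later runs you can skip Steps 1-3 and connect to the already-populated collection with from_existing_collection, exactly as Step 5 does.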
Step 4 - Take input/query from user
query = input('what would you like to know> ')
Step 5 - Create a vector embedding of the user prompt and perform a similarity search to obtain relevant chunks
retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="pdf_rag_agent",
    embedding=embedder  # Must be the same model used at indexing time
)
search_result = retriever.similarity_search(
    query=query
)
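similarity_search embeds the query and returns the top k matching chunks as Document objects (k defaults to 4 in LangChain); pass k explicitly to control how much context reaches the model.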
Step 6 - Create the system prompt and insert the relevant chunks
# Use only the chunk text, not the full Document objects, as context.
context = "\n\n".join(doc.page_content for doc in search_result)

# The JSON instruction supports the response_format used in Step 7;
# the exact {"answer": ...} schema is illustrative.
system_prompt = f"""
You are an AI assistant that helps answer questions based on the given context.
Refer to the context below to give answers:
{context}

Reply with a JSON object of the form {{"answer": "..."}}.
"""
messages = [
    { "role": "system", "content": system_prompt }
]
Step 7 - Process the user prompt
import os
import json
from openai import OpenAI

# Gemini accessed through its OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.environ['API_KEY'],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)
messages.append({ "role": "user", "content": query })
response = client.chat.completions.create(
    model="models/gemini-1.5-flash-001",
    response_format={"type": "json_object"},
    messages=messages
)
parsed_output = json.loads(response.choices[0].message.content)
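Because response_format constrains the reply to valid JSON, json.loads can parse the message content directly; with the schema requested in Step 6, the answer is then available as parsed_output["answer"]. Note that JSON mode on OpenAI-compatible APIs generally requires the prompt itself to ask for JSON, which Step 6 does.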
GitHub link
Check out: https://github.com/prasad-genai/gen-ai-learning/blob/main/05/PDF_RAG_agent.py
Thanks for reading! If you found this helpful, don’t forget to like💖, drop a comment💬, and hit follow✅ for more!