Basic RAG Architecture

Prasad Sawant
2 min read

What is RAG?

RAG stands for Retrieval Augmented Generation.

The RAG technique enhances the capabilities of LLMs by integrating them with external knowledge bases.

Breaking down the term ‘Retrieval Augmented Generation’:

  • Retrieval - When a user queries the LLM, instead of jumping directly to an answer, the system first retrieves relevant information from external sources.

  • Augmented - The retrieved information is inserted into the user prompt to enhance its context.

  • Generation - Using the user query together with the retrieved information, the LLM has the full context and generates a more accurate response.

Process

  1. Indexing

    In the Indexing process, we take the full document, divide it into chunks, create vector embeddings of those chunks with an embedding model, and store them in a vector database.

  2. Retrieval

    In the Retrieval phase, when the user sends a prompt, we create a vector embedding of the prompt, perform a similarity search in the vector database, and obtain the relevant chunks.

  3. Generation

    In the Generation phase, we have the user prompt and the relevant chunks. Using these two, we build a system prompt that carries the context of the relevant chunks and send the query to the LLM to get a more accurate response.
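The three phases above can be sketched end-to-end in plain Python. This is an illustrative toy, not the real pipeline: the bag-of-words `embed` function stands in for an embedding model, and the in-memory `index` list stands in for a vector database.

```python
import math

def embed(text):
    # Toy stand-in for an embedding model: word-count vector as a dict
    vec = {}
    for w in text.lower().split():
        w = w.strip(".,?!")
        vec[w] = vec.get(w, 0) + 1
    return vec

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Indexing: embed each chunk and store it
chunks = [
    "RAG combines retrieval with generation.",
    "Paris is the capital of France.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Retrieval: embed the query and find the most similar chunk
query = "What is the capital of France?"
qvec = embed(query)
best_chunk, _ = max(index, key=lambda item: cosine(qvec, item[1]))

# 3. Generation: augment the prompt with the retrieved chunk
prompt = f"Context: {best_chunk}\n\nQuestion: {query}"
print(best_chunk)
```

In a real system, `embed` would be a learned embedding model and `index` a vector database such as Qdrant, but the retrieve-then-augment flow is exactly this.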

Example: PDF RAG Agent

Workflow Diagram

Step-by-step Guide

Step 1 - Load PDF and divide into pages

from langchain_community.document_loaders import PyPDFLoader

file_path = './filename.pdf'
loader = PyPDFLoader(file_path)

docs = loader.load()  # Split the PDF into per-page Document objects

Step 2 - Create Chunks

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
split_docs = text_splitter.split_documents(documents=docs)
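To see what `chunk_size` and `chunk_overlap` actually do, here is a minimal sliding-window splitter in plain Python. It is a simplification: the real `RecursiveCharacterTextSplitter` prefers to break on separators like paragraphs and sentences rather than at fixed character offsets.

```python
def split_text(text, chunk_size, chunk_overlap):
    # Slide a window of chunk_size characters; each step moves forward by
    # (chunk_size - chunk_overlap), so consecutive chunks share an overlap.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap means the end of one chunk repeats at the start of the next, so a sentence cut at a chunk boundary still appears whole in at least one chunk.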

Step 3 - Create vector embeddings of the split text

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_qdrant import QdrantVectorStore

embedder = GoogleGenerativeAIEmbeddings(
    google_api_key=API_KEY,
    model="models/text-embedding-004"
)

vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,  # embed the chunks and store them in one call
    url="http://localhost:6333",
    collection_name="pdf_rag_agent",
    embedding=embedder
)

Step 4 - Take input/query from user

query = input('what would you like to know> ')

Step 5 - Create a vector embedding of the user prompt and perform a similarity search to obtain relevant chunks


retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="pdf_rag_agent",
    embedding=embedder
)

search_result = retriever.similarity_search(
    query=query  # returns the most similar chunks as Document objects
)

Step 6 - Create the system prompt and insert the relevant chunks


# Join the text of the retrieved chunks instead of interpolating raw Document objects
context = "\n\n".join(doc.page_content for doc in search_result)

system_prompt = f"""
You are an AI assistant that helps answer questions based on the given context.

Refer to the context below when answering:
{context}
"""

messages = [
    { "role": "system", "content": system_prompt }
]

Step 7 - Process the user prompt


import os
from openai import OpenAI

# Gemini models are reachable through the OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.environ['API_KEY'],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

messages.append({ "role": "user", "content": query })

response = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=messages
)

answer = response.choices[0].message.content
print(answer)

Check out the full code: https://github.com/prasad-genai/gen-ai-learning/blob/main/05/PDF_RAG_agent.py

Thanks for reading! If you found this helpful, don’t forget to like💖, drop a comment💬, and hit follow✅ for more!
