Basic RAG Architecture

Prasad Sawant
2 min read

What is RAG?

RAG stands for Retrieval Augmented Generation.

The RAG technique enhances the capabilities of LLMs by integrating them with external knowledge bases.

Breaking down the term ‘Retrieval Augmented Generation’:

  • Retrieval - When a user queries the LLM, instead of jumping directly to an answer, the system first retrieves relevant information from external sources.

  • Augmented - The retrieved information is inserted into the user prompt to enhance its context.

  • Generation - Using the user query together with the retrieved information, the LLM has the full context and generates a more accurate response.

Process

  1. Indexing

    In the Indexing process, we take the full document, divide it into chunks, create vector embeddings of those chunks with an embedding model, and store them in a vector database.

  2. Retrieval

    In the Retrieval phase, when the user sends a prompt, we create a vector embedding of the prompt, perform a similarity search in the vector database, and obtain the relevant chunks.

  3. Generation

    In the Generation phase, we have the user prompt and the relevant chunks. Using these two, we build a system prompt that carries the context of the relevant chunks and send the query to the LLM to get a more accurate response.
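The three phases above can be sketched end-to-end in plain Python. This is an illustrative toy, not the real pipeline: the bag-of-words `embed` function stands in for an embedding model, and the in-memory `index` list stands in for a vector database.

```python
import math

def embed(text):
    # Toy stand-in for an embedding model: word-count vector as a dict
    vec = {}
    for w in text.lower().split():
        w = w.strip(".,?!")
        vec[w] = vec.get(w, 0) + 1
    return vec

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Indexing: embed each chunk and store it
chunks = [
    "RAG combines retrieval with generation.",
    "Paris is the capital of France.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Retrieval: embed the query and find the most similar chunk
query = "What is the capital of France?"
qvec = embed(query)
best_chunk, _ = max(index, key=lambda item: cosine(qvec, item[1]))

# 3. Generation: augment the prompt with the retrieved chunk
prompt = f"Context: {best_chunk}\n\nQuestion: {query}"
print(best_chunk)
```

In a real system, `embed` would be a learned embedding model and `index` a vector database such as Qdrant, but the retrieve-then-augment flow is exactly this.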

Example: PDF RAG Agent

Workflow Diagram

Step-by-step Guide

Step 1 - Load PDF and divide into pages

from langchain_community.document_loaders import PyPDFLoader

file_path = './filename.pdf'
loader = PyPDFLoader(file_path)

docs = loader.load()  # Split the PDF into per-page Document objects

Step 2 - Create Chunks

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
split_docs = text_splitter.split_documents(documents=docs)
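To see what `chunk_size` and `chunk_overlap` actually do, here is a minimal sliding-window splitter in plain Python. It is a simplification: the real `RecursiveCharacterTextSplitter` prefers to break on separators like paragraphs and sentences rather than at fixed character offsets.

```python
def split_text(text, chunk_size, chunk_overlap):
    # Slide a window of chunk_size characters; each step moves forward by
    # (chunk_size - chunk_overlap), so consecutive chunks share an overlap.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap means the end of one chunk repeats at the start of the next, so a sentence cut at a chunk boundary still appears whole in at least one chunk.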

Step 3 - Create vector embeddings of the split text

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_qdrant import QdrantVectorStore

embedder = GoogleGenerativeAIEmbeddings(
    google_api_key=API_KEY,
    model="models/text-embedding-004"
)

vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,  # embed the chunks and store them in one call
    url="http://localhost:6333",
    collection_name="pdf_rag_agent",
    embedding=embedder
)

Step 4 - Take input/query from user

query = input('what would you like to know> ')

Step 5 - Create a vector embedding of the user prompt and perform a similarity search to obtain relevant chunks


retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="pdf_rag_agent",
    embedding=embedder
)

search_result = retriever.similarity_search(
    query=query  # returns the most similar chunks as Document objects
)

Step 6 - Create the system prompt and insert the relevant chunks


# Join the text of the retrieved chunks instead of interpolating raw Document objects
context = "\n\n".join(doc.page_content for doc in search_result)

system_prompt = f"""
You are an AI assistant that helps answer questions based on the given context.

Refer to the context below when answering:
{context}
"""

messages = [
    { "role": "system", "content": system_prompt }
]

Step 7 - Process the user prompt


import os
from openai import OpenAI

# Gemini models are reachable through the OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.environ['API_KEY'],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

messages.append({ "role": "user", "content": query })

response = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=messages
)

answer = response.choices[0].message.content
print(answer)

Check out the full code: https://github.com/prasad-genai/gen-ai-learning/blob/main/05/PDF_RAG_agent.py

Thanks for reading! If you found this helpful, don’t forget to like💖, drop a comment💬, and hit follow✅ for more!
