Part 4: Generating Answers with LLMs and Enhancing Your RAG System

Debjit Biswas

We've come a long way! In Part 1, we set up our environment. In Part 2, we populated our Qdrant vector database with embedded data. In Part 3, we learned how to query this database using user input to retrieve relevant context. Now, it's time for the "Generation" part of Retrieval Augmented Generation (RAG). We'll use a Large Language Model (LLM) to synthesize an answer based on the user's query and the retrieved context. We'll also touch upon enhancements like rerankers and guards.

Part 4: LLM Integration for Answer Synthesis

  1. The Role of the LLM: The LLM's job is not to answer from its general pre-trained knowledge alone, but to use the specific context we've retrieved from our Qdrant database to formulate a precise and relevant answer to the user's original query.

  2. Crafting the Prompt for the LLM: This is a critical step. You need to combine the user's original query and the retrieved context into a single, clear prompt for the LLM. A common structure is:

     Context:
     [Retrieved context snippet 1 from Qdrant]
     [Retrieved context snippet 2 from Qdrant]
     ...
    
     Question: [User's original query]
    
     Answer:
    

    For example, if the user asked, "What are the benefits of using a vector database?" and Qdrant returned two relevant snippets:

    • Snippet 1: "Vector databases excel at fast similarity searches in high-dimensional spaces."

    • Snippet 2: "They are crucial for applications like semantic search and recommendation systems."

The prompt to the LLM would be:

    Context:
    Vector databases excel at fast similarity searches in high-dimensional spaces.
    They are crucial for applications like semantic search and recommendation systems.

    Question: What are the benefits of using a vector database?

    Answer:
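In code, assembling this prompt is a small string-formatting step. Here's a minimal Python sketch (the function name and exact formatting are illustrative, not part of the tutorial's code):

```python
def build_prompt(snippets: list[str], query: str) -> str:
    """Combine retrieved context snippets and the user's query into one prompt."""
    context = "\n".join(snippets)
    return f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"

# Example with the two snippets above:
prompt = build_prompt(
    [
        "Vector databases excel at fast similarity searches in high-dimensional spaces.",
        "They are crucial for applications like semantic search and recommendation systems.",
    ],
    "What are the benefits of using a vector database?",
)
print(prompt)
```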
  3. Calling the vector DB: Call the vector database with the embedded query to retrieve the most relevant points along with their payloads, then process those payloads into the context. For example, for the query "Harry Potter Books":
```bash
# Query: Harry Potter Books
curl --request POST \
  --url http://localhost:6333/collections/{collection_name}/points/query \
  --header 'Content-Type: application/json' \
  --header 'api-key: api_keys' \
  --data '{
    "query": [
        -0.032501220703125,
        -0.025634765625,
        ........More......,
        0.0248565673828125,
        -0.00542449951171875
    ],
    "limit": 10,
    "with_payload": true,
    "with_vector": false,
    "score_threshold": 0.65
  }'
```
The response returns the matching points with their payloads:

```json
{
    "result": {
        "points": [
            {
                "id": 3,
                "version": 304,
                "score": 0.7628476,
                "payload": {
                    "text": "Harry Potter and the Philosopher's Stone,J. K. Rowling,English,1997,120,Fantasy"
                }
            },
            {
                "id": 13,
                "version": 304,
                "score": 0.7597176,
                "payload": {
                    "text": "Harry Potter and the Goblet of Fire,J. K. Rowling,English,2000,65,Fantasy"
                }
            },
            {
                "id": 14,
                "version": 304,
                "score": 0.7571551,
                "payload": {
                    "text": "Harry Potter and the Order of the Phoenix,J. K. Rowling,English,2003,65,Fantasy"
                }
            }
        ]
    },
    "status": "ok",
    "time": 0.008392055
}
```
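The same retrieval step can be done from code. Here is a rough Python sketch using the `requests` library; the collection name, API key, and the `embed()` helper (your query-embedding function from Part 3) are placeholders, not fixed names from this series:

```python
import requests

QDRANT_URL = "http://localhost:6333"  # local instance from the curl example
COLLECTION = "books"                  # hypothetical collection name
QDRANT_API_KEY = "api_keys"           # same placeholder key as above

def retrieve_context(query_text: str, limit: int = 10, threshold: float = 0.65) -> list[str]:
    """Embed the query, search Qdrant, and return the payload texts."""
    vector = embed(query_text)  # hypothetical: your embedding helper from Part 3
    resp = requests.post(
        f"{QDRANT_URL}/collections/{COLLECTION}/points/query",
        headers={"api-key": QDRANT_API_KEY},
        json={
            "query": vector,
            "limit": limit,
            "with_payload": True,
            "with_vector": False,
            "score_threshold": threshold,
        },
    )
    resp.raise_for_status()
    points = resp.json()["result"]["points"]
    return [point["payload"]["text"] for point in points]
```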
  4. Calling the LLM: Send this combined prompt to your chosen LLM (Ollama, LM Studio, or Cloudflare Workers AI) via its API.
    Here is a conceptual curl example for the Cloudflare Workers AI run endpoint:
```bash
# Here is the request
curl --request POST \
  --url https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/meta/llama-3.1-8b-instruct \
  --header "Authorization: Bearer {api_token}" \
  --header "Content-Type: application/json" \
  --data '{
    "messages": [
      {
        "role": "system",
        "content": "You are an AI assistant. Use the following information to answer the user'\''s question. Answer in the active voice. If you cannot find the answer in the provided information, reply with `Don'\''t have enough information, please connect with support.`"
      },
      {
        "role": "user",
        "content": "Retrieved Information:\n---\nHarry Potter and the Philosopher'\''s Stone,J. K. Rowling,English,1997,120,Fantasy\n---\nHarry Potter and the Goblet of Fire,J. K. Rowling,English,2000,65,Fantasy\n---\nHarry Potter and the Order of the Phoenix,J. K. Rowling,English,2003,65,Fantasy\n---\nHarry Potter and the Deathly Hallows,J. K. Rowling,English,2007,65,Fantasy\n---\nHarry Potter and the Prisoner of Azkaban,J. K. Rowling,English,1999,65,Fantasy\n---\nHarry Potter and the Half-Blood Prince,J. K. Rowling,English,2005,65,Fantasy\n---\nHarry Potter and the Chamber of Secrets,J. K. Rowling,English,1998,77,Fantasy\n---\n\nUser Question:\nHarry Potter Books"
      }
    ]
  }'
```
And here is the response:

```json
{
    "result": {
        "response": "I've got the information you need!\n\nHere are the Harry Potter books listed, along with their publication year and page count:\n\n1. Harry Potter and the Philosopher's Stone (1997, 120 pages)\n2. Harry Potter and the Chamber of Secrets (1998, 77 pages)\n3. Harry Potter and the Prisoner of Azkaban (1999, 65 pages)\n4. Harry Potter and the Goblet of Fire (2000, 65 pages)\n5. Harry Potter and the Order of the Phoenix (2003, 65 pages)\n6. Harry Potter and the Half-Blood Prince (2005, 65 pages)\n7. Harry Potter and the Deathly Hallows (2007, 65 pages)\n\nLet me know if you need anything else!",
        "usage": {
            "prompt_tokens": 245,
            "completion_tokens": 163,
            "total_tokens": 408
        }
    },
    "success": true,
    "errors": [],
    "messages": []
}
```
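The same call from Python, as a hedged sketch against the endpoint shown above; the account ID and API token are placeholders you'd substitute with your own:

```python
import requests

CF_ACCOUNT_ID = "your_account_id"  # placeholder
CF_API_TOKEN = "your_api_token"    # placeholder
MODEL = "@cf/meta/llama-3.1-8b-instruct"

SYSTEM_PROMPT = (
    "You are an AI assistant. Use the following information to answer the "
    "user's question. Answer in the active voice. If you cannot find the "
    "answer in the provided information, reply with "
    "`Don't have enough information, please connect with support.`"
)

def ask_llm(context: str, question: str) -> dict:
    """Send the retrieved context and the user's question to Workers AI."""
    url = (
        "https://api.cloudflare.com/client/v4/accounts/"
        f"{CF_ACCOUNT_ID}/ai/run/{MODEL}"
    )
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {CF_API_TOKEN}"},
        json={
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {
                    "role": "user",
                    "content": f"Retrieved Information:\n---\n{context}\n---\n\nUser Question:\n{question}",
                },
            ]
        },
    )
    resp.raise_for_status()
    return resp.json()
```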
  5. Receiving and Presenting the Answer: Parse the LLM's response to extract the generated answer and display it to the user, as in the sketch below.
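For the response shape shown above, extraction takes only a few lines (assuming the `ask_llm` helper from the previous sketch and a `context` string built from the Qdrant payloads):

```python
data = ask_llm(context, "Harry Potter Books")
if data.get("success"):
    print(data["result"]["response"])  # the generated answer
else:
    print("LLM call failed:", data.get("errors"))
```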

That's it! Your basic RAG system is ready! You've successfully retrieved context and used an LLM to generate an informed answer.
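Putting the pieces together, the whole flow (retrieval from Part 3 plus the generation step above) can be composed from the hypothetical helpers sketched earlier:

```python
def rag_answer(question: str) -> str:
    """Retrieve context from Qdrant, then ask the LLM to answer from it."""
    snippets = retrieve_context(question)
    data = ask_llm("\n---\n".join(snippets), question)
    return data["result"]["response"]

print(rag_answer("Harry Potter Books"))
```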

In the next part, we'll build a Laravel app following this tutorial. Let's get started!
