Part 1: Understanding RAG Fundamentals & Setting Up Your Environment

Before diving headfirst into building a Retrieval Augmented Generation (RAG) system with a specific framework, it's crucial to grasp the fundamental concepts. Understanding the "why" and "how" behind each component empowers you to build more robust, flexible, and efficient systems, regardless of the language or framework you choose.
This series will guide you through building a RAG system from the ground up, using open-source models and OpenAI-compatible APIs. Our goal is to avoid vendor lock-in and reliance on third-party SDKs or wrappers, giving you full control and understanding.
Key Concepts You Need to Know
- What is RAG (Retrieval Augmented Generation)?
- What is a vector database?
- What is an embedding?
- What is a prompt?
- What is a guard (guardrail)?
- What is a reranker?
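Embeddings and distance metrics underpin everything that follows: texts are converted into vectors, and "closeness" between vectors is measured with a metric like cosine similarity. As a quick illustration (not part of the tutorial's setup), here is cosine similarity in plain Python; the vectors are made-up toy values, not real model output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|).
    Ranges from -1 to 1; higher means the vectors (and the texts
    they represent) are more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" (a real model like bge-base-en-v1.5
# outputs 768-dimensional vectors).
doc_vec = [0.1, 0.3, 0.5, 0.1]
query_vec = [0.2, 0.3, 0.4, 0.1]

print(cosine_similarity(doc_vec, query_vec))
```

This is exactly the "Cosine" distance we will configure in Qdrant below, just spelled out by hand.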
Models Used in this Example:

| Type | Model | Settings |
| --- | --- | --- |
| Embeddings | bge-base-en-v1.5 | Hosted on Cloudflare / Size = 768 / Distance = Cosine |
| Chat | llama-3-8b-instruct | Hosted on Cloudflare |
Now that you have the basics covered, let's start with the setup.
Prerequisites & Setup
Let's get our environment ready. We'll need tools for LLM interaction, embedding generation, a vector database, and API testing.
LLM and Embedding Provider: We need a way to generate embeddings and interact with an LLM. You have a couple of great options:
Locally Hosted (Self-Managed):
Ollama: Allows you to run open-source LLMs locally.
LMStudio: Another excellent tool for running LLMs on your machine. These provide OpenAI-compatible API endpoints, making them easy to integrate.
Cloud-Based (Managed Service):
Cloudflare Workers AI: Cloudflare offers a generous free tier for their AI services, including access to various open-source LLMs and embedding models through OpenAI-compatible endpoints. This is a fantastic option for getting started quickly.
- Sign up or log in at Cloudflare.com.
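Whichever provider you choose, the point of OpenAI compatibility is that your chat requests keep the same JSON shape everywhere. The sketch below just builds that request body offline; the base URL (Ollama's local endpoint) and model name are placeholders you would swap for your own:

```python
import json

# Placeholder values — substitute your provider's endpoint and model.
BASE_URL = "http://localhost:11434/v1"  # e.g. Ollama's OpenAI-compatible endpoint
MODEL = "llama-3-8b-instruct"

def build_chat_request(user_message: str) -> dict:
    """Build the JSON body for an OpenAI-compatible /chat/completions call."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }

body = build_chat_request("What is RAG?")
print(json.dumps(body, indent=2))
```

You would POST this body to `{BASE_URL}/chat/completions` with your API key in the `Authorization` header; because the shape is standard, switching from Ollama to Cloudflare (or any compatible provider) only means changing the URL, key, and model name.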
Vector Database: We'll use Qdrant, a powerful open-source vector database.
Qdrant Cloud: A managed service; you can set up an account on the Qdrant Cloud website.
Self-Hosted Qdrant: You can also run Qdrant locally via Docker or other installation methods.
API Testing Tool: To interact with the APIs for embeddings, LLMs, and Qdrant, a tool like Insomnia or Postman will be very helpful.
- Download and install your preferred tool.
Gather Your Credentials: Once you've set up your Cloudflare/Ollama/LMStudio and Qdrant accounts/instances, make sure you have the following:
API endpoint URLs for your chosen LLM and embedding model.
Any necessary API keys.
Your Qdrant instance URL and API key (if applicable).
Create Your First Qdrant Collection: A "collection" in Qdrant is analogous to a table in a traditional database, except that it stores vectors. Let's create one for our RAG application. You can do this via the Qdrant dashboard or its API.
For example, using `curl` (replace the placeholders with your actual Qdrant URL, API key, and desired collection name/vector parameters):

```shell
curl --request PUT \
  --url http://localhost:6333/collections/replace_with_your_collection_name \
  --header 'Content-Type: application/json' \
  --header 'api-key: api-key' \
  --data '{
    "vectors": {
      "size": 768,
      "distance": "Cosine"
    }
  }'
```

Note: The `size` of the vector must match the output dimension of the embedding model you choose (e.g., bge-base-en-v1.5 has 768 dimensions).
With these prerequisites in place, you're ready to move on to the next step: populating your vector database!
Written by

Debjit Biswas
Developer