Zero to RAG AI

Hey there, and welcome to my blog! I’m May — a passionate software developer with just enough experience to confidently break things and enough curiosity to fix them (usually).

I’ve been on this wild ride through the tech world since 2020-ish, but let’s just say things got seriously real in 2022 when I dove headfirst into software engineering. So whether we count it as three years or five, I’ve picked up some solid bruises—uh, I mean, experience—along the way.

I genuinely love turning “hmm, how does this work?” into “hey, I built this!” and I’m big on sharing what I learn in a way that’s honest, beginner-friendly, and occasionally chaotic.

Fun fact: This is my very first blog post! I was going to start with a chill intro about me and my tech journey, but then I thought—why not kick things off with an actual bang? So here we are.

In this post, I’m walking you through how I built my own RAG-powered AI assistant using LangChain and OpenAI. It’s a mix of research, code, and developer moments (minor crises and major breakthroughs), broken down simply with visuals to help you follow along.

Whether you're new to RAG, curious how all these fancy acronyms fit together, or just here for the vibes—this one's for you. Let’s get into it 🚀

What Even is RAG? (And Why It’s So Cool)

Large Language Models (LLMs) like GPT-4 are incredibly smart — they can write essays, debug code, and even explain what a monad is (kind of). But here’s the thing: they don’t actually know what’s going on in your life, your app, or your files right now. Their knowledge ends at whatever data they were trained on.

That’s where RAG — Retrieval-Augmented Generation — swoops in like a genius sidekick. It gives your LLM access to real-time, custom information from your own content — whether that's a set of documents, a database, or a folder full of .txt files. Basically, it’s like giving ChatGPT a memory boost with your own notes.

At its core, RAG = LLM + Retrieval.
That means instead of just asking the model for answers and hoping it guesses well, you first fetch relevant info from your data and pass it in as context before the LLM replies.
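In code, that flow really is just two steps. Here’s the idea in pseudocode-flavored JavaScript (retrieve and llm are stand-ins, not real APIs; the real versions show up later in this post):

```javascript
// RAG in a nutshell: fetch first, then generate.
// `retrieve` and `llm` are placeholders, not real APIs.
const context = await retrieve(userQuestion, myDocuments); // 1. find relevant info
const answer = await llm.generate(`${context}\n\n${userQuestion}`); // 2. answer using it
```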

And that makes RAG... kind of magical.

Why? Because:

  • It keeps your answers grounded in facts — not guesses

  • It works with your actual content (not just what's on the internet)

  • It’s flexible, updatable, and super useful for real-world apps

Inside the RAG Framework: Indexing, Retrieval, and Generation

Now that we know what RAG is, let’s break down how it actually works under the hood.

Indexing: Prepping the Knowledge

Before your AI can give smart answers, it needs to understand the content it's allowed to reference. This setup process is called indexing — and it’s all about organizing the knowledge for easy access later.

Here’s what happens during indexing (with a code sketch right after this list):

  • Load your documents (text files, PDFs, Markdown, etc.)

  • Split the text into smaller, manageable chunks (so the AI doesn’t choke on long files)

  • Embed those chunks into vectors — which means converting them into a format a computer can search by meaning, not just keywords

  • Store those vectors in a vector database (like FAISS, Chroma, Pinecone, etc.)
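Here’s roughly what those four steps look like in LangChain JS. Treat this as a minimal sketch, not my exact code: the file path and chunk sizes are placeholder values, and import paths can shift between LangChain versions.

```javascript
import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

// 1. Load a document (the path here is just a placeholder)
const docs = await new TextLoader("./data/cv.txt").load();

// 2. Split it into small, overlapping chunks
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500, // tweak for your content
  chunkOverlap: 50,
});
const chunks = await splitter.splitDocuments(docs);

// 3 + 4. Embed each chunk with OpenAI and store the vectors in memory
const vectorStore = await MemoryVectorStore.fromDocuments(
  chunks,
  new OpenAIEmbeddings() // expects OPENAI_API_KEY in your environment
);
```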

Retrieval + Generation: The Magic Moment

Now, when a user asks a question, RAG kicks in (these steps are sketched in code right after the list).

  1. The user types a query (e.g., “What’s in May’s resume?”)

  2. A retriever searches the vector database to find the most relevant chunks

  3. Those chunks are added as context to a prompt

  4. The LLM (e.g., GPT-3.5 or GPT-4) uses that prompt to generate a response
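Those four steps map almost one-to-one onto code. A hedged sketch, reusing the vectorStore from the indexing snippet above:

```javascript
import { ChatOpenAI } from "@langchain/openai";

// 1. The user's query
const question = "What's in May's resume?";

// 2. Search the vector store for the most relevant chunks
const retriever = vectorStore.asRetriever(3); // top 3 matches
const relevantChunks = await retriever.getRelevantDocuments(question);

// 3. Add those chunks to the prompt as context
const context = relevantChunks.map((doc) => doc.pageContent).join("\n\n");
const prompt = `Answer using only this context:\n\n${context}\n\nQuestion: ${question}`;

// 4. Let the LLM generate a grounded response
const llm = new ChatOpenAI({ model: "gpt-3.5-turbo", temperature: 0 });
const response = await llm.invoke(prompt);
console.log(response.content);
```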

Building My RAG-AI Assistant for My Portfolio

As part of improving the user experience on my portfolio, I built a Retrieval-Augmented Generation (RAG) chatbot assistant powered by LangChain and OpenAI.
Instead of just displaying static information, my site can now respond to visitor questions using real content from my CV, academic records, and project history — all stored and processed locally.

Whether someone wants to know “What tech have you worked with?” or “What’s your academic background?”, this assistant can respond instantly — and accurately — using my own data.

Tech Stack & Tools I Used

  • LangChain – To connect prompts, memory, and retrievers with ease

  • OpenAI (GPT-3.5-turbo) – For generating natural responses

  • LangChain’s Memory Vector Store – To store vectorized chunks of my content

  • Plain Text Files – My resume, academic info, and project notes

  • Backend: Node.js – Set up endpoints and vector logic (there’s an endpoint sketch after this list)

  • Frontend: React + Tailwind CSS – Clean UI for the chatbot

  • Postman – For testing and debugging endpoints
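On the backend side, the endpoint itself stays pretty small. Here’s a hypothetical Express handler in the spirit of mine; answerQuestion is a made-up helper name that would wrap the RAG logic from the snippets above:

```javascript
import express from "express";

const app = express();
app.use(express.json());

// POST /chat: the React frontend sends { question }, gets { answer } back.
// `answerQuestion` is a hypothetical helper wrapping the RAG steps above.
app.post("/chat", async (req, res) => {
  try {
    const { question } = req.body;
    if (!question) return res.status(400).json({ error: "Missing question" });
    const answer = await answerQuestion(question);
    res.json({ answer });
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: "Something went wrong" });
  }
});

app.listen(3000, () => console.log("Chatbot API running on port 3000"));
```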

How It Works (In My Project)

  • Preparing My Data: I wrote up my CV, academic record, and relevant project summaries as plain .txt files and loaded them with LangChain’s document loader.

  • Splitting and Embedding: The text was split into smaller chunks and converted into embeddings using OpenAI’s API, then stored in LangChain’s vector memory store.

  • User Input → Retrieval: When a user types something like “What did May study?”, the system uses a retriever to find the most relevant chunks based on similarity.

  • Generation: Those chunks are added to a prompt and passed to the LLM. The model uses that specific context to generate an accurate, personalized answer.

  • Tying It Together With LangChain: LangChain made all of this super smooth — chaining loaders, vector storage, retrievers, and prompts without reinventing the wheel.
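To show what that chaining looks like: LangChain can collapse retrieval, prompting, and generation into a single object. This sketch uses the classic RetrievalQAChain (newer LangChain releases offer similar helpers under different names), with vectorStore coming from the indexing snippet earlier:

```javascript
import { RetrievalQAChain } from "langchain/chains";
import { ChatOpenAI } from "@langchain/openai";

// One chain that retrieves chunks, builds the prompt, and calls the model
const chain = RetrievalQAChain.fromLLM(
  new ChatOpenAI({ model: "gpt-3.5-turbo" }),
  vectorStore.asRetriever()
);

const { text } = await chain.call({ query: "What did May study?" });
console.log(text); // an answer grounded in my own documents
```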

Challenges, Lessons & What’s Next

Building this project was an eye-opener — not because RAG is overly complex, but because stitching everything together in my own environment took real effort. I ran into a few bumps, like OpenAI rate limits (fun times), and learned just how powerful yet sensitive RAG systems can be. Tiny things—like how you chunk your data or pass prompts—can totally shift results. That said, it’s been incredibly rewarding, and I’m already working on the next version: a fully conversational assistant that stores chat history, remembers context better, and feels more like a guide than just a tool.

Stay Connected

The full source code isn’t public yet, but I’m always open to feedback, ideas, and collabs.

Feel free to check out my GitHub Profile — and maybe we can build/fork something cool down the line.

Or, better yet, visit my portfolio to try out the chatbot yourself!
Experiment, explore, and if you have thoughts — just drop them in the contact section. I’d love to hear what you think.

Resources & Credits

If you’d like to dig deeper into the tools and concepts mentioned in this post, here are some helpful resources that guided and powered this project:

  • LangChain JS docs – https://js.langchain.com

  • OpenAI API docs – https://platform.openai.com/docs

Thanks for reading. 💬✨

🔖 Tags: #rag #langchain #openai #gpt35 #developerportfolio #aiassistant #reactjs #buildinpublic #javascript #nodejs #retrievalaugmentedgeneration


Written by

Mayowa Oladimeji

Software Engineer | React, JavaScript, C# | Passionate About Building Responsive, Accessible and Scalable Software Solutions