How Vector Databases and RAG Solve AI Hallucination Problems

Jeffrie Budde
6 min read

Artificial Intelligence, or more accurately Machine Learning, has been a hot topic for the past few years, and rightfully so. With the rise of LLMs (Large Language Models), how we search, create, and communicate has fundamentally changed.

From asking for a recipe to fixing HVAC issues, it has become an integral part of so many lives. In fact, some companies have been working on machine learning for a long time; Google, for example, integrated machine learning into its spell checker back in 2001.

There is one major issue with LLMs, and that is hallucinations. Hallucinations are like statistical fluctuations within the model's guesswork. Guesswork, you say?!? Yes: at their core, LLMs perform statistical analysis in which there is no true intelligence, simply retrieval of data that statistically matches what you are looking for. Apple even wrote a white paper, "The Illusion of Thinking", on how LRMs (Large Reasoning Models), which are built on top of LLMs, are not true intelligence and just hallucinate their way to a statistically 'close' answer. Does this mean LLMs will fall short in our search for true artificial intelligence? Yes; they are not an end-all-be-all tool.

But I digress: LLMs are still an incredibly powerful tool that can handle so many different tasks and make jobs easier. The real question is, how do we limit hallucinations within LLMs?

With RAG (Retrieval-Augmented Generation) and fine-tuning. I will give you an example to better explain:

Let’s say you have a company with public-facing documentation on different products. You want your customers to more easily search and use that documentation, and possibly lower the amount of time support spends talking to customers, but you don’t want to upload it into an internet-based LLM like ChatGPT. You are also afraid of hallucinations causing issues by providing customers with incorrect information.

There is a solution:

1.) Utilize an open-source model like Mistral-7B-Instruct or Mistral-7B, which checks all the boxes:

  • Open weight license (Apache 2.0 or similar) → fully commercial use

  • Fast and efficient → excellent for inference and fine-tuning

  • Supported by major frameworks: HuggingFace Transformers, vLLM, and more

  • A great ecosystem of tools, including:

    • LangChain

    • LlamaIndex

    • Qdrant, Weaviate, Pinecone, ChromaDB, etc.

2.) Host that model locally, on an on-prem server, or, if you are not against it, in the cloud on AWS, Azure, or a machine-learning-specific hosting service.

3.) Use LangChain (my personal preference) or a similar framework to create a RAG (Retrieval-Augmented Generation) system for your needs. This typically comprises a model (Mistral-7B), a vector database, and your documents, which serve as the facts.
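As a rough sketch, the retrieval half of such a system can be prototyped with no external services at all: store document chunks, embed them, and rank by cosine similarity. The bag-of-words "embedding" below is a self-contained stand-in for a real embedding model (in practice you would use something like a sentence-transformers model and a real vector DB such as ChromaDB or Qdrant):

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector.
    A real system would use a dense embedding model instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Minimal in-memory stand-in for Qdrant, ChromaDB, etc."""

    def __init__(self) -> None:
        self.chunks: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self.chunks.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 2) -> list[tuple[str, float]]:
        q = embed(query)
        scored = [(text, cosine(q, vec)) for text, vec in self.chunks]
        return sorted(scored, key=lambda t: t[1], reverse=True)[:k]


store = ToyVectorStore()
store.add("Hold the power button for 10 seconds to reset the device.")
store.add("The warranty covers hardware defects for two years.")
top_chunk, score = store.search("How do I reset my device?", k=1)[0]
print(top_chunk)  # the reset instructions rank highest
```

The point is the shape of the pipeline, not the math: swap `embed` for a real model and `ToyVectorStore` for a real database and the flow stays identical.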

4.) Once those systems are built, feed the vector database the information you want to use as the facts. This is where the RAG comes into play: you search your vector DB for the information, then use the model to create the response in a grounded way. The model doesn't need to know anything other than how to formulate a response that is more easily digestible; instead of "a+b=c", it says "based on your question, you can do a+b=c because it fits your problem better". Of course, you could remove the model and just retrieve information like a powerful search engine, but as humans, that's not ideal. Why is that??
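The "grounded" part of step 4 is mostly prompt construction: the retrieved chunks are pasted into the prompt as the only permitted source of facts. A minimal sketch (the actual model call is left out, since it depends on how you host Mistral-7B):

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that instructs the model to answer
    only from the retrieved documentation chunks."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_grounded_prompt(
    "How do I reset my device?",
    ["Hold the power button for 10 seconds to reset the device."],
)
print(prompt)
```

The numbered `[1]`, `[2]` labels also make it easy to ask the model to cite which chunk it used, which helps when auditing answers for hallucinations.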

Humans are inherently biased toward human-sounding responses
We’re wired to:

  • Trust conversational tone more than raw data

  • Interpret empathy and context as signs of intelligence

  • Assign intent, reasoning, and meaning to well-structured language — even when none exists

That’s exactly why LLMs feel so convincing:
they don’t truly understand, but they sound like they do, and that’s enough for our brains to trust them.
Even if the model is just rephrasing something like:
“You can reset your device by holding the power button for 10 seconds,”
it sounds more trustworthy when framed like:
“No problem! Based on what you’re asking, it sounds like your device might need a reset. You can do that by holding the power button for about 10 seconds; that should do the trick.”

                   +-------------------+
                   |   User Query      |
                   +-------------------+
                             |
                             v
                   +-------------------+
                   |   Search Vector   |
                   |   Database (e.g., |
                   |   ChromaDB, etc)  |
                   +-------------------+
                             |
         +-------------------+-------------------+
         |                                       |
         v                                       v
+-------------------+                +-----------------------+
|  Retrieved Chunks |                |   Open Source LLM     |
|  (Relevant Docs)  |                |   (e.g., Mistral-7B)   |
+-------------------+                +-----------------------+
         |                                       ^
         +-------------------+-------------------+
                             |
                             v
                   +-------------------------+
                   |   Response Generator    |
                   | (LLM crafts answer from |
                   |   doc + prompt context) |
                   +-------------------------+
                             |
                             v
                   +-------------------+
                   |   Final Answer    |
                   +-------------------+
                             |
                             v
                   +---------------------------+
                   |  Feedback Logging System  |
                   | (Log uncertain responses, |
                   | enable fine-tuning later) |
                   +---------------------------+

5.) Once you have your system built with RAG, the next thing to do is test it. Go through it internally or with close peers and have them find flaws. You'll still have some slight issues, but you can check the confidence of a model's answer. If it doesn't reach a specified threshold, return "I'm sorry, but I was unable to find an answer to the question you are asking", then log the specific question that was asked and fine-tune for it, or block the question if it wasn't even contextually in that realm.
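The threshold-and-fallback logic in step 5 can be sketched in a few lines. The exact confidence signal depends on your stack; here the top retrieval similarity score is used as a cheap proxy, which is a common but imperfect heuristic (the 0.35 threshold is an arbitrary illustration, not a recommended value):

```python
import logging

FALLBACK = ("I'm sorry, but I was unable to find an answer "
            "to the question you are asking.")

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rag-feedback")


def answer_or_fallback(question: str, top_chunk: str, score: float,
                       threshold: float = 0.35) -> str:
    """Return a grounded answer only when retrieval confidence is
    high enough; otherwise log the question for later fine-tuning."""
    if score < threshold:
        # Feeds the "Feedback Logging System" box in the diagram above.
        log.info("low-confidence question logged: %r (score=%.2f)",
                 question, score)
        return FALLBACK
    # In the real system this would call the LLM with a grounded prompt;
    # here we just return the retrieved chunk.
    return top_chunk


print(answer_or_fallback("How do I reset my device?",
                         "Hold the power button for 10 seconds.", 0.82))
print(answer_or_fallback("What's the weather?", "", 0.05))
```

The logged low-confidence questions become your fine-tuning backlog: each one is either a documentation gap to fill or an off-topic question to block.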

Once your RAG system is ready, you’ll likely want to make it public-facing. But how do you protect it from bots, abuse, or unauthorized users?

That’s where CIAM (Customer Identity and Access Management) comes in.

CIAM, or Customer Identity and Access Management, is essentially IAM tailored for customer-facing environments. It enables you to build a secure, slightly restricted portal where customers can easily sign up and access documentation or other services, while ensuring that only authorized users get through. A well-designed CIAM system becomes the foundation for a centralized customer experience and supports scalability as your platform grows.

You can build your own backend for access management using frameworks like PHP with Phalcon, Python with Django, or Node.js. However, in many cases, it’s more efficient to offload this responsibility to a platform like OneLogin CIAM. Offloading reduces technical overhead, avoids reinventing the wheel, and allows your team to focus on your core product.

OneLogin CIAM still offers full customizability via APIs and SDKs while providing out-of-the-box compliance with standards like GDPR, SOC 1 Type 2, SOC 2 Type 2, SOC 3, and following the NIST Cybersecurity Framework.

Regardless of which option you go with, RAG, vector search, and grounded AI responses aren't just technical challenges; they're trust challenges. If your users can't trust what the AI says (or who's using it), the tech doesn't matter.

What’s Next?

This is just the beginning of my new Tuesday series: Tech Tuesday, where I break down interesting technical systems, tools, and real-world projects, and sometimes how things work under the hood.

Coming soon:

  • Reverse-Engineering a Scam from the Inside (with Social Engineering and Honeypots)

  • Fixing LLMs with Real-Time Feedback Loops: Building a self-correcting RAG system

  • Tracking User Behavior Without Cookies: Building lightweight, privacy-respecting session intel

  • How I Built a Personal Threat Detection System: Using AI + browser fingerprinting

  • What Happens When You Inject IAM Context into Frontend UI?: Designing apps that show trust levels, not just enforce them

Follow along if you're into engineering, real security stories, and the code that makes it all possible. I've been in this field for nearly 10 years, and I love learning what is possible and pushing the boundaries of what can be done.

Sure, you could Google it. Or ask ChatGPT.
But OneLogin’s blog and learning center already have the answers and fewer hallucinations.


Written by

Jeffrie Budde

Hello! I am Jeff, a seasoned software engineer who has worked on everything from R&D with reverse engineering, to creating honeypots that catch malicious users, to troubleshooting server hardware. I love solving problems and building things in a scalable, secure, and redundant fashion. This will be a place where I share my thoughts on tech and my knowledge.