Hypothetical Document Embeddings (HyDE) – A Query Transformation Technique for Advanced RAG

HyDE generates a hypothetical document from the user query, based on the LLM's pre-trained knowledge. Then, instead of creating vector embeddings of the user query and searching our vector database with them, it creates embeddings of that hypothetical document, finds relevant docs in the database, and uses those relevant docs as context with the query to generate a more accurate response.


Hypothetical Document Embedding:

What is it?

  • It's a technique where, instead of searching directly with the query, the system generates a hypothetical document from the query using the LLM's pre-trained knowledge, then uses that fake document's vector embedding to find the most relevant documents (via semantic search) and uses them as context for generating a more accurate response (a minimal sketch of the generation step follows the note below).
Note: We need large LLMs to use this technique, so that they have enough knowledge/context to generate plausible hypothetical docs.
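
To make this concrete, here is a minimal sketch of the hypothetical-document step, assuming the OpenAI Python SDK and the gpt-4o-mini model (any capable chat model works; the prompt wording is my own illustration, not a fixed recipe):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_hypothetical_document(query: str) -> str:
    """Ask the LLM to answer the query from its pre-trained knowledge alone.
    The answer may contain inaccuracies; it only needs to read like a
    relevant document so its embedding lands near the real ones."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; swap in any capable chat model
        messages=[
            {"role": "system",
             "content": "Write a short passage that plausibly answers the user's question."},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```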

Where do we apply this in RAG?

  • RAG contains three major steps: Indexing, Retrieval, and Generation. Indexing stores data sources in a database by creating vector embeddings of data chunks. Retrieval starts after receiving the user query, fetching relevant data that we pass as context, together with the user query, to the Generation step, which finally generates the response.

  • So, this HyDE technique is applicable at the second step, i.e. RETRIEVAL, as the sketch below shows.
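
Only the retrieval step changes: the hypothetical document is embedded in place of the raw query. The retrieve function below is illustrative pseudo-structure, not a specific framework's API; embed and the vector-database search are placeholders fleshed out in the later sketches:

```python
def retrieve(query: str, top_k: int = 4) -> list[str]:
    # Normal RAG would embed the query itself: query_vector = embed(query)
    # HyDE embeds a hypothetical answer instead:
    hypothetical_doc = generate_hypothetical_document(query)
    query_vector = embed(hypothetical_doc)
    # The semantic search itself is unchanged (placeholder vector store):
    return vector_db.search(query_vector, top_k=top_k)
```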

How does it work?

  • On receiving the user query, we ask our LLM to create a hypothetical document using the pre-trained knowledge (context) it already has.

  • We process that document by creating its vector embedding and running a semantic search with it, which gives us more relevant context.

  • Now, with that context and the user query, the LLM generates a more precise response for the user. A sketch of the embed-and-search step follows this list.
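
Here is a minimal sketch of that embed-and-search step, assuming OpenAI's text-embedding-3-small model and a tiny in-memory list of chunks standing in for a real vector database:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def semantic_search(hypothetical_doc: str, corpus: list[str], top_k: int = 2) -> list[str]:
    """Rank corpus chunks by cosine similarity to the hypothetical document."""
    doc_vec = embed(hypothetical_doc)
    # In practice the corpus vectors are precomputed at indexing time.
    corpus_vecs = [embed(chunk) for chunk in corpus]
    scores = [
        float(np.dot(doc_vec, v) / (np.linalg.norm(doc_vec) * np.linalg.norm(v)))
        for v in corpus_vecs
    ]
    ranked = sorted(zip(scores, corpus), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]
```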

How does the response get more accurate?

  • We improved the context (by providing more relevant context): a relevant fake document created from the LLM's existing knowledge retrieves better-matching chunks, and with that augmented context we get a more precise response aligned with the user query.

How is it different from normal RAG?

  • In normal RAG, we perform retrieval by creating vector embeddings of the user query directly, then searching the database for semantically related data to use as context. Here, we instead create a hypothetical document and use it to retrieve the relevant context, hence getting better context. The snippet below shows that only one line changes.
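
In code, the difference is just which text gets embedded and searched; a sketch reusing the helpers above:

```python
# Normal RAG: embed and search with the query itself.
context_normal = semantic_search(query, corpus)

# HyDE: embed and search with a hypothetical answer instead.
context_hyde = semantic_search(generate_hypothetical_document(query), corpus)
```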

Working Step by Step with Code & Visual:

  1. From the user query, we create a hypothetical document.

  2. We create the document's vector embedding, then run a semantic search with it to get relevant context.

  3. Using that context and the user query, our LLM generates a more accurate response.
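
Putting the three steps together, a minimal end-to-end sketch reusing the helpers above (the model name, prompt wording, and sample corpus are assumptions, not the only options):

```python
def hyde_rag(query: str, corpus: list[str]) -> str:
    # Step 1: create a hypothetical document from the user query.
    hypothetical_doc = generate_hypothetical_document(query)

    # Step 2: embed it and semantically search the corpus for real context.
    context = semantic_search(hypothetical_doc, corpus)

    # Step 3: answer the query, grounded in the retrieved context.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context.\n\nContext:\n"
                        + "\n---\n".join(context)},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content

# Hypothetical example corpus and query for illustration:
corpus = ["HNSW builds layered proximity graphs for fast approximate search.",
          "Chunk overlap preserves context across document boundaries."]
print(hyde_rag("How does HNSW indexing speed up vector search?", corpus))
```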


Hypothetical Document Embedding Output:


HYPOTHETICAL DOCUMENT EMBEDDINGS Code - Visit Here!

I have discussed all the advanced RAG techniques in my Advanced RAG Article Series, check it out!

Advanced RAG Series Repository → Visit Repo Here!


Conclusion:

Just explained my learnings on the Hypothetical Document Embeddings technique! If you found it useful, don't forget to like this article and follow me for more such informative articles.


Credits:

Credits: I am very grateful to ChaiCode for providing all this knowledge, insight, and deep learning about AI: Piyush Garg, Hitesh Choudhary

If you want to learn too, you can join here → Cohort || Apply from ChaiCode & use NAKUL51937 to get 10% off


Thanks:

Feel free to comment your thoughts; I would love to hear your feedback...

Thanks for giving your precious time to read this article.


Connect on other Platforms:

Let’s learn something together: LinkedIn, Twitter

If you would like, you can check out my Portfolio



Written by

Nakul Srivastava

I'm a passionate web developer who loves building beautiful, functional, and efficient web applications. I focus on crafting seamless user experiences using modern frontend technologies.