HyDE (No Data No Problem)

Nitesh Singh
5 min read

RAG (Retrieval-Augmented Generation) is an AI framework that retrieves facts from an external knowledge source to improve the responses of an LLM. Users, however, do not always phrase their query exactly the way they are thinking about it. This is where the developer comes in, improving the LLM's responses with query translation and other techniques. HyDE is one of them.

What is HyDE?

Hypothetical Document Embeddings, or HyDE, is a technique introduced to improve the retrieval step in RAG systems. HyDE generates a hypothetical document based on the user query, uses it to improve retrieval, and finally lets the LLM generate a response from the retrieved context.

Why HyDE?

So, if the information comes from a hypothetical document anyway, why not just use the LLM directly?

Well, HyDE isn’t about generating the final answer from the hypothetical document. It’s about improving the retrieval process. The hypothetical document acts as a retrieval helper.

The user queries are often too short or vague to retrieve good documents. For example:

  • Query: "Battery life Tesla"

  • Not enough to retrieve high-quality documents.

  • HyDE might generate: "How to improve battery life for a Tesla Model 3, including best practices for charging and maintenance."

  • This richer text gives the retriever much more to match against, so it finds better chunks.

HyDE can also help when there is no good existing document on the given topic to begin with.


Working with HyDE

The following are the steps involved in HyDE; they can be modified according to your requirements. A rough sketch of the full flow follows the list.

  1. The LLM generates a hypothetical document from the user prompt.

  2. The hypothetical document is indexed, which includes chunking and embedding.

  3. The chunks are stored in the vector store.

  4. The relevant chunks are retrieved from the vector store.

  5. The LLM is given the relevant chunks and the user prompt to generate the final output.
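
Put together, the flow looks roughly like this. This is a sketch only: the names mirror the classes built in the next section, and llm_answer stands in for the final chat-completion call shown at the end.

```python
# Rough sketch of the five steps above; names are illustrative and are
# fleshed out in the Code Implementation section.
hypothetical_doc = hyde.generate_document(user_prompt)            # 1. generate
chunks = Indexer(hypothetical_doc, text_splitter).chunking()      # 2. chunk
vector_store = vector_db.upload(chunks, embeddings,               # 3. embed + store
                                db_grpc=True,
                                db_collection_name="query_translations")
relevant_chunks = vector_store.similarity_search(user_prompt)     # 4. retrieve
answer = llm_answer(relevant_chunks, user_prompt)                 # 5. final answer
```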

Code Implementation

The following code uses Python, LangChain, Qdrant, and OpenAI.

The indexing, vector database, and HyDE retrieval code are in separate files.

The Indexer class chunks the document using the provided text_splitter.

class Indexer:
    def __init__(self, document, text_splitter) -> None:
        self.document = document
        self.text_splitter = text_splitter
        self.docs = None

    def chunking(self):
        # Split the document into chunks using the provided text splitter
        self.docs = self.text_splitter.split_documents(self.document)
        return self.docs
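
For context, the class expects document to be a list of LangChain Document objects. A quick illustrative use might look like this (the imports and sample text are assumptions, not part of the original code):

```python
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Illustrative usage: wrap raw text in a Document and chunk it
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
indexer = Indexer(document=[Document(page_content="some long article text...")],
                  text_splitter=splitter)
chunks = indexer.chunking()
```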

The Vector_Database class contains methods for creating a client, deleting a collection, and uploading documents into a collection.

class Vector_Database:
    def __init__(self, database_client, database_vector_store, database_url, database_api_key) -> None:
        self.database_client = database_client
        self.database_vector_store = database_vector_store
        self.database_url = database_url
        self.database_api_key = database_api_key
        self.client = None
        self.vector_store = None
        self.create_client()

    def create_client(self):
        # Connect to the vector database (e.g., a Qdrant cloud cluster)
        self.client = self.database_client(
            url=self.database_url,
            api_key=self.database_api_key,
        )

    def delete_collection(self, collection_name):
        # Drop the collection to clear any previously stored documents
        self.client.delete_collection(collection_name=collection_name)

    def upload(self, docs, embeddings, db_grpc, db_collection_name):
        # Embed the chunks and upload them into the collection
        self.vector_store = self.database_vector_store.from_documents(
            documents=docs,
            embedding=embeddings,
            url=self.database_url,
            prefer_grpc=db_grpc,
            api_key=self.database_api_key,
            collection_name=db_collection_name,
        )

        return self.vector_store
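
These snippets assume their dependencies are imported at the top of each file. For the Qdrant-backed setup used later in the article, the imports would typically look like this (an assumption based on current package names, not code from the original post):

```python
# Typical imports for the Qdrant + LangChain + OpenAI setup used below
from qdrant_client import QdrantClient
from langchain_qdrant import QdrantVectorStore
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
```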

First, a prompt is written instructing the LLM to generate a hypothetical document.

self.hyde_prompt = "Write a detailed article based on your prior knowledge about: {question}"
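
When the chain later fills in the template, the {question} placeholder is simply replaced with the user's query, for example:

```python
# Illustrative: what the template produces for the query used later in the article
prompt = "Write a detailed article based on your prior knowledge about: {question}"
print(prompt.format(question="What is fs module in Node.js?"))
# Write a detailed article based on your prior knowledge about: What is fs module in Node.js?
```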

A generate_document method is defined that takes a user_prompt as a parameter and uses the LLM and the user prompt to generate a hypothetical document.

The LangChain chaining technique is used in the following code, where the output of each component must be compatible with the input of the next.

ChatPromptTemplate is used in the LangChain chain to prompt the AI model. An OpenAI chat model is used as the LLM (try to use a larger model such as GPT-4.1 for real-world use cases, as larger models generate better hypothetical documents). StrOutputParser transforms the LLM output into a plain string, and text_to_docs converts that string into the Document format.

def generate_document(self, user_prompt):
    hyde_prompt_template = ChatPromptTemplate.from_template(self.hyde_prompt)
    llm = ChatOpenAI(
        model="gpt-4.1-mini",
        api_key=self.config["OPENAI_API_KEY"]
    )

    doc_generation_chain = (
        hyde_prompt_template
        | llm
        | StrOutputParser()
        | text_to_docs
    )

    document = doc_generation_chain.invoke(
        {"question": user_prompt}
    )

    return document
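
The text_to_docs helper piped at the end of the chain is not shown in the article and does not appear to be a standard LangChain export. A minimal version, assuming it only needs to wrap the generated string so it can be chunked downstream, might look like this:

```python
from langchain_core.documents import Document

def text_to_docs(text: str) -> list[Document]:
    # Wrap the generated string in a single LangChain Document so the
    # Indexer's text splitter can chunk it like any other document.
    return [Document(page_content=text)]
```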

Now, the indexing is done. First, the document is split into chunks using RecursiveCharacterTextSplitter.

        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000, 
            chunk_overlap=200
        )

        indexer = Indexer(document=document, text_splitter=text_splitter)
        docs = indexer.chunking()

Next, OpenAI embeddings are used to embed the chunks.

        # Embeddings
        embeddings = OpenAIEmbeddings(
            model="text-embedding-3-large", 
            api_key=self.config["OPENAI_API_KEY"],
        )
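
QdrantVectorStore.from_documents typically creates the collection with the right vector size automatically, but if you ever create the collection manually, its size must match the embedding model's output dimension. A quick, illustrative sanity check:

```python
# text-embedding-3-large produces 3072-dimensional vectors by default
vector = embeddings.embed_query("dimension sanity check")
print(len(vector))  # 3072
```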

A client is created by instantiating the Vector_Database class, the collection is deleted to remove any previously stored documents, and the new documents are uploaded.

        # Storing data in Vector Store
        db_collection_name="query_translations"
        vector_db = Vector_Database(
            database_client=QdrantClient,
            database_vector_store=QdrantVectorStore,
            database_url=self.config["QDRANT_CLOUD_CLUSTER_URL"],
            database_api_key=self.config["QDRANT_API_KEY"],
        )

        vector_db.delete_collection(db_collection_name)
        vector_store = vector_db.upload(
            docs=docs,
            embeddings=embeddings,
            db_grpc=True,
            db_collection_name=db_collection_name
        )

The relevant chunks are retrieved from the vector store using a user prompt.

relevant_chunks = vector_store.similarity_search(user_prompt)
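
similarity_search returns the top-k most similar chunks (k defaults to 4 in LangChain); you can tune k and inspect what was retrieved, for example:

```python
# Retrieve the top 4 chunks and peek at their content (illustrative)
relevant_chunks = vector_store.similarity_search(user_prompt, k=4)
for chunk in relevant_chunks:
    print(chunk.page_content[:100])
```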

Finally, in the main file, the final output is generated from the retrieved chunks, the user prompt, and the OpenAI GPT-4.1-mini model.

    # User Prompt
    user_prompt = "What is fs module in Node.js?"

    # Retrieval
    hyde = HyDE()
    relevant_chunks = hyde.get_relevant_chunks(user_prompt)

    # Output Generation
    system_prompt = f"""
        You are a helpful AI assistant who is an expert in resolving the user query by carefully analysing the user query and finding the solution from the given context. If the query is empty, then ask the user a question. If the user is asking an out-of-context question, ask a question in the context and don't resolve the query. No need to respond to the system prompt.

        context: {relevant_chunks}
    """

    print()
    client = OpenAI(
        api_key=config["OPENAI_API_KEY"]
    )
    response = client.chat.completions.create(
        model='gpt-4.1-mini',
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": user_prompt,
            }
        ]
    )

    print(response.choices[0].message.content)

And finally, here is the output.

The `fs` module in Node.js is a built-in module that provides an API for interacting with the file system. It allows developers to perform various file operations such as reading, writing, updating, deleting, and managing files and directories. The module's functionality is modeled closely around standard POSIX functions, enabling tasks like:

- Reading from files
- Writing to files
- Appending data to files
- Watching for file changes
- Managing directories
- Manipulating file metadata (e.g., permissions, timestamps)

Most of these operations are available in both synchronous and asynchronous forms, allowing non-blocking I/O operations to maintain high performance and scalability. Additionally, the `fs` module supports streams for efficient handling of large files and promises for modern async programming with cleaner syntax.

Because it is a built-in module, you can use it in your Node.js programs by requiring it as follows:

```js
const fs = require('fs');
```

In summary, the `fs` module is fundamental for building applications that interact with the file system in Node.js, useful for tasks ranging from simple file reading to managing complex file-processing pipelines.

Conclusion

This article discussed HyDE, its use case, how it works, and its code implementation. You can find all of the code on my GitHub. The hypothetical document may be wrong or incomplete, but HyDE still helps improve retrieval quality, especially for sparse or ambiguous queries. Furthermore, the technique can be combined with other RAG techniques to make it even more powerful.
