Query Transformation - RAG series

Mubashir Ahmed
4 min read

Hey everyone, welcome to the Gen AI series.

I hope you enjoyed the previous blog, where we discussed, step by step, how to chat with our PDF using the RAG approach.

We will continue with that code and modify it to get more accurate results.

Introduction to Query Transformation

The entire game is about getting a relevant answer to the user: responding not just to what the user ‘asked‘, but to what the user intends to ask.

Think of Google Search: do we get only what we searched for, or everything related and relevant to that query?

The user’s query is often not precise enough to get what they actually need.

It has to be polished before the AI can retrieve that specific thing.

A user’s query can be more abstract, or it can be less abstract.

An example -

Let’s say you own a hardware shop. Only you know which item is kept where, what each thing is called, and so on.

Now you need to hire a helper for the shop - there are too many customers, and the shop is big.

He knows a few things, but he has no experience of this particular shop.

So he won’t work correctly from day one; he will make mistakes - a customer will ask for one thing and he will bring something else.

You have to explain to him in advance what a customer might say - a customer may phrase a request differently, may not know the name of the thing, or may ask for the wrong thing by mistake.

You have to train him on all of this, so that whenever you are not at the shop, he can handle it on his own.

RAGs have the same problem: they are intelligent, but we still need to polish them, making them re-interpret what the user has asked and decide what to respond with…

Parallel Query - Fan out retrieval

Let’s explore how this approach works.
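Before the LangChain code, here is a minimal, dependency-free sketch of the idea: fan one query out into several variants, search with each, and take the union of unique results. The variant generator and the tiny in-memory "store" below are hypothetical stand-ins - in the real pipeline an LLM writes the variants and Qdrant does the similarity search:

```python
def generate_variants(query: str) -> list[str]:
    # Stand-in for the LLM step: in practice these rephrasings are generated.
    return [
        query,
        f"What does the document say about {query.lower()}?",
        f"Key points regarding {query.lower()}",
    ]

def retrieve(query: str, store: dict[str, str], k: int = 2) -> list[str]:
    # Toy "similarity": rank documents by word overlap with the query.
    words = set(query.lower().split())
    scored = sorted(
        store.items(),
        key=lambda kv: len(words & set(kv[1].lower().split())),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

def fan_out_retrieve(query: str, store: dict[str, str]) -> list[str]:
    # Run every variant, keeping the union of unique chunks across results.
    seen: list[str] = []
    for variant in generate_variants(query):
        for doc_id in retrieve(variant, store):
            if doc_id not in seen:
                seen.append(doc_id)
    return seen

store = {
    "d1": "deployment challenges of AI models",
    "d2": "GDPR compliance and regulation",
    "d3": "recipe for pancakes",
}
print(fan_out_retrieve("AI model deployment", store))
```

The dedup step at the end is exactly what LangChain's MultiQueryRetriever does for us below.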

So, here we will be using the langchain package.

First, we have to import the MultiQueryRetriever:

import os

from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI
from langchain_qdrant import QdrantVectorStore

Then, we need to call this MultiQueryRetriever on top of the regular retriever we already have.

Before that, we need multiple queries generated from the user's query, so we use a ChatOpenAI model to perform that task:

# LLM used to generate multiple rephrasings of the user's query
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    api_key=os.environ.get("OPENAI_API_KEY"),
)

Next, we use the MultiQueryRetriever to run multiple similarity searches, collect the relevant chunks from each, and derive the context for our AI model:

# Vector store from the previous blog; `embedder` is the embedding model set up there
retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="langchain_learning",
    embedding=embedder,
)

# Wrap the base retriever: the LLM generates query variants,
# and each variant runs its own similarity search
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=retriever.as_retriever(),
    llm=llm,
)
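If you want to see the query variants the LLM actually generated, LangChain logs them at INFO level on the multi-query retriever's logger, so enabling it is enough:

```python
import logging

# MultiQueryRetriever logs its generated query variants at INFO level;
# turning this logger on lets you inspect what was actually searched.
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
```

This is handy while tuning: if the variants look off-topic, the retrieved chunks will be too.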

The final step is to set the context and log our output. Internally, the retriever selects the unique chunks across the multiple result sets and returns them:

relevant_docs = multi_query_retriever.invoke(user_query)

# Join the page contents; interpolating the raw Document objects
# into the prompt would dump their Python repr instead
context = "\n\n".join(doc.page_content for doc in relevant_docs)

SYSTEM_PROMPT = f"""
    You are a helpful AI assistant with access to a specific document of the user.
    The user will ask questions about it; answer only those questions,
    using the context I have given you below.

    context={context}
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]
)

print(response.choices[0].message.content)

The output I got for the question “Explain the major challenges faced during AI model deployment.”:

The major challenges faced during AI model deployment include:

  1. Adjusting Workflows: Developers need to adapt their existing workflows, prompts, and data to work with the new models, which can have unique quirks, strengths, and weaknesses.

  2. Versioning and Evaluation Infrastructure: Without proper infrastructure in place for versioning and monitoring the performance of the models, deployment can lead to numerous complications and operational headaches.

  3. Regulatory Changes: Regulations surrounding AI technologies are constantly evolving. For example, AI resources can be heavily regulated as national security issues, and compliance with regulations such as the GDPR can be costly and complex.

  4. Compute Resource Availability: Changes in laws can suddenly limit access to compute resources, such as being banned from purchasing GPUs from certain vendors, impacting the ability to deploy models effectively.

  5. Intellectual Property Concerns: There are uncertainties regarding intellectual property when utilizing models trained on data that may not be owned by the developer. This can create hesitancies, especially for companies deeply invested in their IP.

These challenges highlight the importance of thorough planning and consideration of the evolving landscape of AI deployment.

Stay tuned, as we will discuss more Query Transformation approaches in our upcoming blogs:

  • Reciprocal Rank Fusion

  • Query Decomposition

Well, this is the end of one of the concepts of advanced RAG approaches - Parallel Query Retrieval.

See you in the next blog

Let’s connect here on Twitter

Peace out ✌️
