Step Back Prompting

Revathi P
2 min read

Step-back prompting is a technique for improving a RAG pipeline's reasoning: instead of searching with the user's query as-is, the query is first rephrased into a broader, more general question. By stepping back, the model is more likely to capture the user's true intent, fetch more relevant knowledge chunks, and then generate the final answer grounded in that context.

For Example:

Original Question: Jan Sindel was born in what country?

Step-back Question: What is Jan Sindel's personal history?
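
To see where the step-back question itself can come from, here is a minimal sketch (assuming langchain_openai's ChatOpenAI and an OPENAI_API_KEY in the environment; the model name is illustrative) that asks the model to rephrase any question into its step-back form:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # illustrative model choice

step_back_system = """
You are an expert at world knowledge. Rephrase the user's question into a
more generic, step-back question that is easier to answer from background
knowledge. Reply with the step-back question only.
"""

question = "Jan Sindel was born in what country?"
step_back_question = llm.invoke(
    [
        {"role": "system", "content": step_back_system},
        {"role": "user", "content": question},
    ]
).content

print(step_back_question)  # e.g. "What is Jan Sindel's personal history?"

In the walkthrough below, the same idea is folded into a single few-shot prompt.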

Let’s Code

  1. Load the data source, chunk it, and store it in a vector database

from pathlib import Path
import os

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_qdrant import QdrantVectorStore

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

pdf_path = Path(__file__).parent / "file_path.pdf"

# Load the document from the PDF file
loader = PyPDFLoader(file_path=pdf_path)
docs = loader.load()

# Split the document into smaller, overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = text_splitter.split_documents(documents=docs)

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    api_key=OPENAI_API_KEY,
)

# Create a new vector store - if the collection doesn't already exist
vector_store = QdrantVectorStore.from_documents(
    documents=[],
    url="http://localhost:6333",
    collection_name="step_back_rag",  # must match the name used at retrieval time
    embedding=embeddings
)

# Add the documents to the vector store
vector_store.add_documents(split_docs)
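
As a quick sanity check (optional; this uses the qdrant-client package that langchain_qdrant already depends on), you can confirm the chunks actually landed in the collection:

from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
print(client.count(collection_name="step_back_rag").count)  # number of stored chunks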
  2. Step-Back Query Generation

# Connect to the existing collection created in step 1
retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="step_back_rag",
    embedding=embeddings
)

# Chat model used for the step-back answer and the final response
# (the model name here is an illustrative choice)
chat_model = ChatOpenAI(model="gpt-4o-mini", api_key=OPENAI_API_KEY)

user_query = "Jan Sindel was born in what country?"

hypothetical_prompt = """
You are a helpful assistant who answers the user's query.
First take a step back and consider the broader question behind the query, then answer that broader question.

Example:

Query: Jan Sindel was born in what country?
Step-back Query: What is Jan Sindel's personal history?
Answer: Jan Šindel was born in Hradec Králové, a city in the Kingdom of Bohemia, which is part of the present-day Czech Republic.
"""

# Ask the model for a step-back answer to the user's query
hypothetical_answer = chat_model.invoke(
    [
        {"role": "system", "content": hypothetical_prompt},
        {"role": "user", "content": user_query}
    ]
).content

# Use the broader step-back answer (not the raw question) as the search query,
# so retrieval pulls in the surrounding background knowledge
relevant_chunks = retriever.similarity_search(
    query=hypothetical_answer,
)
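
Before moving on to generation, it can help to eyeball what was retrieved (a small debugging sketch; page_content and metadata are standard fields on LangChain's Document, and PyPDFLoader populates the page number):

for i, chunk in enumerate(relevant_chunks, start=1):
    print(f"--- Chunk {i} (page {chunk.metadata.get('page', '?')}) ---")
    print(chunk.page_content[:200])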
  3. Final Answer Generation

# Keep only the text of the retrieved chunks, dropping metadata noise
context = "\n\n".join(chunk.page_content for chunk in relevant_chunks)

final_system_prompt = f"""
You are a helpful assistant who answers the user's query using the following pieces of context.
If you don't know the answer, just say that you don't know; don't try to make up an answer.

Context: {context}
"""

final_response = chat_model.invoke(
    [
        {"role": "system", "content": final_system_prompt},
        {"role": "user", "content": user_query}
    ]
)

print("\nFinal Answer:\n")
print(final_response.content)

Let’s Connect

LinkedIn: linkedin.com/in/revathi-p-22b060208

Twitter: x.com/RevathiP04
