AI Agents for Legal: Implementing Complex Document Search and Reasoning Agent using Qdrant and Llamaindex

Sachin Khandewal

Introduction

Legal documents are notoriously complex. Their dense language, intricate structures, and reliance on precedent make them challenging to parse, even for seasoned professionals. This complexity creates significant bottlenecks in the legal industry, where time is of the essence. AI agents, powered by advanced retrieval and reasoning capabilities, can revolutionize this landscape. By rapidly surfacing relevant information from vast document repositories, these agents can dramatically enhance both productivity and accuracy. In this article, we’ll walk through a practical implementation of an AI agent for legal document search, leveraging advanced retrieval techniques and large language models (LLMs).

For this project, we’ll use the Artificial Intelligence for Legal Assistance (AILA) dataset, available on Kaggle. This dataset contains thousands of court case documents from the Supreme Court of India. These documents are prime examples of the structural complexity inherent in legal texts, featuring:

  • Standardized preamble: Case numbers, dates, parties involved, and court details.

  • Procedural history: A timeline of the case’s journey through the legal system.

  • Arguments of the counsels: Detailed arguments from both the petitioner and the respondent.

  • Court’s analysis and reasoning: The core of the judgment, where the court dissects the arguments and applies legal principles.

  • Final order/judgment: The court’s conclusive decision.

Here’s an example of how the case files are structured:

Heramba Brahma and Another v State of Assam
Supreme Court of India

4 November 1982
Cr.A. No. 558 of 1982 (Cr.A. No. 114 of 1975, D/- 16 November 1981 : 1982 Cri LJ NOC 127 (Gau))
The Order of the Court is as follows:
Special leave granted. Printing dispensed with. Copies of paper book used by the High Court while hearing the appeal against appellants have been furnished to us. With the consent of parties, we proceeded to hear the appeal on merits.
In Sessions Case No. 75(D) of 1974, 17 accused including accused No. 2 Heramba Brahma and accused No. 3 Amar Singh Brahma (appellants herein) were tried for having committed offences under S. 120-B and S. 302 read with S. 34 of the Indian Penal Code. The learned Sessions Judge convicted accused Nos. 1, 2, 3, 4, 6, 7, 10 and 11 for having committed offences under Section 120-B and S. 304 read with Section 34, IPC and sentenced each of them to suffer rigorous imprisonment for 5 years and to pay a fine of Rs. 500/-, in default to suffer further RI for 6 months for the offence under S. 302 read with S. 34 of the Indian Penal Code.

The case document is extensive, so I have condensed it for visual clarity.

This intricate structure, combined with the specialized legal jargon, makes automated processing a significant challenge. However, cracking this nut is crucial for building effective legal tech solutions that can democratize access to legal information and assist legal professionals in their day-to-day work.

Technical Approach

Our technical approach focuses on creating a sophisticated retrieval system that can understand the nuances of legal documents. Here’s a high-level overview of the workflow:

  1. Document Ingestion: We’ll parse the legal documents and extract case details, titles, and dates for indexing.

  2. Vectorization and Indexing: We’ll represent each document using both dense and sparse vectors.
    - Dense vectors (embeddings) capture the semantic meaning of the text, allowing us to find conceptually similar passages. We’ll generate these from the full text of each case file.
    - Sparse vectors are ideal for keyword-based matching, making them perfect for searching specific terms, such as party names or legal statutes.

  3. Hybrid Search: We’ll combine dense and sparse vectors in a hybrid search approach to leverage the strengths of both methods using Reciprocal Rank Fusion (RRF) — more on this later.

  4. Filtering: We’ll enrich our data with metadata (e.g., dates) to enable precise filtering.

  5. Response Synthesis: We’ll use a large language model (LLM) to generate coherent and contextually relevant answers based on the retrieved information.

Here’s a diagram illustrating the workflow.


Source: Author

Prerequisites

Let’s set up our environment.

1. Groq for LLM Access:

We’ll use Groq for fast and efficient LLM inference. You’ll need to sign up for an API key on their website, https://console.groq.com/home.

After signing up, go to https://console.groq.com/keys to obtain the API key.
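
The Generators class later in this article reads the key at runtime, so expose it to your code before running anything. A minimal sketch; the environment variable name GROQ_API_KEY is a convention of this write-up, not a Groq requirement:

import os

# Assumes GROQ_API_KEY was exported in your shell before starting Python.
groq_api_key = os.getenv("GROQ_API_KEY")
assert groq_api_key, "Set the GROQ_API_KEY environment variable first"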

2. Qdrant Setup:

We’ll use Qdrant as our vector database because of its fast hybrid vector search capabilities. The easiest way to get started is with Docker:

First, download the latest Qdrant image from Docker Hub:

docker pull qdrant/qdrant

Then, run the service:

docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

To access the dashboard in your browser, use:

http://localhost:6333/dashboard#/collections

3. Installation Requirements:

Grab the requirements.txt file from the repo (linked in the Data Load section below) and install the dependencies with:

pip install -r requirements.txt

In the next section, let’s create a collection in our vector database.

Qdrant Collection

Now, let’s create a Qdrant collection to store our legal documents. We’ll configure it to support both dense and sparse vectors.

# --- 2. Initialize Qdrant Client and Create Collection ---
from qdrant_client import QdrantClient, models

# using the Qdrant Docker container started earlier
client = QdrantClient(url="http://localhost:6333")
collection_name = "hybrid_search_collection"
# USE ONLY IF RUNNING FOR THE FIRST TIME
print(f"\nCreating Qdrant collection: '{collection_name}'")
client.recreate_collection(
    collection_name=collection_name,
    vectors_config={
        "dense_vector": models.VectorParams(size=1024, distance=models.Distance.COSINE)
    },
    sparse_vectors_config={
        "sparse_vector": models.SparseVectorParams(
            index=models.SparseIndexParams(on_disk=False)
        )
    }
)
print("Collection created successfully.")

This will create the collection named hybrid_search_collection.
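
As an optional sanity check (not part of the original walkthrough), you can fetch the collection back and confirm it exists and is still empty before ingestion:

# Optional sanity check: fetch the collection's config and current point count.
info = client.get_collection(collection_name)
print(info.status, info.points_count)

You can also open the dashboard URL from the setup step and see the collection listed there.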

Data Load

To get the data, first clone the repository:

git clone https://github.com/sachink1729/qdrant-hybrid-agents-legal-search.git

The data can be found in the data/ folder.

Let’s look at the data and extract useful fields for our use case.

We need to store the case details, titles, and dates.

Note: Storing dates can be tricky, as we need to convert them into UTC format.

Also, it’s important to ensure that the indexes of case documents, titles, and dates match; otherwise, retrieval may fail.
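
As a minimal sketch of the date normalization (using Python's standard datetime module; the snippet below does the same thing by hand with a month lookup table), a date line like '26 September 1973' can be turned into an RFC 3339 style UTC string:

from datetime import datetime, timezone

def to_utc_string(date_line: str) -> str:
    """Parse a date line like '26 September 1973' into '1973-09-26T00:00:00Z' (assumes single spaces)."""
    dt = datetime.strptime(date_line.strip(), "%d %B %Y").replace(tzinfo=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_utc_string("26 September 1973"))  # 1973-09-26T00:00:00Z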

For faster indexing during experimentation, we’ll work with the first 100 documents.

To do this, use the following code snippet:

# data
import os
docs= []
dates = []
path = "data/Object_casedocs/"
for file in os.listdir(path):
    if file.endswith(".txt"):
        # store the file contents in docs
        with open(os.path.join(path, file), encoding="utf-8") as f:
            data = f.read()
        docs.append(data)
        # the 4th line of each case file holds the judgment date
        dates.append(data.split("\n")[3])


print("Total number of documents: ", len(docs))

print("Removing documents without dates")
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]

# if a date line doesn't contain any month name, remove that entry from dates and docs
# (iterate in reverse so popping doesn't shift the indices we still need to visit)

for id in range(len(dates) - 1, -1, -1):
    if not any(month in dates[id] for month in months):
        docs.pop(id)
        dates.pop(id)

print("Total number of documents after cleaning: ", len(docs))
# convert dates like '26 September 1973' into UTC-style timestamps


print("Converting dates to UTC timestamps")
# convert months to number
months_to_number = {"January": 1, "February": 2, "March": 3, "April": 4, "May": 5, "June": 6, "July": 7, "August": 8, "September": 9, "October": 10, "November": 11, "December": 12}
for id, date in enumerate(dates):
    # replace month with number in dates and final format is year-month-day
    date_split = date.strip().replace("  ",' ').split(" ")
    date_split[1] = str(months_to_number[date_split[1]])
    dates[id] = date_split[2] + "-" + date_split[1] + "-" + date_split[0]
print("Successfully converted dates to UTC timestamps")
# print(dates)


# sample 100 docs and dates
docs = docs[:100]
dates = dates[:100]

print("Sampling only first 100 documents from the clean docs for this experiment")

# get titles
titles=[]
for doc in docs:
    title = doc.split("\n")[0]
    titles.append(title)
# titles[:5]  # in a notebook, peek at the first few titles

print("Stored titles")
# get metadata
metadata = []
for id, title in enumerate(titles):
    metadata.append({"case_id": "C-" + str(id+1), "date": dates[id]})

print("Stored metadata")

Which prints:

Total number of documents:  2914
Removing documents without dates
Total number of documents after cleaning:  2904
Converting dates to UTC timestamps
Successfully converted dates to UTC timestamps
Sampling only first 100 documents from the clean docs for this experiment
Stored titles
Stored metadata

Ingestion

Next, we’ll build the ingestion workflow to populate our Qdrant collection.

Embedding Models

Before we proceed, we need to ensure we have access to both our dense and sparse embedding models:

You might be wondering: why are we using sparse embeddings?

Sparse embeddings are characterized by their high dimensionality (often matching the vocabulary size) and by being mostly zeros (sparse). They are highly interpretable, since each dimension corresponds to a specific word or feature, and they exhibit locality, representing each word or feature independently.

Sparse embeddings are particularly useful for information retrieval, search engines, and document classification, excelling at keyword matching and identifying important terms.

To load them, use:

import torch
import numpy as np
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForMaskedLM, AutoTokenizer


# --- 1. Initialize Open Source Models ---


print("Initializing models...")
# a) Dense Vector Model (BAAI/bge-large-en-v1.5, 1024 dimensions)
dense_model = SentenceTransformer('BAAI/bge-large-en-v1.5', device='cuda' if torch.cuda.is_available() else 'cpu')
DENSE_VECTOR_SIZE = dense_model.get_sentence_embedding_dimension()
print(f"Dense model loaded. Vector size: {DENSE_VECTOR_SIZE}")


# b) Sparse Vector Model (SPLADE)
sparse_model_id = 'naver/splade-cocondenser-ensembledistil'
sparse_tokenizer = AutoTokenizer.from_pretrained(sparse_model_id)
sparse_model = AutoModelForMaskedLM.from_pretrained(sparse_model_id)
if torch.cuda.is_available():
    sparse_model.to('cuda')
sparse_model.eval()
print("Sparse model (SPLADE) loaded.")

Next, we require a function that generates sparse vectors from the SPLADE model.

This function converts a piece of text into a SPLADE sparse vector. We use SPLADE to score how important each word in a large vocabulary is for representing the input text. It then filters out all words with a score of zero. The final output is a compact representation containing only the important words (as numerical indices) and their corresponding importance scores.

def generate_splade_sparse_vector(text: str) -> models.SparseVector:
    """Generates a SPLADE sparse vector for a given text."""
    tokens = sparse_tokenizer(text, return_tensors='pt', truncation=True, padding=True) # Add truncation/padding
    if torch.cuda.is_available():
        tokens = {k: v.to('cuda') for k, v in tokens.items()}

    with torch.no_grad():
        logits = sparse_model(**tokens).logits

    # Apply ReLU and log, then perform max pooling over the token dimension (dim=1)
    # This reduces the (batch_size, sequence_length, vocab_size) to (batch_size, vocab_size)
    # For a single text, it becomes (vocab_size,) after squeeze()
    vec = torch.log(1 + torch.relu(logits)).max(dim=1).values.squeeze()

    # Ensure vec is 1D, which it should be after the above steps for a single input text
    if vec.dim() > 1:
        raise ValueError(f"Expected a 1D tensor for vec, but got {vec.shape}")

    # Get indices and values of non-zero elements
    # .nonzero() on a 1D tensor returns a 2D tensor of shape (num_non_zeros, 1)
    # .squeeze() will correctly turn (num_non_zeros, 1) into (num_non_zeros,)
    non_zero_indices = vec.nonzero().squeeze(dim=-1) # Squeeze last dim explicitly


    # Handle the case where squeeze might return a scalar if only one non-zero
    if non_zero_indices.dim() == 0: # If it's a scalar tensor
        indices = [non_zero_indices.item()]
        values = [vec[non_zero_indices].item()]
    else:
        indices = non_zero_indices.cpu().tolist()
        values = vec[non_zero_indices].cpu().tolist()

    return models.SparseVector(indices=indices, values=values)

Example:

generate_splade_sparse_vector('legal documents')

Which will print something like:

SparseVector(indices=[2192, 2231, 2375, 2457, 2576, 2592, 2726, 3235, 3423, 3648, 3661, 4205, 4482, 4981, 5074, 5160, 5371, 5416, 5491, 5523, 6206, 6254, 6426, 6764, 6796, 7010, 7063, 7099, 7450, 7816, 8170, 8744, 9385, 9894, 11091, 12653, 15359, 18001, 18777], values=[0.038346972316503525, 0.2988216280937195, 1.2188657522201538, 0.04518304765224457, 0.09444350004196167, 0.2894461452960968, 0.12064705044031143, 0.07816712558269501, 2.5761172771453857, 0.4955641031265259, 0.322893887758255, 0.05144835636019707, 0.23672449588775635, 1.5413520336151123, 0.013036026619374752, 1.6230740547180176, 0.49325647950172424, 0.09534130245447159, 2.60306715965271, 0.7195536494255066, 0.1041899025440216, 2.2376840114593506, 0.23321685194969177, 0.717644453048706, 0.18227262794971466, 0.13698546588420868, 0.24635086953639984, 0.06411391496658325, 0.20852689445018768, 0.36267027258872986, 0.3207723796367645, 0.0248417928814888, 0.09357268363237381, 0.041070833802223206, 0.15809932351112366, 0.2472669631242752, 0.04693509265780449, 0.05302770435810089, 0.048931341618299484])
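
Because each index maps to a token in the SPLADE vocabulary, you can decode the vector back into readable terms. A quick illustrative check, using the tokenizer already loaded above (the exact tokens and weights will depend on the model version):

# Decode the highest-weighted dimensions of the sparse vector back into vocabulary tokens.
sv = generate_splade_sparse_vector('legal documents')
top = sorted(zip(sv.indices, sv.values), key=lambda pair: pair[1], reverse=True)[:5]
for idx, weight in top:
    print(sparse_tokenizer.convert_ids_to_tokens([idx])[0], round(weight, 3))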

Vector Ingestion

For each document, we perform two main actions:

  • We create a dense vector from the full document text. This vector captures the overall meaning and context of the document.

  • We create a sparse vector from the document’s title. This vector focuses on identifying the most important keywords for precise matching.

Finally, we bundle the document’s ID, its metadata (such as case ID and date), and both the dense and sparse vectors together and upload everything to the database collection in a single batch. This setup enables powerful “hybrid” searches that leverage both semantic meaning and keyword matching.

# --- 3. Prepare and Upsert Data ---
print("\nProcessing and upserting data...")
points_to_upsert = []
for i, (doc, title, meta) in enumerate(zip(docs, titles, metadata)):
    # Generate dense vector from the main document content
    dense_vector = dense_model.encode(doc).tolist()

    # Generate sparse vector from the title for keyword matching
    sparse_vector = generate_splade_sparse_vector(title)

    # Create the point with named vectors and payload
    points_to_upsert.append(
        models.PointStruct(
            id=i + 1,
            vector={
                "dense_vector": dense_vector,
                "sparse_vector": sparse_vector
            },
            payload=meta
        )
    )


client.upsert(
    collection_name=collection_name,
    points=points_to_upsert,
    wait=True
)
print(f"Upserted {len(points_to_upsert)} points into the collection.")

Hybrid Retrieval

Now for the fun part: retrieving the data. We’ll create a function that performs a hybrid search with filtering.

The search function performs a hybrid search using both dense (semantic) and sparse (keyword) vector representations of the query. It then combines the results from both search types using Reciprocal Rank Fusion (RRF) to provide a more comprehensive and relevant set of documents. The function also supports date-based filtering and returns the top 3 relevant documents as a formatted string.

To read more about hybrid search using Qdrant, read: https://qdrant.tech/documentation/concepts/hybrid-queries/

def search(query, date_filter):
    """
    Run this tool to do a hybrid search for a given query.
    Returns the top ~3 relevant documents.
    Args:
        query (str): The query to search for.
        date_filter (dict): Optional date range filter, e.g. {"start": "1973-02-08T10:49:00Z", "end": "2024-01-31T10:14:31Z"}.
            Defaults to a start date of 1901-01-01 and an end date of 2025-07-26.
    Returns:
        Context: The retrieved context for the query.
    """
    print(f"\nPerforming hybrid search for query: '{query}'")

    if not isinstance(date_filter, dict):
        date_filter = {}
    date_filter_structured = {
            "must": {
                "key": "date",
                "range": {
                    "gt": date_filter.get("start", "1901-01-01T00:00:00Z"),
                    "gte": None,
                    "lt": None,
                    "lte": date_filter.get("end", "2025-07-26T00:00:00Z")
                }
            }
        }


    # a) Generate vectors for the query
    query_dense_vector = dense_model.encode(query).tolist()
    query_sparse_vector = generate_splade_sparse_vector(query)


    # b) Create two separate search requests
    dense_request = models.SearchRequest(
        vector={
            "name": "dense_vector",
            "vector": query_dense_vector
        },
        limit=2,
        with_payload=True,
        filter=date_filter_structured,
    )


    sparse_request = models.SearchRequest(
        vector={
            "name": "sparse_vector",
            "vector": query_sparse_vector
        },
        limit=2,
        with_payload=True,
        filter=date_filter_structured,
    )


    # c) Perform the batch search
    results = client.search_batch(
        collection_name=collection_name,
        requests=[dense_request, sparse_request]
    )


    dense_results = results[0]
    sparse_results = results[1]


    print("\n--- Dense Search Results (Semantic) ---")
    for hit in dense_results:
        print(f"ID: {hit.id}, Score: {hit.score:.4f}, Payload: {hit.payload}")


    print("\n--- Sparse Search Results (Keyword) ---")
    for hit in sparse_results:
        print(f"ID: {hit.id}, Score: {hit.score:.4f}, Payload: {hit.payload}")


    # d) Fuse the results using Reciprocal Rank Fusion (RRF)
    fused_results = reciprocal_rank_fusion([dense_results, sparse_results])


    print("\n--- Fused Hybrid Search Results (RRF) ---")
    for doc_id, score in fused_results:
        original_doc = client.retrieve(
            collection_name=collection_name,
            ids=[doc_id]
        )
        print(f"ID: {doc_id}, Fused Score: {score:.4f}, Payload: {original_doc[0].payload}")

    # Build the context string from the docs list, using each point id (1-based) to index into docs,
    # e.g. "Document 1: docs[0]\n\nDocument 2: docs[1]"


    context = ""
    count = 0
    for doc_id, score in fused_results:
        context = context + "Document " + str(count+1) + ": " + str(docs[doc_id-1]) + "\n\n"
        count = count + 1
    return context


def reciprocal_rank_fusion(search_results_list, k=60):
    """Fuse ranked result lists: each hit adds 1 / (k + rank) to its document's score."""
    fused_scores = {}
    for results in search_results_list:
        for rank, hit in enumerate(results):
            doc_id = hit.id
            if doc_id not in fused_scores:
                fused_scores[doc_id] = 0
            fused_scores[doc_id] += 1 / (k + rank)
    # sort documents by fused score, highest first
    reranked_results = sorted(fused_scores.items(), key=lambda item: item[1], reverse=True)
    return reranked_results

Example run:

# Run the search
search("cases vs government of india", date_filter={"start": "1973-02-08T10:49:00Z", "end": "2024-01-31 10:14:31Z"})

Output:

Performing hybrid search for query: 'cases vs government of india'
--- Dense Search Results (Semantic) ---
ID: 57, Score: 0.7304, Payload: {'case_id': 'C-57', 'date': '2004-4-12'}
ID: 90, Score: 0.7122, Payload: {'case_id': 'C-90', 'date': '1987-9-18'}

--- Sparse Search Results (Keyword) ---
ID: 13, Score: 15.3612, Payload: {'case_id': 'C-13', 'date': '1994-10-7'}
ID: 57, Score: 15.1042, Payload: {'case_id': 'C-57', 'date': '2004-4-12'}
--- Fused Hybrid Search Results (RRF) ---
ID: 57, Fused Score: 0.0331, Payload: {'case_id': 'C-57', 'date': '2004-4-12'}
ID: 13, Fused Score: 0.0167, Payload: {'case_id': 'C-13', 'date': '1994-10-7'}
ID: 90, Fused Score: 0.0164, Payload: {'case_id': 'C-90', 'date': '1987-9-18'}
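
To see where the fused scores come from: with k = 60, case 57 appears at rank 0 in the dense list and rank 1 in the sparse list, so its RRF score is 1/60 + 1/61 ≈ 0.0331. Cases 13 and 90 each appear in only one list, scoring 1/60 ≈ 0.0167 and 1/61 ≈ 0.0164 respectively, which matches the fused output above.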

Search Agent

To make this system truly powerful, we’ll build a search agent that can understand natural language queries and convert them into structured search requests.

Define the Generator class:

import os

from llama_index.llms.groq import Groq
# from llama_index.llms.openrouter import OpenRouter


class Generators:
    def __init__(self, model="llama-3.3-70b-versatile"):
        """
        Initializes the Generators class with a specified language model.
        Args:
            model (str): The name of the model to use. Defaults to "llama-3.3-70b-versatile".
        """
        # read the Groq API key from the GROQ_API_KEY environment variable instead of hard-coding it
        self.llm = Groq(model=model, api_key=os.getenv("GROQ_API_KEY"), temperature=0.1)


    def get_llm(self):
        """
        Returns the currently initialized language model (LLM) instance.

        :return: The language model instance used by the Generators class.
        """
        return self.llm
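
A quick smoke test of the LLM wrapper (optional, not in the original walkthrough) before wiring it into an agent:

# verify the Groq connection works before building the agent on top of it
llm = Generators().get_llm()
print(llm.complete("In one sentence, what is a writ petition?"))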

Define the Search Agent:

from llama_index.core.agent import FunctionCallingAgent
from llama_index.core.tools import FunctionTool

search_tool = FunctionTool.from_defaults(fn=search)


class AgentController:
    def __init__(self):
        self.llm = Generators("llama-3.1-8b-instant").get_llm()
        self.system_prompt = """
You are a legal agentic AI assistant.
Your task is to answer questions about legal documents.
The documents are stored in a Qdrant vector database.
You will use hybrid search to find relevant documents and then use the retrieved documents to answer the questions.
Answer using the documents and try to find the answer in the documents.
You need to execute the search tool to get the relevant documents. Don't make misspellings.
Under no circumstances return half-baked function calls to the user; if you cannot invoke the function, try again.
Don't return the function call, only return the textual response.
"""
        self.agent = self.get_agent()


    def get_agent(self):
        agent = FunctionCallingAgent.from_tools([search_tool],
                                        llm=self.llm,verbose=True,
                                        system_prompt=self.system_prompt)

        return agent

    def chat(self, query: str):
        response_obj = self.agent.chat(query)
        return response_obj.response

Results and Next Steps

Let’s run some queries to test the power of this agent.

agent = AgentController()
agent.chat("what is the result of S. Velayudhan v Krishnan And Ors and when did it occur")

Output:

'The result of S. Velayudhan v Krishnan And Ors is that the High Court acquitted the respondents, and the Supreme Court dismissed the appeal.\n\nThe case occurred on 1 April 1998.'

With this system, you can now ask complex questions and get accurate, context-aware answers. For example:

  • “What is the precedent for intellectual property rights in software?”

  • “Find cases related to environmental law from the last 5 years.”

  • “Summarize the arguments in the case of Kesavananda Bharati v. State of Kerala.”

For higher-volume use cases, you could explore:

  • Optimizing the Qdrant deployment: Scale up your Qdrant cluster for better performance.

  • Fine-tuning your embedding model: Adapt your embedding model on a legal-specific dataset to improve retrieval accuracy.

  • Implementing a more advanced RAG pipeline: Incorporate techniques like query rewriting and document reranking to further enhance the quality of retrieved results.

Conclusion & Resources

In this article, we’ve built a powerful AI agent for legal document search and reasoning. We’ve seen how to leverage dense and sparse vectors, hybrid search, and LLMs to create a system that can understand the complexities of legal language and provide accurate, insightful answers. This is just the beginning of what’s possible with AI in the legal domain.

Key Takeaways

  • Hybrid search is a powerful technique for combining the strengths of semantic and keyword-based search.

  • Metadata filtering is crucial for narrowing down search results in large document collections.

  • LLMs can be used to synthesize coherent and contextually relevant answers from retrieved information.

GitHub

For the full code reference, please take a look at my repo: https://github.com/sachink1729/qdrant-hybrid-agents-legal-search

References and Further Reading

  • AILA (Artificial Intelligence for Legal Assistance) dataset on Kaggle

  • Qdrant hybrid queries: https://qdrant.tech/documentation/concepts/hybrid-queries/

  • Groq console: https://console.groq.com/home

Written by

Sachin Khandewal

I write about the latest in AI, NLP & Data Science. Connect on https://www.linkedin.com/in/sachink1729/