Integrating LlamaIndex and Qdrant Similarity Search for Patient Record Retrieval

Akriti Upadhyay

Introduction

The medical field is currently experiencing a remarkable surge in data, a result of progress in medical technologies, electronic health records (EHRs), and wearable health devices. The ability to effectively manage and analyze this intricate and varied data is vital for providing customized healthcare, advancing medical research, and enhancing patient health outcomes. Vector databases, which are specifically tailored for the efficient handling and storage of multi-dimensional data, are gaining recognition as an effective tool for a range of healthcare uses.

For example, past patient records are rarely used by medical professionals in real time, even though they are a treasure trove of information that can assist in diagnosis. What if we could build systems where doctors, nurses and caregivers could quickly access past patient records using just natural language inputs? What if historical test results could help generate recommendations for new treatment options?

This is the potential of AI in healthcare. From personalized diagnostics to targeted therapies, healthcare is on the cusp of becoming a whole lot smarter. In this article, I will demonstrate the capabilities and potential applications of vector databases in the healthcare sector.

Why Vector Search and LLMs?

Vector Search enables rapid exploration of large datasets by transforming data into vectors within a high-dimensional space, where similar items are clustered closely. This approach facilitates efficient retrieval of relevant information, even from vast datasets. LLMs, on the other hand, are AI models trained on diverse internet texts, capable of comprehending and generating human-like text based on inputs.

When combined, Vector Search and LLMs streamline the storage and search process for patient records. Each record is passed through an embedding model, which converts it into a vector representing its semantic meaning, and that vector is stored in a database. During retrieval, a doctor enters a search query, which is also converted into a vector, and the vector search locates the records closest to the query vector. This enables semantic search based on meaning rather than exact keywords.
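To make the idea of "closest to the query vector" concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and the record names are made up for illustration; a real embedding model produces vectors with hundreds of dimensions, and a vector database uses an index rather than the exhaustive scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, records, k=2):
    """Return the ids of the k records most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), rec_id)
        for rec_id, vec in records.items()
    ]
    scored.sort(reverse=True)
    return [rec_id for _, rec_id in scored[:k]]


# Hypothetical toy "embeddings" (assumption: not produced by any real model).
records = {
    "diabetes_note": [0.9, 0.1, 0.0],
    "hypertension_note": [0.1, 0.9, 0.0],
    "allergy_note": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]  # stands in for an embedded question about diabetes

print(top_k(query, records))
```

The query vector points in nearly the same direction as the diabetes record, so that record ranks first even though no keywords are compared.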

Subsequently, retrieved records are processed through an LLM, which generates a human-readable summary highlighting the most relevant information for the doctor. This integration empowers doctors to efficiently access and interpret patient records, which facilitates better-informed decisions and personalized care. Ultimately, this approach enhances patient outcomes by enabling healthcare professionals to provide tailored recommendations based on comprehensive data analysis.

Let’s see how this is going to work with the help of the Retrieval Augmented Generation (RAG) technique incorporated with LlamaIndex and Qdrant Vector DB.

RAG Architecture

Retrieval Augmented Generation (RAG) enhances the effectiveness of large language model applications by incorporating custom data. By retrieving relevant data or documents related to a query or task, RAG provides context for LLMs, which improves their accuracy and relevance.

RAG addresses two challenges: an LLM's knowledge is limited to its training data, and AI applications often need custom data to give specific responses. RAG tackles both by inserting external data into the LLM's prompt, which allows it to generate more relevant and accurate responses without the need for extensive retraining or fine-tuning.

The benefits of RAG include fewer inaccuracies or hallucinations in LLM outputs, domain-specific and relevant answers, and an efficient, cost-effective way to customize LLMs with external data.
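The step of "integrating external data into the LLM's prompt" can be pictured with a short sketch. The function name and the template wording below are assumptions made for illustration; they are not the prompt that LlamaIndex uses internally.

```python
def build_rag_prompt(question, retrieved_chunks):
    """Join retrieved chunks into a context block and wrap the question around it."""
    context = "\n---\n".join(retrieved_chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical retrieved chunks, shaped like the synthetic records in this article.
chunks = [
    "Medical_Condition: Diabetes, Medication: Metformin",
    "Medical_Condition: Hypertension, Medication: Lisinopril",
]
prompt = build_rag_prompt("Which medication appears for Diabetes?", chunks)
print(prompt)
```

The LLM never sees the whole database, only the few chunks that the retrieval step judged relevant, which is why retrieval quality largely determines answer quality.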

Incorporating RAG with LlamaIndex

LlamaIndex is a fantastic tool in the domain of Large Language Model Orchestration and Deployment and particularly focuses on Data Storage and Management. Its standout features include Data Agents, which execute actions based on natural language inputs instead of generating responses, and it can deliver structured results by leveraging LLMs.

Moreover, LlamaIndex offers composability by allowing the composition of indexes from other indexes. It also has seamless integration with existing technological platforms like LangChain, Flask, and Docker, and customization options such as seeding tree construction with custom summary prompts.

Qdrant DB: A High-Performance Vector Similarity Search Technology

Qdrant acts both as a vector database and similarity search engine and has a cloud-hosted platform that helps find the nearest high-dimensional vectors efficiently. It harnesses embeddings or neural network encoders to help developers build comprehensive applications that involve tasks like matching, searching, recommending, and beyond. It also utilizes a unique custom adaptation of the HNSW algorithm for Approximate Nearest Neighbor Search. It allows additional payload associated with vectors and enables filtering results based on payload values.

Qdrant supports a wide array of data types and query conditions for vector payloads, including string matching, numerical ranges, geo-locations, and more. It is built to be cloud-native and horizontally scalable. Qdrant aims to maximize resource utilization with dynamic query planning and payload data indexing, and it is implemented entirely in Rust.
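The combination of payload filtering and similarity ranking can be illustrated with a small sketch in plain Python. This is not Qdrant's API or its internal algorithm, since Qdrant evaluates filters inside the engine; the point structure and the `filtered_search` helper are hypothetical and only show the semantics of "match the payload, then rank by similarity".

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def filtered_search(points, query_vec, must_match, k=2):
    """Keep points whose payload equals every key/value in must_match,
    then return the ids of the k best matches by dot product."""
    candidates = [
        p for p in points
        if all(p["payload"].get(key) == value for key, value in must_match.items())
    ]
    candidates.sort(key=lambda p: dot(p["vector"], query_vec), reverse=True)
    return [p["id"] for p in candidates[:k]]


# Hypothetical points: a vector plus a payload of patient attributes.
points = [
    {"id": 1, "vector": [0.9, 0.1], "payload": {"condition": "Diabetes", "age": 32}},
    {"id": 2, "vector": [0.8, 0.3], "payload": {"condition": "Hypertension", "age": 56}},
    {"id": 3, "vector": [0.2, 0.9], "payload": {"condition": "Diabetes", "age": 54}},
]

print(filtered_search(points, [1.0, 0.0], {"condition": "Diabetes"}))
```

Point 2 is close to the query vector but is excluded because its payload does not match, which is the behavior a doctor would want when restricting a search to one condition.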

The HNSW Algorithm

There are many algorithms for approximate nearest neighbor search, such as locality-sensitive hashing and product quantization. Graph-based approaches, often referred to as proximity graph ANN algorithms, have demonstrated superior performance when handling high-dimensional datasets.

However, these graph-based algorithms suffer from significant performance degradation while dealing with low-dimensional or clustered data.

In response to this challenge, the HNSW algorithm has been developed as a fully graph-based incremental approximate nearest neighbor solution.

The HNSW algorithm builds upon the navigable small world (NSW) graph by adding a hierarchy to it. Where a flat NSW graph degrades on low-dimensional or clustered data, its hierarchical counterpart is designed to keep search cost close to logarithmic in the number of stored elements. The core concept of the HNSW algorithm involves separating links by their length scales across multiple layers. This results in an incremental multi-layer structure of proximity graphs, in which each layer holds a nested subset of the elements stored in the layer below it. The top layer in which an element resides is chosen randomly from an exponentially decaying probability distribution.
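The random layer assignment can be sketched in a few lines. The formula below, floor(-ln(U) * m_L) with m_L = 1 / ln(M), follows the description in the HNSW paper; the value M = 16 and the function name are assumptions chosen for illustration and are not taken from Qdrant's implementation.

```python
import math
import random


def random_level(m_l, rng):
    """Draw a top layer index as floor(-ln(U) * m_l), with U uniform in (0, 1]."""
    u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u is in (0, 1]
    return int(-math.log(u) * m_l)


M = 16                    # assumed maximum number of neighbours per node
m_l = 1.0 / math.log(M)   # level normalization factor
rng = random.Random(0)    # seeded so the run is repeatable

levels = [random_level(m_l, rng) for _ in range(10_000)]

# Count how many elements end up with each top layer.
counts = {}
for level in levels:
    counts[level] = counts.get(level, 0) + 1

for level in sorted(counts):
    print(level, counts[level])
```

With these settings, roughly 15 out of every 16 elements stay on layer 0 and each higher layer is about 16 times sparser than the one below it, so a search can take long hops on the sparse upper layers before refining on the dense bottom layer.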

Building Medical Search System with LlamaIndex

To get started with utilizing RAG for building a medical search system, let’s create a synthetic dataset first.

Generating Synthetic Patient Data

As this is going to be synthetic data, let’s install a dependency named ‘Faker’.

!pip install -q faker

Create a CSV synthetic dataset.

import random
import pandas as pd
from faker import Faker
fake = Faker()


medical_condition_data = {
    'Hypertension': {
        'medications': ['Lisinopril', 'Amlodipine', 'Losartan', 'Hydrochlorothiazide'],
        'cholesterol_range': (100, 200),
        'glucose_range': (70, 110),
        'blood_pressure_range': (140, 90)  # systolic/diastolic
    },
    'Diabetes': {
        'medications': ['Metformin', 'Insulin', 'Glipizide', 'Sitagliptin'],
        'cholesterol_range': (100, 200),
        'glucose_range': (130, 200),
        'blood_pressure_range': (130, 80)
    },
   
}

def generate_patient_records(num_patients):
    patient_records = []
    for _ in range(num_patients):
        patient_id = fake.uuid4()
        name = fake.name()
        age = random.randint(18, 90)
        gender = random.choice(['Male', 'Female'])
        blood_type = random.choice(['A+', 'B+', 'AB+', 'O+', 'A-', 'B-', 'AB-', 'O-'])
        medical_condition = random.choice(list(medical_condition_data.keys()))
        patient_records.append({
            'Patient_ID': patient_id,
            'Name': name,
            'Age': age,
            'Gender': gender,
            'Blood_Type': blood_type,
            'Medical_Condition': medical_condition
        })
    return patient_records

def generate_test_results(num_patients):
    test_results = []
    for i in range(num_patients):
        patient_id = fake.uuid4()
        medical_condition = random.choice(list(medical_condition_data.keys()))
        cholesterol_range = medical_condition_data[medical_condition]['cholesterol_range']
        glucose_range = medical_condition_data[medical_condition]['glucose_range']
        blood_pressure_range = medical_condition_data[medical_condition]['blood_pressure_range']
        cholesterol = random.uniform(cholesterol_range[0], cholesterol_range[1])
        glucose = random.uniform(glucose_range[0], glucose_range[1])
        # The range tuple is stored as (upper, lower), so systolic is drawn between its two values.
        systolic = random.randint(blood_pressure_range[1], blood_pressure_range[0])
        diastolic = random.randint(60, systolic)
        blood_pressure = f"{systolic}/{diastolic}"
        test_results.append({
            'Patient_ID': patient_id,
            'Medical_Condition': medical_condition,
            'Cholesterol': cholesterol,
            'Glucose': glucose,
            'Blood_Pressure': blood_pressure
        })
    return test_results

def generate_prescriptions(num_patients):
    prescriptions = []
    for i in range(num_patients):
        patient_id = fake.uuid4()
        medical_condition = random.choice(list(medical_condition_data.keys()))
        medication = random.choice(medical_condition_data[medical_condition]['medications'])
        dosage = f"{random.randint(1, 3)} pills"
        duration = f"{random.randint(1, 30)} days"
        prescriptions.append({
            'Patient_ID': patient_id,
            'Medical_Condition': medical_condition,
            'Medication': medication,
            'Dosage': dosage,
            'Duration': duration
        })
    return prescriptions

def generate_medical_history_dataset(num_patients):
    patient_records = generate_patient_records(num_patients)
    test_results = generate_test_results(num_patients)
    prescriptions = generate_prescriptions(num_patients)

   
    medical_history = []
    for i in range(num_patients):
        patient_id = patient_records[i]['Patient_ID']
        # Merge the three dicts; on duplicate keys, later dicts overwrite earlier ones.
        record = {**patient_records[i], **test_results[i], **prescriptions[i]}
        medical_history.append(record)

    return pd.DataFrame(medical_history)


medical_history_dataset = generate_medical_history_dataset(100)


medical_history_dataset.to_csv('medical_history_dataset.csv', index=False)

print("Synthetic medical history dataset created and saved to 'medical_history_dataset.csv'")

Once the synthetic dataset has been created, the script prints a confirmation message, and the CSV file appears in your working directory.

Synthetic medical history dataset created and saved to 'medical_history_dataset.csv'

Let’s see what our data looks like!

import pandas as pd
df = pd.read_csv("/content/medical_history_dataset.csv")
df.head()

The data looks fine; let’s convert it into a PDF format. While we could use the CSV format as is, PDF is the format in which many documents are stored in legacy systems – so using PDF as a base is a good way to build it for real-life scenarios.

We will load the PDF data using LlamaIndex SimpleDirectoryReader.

To convert the CSV dataset into a PDF document, install the following dependency.

!pip install -q reportlab

from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import ParagraphStyle
import pandas as pd

def create_pdf_from_dataframe(dataframe, output_file):
    doc = SimpleDocTemplate(output_file, pagesize=letter)
    styles = ParagraphStyle(name='Normal', fontSize=12)

   
    content = []

   
    for index, row in dataframe.iterrows():
        row_content = []
        for column_name, value in row.items():
            row_content.append(f"{column_name}: {value}")

       
        content.append(Paragraph(", ".join(row_content), styles))
        content.append(Paragraph("<br/><br/>", styles)) 

    doc.build(content)


create_pdf_from_dataframe(df, "output.pdf")

Make a directory and move the output PDF document into it.

import os

# Check current working directory
print(os.getcwd())

# Create 'static/' directory
if not os.path.exists('static/'):
    os.makedirs('static/')

!mv "/content/output.pdf" "static/"

Now that the dataset is ready, let’s move on to building a RAG pipeline with it. Install the required dependencies.

!pip install -q llama-index transformers
!pip install -q llama-cpp-python
!pip install -q qdrant-client
!pip install -q llama_hub

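Load the data with SimpleDirectoryReader.

from llama_index import SimpleDirectoryReader

# Read every file in the 'static/' directory created above.
documents = SimpleDirectoryReader("static").load_data()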

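Using the sentence splitter, split the documents into small chunks. Keep track of which source document each chunk came from, so that the document's metadata can be attached to its chunks later.

from llama_index.node_parser.text import SentenceSplitter
text_parser = SentenceSplitter(
    chunk_size=1024,
)

# Split each document and record the index of its source document for every chunk.
text_chunks = []
doc_idxs = []
for doc_idx, doc in enumerate(documents):
    cur_text_chunks = text_parser.split_text(doc.text)
    text_chunks.extend(cur_text_chunks)
    doc_idxs.extend([doc_idx] * len(cur_text_chunks))

Now, manually construct nodes from text chunks.

from llama_index.schema import TextNode

nodes = []
for idx, text_chunk in enumerate(text_chunks):
    node = TextNode(
        text=text_chunk,
    )
    # Copy the metadata of the source document onto the node.
    src_doc = documents[doc_idxs[idx]]
    node.metadata = src_doc.metadata
    nodes.append(node)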

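Now generate embeddings for each node using the Hugging Face embeddings model.

from llama_index.embeddings import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en")

# Attach an embedding to every node so the vector store can index it.
for node in nodes:
    node.embedding = embed_model.get_text_embedding(
        node.get_content(metadata_mode="all")
    )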

Now, it’s time to build a model using Llama CPP. Here, we’ll use the GGUF Llama 2 13B model. Using Llama CPP, we’ll download the model with the help of the model URL.

from llama_index.llms import LlamaCPP

model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf"

llm = LlamaCPP(
    model_url=model_url,
    model_path=None,
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": 1},
    verbose=True,
)

Let’s define Service context. It consists of the LLM model as well as the embedding model.

from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(
    llm=llm, embed_model=embed_model
)

Now, create a vector store collection using Qdrant DB, and create a storage context for this vector store.

import qdrant_client
from llama_index.vector_stores.qdrant import QdrantVectorStore
client = qdrant_client.QdrantClient(location=":memory:")

from llama_index.storage.storage_context import StorageContext
from llama_index import (
    VectorStoreIndex,
    ServiceContext,
    SimpleDirectoryReader,
)

vector_store = QdrantVectorStore(client=client, collection_name="my_collection")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

Now, build an index over the documents, passing in the storage context and the service context so that the embeddings are stored in the Qdrant vector store.

index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context
)

Add the manually created nodes to the vector store.

vector_store.add(nodes)

To build a retrieval pipeline, generate a query embedding using a query string.

query_str = "Can you tell me about the key concepts for safety finetuning"

query_embedding = embed_model.get_query_embedding(query_str)

Then, construct a Vector Store query and query the vector database.

from llama_index.vector_stores import VectorStoreQuery

query_mode = "default"
# query_mode = "sparse"
# query_mode = "hybrid"

vector_store_query = VectorStoreQuery(
    query_embedding=query_embedding, similarity_top_k=2, mode=query_mode
)

query_result = vector_store.query(vector_store_query)
print(query_result.nodes[0].get_content())

You’ll get the following results:

Patient_ID: 58b70a59-eb30-4caa-b4b5-7871321515dd, Name: Kimberly Brown, Age:
32, Gender: Female, Blood_Type: O-, Medical_Condition: Diabetes, Cholesterol:
161.7899842312819, Glucose: 107.778261077734, Blood_Pressure: 100/81,
Medication: Sitagliptin, Dosage: 2 pills, Duration: 30 days
.......
.......
.......
Patient_ID: d4c865a0-d695-4721-bed9-9d47f5393bf4, Name: Michael Rowe, Age: 56,
Gender: Female, Blood_Type: O+, Medical_Condition: Hypertension, Cholesterol:
121.20389761494744, Glucose: 75.29441955653576, Blood_Pressure: 90/80,
Medication: Hydrochlorothiazide, Dosage: 2 pills, Duration: 22 days
Patient_ID: b91f4f27-6a6a-4005-8d3b-4c3b53efe57b, Name: James Wright, Age: 54,
Gender: Female, Blood_Type: A-, Medical_Condition: Diabetes, Cholesterol:
192.42692819824364, Glucose: 92.35717875040676, Blood_Pressure: 104/101,
Medication: Metformin, Dosage: 3 pills, Duration: 13 days

Now, parse the results into a set of nodes.

from llama_index.schema import NodeWithScore
from typing import Optional

nodes_with_scores = []
# Use i as the loop variable so the VectorStoreIndex named index is not overwritten.
for i, node in enumerate(query_result.nodes):
    score: Optional[float] = None
    if query_result.similarities is not None:
        score = query_result.similarities[i]
    nodes_with_scores.append(NodeWithScore(node=node, score=score))

Then, put them into a retriever.

from llama_index import QueryBundle
from llama_index.retrievers import BaseRetriever
from typing import Any, List, Optional


class VectorDBRetriever(BaseRetriever):
    """Retriever over a qdrant vector store."""

    def __init__(
        self,
        vector_store: QdrantVectorStore,
        embed_model: Any,
        query_mode: str = "default",
        similarity_top_k: int = 2,
    ) -> None:
        """Init params."""
        self._vector_store = vector_store
        self._embed_model = embed_model
        self._query_mode = query_mode
        self._similarity_top_k = similarity_top_k
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve."""
        # Embed the query with the retriever's own model rather than a global.
        query_embedding = self._embed_model.get_query_embedding(
            query_bundle.query_str
        )
        vector_store_query = VectorStoreQuery(
            query_embedding=query_embedding,
            similarity_top_k=self._similarity_top_k,
            mode=self._query_mode,
        )
        query_result = self._vector_store.query(vector_store_query)

        # Pair each returned node with its similarity score, when one is available.
        nodes_with_scores = []
        for index, node in enumerate(query_result.nodes):
            score: Optional[float] = None
            if query_result.similarities is not None:
                score = query_result.similarities[index]
            nodes_with_scores.append(NodeWithScore(node=node, score=score))

        return nodes_with_scores

Create a Retriever Query Engine, and plug the above into it to synthesize the response.

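from llama_index.query_engine import RetrieverQueryEngine

# Instantiate the custom retriever defined above.
retriever = VectorDBRetriever(
    vector_store, embed_model, query_mode="default", similarity_top_k=2
)

query_engine = RetrieverQueryEngine.from_args(
    retriever, service_context=service_context
)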

Now, it’s time to query the Retriever Query Engine and see the response.

query_str = "Write prescription for Diabetes"

response = query_engine.query(query_str)

It’ll take a significant amount of time; be patient, and you will get the response:

Metformin, Dosage: 3 pills, Duration: 12 days

Please note that the answer is based on the context information provided and not on any prior knowledge or real-world data.

Let’s see the source node of this response.

print(response.source_nodes[0].get_content())

Following is the Source Node:

Patient_ID: ea9121cf-22d3-4053-9597-32816a087d6b, Name: Tracy Mendez, Age:
41, Gender: Male, Blood_Type: B+, Medical_Condition: Diabetes, Cholesterol:
155.00542923679996, Glucose: 142.74790733131314, Blood_Pressure: 95/83,
Medication: Metformin, Dosage: 2 pills, Duration: 30 days
Patient_ID: f970b6c9-2914-4374-8fce-985a9c3ad5c1, Name: Victor Burns, Age: 34,
Gender: Male, Blood_Type: AB+, Medical_Condition: Diabetes, Cholesterol:
123.4148196061812, Glucose: 135.01188456651374, Blood_Pressure: 108/82,
Medication: Insulin, Dosage: 1 pills, Duration: 26 days
Patient_ID: 5e051e8c-e507-44f1-b177-686222bc8402, Name: Edward Webb, Age:
25, Gender: Female, Blood_Type: O+, Medical_Condition: Diabetes, Cholesterol:
113.6267476444252, Glucose: 88.16188232757526, Blood_Pressure: 93/75,
Medication: Glipizide, Dosage: 3 pills, Duration: 5 days
Patient_ID: 47d9d8d3-870b-4084-b81b-2377742a0c45, Name: Yvonne Mosley, Age:
39, Gender: Female, Blood_Type: AB+, Medical_Condition: Hypertension,
Cholesterol: 110.03196972436749, Glucose: 83.9354746313523, Blood_Pressure:
137/117, Medication: Lisinopril, Dosage: 1 pills, Duration: 26 days
Patient_ID: 93d04076-5219-4f07-8d7c-26f9512864c9, Name: Jeffrey Solis, Age: 53,
Gender: Female, Blood_Type: O-, Medical_Condition: Hypertension, Cholesterol:
102.56021178266424, Glucose: 148.7046530174272, Blood_Pressure: 97/81,
Medication: Hydrochlorothiazide, Dosage: 3 pills, Duration: 5 days
Patient_ID: 144963e2-a9d8-4b9a-a6b8-87da14429a98, Name: Sabrina Figueroa,
Age: 66, Gender: Female, Blood_Type: B+, Medical_Condition: Hypertension,
Cholesterol: 163.80805126265315, Glucose: 98.1830736526342, Blood_Pressure:
92/64, Medication: Lisinopril, Dosage: 3 pills, Duration: 3 days
Patient_ID: 8e6f7b14-f84e-4415-bfa0-4b90bb998474, Name: Patricia Kline, Age: 34,
Gender: Female, Blood_Type: O-, Medical_Condition: Diabetes, Cholesterol:
175.96974315251947, Glucose: 93.85396377117868, Blood_Pressure: 112/99,
Medication: Metformin, Dosage: 1 pills, Duration: 12 days
Patient_ID: c20d1263-7757-4b36-94ef-92bc2d10cd88, Name: Michael Wilcox, Age:
18, Gender: Female, Blood_Type: A-, Medical_Condition: Diabetes, Cholesterol:
176.40579895801716, Glucose: 138.79382669587685, Blood_Pressure: 113/64,
Medication: Metformin, Dosage: 3 pills, Duration: 6 days
Patient_ID: b9793557-9e83-493b-a9aa-a248a1ebb222, Name: Brandon Tucker, Age:
29, Gender: Female, Blood_Type: O+, Medical_Condition: Diabetes, Cholesterol:
126.00771705145566, Glucose: 155.00180031226188, Blood_Pressure: 101/67,
Medication: Metformin, Dosage: 3 pills, Duration: 3 days

Note: The medical dataset used here is synthetic and fake. It has no relation to real medicine dosage or duration.

Conclusion

Leveraging the RAG architecture with tools like LlamaIndex, Llama CPP, and Qdrant Vector Store has been a fascinating journey. Through the utilization of Qdrant's sophisticated HNSW algorithm, searching through patients' medical histories and records has become effortless and rapid. This integration highlights the potential of innovative technologies to enhance healthcare processes, which can ultimately lead to improved patient care and outcomes.

This article was originally posted here: https://medium.com/@akriti.upadhyay/integrating-llamaindex-and-qdrant-similarity-search-for-patient-record-retrieval-7090e77b971e

Thanks for reading!
