Unlocking the Power of Hybrid RAG Systems: A Deep Dive into LlamaCloud's City Query Solution

Vedant Pandya
8 min read

Project Overview

This project implements a hybrid Retrieval-Augmented Generation (RAG) system that combines structured database queries with unstructured document retrieval. The application allows users to ask natural language questions about US cities and receive accurate, contextual responses. The system uses a dual-pipeline approach:

  1. Structured Data Pipeline: SQL database containing city statistics (population, state)

  2. Unstructured Data Pipeline: LlamaCloud for document retrieval of general city information

The application is built with a Streamlit frontend that provides an intuitive chat interface, allowing users to interact with the system conversationally.


Architecture Details

Component Breakdown

1. User Interface Layer

  • Technology: Streamlit

  • Purpose: Provides chat interface, API key configuration, and displays responses

  • Key Features:

    • Chat history tracking using session state

    • Sidebar configuration for API keys and LlamaCloud settings

    • Real-time response generation with loading indicators

2. Structured Data Layer

  • Technology: SQLite (in-memory), SQLAlchemy

  • Purpose: Stores and queries factual city information

  • Components:

    • setup_database(): Initializes the database with city data

    • CityQueryEngine: Custom class for direct SQL querying

    • NLSQLTableQueryEngine: LlamaIndex component for natural language to SQL conversion

3. Unstructured Data Layer

  • Technology: LlamaCloud

  • Purpose: Stores and retrieves document-based city information

  • Components:

    • LlamaCloudIndex: Connects to pre-indexed documents in LlamaCloud

    • Vector query engine for semantic retrieval

4. LLM Integration Layer

  • Technology: Gemini 2.0 Flash

  • Purpose: Natural language understanding and response generation

  • Components:

    • Text generation model for synthesizing responses

    • Embedding model for vector search (when needed)

5. Query Orchestration Layer

  • Purpose: Routes queries to appropriate backends based on content

  • Logic:

    • Population/statistic queries → SQL pipeline

    • General information queries → LlamaCloud pipeline


Data Flow

  1. Input Phase:

    • User submits query through Streamlit chat interface

    • Query is added to chat history

    • System analyzes query content

  2. Processing Phase:

    • If query contains population keywords:

      • First attempt direct SQL via pattern matching

      • Fall back to NLSQLTableQueryEngine if needed

    • If query is about general information:

      • Route to LlamaCloud vector search

    • Response is generated using retrieved context

  3. Output Phase:

    • Response displayed in chat interface

    • Source of information (Database or LlamaCloud) is displayed

    • Response added to chat history
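
The routing decision in the processing phase can be distilled into a small pure function. This is a simplified sketch (the function name is illustrative); the keyword list mirrors the one used in the app's query-processing code:

```python
def route_query(prompt: str) -> str:
    """Decide which backend should handle a query.

    Returns "sql" for population/statistics questions and
    "vector" for general-information questions.
    """
    population_keywords = ['population', 'populous', 'big city',
                           'large city', 'small city']
    if any(word in prompt.lower() for word in population_keywords):
        return "sql"
    return "vector"
```

In the app itself, the "sql" branch first attempts pattern-matched direct SQL and only then falls back to NLSQLTableQueryEngine.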


Implementation Details

Database Schema

The application uses a simple SQLite database with a single table:

CREATE TABLE city_stats (
    city_name VARCHAR(16) PRIMARY KEY,
    population INTEGER,
    state VARCHAR(16) NOT NULL
);

Data is populated with information about six major US cities:

  • New York City (New York)

  • Los Angeles (California)

  • Chicago (Illinois)

  • Houston (Texas)

  • Miami (Florida)

  • Seattle (Washington)
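
A minimal sketch of what setup_database() might look like. The app itself uses SQLAlchemy; the version below uses only the standard-library sqlite3 module so it is self-contained, and the population figures (apart from New York City and Los Angeles, which appear in the examples later in this post) are illustrative:

```python
import sqlite3

def setup_database() -> sqlite3.Connection:
    """Create an in-memory SQLite database and load the six city rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE city_stats (
            city_name VARCHAR(16) PRIMARY KEY,
            population INTEGER,
            state VARCHAR(16) NOT NULL
        )
    """)
    # Figures other than New York City and Los Angeles are illustrative
    rows = [
        ("New York City", 8336000, "New York"),
        ("Los Angeles", 3822000, "California"),
        ("Chicago", 2665000, "Illinois"),
        ("Houston", 2303000, "Texas"),
        ("Miami", 449514, "Florida"),
        ("Seattle", 749256, "Washington"),
    ]
    conn.executemany("INSERT INTO city_stats VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn
```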


SQL Query Engine

The custom CityQueryEngine class provides specialized query capabilities:

  1. Direct Execution: execute_query() method runs raw SQL and formats results

  2. Specialized Queries:

    • query_highest_population(): Finds city with highest population

    • query_lowest_population(): Finds city with lowest population

    • query_all_cities_ranked(): Orders all cities by population

    • query_by_state(): Filters cities by state name

  3. Natural Language Processing:

    • process_population_query(): Uses regular expressions and keyword matching to convert natural language to appropriate SQL queries

    • Pattern matching for terms like "highest", "lowest", "in [state]", etc.
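
A condensed sketch of how these pieces might fit together. The method names come from the list above, but the bodies are a simplified reconstruction, showing only the "highest population" pattern:

```python
import re
import sqlite3

class CityQueryEngine:
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def execute_query(self, query: str) -> str:
        """Run raw SQL and format the rows as readable lines."""
        rows = self.conn.execute(query).fetchall()
        if not rows:
            return ""
        lines = [f"{name}: {pop:,} people in {state}"
                 for name, pop, state in rows]
        return "City information:\n" + "\n".join(lines)

    def query_highest_population(self) -> str:
        return self.execute_query(
            "SELECT city_name, population, state FROM city_stats "
            "ORDER BY population DESC LIMIT 1")

    def process_population_query(self, query_text: str):
        """Map simple natural-language patterns to SQL; None if no match."""
        query_lower = query_text.lower()
        if re.search(r"\b(highest|largest|biggest|most populous)\b",
                     query_lower):
            return self.query_highest_population()
        # ... further patterns ("lowest", "in [state]", etc.) ...
        return None
```

Returning None on no match is what lets the caller fall back to the LLM-based NLSQLTableQueryEngine.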


LlamaCloud Integration

The application connects to LlamaCloud for document retrieval:

  1. Configuration:

    • API key

    • Organization ID

    • Project name

    • Index name

  2. Index Connection:

    • Creates LlamaCloudIndex object

    • Configures vector query engine

  3. Query Execution:

    • Sends natural language query directly to LlamaCloud

    • Retrieves relevant document chunks

    • Uses context for response generation


Model Configuration

The application uses Gemini 2.0 Flash for both understanding and generation:

  1. LLM Model:

     gemini_model = Gemini(
         model="models/gemini-2.0-flash",
         api_key=st.session_state.GOOGLE_API_KEY,
         temperature=0.2
     )
    
  2. Embedding Model:

     gemini_embed_model = GeminiEmbedding(
         model_name="models/embedding-001",
         api_key=st.session_state.GOOGLE_API_KEY
     )
    
  3. Global Settings:

     Settings.llm = gemini_model
     Settings.embed_model = gemini_embed_model
    

Query Processing Logic

The core logic that determines how to handle each query:

# Check if this is a population query
if any(word in prompt.lower() for word in ['population', 'populous', 'big city', 'large city', 'small city']):
    # Try direct SQL approach first
    result = city_query_engine.process_population_query(prompt)
    if result:
        message_placeholder.markdown(f"{result}\n\n*Source: Database (Direct SQL)*")
    else:
        # Fall back to LLM-based SQL
        response = sql_query_engine.query(prompt)
        message_placeholder.markdown(f"{str(response)}\n\n*Source: Database*")
elif have_llamacloud:
    # For general information, use LlamaCloud
    response = vector_query_engine.query(prompt)
    message_placeholder.markdown(f"{str(response)}\n\n*Source: LlamaCloud*")
else:
    # If neither available
    message_placeholder.markdown("I'm unable to answer that question with the current configuration.")

Error Handling and Edge Cases

The application includes several error handling mechanisms:

  1. API Key Validation:

    • Checks if Google API key is present

    • Displays warning if missing

  2. LlamaCloud Connection:

    • Try/except block for connection attempts

    • Fallback to SQL-only mode if connection fails

    • UI indicators for connection status

  3. Query Processing:

    • Handles failed queries gracefully

    • Provides feedback on processing errors

    • Suggests query reformulation

  4. Empty Results:

    • Handles case where no cities match query criteria

    • Returns informative message
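
The LlamaCloud connection fallback can be sketched as follows. Here connect_llamacloud is a stand-in for the real connection call (which would construct a LlamaCloudIndex and its query engine), and the function name is illustrative:

```python
def init_backends(connect_llamacloud):
    """Try to bring up the vector backend; fall back to SQL-only mode.

    `connect_llamacloud` is any callable that returns a query engine
    or raises on failure (a stand-in for the real LlamaCloudIndex setup).
    """
    try:
        vector_engine = connect_llamacloud()
        return {"have_llamacloud": True, "vector_engine": vector_engine}
    except Exception as exc:
        # Surface the status (in the app, via a sidebar indicator)
        # and continue in SQL-only mode rather than crashing
        print(f"LlamaCloud unavailable ({exc}); running in SQL-only mode")
        return {"have_llamacloud": False, "vector_engine": None}
```

The resulting have_llamacloud flag is what the query-routing logic checks before sending general-information questions to the vector engine.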


Installation and Dependencies

Required Packages

streamlit==1.31.0
llama-index==0.10.0
llama-index-llms-gemini==0.1.3
llama-index-embeddings-gemini==0.1.3
llama-index-indices-managed-llama-cloud==0.1.0
llama-index-core==0.10.0
sqlalchemy==2.0.27
pandas==2.1.4

Environment Setup

  1. Virtual Environment:

     python -m venv venv
     source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  2. Package Installation:

     pip install -r requirements.txt
    
  3. API Keys:

    • Google API key for Gemini

    • LlamaCloud API key, organization ID, project, and index name

Running the Application

  1. Start Application:

     streamlit run app.py
    
  2. Configuration:

    • Enter API keys in sidebar

    • Confirm LlamaCloud connection status

  3. Usage:

    • Ask questions in the chat input

    • View responses with source attribution

    • Chat history is maintained during session


Example Queries and Responses

Structured Data Queries

  1. Query: "What is the population of New York City?" Response: "New York City has a population of 8,336,000 people and is located in New York." Source: Database (Direct SQL)

  2. Query: "Which city has the highest population?" Response: "New York City has a population of 8,336,000 people and is located in New York." Source: Database (Direct SQL)

  3. Query: "List all cities in California" Response: "City information: Los Angeles: 3,822,000 people in California" Source: Database (Direct SQL)

Unstructured Data Queries

  1. Query: "Tell me about the history of Chicago" Response: [Detailed information about Chicago's history, retrieved from documents] Source: LlamaCloud

  2. Query: "What are the main attractions in Miami?" Response: [Information about Miami attractions, retrieved from documents] Source: LlamaCloud


Advanced Customization

Adding More Cities

To expand the database with additional cities:

# Add new city data
additional_rows = [
    {"city_name": "San Francisco", "population": 874961, "state": "California"},
    {"city_name": "Boston", "population": 675647, "state": "Massachusetts"},
    # Add more cities as needed
]

# Insert into database
for row in additional_rows:
    stmt = insert(city_stats_table).values(**row)
    with engine.begin() as connection:
        connection.execute(stmt)

Supporting Additional Query Types

To enhance query capabilities, extend the CityQueryEngine class:

def query_by_population_range(self, min_pop, max_pop):
    """Query cities within a population range"""
    query = f"SELECT city_name, population, state FROM city_stats WHERE population BETWEEN {min_pop} AND {max_pop} ORDER BY population DESC"
    return self.execute_query(query)

def process_population_query(self, query_text):
    query_lower = query_text.lower()
    # Add pattern matching for population ranges
    range_match = re.search(r"between\s+(\d[\d,]*)\s+and\s+(\d[\d,]*)", query_lower)
    if range_match:
        min_pop = int(range_match.group(1).replace(',', ''))
        max_pop = int(range_match.group(2).replace(',', ''))
        return self.query_by_population_range(min_pop, max_pop)
    # ... fall through to the existing patterns ("highest", "lowest", etc.)

Enhancing LlamaCloud Document Retrieval

To improve document retrieval settings:

vector_query_engine = index.as_query_engine(
    similarity_top_k=3,  # Retrieve more documents
    response_mode="tree_summarize",  # Use tree summarization for better responses
    streaming=True  # Enable streaming responses
)

Troubleshooting

Common Issues and Solutions

  1. API Key Errors:

    • Error: "Error: Invalid API key"

    • Solution: Verify Google API key in the sidebar or environment variables

  2. LlamaCloud Connection Failures:

    • Error: "Error connecting to LlamaCloud"

    • Solution: Check organization ID, project name, and index name

  3. SQL Engine Errors:

    • Error: "Error setting up SQL query engine"

    • Solution: Ensure SQLAlchemy is properly installed and database setup is correct

  4. No Response to Queries:

    • Issue: System doesn't respond to certain questions

    • Solution: Check query routing logic and ensure appropriate engine is available


Performance Optimization

  1. Query Caching:

    • Implement Streamlit caching for repeated queries

    • Add @st.cache_data decorator to query functions

  2. Model Temperature:

    • Lower temperature (currently 0.2) for more deterministic responses

    • Increase for more creative but potentially less accurate answers

  3. SQL Query Optimization:

    • Use prepared statements for frequent queries

    • Add indexes for larger datasets
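
In the app, memoization would use Streamlit's @st.cache_data decorator; the same idea can be demonstrated with the standard library's functools.lru_cache, used here so the sketch runs without Streamlit (the call counter exists only to make the caching effect observable):

```python
from functools import lru_cache

CALLS = {"count": 0}  # tracks how often the expensive work actually runs

@lru_cache(maxsize=128)  # in the Streamlit app, @st.cache_data plays this role
def cached_query(sql: str) -> str:
    CALLS["count"] += 1
    # ... run the real database query here ...
    return f"result for: {sql}"
```

Repeated identical queries hit the cache instead of the database, which matters most for the LLM-backed NLSQLTableQueryEngine path where each miss costs an API call.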


Security Considerations

  1. API Key Management:

    • Use environment variables instead of session state for production

    • Implement key rotation and expiration policies

  2. Input Validation:

    • Sanitize user input before processing

    • Validate SQL queries to prevent injection

  3. Data Privacy:

    • Consider data retention policies for chat history

    • Implement user authentication for sensitive data
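
For the SQL-injection point above, parameterized queries are the standard defense. A sketch using sqlite3 placeholders (the state value is bound as a parameter and never spliced into the SQL string):

```python
import sqlite3

def query_by_state(conn: sqlite3.Connection, state: str):
    """Filter cities by state using a bound parameter, not string formatting."""
    sql = ("SELECT city_name, population, state FROM city_stats "
           "WHERE state = ? ORDER BY population DESC")
    # The driver escapes `state`; injection attempts match no rows
    return conn.execute(sql, (state,)).fetchall()
```

The same pattern applies to the CityQueryEngine methods that accept user-derived values.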


Future Enhancements

  1. Expanded Database:

    • Add more cities and additional attributes (GDP, crime rate, etc.)

    • Support for historical population data

  2. Advanced NLP:

    • Implement intent classification for better query routing

    • Add support for multi-turn conversations with context awareness

  3. UI Improvements:

    • Add visualization for city comparison

    • Implement map view for geographical context

  4. Integration Options:

    • Support for additional vector databases (Qdrant, Pinecone)

    • Option to use alternative LLMs (locally via Ollama or through APIs)

  5. Evaluation Framework:

    • Implement CometML's Opik for tracing and observability

    • Add metrics for answer quality and retrieval relevance


Conclusion

This LlamaCloud RAG Demo showcases an effective approach to combining structured and unstructured data for comprehensive question answering. The hybrid architecture leverages the strengths of both SQL databases (for precise factual queries) and document retrieval (for nuanced, contextual information).

Key takeaways from this implementation:

  1. Intelligent Query Routing: The system automatically determines the best source for answering each question

  2. Streamlined User Experience: Simple chat interface hides the complexity of the underlying systems

  3. Extensible Architecture: Can be expanded with additional data sources and query capabilities

  4. Production-Ready Components: Uses industry-standard tools like LlamaIndex, SQLAlchemy, and Gemini

By effectively combining these technologies, the application demonstrates a powerful approach to building knowledge-intensive applications that can handle both structured and unstructured information within a unified interface.
