Unlocking the Power of Hybrid RAG Systems: A Deep Dive into LlamaCloud's City Query Solution

Table of contents
- Project Overview
- Architecture Details
- Data Flow
- Implementation Details
- SQL Query Engine
- LlamaCloud Integration
- Model Configuration
- Query Processing Logic
- Error Handling and Edge Cases
- Installation and Dependencies
- Environment Setup
- Running the Application
- Example Queries and Responses
- Advanced Customization
- Troubleshooting
- Performance Optimization
- Security Considerations
- Future Enhancements
- Conclusion

Project Overview
This project implements a hybrid Retrieval-Augmented Generation (RAG) system that combines structured database queries with unstructured document retrieval. The application allows users to ask natural language questions about US cities and receive accurate, contextual responses. The system uses a dual-pipeline approach:
Structured Data Pipeline: SQL database containing city statistics (population, state)
Unstructured Data Pipeline: LlamaCloud for document retrieval of general city information
The application is built with a Streamlit frontend that provides an intuitive chat interface, allowing users to interact with the system conversationally.
Architecture Details
Component Breakdown
1. User Interface Layer
Technology: Streamlit
Purpose: Provides chat interface, API key configuration, and displays responses
Key Features:
Chat history tracking using session state
Sidebar configuration for API keys and LlamaCloud settings
Real-time response generation with loading indicators
2. Structured Data Layer
Technology: SQLite (in-memory), SQLAlchemy
Purpose: Stores and queries factual city information
Components:
setup_database(): Initializes the database with city data
CityQueryEngine: Custom class for direct SQL querying
NLSQLTableQueryEngine: LlamaIndex component for natural language to SQL conversion
3. Unstructured Data Layer
Technology: LlamaCloud
Purpose: Stores and retrieves document-based city information
Components:
LlamaCloudIndex: Connects to pre-indexed documents in LlamaCloud
Vector query engine for semantic retrieval
4. LLM Integration Layer
Technology: Gemini 2.0 Flash
Purpose: Natural language understanding and response generation
Components:
Text generation model for synthesizing responses
Embedding model for vector search (when needed)
5. Query Orchestration Layer
Purpose: Routes queries to appropriate backends based on content
Logic:
Population/statistic queries → SQL pipeline
General information queries → LlamaCloud pipeline
Data Flow
Input Phase:
User submits query through Streamlit chat interface
Query is added to chat history
System analyzes query content
Processing Phase:
If query contains population keywords:
First attempt direct SQL via pattern matching
Fall back to NLSQLTableQueryEngine if needed
If query is about general information:
Route to LlamaCloud vector search
Response is generated using retrieved context
Output Phase:
Response displayed in chat interface
Source of information (Database or LlamaCloud) is displayed
Response added to chat history
Implementation Details
Database Schema
The application uses a simple SQLite database with a single table:
CREATE TABLE city_stats (
city_name VARCHAR(16) PRIMARY KEY,
population INTEGER,
state VARCHAR(16) NOT NULL
);
Data is populated with information about six major US cities:
New York City (New York)
Los Angeles (California)
Chicago (Illinois)
Houston (Texas)
Miami (Florida)
Seattle (Washington)
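The steps above — schema creation plus the six seed rows — can be sketched with SQLAlchemy Core roughly as follows. This is a minimal sketch: setup_database()'s real signature isn't shown in the article, and the population figures other than New York City's 8,336,000 and Los Angeles's 3,822,000 are illustrative.

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, insert, select,
)

def setup_database():
    """Create the in-memory SQLite database and populate city_stats."""
    engine = create_engine("sqlite:///:memory:")
    metadata = MetaData()
    city_stats = Table(
        "city_stats",
        metadata,
        Column("city_name", String(16), primary_key=True),
        Column("population", Integer),
        Column("state", String(16), nullable=False),
    )
    metadata.create_all(engine)

    rows = [
        {"city_name": "New York City", "population": 8_336_000, "state": "New York"},
        {"city_name": "Los Angeles", "population": 3_822_000, "state": "California"},
        {"city_name": "Chicago", "population": 2_665_000, "state": "Illinois"},
        {"city_name": "Houston", "population": 2_303_000, "state": "Texas"},
        {"city_name": "Miami", "population": 449_000, "state": "Florida"},
        {"city_name": "Seattle", "population": 749_000, "state": "Washington"},
    ]
    # Single transaction; executemany-style insert of all seed rows
    with engine.begin() as conn:
        conn.execute(insert(city_stats), rows)
    return engine, city_stats

engine, city_stats = setup_database()
with engine.connect() as conn:
    top = conn.execute(
        select(city_stats.c.city_name)
        .order_by(city_stats.c.population.desc())
        .limit(1)
    ).scalar_one()
```

Returning the engine and table object together is an assumption of this sketch; it keeps them available for the query engines built later.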
SQL Query Engine
The custom CityQueryEngine class provides specialized query capabilities:
Direct Execution:
execute_query(): Runs raw SQL and formats the results
Specialized Queries:
query_highest_population(): Finds the city with the highest population
query_lowest_population(): Finds the city with the lowest population
query_all_cities_ranked(): Orders all cities by population
query_by_state(): Filters cities by state name
Natural Language Processing:
process_population_query(): Uses regular expressions and keyword matching to convert natural language to appropriate SQL queries
Pattern matching for terms like "highest", "lowest", "in [state]", etc.
LlamaCloud Integration
The application connects to LlamaCloud for document retrieval:
Configuration:
API key
Organization ID
Project name
Index name
Index Connection:
Creates LlamaCloudIndex object
Configures vector query engine
Query Execution:
Sends natural language query directly to LlamaCloud
Retrieves relevant document chunks
Uses context for response generation
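Assembled, those three steps might look like this configuration sketch. The session-state key names are assumptions of this example, not the application's actual variables:

```python
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

# Connect to the pre-built index in LlamaCloud
index = LlamaCloudIndex(
    name=st.session_state.LLAMA_INDEX_NAME,
    project_name=st.session_state.LLAMA_PROJECT_NAME,
    organization_id=st.session_state.LLAMA_ORG_ID,
    api_key=st.session_state.LLAMA_API_KEY,
)

# Vector query engine over the indexed documents
vector_query_engine = index.as_query_engine()

# The natural-language query goes straight to LlamaCloud;
# retrieved chunks become the context for response generation
response = vector_query_engine.query("Tell me about the history of Chicago")
```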
Model Configuration
The application uses Gemini 2.0 Flash for both understanding and generation:
LLM Model:
gemini_model = Gemini(
    model="models/gemini-2.0-flash",
    api_key=st.session_state.GOOGLE_API_KEY,
    temperature=0.2,
)
Embedding Model:
gemini_embed_model = GeminiEmbedding(
    model_name="models/embedding-001",
    api_key=st.session_state.GOOGLE_API_KEY,
)
Global Settings:
Settings.llm = gemini_model
Settings.embed_model = gemini_embed_model
Query Processing Logic
The core logic that determines how to handle each query:
# Check if this is a population query
if any(word in prompt.lower() for word in ['population', 'populous', 'big city', 'large city', 'small city']):
    # Try direct SQL approach first
    result = city_query_engine.process_population_query(prompt)
    if result:
        message_placeholder.markdown(f"{result}\n\n*Source: Database (Direct SQL)*")
    else:
        # Fall back to LLM-based SQL
        response = sql_query_engine.query(prompt)
        message_placeholder.markdown(f"{str(response)}\n\n*Source: Database*")
elif have_llamacloud:
    # For general information, use LlamaCloud
    response = vector_query_engine.query(prompt)
    message_placeholder.markdown(f"{str(response)}\n\n*Source: LlamaCloud*")
else:
    # If neither engine is available
    message_placeholder.markdown("I'm unable to answer that question with the current configuration.")
Error Handling and Edge Cases
The application includes several error handling mechanisms:
API Key Validation:
Checks if Google API key is present
Displays warning if missing
LlamaCloud Connection:
Try/except block for connection attempts
Fallback to SQL-only mode if connection fails
UI indicators for connection status
Query Processing:
Handles failed queries gracefully
Provides feedback on processing errors
Suggests query reformulation
Empty Results:
Handles case where no cities match query criteria
Returns informative message
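The LlamaCloud try/except fallback can be isolated into a small helper. init_llamacloud and its connect parameter are hypothetical names used here only to show the pattern:

```python
def init_llamacloud(connect):
    """Try to build the LlamaCloud query engine; fall back to SQL-only mode.

    `connect` is any zero-argument callable returning a query engine —
    a hypothetical seam so the fallback logic can be shown in isolation.
    """
    try:
        return connect(), True          # (engine, have_llamacloud)
    except Exception:
        # Connection failed: the UI shows a warning and we continue SQL-only
        return None, False

def failing_connect():
    raise ConnectionError("bad organization ID")

vector_query_engine, have_llamacloud = init_llamacloud(failing_connect)
# → vector_query_engine is None, have_llamacloud is False
```

The boolean is what the routing logic later checks (`elif have_llamacloud:`), so a failed connection degrades gracefully instead of crashing the app.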
Installation and Dependencies
Required Packages
streamlit==1.31.0
llama-index==0.10.0
llama-index-llms-gemini==0.1.3
llama-index-embeddings-gemini==0.1.3
llama-index-indices-managed-llama-cloud==0.1.0
llama-index-core==0.10.0
sqlalchemy==2.0.27
pandas==2.1.4
Environment Setup
Virtual Environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
Package Installation:
pip install -r requirements.txt
API Keys:
Google API key for Gemini
LlamaCloud API key, organization ID, project, and index name
Running the Application
Start Application:
streamlit run app.py
Configuration:
Enter API keys in sidebar
Confirm LlamaCloud connection status
Usage:
Ask questions in the chat input
View responses with source attribution
Chat history is maintained during session
Example Queries and Responses
Structured Data Queries
Query: "What is the population of New York City?"
Response: "New York City has a population of 8,336,000 people and is located in New York."
Source: Database (Direct SQL)

Query: "Which city has the highest population?"
Response: "New York City has a population of 8,336,000 people and is located in New York."
Source: Database (Direct SQL)

Query: "List all cities in California"
Response: "City information:
- Los Angeles: 3,822,000 people in California"
Source: Database (Direct SQL)

Unstructured Data Queries

Query: "Tell me about the history of Chicago"
Response: [Detailed information about Chicago's history, retrieved from documents]
Source: LlamaCloud

Query: "What are the main attractions in Miami?"
Response: [Information about Miami attractions, retrieved from documents]
Source: LlamaCloud
Advanced Customization
Adding More Cities
To expand the database with additional cities:
# Add new city data
additional_rows = [
    {"city_name": "San Francisco", "population": 874961, "state": "California"},
    {"city_name": "Boston", "population": 675647, "state": "Massachusetts"},
    # Add more cities as needed
]
# Insert into database
for row in additional_rows:
    stmt = insert(city_stats_table).values(**row)
    with engine.begin() as connection:
        connection.execute(stmt)
Supporting Additional Query Types
To enhance query capabilities, extend the CityQueryEngine
class:
def query_by_population_range(self, min_pop, max_pop):
    """Query cities within a population range"""
    query = f"SELECT city_name, population, state FROM city_stats WHERE population BETWEEN {min_pop} AND {max_pop} ORDER BY population DESC"
    return self.execute_query(query)

def process_population_query(self, query_text):
    # Add pattern matching for population ranges
    query_lower = query_text.lower()
    range_match = re.search(r"between\s+(\d[\d,]*)\s+and\s+(\d[\d,]*)", query_lower)
    if range_match:
        min_pop = int(range_match.group(1).replace(',', ''))
        max_pop = int(range_match.group(2).replace(',', ''))
        return self.query_by_population_range(min_pop, max_pop)
Enhancing LlamaCloud Document Retrieval
To improve document retrieval settings:
vector_query_engine = index.as_query_engine(
    similarity_top_k=3,            # Retrieve more documents
    response_mode="tree_summarize",  # Use tree summarization for better responses
    streaming=True,                # Enable streaming responses
)
Troubleshooting
Common Issues and Solutions
API Key Errors:
Error: "Error: Invalid API key"
Solution: Verify Google API key in the sidebar or environment variables
LlamaCloud Connection Failures:
Error: "Error connecting to LlamaCloud"
Solution: Check organization ID, project name, and index name
SQL Engine Errors:
Error: "Error setting up SQL query engine"
Solution: Ensure SQLAlchemy is properly installed and database setup is correct
No Response to Queries:
Issue: System doesn't respond to certain questions
Solution: Check query routing logic and ensure appropriate engine is available
Performance Optimization
Query Caching:
Implement Streamlit caching for repeated queries
Add @st.cache_data decorator to query functions
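st.cache_data is essentially memoization keyed on the function's arguments. The same principle can be shown with the stdlib's functools.lru_cache so it runs outside Streamlit — run_sql is a hypothetical stand-in for a real query function:

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the body actually executes

@lru_cache(maxsize=128)  # st.cache_data plays this role inside Streamlit
def run_sql(query: str) -> str:
    """Hypothetical expensive query function; the body is a stand-in."""
    CALLS["count"] += 1
    return f"result for: {query}"

run_sql("SELECT 1")
run_sql("SELECT 1")  # served from cache; the body does not run again
```

Inside the app, the same decorator pattern with @st.cache_data would also survive Streamlit's script re-runs, which is what makes it effective for repeated chat queries.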
Model Temperature:
Lower temperature (currently 0.2) for more deterministic responses
Increase for more creative but potentially less accurate answers
SQL Query Optimization:
Use prepared statements for frequent queries
Add indexes for larger datasets
Security Considerations
API Key Management:
Use environment variables instead of session state for production
Implement key rotation and expiration policies
Input Validation:
Sanitize user input before processing
Validate SQL queries to prevent injection
Data Privacy:
Consider data retention policies for chat history
Implement user authentication for sensitive data
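For the SQL-injection point above, binding parameters keeps user input as data rather than executable SQL. A stdlib sqlite3 sketch (the Seattle row and its population figure are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE city_stats (city_name TEXT PRIMARY KEY, "
    "population INTEGER, state TEXT NOT NULL)"
)
conn.execute("INSERT INTO city_stats VALUES ('Seattle', 749000, 'Washington')")

def cities_in_state(conn, state):
    # The ? placeholder binds the value; malicious input stays data, not SQL
    cur = conn.execute(
        "SELECT city_name, population FROM city_stats WHERE state = ?",
        (state,),
    )
    return cur.fetchall()

safe = cities_in_state(conn, "Washington")
hostile = cities_in_state(conn, "x' OR '1'='1")  # no rows, not the whole table
```

With SQLAlchemy, the equivalent is passing bound parameters to `text()` queries instead of building SQL with f-strings, as the range-query example earlier does for brevity.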
Future Enhancements
Expanded Database:
Add more cities and additional attributes (GDP, crime rate, etc.)
Support for historical population data
Advanced NLP:
Implement intent classification for better query routing
Add support for multi-turn conversations with context awareness
UI Improvements:
Add visualization for city comparison
Implement map view for geographical context
Integration Options:
Support for additional vector databases (Qdrant, Pinecone)
Option to use alternative LLMs (locally via Ollama or through APIs)
Evaluation Framework:
Implement CometML's Opik for tracing and observability
Add metrics for answer quality and retrieval relevance
Conclusion
This LlamaCloud RAG Demo showcases an effective approach to combining structured and unstructured data for comprehensive question answering. The hybrid architecture leverages the strengths of both SQL databases (for precise factual queries) and document retrieval (for nuanced, contextual information).
Key takeaways from this implementation:
Intelligent Query Routing: The system automatically determines the best source for answering each question
Streamlined User Experience: Simple chat interface hides the complexity of the underlying systems
Extensible Architecture: Can be expanded with additional data sources and query capabilities
Production-Ready Components: Uses industry-standard tools like LlamaIndex, SQLAlchemy, and Gemini
By effectively combining these technologies, the application demonstrates a powerful approach to building knowledge-intensive applications that can handle both structured and unstructured information within a unified interface.
Written by Vedant Pandya