LangChain, SQL Agents & OpenAI LLMs: Query Database Using Natural Language
Introduction
Natural language querying allows users to interact with databases more intuitively and efficiently. By leveraging the power of LangChain, SQL Agents, and OpenAI's Large Language Models (LLMs) like ChatGPT, we can create applications that enable users to query databases using natural language. In this blog post, we'll discuss the key features of these technologies and provide a step-by-step guide on how to implement them for natural language database querying. We have also created an accompanying YouTube video tutorial for those who prefer a visual demonstration.
LangChain:
A Framework for Language Model-Powered Applications LangChain is a framework designed for building applications powered by language models. It provides a standard interface for chains, integrates with various tools, and offers end-to-end chains for common applications. The two main features of LangChain are data-awareness and agentic behavior.
Data-awareness enables the language model to connect to other sources of data, while agentic behavior allows the model to interact with its environment. By using agents, LangChain can dynamically decide which tools to call based on user input. This makes agents extremely powerful when used correctly.
Tools, Agents, and Toolkits in LangChain
A. Tools are functions that perform specific duties, such as Google Search, database lookups, or Python REPL. They take a string as input and return a string as output.
B. Agents are responsible for determining which actions to take and in what order. They can use tools, observe the output, or return to the user.
C. Toolkits are collections of tools that can be utilized by agents. LangChain supports various toolkits to help developers create powerful applications.
Agents
V. SQL Database Agent The SQL Database Agent is designed to interact with SQL databases, allowing users to ask questions in natural language and receive answers. Here's how to implement it:
pip install langchain openai pymysql --upgrade -q
Import the necessary libraries:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI
os.environ['OPENAI_API_KEY'] = "your_openai_api_key"
os.environ["SERPAPI_API_KEY"] = "your_serpapi_api_key"
llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)
agent.run("Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?")
Output:
Leo DiCaprio's girlfriend is Eden Polani and her current age raised to the 0.43 power is 3.547023357958959.
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?")
Output:
Entering new AgentExecutor chain...
I need to find out who Leo DiCaprio's girlfriend is and then calculate her age raised to the 0.43 power.
Action: Search
Action Input: "Leo DiCaprio girlfriend"
Observation: After rumours of a romance with Gigi Hadid, the Oscar winner has seemingly moved on. First being linked to the television personality in September 2022, it appears as if his "age bracket" has moved up. This follows his rumoured relationship with mere 19-year-old Eden Polani.
Thought: I need to find out Eden Polani's age
Action: Search
Action Input: "Eden Polani age"
Observation: 19 Years Old
Thought: I need to calculate her age raised to the 0.43 power
Action: Calculator
Action Input: 19^0.43
Observation: Answer: 3.547023357958959
Thought: I now know the final answer
Final Answer: Leo DiCaprio's girlfriend is Eden Polani and her current age raised to the 0.43 power is 3.547023357958959.
> Finished chain.
Leo DiCaprio's girlfriend is Eden Polani and her current age raised to the 0.43 power is 3.547023357958959.
SQL Database Agent
Database Schema and Resources
The examples in this blog post and the accompanying video tutorial use a specific database schema for demonstration purposes. This schema includes tables related to orders, products, customers, etc. To help you follow along with the examples and explore the database schema on your own, we've provided a link to the schema and resources. Access the database schema here.
The SQL Database Agent is designed to interact with SQL databases, allowing users to ask questions in natural language and receive answers. Here's how to implement it:
Import the necessary libraries
import os
from langchain.agents import *
from langchain.llms import OpenAI
from langchain.sql_database import SQLDatabase
Connect to your database:
pythonCopy codedb_user = "db_user"
db_password = "db_password"
db_host = "db_host"
db_name = "db_name"
db = SQLDatabase.from_uri(f"mysql+pymysql://{db_user}:{db_password}@{db_host}/{db_name}")
Set up the LLM, toolkit, and agent executor:
pythonCopy codefrom langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-3.5-turbo")
toolkit = SQLDatabaseToolkit(db=db)
agent_executor = create_sql_agent(
llm=llm,
toolkit=toolkit,
verbose=True
)
Query the database with natural language:
a. Describing a table and its relationships:
agent_executor.run("Describe the Order related table and how they are related")
b. Recovering from an error:
agent_executor.run("Describe the PurchaseDetails table")
c. Finding the top 5 products with the highest total sales revenue:
agent_executor.run("Find the top 5 products with the highest total sales revenue")
Output:
> Entering new AgentExecutor chain...
Action: list_tables_sql_db
Action Input: ""
Observation: customers, employees, payments, products, productlines, orderdetails, offices, orders
Thought:I should query the 'products' and 'orderdetails' tables to get the total sales revenue for each product
Action: query_checker_sql_db
Action Input: SELECT products.productName, SUM(orderdetails.quantityOrdered * orderdetails.priceEach) AS totalRevenue FROM products INNER JOIN orderdetails ON products.productCode = orderdetails.productCode GROUP BY products.productName ORDER BY totalRevenue DESC LIMIT 5;
Observation:
SELECT products.productName, SUM(orderdetails.quantityOrdered * orderdetails.priceEach) AS totalRevenue
FROM products
INNER JOIN orderdetails
ON products.productCode = orderdetails.productCode
GROUP BY products.productName
ORDER BY totalRevenue DESC
LIMIT 5;
Thought:The query looks correct, I should execute it to get the top 5 products with the highest total sales revenue.
Action: query_sql_db
Action Input: SELECT products.productName, SUM(orderdetails.quantityOrdered * orderdetails.priceEach) AS totalRevenue FROM products INNER JOIN orderdetails ON products.productCode = orderdetails.productCode GROUP BY products.productName ORDER BY totalRevenue DESC LIMIT 5;
Observation: [('1992 Ferrari 360 Spider red', Decimal('276839.98')), ('2001 Ferrari Enzo', Decimal('190755.86')), ('1952 Alpine Renault 1300', Decimal('190017.96')), ('2003 Harley-Davidson Eagle Drag Bike', Decimal('170686.00')), ('1968 Ford Mustang', Decimal('161531.48'))]
Thought:I can see that the top 5 products with the highest total sales revenue are: 1992 Ferrari 360 Spider red, 2001 Ferrari Enzo, 1952 Alpine Renault 1300, 2003 Harley-Davidson Eagle Drag Bike, and 1968 Ford Mustang.
Final Answer: The top 5 products with the highest total sales revenue are 1992 Ferrari 360 Spider red, 2001 Ferrari Enzo, 1952 Alpine Renault 1300, 2003 Harley-Davidson Eagle Drag Bike, and 1968 Ford Mustang.
> Finished chain.
The top 5 products with the highest total sales revenue are 1992 Ferrari 360 Spider red, 2001 Ferrari Enzo, 1952 Alpine Renault 1300, 2003 Harley-Davidson Eagle Drag Bike, and 1968 Ford Mustang.
d. Listing the top 3 countries with the highest number of orders:
agent_executor.run("List the top 3 countries with the highest number of orders")
Conclusion
In this blog post, we explored the capabilities of LangChain, SQL Agents, and OpenAI LLMs for natural language database querying. By combining these technologies, we can create powerful applications that allow users to interact with databases using natural language, making data retrieval more efficient and intuitive. As these technologies continue to develop, we can expect even more advanced and versatile natural language database querying applications in the future.
Remember to check out the accompanying YouTube video tutorial for a visual demonstration of the implementation process, and don't forget to explore the database schema used in the examples.
Now that you understand these technologies and their applications, it's time to start experimenting and creating your natural language database querying solutions!
Colab Notebook (Full Code): https://github.com/PradipNichite/Youtube-Tutorials/blob/main/Langchain_Agents_SQL_Database_Agent.ipynb
Read more in-depth articles:
For more detailed information and insights on AI-related topics, don't forget to visit the FutureSmart AI Blog. Our blog features in-depth articles that cover various aspects of AI, machine learning, and natural language processing.
Check out AI Demos:
If you're interested in exploring more AI tools and their applications, head over to AIDemos.com. AIDemos is a directory of video demos showcasing the latest AI tools and technologies. Our goal is to educate and inform users about the possibilities of AI and help them stay updated on the latest advancements.
Subscribe to my newsletter
Read articles from Pradip Nichite directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Pradip Nichite
Pradip Nichite
๐ I'm a Top Rated Plus NLP freelancer on Upwork with over $100K in earnings and a 100% Job Success rate. This journey began in 2022 after years of enriching experience in the field of Data Science. ๐ Starting my career in 2013 as a Software Developer focusing on backend and API development, I soon pursued my interest in Data Science by earning my M.Tech in IT from IIIT Bangalore, specializing in Data Science (2016 - 2018). ๐ผ Upon graduation, I carved out a path in the industry as a Data Scientist at MiQ (2018 - 2020) and later ascended to the role of Lead Data Scientist at Oracle (2020 - 2022). ๐ Inspired by my freelancing success, I founded FutureSmart AI in September 2022. We provide custom AI solutions for clients using the latest models and techniques in NLP. ๐ฅ In addition, I run AI Demos, a platform aimed at educating people about the latest AI tools through engaging video demonstrations. ๐งฐ My technical toolbox encompasses: ๐ง Languages: Python, JavaScript, SQL. ๐งช ML Libraries: PyTorch, Transformers, LangChain. ๐ Specialties: Semantic Search, Sentence Transformers, Vector Databases. ๐ฅ๏ธ Web Frameworks: FastAPI, Streamlit, Anvil. โ๏ธ Other: AWS, AWS RDS, MySQL. ๐ In the fast-evolving landscape of AI, FutureSmart AI and I stand at the forefront, delivering cutting-edge, custom NLP solutions to clients across various industries.