From Zero to Hero With LangChain: How to Build an AI Agent That Accesses PDFs, APIs, and Databases

Leo Bcheche

Have you ever wanted to build a chatbot that can answer questions about real-world data, like reading resumes, checking crypto prices, or searching a database?

In this guide, we’ll build a smart AI agent using LangChain, a framework that connects large language models (like ChatGPT) with external tools — such as APIs, PDF files, and databases.

Here’s what our agent will do:

  • Use an API to get live crypto prices.

  • Read multiple PDF resumes and answer questions about them.

  • Search and return data from a local SQLite3 database.


Requirements

Let’s get everything set up so you can build and run this agent on your own machine.

Virtual Environment

Create a Python Virtual Environment for this project. A virtual environment keeps your project’s dependencies separate. In your terminal, run:

python -m venv langchain-agent-env

Activate your Virtual Environment:

# On Linux/macOS:
source langchain-agent-env/bin/activate
# On Windows:
.\langchain-agent-env\Scripts\activate

Install all required packages:

pip install langchain langchain-openai requests python-dotenv pymupdf

API KEY

Create your own API key at the OpenAI Platform.

All requests to the OpenAI model shown in this example are billed, so you will need to add at least a small amount of credit to your account in order to run it.

Then, create a .env file in your project folder with your OpenAI key:

OPENAI_API_KEY=your_openai_api_key_here

Resumes in PDF

Create a folder named PDFs:

mkdir PDFs

Now create at least five fake resumes in PDF format and drop them into this folder. For this article, the sample resumes were generated with ChatGPT.

Create SQLite Database

Make a new Python file named create_db.py and paste in:

import sqlite3

conn = sqlite3.connect("data.db")
cursor = conn.cursor()

cursor.execute("""
CREATE TABLE IF NOT EXISTS products (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    description TEXT,
    price REAL,
    stock INTEGER,
    category TEXT
)
""")

products = [
    ("Mechanical Keyboard", "RGB keyboard", 59.99, 20, "Peripherals"),
    ("Gaming Mouse", "High DPI mouse", 39.99, 35, "Peripherals"),
    ("24-inch Monitor", "Full HD", 129.99, 12, "Displays"),
    ("Laptop i5", "8GB RAM SSD", 749.00, 7, "Computers"),
    ("Gaming Chair", "Ergonomic", 199.00, 9, "Furniture")
]

cursor.executemany("""
INSERT INTO products (name, description, price, stock, category)
VALUES (?, ?, ?, ?, ?)
""", products)

conn.commit()
conn.close()
print("✅ Database created.")

Run it:

python create_db.py

Now you’ve got a local database full of product data to query later.


Create the AI Agent Code

Now, let’s create our AI agent code. Create a new file and name it ai_agent.py.

Imports, Setup and AI Model

Let’s break this first part of the code down in plain English, like I’m sitting next to you explaining how it works.

The first few lines are all about bringing in the tools we need. Think of it like grabbing ingredients before cooking. We start by importing ChatOpenAI from the langchain_openai package — this is what lets our code talk to ChatGPT. Next, we bring in initialize_agent and AgentType from LangChain. These are used to build our smart agent — basically, a chatbot that knows when to use which function. We also import Tool, which allows us to take any regular Python function and turn it into something the agent can use as a “tool” to answer questions.

Then we pull in some helpful libraries: requests to talk to external APIs, sqlite3 to talk to a database, and os to handle system stuff, like reading environment variables. Speaking of environment variables, we use load_dotenv() to read from a .env file — this is where you hide secret keys like your OpenAI API key so they’re not hardcoded in the script. Right after that, we make sure the key is loaded by referencing os.environ["OPENAI_API_KEY"].

Finally, we set up the actual language model we’re going to use: ChatGPT (specifically gpt-3.5-turbo). We do this by creating an llm object using ChatOpenAI, and we set temperature=0, which just means the responses will be more predictable and consistent — perfect when we want the same input to give the same output.

# Import the ChatOpenAI class to interact with OpenAI’s chat-based models
from langchain_openai import ChatOpenAI 
# Import functions to initialize an agent and specify its type
from langchain.agents import initialize_agent, AgentType 
# Import the Tool wrapper to turn Python functions into LangChain tools
from langchain.tools import Tool 
# Import the requests library for HTTP requests, sqlite3 for database access, 
# and os for environment operations
import requests, sqlite3, os 
# Import load_dotenv to load environment variables from a .env file
from dotenv import load_dotenv 
# Import PyMuPDF (fitz) to read text from PDF files
import fitz  

# Load environment variables from a .env file into the process’s environment
load_dotenv() 
# Accessing the key here fails fast with a KeyError if it was not loaded
os.environ["OPENAI_API_KEY"]
# Instantiate the ChatOpenAI language model with a specific model and deterministic output
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

Tool 1 | Crypto Price API

The first tool is like a little helper that knows how to check the price of any cryptocurrency (like Bitcoin or Ethereum) in real time.

We define a function get_crypto_price() that goes to CoinGecko — a public website with crypto data — and asks for the price of a coin in USD. It builds the web address (URL), sends a request, and if the response is successful, it pulls out the price and sends it back as a string like: “The price of bitcoin is $34,000 USD.”

Then we "wrap" this function using Tool.from_function(), which tells LangChain,
“Hey, this is a tool. When someone asks a question about crypto prices, use this.”

# TOOL 1: Define a function to fetch the USD price of a cryptocurrency via CoinGecko API
def get_crypto_price(crypto: str) -> str:
    # Build the API URL with the specified cryptocurrency ID
    url = f"https://api.coingecko.com/api/v3/simple/price?ids={crypto}&vs_currencies=usd"
    # Send a GET request to the CoinGecko API
    res = requests.get(url)
    # If the response is successful (HTTP 200), parse the JSON for the USD price
    if res.status_code == 200:
        price = res.json().get(crypto, {}).get("usd", "not found")
        # Return a formatted string with the price
        return f"The price of {crypto} is ${price} USD."
    # If the request failed, return an error message
    return "Failed to access the API."

# Wrap the get_crypto_price function as a LangChain tool named “CryptoPriceFetcher”
api_tool = Tool.from_function(
    name="CryptoPriceFetcher",
    description=(
        "Uses an API to fetch the current price of a cryptocurrency. "
        "Example: bitcoin, ethereum. Answer with just the number: $0.00001 USD"
    ),
    func=get_crypto_price
)
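Since get_crypto_price is plain Python, you can sanity-check it offline before wiring it into the agent by stubbing the HTTP call with unittest.mock. The function body below is repeated from above so the sketch is self-contained, and the fake price is made up:

```python
import requests
from unittest.mock import patch, MagicMock

# Same function as defined above, repeated here so the sketch runs standalone.
def get_crypto_price(crypto: str) -> str:
    url = f"https://api.coingecko.com/api/v3/simple/price?ids={crypto}&vs_currencies=usd"
    res = requests.get(url)
    if res.status_code == 200:
        price = res.json().get(crypto, {}).get("usd", "not found")
        return f"The price of {crypto} is ${price} USD."
    return "Failed to access the API."

# Stub out the network call so the check runs offline with a made-up price.
fake = MagicMock(status_code=200)
fake.json.return_value = {"bitcoin": {"usd": 34000}}
with patch("requests.get", return_value=fake):
    result = get_crypto_price("bitcoin")

print(result)  # → The price of bitcoin is $34000 USD.
```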

Tool 2 | Resume Reader and Answerer

Next, we create a tool that can read all resumes in a folder and answer questions about them.

Here’s how it works:

  • It loops through all PDFs in the PDFs folder.

  • It opens each file and extracts the text from every page.

  • It puts all that text together into one big chunk.

  • Then it asks the language model (ChatGPT) to answer a question based on that text.

So if you ask something like “Who has the most experience with Streamlit?”, this tool finds the answer by reading the resumes.

We again wrap this with Tool.from_function() and name it ResumeQA. Now LangChain knows it can use this tool when questions involve resumes or candidate info.

# TOOL 2: Define a function to answer questions based on all resumes (PDFs) in the 'PDFs/' folder
def ask_about_resumes(question: str) -> str:
    # Set the folder path containing PDF resumes
    folder_path = "PDFs"
    # Initialize an empty string to accumulate all extracted text
    corpus = ""
    # Iterate over each file in the specified folder
    for filename in os.listdir(folder_path):
        # Process only PDF files (case-insensitive check)
        if filename.lower().endswith(".pdf"):
            # Open the PDF file with PyMuPDF
            with fitz.open(os.path.join(folder_path, filename)) as doc:
                # Extract text from each page and append to the corpus
                corpus += "".join(page.get_text() for page in doc)
    # If no text was extracted, inform the user
    if not corpus.strip():
        return "No text could be extracted from the resumes."
    try:
        # Use the language model to answer the question based on the combined resume text;
        # .content extracts the plain text from the AIMessage returned by invoke()
        return llm.invoke(
            f"Based on the following resumes:\n\n{corpus}\n\nAnswer this question: {question}"
        ).content
    except Exception as e:
        # If an error occurs, return the exception message
        return f"Error processing CVs: {str(e)}"

# Wrap the ask_about_resumes function as a LangChain tool named “ResumeQA”
resume_tool = Tool.from_function(
    name="ResumeQA",
    description=(
        "Answer any question based on the content of CVs located in the 'PDFs/' folder. "
        "You can answer about skills, experiences, technologies, or candidate profiles."
    ),
    func=ask_about_resumes
)

Tool 3 | Database Search Tool

The third tool lets the agent talk to a local database. This database contains a table of products (name, price, stock, etc.).

The function query_products() receives a SQL command like SELECT name, price FROM products, runs it on the database (data.db), and returns the results.

It’s a flexible tool — the agent can ask anything, like:

  • "What’s the most expensive product?"

  • "Show all products under $100."

And once again, we turn this into a tool with Tool.from_function() so our agent can pick it when a database-style question comes up.

# TOOL 3: Define a function to run arbitrary SQL queries against a SQLite database
def query_products(sql: str) -> str:
    try:
        # Connect to the SQLite database file “data.db”
        with sqlite3.connect("data.db") as conn:
            # Execute the provided SQL query
            cursor = conn.execute(sql)
            # Fetch all resulting rows
            rows = cursor.fetchall()
        # Return the rows as a string, or indicate no data was found
        return str(rows) if rows else "No data found."
    except Exception as e:
        # Return an error message if the query fails
        return f"Query error: {str(e)}"

# Wrap the query_products function as a LangChain tool named “SQLProductQuery”
db_tool = Tool.from_function(
    name="SQLProductQuery",
    description="Run SQL queries on a SQLite database. Example: SELECT name, price FROM products;",
    func=query_products
)
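To see the kind of SQL the agent is expected to generate, here is a standalone sketch that seeds an in-memory SQLite database with the same sample rows as create_db.py and answers the “most expensive product” question directly:

```python
import sqlite3

# In-memory database seeded with the same sample products as create_db.py,
# so this sketch runs on its own without data.db.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT, description TEXT, price REAL, stock INTEGER, category TEXT)""")
conn.executemany(
    "INSERT INTO products (name, description, price, stock, category) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Mechanical Keyboard", "RGB keyboard", 59.99, 20, "Peripherals"),
        ("Gaming Mouse", "High DPI mouse", 39.99, 35, "Peripherals"),
        ("24-inch Monitor", "Full HD", 129.99, 12, "Displays"),
        ("Laptop i5", "8GB RAM SSD", 749.00, 7, "Computers"),
        ("Gaming Chair", "Ergonomic", 199.00, 9, "Furniture"),
    ],
)

# The kind of query the agent might emit for "Which one is the most expensive product?"
row = conn.execute(
    "SELECT name, price FROM products ORDER BY price DESC LIMIT 1"
).fetchone()
print(row)  # → ('Laptop i5', 749.0)
conn.close()
```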

AI Agent

This part of the code is where we actually create the intelligent agent — the brain that will decide which tool to use when a user asks a question.

We start by placing all three tools — api_tool, resume_tool, and db_tool — into a list called tools. Think of this as packing the agent's toolbox. Each item in the list is a different skill it knows: one for crypto prices, one for reading resumes, and one for running database queries.

Then we use the initialize_agent() function to bring everything together. This is where LangChain connects the tools with the language model (llm) — in our case, ChatGPT using the gpt-3.5-turbo model. We pass in the tools, the model, and a reasoning type: AgentType.ZERO_SHOT_REACT_DESCRIPTION. That’s a fancy way of saying, “Let the agent figure out what tool to use based on the question, using just the tool descriptions — no examples needed.”

Lastly, we set verbose=False, which just tells the system not to print out all the internal reasoning steps. If you're debugging or curious about how the agent makes decisions, you can flip that to True to watch the agent think out loud.

So, this code is essentially the moment our tools come to life — the AI brain now knows what it's capable of, and it’s ready to respond to whatever you throw at it.

# Initialize the LangChain agent with the three tools defined above
tools = [api_tool, resume_tool, db_tool]
agent = initialize_agent(
    tools=tools,                                  # List of tools available to the agent
    llm=llm,                                      # Language model instance
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,  # Agent reasoning strategy
    verbose=False                                 # Disable verbose logging
)

The Client

This part of the code is a demo that tests how well the agent uses each of the three tools. It defines a list of example prompts — some asking for crypto prices, others asking about resumes, and a few querying the product database. Each prompt is paired with a label to show which tool it's meant to test. Then, in a loop, the agent is asked each question using agent.invoke(prompt). The agent reads the question, decides which tool to use based on its understanding, executes the function, and returns the result. The question and the output are printed so we can see how the agent responds in each case.


# Define example prompts to demonstrate each tool’s usage
examples = [
    ("\n--- API TOOL EXAMPLE ---\n", "What's the current price of cardano?"),
    ("\n--- PDF TOOL EXAMPLE ---\n", 
     "What is the name of the most experienced professional in Streamlit? "),
    ("\n--- PDF TOOL EXAMPLE ---\n",
     "Make a list of all professionals, ordered by years of experience.\n"
     "The list must contain Name, Time of Experience (years), "
     "Level (Specialist, Senior, Mid-level, Junior) and Main Skills.\n"
     "Show the list in a formatted table like Excel, with rows and columns."
    ),
    ("\n--- DATABASE TOOL EXAMPLE ---\n", 
    "List the name and price of all products in the products table. "
    "Show the list in a formatted table"),
    ("\n--- DATABASE TOOL EXAMPLE ---\n", "Which one is the most expensive product?")
]

# Iterate over each example: print the header, invoke the agent, and display the response
for header, prompt in examples:
    print(header)
    response = agent.invoke(prompt)
    print(response["input"] + "\n")
    print(response["output"])

Run The Code

Now, let’s run our code:

py ai_agent.py  # or use "python" or "python3" instead of "py"

You should see each example header printed, followed by the agent’s answer. The exact format of the responses may vary; to make them more consistent, you can sharpen each tool’s description with a concrete example of the expected answer.


Conclusion

You just learned how to build an intelligent LangChain agent that:

  • Calls a crypto API to get live prices.

  • Reads resumes from a folder and answers questions about them.

  • Runs SQL queries on a local database.

This is just the beginning. You can add more tools, like sending emails, making Slack notifications, scraping websites, or even calling functions inside your own apps.

And you, the reader — what tool would you like your AI agent to have?

https://github.com/LBcheche/LangChain-AI-Agent
