Build Your Own YouTube AI ChatBot Using LangChain, Python, and a Vector DB – Beginner-Friendly Guide!


Have you ever watched a long YouTube video and wished you could just ask a question and get the answer from it? Or get a quick summary without watching it all?
Well, that's exactly what we're building today — a YouTube AI Assistant that can:
Understand YouTube videos using their captions,
Break that into chunks,
Embed and store those into a vector database,
Retrieve and generate answers to your questions using AI!
And yes, we’ll use Python 🐍, LangChain 🧠, Google’s Gemini (via LangChain), and FAISS for vector storage!
🔧 Tools & Technologies Used
Streamlit: For building an interactive web app
LangChain: For chaining LLMs, retrievers, and prompts
Google Gemini: For answering questions using LLM
FAISS: Vector database to store and retrieve document embeddings
YouTubeTranscriptAPI: To fetch YouTube captions
Python-dotenv: To manage environment variables
💡 What We’ll Cover (LangChain in 4 Steps)
We’ll break this project into four core LangChain steps:
Indexing: Extracting transcript, chunking it, embedding it, and storing in FAISS DB.
Retrieval: Fetching relevant chunks from DB based on the user's query.
Augmentation: Adding context to the user query using retrieved documents.
Generation: Asking the LLM to respond using that context.
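Before diving in, here is the whole flow as a toy sketch. Every function body below is a simplified stand-in (keyword matching instead of embeddings, a stubbed LLM), not the real LangChain/FAISS/Gemini calls we'll use later:

```python
def index(transcript: str) -> list[str]:
    """1. Indexing: split the transcript into overlapping chunks."""
    return [transcript[i:i + 100] for i in range(0, len(transcript), 80)]

def retrieve(chunks: list[str], question: str, k: int = 4) -> list[str]:
    """2. Retrieval: naive keyword overlap instead of embedding similarity."""
    words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: len(words & set(c.lower().split())), reverse=True)
    return ranked[:k]

def augment(context: list[str], question: str) -> str:
    """3. Augmentation: paste the retrieved context above the question."""
    return "\n\n".join(context) + "\n\nQuestion: " + question

def generate(prompt: str) -> str:
    """4. Generation: stubbed here; the real version calls the Gemini LLM."""
    return "(LLM answer for prompt of %d chars)" % len(prompt)

transcript = "LangChain chains LLM calls. FAISS stores embeddings for search."
question = "What does FAISS store?"
answer = generate(augment(retrieve(index(transcript), question), question))
print(answer)
```

Each of the four sections below swaps one of these stand-ins for the real component.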
🚀 Step-by-Step Walkthrough
🎬 Step 1: Indexing - Making the Video "Searchable"
Goal: Convert video transcript into a format that can be queried.
1.1. Extract Transcript
from youtube_transcript_api import YouTubeTranscriptApi

transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])
transcript = " ".join(chunk["text"] for chunk in transcript_list)
We extract the auto-generated or uploaded captions from YouTube using the video ID.
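To see the shape of the data without a network call, here is a mocked transcript_list (the sample captions are invented, but the structure matches what the API returns):

```python
# get_transcript returns a list of dicts, each with "text", "start",
# and "duration" keys; for Q&A we only need the "text" field.
transcript_list = [
    {"text": "Welcome to the channel.", "start": 0.0, "duration": 2.1},
    {"text": "Today we talk about LangChain.", "start": 2.1, "duration": 3.4},
]

# Join the caption snippets into one flat string, ready for chunking.
transcript = " ".join(chunk["text"] for chunk in transcript_list)
print(transcript)  # → Welcome to the channel. Today we talk about LangChain.
```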
1.2. Chunk the Text
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])
Why chunking? Large texts are hard for models to process. So we break them down into smaller overlapping pieces to maintain context.
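Here is the overlap idea in miniature, as a naive fixed-window splitter. (RecursiveCharacterTextSplitter is smarter: it prefers to break on paragraphs and sentences before falling back to raw character windows.)

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Naive fixed-window chunker: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so consecutive chunks share context."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Tiny example: 4-char chunks, 2-char overlap.
chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Notice how each chunk repeats the tail of the previous one; that repeated tail is what keeps a sentence from being cut off mid-thought at a chunk boundary.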
1.3. Generate Embeddings
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")  # use any other embedding model if you like
Each chunk is converted into a numerical format (embedding) using Google’s Gemini embedding model — this lets us compare chunks mathematically!
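"Compare mathematically" usually means cosine similarity: two chunks about the same topic get embeddings pointing in roughly the same direction. A minimal pure-Python version (the real vectors from Gemini have hundreds of dimensions, not two):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0 (orthogonal, unrelated)
```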
1.4. Store in Vector DB (FAISS)
from langchain_community.vectorstores import FAISS

vector_store = FAISS.from_documents(chunks, embeddings)
These embeddings are stored in FAISS, a fast vector search library. Now our video content is indexed and ready to be searched!
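Conceptually, what FAISS gives us looks like this brute-force search, just vastly faster thanks to optimized index structures (the vectors and texts below are made up for illustration):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

class ToyVectorStore:
    """Brute-force stand-in for FAISS: stores (vector, text) pairs and
    returns the k texts whose vectors are most similar to the query."""
    def __init__(self):
        self.entries = []

    def add(self, vector: list[float], text: str) -> None:
        self.entries.append((vector, text))

    def search(self, query_vector: list[float], k: int = 4) -> list[str]:
        ranked = sorted(self.entries, key=lambda e: cosine(e[0], query_vector), reverse=True)
        return [text for _, text in ranked[:k]]

store = ToyVectorStore()
store.add([1.0, 0.0], "chunk about cats")
store.add([0.0, 1.0], "chunk about dogs")
store.add([0.9, 0.1], "another cat chunk")
print(store.search([1.0, 0.0], k=2))  # → ['chunk about cats', 'another cat chunk']
```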
🔍 Step 2: Retrieval - Finding Relevant Info
Goal: When a user asks a question, find relevant chunks from the transcript.
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 4})
retrieved_docs = retriever.invoke(question)
Here’s what’s happening:
User types a question.
We convert that question into an embedding.
We compare it against the stored video chunks in FAISS.
Return the top k=4 most similar chunks.
🧩 Step 3: Augmentation - Combine Context & Query
Goal: Combine retrieved transcript chunks with the user's question into a prompt.
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate(
template="""
You are a helpful assistant.
Answer ONLY from the provided transcript context.
If the context is insufficient, just say you don't know.
{context}
Question: {question}
""",
input_variables=["context", "question"],
)
We create a smart prompt that tells the LLM:
Stick to the transcript.
Don’t hallucinate.
Be detailed and helpful.
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
final_prompt = prompt.invoke({"context": context_text, "question": question})
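Under the hood, a PromptTemplate is essentially named-placeholder substitution; plain str.format shows the same idea (the sample context and question here are invented):

```python
# Same template as above, minus the LangChain wrapper.
template = (
    "You are a helpful assistant.\n"
    "Answer ONLY from the provided transcript context.\n"
    "If the context is insufficient, just say you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}"
)

# Pretend these came back from the retriever.
retrieved = ["LangChain chains LLM calls.", "FAISS stores embeddings."]
context_text = "\n\n".join(retrieved)

final_prompt = template.format(context=context_text, question="What is FAISS for?")
print(final_prompt)
```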
💬 Step 4: Generation - Let the LLM Answer
Goal: Use Google Gemini to generate the answer using the provided context.
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.7)
answer = llm.invoke(final_prompt)
Voilà! You now have an AI that can answer questions about a YouTube video! 🎉
📺 Bonus: Summarizing the Entire Video
You can also summarize the video using the same pipeline (here, main_chain is the retrieval, augmentation, and generation steps above wired into a single LangChain chain):
summary = main_chain.invoke("Can you summarize the video")
Here, instead of a custom user question, we send a fixed one ("Can you summarize the video") and process it exactly as before: the retrieved chunks become the context for the summary.
🧪 Sample Output
Input Video:
https://www.youtube.com/watch?v=HAnw168huqA
Sample Question: "What is the main topic discussed?"
Output: "The video discusses how LangChain helps build context-aware AI applications using modular components..."
🧠 Concepts You Learned
✅ Chunking text with overlap
✅ Generating embeddings
✅ Storing in a vector database (FAISS)
✅ Retrieving similar chunks based on query
✅ Contextual prompting with LangChain
✅ Generating responses using Google Gemini LLM
🛠 Try It Yourself
Fork and run the project from GitHub 👇
🔗 https://github.com/vishwajitvm/YouTube-AI-ChatBot
⚠️ Make sure your .env file contains your Google Generative AI key and your system has Python 3.10+ installed.
📌 Final Thoughts
This project is a great way to learn:
How LLMs work behind the scenes.
How vector DBs make documents searchable.
How prompt engineering drives results.
Let me know if you’d like a Part 2 where we build voice support or add multi-video context!
If you enjoyed this, hit 💙, share, and follow me for more hands-on AI projects!
Written by Vishwajit Vm
Hey, my name is Vishwajit, and I’m from New Delhi. I specialize in backend development and AI, working with technologies like Python, Node.js, Express.js, GraphQL, and vector databases to build intelligent, scalable solutions. Alongside my backend and AI expertise, I also work with PHP, Laravel, MySQL, MongoDB, React, Next.js, HTML, CSS, and more. I enjoy combining robust technical architecture with creative design, offering clients sophisticated solutions that are also cost-effective. I believe in continuously learning new tools and strategies to stay ahead of trends and deliver exceptional results. Through dedication and hard work, I’m focused on growing as a skilled software engineer and designer. With my passion for technology and design, I’m confident I can help you bring your creative vision to life. Let’s collaborate and create something truly extraordinary!