Build Your Own YouTube AI ChatBot Using LangChain, Python, and Vector DB – Beginner Friendly Guide!

Vishwajit Vm
4 min read

Have you ever watched a long YouTube video and wished you could just ask a question and get the answer from it? Or get a quick summary without watching it all?

Well, that's exactly what we're building today — a YouTube AI Assistant that can:

  • Understand a YouTube video through its captions,

  • Break the transcript into chunks,

  • Embed and store those chunks in a vector database,

  • Retrieve the relevant chunks and generate answers to your questions using AI!

And yes, we’ll use Python 🐍, LangChain 🧠, Google’s Gemini (via LangChain), and FAISS for vector storage!

🔗 Project GitHub Repo: https://github.com/vishwajitvm/YouTube-AI-ChatBot

🔧 Tools & Technologies Used

  • Streamlit: For building an interactive web app

  • LangChain: For chaining the LLM, retriever, and prompts

  • Google Gemini: The LLM that generates the answers

  • FAISS: Vector database to store and search document embeddings

  • youtube-transcript-api: To fetch YouTube captions

  • python-dotenv: To manage environment variables
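
For reference, the snippets below assume imports along these lines (package names follow LangChain's current split packages; adjust the paths if your installed versions differ):

from youtube_transcript_api import YouTubeTranscriptApi
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate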

💡 What We’ll Cover (LangChain in 4 Steps)

We’ll break this project into four core LangChain steps:

  1. Indexing: Extracting transcript, chunking it, embedding it, and storing in FAISS DB.

  2. Retrieval: Fetching relevant chunks from DB based on the user's query.

  3. Augmentation: Adding context to the user query using retrieved documents.

  4. Generation: Asking the LLM to respond using that context.

🚀 Step-by-Step Walkthrough

🎬 Step 1: Indexing - Making the Video "Searchable"

Goal: Convert video transcript into a format that can be queried.

1.1. Extract Transcript

transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])
transcript = " ".join(chunk["text"] for chunk in transcript_list)

We extract the auto-generated or uploaded captions from YouTube using the video ID.
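
If you only have the full URL, a small helper (hypothetical, not part of the repo) can pull out the video ID first:

from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> str:
    # Handles both youtu.be short links and standard watch URLs
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    return parse_qs(parsed.query)["v"][0]

video_id = extract_video_id("https://www.youtube.com/watch?v=HAnw168huqA")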

1.2. Chunk the Text

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])

Why chunking? Long transcripts are too big for a model to process in one go, so we break them into smaller, overlapping pieces; the overlap keeps context from being lost at chunk boundaries.
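
A quick sanity check shows what the splitter produced:

print(len(chunks))                    # number of Document chunks created
print(chunks[0].page_content[:200])   # first 200 characters of the first chunk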

1.3. Generate Embeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")  # use any other embedding model if you like

Each chunk is converted into a numerical format (embedding) using Google’s Gemini embedding model — this lets us compare chunks mathematically!
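
To get a feel for what an embedding is, you can embed a single string yourself; each text becomes a fixed-length vector of floats (768 dimensions for embedding-001, at the time of writing):

vector = embeddings.embed_query("What is LangChain?")
print(len(vector))   # dimensionality of the embedding
print(vector[:5])    # first few values of the vector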

1.4. Store in Vector DB (FAISS)

vector_store = FAISS.from_documents(chunks, embeddings)

These embeddings are stored in FAISS, a fast vector search library. Now our video content is indexed and ready to be searched!
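
If you don't want to re-index the same video on every run, the index can be persisted to disk (the allow_dangerous_deserialization flag applies to recent LangChain versions):

vector_store.save_local("faiss_index")

# Later, reload it with the same embedding model
vector_store = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)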


🔍 Step 2: Retrieval - Finding Relevant Info

Goal: When a user asks a question, find relevant chunks from the transcript.

retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 4})
retrieved_docs = retriever.invoke(question)

Here’s what’s happening:

  • User types a question.

  • We convert that question into an embedding.

  • We compare it against the stored video chunks in FAISS.

  • We return the top k=4 most similar chunks.
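
If you're curious how close each hit actually is, the vector store can also return distance scores (with FAISS's default metric, lower means more similar):

for doc, score in vector_store.similarity_search_with_score(question, k=4):
    print(round(score, 3), doc.page_content[:80])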


🧩 Step 3: Augmentation - Combine Context & Query

Goal: Combine retrieved transcript chunks with the user's question into a prompt.

prompt = PromptTemplate(
    template="""
    You are a helpful assistant.
    Answer ONLY from the provided transcript context.
    If the context is insufficient, just say you don't know.

    {context}
    Question: {question}
    """,
    input_variables=["context", "question"],
)

We create a prompt that tells the LLM to:

  • Stick to the transcript context,

  • Admit when the context is insufficient instead of hallucinating,

  • Act as a helpful assistant.

context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
final_prompt = prompt.invoke({"context": context_text, "question": question})
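
It's worth printing the assembled prompt once to see exactly what the model will receive:

print(final_prompt.to_string())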

💬 Step 4: Generation - Let the LLM Answer

Goal: Use Google Gemini to generate the answer using the provided context.

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.7)
answer = llm.invoke(final_prompt)

Voilà! You now have an AI that can answer questions about a YouTube video! 🎉
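
In the actual project, all of this sits behind a Streamlit UI. Here's a stripped-down sketch of the wiring (answer_question is a hypothetical helper that wraps Steps 1-4; the repo's layout will differ):

import streamlit as st

st.title("YouTube AI ChatBot")
url = st.text_input("YouTube video URL")
question = st.text_input("Ask a question about the video")

if url and question:
    # answer_question is a hypothetical helper wrapping Steps 1-4 above
    answer = answer_question(url, question)
    st.write(answer)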


📺 Bonus: Summarizing the Entire Video

You can also summarize the video using a similar technique:

summary = main_chain.invoke("Can you summarize the video")

Here, instead of asking a custom question, we send a fixed one like "Can you summarize the video" through the same retrieve-augment-generate pipeline; the retrieved chunks become the context for the summary. (main_chain is simply Steps 2-4 wired together; see the sketch below.)
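
One common way to wire main_chain up with LangChain's runnable composition, reusing the retriever, prompt, and llm from earlier (a sketch, not necessarily the exact chain in the repo), looks like this:

from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

def format_docs(docs):
    # Join the retrieved chunks into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

main_chain = (
    RunnableParallel({
        "context": retriever | RunnableLambda(format_docs),
        "question": RunnablePassthrough(),
    })
    | prompt
    | llm
    | StrOutputParser()
)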


🧪 Sample Output

Input Video: https://www.youtube.com/watch?v=HAnw168huqA
Sample Question: "What is the main topic discussed?"
Output: "The video discusses how LangChain helps build context-aware AI applications using modular components..."

🧠 Concepts You Learned

✅ Chunking text with overlap
✅ Generating embeddings
✅ Storing in a vector database (FAISS)
✅ Retrieving similar chunks based on query
✅ Contextual prompting with LangChain
✅ Generating responses using Google Gemini LLM

🛠 Try It Yourself

Fork and run the project from GitHub 👇
🔗 https://github.com/vishwajitvm/YouTube-AI-ChatBot

⚠️ Make sure your .env file contains your Google Generative AI API key and that your system has Python 3.10+ installed.
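
A quick way to confirm the key is being picked up (python-dotenv reads the .env file, and langchain-google-genai looks for GOOGLE_API_KEY):

import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the project root
assert os.getenv("GOOGLE_API_KEY"), "GOOGLE_API_KEY is missing from your .env file"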


📌 Final Thoughts

This project is a great way to learn:

  • How LLMs work behind the scenes.

  • How vector DBs make documents searchable.

  • How prompt engineering drives results.

Let me know if you'd like a Part 2 where we add voice support or multi-video context!


If you enjoyed this, hit 💙, share, and follow me for more hands-on AI projects!
