Retrieval Augmented Generation

Nestor Rojas

This blog post guides you through creating a Jupyter Notebook that utilizes OpenAI's GPT-3.5 model to answer your questions about YouTube videos.

Why a Jupyter Notebook?

Jupyter Notebooks provide an interactive environment for Python coding, making it easier to experiment, visualize data, and explain your code.

Prerequisites

  • Basic Python knowledge

  • An OpenAI API key

  • A Pinecone account and API key (used in the vector store step)

Tools and Libraries

  • Python 3.x

  • openai library: pip install openai

  • python-dotenv library: pip install python-dotenv

  • langchain libraries: pip install langchain langchain-openai langchain-community langchain-pinecone

  • docarray library (for the in-memory vector store): pip install docarray

  • pytube library (optional, to download the video's audio): pip install pytube

  • whisper library (optional, to transcribe the audio): pip install openai-whisper

The Notebook

  1. Set up environment variables:

    Create a .env file to store your API keys:

OPENAI_API_KEY=your-openai-api-key
PINECONE_API_KEY=your-pinecone-api-key
  2. Import libraries:
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

YOUTUBE_VIDEO = "https://youtu.be/BrsocJb-fAo?si=veWyKdjyngCVtDU7"
  3. Set up the model
from langchain_openai.chat_models import ChatOpenAI

model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model="gpt-3.5-turbo")
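
    Before going further, it's worth a quick sanity check that the model responds (the test message below is just an illustration):

# Optional sanity check: confirm the API key and model are working
response = model.invoke("Say 'ready' if you can read this.")
print(response.content)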
  4. Generate the transcription

    This step uses Whisper from OpenAI to transcribe the video's audio:

import tempfile
import whisper
from pytube import YouTube

# Only transcribe if the transcript file does not already exist
if not os.path.exists("video_transcript.txt"):
    video = YouTube(YOUTUBE_VIDEO)
    # Select an audio-only stream of the video
    audio = video.streams.filter(only_audio=True).first()

    whisper_model = whisper.load_model("base")

    with tempfile.TemporaryDirectory() as tmpdir:
        audio_file = audio.download(output_path=tmpdir)
        transcription = whisper_model.transcribe(audio_file, fp16=False)["text"].strip()

    with open("video_transcript.txt", "w") as file:
        file.write(transcription)
  5. Test the generated transcription
with open("video_transcript.txt") as file:
    transcription = file.read()

transcription[:20]
  6. Load the transcription
from langchain_community.document_loaders import TextLoader

loader = TextLoader("video_transcript.txt")
text_transcription = loader.load()
  7. Split the transcription

    The transcript is generally too large to handle in one piece, so it needs to be split. RecursiveCharacterTextSplitter splits the document into chunks of a fixed size. Let's split the transcription into chunks of 1000 characters with an overlap of 20 characters:

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
text_transcription_documents = text_splitter.split_documents(text_transcription)
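
    You can quickly inspect what the splitter produced; the snippet below is just a sanity check, not part of the pipeline:

# Check how many chunks were produced and preview the first one
print(len(text_transcription_documents), "chunks")
print(text_transcription_documents[0].page_content[:200])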
  8. Set up a vector store

We need an efficient way to store the document chunks and their embeddings, and to run similarity searches over them. To do this, we'll use a vector store: a database of embeddings that specializes in fast similarity searches.

from langchain_openai.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

from langchain_community.vectorstores import DocArrayInMemorySearch

vectorstore = DocArrayInMemorySearch.from_documents(text_transcription_documents, embeddings)
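
To confirm the store is working, you can run a similarity search directly (the query below is only a placeholder; use something the video actually covers):

# Return the chunks most similar to a sample query
results = vectorstore.similarity_search("What is the main topic?", k=3)
for doc in results:
    print(doc.page_content[:100])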
  9. Use Pinecone

DocArrayInMemorySearch is an in-memory vector store; to handle large amounts of data and perform similarity searches at scale, we need something more durable. For this example we'll use Pinecone: create an account, set up an index, get an API key, and set it as the environment variable PINECONE_API_KEY. Then we can load the transcription documents into Pinecone:

from langchain_pinecone import PineconeVectorStore

index_name = "FROM PINECONE CONFIGURATION"  # replace with your Pinecone index name

pinecone = PineconeVectorStore.from_documents(
    text_transcription_documents, embeddings, index_name=index_name
)
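
The same kind of check works against Pinecone through the retriever interface the chain will use; in recent LangChain versions a retriever is a Runnable, so invoke returns a list of documents:

# Wrap the Pinecone store in the retriever interface used by the chain
retriever = pinecone.as_retriever()
retrieved_docs = retriever.invoke("What is the main topic?")
print(len(retrieved_docs), "documents retrieved")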
  10. Define the chain

    This chain processes questions and answers using the language model and the retrieval system set up above.

from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain.prompts import ChatPromptTemplate

template = """
Answer the question based on the context below.
If you can't answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
parser = StrOutputParser()

chain = (
    {"context": pinecone.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)
  • The first step builds a dictionary: pinecone.as_retriever() retrieves the relevant context, while RunnablePassthrough passes the question through unchanged.

  • That dictionary is then piped (|) to the prompt object, which fills the template's context and question placeholders to produce the final prompt for the language model.

  • The generated prompt is piped to the language model (model) for processing.

  • Finally, the model's output is piped to the output parser (parser), which returns it as a plain string. (A sketch of invoking the prompt stage on its own follows this list.)
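
To see what flows into the model, you can invoke the prompt stage on its own with toy values (the context and question below are made up for illustration):

# Run only the prompt stage to inspect the message it builds
example = prompt.invoke({
    "context": "The video explains retrieval augmented generation.",
    "question": "What does the video explain?",
})
print(example.to_messages())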

  11. Ask a question

    chain.invoke("Ask your question?")
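
    For example (this question is a placeholder; ask anything the video covers):

# The chain returns a plain string thanks to StrOutputParser
answer = chain.invoke("What is the main topic of the video?")
print(answer)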
    

Conclusion

This blog post demonstrates a basic framework for building a question-answering system for YouTube videos using Python and GPT-3.5. Experiment with the prompt template and explore additional functionality to enhance the system!
