How to Build Your First AI App Using TypeScript, LangChain and Gemini

What are we building?

We are building a chatbot that answers your questions based on the context of your saved YouTube videos. You can customize this chatbot in any way you like.

Prerequisite: We are going to use TypeScript as our language; if you know JavaScript, you can follow along easily.

Understand the Flow of the application

The flow of this application goes like this:

  • The client calls the /add-yt-video route and submits the YouTube video URL to the backend server.

  • The URL is saved in the MongoDB database for future reference.

  • We then start a yt-dlp child process to extract the transcript from the YouTube video.

  • After obtaining the transcript, we divide it into chunks and, with the help of the LangChain framework, embed the data into a vector database.

    That is the workflow for storing the embeddings in the vector database. Next comes the workflow of the RAG application, described in the last bullet and sketched in code right after this list.

    • When a user asks the LLM a question, the embedding model first finds the relevant data based on the user's query. Then we send both the relevant context and our prompt to the LLM so it can answer according to that context.
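
Here is a minimal sketch of those two pipelines in TypeScript. The helper names used here (saveUrlToMongo, splitIntoChunks, embedAndStoreChunks, embedQuery, findSimilarChunks, askLLM) are placeholders for illustration only; each step is implemented concretely in the sections below.

// Ingestion pipeline: triggered by POST /add-yt-video
async function ingestVideo(url: string): Promise<void> {
  await saveUrlToMongo(url);                           // persist the URL in MongoDB
  const transcript = await getyoutubeTranscript(url);  // yt-dlp child process
  const chunks = await splitIntoChunks(transcript);    // LangChain text splitter
  await embedAndStoreChunks(chunks);                   // embeddings -> vector database
}

// Query (RAG) pipeline: triggered when the user asks a question
async function answerQuestion(question: string): Promise<string> {
  const queryVector = await embedQuery(question);        // embed the user's query
  const context = await findSimilarChunks(queryVector);  // vector similarity search
  return askLLM(question, context);                      // prompt + context -> LLM answer
}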

Step-by-step guide

Initialization of project

pnpm init
pnpm add -D typescript
pnpm exec tsc --init

These commands initialize a Node.js project with TypeScript support.
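
Running tsc --init generates a default tsconfig.json. A minimal configuration along these lines works for this project (the exact values are just a suggestion, adjust them to your setup):

{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "rootDir": "src",
    "outDir": "dist",
    "strict": true,
    "esModuleInterop": true
  }
}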

Dependencies

Install the following dependencies:

pnpm add langchain @google/generative-ai @datastax/astra-db-ts express mongoose
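
Since the server is written in TypeScript, you will also want the type definitions for Express and Node as dev dependencies:

pnpm add -D @types/express @types/node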

Add YouTube Video Route

Now, in your main server.ts file, write a POST route handler so that users can submit a YouTube URL for processing.

import express, { Request, Response } from 'express';
// i am skipping the rest of the boilerplate code here

const app = express();
app.use(express.json()); // needed so req.body is parsed as JSON

app.post('/add-yt-video', async (req: Request, res: Response) => {
    const url = req.body.url;
    // skipping error handling with try/catch (you should not skip it)
    const mongoDocument = await MongoSchema.create({ url: url });
    const transcript = await getyoutubeTranscript(url); // see the next step of the article
    mongoDocument.transcript = transcript; // depends on your MongoDB schema
    await mongoDocument.save();
    res.status(201).json({ message: "YouTube video added successfully" });
});

How will we get the YouTube video transcript?

For generating a YouTube video transcript, we can use the open-source tool yt-dlp (GitHub Repo). It offers many features, but we only need it for downloading transcripts in this case.

First, download the yt-dlp executable for your platform from the GitHub releases page; only then can you run its commands on your machine.
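
You can quickly verify the setup from a terminal before wiring it into the server (this assumes yt-dlp is available on your PATH or sitting next to your project):

yt-dlp --version
yt-dlp --write-auto-sub --sub-lang en --skip-download --output "temp/%(id)s" "https://www.youtube.com/watch?v=VIDEO_ID"

The second command is the same one we run from Node below; it downloads only the auto-generated English subtitles as a .vtt file, without downloading the video itself.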

Let's write the getyoutubeTranscript method:

import { exec } from 'child_process'; // to execute yt-dlp as a child process
import { promisify } from 'util'; // useful for promisifying callback-based functions
import * as fs from 'fs';
import * as path from 'path';

const execPromise = promisify(exec);

export const getyoutubeTranscript = async (url: string): Promise<string> => {
  // create a folder for storing the downloaded subtitle files
  const tempDir = path.join(__dirname, '../temp');
  if (!fs.existsSync(tempDir)) {
    fs.mkdirSync(tempDir, { recursive: true });
  }

  const language = 'en';
  // extract the video id so we can locate the subtitle file yt-dlp writes
  const parsedUrl = new URL(url);
  const videoId = parsedUrl.searchParams.get('v') ?? path.basename(parsedUrl.pathname);

  const ytDlpCommand = `yt-dlp --write-auto-sub --sub-lang ${language} --skip-download --output "${tempDir}/%(id)s" "${url}"`;
  try {
    await execPromise(ytDlpCommand); // executing the command
    const transcriptFile = `${tempDir}/${videoId}.${language}.vtt`;
    if (!fs.existsSync(transcriptFile)) {
      throw new Error(`Transcript file not found at ${transcriptFile}`);
    }
    const transcript = fs.readFileSync(transcriptFile, 'utf8'); // read the .vtt file
    return transcript;
  } catch (error) {
    throw new Error(`Failed to fetch transcript: ${error instanceof Error ? error.message : String(error)}`);
  }
};

Now that we have the YouTube video transcript in our database, let's write the method to save chunks of this transcript as embeddings in our vector database.

Storing Embeddings in Vector Database

You can use any vector database you like. Today, I am using AstraDB.

  • First, we will split the transcript into chunks using the LangChain text splitter. Then we will use Google's text-embedding-004 model (documentation) to create the embeddings.

      import { DataAPIClient } from "@datastax/astra-db-ts";
      import { GoogleGenerativeAI } from "@google/generative-ai";
      import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
      import * as fs from "fs";

      const {
          ASTRA_DB_KEYSPACE,
          ASTRA_DB_APPLICATION_TOKEN,
          ASTRA_DB_API_ENDPOINT,
          ASTRA_DB_COLLECTION,
          GEMINI_API_KEY
      } = process.env; // read all the credentials from your environment

      const genAI = new GoogleGenerativeAI(GEMINI_API_KEY!); // assert the key exists (validate it in real code)
      const model = genAI.getGenerativeModel({ model: "text-embedding-004" });
      const splitter = new RecursiveCharacterTextSplitter({
          chunkSize: 256, // define chunk size and overlap as per your data
          chunkOverlap: 100
      });
    
  • Now create a collection in your vector database:

      type SimilarityMetric = "dot_product" | "cosine" | "euclidean"; // read about the different similarity metrics
      const client = new DataAPIClient(ASTRA_DB_APPLICATION_TOKEN!);
      const db = client.db(ASTRA_DB_API_ENDPOINT!, { namespace: ASTRA_DB_KEYSPACE });
      const createCollection = async (similarityMetric: SimilarityMetric = "dot_product") => {
          const res = await db.createCollection(ASTRA_DB_COLLECTION!, {
              vector: {
                  dimension: 768, // depends on the output dimension of your embedding model
                  metric: similarityMetric
              }
          });
          console.log("Collection created:", res);
      };
    
  • Once the database collection is created, we can load our embeddings into the vector database.

      const loadChatData = async () => {
          const collection = await db.collection(ASTRA_DB_COLLECTION!);
          // path to the .vtt file that yt-dlp generated in the earlier step
          const transcript = fs.readFileSync("tempDir/videoID.vtt", "utf-8");
          const chunks = await splitter.splitText(transcript);
          // embed every chunk and insert it into the collection
          for (const chunk of chunks) {
              try {
                  const embeddingResponse = await model.embedContent({
                      content: {
                          role: "model",
                          parts: [{ text: chunk }],
                      }
                  });
                  const vector = embeddingResponse.embedding.values;
                  const res = await collection.insertOne({
                      $vector: vector,
                      text: chunk
                  });
                  console.log("Inserted:", res);
              } catch (error) {
                  console.error("Embedding error:", error);
              }
          }
      };
    

    Now our vector database is loaded with the embeddings of our transcript data.
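
To run this ingestion step, call the two functions in order, for example from a small one-off script (error handling kept minimal here):

// run once: create the collection, then embed and insert the transcript chunks
const run = async () => {
    await createCollection();
    await loadChatData();
    console.log("Vector database loaded");
};

run().catch(console.error);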

Start asking your queries now

Now that we have stored the data embeddings in our vector database, we can write a method to get a response from the LLM based on our context. See the code below.

import { Request, Response } from 'express';
import { DataAPIClient } from "@datastax/astra-db-ts";
import { GoogleGenerativeAI } from "@google/generative-ai";

const client = new DataAPIClient(process.env.ASTRA_DB_APPLICATION_TOKEN!);
const db = client.db(process.env.ASTRA_DB_API_ENDPOINT!, {
  namespace: process.env.ASTRA_DB_KEYSPACE
});
// initialize your models
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const chatModel = genAI.getGenerativeModel({ model: "gemini-1.5-flash-8b" });
const embeddingModel = genAI.getGenerativeModel({ model: "text-embedding-004" });

export const chat = async (req: Request, res: Response) => {
  try {
    const { message } = req.body; // requires the express.json() middleware
    // embed the user's query
    const embeddingResponse = await embeddingModel.embedContent({
      content: {
        role: "user",
        parts: [{ text: message }],
      }
    });
    const queryVector = embeddingResponse.embedding.values;

    // search for similar data in the vector database
    const collection = await db.collection(process.env.ASTRA_DB_COLLECTION!);
    const searchResults = await collection.find(
      {},
      {
        sort: {
          $vector: queryVector
        },
        limit: 8 // modify the limit as you want
      }
    ).toArray();

    // build the context from the most similar chunks
    const context = searchResults.map(result => result.text).join("\n");

    const prompt = `You are a very smart LLM model.
    Please answer the user's query only on the basis of the given context.
    If you do not find any relevant context, answer: "No relevant context found".
    -----------------------------------------------------------------------------
    START CONTEXT
    ${context}
    END CONTEXT
    ------------------------------------------------------------------------------
    Based on the context above, please respond to: ${message}
    `;

    const result = await chatModel.generateContentStream({
      contents: [{ role: "user", parts: [{ text: prompt }] }],
      generationConfig: {
        temperature: 0.85,
        topP: 0.92,
        topK: 40,
        maxOutputTokens: 250,
      }
    });

    // collect the streamed chunks into a single answer
    let answer = "";
    for await (const chunk of result.stream) {
      answer += chunk.text();
    }
    res.status(200).json({ answer });
  } catch (error) {
    console.error("Chat error:", error);
    res.status(500).json({ message: "Failed to generate a response" });
  }
};

Simply wire this method up as a POST route in any way you prefer to get answers from the LLM based on your context, as shown below. It might seem overwhelming at first, but it's not as difficult as you might think.
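
For example, reusing the Express app from earlier (the import path below is only illustrative, adjust it to wherever you keep the chat handler):

import { chat } from './controllers/chat'; // illustrative path

app.post('/chat', chat);

// the client can now POST { "message": "your question" } to /chat
// and receives { "answer": "..." } built from the stored transcript context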

Congratulations! You have just built your first AI application.

Follow my blog for more interesting articles like this.

Written by Aniket Vishwakarma

I am currently a final-year grad, grinding, learning and solving problems by leveraging technology and my soft skills.