Building a Resume Chatbot Using Qdrant, Wav2Lip and Groq

arnav gupta

Introduction

Creating a Resume Chatbot has always been an intriguing project for those passionate about combining natural language processing (NLP) and machine learning (ML) technologies. However, adding a layer of interactive, realistic lip synchronization opens up a new dimension for user engagement and experience. In this article, I will walk you through my journey of building a cutting-edge Resume Chatbot by leveraging Qdrant for its powerful vector search capabilities and Wav2Lip for seamless, real-time lip-syncing.

The project marries the precision of Qdrant’s search engine with the visual appeal of Wav2Lip and the fast inference speed of Groq-Llama3, resulting in a chatbot that not only understands and responds to user queries with high accuracy but also does so in a visually engaging manner. By diving into the technical intricacies, I’ll demonstrate how these technologies can be integrated to create an innovative solution that stands out amongst others in the domain of conversational AI. Whether you’re a seasoned developer or a curious AI enthusiast, this article will provide insights and practical tips to inspire your next AI endeavor.

Problem Statement

In 2024, creating a compelling, interactive user experience is crucial for any business that wants to engage and retain users. Traditional chatbots, while functional, often lack the dynamic, engaging elements needed to fully captivate users. Users demand more than text-based interactions; they seek immersive experiences that mirror human conversation.

The challenge lies in developing a chatbot that not only provides accurate and contextually relevant responses but also delivers them in a manner that feels natural and engaging. Existing solutions often fall short in one of two ways: they either lack the conversational skills and precision needed for meaningful interactions, or they fail to provide the visual and auditory cues that enhance user engagement.

Qdrant

Qdrant is a powerful vector database and similarity search engine that operates as an API service, enabling efficient search for the nearest high-dimensional vectors. With Qdrant, embeddings can be transformed into comprehensive applications for matching, searching, recommending, and much more.
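To make this concrete, here is a minimal, self-contained sketch of the create-collection / upsert / search cycle this article relies on. The collection name and vectors below are toy values for illustration only, not part of the project.

# Minimal Qdrant sketch: create a collection, upsert a few vectors, search for the nearest one.
# The "demo" collection and the vectors are illustrative placeholders.
from qdrant_client import QdrantClient
from qdrant_client.http import models

client = QdrantClient(":memory:")  # in-memory instance; point at a server URL in production
client.create_collection(
    collection_name="demo",
    vectors_config=models.VectorParams(size=4, distance=models.Distance.COSINE),
)
client.upsert(
    collection_name="demo",
    points=[
        models.PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"content": "first snippet"}),
        models.PointStruct(id=2, vector=[0.9, 0.1, 0.0, 0.2], payload={"content": "second snippet"}),
    ],
)
hits = client.search(collection_name="demo", query_vector=[0.1, 0.2, 0.3, 0.38], limit=1)
print(hits[0].payload["content"])  # -> "first snippet"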

Wav2Lip

Wav2Lip is a neural network that takes a video containing a face and lip-syncs it to an arbitrary speech audio track, even one different from the original recording. It combines several state-of-the-art neural models to synchronize the lips in the video with the new audio.
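In practice, Wav2Lip is driven through the inference.py script in its repository. The call below is a sketch with placeholder paths; the exact command used in this project appears in the "Creating Avatar Video" step further down.

!python inference.py --checkpoint_path checkpoints/wav2lip.pth --face input_face.mp4 --audio speech.wav --nosmooth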

Llama 3

Meta recently released Llama 3, one of the most powerful open large language models to date. Llama 3 comes in two sizes: Llama 3 8B, with 8 billion parameters, and Llama 3 70B, with 70 billion parameters. Despite being relatively close in size to its predecessor, Llama 2, Llama 3 focuses on quality over quantity. Trained on over 15 trillion tokens of data and utilizing advanced training techniques, Llama 3 significantly outperforms Llama 2, showcasing the benefits of enhanced data and refined training methods.
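This article accesses Llama 3 through Groq's hosted API via langchain_groq. A minimal sketch (assuming you have a valid Groq API key) looks like this; the same ChatGroq pattern is used in the "Initializing LLM" step below.

# Minimal sketch of calling Llama 3 70B through Groq (requires a valid API key).
from langchain_groq import ChatGroq

llm = ChatGroq(temperature=0, model_name="llama3-70b-8192", groq_api_key="YOUR_GROQ_API_KEY")
print(llm.invoke("Summarize Llama 3 in one sentence.").content)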

The rest of the article is divided into the following parts:

  • Model Capabilities

  • Requirements

  • Code and Guide

  • Explanation

  • Results

  • Conclusion

Model Capabilities

Meta has published a table comparing Llama 3's performance against other language models on various benchmarks. Here's what those benchmarks mean:

  • MMLU (Massive Multitask Language Understanding): A benchmark that measures how well a language model performs across a wide range of subjects, such as math, computer science, and law.

  • GPQA (Graduate-Level Google-Proof Q&A): Assesses a model’s ability to answer questions that are challenging for search engines to solve directly. This benchmark evaluates whether the AI can handle questions that usually require human-level research skills.

  • HumanEval: Assesses how well the model can write code by asking it to perform programming tasks.

  • GSM8K: Evaluates the model’s ability to solve math word problems.

  • MATH: Tests the model’s ability to solve challenging, competition-style high school math problems.

Requirements

  • Groq API Key

  • Basic knowledge of RAG, LLMs, and vector storage

  • Basic knowledge of the Wav2Lip library

Code and Guide

  • Module Installation
!pip -q install gradio PyPDF2 spacy langchain qdrant-client langchain-huggingface langchain_community anyio
!pip -q install tts ffmpeg-python gdown
!pip -q install opencv-python-headless
!pip -q install gtts playsound
!pip -q install langchain_groq
!pip -q install librosa==0.8.0
!pip -q install numpy torch torchvision moviepy
!apt-get install -y ffmpeg
!python -m spacy download en_core_web_sm
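The later steps assume the Wav2Lip repository is available at /content/Wav2Lip. If you are following along in Colab, cloning the official Rudrabha/Wav2Lip repository (an assumption on my part; adjust if you use a fork) sets that up:

!git clone https://github.com/Rudrabha/Wav2Lip.git /content/Wav2Lip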
  • Importing Required Modules
import gdown
import gradio as gr
from PyPDF2 import PdfReader
import os
import subprocess
import uuid
import spacy
from langchain.text_splitter import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient
from qdrant_client.http import models
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.prompts import PromptTemplate
import numpy as np
from langchain_groq import ChatGroq
from gtts import gTTS
import urllib.request
  • Downloading Required Wav2Lip Model for Avatar Video Creation
def download_file_from_google_drive(file_url, output_path):
    gdown.download(file_url, output_path, quiet=False)
    print(f"Downloaded {file_url} to {output_path}")

# Create checkpoints directory if it doesn't exist
checkpoints_dir = '/content/Wav2Lip/checkpoints'
os.makedirs(checkpoints_dir, exist_ok=True)

# Google Drive URL and output file path
google_drive_url = "https://drive.google.com/uc?id=1l18x5wRD8numA1heqjkAN8sGRmnMADrS"
output_file_path = os.path.join(checkpoints_dir, 'wav2lip.pth')

# Download the file
download_file_from_google_drive(google_drive_url, output_file_path)
  • Groq API Key Initialization
groq_api_key = "GROQ_API_KEY"  # Replace with your actual key, e.g. read it from an environment variable
  • Initializing LLM
llm = ChatGroq(temperature=0, model_name="llama3-70b-8192", groq_api_key=groq_api_key)
  • VectorStoreRetriever Class
class VectorStoreRetriever:
    def __init__(self, client, collection_name, embedding_function):
        self.client = client
        self.collection_name = collection_name
        self.embedding_function = embedding_function

    def retrieve(self, query):
        embedding = self.embedding_function.embed_documents([query])[0]
        search_result = self.client.search(
            collection_name=self.collection_name,
            query_vector=embedding,
            limit=5  # Adjust as needed
        )
        return [hit.payload['content'] for hit in search_result]
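Once the CV has been indexed (see the indexing steps below), the retriever is used like this; the query string is just an example:

# Example usage (assumes the collection has been populated, as shown later in the guide):
# retriever = VectorStoreRetriever(client, "cv_sections", embedding_function)
# top_chunks = retriever.retrieve("What is the candidate's work experience?")
# print(top_chunks)  # list of up to 5 payload 'content' strings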
  • Imports and Dependency Initialization
try:
    # The rest of the pipeline below runs inside this try block;
    # the matching except clause appears in the "Exception Handling" step at the end.
    import numpy as np
    print(f"Using numpy version {np.__version__}")
    nlp = spacy.load('en_core_web_sm')
  • Text Splitter and Embedding Function Initialization
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False
)
embedding_function = HuggingFaceEmbeddings(model_name="BAAI/bge-large-en")
  • Embedding Dimension Determination
sample_text = "This is a sample text to determine the embedding dimension."
sample_embedding = embedding_function.embed_documents([sample_text])[0]
embedding_dimension = len(sample_embedding)
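For BAAI/bge-large-en this works out to 1024 dimensions, but computing it from a sample embedding keeps the collection configuration independent of whichever embedding model you plug in.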
  • Qdrant Client Initialization
client = QdrantClient(":memory:")  # In-memory instance; pass a server URL for persistent storage
collection_name = "cv_sections"
if not client.collection_exists(collection_name):
    client.create_collection(
        collection_name=collection_name,
        vectors_config=models.VectorParams(
            size=embedding_dimension,
            distance=models.Distance.COSINE,
        ),
    )
  • Extracting CV Information
def extract_cv_info(cv_text):
    doc = nlp(cv_text)
    sections = {"skills": [], "experience": [], "education": []}
    current_section = None
    for token in doc:
        if token.text.lower() in ["skills", "experience", "education"]:
            current_section = token.text.lower()
        elif current_section:
            sections[current_section].append(token.text)
    for key in sections:
        sections[key] = " ".join(sections[key])
    return sections
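To see what this produces, here is a quick check with a toy CV string; the exact output depends on spaCy's tokenization, but it is roughly as shown in the comment:

# Toy example (illustrative only); output is approximately:
# {'skills': 'Python SQL', 'experience': 'Data Analyst at Acme', 'education': 'BTech in IT'}
sample_cv = "Skills Python SQL Experience Data Analyst at Acme Education BTech in IT"
print(extract_cv_info(sample_cv))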
  • Text Chunking and Embedding
def chunk_text(text):
    return text_splitter.split_text(text)

def generate_embeddings(text_chunks):
    return embedding_function.embed_documents(text_chunks)
  • Indexing Data in Qdrant
def index_data_in_qdrant(data, collection_name):
    points = []
    for section, content in data.items():
        chunks = chunk_text(content)
        embeddings = generate_embeddings(chunks)
        for i, embedding in enumerate(embeddings):
            points.append(models.PointStruct(
                id=str(uuid.uuid4()),
                vector=embedding,
                payload={"section": section, "content": chunks[i]}
            ))
    client.upsert(collection_name=collection_name, points=points)
  • Retrieving Data with Filter
def retrieve_data_with_filter(collection_name, filter_conditions):
    query_filter = models.Filter(
        must=[models.FieldCondition(
            key=key,
            match=models.MatchValue(value=value)
        ) for key, value in filter_conditions.items()]
    )
    # A zero query vector means results are driven by the payload filter rather than similarity
    query_vector = np.zeros(embedding_dimension, dtype=np.float32)
    search_result = client.search(
        collection_name=collection_name,
        query_vector=query_vector.tolist(),
        limit=10,
        query_filter=query_filter
    )
    return [hit.payload for hit in search_result]
  • Extracting Text from PDF
def extract_text_from_pdf(pdf_file):
    text = ""
    pdf_reader = PdfReader(pdf_file)
    for page in pdf_reader.pages:
        text += page.extract_text() or ""  # extract_text() can return None/empty for image-only pages
    return text
  • Converting CV to Structured Documents
def convert_cv_to_structured_docs(cv_text, collection_name, filter_conditions):
    structured_data = extract_cv_info(cv_text)
    index_data_in_qdrant(structured_data, collection_name)
    return retrieve_data_with_filter(collection_name, filter_conditions)
  • LLM Interaction and Query Response
template = """Answer the following question from the context
context = {context}
question = {question}
"""
prompt = PromptTemplate(input_variables=["context", "question"], template=template)

def get_context(query, retriever):
    return retriever.retrieve(query)

def respond_to_query(query, retriever):
    context = get_context(query, retriever)
    response = llm.invoke(prompt.format(question=query, context=" ".join(context)))
    return response
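With a retriever in hand (created in the Chatbot Integration step below) and a CV already indexed, answering an ad-hoc question looks like this; the query is just an example:

# Example usage (assumes 'retriever' from the Chatbot Integration step):
# answer = respond_to_query("Which programming languages does the candidate know?", retriever)
# print(answer.content)  # ChatGroq returns an AIMessage; .content holds the text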
  • Text-to-Speech Conversion
def text_to_speech(text):
    language = 'en'
    tts = gTTS(text=text, lang=language, slow=False)
    audio_file = "response.mp3"
    tts.save(audio_file)
    # Convert MP3 to WAV for Wav2Lip compatibility
    wav_output = "temp/temp.wav"
    # Ensure the temp directory exists
    os.makedirs(os.path.dirname(wav_output), exist_ok=True)
    if os.path.exists(wav_output):
        os.remove(wav_output)
    ffmpeg_command = [
        'ffmpeg', '-y', '-i', audio_file,
        '-acodec', 'pcm_s16le', '-ac', '1', '-ar', '24000', wav_output
    ]
    try:
        result = subprocess.run(ffmpeg_command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
        print("FFmpeg conversion successful")
        print(f"FFmpeg command output: {result.stdout}")
        if os.path.exists(wav_output):
            print("WAV file created successfully")
            # Return an absolute path so it stays valid after later os.chdir calls
            return os.path.abspath(wav_output)
        else:
            print("WAV file was not created.")
            return None
    except subprocess.CalledProcessError as e:
        print(f"FFmpeg conversion error: {e.stderr}")
        print(f"FFmpeg command output: {e.stdout}")
        return None
    except Exception as e:
        print(f"Unexpected error during audio conversion: {str(e)}")
        return None
  • Creating Avatar Video
def create_avatar_video(audio_file_path):
    wav2lip_dir = '/content/Wav2Lip'
    os.chdir(wav2lip_dir)
    # Define paths
    checkpoint_path = '/content/Wav2Lip/checkpoints/wav2lip.pth'
    face_path = '/content/3249935-uhd_3840_2160_25fps.mp4'
    result_path = 'results/result_voice.mp4'
    # Ensure face video exists
    if not os.path.exists(face_path):
        print(f"Face video does not exist: {face_path}")
        return None
    # Ensure audio file exists
    if not os.path.exists(audio_file_path):
        print(f"Audio file does not exist: {audio_file_path}")
        return None
    # Ensure results directory exists
    os.makedirs(os.path.dirname(result_path), exist_ok=True)
    command = [
        'python', 'inference.py',
        '--checkpoint_path', checkpoint_path,
        '--face', face_path,
        '--audio', audio_file_path,
        '--nosmooth'
    ]
    try:
        print(f"Running Wav2Lip inference with command: {' '.join(command)}")
        result = subprocess.run(command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
        print("Avatar video generation command executed successfully")
        print(f"Command output: {result.stdout}")
        print(f"Command error output: {result.stderr}")
        # Check if the video file is created
        if os.path.exists(result_path):
            print("Video file created successfully")
            # Return the absolute path; Gradio's Video output component expects a file path
            return os.path.abspath(result_path)
        else:
            print("Video file was not created.")
            return None
    except subprocess.CalledProcessError as e:
        print(f"Wav2Lip inference error: {e.stderr}")
        print(f"Command output: {e.stdout}")
        return None
    except Exception as e:
        print(f"Unexpected error during video creation: {str(e)}")
        return None
  • Chatbot Integration
retriever = VectorStoreRetriever(client=client, collection_name=collection_name, embedding_function=embedding_function)

def chatbot(cv_file, query):
    cv_text = extract_text_from_pdf(cv_file)
    filter_conditions = {}  # Define appropriate filter conditions based on requirements
    structured_docs = convert_cv_to_structured_docs(cv_text, collection_name, filter_conditions)
    response = respond_to_query(query, retriever)
    response_text = response.content  # Extract text content from the AIMessage object
    audio_file_path = text_to_speech(response_text)
    video_file_path = create_avatar_video(audio_file_path)
    return response_text, audio_file_path, video_file_path
  • Gradio Interface Setup
inputs = [
    gr.File(label="Upload CV (PDF)"),
    gr.Textbox(label="Enter your query")
]
outputs = [
    gr.Textbox(label="Response"),
    gr.Audio(label="Audio Response"),
    gr.Video(label="Avatar Video Response")
]
gr.Interface(fn=chatbot, inputs=inputs, outputs=outputs, title="CV Chatbot").launch(debug=True)
  • Exception Handling
except Exception as e:
    # Closes the try block opened in the "Imports and Dependency Initialization" step above
    print(f"An error occurred: {str(e)}")

Explanation

This code sets up a comprehensive pipeline for processing CVs / resumes and responding to queries with an AI chatbot. The pipeline includes:

  1. Initialization: Setting up API keys, models, and necessary components.

  2. Data Processing: Extracting text from PDFs, splitting text, generating embeddings, and indexing data.

  3. Retrieval: Using a vector store retriever to fetch relevant information.

  4. Response Generation: Using an LLM to generate responses based on the retrieved context.

  5. Multimedia Output: Converting responses to audio and creating an avatar video for a more engaging user experience.

  6. User Interface: Providing a simple interface for users to interact with the system.

This pipeline integrates several advanced NLP and machine learning techniques, ensuring efficient and accurate processing of CVs / resumes and generating comprehensive responses to user queries.

Results

GitHub repo for the code: https://github.com/arnavgupta16/CV-chatbot-Qdrant-Wav2lip-LLAMA3

Conclusion

In conclusion, the utilization of tools like the Groq API, Qdrant, and Wav2Lip holds immense potential in aiding individuals in their career endeavors. By harnessing these technologies, users can streamline their job application process, gain access to personalized insights, and make more informed decisions about their career paths. These tools not only enhance efficiency but also provide valuable recommendations tailored to the unique skills and aspirations of each user. As the job market becomes more competitive and dynamic, empowering job seekers with such innovative solutions will equip them with the resources they need to navigate employment opportunities effectively.


Written by

arnav gupta

Co-Founder Balmse| Building Geek Room|Matrixly|Superteams.AI|MLSA Alpha|Maharaja Surajmal Institute | BTech in IT | Hackathons Organized(x5), Won(x5),Mentored(x8), Participated(x20) |