Introduction

Hey, this is Amardeep Kumar. I have a strong interest in Artificial Intelligence, and I am looking forward to expanding my skills and being part of a community that shares my passion.

I started with Python as my first programming language and built a basic AI project. Working on it sparked a renewed sense of curiosity and energy in me, and I found myself increasingly fascinated by the possibilities of AI.

Here I will show how to make a basic voice assistant. Alright, time to suit up and feel like Tony Stark.

Project Setup:

1. Create a virtual environment:

To start this project, we need to set up a virtual environment. A virtual environment isolates your AI project so that it uses the specific library versions it needs without interfering with other projects or system-wide settings. This improves dependency management and reproducibility, and it simplifies collaboration.

Method 1: Create and select a new virtual environment directly in VS Code.

Method 2: Run this command in a terminal:

python -m venv myenv

Then activate the virtual environment (you don't need to do this if you followed Method 1). On Windows:

myenv\Scripts\activate

On macOS or Linux:

source myenv/bin/activate
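To confirm that the virtual environment is actually active, you can ask Python itself. This is a minimal sketch that relies on one fact about venv: inside a virtual environment, sys.prefix points at the environment folder, while sys.base_prefix still points at the base Python installation. The function name in_virtualenv is just an illustrative choice.

```python
import sys

def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix is the environment folder (e.g. myenv),
    # while sys.base_prefix is the Python installation it was created from.
    return sys.prefix != sys.base_prefix

if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter location:", sys.executable)
```

If the first line prints False, the terminal is still using the system Python, and libraries installed with pip will not go into myenv.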

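2. Install setuptools:

Setuptools helps pip build and install packages that are distributed as source code, which can prevent installation errors with some of the libraries used later. Since Python 3.12, new virtual environments no longer include setuptools by default, so it is worth installing it explicitly.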

pip install setuptools
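Before installing the libraries below, you may want to see which ones are already present in the active environment. This minimal sketch uses importlib.metadata from the standard library (Python 3.8+). The list holds the PyPI names of the libraries used in this project, and missing_packages is an illustrative helper name, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (the names pip installs)
REQUIRED = ["SpeechRecognition", "pyttsx3", "requests", "openai", "gTTS", "pygame"]

def missing_packages(names):
    """Return the subset of names that are not installed in the active environment."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the package is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing

if __name__ == "__main__":
    todo = missing_packages(REQUIRED)
    if todo:
        print("Install with: pip install " + " ".join(todo))
    else:
        print("All libraries are installed.")
```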

3. Libraries for the AI project, their installation commands, and usage:

SpeechRecognition (imported as speech_recognition): This library records audio, for example from a microphone, and passes it to a speech recognition engine that converts the spoken words into text. This enables applications like voice assistants and transcription services. Install it with pip install SpeechRecognition; microphone input also requires PyAudio (pip install pyaudio).

import speech_recognition as sr

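pyttsx3: Converts text into speech offline, using the speech engine built into your operating system. Install it with pip install pyttsx3.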

import pyttsx3

The following code uses both pyttsx3 (text-to-speech) and SpeechRecognition (speech-to-text). It waits for the wake word "Jarvis" and then listens for a command:

import speech_recognition as sr
import pyttsx3

# Initialize the offline text-to-speech engine once
engine = pyttsx3.init()

def speak(text):
    engine.say(text)
    engine.runAndWait()

if __name__ == "__main__":
    speak("Initializing Jarvis...")
    r = sr.Recognizer()
    while True:
        # Listen for the wake word "Jarvis"
        print("Recognizing...")
        try:
            # Obtain audio from the microphone
            with sr.Microphone() as source:
                print("Listening...")
                audio = r.listen(source, timeout=2, phrase_time_limit=1)
            word = r.recognize_google(audio)
            if word.lower() == "jarvis":
                speak("Yaa")
                # Listen for the command
                with sr.Microphone() as source:
                    print("Jarvis active...")
                    audio = r.listen(source)
                command = r.recognize_google(audio)

                # processCommand() is defined later in this tutorial
                processCommand(command)
        except Exception as e:
            print("Error: {0}".format(e))

Here we are using Google's speech recognizer through recognize_google(). It is a popular choice because it is accurate, supports many languages, and works without extra setup (the library ships a default key intended for testing), although it needs an internet connection. Other engines such as Microsoft Azure Speech, IBM Watson, or CMU Sphinx (which runs offline) can also be used through the same library.
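The loop above only reacts when the recognizer returns exactly the word "jarvis". Because the transcript sometimes contains extra words (for example "hey jarvis") or punctuation, a more forgiving check may work better. This is a sketch of a hypothetical helper, heard_wake_word, that is not part of the original code; you could use it in place of the word.lower() == "jarvis" comparison.

```python
import re

def heard_wake_word(transcript: str, wake_word: str = "jarvis") -> bool:
    """Return True if the wake word appears as a whole word in the transcript."""
    # Lowercase and split into words so "Hey Jarvis!" still matches,
    # while a longer word that merely contains "jarvis" does not.
    words = re.findall(r"[a-z']+", transcript.lower())
    return wake_word.lower() in words
```

For example, heard_wake_word("Hey Jarvis!") is True, while heard_wake_word("open google") is False.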

webbrowser: The webbrowser library in Python lets you interact with web browsers, opening URLs in the default browser or in a new tab or window. It provides functions like open(), open_new(), and open_new_tab() to control browser behavior. It is part of Python's standard library, so no installation is needed.

import webbrowser
def processCommand(c):
    if "open google" in c.lower():
        webbrowser.open("https://google.com")
    elif "open facebook" in c.lower():
        webbrowser.open("https://facebook.com")
    elif "open youtube" in c.lower():
        webbrowser.open("https://youtube.com")
    elif "open linkedin" in c.lower():
        webbrowser.open("https://linkedin.com")
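The chain of elif branches above works, but every new site means another branch. One alternative, sketched here as an assumption rather than the original design, is a lookup table: adding a site then takes a single dictionary entry. SITES, find_site, and open_site are illustrative names.

```python
import webbrowser

# Spoken site name -> URL (same sites as the elif chain above)
SITES = {
    "google": "https://google.com",
    "facebook": "https://facebook.com",
    "youtube": "https://youtube.com",
    "linkedin": "https://linkedin.com",
}

def find_site(command: str):
    """Return the URL for an 'open <site>' command, or None if no site matches."""
    c = command.lower()
    for name, url in SITES.items():
        if f"open {name}" in c:
            return url
    return None

def open_site(command: str) -> bool:
    """Open the matching site in the default browser; return True if one was found."""
    url = find_site(command)
    if url is None:
        return False
    webbrowser.open(url)
    return True
```

Inside processCommand, a call such as if open_site(c): return could then replace the four "open ..." branches.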

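We can also add a small music library that maps song names to their links, and add news updates by integrating a news API. Save the dictionary below in a separate file named musiclibrary.py so the main script can load it with import musiclibrary: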

music = {
    "stars": "https://www.youtube.com/watch?v=JQbjS0_ZfJ0",
    "suzume":"https://www.youtube.com/watch?v=qal34e9v_pk",
    "never": "https://www.youtube.com/watch?v=_Z5-P9v3F8w"
}
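Looking a song up in this dictionary is where a voice command can go wrong: if the recognizer hears a name that is not a key, music[song] raises a KeyError. This sketch keeps the original idea of using the first word after "play" as the key, but returns None instead of crashing. find_song is a hypothetical helper, and the dictionary is a copy of the one above.

```python
# A copy of the music dictionary above; keys are single words
music = {
    "stars": "https://www.youtube.com/watch?v=JQbjS0_ZfJ0",
    "suzume": "https://www.youtube.com/watch?v=qal34e9v_pk",
    "never": "https://www.youtube.com/watch?v=_Z5-P9v3F8w",
}

def find_song(command, library):
    """Return the link for 'play <song>', or None if the command or song is unknown."""
    words = command.lower().split()
    # Like the original code, use the first word after "play" as the key
    if len(words) < 2 or words[0] != "play":
        return None
    return library.get(words[1])  # .get() returns None instead of raising KeyError
```

For example, find_song("play never gonna give you up", music) returns the link stored under "never".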

Next, go to https://newsapi.org/, create an account, and copy your API key.

requests: The requests library in Python simplifies making HTTP requests (GET, POST, etc.) to interact with web services or APIs. It handles parameters, headers, cookies, and responses easily, and provides tools for tasks like authentication and session management. Install it with pip install requests.

import requests
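As a sketch of how requests can call the news API more safely, the version below passes the query values through params (so requests builds and escapes the URL), sets a timeout, and raises an error for failed responses such as an invalid key. The endpoint and the "articles"/"title" fields match NewsAPI's top-headlines response; NEWS_API_KEY, fetch_top_headlines, and headline_titles are names assumed for this example.

```python
import requests

# Placeholder: replace with the key from your newsapi.org account
NEWS_API_KEY = "your NewsAPI key"

def fetch_top_headlines(country="us", api_key=NEWS_API_KEY):
    """Call the NewsAPI top-headlines endpoint and return the decoded JSON."""
    response = requests.get(
        "https://newsapi.org/v2/top-headlines",
        params={"country": country, "apiKey": api_key},  # requests builds the query string
        timeout=10,  # avoid waiting forever if the network hangs
    )
    response.raise_for_status()  # raise an exception for 4xx/5xx responses
    return response.json()

def headline_titles(data, limit=5):
    """Extract the titles of the first `limit` articles, skipping empty ones."""
    return [a["title"] for a in data.get("articles", [])[:limit] if a.get("title")]
```

Limiting the number of titles keeps the assistant from reading out every headline one after another.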

Then we add all of this to processCommand:

import webbrowser
import requests
import musiclibrary  # the music dictionary saved in musiclibrary.py

newsapi = "your NewsAPI key"

def processCommand(c):
    if "open google" in c.lower():
        webbrowser.open("https://google.com")
    elif "open facebook" in c.lower():
        webbrowser.open("https://facebook.com")
    elif "open youtube" in c.lower():
        webbrowser.open("https://youtube.com")
    elif "open linkedin" in c.lower():
        webbrowser.open("https://linkedin.com")
    elif c.lower().startswith("play"):
        # The first word after "play" is the song name
        words = c.lower().split()
        song = words[1] if len(words) > 1 else ""
        link = musiclibrary.music.get(song)
        if link:
            webbrowser.open(link)
        else:
            speak("Sorry, I couldn't find that song")
    elif "news" in c.lower():
        r = requests.get(f"https://newsapi.org/v2/top-headlines?country=us&apiKey={newsapi}")
        data = r.json()

        # Speak the headlines
        for article in data.get('articles', []):
            speak(article['title'])

Integrating with OpenAI:

The initial setup is done. Next, we integrate the assistant with OpenAI.

At this point I was curious why I should integrate my virtual voice assistant with OpenAI at all. After some research, I found that it makes the assistant smarter and more natural: instead of reacting only to the fixed commands above, it can answer open-ended questions, follow the context of a conversation (as long as the earlier messages are sent along with each request), and respond more smoothly. It is easy to set up, works for many kinds of tasks, and OpenAI keeps releasing improved models.

My first thought was that I could handle all of this on my own. Let's see how far I can push myself.

After gathering information, I learned that building your own virtual assistant platform from scratch requires significant resources, including large investments in AI model training, cloud infrastructure, and skilled AI experts. You would need massive datasets, specialized hardware (such as GPUs), and a team for ongoing development, testing, and maintenance. Costs can easily range from tens of thousands to millions of dollars, depending on the scale, and you would also have to handle data security, privacy compliance, and a user interface. Using an existing platform like OpenAI greatly reduces this cost and complexity for most use cases.

So I decided to use OpenAI. Let’s see how it works.

What is an API?

An API (Application Programming Interface) is a set of rules that enables different software applications to communicate and interact with each other, acting as a bridge for data and services. Think of an API like a waiter: you place an order (request), the waiter delivers it to the kitchen (server), and brings back the food (response).

First, create an account on OpenAI's platform and generate an API key from your account dashboard. The key authenticates your requests when you access OpenAI's models and services, so keep it private.

Link: https://platform.openai.com/settings/organization/api-keys
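Hard-coding a key in the script, as the snippets in this post do, is convenient for a first test, but it is easy to leak the key by sharing or uploading the file. A common alternative is to store each key in an environment variable. The official OpenAI client reads OPENAI_API_KEY automatically when you create OpenAI() without an api_key argument; NEWS_API_KEY and get_api_key below are names chosen for this sketch, not something the libraries require. On Windows you can set a variable with setx OPENAI_API_KEY "your key" (it applies to terminals opened afterwards), and on macOS or Linux with export OPENAI_API_KEY="your key".

```python
import os

def get_api_key(var_name: str) -> str:
    """Read a secret API key from an environment variable instead of the source code."""
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message rather than sending an empty key
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key

# Example (hypothetical variable names for this project):
# openai_key = get_api_key("OPENAI_API_KEY")
# news_key = get_api_key("NEWS_API_KEY")
```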

openai: The OpenAI library in Python provides tools to interact with OpenAI's models, like GPT, for tasks such as text generation and summarization. It needs the API key we just created. Install it with pip install openai.

from openai import OpenAI

# Create the client once; replace the placeholder with your key
# (if api_key is omitted, the client reads the OPENAI_API_KEY environment variable)
client = OpenAI(api_key="your OpenAI API key")

def aiProcess(command):
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant named Jarvis. Please give short responses."},
            {"role": "user", "content": command},
        ],
    )

    return completion.choices[0].message.content
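The aiProcess function sends only the current command, so the model cannot see earlier questions. The Chat Completions API is stateless: to "remember context", the program has to resend the earlier messages with every request. Here is a minimal sketch of that idea, assuming the same model and system prompt as above; the Conversation class and its max_turns limit are hypothetical additions, not part of the original project.

```python
SYSTEM_PROMPT = "You are a helpful assistant named Jarvis. Please give short responses."

class Conversation:
    """Keeps recent chat history so each request includes earlier turns."""

    def __init__(self, max_turns=10):
        self.max_turns = max_turns  # how many user/assistant pairs to keep
        self.history = []           # list of {"role": ..., "content": ...} dicts

    def messages_for(self, user_text):
        """Build the messages list: system prompt, recent history, then the new question."""
        recent = self.history[-2 * self.max_turns:]
        return [{"role": "system", "content": SYSTEM_PROMPT}, *recent,
                {"role": "user", "content": user_text}]

    def remember(self, user_text, reply):
        """Store one finished exchange."""
        self.history.append({"role": "user", "content": user_text})
        self.history.append({"role": "assistant", "content": reply})

    def ask(self, client, user_text, model="gpt-4o-mini"):
        """Send the question with context through an OpenAI client and return the reply."""
        completion = client.chat.completions.create(
            model=model, messages=self.messages_for(user_text)
        )
        reply = completion.choices[0].message.content
        self.remember(user_text, reply)
        return reply
```

You would create one Conversation before the main loop and call conversation.ask(client, c) where processCommand currently calls aiProcess(c). Capping the history keeps requests small, since every resent message counts toward token usage.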

After this, we can call it from processCommand as a fallback for anything the other branches don't handle:

def processCommand(c):
    if "open google" in c.lower():
        webbrowser.open("https://google.com")
    elif "open facebook" in c.lower():
        webbrowser.open("https://facebook.com")
    elif "open youtube" in c.lower():
        webbrowser.open("https://youtube.com")
    elif "open linkedin" in c.lower():
        webbrowser.open("https://linkedin.com")
    elif c.lower().startswith("play"):
        # The first word after "play" is the song name
        words = c.lower().split()
        song = words[1] if len(words) > 1 else ""
        link = musiclibrary.music.get(song)
        if link:
            webbrowser.open(link)
        else:
            speak("Sorry, I couldn't find that song")
    elif "news" in c.lower():
        r = requests.get(f"https://newsapi.org/v2/top-headlines?country=us&apiKey={newsapi}")
        data = r.json()

        # Speak the headlines
        for article in data.get('articles', []):
            speak(article['title'])
    else:
        # Let OpenAI handle the request
        output = aiProcess(c)
        speak(output)

Text to Speech with gTTS:

The Google Text-to-Speech (gTTS) library converts text into spoken audio. We already have a text-to-speech library (pyttsx3), but gTTS usually produces more natural-sounding voices and supports many languages and accents, which makes it a good fit for user-facing applications. However, it needs an internet connection, because it sends the text to Google Translate's text-to-speech service. The library is free, but that service is not an official paid API, so Google may rate-limit your requests if you send too many. For commercial or high-volume use, Google's paid Cloud Text-to-Speech API is the officially supported option. Install gTTS with pip install gTTS.

from gtts import gTTS
def speak(text):    
    tts = gTTS(text)
    tts.save('temp.mp3')

pygame: The pygame library in Python is used for creating 2D games and multimedia applications. It provides tools for handling graphics, sound, and user input, and its beginner-friendly, cross-platform design makes it a popular choice for game development. Here we only use its mixer module to play audio: gTTS writes the speech to a temporary MP3 file, and pygame plays that file. Install it with pip install pygame.

os : The os library in Python is a standard library module that provides a way to interact with the operating system. It allows developers to perform a variety of tasks related to file and directory manipulation, process management, and environment interaction. We are using this to remove the temporary MP3 file (temp.mp3) after it has been played. This cleanup prevents unused files from accumulating on the system and ensures efficient use of storage. It provides a cross-platform way to handle file operations.

from gtts import gTTS
import pygame
import os

def speak(text):
    # Convert the text to speech and save it as a temporary MP3 file
    tts = gTTS(text)
    tts.save('temp.mp3')

    # Initialize the pygame mixer
    pygame.mixer.init()

    # Load and play the MP3 file
    pygame.mixer.music.load('temp.mp3')
    pygame.mixer.music.play()

    # Keep the program running until the audio stops playing
    clock = pygame.time.Clock()
    while pygame.mixer.music.get_busy():
        clock.tick(10)

    # Unload the file (unload() takes no arguments) so it can be deleted
    pygame.mixer.music.unload()
    # Remove the temporary MP3 file
    os.remove('temp.mp3')
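The speak() function always writes to the same temp.mp3 file, which is left behind if gTTS or pygame raises an error partway through. One way to make the cleanup more robust is Python's tempfile module combined with a context manager, as sketched here; temporary_mp3 is a hypothetical helper name. On Windows, make sure pygame.mixer.music.unload() runs before the with block ends, otherwise the file is still in use and cannot be deleted.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def temporary_mp3():
    """Yield a unique .mp3 path and delete the file afterwards, even if an error occurs."""
    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)  # gTTS and pygame open the file by name, so close our handle
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)

# Example use inside speak() (needs gTTS and pygame installed):
# with temporary_mp3() as path:
#     gTTS(text).save(path)
#     pygame.mixer.music.load(path)
#     pygame.mixer.music.play()
#     while pygame.mixer.music.get_busy():
#         pygame.time.Clock().tick(10)
#     pygame.mixer.music.unload()
```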

Conclusion:

With everything in place, we are all set to run this voice assistant! Building it has been an exciting experience. From understanding the basics to implementing key features, I’ve learned so much along the way.

As a beginner, this is just the start of my learning journey, and I’m excited to dive deeper into the world of AI and technology. There’s so much more to explore, and I look forward to improving and creating even more amazing projects in the future.

Thank you for reading!

I Just Learned How to Make a Virtual Voice Assistant Using Python
