Creating Jarvis: Your Personal Desktop Assistant 🚀

In the world of technology, having a personal assistant can make our lives easier and more efficient. Imagine having a virtual assistant right on your desktop, ready to perform tasks, answer questions, and engage in conversations. In this blog, I'll guide you through the process of creating your own desktop assistant named Jarvis, inspired by the famous AI assistant from the Iron Man movies.

To bring Jarvis to life, we'll be using Python and several powerful libraries and APIs. So let's get started!

Prerequisites

Before we begin, make sure you have the following installed on your system:

  1. Python: You can download it from the official Python website (https://www.python.org/downloads/).

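  2. Required Libraries: Install the libraries used in this project by running the following command in your command prompt or terminal. Note that the speech recognition package is published on PyPI as SpeechRecognition (it is imported as speech_recognition), that reading from the microphone also requires PyAudio, and that the openai.Completion interface used in this post is only available in openai versions before 1.0:

pip install SpeechRecognition PyAudio "openai<1.0"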

Great! Now that we have all the necessary prerequisites, let's dive into the code and understand how Jarvis works.
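
If you want to confirm the installation before going further, a quick check is to ask Python whether it can find the modules the script imports. This is a minimal sketch, and the helper name missing_modules is made up for this example:

```python
import importlib.util

# Module names as they are imported in the code (not the PyPI package names).
REQUIRED_MODULES = ["speech_recognition", "openai"]


def missing_modules(names):
    """Return the names from `names` that Python cannot find for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

An empty result only means the modules were found; it does not test your microphone or your API key.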

Code Overview

Here's the complete code for our Jarvis assistant:

import speech_recognition as sr
import os
import webbrowser
import openai
from config import apikey
import datetime

chatStr = ""


def chat(query):
    global chatStr
    print(chatStr)
    openai.api_key = apikey
    chatStr += f"Tanay: {query}\n Jarvis: "
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=chatStr,
        temperature=1,
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )

    say(response["choices"][0]["text"])
    chatStr += f"{response['choices'][0]['text']}\n"
    return response["choices"][0]["text"]


def ai(prompt):
    openai.api_key = apikey
    text = f"OpenAI response for Prompt: {prompt} \n **************************** \n\n"
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        temperature=1,
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )

    print(response["choices"][0]["text"])
    text += response["choices"][0]["text"]
    if not os.path.exists("OpenAi"):
        os.mkdir("OpenAi")

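    # Name the file after whatever follows "intelligence" in the prompt,
    # falling back to "prompt" when nothing does.
    name = "".join(prompt.split("intelligence")[1:]).strip() or "prompt"
    with open(f"OpenAi/{name}.txt", "w") as f:
        f.write(text)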


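def say(text):
    # Uses the macOS `say` command. The text is passed through the shell unquoted,
    # so replies containing quotes or other shell characters can break it.
    os.system(f"say {text}")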


def takecommand():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        r.pause_threshold = 0.5  # Seconds of silence that mark the end of a phrase (library default: 0.8)
        audio = r.listen(source)
        try:
            print("Recognizing...")
            query = r.recognize_google(audio, language="en-in")  # For english voice
            # query = r.recognize_google(audio, language="hi-in")  # For hindi voice
            print(f"User said: {query}")
            return query
        except Exception:
            # Recognition failed (unclear audio, no network connection, etc.)
            return "Some error occurred. Sorry from Jarvis"


if __name__ == '__main__':
    print("Say something")
    say("Hello Sir I am Jarvis. How can I help you?")

    while True:
        print("Listening...")
        query = takecommand()

        # For opening Websites
        sites = [["youtube", "https://www.youtube.com"], ["wikipedia", "https://www.wikipedia.org"],
                 ["google", "https://www.google.com"]]
        for site in sites:
            if f" {site[0]}".lower() in query.lower():
                say(f"Opening {site[0]} Sir")
                webbrowser.open(site[1])

        # For opening music
        if "music" in query:
            musicPath = "---"
            os.system(f"open {musicPath}")

        # For Asking time
        elif "time" in query:
            strfTime = datetime.datetime.now().strftime("%H:%M")
            say(f"Sir the time is {strfTime}")

        # For opening any app
        elif "open facetime" in query.lower():
            os.system("open /System/Applications/FaceTime.app")

        # For accessing OpenAI
        elif "Using Artificial Intelligence".lower() in query.lower():
            ai(prompt=query)

        elif "jarvis quit" in query.lower():
            exit()

        elif "reset chat" in query.lower():
            chatStr = ""

        # For chatting with Jarvis
        else:
            print("Chatting")
            chat(query)

Now, let's break down the code into different sections and understand its functionality.

1. Importing the Required Libraries

import speech_recognition as sr
import os
import webbrowser
import openai
import datetime

In this section, we import the necessary libraries for our project. These libraries provide us with functionalities like speech recognition, accessing the operating system, opening web browsers, working with AI models, and managing date and time.

2. Initializing Variables and API Key

from config import apikey

chatStr = ""

Here, we import the API key from config.py and initialize a variable named chatStr that will store the conversation between the user and Jarvis.

3. Implementing Chat Function

def chat(query):
    global chatStr
    print(chatStr)
    openai.api_key = apikey
    chatStr += f"Tanay: {query}\n Jarvis: "
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=chatStr,
        temperature=1,
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )

    say(response["choices"][0]["text"])
    chatStr += f"{response['choices'][0]['text']}\n"
    return response["choices"][0]["text"]

The chat function is responsible for carrying out a conversation with Jarvis. It takes a query as input and generates a response using the OpenAI model.

In this function, we first append the user's query to chatStr. Then, we call the OpenAI API with the whole conversation history (chatStr) as the prompt, so the model sees earlier turns as context. The remaining parameters shape the output: temperature and top_p control how random the sampling is, max_tokens caps the length of the reply, and frequency_penalty and presence_penalty discourage repetition when set above zero. Note that OpenAI has since retired text-davinci-003, so you may need to substitute a currently available completions model.

The generated response is then passed to the say function, which calls the macOS say command to speak the response aloud. Finally, we append the response to chatStr and return it.
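
Because the whole transcript is sent as the prompt on every call, chatStr keeps growing for as long as Jarvis runs. One way to keep it bounded is to drop the oldest lines once it passes a size budget. The following is only a sketch of that idea: the helper add_turn and the max_chars budget are assumptions made for this example, and a character count is just a rough stand-in for the model's token limit.

```python
def add_turn(history, user, query, reply, max_chars=2000):
    """Append one exchange to the transcript and keep only the most recent text."""
    history += f"{user}: {query}\nJarvis: {reply}\n"
    if len(history) > max_chars:
        # Drop the oldest lines until the transcript fits the budget.
        lines = history.splitlines(keepends=True)
        while lines and sum(len(line) for line in lines) > max_chars:
            lines.pop(0)
        history = "".join(lines)
    return history


if __name__ == "__main__":
    transcript = add_turn("", "Tanay", "hi", "Hello Sir")
    transcript = add_turn(transcript, "Tanay", "time?", "It is noon", max_chars=40)
    print(transcript)
```

With a budget of 40 characters, the second call discards the first exchange and keeps only the most recent one.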

4. Implementing Other Functions

def ai(prompt):
    openai.api_key = apikey
    text = f"OpenAI response for Prompt: {prompt} \n **************************** \n\n"
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        temperature=1,
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )

    print(response["choices"][0]["text"])
    text += response["choices"][0]["text"]
    if not os.path.exists("OpenAi"):
        os.mkdir("OpenAi")

    # Name the file after whatever follows "intelligence" in the prompt,
    # falling back to "prompt" when nothing does.
    name = "".join(prompt.split("intelligence")[1:]).strip() or "prompt"
    with open(f"OpenAi/{name}.txt", "w") as f:
        f.write(text)


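def say(text):
    # Uses the macOS `say` command. The text is passed through the shell unquoted,
    # so replies containing quotes or other shell characters can break it.
    os.system(f"say {text}")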


def takecommand():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        r.pause_threshold = 0.5  # Seconds of silence that mark the end of a phrase (library default: 0.8)
        audio = r.listen(source)
        try:
            print("Recognizing...")
            query = r.recognize_google(audio, language="en-in")  # For english voice
            # query = r.recognize_google(audio, language="hi-in")  # For hindi voice
            print(f"User said: {query}")
            return query
        except Exception:
            # Recognition failed (unclear audio, no network connection, etc.)
            return "Some error occurred. Sorry from Jarvis"

In addition to the chat function, the code includes three helpers. ai sends a single prompt to OpenAI and saves the response to a text file in the OpenAi folder. say speaks text aloud through the macOS say command. takecommand records audio from the microphone and converts it to text with Google's speech recognition service.
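
Model replies often contain apostrophes and other punctuation, which a shell treats specially when the text is dropped into a command string. A possible alternative to os.system, sketched here as an assumption rather than as part of the original project, is to pass the text to say as a single argument through subprocess so that no shell is involved. The helper names build_say_command and speak are made up for this example.

```python
import shutil
import subprocess


def build_say_command(text):
    """Return the argument list for the macOS `say` command."""
    # The text is one list element, so quotes and semicolons need no escaping.
    return ["say", text]


def speak(text):
    """Speak `text` aloud if `say` is available; otherwise print it."""
    if shutil.which("say") is None:
        # Not on macOS (or `say` is missing): fall back to printing.
        print(text)
        return False
    subprocess.run(build_say_command(text), check=False)
    return True
```

On a machine without the say command, speak simply prints the text, which also makes the rest of the assistant easier to try on other platforms.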

5. Main Execution

if __name__ == '__main__':
    print("Say something")
    say("Hello Sir I am Jarvis. How can I help you?")

    while True:
        print("Listening...")
        query = takecommand()

        # todo: Add more sites
        # For opening Websites
        sites = [["youtube", "https://www.youtube.com"], ["wikipedia", "https://www.wikipedia.org"],
                 ["google", "https://www.google.com"]]
        for site in sites:
            if f" {site[0]}".lower() in query.lower():
                say(f"Opening {site[0]} Sir")
                webbrowser.open(site[1])

        # todo: Add a feature to play more specific songs like you did in websites
        # For opening music
        if "music" in query:
            musicPath = "/Users/tanayrajsrivastava/Downloads/We%20Rollin%20-%20Shubh.mp3"
            os.system(f"open {musicPath}")

        # For Asking time
        elif "time" in query:
            strfTime = datetime.datetime.now().strftime("%H:%M")
            say(f"Sir the time is {strfTime}")

        # todo: Add a feature to open more specific apps like you did in websites
        # For opening any app
        elif "open facetime" in query.lower():
            os.system("open /System/Applications/FaceTime.app")

        # For accessing OpenAI
        elif "Using Artificial Intelligence".lower() in query.lower():
            ai(prompt=query)

        elif "jarvis quit" in query.lower():
            exit()

        elif "reset chat" in query.lower():
            chatStr = ""

        # For chatting with Jarvis
        else:
            print("Chatting")
            chat(query)

The main execution starts by printing a message and speaking a welcome greeting using the say function. It then enters an infinite loop, waiting for user input. The user's query is obtained using the takecommand function, which utilizes the speech_recognition library to convert speech into text.

After obtaining the query, the code checks for specific keywords to perform various actions. For example, it opens websites based on user commands, plays music, tells the time, and even interacts with the OpenAI model for intelligent responses.
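
The website check described above can also be written as a small lookup function, which keeps the list of sites in one place and makes the matching easy to test without a microphone. This is a sketch under assumptions: the names SITES and find_site are invented for the example, and it matches whole words rather than substrings.

```python
SITES = {
    "youtube": "https://www.youtube.com",
    "wikipedia": "https://www.wikipedia.org",
    "google": "https://www.google.com",
}


def find_site(query, sites=SITES):
    """Return (name, url) for the first site named in the query, or None."""
    words = query.lower().split()
    for name, url in sites.items():
        if name in words:
            return name, url
    return None


if __name__ == "__main__":
    print(find_site("Open YouTube please"))
    print(find_site("what time is it"))
```

In the main loop, a match could then be passed straight to webbrowser.open(url). Adding a new site becomes a one-line change to the dictionary.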

Accessing the OpenAI API

To access the OpenAI API and use the model for generating responses, you'll need an API key. The API key should be stored securely and not shared publicly. In the provided code, the API key is imported from a separate configuration file named config.py using the apikey variable.

To obtain your own API key, follow these steps:

  1. Visit the OpenAI website (https://openai.com) and sign in to your account.

  2. Go to the API section and follow the instructions to create a new API key.

  3. Once you have the API key, store it in the config.py file, which should be created in the same directory as the main code. The config.py file should contain the following line:

apikey = "YOUR_API_KEY"

Replace "YOUR_API_KEY" with your actual API key.

Remember to keep your API key confidential: do not share it publicly, and do not commit config.py to a public repository.
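
If you would rather not keep the key in a file at all, a common alternative is to read it from an environment variable. The sketch below assumes the variable is called OPENAI_API_KEY, and the helper name load_api_key is made up for this example; it is not part of the original project.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY", environ=os.environ):
    """Return the API key from the environment, or raise if it is not set."""
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before starting Jarvis.")
    return key
```

You could then write apikey = load_api_key() in place of the import from config.py. Failing early with a clear message is usually easier to debug than an authentication error in the middle of a conversation.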

Conclusion

Congratulations! 🎉 You've successfully created your own desktop assistant named Jarvis. By leveraging the power of Python, speech recognition, and the OpenAI API, you've built a functional assistant capable of performing various tasks and engaging in conversations.

Feel free to explore and enhance Jarvis's capabilities by adding new features and integrating other APIs. With some creativity and further development, you can turn Jarvis into a highly personalized assistant tailored to your needs.

Remember to have fun and keep experimenting with new ideas. Enjoy your journey with Jarvis, and let your imagination soar!

Written by

Tanay Raj Srivastava

Sharing my journey as a Computer Science engineering student and my thoughts on the latest trends in tech.