How to Create a Voice-Controlled Web App Using JavaScript and Web Speech API

Voice-controlled applications are rapidly becoming an essential part of modern user interfaces. From smart assistants like Alexa to voice commands in mobile apps, interacting with technology via voice is no longer a futuristic idea — it’s here and now. But did you know that you can build your own voice-controlled web app with just JavaScript and the Web Speech API?

In this guide, we'll walk you through the steps of creating a simple voice-controlled web app that listens to your commands and performs actions based on them. You'll learn how to integrate the Web Speech API into your project and understand its key features.


What is the Web Speech API?

The Web Speech API provides two main functionalities:

  1. Speech Recognition: Converts spoken words into text.

  2. Speech Synthesis: Converts text into spoken words (text-to-speech).

In this tutorial, we'll focus on Speech Recognition to turn your voice into actionable commands in a web app.
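
For completeness, here is a minimal sketch of the synthesis side using the built-in speechSynthesis interface. We won't use it in the rest of this tutorial; it's shown only to illustrate the API's second half:

// Minimal text-to-speech sketch (not used later in this tutorial)
const utterance = new SpeechSynthesisUtterance('Hello! Your voice app is ready.');
utterance.lang = 'en-US';                 // speak in US English
window.speechSynthesis.speak(utterance);  // queue the utterance for playback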


Setting Up the Speech Recognition API

The Web Speech API's speech recognition is supported in most Chromium-based browsers, such as Google Chrome and Edge, and in Safari, but not currently in Firefox. The main interface you'll work with for speech recognition is SpeechRecognition (exposed as webkitSpeechRecognition in browsers that still use the vendor prefix). Recognition also requires a secure context (HTTPS or localhost), and the browser will prompt the user for microphone permission.

To get started, let’s create a basic HTML page with JavaScript to listen to voice commands and display recognized text.


Step 1: Basic HTML Structure

Here’s a simple HTML structure with a button to start voice recognition and a div to display the recognized text:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Voice Controlled Web App</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            padding: 20px;
            background-color: #f4f4f4;
        }

        #output {
            margin-top: 20px;
            padding: 10px;
            border: 1px solid #ccc;
            background-color: white;
            width: 100%;
            max-width: 500px;
            min-height: 50px;
        }

        button {
            padding: 10px 20px;
            background-color: #007BFF;
            color: white;
            border: none;
            cursor: pointer;
        }

        button:hover {
            background-color: #0056b3;
        }
    </style>
</head>
<body>

    <h1>Voice-Controlled Web App</h1>
    <p>Click the button and speak. Your voice will be recognized and displayed as text!</p>

    <button id="startBtn">Start Voice Recognition</button>

    <div id="output">Recognized text will appear here...</div>

    <script src="app.js"></script>

</body>
</html>

This basic setup includes:

  • A button to start listening to voice input.

  • A div to display the recognized speech text.


Step 2: Using the Web Speech API

Now, let's implement the speech recognition logic in JavaScript. Create a file called app.js and add the following code:

// Check for browser compatibility
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

if (SpeechRecognition) {
    console.log('Your browser supports speech recognition.');

    // Create a new instance of SpeechRecognition
    const recognition = new SpeechRecognition();

    // Set recognition properties
    recognition.continuous = true; // Keep listening until manually stopped
    recognition.interimResults = false; // Don't show interim results
    recognition.lang = 'en-US'; // Set language

    // Start recognition when the button is clicked
    const startBtn = document.getElementById('startBtn');
    const output = document.getElementById('output');

    startBtn.addEventListener('click', () => {
        recognition.start();
        console.log('Voice recognition started. Speak into the microphone.');
    });

    // Handle the result event
    recognition.addEventListener('result', (event) => {
        const transcript = event.results[event.resultIndex][0].transcript;
        output.textContent = transcript; // Display the recognized speech
        console.log('Recognized Text:', transcript);

        // Perform actions based on voice commands (optional)
        if (transcript.toLowerCase().includes('hello')) {
            output.textContent += ' - You said hello!';
        }
    });

    // Handle errors
    recognition.addEventListener('error', (event) => {
        console.error('Speech recognition error:', event.error);
    });

} else {
    console.log('Speech recognition is not supported in this browser.');
}

Key Components:

  1. SpeechRecognition: We check for browser support by looking for the SpeechRecognition constructor, falling back to the prefixed webkitSpeechRecognition.

  2. recognition.start(): This starts the speech recognition process.

  3. recognition.addEventListener('result'): This event fires every time speech is recognized, allowing us to capture the recognized text and display it.
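
One optional tweak: if you want the output to update while the user is still speaking, you can enable interim results. A small variation, assuming the same recognition and output variables, would be to replace the result handler above with something like this:

recognition.interimResults = true; // deliver partial results as they arrive

recognition.addEventListener('result', (event) => {
    const result = event.results[event.resultIndex]; // the result that just changed
    const transcript = result[0].transcript;
    // Show a trailing ellipsis while the result is still provisional
    output.textContent = result.isFinal ? transcript : transcript + '...';
});

Note that even with continuous set to true, some browsers (Chrome in particular) may stop listening after a stretch of silence; if you want the app to keep listening, you can handle the recognition 'end' event and call recognition.start() again.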


Step 3: Adding Voice Commands

Now that we can recognize speech, let's take it one step further by having the web app respond to specific commands. For example, we can listen for the keyword "background" and change the page's background color. Update the result handler from Step 2 to check for it:

recognition.addEventListener('result', (event) => {
    const transcript = event.results[event.resultIndex][0].transcript;
    output.textContent = transcript; // Display the recognized speech

    // Perform actions based on voice commands
    if (transcript.toLowerCase().includes('background')) {
        document.body.style.backgroundColor = getRandomColor();
        output.textContent += ' - Background color changed!';
    }
});

// Function to generate a random color
function getRandomColor() {
    const letters = '0123456789ABCDEF';
    let color = '#';
    for (let i = 0; i < 6; i++) {
        color += letters[Math.floor(Math.random() * 16)];
    }
    return color;
}

In this example:

  • If the user says something containing the word "background," the background color of the page will change to a random color.

  • We use a simple getRandomColor() function to generate the new background color.


Step 4: Extending the App with More Commands

You can extend the app with additional voice commands to perform more actions. Here are some ideas:

  • Voice navigation: Use commands like "go to about page" to navigate to different sections of your website.

  • Form filling: Allow users to dictate form data such as name or email.

  • Media controls: Control music or video playback using voice commands like "play," "pause," or "next."

Let’s add a voice command to control font size:

if (transcript.toLowerCase().includes('bigger text')) {
    document.body.style.fontSize = 'larger';
    output.textContent += ' - Text size increased!';
}

if (transcript.toLowerCase().includes('smaller text')) {
    document.body.style.fontSize = 'smaller';
    output.textContent += ' - Text size decreased!';
}

Now, the app responds to voice commands like “bigger text” and “smaller text” by changing the font size of the page.
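
As the number of commands grows, a long chain of if statements becomes hard to maintain. One possible way to organize things is a small command map checked inside the result handler. This is just a sketch; the handleCommand helper and the phrases in it are examples, not part of the code above:

// Sketch of a command table: each key is a phrase to listen for,
// each value is the action to run when that phrase is heard.
const commands = {
    'background': () => { document.body.style.backgroundColor = getRandomColor(); },
    'bigger text': () => { document.body.style.fontSize = 'larger'; },
    'smaller text': () => { document.body.style.fontSize = 'smaller'; },
};

function handleCommand(transcript) {
    const spoken = transcript.toLowerCase();
    for (const [phrase, action] of Object.entries(commands)) {
        if (spoken.includes(phrase)) {
            action();
            output.textContent += ` - Ran "${phrase}" command`;
        }
    }
}

// Inside the 'result' handler, call: handleCommand(transcript);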


Step 5: Handling Multiple Languages

The Web Speech API also supports multiple languages. You can change the language by modifying the recognition.lang property. For example, to recognize speech in Spanish:

recognition.lang = 'es-ES'; // Set to Spanish (Spain)

By supporting multiple languages, you can create a more inclusive and global voice-controlled experience.
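
If you want users to pick the language at runtime, you could wire the property to a drop-down. A minimal sketch, assuming a select element with id languageSelect that is not part of the HTML above:

// Hypothetical <select id="languageSelect"> with options such as "en-US", "es-ES", "fr-FR"
const languageSelect = document.getElementById('languageSelect');

languageSelect.addEventListener('change', () => {
    recognition.stop();                      // end the current session
    recognition.lang = languageSelect.value; // apply the newly chosen language
    // The next click on the start button will listen in the new language.
});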


Final Project: Voice-Controlled To-Do List

Let’s combine everything we’ve learned and build a simple voice-controlled to-do list. This app will allow users to add tasks by speaking them out loud.

HTML:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Voice-Controlled To-Do List</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            padding: 20px;
            background-color: #f4f4f4;
        }

        #output {
            margin-top: 20px;
            padding: 10px;
            border: 1px solid #ccc;
            background-color: white;
            width: 100%;
            max-width: 500px;
            min-height: 50px;
        }

        button {
            padding: 10px 20px;
            background-color: #007BFF;
            color: white;
            border: none;
            cursor: pointer;
        }

        ul {
            list-style-type: none;
        }

        li {
            padding: 10px 0;
        }
    </style>
</head>
<body>

    <h1>Voice-Controlled To-Do List</h1>
    <button id="startBtn">Start Voice Recognition</button>
    <ul id="todoList"></ul>

    <div id="output">Say "add task" to create a to-do.</div>

    <script src="app.js"></script>

</body>
</html>

JavaScript (app.js):

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition(); // assumes a supported browser; add the feature check from Step 2 in production

recognition.continuous = true;
recognition.lang = 'en-US';

const startBtn = document.getElementById('startBtn');
const output = document.getElementById('output');
const todoList = document.getElementById('todoList');

startBtn.addEventListener('click', () => {
    recognition.start();
});

recognition.addEventListener('result', (event) => {
    const transcript = event.results[event.resultIndex][0].transcript;
    output.textContent = transcript;

    if (transcript.toLowerCase().includes('add task')) {
        // Strip the "add task" trigger phrase (case-insensitively) to get the task text
        const task = transcript.replace(/add task/i, '').trim();
        if (task) {
            const li = document.createElement('li');
            li.textContent = task;
            todoList.appendChild(li);
            output.textContent += ' - Task added!';
        }
    }
});

recognition.addEventListener('error', (event) => {
    console.error('Error:', event.error);
});

Conclusion

In this guide, we built a simple voice-controlled web app with JavaScript and the Web Speech API. We set up speech recognition, displayed the recognized text, responded to specific commands such as changing the background color and adjusting the font size, looked at multi-language support, and finished with a voice-controlled to-do list. The same pattern of listening for keywords in the transcript can power far richer interactions.

With the basic foundation in place, you can now extend this further to create more sophisticated voice-controlled web applications. Whether it’s for accessibility, productivity, or just fun, the possibilities are vast!
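
As one small example of such an extension (a sketch only, not part of the code above), you could let users empty the to-do list by voice with one more check inside the result handler:

// Sketch: an extra command for the to-do app's result handler
if (transcript.toLowerCase().includes('clear list')) {
    todoList.innerHTML = '';            // remove all <li> items
    output.textContent = 'List cleared!';
}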

Happy coding!
