Building a Text-to-Speech Converter Using React and the Web Speech API (window.speechSynthesis)

Introduction

Text-to-Speech (TTS) technology has become an essential tool for improving accessibility and user experience in modern applications. Whether it's assisting visually impaired users, enabling hands-free interaction, or adding voice capabilities to web applications, TTS plays a crucial role. In this post, we will build a simple yet effective TTS converter using React and the browser's SpeechSynthesis API, which is exposed as window.speechSynthesis and is part of the Web Speech API. The app lets users enter text, choose from the available voices, and listen to the synthesized speech in real time.

Understanding the SpeechSynthesis API

The SpeechSynthesis API, available in the browser as window.speechSynthesis, is the speech-output half of the Web Speech API. It lets developers convert text into speech without external libraries and provides:

  • A list of available voices based on the system's installed speech engines.

  • Control over voice properties such as pitch, rate, and volume.

  • The ability to pause, resume, or cancel speech synthesis.

This API is supported in most modern browsers and is an excellent choice for integrating TTS capabilities into web applications.
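
As a rough illustration of the features listed above, here is a minimal sketch in plain JavaScript. The speak helper and its clamp utility are hypothetical names, not part of the API, and the optional synth and Utterance parameters exist only so the function can be tried outside a browser. The clamping ranges follow the Web Speech API specification (pitch 0 to 2, rate 0.1 to 10, volume 0 to 1).

```javascript
// Keep a number inside the inclusive range [min, max].
const clamp = (value, min, max) => Math.min(max, Math.max(min, value));

// Hypothetical helper: speak `text` once, replacing anything already queued.
// `synth` and `Utterance` default to the browser globals and are injectable
// so the function can be exercised outside a browser.
function speak(
  text,
  { voice = null, pitch = 1, rate = 1, volume = 1 } = {},
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  // Feature detection: bail out when the API is not available.
  if (!synth || !Utterance) return null;

  const utterance = new Utterance(text);
  if (voice) utterance.voice = voice;
  utterance.pitch = clamp(pitch, 0, 2);
  utterance.rate = clamp(rate, 0.1, 10);
  utterance.volume = clamp(volume, 0, 1);

  synth.cancel(); // drop anything still queued or being spoken
  synth.speak(utterance);
  return utterance;
}
```

In a browser, calling speak("Hello there", { rate: 1.2, volume: 0.5 }) should read the text aloud, after which speechSynthesis.pause(), speechSynthesis.resume(), and speechSynthesis.cancel() control playback.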

Prerequisites

Before starting, ensure you have the following:

  • Basic knowledge of React.js

  • Node.js installed on your machine

  • A React project set up using Create React App

  • Tailwind CSS installed, with a tailwind.config.js file (both are covered in the setup below)

Setting Up the Project

  • Let’s start by creating a React project using Create React App.

      npx create-react-app text-to-speech-converter
      cd text-to-speech-converter
    
  • Install the Lucide icon library for React applications

      npm install lucide-react
    
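  • Install Tailwind CSS and create the config file. The init command and the @tailwind directives used in this post belong to Tailwind CSS v3, so the version is pinned here.

      npm install -D tailwindcss@3
      npx tailwindcss init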
    
  • Update the code inside the tailwind.config.js file

      /** @type {import('tailwindcss').Config} */
      export default {
        content: ["./src/**/*.{html,js}"],
        theme: {
          extend: {},
        },
        plugins: [],
      }
    
  • This is the folder structure that we get after the initial setup.

  • We will add the Tailwind directives for each of Tailwind's layers to our main CSS file, index.css

      @tailwind base;
      @tailwind components;
      @tailwind utilities;
    

Creating the Text-to-Speech Component

Inside the src folder, create a new file TextToSpeech.js and add the following code:

import { useEffect, useRef, useState } from 'react'
import { VoiceOption } from './Controls/VoiceOption';
import { Pause, Play, RotateCcw, Square, Volume2, VolumeOff } from 'lucide-react';

export const TextToSpeech = () => {
    //tracking text entered by user
    const [text, setText] = useState("");
    //state to store voices and handle them 
    const [voices, setVoices] = useState([]);
    //state to handle volume
    const [volume, setVolume] = useState(0.2);
    //state to keep track of the voice selected by the user
    const [selectedVoice, setSelectedVoice] = useState(null);
    //state variable to keep track of whether the utterance is playing
    const [isPlaying, setIsPlaying] = useState(false);
    //state variable to keep track of whether the utterance is paused
    const [isPaused, setIsPaused] = useState(false);
    //creating a reference for the current utterance
    const utteranceRef = useRef(null);

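    useEffect(() => {
        //we fetch the voices from the window.speechSynthesis API
        const loadVoices = () => {
            const availableVoices = window.speechSynthesis.getVoices().map(voice => ({
                default: voice.default,
                name: voice.name,
                lang: voice.lang,
                //the API does not expose gender, so this is only a guess based on the voice name
                gender: voice.name.toLowerCase().includes("female") ? "Female" : "Male"
            }));

            setVoices(availableVoices);
        };

        loadVoices();
        //some browsers load voices asynchronously, so we reload the list when it changes
        if(speechSynthesis.onvoiceschanged !== undefined){
            speechSynthesis.onvoiceschanged = loadVoices;
        }

        //on unmount, remove the handler and stop any speech still playing
        return () => {
            if(speechSynthesis.onvoiceschanged !== undefined){
                speechSynthesis.onvoiceschanged = null;
            }
            speechSynthesis.cancel();
        };
    },[]);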

    //handling the volume change and assigning this function to the slider we created for volume
    const handleVolumeChange = (e) => {
        setVolume(parseFloat(e.target.value));
    };

    //function to handle the voice change using the select option provided
    const handleVoiceChange = (voiceName) => {
        const voice = speechSynthesis.getVoices().find(v => v.name === voiceName);
        setSelectedVoice(voice || null);
    };


    const handlePlay = () => {
        //if text is empty we return 
        if(text.trim() === "") return;

        //we cancel the current synthesis first 
        if(utteranceRef.current){
            speechSynthesis.cancel();
        }

        //create an utterance from the current text
        const utterance = new SpeechSynthesisUtterance(text);
        utterance.volume = volume;

        //if we have selected a voice then we set that voice for the utterance
        if(selectedVoice){
            utterance.voice = selectedVoice;
        }

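        //when this utterance finishes, reset the playback state
        //(the ref check ignores events from an older, cancelled utterance)
        utterance.onend = () => {
            if(utteranceRef.current !== utterance) return;
            setIsPlaying(false);
            setIsPaused(false);
        };

        utterance.onpause = () => {
            setIsPlaying(false);
        };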

        utteranceRef.current = utterance;
        speechSynthesis.speak(utterance);
        setIsPlaying(true);
        setIsPaused(false);

    };

    //function to pause the utterance
    const handlePause = () => {
        if(isPlaying){
            speechSynthesis.pause();
            setIsPaused(true);
            setIsPlaying(false);
        }
        else{
            speechSynthesis.resume();
            setIsPaused(false);
            setIsPlaying(true);
        }
    };

    //function to reset the utterance
    const handleReset = () => {
        speechSynthesis.cancel();
        setText("");
        setIsPlaying(false);
        setIsPaused(false);
    };

    return (
        <div className="max-w-4xl mx-auto">
            <div className="mb-8 flex flex-col bg-white shadow-lg border border-gray-100 rounded-xl">
                <VoiceOption voices={voices} handleVoiceChange={handleVoiceChange}/>
                <div>
                    <textarea onChange={(e) => setText(e.target.value)} className="text-lg outline-none p-6 resize-none w-full h-64" placeholder="Enter your text here..." value={text || ""} name="text" id="text" maxLength={5000}></textarea>
                </div>
                <div className="flex justify-between gap-3 p-4 border-t border-gray-100">
                    <div className="text-sm text-gray-500 flex items-center justify-between">
                        <p>{text.length} / 5000 characters</p>
                    </div>
                    <div className="flex items-center text-gray-600 gap-2">
                        {
                        volume === 0 ? 
                            <VolumeOff onClick={() => setVolume(0.2)} className="h-5 w-5 cursor-pointer" /> 
                            : 
                            <Volume2 onClick={() => setVolume(0)} className="h-5 w-5 cursor-pointer" />
                        }
                        <input onChange={handleVolumeChange} type="range" min="0" max="1" step="0.1" value={volume} />
                        <span className="flex-1 mr-2">{Math.round(volume*100)}%</span>
                    </div>
                </div>
                <div className='flex flex-wrap justify-center gap-5 p-10'>
                    <button 
                        onClick={handlePlay} 
                        className="flex items-center justify-center gap-2 text-xl text-white w-32 p-2 rounded-xl disabled:cursor-not-allowed disabled:opacity-50 bg-blue-400" 
                        disabled={isPlaying || text.trim() === ''}
                        >
                            {isPaused ? <RotateCcw /> : <Play />}
                            {isPaused ? "Restart" : "Play"}
                    </button>
                    <button 
                        onClick={handlePause} 
                        className="flex items-center justify-center gap-2 text-xl text-white w-32 p-2 rounded-xl disabled:cursor-not-allowed disabled:opacity-50 bg-yellow-400" 
                        disabled={!isPlaying && !isPaused}
                        >
                            {isPaused ? <Play /> : <Pause />}
                            {isPaused ? "Resume" : "Pause"}
                    </button>
                    <button 
                        onClick={handleReset} 
                        className="flex items-center justify-center gap-2 text-xl text-white w-32 p-2 rounded-xl disabled:cursor-not-allowed disabled:opacity-50 bg-red-400" 
                        disabled={text.trim() === ''}
                        >
                            <Square /> Reset
                    </button>
                </div>
            </div>
        </div>
    )
};
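
The effect in the component above calls getVoices() once and again whenever onvoiceschanged fires, because some browsers populate the voice list asynchronously. The same idea can be sketched as a promise-based helper. The name waitForVoices and the 2000 ms fallback timeout are assumptions made for illustration, and the helper assumes the synth object implements EventTarget, as the specification describes; the guard falls back to the timeout where it does not.

```javascript
// Hypothetical helper: resolve with the voice list as soon as it is non-empty,
// or with whatever getVoices() returns once `timeoutMs` has elapsed.
function waitForVoices(synth = globalThis.speechSynthesis, timeoutMs = 2000) {
  return new Promise((resolve) => {
    // API unavailable: resolve with an empty list instead of throwing.
    if (!synth) return resolve([]);

    // Some browsers already have the list on the first call.
    const initial = synth.getVoices();
    if (initial.length > 0) return resolve(initial);

    const canListen = typeof synth.addEventListener === "function";
    let timer = null;

    // Runs once, from either the event or the timeout, and cleans up both.
    const finish = () => {
      clearTimeout(timer);
      if (canListen) synth.removeEventListener("voiceschanged", finish);
      resolve(synth.getVoices());
    };

    if (canListen) synth.addEventListener("voiceschanged", finish);
    timer = setTimeout(finish, timeoutMs);
  });
}
```

Inside a component, this could replace the direct getVoices() call, for example waitForVoices().then(setVoices) within the effect, though the original onvoiceschanged approach also keeps the list updated if voices change later.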

Creating the VoiceOption.js component

The import in TextToSpeech.js expects this file at src/Controls/VoiceOption.js:

export const VoiceOption = ({voices, handleVoiceChange}) => {
    return (
        <div className="p-4 border-b border-gray-100 overflow-hidden">
            <select onChange={(e) => handleVoiceChange(e.target.value)} className="w-2/3 bg-gray-50 text-slate-700 border rounded-lg py-2 px-4 outline-indigo-500 focus:outline-none focus:ring-2 focus:ring-indigo-500" name="voice" id="voice">
                <option value="">Default Voice</option>
                {voices.map((voice,index) => (
                    <option key={index} value={voice.name}>{voice.name} - {voice.lang} </option>
                ))}
            </select> 
        </div>
    )
};
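
On systems with many installed voices, the flat list in the select can get long. One possible refinement, sketched here under the assumption that each voice object carries the name and lang fields built in TextToSpeech.js, is to group voices by language before rendering. The helper name groupVoicesByLang is hypothetical.

```javascript
// Hypothetical helper: group voice objects by language tag, with languages
// and the voices inside each language sorted alphabetically.
function groupVoicesByLang(voices) {
  const groups = new Map();
  for (const voice of voices) {
    const lang = voice.lang || "unknown";
    if (!groups.has(lang)) groups.set(lang, []);
    groups.get(lang).push(voice);
  }

  return [...groups.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([lang, list]) => ({
      lang,
      // Copy before sorting so the caller's array is left untouched.
      voices: [...list].sort((x, y) => x.name.localeCompare(y.name)),
    }));
}
```

Each returned group could then be rendered as an optgroup element labelled with its language, containing one option per voice.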

Creating the Header.js component

import {Volume2} from "lucide-react";

export const Header = () => {
  return (
    <div className="text-center mb-12">
        <div className="flex justify-center items-center mb-4">
            <Volume2 className="h-8 w-8 text-indigo-600" />
            <h1 className="ml-2 text-4xl font-bold text-slate-800">VoiceFlow</h1>
        </div>
        <p className="text-slate-500">Transform your text into natural-sounding speech</p>
    </div>
  )
};

Integrating the Component into the App

Now, add the TextToSpeech and Header components to src/App.js. The import paths below assume Header.js sits next to TextToSpeech.js in the src folder; adjust them if your files live elsewhere:

import { TextToSpeech } from './TextToSpeech';
import { Header } from './Header';

function App() {
  return (
    <div className='min-h-screen bg-gradient-to-br from-indigo-100 via-purple-50 to-pink-100'>
      <div className='container mx-auto py-8 px-4'>
        <Header />
        <TextToSpeech />
      </div>
    </div>
  );
}

export default App;

Running the Application

Start the development server:

npm start

Open http://localhost:3000 in your browser. You should see a simple interface where you can enter text, select a voice, and click the "Play" button to hear the synthesized speech.

Conclusion

In this post, we built a simple yet functional Text-to-Speech converter using React and the browser's SpeechSynthesis API, with volume control plus pause, resume, restart, and reset functionality.

Try integrating this into your next project and enhance the accessibility of your web applications!

Written by

Abhishek Sadhwani