Python speech recognition when you are offline
Table of contents
In the first article, we talk and building a speech recognition system it uses the internet to connect to google and use its speech recognition algorithm, today in this article we going to build a speech recognition system when you are offline.
let's get started.
First of all, there is a python library called, VOSK. to install it on your computer type this command
pip3 install vosk
for more details please visit: https://alphacephei.com/vosk/install
now we need to install pyaudio:
pip install pyaudio
now our pre-requirements are done, now we have to build the offline speech recognition program. first, we need to import all the necessary
from vosk import Model, KaldiRecognizer
import pyaudio
here what we do is import the model and the kaldiRecognizer from the vosk library. along with that, we import py audio.
now we have to download the model for that go to this website and choose your preferred model and download it: https://alphacephei.com/vosk/models
here I use vosk-model-small-en-us-0.15
as my model
after downloading, you can see it is a compressed file unzip it in your root folder, like this
speech-recognition/
├─ vosk-model-small-en-us-0.15 ( Unzip follder )
├─ offline-speech-recognition.py ( python file )
now create a variable called " model " and type this
model = Model(r"C:\\Users\User\Desktop\python practice\ai\vosk-model-small-en-us-0.15")
what we do here we create a variable called " model " and assign our unzip folder location with the Model
speech-recognition/
├─ vosk-model-small-en-us-0.15 ( Unzip follder )
now we going to create a variable called " recognizer " and assign KaldiRecognizer
recognizer = KaldiRecognizer(model, 16000)
now as usual we have to import the microphone to our code so we import pyaudio
now we going to call the pyaudio to use the microphone to catch the voice.
mic = pyaudio.PyAudio()
stream = mic.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8192) stream.start_stream()
now we enter the fun part :
while True:
data = stream.read(4096)
if recognizer.AcceptWaveform(data):
text = recognizer.Result()
print(f"' {text[14:-3]} '")
what we do here is write a while loop
create a variable called data and assign it to a " stream.read
(4096)
"
data = stream.read(4096)
now we create a decision " if " recognizer caught a waveform in the data
then assign the results to the variable called " text " and print it out.
if recognizer.AcceptWaveform(data):
text = recognizer.Result()
print(text)
this is the output we got :
{
"text" : "hello"
}
{
"text" : "hello"
}
{
"text" : "hi"
}
to remove {
and "text
" changes the print command to this :
print(f"' {text[14:-3]} '")
now the output will look like this, clean and more readable:
' '
' hello '
' hi '
' '
the complete code :
from vosk import Model, KaldiRecognizer
import pyaudio
model = Model(r"C:\\Users\User\Desktop\python practice\ai\vosk-model-small-en-us-0.15")
recognizer = KaldiRecognizer(model, 16000)
mic = pyaudio.PyAudio()
stream = mic.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8192)
stream.start_stream()
while True:
data = stream.read(4096)
if recognizer.AcceptWaveform(data):
text = recognizer.Result()
print(f"' {text[14:-3]} '")
visit https://alphacephei.com/vosk/models
for more languages and models.
Subscribe to my newsletter
Read articles from Buddhi Ashen directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Buddhi Ashen
Buddhi Ashen
Hi there! My name is Buddhi Ashen and I am a student currently learning web development and design. I have a strong passion for building new things and am constantly seeking out opportunities to learn and grow as a developer. In my free time, I enjoy cycling and traveling to new places. I am excited to be a part of the Stack Overflow community and hope to contribute and learn from others in the field.