Vosk and Scala

Vosk is a speech recognition toolkit that can work offline. I thought it would be interesting to feed some text into a multimodal graph, so I started testing it.

I added two dependencies to my sbt build definition, as suggested in the documentation:

https://alphacephei.com/vosk/install

lazy val root = project
  .in(file("."))
  .aggregate(memory, hexagon, semantic)
  .settings(
    name := "binet",
    libraryDependencies ++= commonDependencies,
    libraryDependencies += "dev.zio" %% "zio" % "2.0.19",
    libraryDependencies += "net.java.dev.jna" % "jna" % "5.13.0", // <- VOSK
    libraryDependencies += "com.alphacephei" % "vosk" % "0.3.45" // <- VOSK
  )
  .dependsOn(
    memory % "test->test;compile->compile",
    hexagon % "test->test;compile->compile",
    semantic % "test->test;compile->compile"
  )

I downloaded a Russian-language model from here:

https://alphacephei.com/vosk/models

This Stack Overflow page was very helpful in getting it working:

https://stackoverflow.com/questions/68401284/use-the-microphone-in-java-for-speech-recognition-with-vosk

I had to set the correct audio format, which I hadn't done at first.
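
For reference, here is a minimal sketch of such a format using only the JDK's javax.sound.sampled classes. It assumes the values the Vosk documentation recommends for its models: 16 kHz, 16-bit, mono PCM. The object name VoskAudioFormat and its members are hypothetical names chosen for this illustration, and the right sample rate may differ for your model.

```scala
import javax.sound.sampled.{AudioFormat, AudioSystem, DataLine, TargetDataLine}

// Hypothetical helper, not part of Vosk: describes the microphone format.
object VoskAudioFormat {
  // Assumption: the model expects 16 kHz input; adjust to match your model.
  val SampleRate: Float = 16000f

  // 16-bit signed PCM, one channel (mono), little-endian byte order.
  val format: AudioFormat =
    new AudioFormat(SampleRate, 16, 1, /* signed = */ true, /* bigEndian = */ false)

  // Bytes produced per second of audio: frame size (2 bytes) times frame rate.
  val bytesPerSecond: Int = format.getFrameSize * format.getFrameRate.toInt

  // True if the system reports a microphone line that supports this format.
  def microphoneSupported: Boolean =
    AudioSystem.isLineSupported(new DataLine.Info(classOf[TargetDataLine], format))
}
```

The sample rate given to the Vosk recognizer should presumably be the same one used here, so that the recognizer and the microphone line agree on what the bytes mean.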

Recognition lags a second or two behind the speech. I suspect that's because my computer has no graphics card, but I'm not sure; it may be something else.

I'm happy that speech recognition works and is free of charge. Everything is ready for new experiments!

Written by

Aleksandr Novikov