Vosk and Scala
Vosk is a speech recognition toolkit that can work offline. I thought it would be interesting to feed recognized text into a multimodal graph, so I started testing it.
I added two dependencies to my SBT project definition, as suggested in the documentation:
https://alphacephei.com/vosk/install
lazy val root = project
  .in(file("."))
  .aggregate(memory, hexagon, semantic)
  .settings(
    name := "binet",
    libraryDependencies ++= commonDependencies,
    libraryDependencies += "dev.zio" %% "zio" % "2.0.19",
    libraryDependencies += "net.java.dev.jna" % "jna" % "5.13.0", // <- VOSK
    libraryDependencies += "com.alphacephei" % "vosk" % "0.3.45" // <- VOSK
  )
  .dependsOn(
    memory % "test->test;compile->compile",
    hexagon % "test->test;compile->compile",
    semantic % "test->test;compile->compile"
  )
I downloaded a model for the Russian language from here:
https://alphacephei.com/vosk/models
This page on StackOverflow was very useful to get it working:
I had to set the correct sound format, which I hadn't done at first.
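To make the sound format point concrete, here is a minimal sketch of microphone capture feeding a Vosk recognizer. It is not the exact code from my project. The 16 kHz mono 16-bit signed little-endian PCM format is an assumption (the sample rate passed to the recognizer has to match the audio you actually send it), and the object name VoskMicSketch, the "model" directory, and the 10-second limit are made up for illustration.

```scala
import javax.sound.sampled.{AudioFormat, AudioSystem, DataLine, TargetDataLine}
import org.vosk.{Model, Recognizer}
import scala.util.Using

object VoskMicSketch {
  // Assumption: 16 kHz is the rate the downloaded model works with.
  val SampleRate: Float = 16000f

  // 16-bit samples, 1 channel (mono), signed, little-endian PCM.
  val format: AudioFormat = new AudioFormat(SampleRate, 16, 1, true, false)

  def main(args: Array[String]): Unit = {
    // Hypothetical location of the unpacked model directory.
    val modelPath = args.headOption.getOrElse("model")

    val info = new DataLine.Info(classOf[TargetDataLine], format)
    val line = AudioSystem.getLine(info).asInstanceOf[TargetDataLine]

    Using.resource(new Model(modelPath)) { model =>
      // The recognizer is told the same sample rate the line captures at.
      Using.resource(new Recognizer(model, SampleRate)) { recognizer =>
        line.open(format)
        line.start()
        try {
          val buffer = new Array[Byte](4096)
          // Listen for roughly 10 seconds, then stop (arbitrary limit).
          val deadline = System.currentTimeMillis() + 10000L
          while (System.currentTimeMillis() < deadline) {
            val n = line.read(buffer, 0, buffer.length)
            // acceptWaveForm returns true once an utterance is complete;
            // getResult then returns it as a JSON string.
            if (n > 0 && recognizer.acceptWaveForm(buffer, n))
              println(recognizer.getResult())
          }
          // Flush whatever is still buffered in the recognizer.
          println(recognizer.getFinalResult())
        } finally {
          line.stop()
          line.close()
        }
      }
    }
  }
}
```

If the format passed to the line and the sample rate passed to the recognizer disagree, the recognizer still runs but the output tends to be empty or nonsense, which is what the wrong sound format looked like for me.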
Recognition lags behind speech by a second or two. I suspect it's because my computer has no graphics card, but it may be something else. I haven't verified this.
I'm happy that speech recognition works and is free of charge. Everything is ready for new experiments!
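One contribution to the delay that is easy to rule in or out is the capture buffer: with a blocking read, the recognizer cannot see any audio until a whole chunk has been recorded. The sketch below only does the arithmetic. The 16 kHz, 16-bit mono format and the 4096-byte buffer are assumptions for illustration, and BufferLatency and millisOfAudio are hypothetical names.

```scala
import javax.sound.sampled.AudioFormat

object BufferLatency {
  // Milliseconds of audio that fit into `bytes` bytes of the given format.
  def millisOfAudio(format: AudioFormat, bytes: Int): Double =
    bytes * 1000.0 / (format.getFrameSize * format.getFrameRate)

  def main(args: Array[String]): Unit = {
    // Assumed capture format: 16 kHz, 16-bit, mono, signed, little-endian.
    val format = new AudioFormat(16000f, 16, 1, true, false)
    println(millisOfAudio(format, 4096)) // prints 128.0
  }
}
```

Under these assumptions a 4096-byte buffer holds about 128 ms of audio, so the buffer alone would not explain a delay of a second or more. The rest would have to come from somewhere else, such as the decoding itself.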
Written by Aleksandr Novikov