Testing VOSK


Bob Smith

Aug 25, 2022, 1:40:13 PM
to hbrob...@googlegroups.com
VOSK is a Linux package that does voice recognition. It
runs stand-alone (no internet needed) and can run on a
Raspberry Pi 3 or 4. It is implemented as a single
large shared object library and has APIs and examples
for all the common programming languages. You can get
it at:
https://alphacephei.com/vosk/

This note describes how I tested VOSK, gives the results,
and makes a few suggestions in case you want to build a
voice command set for your robot.


Testing
I previously wanted to experiment with neural nets using
isolated word recognition as the goal. I recorded 60
samples each of 32 words. The 1920 samples were augmented
using sox to get 158112 training samples. A script ran
all the training samples through VOSK to see if it could
recognize the augmented words. This test used the small
English VOSK model.
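
A test script along these lines could look like the sketch below. It is
not my actual script, just a minimal illustration using the VOSK Python
bindings (Model, KaldiRecognizer, AcceptWaveform, FinalResult). The
directory layout samples_dir/<word>/*.wav, the helper names, and the
"exact match or it is an error" rule are all assumptions made for the
example. The WAV files are assumed to be 16-bit mono PCM.

```python
import json
import wave
from collections import Counter
from pathlib import Path


def recognize_wav(path, model):
    """Return the text VOSK hears in one 16-bit mono PCM WAV file."""
    # Imported lazily so the scoring helper below works without vosk installed.
    from vosk import KaldiRecognizer

    with wave.open(path, "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            rec.AcceptWaveform(data)
    # FinalResult() returns a JSON string; the transcript is under "text".
    return json.loads(rec.FinalResult()).get("text", "")


def is_error(expected, heard):
    """Assumed scoring rule: anything but the exact expected word is an error."""
    return heard.strip().lower() != expected.lower()


def score_directory(samples_dir, model_dir):
    """Return {word: (errors, total)} for WAVs stored as samples_dir/<word>/*.wav.

    The directory layout is hypothetical, chosen only for this sketch.
    """
    from vosk import Model  # needs `pip install vosk` and an unpacked model

    model = Model(model_dir)
    errors, totals = Counter(), Counter()
    for wav in sorted(Path(samples_dir).glob("*/*.wav")):
        word = wav.parent.name
        totals[word] += 1
        if is_error(word, recognize_wav(str(wav), model)):
            errors[word] += 1
    return {w: (errors[w], totals[w]) for w in totals}
```

Calling score_directory("samples", "path/to/small-english-model") would
then give the per-word counts needed for an error table; both paths are
placeholders.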


Results
Each word had 4941 samples. The percent error rates for
the test vocabulary were:
00.00 center
00.00 close
00.00 for
00.00 one
00.00 right
00.00 robot
00.00 six
00.00 status
00.00 three
00.00 yes
00.00 zero
00.02 no
00.04 motor
00.08 open
00.12 forward
00.12 seven
00.24 on
00.28 speed
00.38 go
00.72 set
00.93 five
01.03 stop
01.11 nine
01.29 left
01.43 down
01.61 back
01.88 turn
02.00 two
03.13 up
04.79 off
05.05 rotate
09.73 eight
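
For anyone checking the arithmetic, the short sketch below reproduces
the sample counts and shows how a percentage in the table relates to a
raw error count. The error counts it prints are back-calculated from
the rounded percentages, so treat them as approximate, not as logged
data. The function names are made up for the example.

```python
WORDS = 32
RECORDINGS_PER_WORD = 60
TOTAL_AUGMENTED = 158112

originals = WORDS * RECORDINGS_PER_WORD   # recordings before augmentation
per_word = TOTAL_AUGMENTED // WORDS       # augmented samples per word


def percent_error(errors, total=per_word):
    """Percent of samples for one word that were misrecognized."""
    return 100.0 * errors / total


def implied_errors(percent, total=per_word):
    """Approximate error count behind a rounded percentage (an estimate)."""
    return round(percent * total / 100.0)


print(originals, per_word)
for word, pct in [("no", 0.02), ("stop", 1.03), ("eight", 9.73)]:
    n = implied_errors(pct)
    print(f"{percent_error(n):05.2f} {word} (about {n} of {per_word})")
```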


Notes and Suggestions
- While my samples were isolated words, VOSK can recognize
sentences as well. For example, "two.....three......four"
(long pauses) would be recognized as the separate words
"two", "three", "for", while "two..three..four" (short
pauses) would be seen as the single phrase "two three for".

- Some errors were limited to a few substitutions. For
example, "stop" was erroneously seen as "stomp" and "staff".
You could work around these errors in your grammar/parser
by defining:
STOP = "stop" | "stomp" | "staff" # now 0% errors

- The augmentation used different values of volume, speed,
pitch, and noise in a script. Examples of the sox commands
are:
sox in.wav out.wav vol 1.1
sox in.wav out.wav speed 1.01
sox in.wav out.wav pitch 20
sox in.wav noise.wav synth whitenoise vol 0.005
sox -m in.wav noise.wav out.wav
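
A script that generates commands like the ones above might be
structured as in this sketch. It only builds the same sox invocations
shown in the examples (vol, speed, pitch, synth whitenoise, and -m to
mix). The value lists other than 1.1, 1.01, 20 and 0.005 are
assumptions, as are the output file names. It applies one effect per
output file, which is a simplification and will not reproduce my exact
sample count.

```python
import subprocess


def augmentation_commands(src, stem,
                          vols=(0.9, 1.1),
                          speeds=(0.99, 1.01),
                          pitches=(-20, 20),
                          noise_vols=(0.005,)):
    """Build a list of sox command lines, one effect per output file."""
    cmds = []
    for v in vols:
        cmds.append(["sox", src, f"{stem}_vol{v}.wav", "vol", str(v)])
    for s in speeds:
        cmds.append(["sox", src, f"{stem}_speed{s}.wav", "speed", str(s)])
    for p in pitches:
        cmds.append(["sox", src, f"{stem}_pitch{p}.wav", "pitch", str(p)])
    for n in noise_vols:
        noise = f"{stem}_noise{n}.wav"
        # First make a noise file as long as the input, then mix it in.
        cmds.append(["sox", src, noise, "synth", "whitenoise", "vol", str(n)])
        cmds.append(["sox", "-m", src, noise, f"{stem}_noisy{n}.wav"])
    return cmds


def run_all(cmds):
    """Run each command; requires sox to be installed and on the PATH."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

For example, run_all(augmentation_commands("in.wav", "out")) would write
eight files next to in.wav. Keeping command construction separate from
execution makes it easy to print and inspect the commands first.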


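The STOP alias idea from the notes above can be sketched as a small
lookup step that runs on the recognized text before your command
parser sees it. The table and function names here are hypothetical.
The FOUR entry is an assumption based on "four" coming back as "for"
in the example above.

```python
# Each command token lists the transcripts that should count as that command.
ALIASES = {
    "STOP": {"stop", "stomp", "staff"},
    "FOUR": {"four", "for"},
}

# Invert the table once so lookups are a single dictionary access.
TOKEN_FOR_WORD = {word: token
                  for token, words in ALIASES.items()
                  for word in words}


def tokenize(text):
    """Map recognized words to command tokens; unknown words pass through."""
    return [TOKEN_FOR_WORD.get(w, w) for w in text.lower().split()]


print(tokenize("stomp"))
print(tokenize("two three for"))
```

The trade-off is that every alias you add is a word you can no longer
use for anything else, so this works best with a small, fixed command
vocabulary.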

I'll be at the Verbal SIG this evening and would be more
than happy to answer questions or help you get started with
VOSK.

Bob



thomas...@gmail.com

Aug 26, 2022, 1:38:01 AM
to hbrob...@googlegroups.com
We'll have to set you up to give a talk at the next meeting.