very small language model

jbrow...@gmail.com

Dec 11, 2023, 3:34:29 PM
to kaldi-help
All-

Does anyone know of a small language model that can run on one or two x86 cores, with an update rate of about 1/2 sec, and correct sound-alike word errors? We are running Kaldi on small form-factor x86 (pico-ITX) for robotics and first-responder applications (which must run without cloud connectivity), and it works well, but in the presence of noise, different speakers, etc., we get errors such as:

in the early days a king rolled the stake

which should be corrected to:

in the early days a king ruled the state

Of course ChatGPT et al. can do this easily, but they vastly exceed our form-factor constraints. I've tried Hugging Face demos, and I've emailed top execs and researchers at a long list of AI outfits (stability.ai, thirdai, OpenAI, Scale AI, etc.) and universities, but no luck so far. It seems their focus is only on large language models.
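To make the kind of correction described above concrete, here is a minimal sketch of one lightweight approach: rescoring sound-alike alternatives with a tiny add-one-smoothed bigram model. It is not something Kaldi provides in this form. The training text `CORPUS`, the confusion sets in `CONFUSIONS`, and the function names are all assumptions made up for illustration; a real system would train on in-domain text and might derive confusable words from a pronunciation lexicon.

```python
import math
from collections import Counter
from itertools import product

# Toy training text (assumption); replace with in-domain text.
CORPUS = [
    "in the early days a king ruled the state",
    "the king ruled the state for many years",
    "the queen ruled the land",
    "he rolled the dice",
    "she drove the stake into the ground",
]

# Hypothetical sound-alike sets, hand-written for this example.
CONFUSIONS = {
    "rolled": ["rolled", "ruled"],
    "stake": ["stake", "state"],
}


def train(corpus):
    """Count unigrams and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        words = ["<s>"] + line.split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def score(words, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a word sequence."""
    vocab = len(unigrams)
    padded = ["<s>"] + list(words) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(padded, padded[1:])
    )


def correct(sentence, unigrams, bigrams):
    """Pick the highest-scoring combination of sound-alike alternatives.

    Exhaustive search: fine for short utterances with a few confusable
    words, but the number of candidates grows exponentially.
    """
    options = [CONFUSIONS.get(w, [w]) for w in sentence.split()]
    best = max(product(*options), key=lambda c: score(c, unigrams, bigrams))
    return " ".join(best)


if __name__ == "__main__":
    unigrams, bigrams = train(CORPUS)
    print(correct("in the early days a king rolled the stake", unigrams, bigrams))
    # prints: in the early days a king ruled the state
```

A model like this is only a few counters in memory and scores a short utterance very quickly, but it can only fix errors that appear in the confusion sets and are covered by the training text.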

Thanks for any advice.

-Jeff

nshm...@gmail.com

Dec 22, 2023, 7:16:38 PM
to kaldi-help
Overall you can't expect great accuracy in noise from a small model.

You can try k2/sherpa models like below; they are more accurate. The int8 models are compact (below 200 MB).


jbrow...@gmail.com

Jan 11, 2024, 4:25:26 PM
to kaldi-help
Hi Nickolay-

Thanks for your reply. Yes, within limited form-factor applications (limited processing and memory), the differences between Kaldi, Whisper, etc. are smaller, and the typical environment for small form-factor systems tends to be noisy with multiple talkers. In that case one or more downstream small language models (SLMs) can be key, especially where translation to machine-readable commands (e.g. ROS) is required. A consensus of 2 out of 3 SLMs is desirable. We can't tell the forklift to stop immediately unless we are really sure of what was said (well, we can, but don't wanna do that very often).
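The 2-out-of-3 consensus idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `consensus`, the `quorum` parameter, and the normalisation rule (lowercase, collapsed whitespace) are hypothetical, and the three input strings stand in for the outputs of three independent correctors.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus(outputs: Sequence[str], quorum: int = 2) -> Optional[str]:
    """Return the text that at least `quorum` correctors agree on, else None."""
    # Normalise case and whitespace so trivial differences do not split the vote.
    normalised = [" ".join(o.lower().split()) for o in outputs]
    if not normalised:
        return None
    text, votes = Counter(normalised).most_common(1)[0]
    return text if votes >= quorum else None


if __name__ == "__main__":
    # Hypothetical outputs from three correctors for the same utterance.
    agreed = consensus(["stop the forklift", "Stop the  forklift", "halt the forklift"])
    print(agreed)  # prints: stop the forklift
```

Returning `None` when there is no quorum leaves the decision to the caller, for example asking the speaker to repeat the command rather than acting on it.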

I took a quick look at the k2/sherpa model link. Is there a way to run these on text input only, assuming speech recognition has already occurred? I.e., as a "language model only"?

-Jeff