Medical dictation/recognizers

Clayton Hann

Oct 15, 2024, 12:50:43 PM
to utterlyvoiceusers
Hey Tony, thanks for the software.  It's working really well.
I wanted to summarize which recognizers work best for medical dictation, in case people come by wondering.

Currently, Google Cloud v1 is working the best.  It is also somewhat pricey (this is relative: my dictation bill has varied from $10-25 over the past month, and I see about 100 patients a week).  Deepgram recently upgraded their medical dictation to Nova-2 and it is working far better, almost at the level of Google.  Unfortunately, Vosk and Whisper aren't faring well due to inaccuracy (Vosk, not really a medical app) and speed (Whisper has high hardware requirements).
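
For a rough sense of scale, the figures above can be turned into a per-patient cost. This is only a back-of-the-envelope sketch: it assumes the $10-25 range is a monthly bill and that every week has about 100 patients; the function name is just for illustration.

```python
# Rough per-patient dictation cost from the figures in this post.
# Assumptions: $10-25 is the bill for one month, ~100 patients per week.
WEEKS_PER_MONTH = 52 / 12  # about 4.33

def cost_per_patient(monthly_bill_usd, patients_per_week=100):
    """Return the approximate recognizer cost per patient in USD."""
    patients_per_month = patients_per_week * WEEKS_PER_MONTH
    return monthly_bill_usd / patients_per_month

low = cost_per_patient(10)
high = cost_per_patient(25)
print(f"about ${low:.3f} to ${high:.3f} per patient")
```

Under those assumptions it works out to a few cents per patient.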


Directions for the future?  Hopefully Tony will consider these recognizers:

1.  Nabla.  Total surprise here.  Even better, it's free.  It seems like it will do streaming, but I have a hard time telling.  It's free to sign up at pro.nabla.com and get an API key.  There is a quick and dirty dictation JavaScript app at https://github.com/nabla/sample-app where you can save those files locally and then just try it out.  It worked pretty darn well when I tried it, but it's not as fast as Google.  Allegedly it uses Azure plus a trained vocabulary (https://www.nabla.com/blog/speech-to-text/).

2.  Azure.    I think Tony said he would try to get this in at some point. 

Thanks again for the software.  It's so nice not to be a slave to Dragon Medical.


Utterly Voice

Oct 15, 2024, 3:32:32 PM
to Clayton Hann, utterlyvoiceusers
Hey Clayton,

Thank you for this summary! It is great to hear that Utterly Voice is working well for you.

Yes, Google Cloud can be a bit pricey. In case you don't already, you should always use "stop listening" instead of "pause listening" when you can, because utterances are still sent to the recognizer while the microphone is paused. We are assuming that cloud speech recognition will get cheaper over time due to the growing number of providers. We recently switched from Google Cloud to Digital Ocean for our cloud storage, and our monthly bill went from $25 to $5. Competition is good :-)

Yes, Deepgram has made some significant improvements lately in both accuracy and latency. It might be good to retry them periodically in case they surpass Google.

Nabla looks interesting. I have added that to our task list for future investigation. It does support streaming, which is great. However, it does not support streaming raw binary audio data. It supports streaming of base64 text audio data, which can increase latency somewhat. It still looks worth trying.
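
To give a feel for the base64 overhead, here is a minimal sketch that compares the size of one raw audio chunk with its base64 text form. The audio format (16 kHz, 16-bit mono PCM, 100 ms chunks) is an assumption picked for illustration, not something taken from Nabla's documentation, and the function name is hypothetical.

```python
import base64

# Assumed audio format for illustration only (not from Nabla's docs).
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2   # 16-bit mono PCM
CHUNK_MS = 100

def chunk_sizes(sample_rate_hz=SAMPLE_RATE_HZ,
                bytes_per_sample=BYTES_PER_SAMPLE,
                chunk_ms=CHUNK_MS):
    """Return (raw_bytes, base64_bytes) for one chunk of silent audio."""
    raw = bytes(sample_rate_hz * bytes_per_sample * chunk_ms // 1000)
    encoded = base64.b64encode(raw)
    return len(raw), len(encoded)

raw_len, b64_len = chunk_sizes()
print(f"raw: {raw_len} bytes, base64: {b64_len} bytes, "
      f"ratio: {b64_len / raw_len:.2f}")
```

Base64 turns every 3 bytes into 4 characters, so each chunk is roughly a third larger on the wire, on top of the encode and decode steps at each end.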

We are starting to review Azure right now. It looks very promising in general. It appears that they do not offer any medical-specific models. They do, however, provide a nice interface for creating custom models, and this is what they recommend to users looking for medical dictation. This might require more work for your setup, but it might result in improved accuracy, because it will be trained on your voice, and on the terms you use frequently.

-Tony
