Hello Clayton,
Thank you for the feedback and feature requests!
It is great to see that users are experimenting with other models. Our long-term vision for Utterly Voice involves offering many recognizer options. The quality and customization capabilities of these recognizers are improving every day, and we want our users to be able to take advantage of that.
Yes, Deepgram has changed their API since we built our integration. The default model should work fine, but other models may not work at the moment. This is on our list of issues to fix soon. Unfortunately, the Whisper model offered by Deepgram does not support streaming, so it is not suitable for real-time dictation.
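For anyone curious what model selection looks like at the API level, here is a minimal sketch of opening a Deepgram live-transcription stream with an explicit model parameter. This is illustrative only, not our actual implementation: it assumes the websocket-client package, and the model name, API key, and audio file are placeholders.

```python
# Minimal sketch: open a Deepgram streaming connection with an explicit model.
# Assumptions: websocket-client is installed (pip install websocket-client),
# "nova-2" is just an example model name, and utterance.raw is a hypothetical
# file of raw 16-bit mono PCM audio.
import json
import websocket

DEEPGRAM_API_KEY = "YOUR_API_KEY"  # placeholder

url = (
    "wss://api.deepgram.com/v1/listen"
    "?model=nova-2"          # the model selection discussed above
    "&encoding=linear16"     # raw 16-bit PCM
    "&sample_rate=16000"
)

ws = websocket.create_connection(
    url, header=[f"Authorization: Token {DEEPGRAM_API_KEY}"]
)

# Send one utterance worth of audio, then tell Deepgram the stream is done.
with open("utterance.raw", "rb") as f:
    ws.send_binary(f.read())
ws.send(json.dumps({"type": "CloseStream"}))

print(ws.recv())  # first message from the server (results arrive as JSON)
ws.close()
```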
Unfortunately, Google's V2 Speech-to-Text API enforces automatic endpointing. Basically, the service insists on determining the beginning and end of every utterance itself, which means all microphone audio would need to be sent to the service. That would create quite a large bill if you used dictation throughout the day. Utterly Voice currently only sends detected utterances to the service, which significantly reduces cost. We have filed a feature request with Google to allow manual endpointing, and if and when they allow it, we will definitely add support for V2. However, we believe the same models are used for both V1 and V2, because the transcription quality is the same; V2 mainly adds new features to the API.
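To make the cost reasoning concrete, here is a rough sketch of the kind of client-side endpointing described above: a local voice activity detector decides where utterances begin and end, so only those slices of audio are ever sent to a recognition service. It assumes the webrtcvad package and is not Utterly Voice's actual implementation.

```python
# Sketch of local endpointing with a voice activity detector (webrtcvad).
# Only detected speech spans would be uploaded and billed, instead of an
# all-day continuous microphone stream.
import webrtcvad

SAMPLE_RATE = 16000          # Hz, 16-bit mono PCM
FRAME_MS = 30                # webrtcvad accepts 10, 20, or 30 ms frames
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2

def detect_utterances(pcm_audio: bytes, aggressiveness: int = 2):
    """Yield contiguous spans of speech from raw PCM audio."""
    vad = webrtcvad.Vad(aggressiveness)
    utterance = bytearray()
    for start in range(0, len(pcm_audio) - FRAME_BYTES + 1, FRAME_BYTES):
        frame = pcm_audio[start:start + FRAME_BYTES]
        if vad.is_speech(frame, SAMPLE_RATE):
            utterance.extend(frame)
        elif utterance:
            yield bytes(utterance)   # end of utterance: this is all we send
            utterance = bytearray()
    if utterance:
        yield bytes(utterance)
```

Since most of a dictation day is silence or non-speech, sending only the detected utterances keeps the billed audio to a small fraction of the total microphone time.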
We are currently focused on adding another recognizer option. Stay tuned.