Hello Clayton,
Thank you for the feedback and feature requests!
It is great to see that users are experimenting with other models. Our long-term vision for Utterly Voice involves offering many recognizer options. The quality and customization capabilities of these recognizers are improving every day, and we want our users to be able to take advantage of that.
Yes, Deepgram has changed their API since we built our integration. The default model should work fine, but other models may not work at the moment. This is on our list of issues to fix soon. Unfortunately, the Whisper model offered by Deepgram does not support streaming, so it is not suitable for real-time dictation.
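For anyone curious what model selection looks like at the API level, here is a minimal sketch of opening a Deepgram live-transcription stream with an explicit model parameter. This is illustrative only, not our actual implementation: it assumes the websocket-client package, and the model name, API key, and audio file are placeholders.

```python
# Minimal sketch: open a Deepgram streaming connection with an explicit model.
# Assumptions: websocket-client is installed (pip install websocket-client),
# "nova-2" is just an example model name, and utterance.raw is a hypothetical
# file of raw 16-bit mono PCM audio.
import json
import websocket

DEEPGRAM_API_KEY = "YOUR_API_KEY"  # placeholder

url = (
    "wss://api.deepgram.com/v1/listen"
    "?model=nova-2"          # the model selection discussed above
    "&encoding=linear16"     # raw 16-bit PCM
    "&sample_rate=16000"
)

ws = websocket.create_connection(
    url, header=[f"Authorization: Token {DEEPGRAM_API_KEY}"]
)

# Send one utterance worth of audio, then tell Deepgram the stream is done.
with open("utterance.raw", "rb") as f:
    ws.send_binary(f.read())
ws.send(json.dumps({"type": "CloseStream"}))

print(ws.recv())  # first message from the server (results arrive as JSON)
ws.close()
```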
Unfortunately, Google's V2 Speech-to-Text API enforces automatic endpointing. Basically, the service insists on determining the beginning and end of every utterance itself, which means all microphone audio would need to be sent to the service. That would create quite a large bill if you used dictation throughout the day. Utterly Voice currently only sends detected utterances to the service, which significantly reduces cost. We have filed a feature request with Google to allow manual endpointing, and if and when they allow it, we will definitely add support for V2. However, we believe the same models are used for both V1 and V2, because the transcription quality is the same; V2 mainly adds new features to the API.
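To make the cost reasoning concrete, here is a rough sketch of the kind of client-side endpointing described above: a local voice activity detector decides where utterances begin and end, so only those slices of audio are ever sent to a recognition service. It assumes the webrtcvad package and is not Utterly Voice's actual implementation.

```python
# Sketch of local endpointing with a voice activity detector (webrtcvad).
# Only detected speech spans would be uploaded and billed, instead of an
# all-day continuous microphone stream.
import webrtcvad

SAMPLE_RATE = 16000          # Hz, 16-bit mono PCM
FRAME_MS = 30                # webrtcvad accepts 10, 20, or 30 ms frames
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2

def detect_utterances(pcm_audio: bytes, aggressiveness: int = 2):
    """Yield contiguous spans of speech from raw PCM audio."""
    vad = webrtcvad.Vad(aggressiveness)
    utterance = bytearray()
    for start in range(0, len(pcm_audio) - FRAME_BYTES + 1, FRAME_BYTES):
        frame = pcm_audio[start:start + FRAME_BYTES]
        if vad.is_speech(frame, SAMPLE_RATE):
            utterance.extend(frame)
        elif utterance:
            yield bytes(utterance)   # end of utterance: this is all we send
            utterance = bytearray()
    if utterance:
        yield bytes(utterance)
```

Since most of a dictation day is silence or non-speech, sending only the detected utterances keeps the billed audio to a small fraction of the total microphone time.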
We are currently focused on adding another recognizer option. Stay tuned.