So I'm looking into building a speech-to-text app for fun. I did some research and found a built-in speech-to-text API using RecognizerIntent that is free, but also found that Google is now offering a cloud Speech API that they charge for.
To sum up what I did and what I want to do: I managed to introduce a voice recognition feature into an Android application running on Android 4.2 on a tablet, and it works ok. Now I want to port my application to Google Glass, but unfortunately I get the following error when I try to start the speech recognizer: error 5 -> ERROR_CLIENT (other client-side errors). The message points me to other errors not related to the SpeechRecognizer object, but I don't get any in my logs, not even warnings. So my question would be: when exactly do I get ERROR_CLIENT? And what do the errors that block the recognizer from starting look like?
First of all, I found that SpeechRecognizer only works when my Glass is connected to the internet! Even so, I still received ERROR 5 from time to time. That was because I had bad internet connectivity, and from time to time my Glass just disconnected from the internet without any notification! I think this is an issue that must be solved in the next generation of Glass. It just shouldn't disconnect from the internet without notifying you.
The error also happens if the Google search application has no permission to use the microphone. In that case, the phone's speech recognition service will be disabled and ERROR_CLIENT will be triggered for all apps. (The above case was verified on a Samsung phone running Android 11.)
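Since `onError` only hands you a bare integer, it helps to log a readable name for each code while debugging. A minimal sketch of such a helper; the integer values below mirror the standard `android.speech.SpeechRecognizer` constants (`ERROR_CLIENT` = 5, etc.) so it can be read standalone, though on Android you would reference the constants directly:

```java
// Maps android.speech.SpeechRecognizer error codes to readable names.
// Values copied from the platform constants so this compiles off-device.
class SpeechErrors {
    static String describe(int error) {
        switch (error) {
            case 1:  return "ERROR_NETWORK_TIMEOUT: network operation timed out";
            case 2:  return "ERROR_NETWORK: other network-related error";
            case 3:  return "ERROR_AUDIO: audio recording error";
            case 4:  return "ERROR_SERVER: server sent an error status";
            case 5:  return "ERROR_CLIENT: other client-side error (e.g. no "
                          + "connectivity, recognizer unavailable, or the Google "
                          + "app lacks microphone permission)";
            case 6:  return "ERROR_SPEECH_TIMEOUT: no speech input";
            case 7:  return "ERROR_NO_MATCH: no recognition result matched";
            case 8:  return "ERROR_RECOGNIZER_BUSY: recognition service busy";
            case 9:  return "ERROR_INSUFFICIENT_PERMISSIONS: missing RECORD_AUDIO";
            default: return "Unknown error code: " + error;
        }
    }
}
```

Calling `describe(error)` from your `RecognitionListener.onError(int)` makes the two Glass failure modes above (dropped connection, missing mic permission) show up in the logs as ERROR_CLIENT by name.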
If you are after a serverless solution (i.e. inference done on the device, without connecting to a server), and you are working on low-end devices without a CUDA GPU, like smartphones, you will need a small model with fast, CPU-only inference.
Thank you so much! I saw the DeepSpeech examples, but most of them are written in Python, and I know nothing about Python. If you know anything about connecting this model (_kurdish_xlsr) to an Android app, please let me know.
AFAIK, CV is also aware of the problems for people with low connection quality. Last year they held a competition to change the current codebase into a PWA (progressive web application), which by definition would be able to work offline and would also handle low-bandwidth scenarios. But nobody stepped up for this implementation; it was a huge undertaking. But you already know about the existing Android app, right?
Are there any alternatives to Android's native speech recognition engine that can be used on-device in an app? I do mean an SDK, but one that can be used on-device (offline), as opposed to having to use a cloud-based API.
Lately I have been diving into research on voice capture on the Oculus Quest. I have been searching for a way to enable voice-to-text, and I found plugins for this like Sphinx, but that is outdated. The Google Speech Kit Plugin is another option I looked into. This plugin has all of the features I would need for my project; however, it has very little support for Android. The only functionality Android has with this plugin is text-to-speech. Why is there so little support for Android in the Google Speech Kit plugin? Is there a way to convert an audio capture of some kind to work with the Google Speech Kit? Or are there any other options for voice recognition on Android?
It looks for specific text in the notification, in which case it plays whatever TTS you tell it to in Tasker. So it can key in on completely specific ST notifications (however you have defined them to be output from ST).
Now, a word on Tasker: It uses Profiles (trigger conditions) and Tasks (actions to do). Strictly speaking, these are separate and you can mix and match Profiles and Tasks all you want (once you have a lot). We just made the all-important Profile. Next comes the Task for it, which will also be easy.
When you are finished making the Task and back out, you will be left at a little screen with what looks like a Play arrow in the lower left. Hit this to run your Task and hear/test your TTS right away (without needing to trigger the Profile). Good for testing.
Tasker is odd about letting you change the names of Profiles and Tasks. Hold your finger on the Profile or Task and it will turn blue. You will then see an A along the top of the screen for changing the name, along with other options like deleting it.
While testing, if your device is inconvenient to trigger, you can match text in anything else that generates a notification, like sending yourself an email (from an anonymous Chrome window if necessary).
New 1/31/17, Insert a Wait: You may find that your ST notification sound and your TTS arrive at the same time, which can make it hard to understand the TTS. If you space them out, you also get a little mental primer that your TTS is coming.
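The Profile/Task pair above boils down to simple logic: match the notification text, wait briefly so the TTS doesn't collide with the notification sound, then speak. A rough sketch of that flow in plain Java; the keyword, wait length, and `speak` method are all hypothetical stand-ins for whatever you configure in Tasker:

```java
// Sketch of the Tasker Profile + Task described above:
// Profile = notification text contains a keyword,
// Task    = Wait, then Say (TTS).
class NotificationTts {
    static final String KEYWORD = "Front Door";  // hypothetical trigger text
    static final long WAIT_MS = 1500;            // the "Insert a Wait" step

    // The Profile: trigger condition on the notification text.
    static boolean matches(String notificationText) {
        return notificationText != null && notificationText.contains(KEYWORD);
    }

    // The Task: wait so the TTS doesn't overlap the notification sound,
    // then speak.
    static void onNotification(String text) throws InterruptedException {
        if (!matches(text)) return;
        Thread.sleep(WAIT_MS);
        speak("Motion at the front door");
    }

    // Stand-in for Tasker's Say action.
    static void speak(String phrase) {
        System.out.println("TTS: " + phrase);
    }
}
```

The wait doubles as the "mental primer" mentioned above: the notification sound lands first, then the spoken message arrives on its own.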
I just want to throw in there you can use TTS in ST without expensive official integrated speakers. You can use android, landroid, and other options. I personally think the best is with using EchoSistant.
Just to be clear - it is possible for SharpTools to have an individual sensor give an individual TTS message, right? I have nine motion sensors but really only care about three of them for verbal alerts. So they need to be handled individually.
And in cases where the filters might still be too broad for your needs (for example, if the devices have very different names), you can use conditional logic in the Tasker task to determine whether the event should be triggered. Since you have things like the device name, attribute name, and attribute value as parameters, it is easy to set up conditional logic in Tasker that only triggers as you desire.
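As a sketch of that conditional check, here is the same idea in plain Java, using the device name, attribute name, and attribute value parameters mentioned above. The sensor names and the "three of nine" selection are hypothetical examples matching the earlier question; in Tasker you would express this with If actions on the variables the plugin passes in:

```java
import java.util.Set;

// Sketch of per-device filtering: only three of nine motion sensors
// should produce a verbal alert. Device and attribute names are
// hypothetical examples.
class MotionFilter {
    static final Set<String> SPOKEN_SENSORS =
        Set.of("Front Door Motion", "Garage Motion", "Driveway Motion");

    static boolean shouldSpeak(String device, String attribute, String value) {
        return SPOKEN_SENSORS.contains(device)
            && "motion".equals(attribute)
            && "active".equals(value);
    }
}
```

Each sensor that passes the filter can then map to its own TTS message, which answers the "individual sensor, individual message" question directly.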
As I learned in grad school, anybody that is giving constructive advice is improving your product for free and showing you other viewpoints. If one person speaks up, probably another hundred wanted that, too.
And with version four having been in the works since the day of the last release (mind you, there are two of us working on this project), there are many hundreds of hours in the app, and I get to overhaul the wiki.
For those looking for other ways to play TTS notifications through external speakers: you can use the Tasker AutoVoice plugin to cast notifications through Chromecast Audio, or any other Chromecast-enabled device.
This class provides access to the speech recognition service. This service allows access to the speech recognizer. Do not instantiate this class directly, instead, call SpeechRecognizer#createSpeechRecognizer(Context), or SpeechRecognizer#createOnDeviceSpeechRecognizer(Context). This class's methods must be invoked only from the main application thread.
The implementation of this API is likely to stream audio to remote servers to perform speech recognition. As such this API is not intended to be used for continuous recognition, which would consume a significant amount of battery and bandwidth.
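For reference, a one-shot recognition request attaches a handful of extras to the intent before the recognizer is started. This sketch shows typical extras with the literal string keys behind the `RecognizerIntent` constants (`EXTRA_LANGUAGE_MODEL` and friends) so it compiles off-device; on Android you would call `intent.putExtra(...)` on an `android.content.Intent` instead of building a map, and the particular values here are just one reasonable configuration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Typical extras for a one-shot RecognizerIntent request. The string keys
// are the literal values behind the RecognizerIntent constants.
class RecognizerExtras {
    static Map<String, Object> build() {
        Map<String, Object> extras = new LinkedHashMap<>();
        // RecognizerIntent.EXTRA_LANGUAGE_MODEL -> LANGUAGE_MODEL_FREE_FORM
        extras.put("android.speech.extra.LANGUAGE_MODEL", "free_form");
        // RecognizerIntent.EXTRA_LANGUAGE (a BCP-47 language tag)
        extras.put("android.speech.extra.LANGUAGE", "en-US");
        // RecognizerIntent.EXTRA_MAX_RESULTS
        extras.put("android.speech.extra.MAX_RESULTS", 5);
        // RecognizerIntent.EXTRA_PARTIAL_RESULTS
        extras.put("android.speech.extra.PARTIAL_RESULTS", true);
        return extras;
    }
}
```

On the device you would then call `SpeechRecognizer.createSpeechRecognizer(context)`, register a `RecognitionListener`, and pass the intent to `startListening(intent)`, all from the main application thread as the documentation above requires.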
ALICEbot is quite popular and so is Ultra Hal Assistant and Jaberwocky. For speech engine voices, you can usually get away with the ones that came with your PC, although some companies specialise in voices such as Cepstral. For Speech Recognition, one of the most robust PC based applications is Dragon Naturally Speaking but as with most speech recognition software, using a high quality noise cancelling microphone is essential.
A: You may need to tweak your system settings for an optimal experience. Here are some tips for setting up Android for speech and voice recognition. This includes troubleshooting issues you may encounter.
Speaking Email uses the Google speech engine feature known as "Google Voice Typing" for dictation and commands. It works using the offline engine, so you need to download the offline voice data (even if you are always online, as we have found the offline engine to be more accurate).
Speech Services by Google may also be a source of issues. Look in System Settings > Apps > Speech Services by Google. This needs to be enabled, and it has been known to auto-update and cause speech system issues. Find the 'uninstall' button, which may be under the dots menu; this will reset it to the factory version.
An app for iPhone and Android that reads your email out loud to you. It intelligently extracts content from emails (minus the signatures, disclaimers, and threads). And it lets you action your email: archive, mark-as-read, trash, flag, reply, or forward. It can be operated completely by voice command, or with the large on-screen buttons or full-screen touch gestures.
Speech-to-text software makes it simple and easy to convert speech into text. While it used to be regarded as a tool mostly aimed at deaf or hard of hearing individuals, the functionality has become a mainstream resource for optimizing efficiency at the office, school, and in daily life.
The main difference between speech-to-text apps and transcription software is that speech-to-text applications convert real-time spoken words into written text. In contrast, transcription software creates a text copy from an audio file, which requires uploading a pre-recorded audio file.
Nagish is a free app that converts text-to-speech and speech-to-text in real time, making it easy to place and receive calls by typing and reading instead of or in addition to speaking and hearing. The calls are completely private, and the technology supports multiple languages, including English, Spanish, Hebrew, Italian, French, and Japanese.
Speechnotes is a free speech-to-text app that offers a simple and user-friendly interface. The app allows users to dictate while it saves the text automatically. Speechnotes supports various languages, including English, Spanish, French, German, and more.
Speech to Text is another popular speech-to-text app for Android devices. It includes real-time transcription, custom vocabulary, and support for multiple languages. The app allows users to edit transcriptions and export them as text files.