Speech Platform Runtime

Clara Zellinger

Aug 3, 2024, 11:09:48 AM

to travesejwin

The Microsoft Speech Platform is used by Voice Elements for Text-To-Speech (TTS) and for Speech Recognition. Many languages are supported. A license to use the Microsoft Speech Platform for TTS and Speech Recognition is included with your Windows OS license.

Microsoft stopped development on the Microsoft Speech Platform in 2012. Instead of processing text-to-speech (TTS) or speech recognition (SR) on premises, Microsoft now steers customers toward its cloud services on Azure. Those services, and other similar cloud offerings, can provide excellent SR and TTS and can work in conjunction with the Voice Elements platform. However, since there is no charge for the Microsoft Speech Platform, we continue to support it as our default facility for TTS and SR.

You should have the Speech Platform Runtime installed on your server in order to perform speech recognition functions within Voice Elements. Voice Elements has built-out support for the Microsoft Speech Platform, as long as you use Microsoft-compatible grammar files. These are easy to create using the methods outlined in this article: Create Microsoft Speech Compatible Grammar Files

The Microsoft Speech Platform relies on different language packs in order to provide speech recognition capabilities for different languages. Microsoft Speech Platform supports 18 different languages and accents. You can download some of the more popular languages using the links below. For additional options, please contact Inventive Labs Technical Support.

The SDK is the toolkit provided by Microsoft for working with the Microsoft Speech Platform. All of this functionality is built into Voice Elements, so you will not need to install the SDK unless you would like to use it to create Microsoft-compatible grammar files.

After completing the steps above, you can enable speech recognition by setting SpeechRecognitionEnabled to true before calling the Play and PlayTTS methods. For information on how to use the speech recognition demo application for testing, see Test Speech Recognition with Voice Elements.

Please note that SpeechRecognitionNumberOfPorts should be set to a value equal to or less than the number of Speech Recognition ports for which you are licensed. You can check your license entitlements in the Voice Elements Dashboard.

When I use the default Azure sample code to convert speech to text with the activated speech service, it reports "Speech Recognition canceled: CancellationReason.Error" and "Error details: Runtime error: Failed to initialize platform (azure-c-shared)". The same code works on my local machine but raises this error on my lab server, so I don't know whether any network settings are required. I have checked that my speech service is open to all networks by default. Any ideas about this issue? Thanks in advance!

Sorry for the late reply. I have checked that all the additional dependencies required have already been installed in the Linux environment. However, one thing I am not sure about is this line from the guideline: "For a native application, the Speech SDK relies on libMicrosoft.CognitiveServices.Speech.core.so. Make sure the target architecture (x86, x64) matches the application." How can I check this? What is "libMicrosoft.CognitiveServices.Speech.core.so"? I guess I installed it via pip install azure-cognitiveservices-speech?
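The architecture question can be checked with a few lines of standard-library Python. This is a hedged sketch, not Azure SDK code: the library path in the comment is illustrative (the actual location depends on where pip placed the package), but the two checks are generic. Compare the pointer size of the running Python interpreter against the ELF class of the shared object; if one says 64-bit and the other 32-bit, the "target architecture matches the application" requirement is violated.

```python
import struct


def interpreter_bits():
    # Pointer size of the running Python: 8 bytes means 64-bit, 4 means 32-bit.
    return struct.calcsize("P") * 8


def elf_bits(path):
    # Byte 4 (EI_CLASS) of an ELF header: 1 means 32-bit, 2 means 64-bit.
    with open(path, "rb") as f:
        header = f.read(5)
    if header[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF shared object")
    return 32 if header[4] == 1 else 64


# Illustrative usage (the path will differ on your machine):
# print(interpreter_bits())
# print(elf_bits("/path/to/libMicrosoft.CognitiveServices.Speech.core.so"))
```

To find where pip put the native library, `pip show -f azure-cognitiveservices-speech` lists the files installed by the package; the two numbers reported above should match.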

GitHub link: gtreshchev/RuntimeSpeechRecognizer, a multi-platform, real-time, offline speech recognition plugin for Unreal Engine, based on OpenAI Whisper technology.
Marketplace link: Runtime Speech Recognizer in Code Plugins, UE Marketplace
Documentation link: gtreshchev/RuntimeSpeechRecognizer wiki on GitHub

RTX Remix is a revolutionary modding platform to remaster classic DirectX 8 and 9 games (that have fixed function pipelines) with path tracing, NVIDIA DLSS, AI-enhanced textures, and user-created assets.

RTX Remix, part of the NVIDIA Studio suite of apps, is composed of two core components that work together to enable modders to remaster classic PC games: the RTX Remix creator toolkit, and a custom RTX Remix runtime.

The RTX Remix creator toolkit, built on NVIDIA Omniverse and used to develop Portal with RTX, allows modders to assign new assets and lights within their remastered scene, and use AI tools to rebuild the look of any asset. The RTX Remix creator toolkit Early Access is coming soon.

The RTX Remix runtime captures a game scene and replaces assets at playback while injecting RTX technologies, such as path tracing, DLSS 3, and Reflex, into the game. Already, mod developers have been using the RTX Remix runtime from Portal with RTX to create experimental ray-traced scenes in numerous classic games.

There is ample opportunity to change how classic titles are played, and open source widens the possibilities beyond our imagination. We look forward to seeing what mod developers will build with source access.

Our main goal is to expand game compatibility and extend the features of Remix in collaboration with the community. In keeping with that aim, NVIDIA will accept pull requests on GitHub for code submissions from the community, provide feedback, and help advance code until it is mature enough to be merged into the official RTX Remix runtime.

Modding is all about community, and providing an open source RTX Remix runtime will help empower mod developers to expand Remix compatibility to even more classic PC games. We look forward to seeing how creators usher in this new era of modding with RTX Remix.

Platform-runtime unit tests are relatively cheap (in terms of developer productivity and CI time) while still allowing functionality to be verified across all supported platforms in a realistic environment. Accordingly, they're appropriate for many bugfixes and new features.

The Private.Infrastructure.TestServices.WindowHelper class exposes several static methods and properties to easily insert a control into the running visual tree, and to wait for modifications to the UI to have been fully processed, since updates to the UI typically take effect asynchronously.

The FindFirstChild and FindFirstParent extension methods are helpful in the common case that you want to retrieve a descendant or ancestor element by traversing the visual tree. They optionally take a condition to be met.

Note that for Android/iOS/macOS, the versions of the methods that allow native views to be traversed and retrieved are located in different namespaces. The complete set of usings to conditionally include is:

The test is ignored on iOS and Android since it verifies a feature that's not yet supported on those platforms. Note that since the Uno.UI.RuntimeTests assembly is compiled separately for each platform, we use compiler conditionals to ignore a test per-platform.

The test is an async method that returns Task because we want to perform asynchronous operations on the UI thread (add the view and wait for it to be measured and arranged). Since the Given_ListViewBase class is marked with the [RunsOnUIThread] attribute, we don't need to add it again to the method.

We create a new items source, then create a ListView and assign its ItemsSource property. Then we put the ListView inside a Border (because this is the specific measurement scenario we wish to test), and add the Border to the active visual tree by assigning it to the WindowHelper.WindowContent property.

In this test we want to check that the item containers inside the list have been properly measured and arranged. We use the ContainerFromItem() method to get each container; we wrap it inside a WaitFor() check because, on some platforms, it takes a few UI loops for the list to materialize its items. Another way to get the containers would have been to use FindFirstChild() and an appropriate predicate.
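The WaitFor pattern described above is worth spelling out, since UI updates land asynchronously and a container can legitimately be null for a few loops before appearing. Uno's helper is C#; as a language-neutral illustration of the same polling idea, here is a minimal sketch in Python. The names (wait_for, container_ready) are hypothetical stand-ins, not Uno API:

```python
import time


def wait_for(condition, timeout=1.0, interval=0.01):
    """Poll `condition` until it returns True or `timeout` elapses.

    Mirrors a WaitFor-style test helper: instead of asserting immediately
    (and failing on platforms where materialization takes a few UI loops),
    the test retries for a bounded time before giving up.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    raise TimeoutError("condition not met within timeout")


# Example: a "container" that appears only after a few polls, standing in
# for ContainerFromItem() returning null until the list materializes items.
state = {"polls": 0}


def container_ready():
    state["polls"] += 1
    return state["polls"] >= 3


wait_for(container_ready)
```

The key design point is the bounded deadline: the helper converts a flaky immediate assertion into a deterministic "eventually within timeout" check, and still fails loudly if the condition never holds.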

The Speech Application Programming Interface or SAPI is an API developed by Microsoft to allow the use of speech recognition and speech synthesis within Windows applications. To date, a number of versions of the API have been released, which have shipped either as part of a Speech SDK or as part of the Windows OS itself. Applications that use SAPI include Microsoft Office, Microsoft Agent and Microsoft Speech Server.

In general, all versions of the API have been designed such that a software developer can write an application to perform speech recognition and synthesis by using a standard set of interfaces, accessible from a variety of programming languages. In addition, it is possible for a third-party company to produce its own Speech Recognition and Text-To-Speech engines, or adapt existing engines, to work with SAPI. In principle, as long as these engines conform to the defined interfaces, they can be used instead of the Microsoft-supplied engines.

In general, the Speech API is a freely redistributable component which can be shipped with any Windows application that wishes to use speech technology. Many versions (although not all) of the speech recognition and synthesis engines are also freely redistributable.

There have been two main 'families' of the Microsoft Speech API. SAPI versions 1 through 4 are all similar to each other, with extra features in each newer version. SAPI 5, however, was a completely new interface, released in 2000. Since then several sub-versions of this API have been released.

The Speech API can be viewed as an interface or piece of middleware which sits between applications and speech engines (recognition and synthesis). In SAPI versions 1 to 4, applications could directly communicate with engines. The API included an abstract interface definition which applications and engines conformed to. Applications could also use simplified higher-level objects rather than directly call methods on the engines.

In SAPI 5 however, applications and engines do not directly communicate with each other. Instead, each talks to a runtime component (sapi.dll). There is an API implemented by this component which applications use, and another set of interfaces for engines.
