New service proposal for WaveNet voices integration


Lei Shi

May 13, 2021, 7:35:56 PM
to servic...@chromium.org, Joel Riley
Hello services-dev, 

Our team is trying to create a ChromeOS service under chromeos/services/. This service is designed for integrating network-based speech synthesis, generated by WaveNet, into ChromeOS. 

We highly appreciate your feedback on this! 

Background
Text-to-speech is an impactful accessibility feature that has a lot of applications. When consuming long-form text, the quality of the text-to-speech can have a positive impact on the user’s comprehension and provide a more pleasing experience. 
WaveNet generates a more natural-sounding voice than traditional speech synthesis techniques. WaveNet audio is generated on Google servers and is available to clients through Google’s internal API, which provides an HTTPS interface via gRPC Web. 
We plan to integrate WaveNet voices into Chrome OS by implementing a mojo service that communicates with Google's internal API. The service passes utterance requests from clients to the internal API to synthesize audio. During this process, the service also attaches an API key to the request for authorization. 

Why services
The internal API requires authorization. Our team has created private API keys that should not be exposed to the public for security reasons (although we can update the API keys if they are leaked). Due to the nature of HTTP requests, calling the internal API directly from JS extensions would easily expose the keys. Calling through a native service lets us better protect the API key, since the key cannot be seen when inspecting the network connections of the TTS engine extension using DevTools. 

Questions and concerns
We are relatively new to mojo services. How expensive is it to implement a service like this? Are there any examples we can refer to? Would the services require engineering maintenance (if so, what are potential maintenance tasks we need to do)? 

More: Googlers, see design at go/wavenet-chromeos-dd.

Best,
Joel and Lei

Lei Shi

May 17, 2021, 5:19:15 PM
to servic...@chromium.org, Joel Riley
Hello services-dev,

To follow up on this thread, we have some preliminary design options as well as more context for the problem space. We look forward to your feedback:

More Background: The role of Mojo Service in the current TTS system

The current Chrome OS TTS system has several components interacting with Mojo services and JS extensions. On the client side (e.g., JS applications like Select-to-Speak), the client uses chrome.tts to manipulate text-to-speech. Then, Chrome’s underlying TTS mechanism talks to a specific TTS engine implemented with the chrome.ttsEngine APIs. For example, the Google TTS engine has a JS Extension component (see Google’s internal codebase) implemented with the chrome.ttsEngine APIs. The TTS engine’s JS Extension component uses chrome.mojoPrivate APIs to connect native C++ Mojo services (see Google TTS engine’s native implementation) with chrome.ttsEngine APIs. 


It is also possible to implement a TTS engine purely in JS without native C++ implementations. The TTS engine simply needs to implement all the logic in its JS Extension with the chrome.ttsEngine APIs. For example, the network-based TTS engine only has a JS component.

Design Option 1: Implementing a new Mojo Service for secure data fetch

We can implement a new, independent Mojo service in //chromeos/services for fetching data from Google's internal API. In the JS Extension of the TTS engine, we process the fetched data and handle TTS logic. This design option is similar to the design of the network-based TTS engine, except that we hide the key inside the Mojo service.
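
As a rough illustration of the split described in Option 1, the sketch below uses plain standard C++ rather than real Mojo or Chromium networking code. The names `HttpRequest`, `Transport`, and `AudioFetcher`, the header field, and the placeholder URL are all hypothetical and do not come from this thread or from Chromium. The point is only that the caller supplies an utterance and gets raw response data back, while the key is attached on the native side.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical request shape; the real API's fields are not given in the thread.
struct HttpRequest {
  std::string url;
  std::string api_key_header;  // Filled in only on the native side.
  std::string body;
};

// Stand-in for the network layer: takes a request, returns the raw response.
using Transport = std::function<std::string(const HttpRequest&)>;

// Narrow native-side fetcher. A caller only passes the utterance; the API key
// never appears in the interface that the caller sees.
class AudioFetcher {
 public:
  AudioFetcher(std::string api_key, Transport transport)
      : api_key_(std::move(api_key)), transport_(std::move(transport)) {}

  // Returns the raw response body. Parsing and TTS logic stay with the caller.
  std::string Synthesize(const std::string& utterance) const {
    HttpRequest request;
    request.url = "https://example.invalid/synthesize";  // Placeholder URL.
    request.api_key_header = api_key_;
    request.body = utterance;
    return transport_(request);
  }

 private:
  const std::string api_key_;
  Transport transport_;
};
```

Injecting the transport as a callback is only a convenience for trying the sketch without a network; in the actual design that role would be played by whatever network stack the service uses.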

Design Option 2: Implementing a new Mojo Service for the TTS engine

Another option is to implement another native TTS engine and its Mojo service. The implementation will be under //chromeos/services/tts. We will still need a light-weight JS Extension to connect with chrome.ttsEngine APIs. This design option is similar to the Google TTS engine’s design.


Best,
Lei

Scott Violet

May 17, 2021, 5:55:29 PM
to Lei Shi, Dominic Mazzoni, services-dev, Joel Riley
+dmazzoni as he is the a11y expert and is most likely to provide input on the best way to accomplish what you are after.

  -Scott


James Cook

May 17, 2021, 11:15:11 PM
to Scott Violet, Lei Shi, Dominic Mazzoni, services-dev, Joel Riley
drive-by question....

Are you interested in mojo services because:
1) You have some non-chromium C++ code (like from google3) that you need to run in a separate sandboxed process for security reasons, or
2) You have some JS code that needs to make network requests, but you want to hide the API key in C++ land where devtools can't see?

I couldn't tell from a quick scan of the design doc.

Colin Blundell

May 18, 2021, 6:04:39 AM
to James Cook, Scott Violet, Lei Shi, Dominic Mazzoni, services-dev, Joel Riley
On Tue, May 18, 2021 at 5:15 AM 'James Cook' via services-dev <servic...@chromium.org> wrote:
> drive-by question....
>
> Are you interested in mojo services because:
> 1) You have some non-chromium C++ code (like from google3) that you need to run in a separate sandboxed process for security reasons, or
> 2) You have some JS code that needs to make network requests, but you want to hide the API key in C++ land where devtools can't see?
>
> I couldn't tell from a quick scan of the design doc.

If it's (2), my instinct would be to take Design Option 1 specified below, i.e., limit the service interface and C++ to strictly the part that can't be done in JS. Caveat: I'm going just by the information that you've given on this thread :).

Best,

Colin
 

Lei Shi

May 18, 2021, 1:03:12 PM
to Colin Blundell, James Cook, Scott Violet, Dominic Mazzoni, services-dev, Joel Riley
Hi Colin and James,

Thanks for the feedback.

It's (2): we need to make network requests, but we would like to hide the API keys on the C++ side. 

Best,
Lei

Dominic Mazzoni

May 18, 2021, 1:11:48 PM
to Scott Violet, Lei Shi, services-dev, Joel Riley
On Mon, May 17, 2021 at 2:55 PM Scott Violet <s...@chromium.org> wrote:
> +dmazzoni as he is the a11y expert and is most likely to provide input on the best way to accomplish what you are after.
>
>   -Scott

Thanks - I've been working with Lei and Joel on this and I've advised on the overall architecture, but I don't have any specific insight on this particular design question.
 

Scott Violet

May 18, 2021, 1:39:03 PM
to Colin Blundell, James Cook, Lei Shi, Dominic Mazzoni, services-dev, Joel Riley
On Tue, May 18, 2021 at 3:04 AM Colin Blundell <blun...@chromium.org> wrote:
> On Tue, May 18, 2021 at 5:15 AM 'James Cook' via services-dev <servic...@chromium.org> wrote:
>> drive-by question....
>>
>> Are you interested in mojo services because:
>> 1) You have some non-chromium C++ code (like from google3) that you need to run in a separate sandboxed process for security reasons, or
>> 2) You have some JS code that needs to make network requests, but you want to hide the API key in C++ land where devtools can't see?
>>
>> I couldn't tell from a quick scan of the design doc.
>
> If it's (2), my instinct would be to take Design Option 1 specified below, i.e., limit the service interface and C++ to strictly the part that can't be done in JS. Caveat: I'm going just by the information that you've given on this thread :).

+1 to Colin's suggestions. This should mean the smallest change to your overall architecture and allow you to isolate purely the part you care about isolating.

  -Scott
 

Lei Shi

May 19, 2021, 7:10:49 PM
to Scott Violet, Colin Blundell, James Cook, Dominic Mazzoni, services-dev, Joel Riley
Thank you all for the suggestions.

I made a sample change that adds a mojo interface and a service for fetching audio data. As mentioned earlier, we would like to use the service in a JS extension. One related example is Remote Apps, which uses chrome.mojoPrivate.requireAsync to initialize the service on the JS side. 

We are trying to keep the service as minimal as possible. My questions are (1) where should we launch the service? and (2) how can we bind a JS interface for such a service? Are there any docs or sample changes we can refer to?

Best,
Lei

James Cook

May 19, 2021, 10:26:01 PM
to Lei Shi, Scott Violet, Colin Blundell, Dominic Mazzoni, services-dev, Joel Riley
On Wed, May 19, 2021 at 4:10 PM Lei Shi <leil...@google.com> wrote:
> Thank you all for the suggestions.
>
> I made a sample change for adding mojo and service for fetching audio data. As mentioned earlier, we would like to use the service in JS extension. One related example is Remote Apps, which uses chrome.mojoPrivate.requireAsync to initialize the service on the JS side.

This CL seems to be trying to launch an additional process. From your description, it sounds like you don't really want another process, you just want some C++ code to make network calls. Also, running an extra process costs memory, so it would be nice to avoid it. We generally run extra processes only when we need to sandbox things.

In the old days (pre-mojo) we would handle these things with private extension APIs in //chrome/browser/extensions/api/. There should be some Chrome OS examples in there. In these cases the existing browser process runs the C++ code.

services-folks, what's the recommendation today with mojo? How do folks use mojo to expose C++ to a trusted Chrome extension?

(A11y folks, I'm a bit confused by the existing TTS service, which I see listed in //chrome/utility/services.cc. I guess that needs to spawn a process?)

Apologies in advance if I'm confusing things here. I'm not too familiar with our a11y architecture.

Colin Blundell

May 20, 2021, 9:17:52 AM
to James Cook, Lei Shi, Scott Violet, Colin Blundell, Dominic Mazzoni, services-dev, Joel Riley
I would also wonder whether this use case needs to launch a utility process to run the C++ or just do so in the browser process.

Ken Rockot

May 20, 2021, 10:09:47 AM
to Colin Blundell, James Cook, Lei Shi, Scott Violet, Dominic Mazzoni, services-dev, Joel Riley
We expose mojom interfaces to Chrome extensions here: https://source.chromium.org/chromium/chromium/src/+/main:chrome/browser/extensions/chrome_extensions_browser_interface_binders.cc;drc=210d8586b0a90f6f619576f4df3c30943d11eae5;l=101

There you can filter against specific manifest permissions or extension IDs to decide whether to bind or not.

Since it sounds like process isolation is not necessary here, you can bind the receiver directly to an implementation in the browser process. Note that there's nothing special about in-process services; they're just objects with a mojo::Receiver/ReceiverSet.
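
To make the "filter, then bind to an in-process object" idea concrete, here is a minimal sketch in plain standard C++. It does not use the real Mojo or extensions APIs: `AudioFetcherInterface`, `InProcessAudioFetcher`, `InterfaceBinder`, and the extension ID strings are hypothetical stand-ins, and a `std::unique_ptr` takes the place of a bound receiver. It only shows the shape of the decision: check the requesting extension against an allowlist and, if it passes, hand back an ordinary object living in the same process.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical interface that the extension would talk to.
class AudioFetcherInterface {
 public:
  virtual ~AudioFetcherInterface() = default;
  virtual std::string Fetch(const std::string& utterance) = 0;
};

// In-process implementation: just an ordinary object in the host process.
class InProcessAudioFetcher : public AudioFetcherInterface {
 public:
  std::string Fetch(const std::string& utterance) override {
    return "response-for:" + utterance;  // Placeholder for the real request.
  }
};

// Decides, per requesting extension, whether to hand out the interface.
class InterfaceBinder {
 public:
  explicit InterfaceBinder(std::set<std::string> allowed_extension_ids)
      : allowed_ids_(std::move(allowed_extension_ids)) {}

  // Returns nullptr when the extension is not on the allowlist.
  std::unique_ptr<AudioFetcherInterface> Bind(
      const std::string& extension_id) const {
    if (allowed_ids_.count(extension_id) == 0) {
      return nullptr;
    }
    return std::unique_ptr<AudioFetcherInterface>(new InProcessAudioFetcher());
  }

 private:
  const std::set<std::string> allowed_ids_;
};
```

With an allowlist containing only the TTS engine's ID, `Bind` returns a usable object for that extension and `nullptr` for any other caller, so no extra process is involved.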


Joel Riley

May 20, 2021, 4:37:39 PM
to Ken Rockot, Colin Blundell, James Cook, Lei Shi, Scott Violet, Dominic Mazzoni, services-dev
We had a ChromeOS security consultation (https://b.corp.google.com/issues/184072388) to review our design at a high level, where network requests happen in an out-of-process service, and the guidance we received was "If the ReadAloud C++ service runs out-of-process then I think that takes care of the main parsing-risk concern." Would running the service in-process (in the browser process) cause security concerns? Are there examples of C++ network requests to Google APIs that originate from the browser process?

James Cook

May 20, 2021, 7:34:49 PM
to Joel Riley, Ken Rockot, Colin Blundell, Lei Shi, Scott Violet, Dominic Mazzoni, services-dev
What sort of data are you parsing as a result of these network requests?

I think Chrome makes a variety of network requests to Google endpoints from the browser process. For example, I think requests to the sync backend originate in the browser process.

Joel Riley

May 20, 2021, 7:37:35 PM
to James Cook, Ken Rockot, Colin Blundell, Lei Shi, Scott Violet, Dominic Mazzoni, services-dev
I see. We would be parsing responses from the API we are communicating with, which will contain base64-encoded audio data and timepoints that correspond to words.

Dominic Mazzoni

May 20, 2021, 7:42:32 PM
to Joel Riley, James Cook, Ken Rockot, Colin Blundell, Lei Shi, Scott Violet, services-dev
You can use DataDecoder to safely parse JSON, XML, and other formats - on most platforms it does the parsing in another process for added safety, but it's all abstracted away so it's very easy to use. I think base64 decoding is considered safe and can be done in-process.
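
For a sense of how small the base64 step is, here is a hand-rolled decoder in standard C++. It is only an illustration written for this note, not Chromium's implementation: the function names `Base64Value` and `DecodeBase64` are hypothetical, and in Chromium itself one would presumably rely on an existing library helper instead. It handles the padded standard alphabet only and does not reject non-zero trailing bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Maps one base64 character to its 6-bit value, or -1 if it is not part of
// the standard alphabet.
int Base64Value(char c) {
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+') return 62;
  if (c == '/') return 63;
  return -1;
}

// Decodes padded, standard-alphabet base64 into `output`.
// Returns false on malformed input (bad length, bad character, bad padding).
bool DecodeBase64(const std::string& input, std::vector<std::uint8_t>* output) {
  output->clear();
  if (input.size() % 4 != 0) return false;
  for (std::size_t i = 0; i < input.size(); i += 4) {
    const bool last_group = (i + 4 == input.size());
    int pad = 0;
    std::uint32_t chunk = 0;
    for (std::size_t j = 0; j < 4; ++j) {
      const char c = input[i + j];
      if (c == '=') {
        // Padding may only fill the last one or two slots of the final group.
        if (!last_group || j < 2) return false;
        ++pad;
        chunk <<= 6;
        continue;
      }
      const int value = Base64Value(c);
      // Reject unknown characters and any data that follows a '='.
      if (value < 0 || pad > 0) return false;
      chunk = (chunk << 6) | static_cast<std::uint32_t>(value);
    }
    // Each full group carries three bytes; padding drops the trailing ones.
    output->push_back(static_cast<std::uint8_t>((chunk >> 16) & 0xFF));
    if (pad < 2) output->push_back(static_cast<std::uint8_t>((chunk >> 8) & 0xFF));
    if (pad < 1) output->push_back(static_cast<std::uint8_t>(chunk & 0xFF));
  }
  return true;
}
```

The decoder only does bounded table lookups and bit shifts over the input, which is in line with the view above that this step is low-risk compared with parsing structured formats such as JSON or XML.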


Lei Shi

May 21, 2021, 5:08:04 PM
to Dominic Mazzoni, Joel Riley, James Cook, Ken Rockot, Colin Blundell, Scott Violet, services-dev
Hi folks,

Thank you all for the help. I still have somewhat naive questions:

I’ve tried to edit the code in chrome_extensions_browser_interface_binders.cc in this CL. However, I still have trouble understanding how this would surface the service to a ChromeOS extension, specifically: 
(1) Where should we initialize the service? Out-of-process services seem to be initialized in the renderer process. 
(2) Where should we put the JS bindings? I know mojom can generate JS files, but I’m not sure where I should put the BUILD dependency (e.g., out-of-process services build their JS bindings in chrome/renderer/BUILD).

Is there any more detailed documentation on wiring up services? I would highly appreciate your help, and it would be great if you could provide some pointers to other similar services.

Best,
Lei

Lei Shi

May 25, 2021, 1:59:18 PM
to Dominic Mazzoni, Joel Riley, James Cook, Ken Rockot, Colin Blundell, Scott Violet, services-dev
Hi folks,

Good morning and happy Monday. I'm bumping up this email thread to see if there are any further suggestions or sample CLs.

Best,
Lei

Scott Violet

May 25, 2021, 4:24:57 PM
to Lei Shi, Dominic Mazzoni, Joel Riley, James Cook, Ken Rockot, Colin Blundell, services-dev
Ken is the expert on this, but I believe he is OOO at the moment. Does https://chromium.googlesource.com/chromium/src/+/refs/heads/main/docs/mojo_and_services.md answer your questions?

  -Scott

Lei Shi

May 25, 2021, 5:10:13 PM
to Scott Violet, Dominic Mazzoni, Joel Riley, James Cook, Ken Rockot, Colin Blundell, services-dev
Hi Scott,

Thanks for sending the documentation. Unfortunately, I wasn't able to find detailed information about launching an in-process service and connecting it to a JS extension. 

Best,
Lei

Colin Blundell

May 26, 2021, 5:25:50 AM
to Lei Shi, Scott Violet, Dominic Mazzoni, Joel Riley, James Cook, Ken Rockot, Colin Blundell, services-dev
While waiting for Ken to return, I would suggest tracing through the plumbing of one of the existing services in the file that you pointed to, e.g. the hookup of TtsStream or MachineLearningService. You could start by doing a git blame to find the CLs where those Bind<Foo>() functions were added; it's likely that those CLs will either have or point to the other parts of the plumbing necessary (at least for those cases).

Best,

Colin

Scott Violet

May 26, 2021, 11:24:31 AM
to Colin Blundell, Lei Shi, Dominic Mazzoni, Joel Riley, James Cook, Ken Rockot, services-dev
Great suggestion Colin. Another idea is to talk with someone from the security team. I suspect Daniel Cheng would be a good person to ask about this.

  -Scott