PSA: VoiceGender deprecated in chrome.tts and chrome.ttsEngine APIs


ka...@chromium.org

Sep 12, 2018, 12:53:58 PM
to Chromium-dev

Summary

Through M70, Chrome and Chrome OS exposed gender as a descriptor of Text-to-Speech (TTS) voices through the VoiceGender enum in the chrome.tts and chrome.ttsEngine APIs. The enum is deprecated as of Chrome 71, without breaking existing extensions.


API modification bugs: crbug.com/863998 and crbug.com/863999

Motivation

Gender is not binary

VoiceGender was reflected as a binary in Chrome TTS and TTS engine APIs, but gender is not binary in the real world.


“A sex binary fails to capture even the biological aspects of gender” (source, relevant podcasts), but gender also includes dimensions of internal identity, and gender expression. None of these dimensions are binary. Chrome’s current VoiceGender enum with values only of “male” and “female” is not sufficient, and could exclude users who do not identify as male or female.

Gender may not be the most important descriptor of a TTS voice, but it’s the only one exposed

VoiceGender is purely a descriptor of TTS voices and cannot be used to change the way a voice speaks.


Other descriptors, such as age, pitch, timbre, resonance, or whether the voice is meant to sound like a human or a robot, could be equally if not more important, depending on the use case. For example, users may wish for a voice that sounds like a child, like a robot, or like a particular person, or particular users might find it easier to understand voices that have a certain intonation or resonance.


Chrome TTS does not include any descriptor besides gender, which feels inconsistent. We should either describe a voice in many ways or not at all, but not single out one feature above the others. Chrome works to build for everyone, with everyone, so we want to ensure our descriptors abide by this principle.

The Chromium team is committed to gender-neutral code, and TTS should not be an exception

The existence of VoiceGender in a Chrome API may send a message that Google thinks that gender of “male” or “female” is an important quality in a TTS voice. We don’t think that!

SpeechSynthesis does not support gender

VoiceGender is not used in the broader web speech spec, so supporting it in Chrome is unnecessary.

Solution

Deprecate VoiceGender in the chrome.tts and chrome.ttsEngine APIs.

User-facing impact

Minimal

This change mostly impacts developers using the chrome.tts and chrome.ttsEngine APIs. It might reach users if developers using Chrome’s voices or Chrome’s TTS extension API were accessing VoiceGender and exposing that information to users. However, an audit of some of the top tts and ttsEngine extensions shows that many do not expose VoiceGender.

Technical Details

When VoiceGender is used by a chrome.tts or chrome.ttsEngine extension, a warning will be issued and VoiceGender will simply be ignored.


chrome.ttsEngine:

  1. Chrome will provide a manifest warning when a manifest supplies VoiceGender

  2. chrome.ttsEngine will print a JavaScript console warning when VoiceGender is used in updateVoices
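
For engine authors, one way to avoid both warnings is to stop supplying gender at all. The following is a minimal sketch, not the Chromium team's recommended code: the EngineVoice type, the withoutGender helper, and the example voice names are all hypothetical and only approximate the shape of a voice entry. It shows a voice list being cleaned before it would be handed to chrome.ttsEngine.updateVoices.

```typescript
// Hypothetical local type for illustration; not the official Chrome typings.
interface EngineVoice {
  voiceName: string;
  lang?: string;
  gender?: string; // deprecated: warns and is ignored as of Chrome 71
  remote?: boolean;
}

// Return copies of the voices with the deprecated `gender` field removed.
// The input array and its objects are left untouched.
function withoutGender(voices: EngineVoice[]): EngineVoice[] {
  return voices.map((voice) => {
    const copy: EngineVoice = { ...voice };
    delete copy.gender;
    return copy;
  });
}

// Example data (made up for this sketch).
const voices: EngineVoice[] = [
  { voiceName: "Example Voice A", lang: "en-US", gender: "female" },
  { voiceName: "Example Voice B", lang: "en-GB" },
];

const cleaned = withoutGender(voices);
console.log(cleaned);

// Inside an extension, the cleaned list would then be passed along:
// chrome.ttsEngine.updateVoices(cleaned);
```

The same idea applies to the manifest: removing the gender key from each declared voice should avoid the manifest warning.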


chrome.tts:

  1. chrome.tts will print a JavaScript console warning when VoiceGender is used in chrome.tts.speak’s options object

  2. voice.gender will always return undefined
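
For extensions that call chrome.tts, a rough sketch of adapting to these two points follows. It assumes hypothetical local types (SpeakOptions, VoiceInfo) and helper names (stripGender, pickVoiceByLang) that are not part of any Chrome API: the first helper drops gender from the options before speaking, and the second selects a voice by language tag, since voice.gender can no longer be relied on.

```typescript
// Hypothetical local types for illustration; not the official Chrome typings.
interface SpeakOptions {
  lang?: string;
  voiceName?: string;
  gender?: string; // deprecated: warns and is ignored as of Chrome 71
  rate?: number;
  pitch?: number;
}

interface VoiceInfo {
  voiceName: string;
  lang?: string;
}

// Copy the options without the deprecated `gender` key.
function stripGender(options: SpeakOptions): SpeakOptions {
  const copy: SpeakOptions = { ...options };
  delete copy.gender;
  return copy;
}

// Choose a voice by language tag rather than by gender.
function pickVoiceByLang(voices: VoiceInfo[], lang: string): VoiceInfo | undefined {
  return voices.find((voice) => voice.lang === lang);
}

// Example data (made up for this sketch).
const legacyOptions: SpeakOptions = { lang: "en-US", gender: "female", rate: 1.0 };
const speakOptions = stripGender(legacyOptions);

const availableVoices: VoiceInfo[] = [
  { voiceName: "Example Voice A", lang: "en-GB" },
  { voiceName: "Example Voice B", lang: "en-US" },
];
const chosen = pickVoiceByLang(availableVoices, "en-US");

console.log(speakOptions, chosen?.voiceName);

// Inside an extension, the result could be used like this:
// chrome.tts.speak("Hello", { ...speakOptions, voiceName: chosen?.voiceName });
```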

Rationales for not choosing alternative solutions:

These are included to provide some background on why Chrome is simply removing VoiceGender.

Remove VoiceGender, but add descriptive names to the TTS voices.

For example, instead of naming Chrome OS voices “Chrome OS US English”, we could name voices with human names.


This was not chosen because a name simply provides a surrogate for gender, and because Google in general does not name TTS voices.

Add more values to the enum VoiceGender besides “male” and “female”

This was not chosen because we do not feel that VoiceGender, as information on its own, is important to end users (see above).

Add other descriptors of voices, such as “pitch”

First, this was not chosen because we do not want to replace VoiceGender with a substitute like “pitch” that has nearly the same meaning.


Second, users can already change the ‘pitch’ of speech by using the options object in chrome.tts.speak, so having a fixed descriptor of “pitch” doesn’t make sense as it is variable.


Third, to support this reasonably we would have to add many other descriptors at once, like pitch, timbre, resonance, etc., and we do not believe there is a need for this information at this time.
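
To illustrate the second point, here is a small sketch of setting pitch per utterance through the options object. It assumes the pitch range of 0 to 2 (with 1 as the voice's default) described in the chrome.tts documentation; the constants and the clampPitch and buildPitchOptions helpers are hypothetical names used only for this example.

```typescript
// Assumed pitch bounds, taken from the chrome.tts documentation as I understand it.
const MIN_PITCH = 0;
const MAX_PITCH = 2;

// Keep a requested pitch inside the assumed valid range.
function clampPitch(pitch: number): number {
  return Math.min(MAX_PITCH, Math.max(MIN_PITCH, pitch));
}

// Build an options object carrying only the (clamped) pitch.
function buildPitchOptions(pitch: number): { pitch: number } {
  return { pitch: clampPitch(pitch) };
}

const lower = buildPitchOptions(0.5);
const tooHigh = buildPitchOptions(3);
console.log(lower, tooHigh);

// Inside an extension:
// chrome.tts.speak("Hello", lower);
```

Because the caller picks the pitch for each utterance, a fixed "pitch" label on the voice itself would carry little information.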

