I'm trying to use the SSML <phoneme> tag. According to the current documentation, it should just work (see here and here). However, there used to be a page stating that this is a v1beta1 feature. It now returns a 404, but there's an archived version. So my first question is whether <phoneme> is v1beta1-only or whether it has been back-ported to v1.
I'm using the Java client library to access the service. The documentation doesn't say so explicitly, but I assume that to use v1beta1 I just have to change all my imports, e.g. from
import com.google.cloud.texttospeech.v1.TextToSpeechClient;
to
import com.google.cloud.texttospeech.v1beta1.TextToSpeechClient;
and that should work?
I also tried to use the <phoneme> tag on the demo page. As pointed out by this StackOverflow question (which sadly never got an answer), the demo page accesses the v1beta1 service URL but strips out some SSML tags, such as <voice>. I can confirm that <phoneme> is also removed before the request is sent to the server.
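To take the demo page out of the picture, the request could also be built by hand against either service URL. Below is a minimal sketch using only the JDK's java.net.http (Java 11+). The class name TtsRestRequest, the helper names, the en-US voice and the MP3 encoding are my own illustrative choices, and the access token is a placeholder. The sketch only constructs the request; it does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of the Google client library.
class TtsRestRequest {

    // Minimal JSON string escaping: handles only backslashes and double
    // quotes, which is enough for the SSML used here.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // JSON body for text:synthesize; voice and encoding are assumed values.
    static String body(String ssml) {
        return "{\"input\":{\"ssml\":\"" + jsonEscape(ssml) + "\"},"
             + "\"voice\":{\"languageCode\":\"en-US\"},"
             + "\"audioConfig\":{\"audioEncoding\":\"MP3\"}}";
    }

    // apiVersion is "v1" or "v1beta1"; only the URL path differs.
    static HttpRequest build(String apiVersion, String accessToken, String ssml) {
        return HttpRequest.newBuilder()
                .uri(URI.create("https://texttospeech.googleapis.com/"
                        + apiVersion + "/text:synthesize"))
                .header("Authorization", "Bearer " + accessToken)
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(body(ssml)))
                .build();
    }

    public static void main(String[] args) {
        String ssml = "<speak>Hello <phoneme alphabet=\"ipa\" ph=\"x\">world</phoneme></speak>";
        // Print the two URLs so the version difference is visible.
        System.out.println(build("v1", "PLACEHOLDER_TOKEN", ssml).uri());
        System.out.println(build("v1beta1", "PLACEHOLDER_TOKEN", ssml).uri());
    }
}
```

Sending the same body to both versions and comparing the audio should show whether the version matters at all, independent of which client package is imported.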
Whereas <voice> works with my Java client, <phoneme> still does not. In the synthesized speech, only the text content of the element is spoken. For example, this input:
<speak>As you can hear, <phoneme alphabet="ipa" ph="ˌmænɪˈtoʊbə">this tag is ignored</phoneme>.</speak>
is spoken as: "As you can hear, this tag is ignored."
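One thing that can be ruled out locally is a malformed document or a mangled IPA string (for example, from a source-file encoding problem). Below is a minimal sketch using the JDK's built-in XML parser. The class and method names are hypothetical, and the \u escapes are my transcription of the IPA string above. It only checks that the SSML is well-formed and that the ph attribute survives intact; it says nothing about whether the service honors the tag.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical local sanity check, not part of the Google client library.
class SsmlCheck {

    // Parses the SSML as XML (throws if it is not well-formed) and returns
    // the ph attribute of the first <phoneme> element, or null if there is none.
    static String firstPhoneme(String ssml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(ssml)));
        NodeList nodes = doc.getElementsByTagName("phoneme");
        if (nodes.getLength() == 0) {
            return null;
        }
        return ((Element) nodes.item(0)).getAttribute("ph");
    }

    public static void main(String[] args) throws Exception {
        // IPA written with Unicode escapes so the source file's encoding
        // cannot alter it (assumed transcription of the string above).
        String ipa = "\u02CCm\u00E6n\u026A\u02C8to\u028Ab\u0259";
        String ssml = "<speak>As you can hear, <phoneme alphabet=\"ipa\" ph=\""
                + ipa + "\">this tag is ignored</phoneme>.</speak>";
        String ph = firstPhoneme(ssml);
        // Dump the code points so they can be compared by eye.
        ph.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

If the code points printed here match what was intended, the problem is unlikely to be on the client side of string construction.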