
Fwd: string review for voice recognition app

Tiffanie Shakespeare

Jun 8, 2015, 4:12:15 PM
to dev-...@lists.mozilla.org
Greetings!

I am a designer working on Vaani (codename - real name TBD), a voice
recognition app for FxOS. Initially we are targeting English and Spanish so
it would be awesome if we could get some feedback on the strings.

Things are a little different here as we not only have to worry about
onscreen strings, but also what people can say and what people will hear.
In the spec, there are a few pages (15/16 and 22/23) showing tables of this
information; the rest of the pages can give you some context for where/how
those strings will be used.

I would love feedback on all of the above. In our first stages, we are
going to be limited in what commands people can say so I would especially
love input on what commands people would be most likely to use. For example,
in English I may say "Call Tiffanie" but in Spanish I may instead say "Dial
Tiffanie", so a literal translation may not be appropriate (hopefully that
made sense!). Of course the eventual goal would be to recognize not only
"Call" and "Dial" but other phrases like "Ring up mum!"

Here is where you can find the spec:
https://drive.google.com/open?id=0B5nBT2-RgS4NZmxzSXBCT1owajA&authuser=1

Please feel free to comment on anything, though pages 24 and up are still WIP.

Thanks so much for your time! :)

Sebastian Hengst

Jun 8, 2015, 6:36:58 PM
to Tiffanie Shakespeare, dev-...@lists.mozilla.org
Hi,

in some languages, the name or number won't be at the end of the call
command, e.g. in German trying to call a person named Tim could be "Rufe
Tim an". For phone numbers, saying the number takes longer than the
name, so maybe a 'potential' call state is needed and needs to be
verified by the suffix.
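
To make the word-order point concrete, here is a minimal sketch of
per-locale command patterns in which the contact slot can sit in the middle
of the phrase. The names CALL_PATTERNS, compile_pattern and parse_call are
hypothetical and not taken from the Vaani spec; only the "Rufe Tim an"
example comes from this thread.

```python
import re
from typing import Optional

# Hypothetical per-locale command patterns; {contact} marks the slot.
# The German pattern follows the "Rufe Tim an" example: the particle "an"
# comes after the contact, so the slot is not at the end of the command.
CALL_PATTERNS = {
    "en": ["call {contact}", "dial {contact}"],
    "de": ["rufe {contact} an"],
}


def compile_pattern(template: str) -> "re.Pattern":
    """Turn 'rufe {contact} an' into a regex with a named capture group."""
    prefix, suffix = template.split("{contact}")
    return re.compile(
        r"^\s*" + re.escape(prefix) + r"(?P<contact>.+?)" + re.escape(suffix) + r"\s*$",
        re.IGNORECASE,
    )


def parse_call(utterance: str, locale: str) -> Optional[str]:
    """Return the contact text only if the whole pattern, suffix included, matched."""
    for template in CALL_PATTERNS.get(locale, []):
        match = compile_pattern(template).match(utterance)
        if match:
            return match.group("contact").strip()
    return None
```

With these patterns, "Rufe Tim an" yields "Tim", while "Rufe Tim" yields
nothing yet: that is the 'potential' call state, which only becomes a call
once the suffix has been heard.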

The input and output can also vary because of the gender, but because
the addressbook lacks that info in general, we have to find a style
which avoids this. Instead of "Which one?" ("Welcher?" or "Welche?"), I
could ask "Which person?" ("Welche Person?") in German.

"The <nth> one" as a user reply can be two different replies in German,
e.g. "Die <nth> [Person]" (feminine) or "Der <nth>" (masculine, e.g. for
a list of contacts who share a male first name).

For strings with a variable count of objects like phone number
categories, plural forms are needed for some languages (this could get
more complex for some languages if the verb depends on the gender and
count, but on the other hand, there are user-defined categories for
which the gender is unknown).
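
A rough sketch of what plural forms mean for such strings: the message is
picked by a per-locale plural category instead of a single singular/plural
switch. The rules below are simplified, integer-only versions modeled on the
CLDR plural categories for English and Czech; the names plural_category,
MESSAGES and format_count are made up for illustration.

```python
def plural_category(n: int, locale: str) -> str:
    """Simplified integer-only plural rules (assumption: modeled on CLDR)."""
    if locale == "en":
        return "one" if n == 1 else "other"
    if locale == "cs":
        if n == 1:
            return "one"
        if 2 <= n <= 4:
            return "few"
        return "other"
    raise ValueError(f"no plural rule for locale {locale!r}")


# Hypothetical catalog: one full string per plural category and locale.
MESSAGES = {
    "en": {"one": "{n} number", "other": "{n} numbers"},
    "cs": {"one": "{n} číslo", "few": "{n} čísla", "other": "{n} čísel"},
}


def format_count(n: int, locale: str) -> str:
    """Pick the string for the right plural category and fill in the count."""
    return MESSAGES[locale][plural_category(n, locale)].format(n=n)
```

English gets by with two strings here, Czech needs three, and the gender
dependency mentioned above would add a further dimension that this sketch
does not cover.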

Cheers,
Sebastian


Akerbeltz.org

Jun 9, 2015, 4:29:21 AM
to dev-...@lists.mozilla.org, Tiffanie Shakespeare
Hi Tiffanie,

Is the aim ultimately to support all locales which have made it to OS, including
future "langpack locales"? It would be cool of course but I'm not sure how that
would work since there is currently no such tool for our locale that I'm aware
of.

Michael

Michael Bauer

Jun 11, 2015, 2:51:24 PM
to Tiffanie Shakespeare, dev-...@lists.mozilla.org
Maybe my question about how this will get handled in smaller languages
didn't get through so I'll just chip in my thoughts.

The two main issues I can see are not around translation but
a) languages with more morphology than English or Spanish (which is why
I think that's a really poor pair from an l10n angle; even something
Slavonic would have been better)
b) How Vaani is supposed to tie into l10n in general

Regarding a), many languages will add things to a name, at the front or
end, in sentences like "Call Jack" which might become "Call
Jack[inflection]" or "Call J[inflection]ack" or some such. /Is <name>
your <relationship>/ will potentially come out as utter garbage in many
languages unless this is handled really carefully and will most likely
need l20n. To begin with, this question cannot be answered with the same
echo verb (see the yes/no point below) as most other questions I see in
the pdf. It most certainly needs 'Se and Chan e whereas most other OS
phrases are tortured into working with Tha/Chan eil. Secondly, referring
to family members often involves complex grammar and in many languages
will not work with a simple daisychain model. For example in
Irish/Welsh/Gaelic, the use of possessives triggers something called
lenition (change of the first sound of a following word). So whereas
"mother" is /màthair/ your mother is /do mhàthair/ (note the h):
/An i Màiri do mhàthair?/
[be? she Mary your mother]
However, "your husband" would invoke a pattern of alienable possession
using "an" (the) and "aig" (at), i.e.
/An e Seumas an duine agad?/
[be? he James the man atyou]
For sure not all languages will be this much of a headache, but many will
be a lot more complex than English or Spanish, neither of which has much
noun morphology, plurals aside.
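
One way to picture the alternative to daisychaining is a catalog that holds
one complete sentence per relationship, so the localizer controls the
copula, lenition and possession pattern for each case. This is only a
sketch: CONFIRM_RELATIONSHIP and confirm_relationship are invented names,
and the Gaelic strings are just the two examples above.

```python
# Hypothetical catalog: one whole sentence per relationship, never
# "Is" + name + "your" + relationship glued together at runtime.
CONFIRM_RELATIONSHIP = {
    "en": {
        "mother": "Is {name} your mother?",
        "husband": "Is {name} your husband?",
    },
    "gd": {
        # Lenited "mhàthair" after the possessive "do"; feminine "An i".
        "mother": "An i {name} do mhàthair?",
        # "an ... agad" possession pattern; masculine "An e".
        "husband": "An e {name} an duine agad?",
    },
}


def confirm_relationship(name: str, relationship: str, locale: str) -> str:
    """Fill only the name into a sentence the localizer wrote as a whole."""
    return CONFIRM_RELATIONSHIP[locale][relationship].format(name=name)
```

Even this assumes the name itself can be dropped in unchanged, which will
not hold for every language or every sentence.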

The <nth> one
is also a tricky one: reference to 3rd person objects will invoke
reference to grammatical gender in many languages. So if a user thinks
of "one" being the phone number, this would be feminine and come out as
/An dàrna tè/
[the second femaleone]
But if they think of "one" being "option/choice" it may equally be
/An dàrna fear/
[the second maleone]
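
In practice that probably means the recognizer has to accept every gendered
variant of a reply and map them all to the same choice. A minimal sketch,
with a hypothetical ORDINAL_REPLIES table that contains only the two Gaelic
phrases above plus an English equivalent:

```python
import unicodedata
from typing import Optional

# Hypothetical table: several spoken variants map to the same list position.
ORDINAL_REPLIES = {
    "en": {"the second one": 2},
    "gd": {
        "an dàrna tè": 2,    # feminine "one"
        "an dàrna fear": 2,  # masculine "one"
    },
}


def parse_ordinal(utterance: str, locale: str) -> Optional[int]:
    """Normalize case, whitespace and Unicode form, then look up the reply."""
    key = unicodedata.normalize("NFC", " ".join(utterance.lower().split()))
    table = {
        unicodedata.normalize("NFC", phrase): position
        for phrase, position in ORDINAL_REPLIES.get(locale, {}).items()
    }
    return table.get(key)
```

Both parse_ordinal("An dàrna tè", "gd") and parse_ordinal("An dàrna fear",
"gd") come back as 2, whichever noun the user had in mind.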

Yes/No... I think we were promised some more l10n sensitive approaches
in OS regarding languages which have no two word Yes/No pair but I
haven't seen much of it. This will be tricky for many languages. For
example, "Do you want to call Jack" would probably call for /Tha/ or
/Chan eil/ but "Is this the song you wanted" would most likely need /'S
e/ or /Chan e/ - and since some of these refer to people, it would also
require a split between /'S e/ (be he) and /'S i/ (be she).
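
Sketched in code, this would mean that each question carries its own set of
accepted answers instead of sharing one global Yes/No pair. The Prompt class
and the prompt ids below are hypothetical; the answers are the ones given
above, and the he/she split for questions about people is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Prompt:
    description: str              # English description of the question
    affirmative: Tuple[str, ...]  # replies that count as "yes" for this question
    negative: Tuple[str, ...]     # replies that count as "no" for this question


# Hypothetical Gaelic entries: the echo verb depends on the question asked.
PROMPTS_GD = {
    "confirm_call": Prompt("Do you want to call Jack", ("tha",), ("chan eil",)),
    "confirm_song": Prompt("Is this the song you wanted", ("'s e",), ("chan e",)),
}


def interpret(prompt_id: str, reply: str) -> Optional[bool]:
    """Return True/False for a recognized answer, None if it fits neither."""
    prompt = PROMPTS_GD[prompt_id]
    normalized = " ".join(reply.lower().split())
    if normalized in prompt.affirmative:
        return True
    if normalized in prompt.negative:
        return False
    return None
```

Here "Tha" confirms the call question but is not a valid answer to the song
question, which is exactly what a single Yes/No string pair cannot express.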

I think for proof of concept, a pre-alpha or anything like that, you
should *really* pick the nastiest language (from a linguistic angle)
which is highly active in OS l10n. Irish or Welsh would be good
candidates, but even Czech, Hungarian or Finnish would be more helpful
for long term sustainability.

Regarding b) ... A lot of the locales I see in OS don't have voice
recognition. Is this a trainable engine which the user can 'teach' to
recognize their own language (which would be real cool if set up right)
or is this a 'big languages' only toy?

Michael

--
*Akerbeltz <http://www.faclair.com/>*
Goireasan Gàidhlig air an lìon
Fòn: +44-141-946 4437
Facs: +44-141-945 2701

*Tha Gàidhlig aig a' choimpiutair agad, siuthad, feuch e!*
Iomadh rud eadar prògraman oifis, brabhsairean, predictive texting,
geamannan is mòran a bharrachd. Tadhail oirnn aig www.iGàidhlig.net
<http://www.iGaidhlig.net/>

Tiffanie Shakespeare

Jun 11, 2015, 4:42:27 PM
to fi...@akerbeltz.org, Aaron Wu, Andre Natal, Eric Pang, Kelly Davis, Rob MacDonald, Sandip Kamat, dev-...@lists.mozilla.org
Hey Michael! Sorry about that, I'm not sure how I missed your first email,
but thanks for following up!

My gut reaction is "whoa - I'm in waaaay over my head!" o_O Thank you so
much for taking the time to write this up! It's been very helpful and I
clearly (as an English speaker) have overlooked a lot of important
nuances. Do you know someone I could work with to improve things and better
account for the "nasty" languages? Sadly I am totally unfamiliar with your
suggestions.

Unfortunately, I honestly can't answer most of your questions, but I've
included my fabulous team who hopefully can.

*Sandip* - is our eventual goal to support all device languages including
langpacks?

I know that we intend to have a "community" portion (as you see in the
spec) but that's less about training and more about expanding/improving the
language model already supported. IDK if a contributor would be able to
contribute to a language model before it's offered. Hopefully *Andre or
Kelly* can answer this. (re: the very last point in Michael's email)

Thanks again Michael!
You can also find me at tif on IRC :)




Michael Bauer

Jun 11, 2015, 5:02:13 PM
to Tiffanie Shakespeare, Aaron Wu, Andre Natal, Eric Pang, Kelly Davis, Rob MacDonald, Sandip Kamat, dev-...@lists.mozilla.org
Hi Tiffanie

Tiffanie Shakespeare wrote the following on 11/06/2015 at 21:42:
> Hey Michael! Sorry about that, I'm not sure how I missed your first email,
> but thanks for following up!
Don't worry, it happens - and you're welcome!
> My gut reaction is "whoa - I'm in waaaay over my head!" o_O Thank you so
> much for taking the time to write this up! It's been very helpful and I
> clearly (as an English speaker) have overlooked a lot of important
> nuances. Do you know someone I could work with to improve things and better
> account for the "nasty" languages? Sadly I am totally unfamiliar with your
> suggestions.
I'm happy to be a linguistic sounding board (my main language being
Scottish Gaelic which is probably as nuts as it gets). Just bear in mind
that I'm mainly a localizer and not a developer and - though I'd love to
- can't provide fixes to code or anything like that.

I think, in a nutshell, the message can probably be reduced to:
Assume that English is a language which does very little to its words
and that in almost all cases, other languages have much much more
'grammar' going on.
So use placeholders sparingly and never daisychain phrases which are
heading for l10n; ideally run anything that goes beyond placeholders for
product names or numbers past a localizer and it's best to test any new
feature which is heading for l10n across a half dozen languages which
have different linguistic and typographical challenges. A good mix would
probably be Japanese (ideograms and syllabaries in a mixed script, inflecting),
Arabic (right to left, inflecting), Basque (highly agglutinating),
Irish/Gaelic (inflecting in "weird" ways, complex plurals, frequent
width issues, no yes/no pair), Thai/Hindi (script with diacritics above
and below, often with height issues)....

Michael Bauer

Jun 12, 2015, 9:32:53 AM
to Kelly Davis, Tiffanie Shakespeare, Sandip Kamat, Eric Pang, dev-...@lists.mozilla.org, Aaron Wu, Rob MacDonald, Andre Natal


Kelly Davis wrote the following on 12/06/2015 at 07:13:
>
> If there is some case not handled by l20n of the UI text and speech
> recognition grammar, let us know!
Hard to say, I'm not up to speed regarding what stage l20n HAS reached
in Mozilla. Certainly as far as localization work goes there is not much
sign of it.
> This is a separate project and problem we are also working on.
>
> Part of the app will allow users to speak selected sentences to the
> system in a language X we currently have no speech recognition model
> for. We will collect these sentences on our server, then train up a
> speech recognition model to recognize speech in language X and allow
> this model to be downloaded to the device. This will however take on
> the order of 1000 hours of recorded speech in language X.
Sounds cool. Will this SR tool be under a free license so we can
potentially use it elsewhere?

>
> PS: In addition to the recorded language samples of language X we will
> also need to create n-gram models of the language to create a speech
> recognition model. Currently we are going to partner with
> http://statmt.org/ngrams/ and also Kevin Scannell, who I assume you
> know Michael, and his n-gram model data http://crubadan.org So we
> should be fine for the near term.
Yes, I know Kevin :)
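
For anyone on the list unfamiliar with the term, an n-gram model starts from
counts of word sequences in a text corpus. A toy sketch of the counting
step, with an invented function name and sentence markers chosen purely for
illustration (this is not the statmt.org or crubadan.org data format):

```python
from collections import Counter
from typing import Iterable, Tuple


def ngram_counts(sentences: Iterable[str], n: int = 2) -> "Counter[Tuple[str, ...]]":
    """Count word n-grams, padding each sentence with start/end markers."""
    counts: "Counter[Tuple[str, ...]]" = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts
```

Fed the sentences "call Tim" and "call Tiffanie", the pair ("<s>", "call")
is counted twice, which is the kind of statistic a recognizer can use to
prefer likely word sequences.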

Michael Wolf

Jun 12, 2015, 9:54:00 AM
Michael Bauer wrote:
>> If there is some case not handled by l20n of the UI text and speech
>> recognition grammar, let us know!
> Hard to say, I'm not up to speed regarding what stage l20n HAS reached
> in Mozilla. Certainly as far as localization work goes there is not much
> sign of it.
>> This is a separate project and problem we are also working on.

Hello Tiffanie, hello Michael,

you can visit this l20n site: http://l20n.org/

Under the heading "NATURAL LANGUAGE" you will find the link "Learn L20n
by example". This leads you to a short tutorial.


Regards,

Michael Wolf


Sandip Kamat

Jun 12, 2015, 9:57:21 AM
to Tiffanie Shakespeare, Eric Pang, fi...@akerbeltz.org, Kelly Davis, dev-...@lists.mozilla.org, Aaron Wu, Rob MacDonald, Andre Natal
Yes, eventually we'll build support for all device-supported languages.
Right now we're in the baby-steps phase, though, with English and Spanish first.

Sandip
>

Sandip Kamat

Jun 12, 2015, 10:31:48 AM
to fi...@akerbeltz.org, Eric Pang, Tiffanie Shakespeare, Kelly Davis, dev-...@lists.mozilla.org, Aaron Wu, Rob MacDonald, Andre Natal
Hi Michael,

I would like to understand what it means (effort-wise) to add complicated
languages. In principle, I agree to "front-load" the complexity; however, we
must support product commercialization for partners as we proceed. For
history, this is the first quarter ever that we've had a full team working on
voice recognition (after Andre's and my lonely fights for over a year), so I
would like to keep the partners' interest with some quick wins (launches)
before they run out of patience. They've asked for a commercial-level launch
implementation several times already.

Maybe there's a parallel path here that I would like to understand a bit
more. Sorry, I haven't been able to catch up with the full thread yet due to
travel in the past couple of weeks, but this is something worth looking into.

Sandip
On Jun 12, 2015 3:19 PM, "Michael Bauer" <fi...@akerbeltz.org> wrote:

> Which is why I'd recommend adding at least a couple more 'complicated'
> languages fast because otherwise you build architecture which will fall
> over when it hits the first vocative or something. Remember how long it
> took to fix the locale issues in Firefox mobile (for locales not supported
> by Android)? And Microsoft has *still* not managed to make the leap to
> plural formatting because all their systems are built for English plurals...
>
> Michael
>
> Sandip Kamat wrote the following on 12/06/2015 at 14:57:
>
> Yes, eventually we'll build support for all device supported languages....
> Right now in baby step phase though with English and Spanish first.
>
> Sandip
>
>

Michael Bauer

Jun 12, 2015, 2:28:00 PM
to Sandip Kamat, Eric Pang, Tiffanie Shakespeare, Kelly Davis, dev-...@lists.mozilla.org, Aaron Wu, Rob MacDonald, Andre Natal
Hi Sandip

Sandip Kamat wrote the following on 12/06/2015 at 15:31:
>
> Hi Michael,
>
> I would like to understand what it means (efforts-wise) to add
> complicated languages.
>
I'm not sure how one could measure that because I don't know what's
involved at your end when it comes to dealing with linguistic craziness.
>
> In principle, I agree to "front-load" the complexity, however we must
> support product commercialization for partners as we proceed.
>
Yes, I thought it would be something like that, seeing that Spanish
speaking countries are amongst the current release markets. Wasn't there
a decision to stop taking on manufacturer wishlists for a year or so? I
recall hearing something in Portland but may be wrong.
>
> For history, this is the first quarter ever that we've a full team
> working on voice recognition (after I and Andre's lonely fights for
> over a year) so I would like to keep the partner interest with some
> quick wins (launches) before they run out of patience. They've asked
> for commercial level launch implementation several times already.
>
Ok
>
> Maybe there's a parallel path here that I would like to understand a
> bit more. Sorry haven't been able to catch up with the full thread yet
> due to travels in past couple weeks but this is something worth
> looking into.
>
I'd be happy to help as much as I can in terms of linguistic expertise,
though Kevin (if he has any sleep left he can cut back on) is probably a
better port of call as he is a developer with a very good linguistic
head (and Irish is almost as nuts as Scottish Gaelic).

Michael

Sandip Kamat

Jun 14, 2015, 12:33:37 PM
to fi...@akerbeltz.org, Eric Pang, Tiffanie Shakespeare, Kelly Davis, dev-...@lists.mozilla.org, Aaron Wu, Rob MacDonald, Andre Natal
Thanks Michael. To clarify, this is Mozilla's wishlist feature, not the
partners'. But we need partners to commercialize this with their devices.
Hence the effort to get it ready in an agile way so we can launch
incrementally in products.

Sandip