Data from different accents and its effect on the main target accent


Armin Oliya

Feb 14, 2018, 7:46:48 AM
to kaldi-help
A few questions about dealing with multiple major accents (native speakers, different regions).
Say the main task is decoding American English (A), and we have a sizeable dataset in British English (B) that would be a waste if not used.

- Is including set B when training A generally expected to improve results on A?
- Are changes to the lexicon/phones necessary to handle the different pronunciations, or would you try using the default phones from A?
- Besides combining all training material into A+B, is it also common to have specialized models for A and B separately, detect the accent, and use the right model in production?
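(For concreteness on the pooling option: in Kaldi this is normally done with `utils/combine_data.sh`. The plain-shell sketch below mimics its core step — concatenating and re-sorting the per-utterance files — on tiny fabricated data directories, so that it runs without a Kaldi checkout; all directory, speaker, and utterance names here are hypothetical.)

```shell
#!/bin/sh
# Sketch of what pooling two Kaldi data dirs amounts to, using fake data.
set -e
mkdir -p data/train_A data/train_B data/train_AB

# Fabricated American-English set (A): two utterances.
printf 'A_utt1 spkA1\nA_utt2 spkA2\n' > data/train_A/utt2spk
printf 'A_utt1 hello world\nA_utt2 good morning\n' > data/train_A/text

# Fabricated British-English set (B): one utterance.
printf 'B_utt1 spkB1\n' > data/train_B/utt2spk
printf 'B_utt1 mind the gap\n' > data/train_B/text

# Pool: concatenate, then restore the sorted order Kaldi requires.
for f in utt2spk text; do
  cat data/train_A/$f data/train_B/$f | sort > data/train_AB/$f
done

wc -l < data/train_AB/text   # 3 utterances in the pooled set
```

In a real egs setup the same pooling would just be `utils/combine_data.sh data/train_AB data/train_A data/train_B`, typically followed by `utils/validate_data_dir.sh data/train_AB`.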

I've seen personal assistants that differentiate between major accents and ones that don't, so I wonder whether it's a matter of having large enough datasets per accent, or whether a combined approach is also expected to show notable gains.

Thanks!

Armin Oliya

Feb 14, 2018, 8:16:44 AM
to kaldi-help
Also to add: 

- If you end up training on the combined A+B set, are there specific hyperparameter changes you'd suggest? For example, increasing <#leaves> and <#gauss> in train_sat.sh.
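(For reference, those two values are the first positional arguments of train_sat.sh, so scaling them up for a pooled set is just a matter of the invocation. The numbers and directory names below are illustrative assumptions, not settings from this thread, and the command needs a Kaldi egs directory to actually run.)

```shell
# Command sketch (not standalone):
#   steps/train_sat.sh <num-leaves> <tot-gauss> <data> <lang> <ali-dir> <out-dir>
# Hypothetical example: raise the tree/Gaussian budget for the pooled A+B set,
# e.g. from a single-accent setting like 4200/40000 to something larger.
steps/train_sat.sh --cmd "$train_cmd" \
  7000 150000 \
  data/train_AB data/lang exp/tri3_ali_AB exp/tri4_AB
```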

Daniel Povey

Feb 14, 2018, 1:06:49 PM
to kaldi-help
If your aim is to do as well as possible on American English, then I don't think including British English will help.  If your aim is to do as well as possible on both, then you have a choice between pooling them vs. training two separate systems.  My understanding is that it doesn't make a huge amount of difference which way you do it.  For simplicity, training a single system may be the best route.
If you are training a single combined system, it's OK to just use the American lexicon and let the acoustic model sort out any mismatches.  Obviously if you were training just a British English system, you'd use the British lexicon.
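(One practical check if you go this route, sketched below with hypothetical toy files: measure how many words in the pooled transcripts the American lexicon actually covers, since a high OOV rate on the British portion would hurt regardless of how well the acoustic model absorbs the pronunciation mismatch.)

```shell
#!/bin/sh
# Toy OOV check: which transcript words are missing from the lexicon?
set -e

# Fabricated American lexicon (word + pronunciation) and pooled transcripts.
printf 'hello HH AH L OW\nworld W ER L D\nlift L IH F T\n' > lexicon.txt
printf 'utt1 hello world\nutt2 hello lift lorry\n' > text

# Unique transcript words (drop the utterance-id column) vs. lexicon words.
cut -d' ' -f2- text | tr ' ' '\n' | sort -u > words.txt
cut -d' ' -f1 lexicon.txt | sort -u > lexwords.txt
comm -23 words.txt lexwords.txt > oov.txt   # words only in transcripts

cat oov.txt   # prints: lorry
```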

Dan


--
Go to http://kaldi-asr.org/forums.html to find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/6d9f11f1-9850-4b74-8302-7f838fc67721%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Armin Oliya

Feb 14, 2018, 6:00:12 PM
to kaldi-help
Thanks Dan, clear.

