Data from different accents and its effect on the main target accent


Armin Oliya

Feb 14, 2018, 7:46:48 AM
to kaldi-help
A few questions about dealing with multiple major accents (native speakers, different regions).
Say the main task is decoding American English (A), and we have a sizeable dataset in British English (B) that would be a waste if not used.

- Is including set B when training A generally expected to improve results on A?
- Are changes to the lexicon/phones necessary to handle the different pronunciations, or would you try using the default phones from A?
- Besides combining all training material into A+B, is it also common to have specialized models for A and B separately, detect the accent, and use the right model in production?
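(For concreteness on the pooling option: in Kaldi this is normally done with `utils/combine_data.sh`. The plain-shell sketch below mimics its core step — concatenating and re-sorting the per-utterance files — on tiny fabricated data directories, so that it runs without a Kaldi checkout; all directory, speaker, and utterance names here are hypothetical.)

```shell
#!/bin/sh
# Sketch of what pooling two Kaldi data dirs amounts to, using fake data.
set -e
mkdir -p data/train_A data/train_B data/train_AB

# Fabricated American-English set (A): two utterances.
printf 'A_utt1 spkA1\nA_utt2 spkA2\n' > data/train_A/utt2spk
printf 'A_utt1 hello world\nA_utt2 good morning\n' > data/train_A/text

# Fabricated British-English set (B): one utterance.
printf 'B_utt1 spkB1\n' > data/train_B/utt2spk
printf 'B_utt1 mind the gap\n' > data/train_B/text

# Pool: concatenate, then restore the sorted order Kaldi requires.
for f in utt2spk text; do
  cat data/train_A/$f data/train_B/$f | sort > data/train_AB/$f
done

wc -l < data/train_AB/text   # 3 utterances in the pooled set
```

In a real egs setup the same pooling would just be `utils/combine_data.sh data/train_AB data/train_A data/train_B`, typically followed by `utils/validate_data_dir.sh data/train_AB`.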

I've seen personal assistants that differentiate between major accents and ones that don't, so I wonder whether it's a matter of having large enough datasets per accent, or whether a combined approach is also expected to show notable gains.

Thanks!

Armin Oliya

Feb 14, 2018, 8:16:44 AM
to kaldi-help
Also to add: 

- If you end up training on the combined A+B set, are there specific hyperparameter changes you'd suggest? For example, increasing <#leaves> and <#gauss> in train_sat.sh.
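(For reference, those two values are the first positional arguments of train_sat.sh, so scaling them up for a pooled set is just a matter of the invocation. The numbers and directory names below are illustrative assumptions, not settings from this thread, and the command needs a Kaldi egs directory to actually run.)

```shell
# Command sketch (not standalone):
#   steps/train_sat.sh <num-leaves> <tot-gauss> <data> <lang> <ali-dir> <out-dir>
# Hypothetical example: raise the tree/Gaussian budget for the pooled A+B set,
# e.g. from a single-accent setting like 4200/40000 to something larger.
steps/train_sat.sh --cmd "$train_cmd" \
  7000 150000 \
  data/train_AB data/lang exp/tri3_ali_AB exp/tri4_AB
```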

Daniel Povey

Feb 14, 2018, 1:06:49 PM
to kaldi-help
If your aim is to do as well as possible on American English, then I don't think including British English will help.  If your aim is to do as well as possible on both, then you have a choice between pooling them vs. training two separate systems.  My understanding is that it doesn't make a huge amount of difference which way you do it.  For simplicity, training a single system may be the best route.
If you are training a single combined system, it's OK to just use the American lexicon and let the acoustic model sort out any mismatches.  Obviously if you were training just a British English system, you'd use the British lexicon.
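(One practical check if you go this route, sketched below with hypothetical toy files: measure how many words in the pooled transcripts the American lexicon actually covers, since a high OOV rate on the British portion would hurt regardless of how well the acoustic model absorbs the pronunciation mismatch.)

```shell
#!/bin/sh
# Toy OOV check: which transcript words are missing from the lexicon?
set -e

# Fabricated American lexicon (word + pronunciation) and pooled transcripts.
printf 'hello HH AH L OW\nworld W ER L D\nlift L IH F T\n' > lexicon.txt
printf 'utt1 hello world\nutt2 hello lift lorry\n' > text

# Unique transcript words (drop the utterance-id column) vs. lexicon words.
cut -d' ' -f2- text | tr ' ' '\n' | sort -u > words.txt
cut -d' ' -f1 lexicon.txt | sort -u > lexwords.txt
comm -23 words.txt lexwords.txt > oov.txt   # words only in transcripts

cat oov.txt   # prints: lorry
```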

Dan


--
Go to http://kaldi-asr.org/forums.html to find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/6d9f11f1-9850-4b74-8302-7f838fc67721%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Armin Oliya

Feb 14, 2018, 6:00:12 PM
to kaldi-help
Thanks Dan, clear.

