training LM using phone sequences


Hua wang

Aug 29, 2018, 9:04:32 AM8/29/18
to phnrec
Hi,

    I'm doing language recognition (LRE), and I already have some phone sequences decoded by the TIMIT phone recognizer, for example below (one line per utterance):

                   pau eh m er n z eh hh iy uw m uw hh ay iy z er n hh ih ng dh iy uw iy n v ow f ay n b iy ih dh iy n pau
                   pau pau s iy iy z ah v iy iy y ay iy ah v ow n dh ow z ay m ah n pau
                   pau m l iy n z ow l iy eh n t ey s eh n ah hh ey n sh er sh ah ow m n pau m ay y ae hh uw pau

    When training the LM models, I'm using the SRILM tools with no smoothing:

               srilmbin/ngram-count -order 3 -text MlfLang/${lang}.text -lm LangModel/gram_${lang}_3 -addsmooth 0

    But the EER is 25%, which is not good.
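
    For reference, switching from -addsmooth 0 to one of SRILM's standard discounting methods would only change that one flag, e.g. Witten-Bell (a sketch only, not something I have tried yet):

               srilmbin/ngram-count -order 3 -text MlfLang/${lang}.text -lm LangModel/gram_${lang}_3 -wbdiscount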
   
    So can someone give me any suggestions?

Thanks.
   
 

Petr Schwarz

Aug 29, 2018, 9:11:55 AM8/29/18
to phn...@googlegroups.com

Hi. Usually some n-grams are missing in some languages. You can train one global model on all languages and then adapt it to the particular languages, for example using language model interpolation. It helps a lot.

At least you should use only those n-grams that are correctly estimated for all languages in your scoring.
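
For example, with the SRILM tools you already use, the recipe could look roughly like this (the file names are placeholders, -wbdiscount is just one choice of smoothing, and the 0.5 interpolation weight should be tuned on held-out data):

    # global phone LM trained on the pooled training text of all languages
    srilmbin/ngram-count -order 3 -text MlfLang/all_languages.text -wbdiscount -lm LangModel/gram_global_3

    # per-language phone LM
    srilmbin/ngram-count -order 3 -text MlfLang/${lang}.text -wbdiscount -lm LangModel/gram_${lang}_3

    # statically interpolate the two; -lambda is the weight of the first (-lm) model
    srilmbin/ngram -order 3 -lm LangModel/gram_${lang}_3 -mix-lm LangModel/gram_global_3 -lambda 0.5 -write-lm LangModel/gram_${lang}_3_adapted

Scoring stays the same afterwards, e.g. ngram -lm LangModel/gram_${lang}_3_adapted -ppl test_utt.text per utterance, picking the language whose model gives the lowest perplexity.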

 

Petr   


Hua wang

Aug 31, 2018, 5:41:35 AM8/31/18
to phnrec
Hi Petr,

       Using your suggestion, the results really improved. Thanks.


On Wednesday, August 29, 2018 at 9:04:32 PM UTC+8, Hua wang wrote: