How does Kaldi handle an OOV word that has multiple spellings?


Trương Trang

Jul 28, 2021, 3:58:18 AM
to kaldi-help

Hi,
I have searched many times for how Kaldi handles an "oov" word that has multiple spellings (pronunciations) when training ASR. When we define the lexicon and convert such a word to phonemes, it can end up with several spellings (an example from my language, Vietnamese):
facebook: 'phây búc'
facebook: 'phết búc' ....
So how does Kaldi handle this, or does it just use a single lexicon entry per word for training? My question may seem odd, but I searched and could not find any related documentation.

Thank you.


Daniel Povey

Jul 28, 2021, 10:17:30 AM
to kaldi...@googlegroups.com
Lexicons do support multiple pronunciations; that is very common. I am not sure whether you are using the word "oov" correctly here: calling a word out-of-vocabulary would imply that no pronunciation is known for it.
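
A minimal sketch of what multiple pronunciations look like in practice, assuming the usual plain-text lexicon layout in which each line is a word followed by its phones and a word with several pronunciations simply appears on several lines. The phone symbols and the `read_lexicon` helper below are hypothetical and only serve the illustration; they are not taken from any real Vietnamese phone set or from Kaldi's own scripts.

```python
from collections import defaultdict

# Toy lexicon text: one "<word> <phone1> <phone2> ..." entry per line.
# The phone symbols are made up for this example.
LEXICON_TEXT = """\
facebook f ay b u k
facebook f et b u k
tôi t oi
"""


def read_lexicon(text):
    """Map each word to the list of all its pronunciations (lists of phones)."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, phones = parts[0], parts[1:]
        if phones not in lexicon[word]:  # ignore exact duplicate entries
            lexicon[word].append(phones)
    return dict(lexicon)


lexicon = read_lexicon(LEXICON_TEXT)
for word, prons in lexicon.items():
    print(word, len(prons), prons)
```

Here `facebook` ends up with two pronunciations and `tôi` with one; nothing special is needed beyond listing the word twice.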


Trương Trang

Jul 28, 2021, 10:27:59 PM
to kaldi-help
Thanks Dan,
I'm sorry for using the word 'oov' here; it does not make sense in this case. I meant a word that is not from my language: a foreign word, such as an English word used in Vietnamese. Because it is a foreign word, different people may have a different idea of how to spell it, as in my example above.
I understand that the lexicon can define multiple spellings for a word, but I'm not sure how the AM works out how to train each frame against the phonemes of each spelling.
For example, I have audio with the transcript "tôi đang mở facebook" (in English: "I am opening facebook"), and facebook has multiple spellings in my spoken language ('phây búc' / 'phết búc'),
so the audio frames representing the word 'facebook' look like this:
|  frames of facebook |
|    frame_1     |  frame_2|
|    phây/phết  |                |
I mean, how can the AM train correctly when one frame can map to more than one phoneme?
Sorry that I cannot describe the problem well enough for you to understand.

Daniel Povey

Jul 28, 2021, 11:44:41 PM
to kaldi-help
It chooses the one that matches best in each instance, or uses a weighted combination.  You could try reading the "HTK Book" to understand HMMs.
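
The two options in this reply can be illustrated with a toy example. This is a minimal sketch, not Kaldi's implementation: the phone labels, the per-frame log-likelihoods, and the function names are all invented for illustration, and transition probabilities and pronunciation priors are ignored. Each pronunciation is aligned left to right against the same frames; taking the maximum over alignments gives a Viterbi-style score ("chooses the one that matches best"), while summing over alignments gives a total likelihood that can be normalized into weights ("a weighted combination").

```python
import math

NEG = float("-inf")

# Hypothetical acoustic log-likelihoods: one dict per frame, phone -> log p(frame | phone).
FRAMES = [
    {"f": -1.0, "ay": -5.0, "et": -5.0, "b": -6.0},
    {"f": -4.0, "ay": -1.0, "et": -3.0, "b": -6.0},
    {"f": -6.0, "ay": -1.5, "et": -2.5, "b": -5.0},
    {"f": -6.0, "ay": -5.0, "et": -5.0, "b": -1.0},
]

# Two candidate pronunciations of the same word (made-up, shortened phone strings).
PRONS = {
    "phây búc": ["f", "ay", "b"],
    "phết búc": ["f", "et", "b"],
}


def logsumexp(a, b):
    """Numerically stable log(exp(a) + exp(b)), tolerating -inf inputs."""
    if a == NEG:
        return b
    if b == NEG:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def align_score(frames, phones, combine):
    """Score a left-to-right alignment in which every phone covers >= 1 frame.

    combine=max       -> score of the single best alignment (Viterbi style)
    combine=logsumexp -> log of the sum over all alignments (forward style)
    """
    n = len(phones)
    prev = [NEG] * n
    prev[0] = frames[0][phones[0]]  # the first frame must belong to the first phone
    for frame in frames[1:]:
        cur = [NEG] * n
        for j in range(n):
            stay = prev[j]                          # remain in phone j
            advance = prev[j - 1] if j > 0 else NEG  # move on from phone j-1
            cur[j] = combine(stay, advance) + frame[phones[j]]
        prev = cur
    return prev[-1]  # the last frame must belong to the last phone


# Option 1: pick the pronunciation whose best alignment scores highest.
viterbi = {name: align_score(FRAMES, p, max) for name, p in PRONS.items()}
best = max(viterbi, key=viterbi.get)

# Option 2: weight each pronunciation by its share of the total likelihood.
total = {name: align_score(FRAMES, p, logsumexp) for name, p in PRONS.items()}
norm = NEG
for value in total.values():
    norm = logsumexp(norm, value)
weights = {name: math.exp(value - norm) for name, value in total.items()}

print("best pronunciation:", best)
print("weights:", weights)
```

In this toy data the frames fit 'phây búc' better, so it wins under the first option and receives most of the weight under the second. The point for the question above is that a frame is never forced to match both pronunciations at once: for each utterance the alternatives compete, and the training statistics go to the phones of whichever pronunciation explains the audio best (or are shared in proportion to the weights).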

Trương Trang

Jul 28, 2021, 11:56:44 PM
to kaldi-help

Thanks Dan, I will read more. Thank you.