How does Kaldi handle an OOV word that has multiple spellings?

Trương Trang

Jul 28, 2021, 3:58:18 AM

to kaldi-help

Hi,
I have searched many times for how Kaldi handles an OOV word that has multiple spellings (pronunciations) when training an ASR model. When we define the lexicon, a single word converted to phonemes can have more than one pronunciation. An example from my language, Vietnamese:
facebook: 'phây búc'
facebook: 'phết búc' ....
So how does Kaldi handle this, or does it use only one lexicon entry per word for training? My question may seem odd, but I could not find any related documentation.

Thank you.

Daniel Povey

Jul 28, 2021, 10:17:30 AM

to kaldi...@googlegroups.com

Lexicons do support multiple pronunciations; that is very common. I am not sure whether you are using the word "oov" correctly here: calling a word out of vocabulary would imply that no pronunciation is known for it.
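
To illustrate the point about multiple pronunciations, here is a minimal Python sketch that reads entries in the "word phone1 phone2 ..." layout used by Kaldi's lexicon.txt, where a word with several pronunciations simply appears on several lines. The phone symbols and the `load_lexicon` helper are assumptions made up for this example; they are not part of Kaldi and not a real Vietnamese phone set.

```python
from collections import defaultdict

# Hypothetical lexicon entries, one pronunciation per line.
# The phone symbols are invented for illustration only.
LEXICON_TEXT = """\
facebook f ay b u k
facebook f e t b u k
tôi t o i
"""


def load_lexicon(text):
    """Map each word to the list of its pronunciations (lists of phones)."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and entries without phones
        word, phones = parts[0], parts[1:]
        if phones not in lexicon[word]:  # ignore exact duplicate entries
            lexicon[word].append(phones)
    return dict(lexicon)


lexicon = load_lexicon(LEXICON_TEXT)
for word, prons in lexicon.items():
    print(word, len(prons), prons)
```

In this sketch "facebook" ends up with two alternative pronunciations under one word, which is the situation described in the question.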

--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/26ff8d99-42b4-4a14-a6ed-b2375618cdebn%40googlegroups.com.

Trương Trang

Jul 28, 2021, 10:27:59 PM

to kaldi-help

Thanks Dan,

I'm sorry for using the word 'oov' here; it does not make sense in this case. I meant a word that is not native to my language, such as an English word used in Vietnamese. Because it is a foreign word, different people may pronounce it differently, as in my example above.

I understand that the lexicon can define multiple pronunciations for a word, but I am not sure how the AM works out, during training, how to match each frame to the phonemes of each pronunciation.

For example, I have audio with the transcript "tôi đang mở facebook" (in English: "I am opening facebook"), and facebook has multiple pronunciations in my spoken language ('phây búc' / 'phết búc').

So the audio frames representing the word 'facebook' look like this:

| frames of 'facebook'  |
| frame_1   | frame_2   |
| phây/phết |           |

I mean: how can the AM be trained correctly when one frame can map to more than one phoneme?

Sorry that I can't describe the problem well enough for you to understand.

Daniel Povey

Jul 28, 2021, 11:44:41 PM

to kaldi-help

It chooses the one that matches best in each instance, or uses a weighted combination. You could try reading the "HTK Book" to understand HMMs.
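
As a rough illustration of "chooses the one that matches best, or uses a weighted combination", the toy Python sketch below scores each pronunciation against a few frames and then either picks the best one or turns the scores into weights. Everything in it is an assumption for illustration: the per-frame log-likelihoods are invented, syllables stand in for phones, transition probabilities are ignored, and the weights are derived from best-path scores rather than from a full forward-backward pass. It is not Kaldi code.

```python
import math

NEG_INF = float("-inf")


def best_alignment_score(units, frame_loglik):
    """Best log-score of aligning `units` left to right onto the frames,
    with every unit covering at least one frame (Viterbi-style search)."""
    n = len(units)
    prev = [NEG_INF] * n
    prev[0] = frame_loglik[0][units[0]]  # the first frame must start unit 0
    for frame in frame_loglik[1:]:
        cur = [NEG_INF] * n
        for j, unit in enumerate(units):
            stay = prev[j]                                 # remain in unit j
            advance = prev[j - 1] if j > 0 else NEG_INF    # move on from j-1
            best = max(stay, advance)
            if best > NEG_INF:
                cur[j] = best + frame[unit]
        prev = cur
    return prev[-1]  # the alignment must end in the last unit


# Invented log-likelihoods of each unit for four frames of one utterance.
FRAMES = [
    {"phây": -1.0, "phết": -3.0, "búc": -5.0},
    {"phây": -1.2, "phết": -2.5, "búc": -4.0},
    {"phây": -4.0, "phết": -4.5, "búc": -0.8},
    {"phây": -5.0, "phết": -5.0, "búc": -0.5},
]

PRONUNCIATIONS = {
    "phây búc": ["phây", "búc"],
    "phết búc": ["phết", "búc"],
}

scores = {name: best_alignment_score(units, FRAMES)
          for name, units in PRONUNCIATIONS.items()}

# Option 1: keep only the pronunciation that matches this utterance best.
best = max(scores, key=scores.get)

# Option 2: weight the pronunciations by their normalized likelihoods.
log_total = math.log(sum(math.exp(s) for s in scores.values()))
weights = {name: math.exp(s - log_total) for name, s in scores.items()}

print("best:", best)
print("weights:", weights)
```

The decision is made per utterance: a recording where the speaker says 'phết búc' would produce different frame scores and therefore a different choice or different weights.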


Trương Trang

Jul 28, 2021, 11:56:44 PM

to kaldi-help

Thanks Dan, I will read more. Thank you.
