How to predict <unk> words while decoding

Saranya V

Mar 23, 2021, 2:56:27 AM

to kaldi-help

Hi all,

How can I get the decoder to predict <unk> for words that are not in our lexicon?

Is this possible?

Thanks

Regards,

Saranya

Daniel Povey

Mar 23, 2021, 3:08:14 AM

to kaldi-help

You can either use the <unk> phone, which will give you <unk> with quite poor accuracy; or you could create a phone n-gram language model to stand in for it, as in

/ceph-dan/kaldi/egs/tedlium/s5_r2/local/run_unk_model.sh
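
To illustrate the second option: a phone n-gram model assigns a score to any phone sequence, so it can stand in for words the lexicon does not list. The following is a minimal Python sketch of that idea only; it is not a description of what run_unk_model.sh does internally. The training pronunciations are the ones posted later in this thread, while the add-one smoothing, the test sequence "f AH n", and the names train_phone_bigram and sequence_logprob are assumptions made for illustration.

```python
import math
from collections import defaultdict

BOS, EOS = "<s>", "</s>"

def train_phone_bigram(pronunciations):
    """Count phone bigrams over a list of pronunciations (lists of phones)."""
    counts = defaultdict(lambda: defaultdict(int))
    vocab = {EOS}
    for phones in pronunciations:
        seq = [BOS] + list(phones) + [EOS]
        vocab.update(phones)
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return counts, vocab

def sequence_logprob(phones, counts, vocab):
    """Log-probability of a phone sequence under add-one smoothed bigrams.

    Assumes every phone in `phones` was seen during training.
    """
    seq = [BOS] + list(phones) + [EOS]
    logp = 0.0
    for prev, cur in zip(seq, seq[1:]):
        total = sum(counts[prev].values())
        logp += math.log((counts[prev][cur] + 1) / (total + len(vocab)))
    return logp

# Pronunciations of in-lexicon words, used here as toy training data.
train = [
    "z i r ou".split(),
    "W AH n".split(),
    "tx UW".split(),
    "th r IY".split(),
    "f o r".split(),
]
counts, vocab = train_phone_bigram(train)

# A phone sequence that matches no lexicon entry still gets a finite score,
# which is what lets such a model cover unknown words.
print(sequence_logprob("f AH n".split(), counts, vocab))
```

A real setup would train on a much larger pronunciation dictionary and use a higher n-gram order with proper smoothing; the sketch only shows why unseen phone sequences remain scorable.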

--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/5fd2beed-e290-4094-a948-9c6ce6000a8cn%40googlegroups.com.

Saranya V

Mar 23, 2021, 3:29:43 AM

to kaldi...@googlegroups.com

Thanks for your reply, Dan.

Here is what I did. I added the entry <UNK> SPN to Lexicon.txt, for example:

<UNK> SPN

zero z i r ou

one W AH n

two tx UW

three th r IY

Corpus:

0 zero

1 one

2 two

3 three

I created a graph using the above lexicon and corpus.

I tested it with the sentence "i want to order one pizza".

My expected output is "<UNK> <UNK> <UNK> <UNK> one <UNK>", but the actual output was "one three two zero one right".

I also increased the probability of <UNK>, but it is still not predicted.

Is anything wrong here?

Thanks,

Regards,

Saranya
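
The expected output in this message corresponds to a text-level mapping in which every word missing from the lexicon is replaced by <UNK>. A minimal Python sketch of that mapping, using the four words from the lexicon above, follows. The function name map_oov is hypothetical, and the sketch only shows the target word sequence; it says nothing about how the decoder arrives at a hypothesis from audio.

```python
def map_oov(sentence, vocabulary, unk="<UNK>"):
    """Replace every word that is not in the vocabulary with the unk symbol."""
    return " ".join(w if w in vocabulary else unk for w in sentence.split())

# Words from the lexicon in this message (pronunciations omitted).
vocabulary = {"zero", "one", "two", "three"}

print(map_oov("i want to order one pizza", vocabulary))
# prints: <UNK> <UNK> <UNK> <UNK> one <UNK>
```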

Daniel Povey

Mar 23, 2021, 4:37:37 AM

to kaldi-help

Detecting OOVs is a very difficult task, especially consecutive ones, because word segmentation is hard if you don't know the words in the first place.

Saranya V

Mar 23, 2021, 8:33:16 AM

to kaldi...@googlegroups.com

OK, thanks.

I tried to use run_unk_model.sh to predict <unk> words.

Sample experiment:

1. Lexicon

<UNK> SPN

zero z i r ou

one W AH n

two tx UW

three th r IY

four f o r

2. OOV lexicon

complimentary k ax m p l i m EH n tx ER IY

beverage b EH V ER i j

yes y EH s

please p l IY z

go g ou

ahead AH h EH dx

Using the OOV lexicon and the fixed lexicon above, I created L.FST (which contains the phone sequences of the OOV words alongside the base lexicon words) and then created a graph.

I tested it with the sentence "six five yes".

My expected output is "six five <UNK>", but the actual output was "six five eight".

Is my understanding correct? If so, why is it not predicting <UNK>?

Please correct me if I am wrong.

Thanks

Regards,

Saranya
