How to predict <unk> words while decoding

Saranya V

Mar 23, 2021, 2:56:27 AM
to kaldi-help
Hi all,

How can I get <unk> predicted for words that are not in our lexicon?

Is this possible?

Thanks


Regards,
Saranya

Daniel Povey

Mar 23, 2021, 3:08:14 AM
to kaldi-help
You can either use the <unk> phone, which will give you <unk> with quite poor accuracy; or you could
create a phone n-gram language model to stand in for it, as in
/ceph-dan/kaldi/egs/tedlium/s5_r2/local/run_unk_model.sh
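
To make the second option a little more concrete, here is a toy sketch of what a phone n-gram model does. It is not the code from run_unk_model.sh: the training pronunciations, the bigram order, the add-one smoothing, and all function names are assumptions chosen only for illustration. The idea is that phone sequences resembling real pronunciations score higher than arbitrary ones, so such a model can stand in for an unknown word.

```python
import math
from collections import Counter

# Toy pronunciations (hypothetical data, one phone sequence per word).
PRONS = [
    "z i r ou".split(),
    "w ah n".split(),
    "t uw".split(),
    "th r iy".split(),
]

BOS, EOS = "<s>", "</s>"


def train_phone_bigram(prons):
    """Count phone bigrams, with boundary symbols around each pronunciation."""
    bigrams, history = Counter(), Counter()
    vocab = {EOS}
    for phones in prons:
        seq = [BOS] + phones + [EOS]
        vocab.update(phones)
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            history[prev] += 1
    return bigrams, history, vocab


def log_prob(phones, model):
    """Add-one smoothed log-probability of a phone sequence."""
    bigrams, history, vocab = model
    seq = [BOS] + phones + [EOS]
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        num = bigrams[(prev, cur)] + 1
        den = history[prev] + len(vocab)
        total += math.log(num / den)
    return total


model = train_phone_bigram(PRONS)
seen = log_prob("t uw".split(), model)    # phone order seen in training
unseen = log_prob("uw t".split(), model)  # same phones, unseen order
print(seen > unseen)
```

A real setup would use a higher n-gram order, proper smoothing, and a full lexicon as training data; the sketch only shows the scoring principle.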



Saranya V

Mar 23, 2021, 3:29:43 AM
to kaldi...@googlegroups.com
Thanks for your reply, Dan.

Here is what I did. In Lexicon.txt I added the entry <UNK> SPN, for example:

<UNK> SPN
zero z i r ou
one W AH n
two tx UW
three th r IY

Corpus:

0  zero
1  one
2  two
3  three


I created a graph using the above lexicon and corpus.

I tested with the sentence "i want to order one pizza".

My expected output is "<UNK> <UNK> <UNK> <UNK> one <UNK>",

but the actual output I got is "one three two zero one right".

I also increased the probability of <UNK>, but it is still not predicted.

Is anything wrong here?


Thanks,

Regards,

Saranya
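
For reference, the expected output above is what you get by mapping every out-of-lexicon token to <UNK>. A minimal sketch of that mapping follows; it uses the sample lexicon from this message, and the helper names `load_words` and `map_oov` are hypothetical. It only produces the target transcript for comparison and says nothing about what the decoder will actually emit.

```python
# Sample lexicon copied from the message above (word followed by phones).
LEXICON_TEXT = """\
<UNK> SPN
zero z i r ou
one W AH n
two tx UW
three th r IY
"""


def load_words(lexicon_text):
    """Return the set of words (first column) in a lexicon.txt-style string."""
    return {line.split()[0] for line in lexicon_text.splitlines() if line.strip()}


def map_oov(sentence, words, unk="<UNK>"):
    """Replace every token that is not in the lexicon with the unk symbol."""
    return " ".join(tok if tok in words else unk for tok in sentence.split())


words = load_words(LEXICON_TEXT)
expected = map_oov("i want to order one pizza", words)
print(expected)  # <UNK> <UNK> <UNK> <UNK> one <UNK>
```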





Daniel Povey

Mar 23, 2021, 4:37:37 AM
to kaldi-help
Detecting OOVs is a very difficult task, especially consecutive ones, because word segmentation is hard if you don't know the
words in the first place.
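
One way to see why segmentation is the hard part: without known words, any position between two phones could be a word boundary, so a string of n phones has 2^(n-1) possible segmentations. The sketch below enumerates them for a short, made-up phone string; the function name and the input are illustrative assumptions, not anything from Kaldi.

```python
from itertools import combinations


def segmentations(phones):
    """Yield every way to split a phone list into contiguous, non-empty chunks."""
    n = len(phones)
    for k in range(n):  # k = number of internal boundaries
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [phones[a:b] for a, b in zip(bounds, bounds[1:])]


phones = "w ah n t uw".split()  # hypothetical phone string, no word boundaries
all_splits = list(segmentations(phones))
print(len(all_splits))  # 16, i.e. 2 ** (len(phones) - 1)
```

With several consecutive unknown words the phone string gets longer and the number of candidate segmentations doubles with every extra phone.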

Saranya V

Mar 23, 2021, 8:33:16 AM
to kaldi...@googlegroups.com
OK, thanks.

I tried to use run_unk_model.sh to predict <unk> words.

Sample experiment:

1. Lexicon

<UNK> SPN
zero z i r ou
one W AH n
two tx UW
three th r IY
four f o r

2. OOV lexicon

complimentary k ax m p l i m EH n tx ER IY
beverage b EH V ER i j
yes y EH s
please p l IY z
go g ou
ahead AH h EH dx

Using the OOV lexicon and the fixed lexicon above, I created the L.FST (which contains the phone sequences of the OOV words alongside the base lexicon words) and then created a graph.

I tested with the sentence "six five yes".

My expected output is "six five <UNK>",

but the actual output I got is "six five eight".

Is my understanding correct? If so, why is it not predicting <UNK>?

Please correct me if I am wrong.


Thanks

Regards,

Saranya


