It does work to set the graph costs to zero and then rescore with a new LM. However, doing so throws away the transition and pronunciation probabilities (the graph cost is not just the LM cost), so it is more correct to first subtract the old LM costs and then add the new ones. That is why you will see a cost being subtracted: it is the cost of the LM that was already used.
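For concreteness, here is a minimal sketch of that subtract-then-add step using the lattice-lmrescore binary, roughly what steps/lmrescore.sh does internally. The file names and the single-job lattice archive are placeholders, not fixed Kaldi paths:

```bash
# Sketch of subtract-then-add LM rescoring (placeholder paths).
old_lm=data/lang_test/G.fst       # LM baked into the decoding graph
new_lm=data/lang_test_big/G.fst   # new LM, built with the same words.txt

# --lm-scale=-1.0 subtracts the old LM's contribution to the graph cost,
# leaving the transition/pronunciation part intact; the second pass adds
# the new LM's costs.  fstproject removes the #0 backoff symbols from
# G.fst's input side (newer OpenFst spells the flag --project_type=output).
lattice-lmrescore --lm-scale=-1.0 "ark:gunzip -c decode/lat.1.gz |" \
    "fstproject --project_output=true $old_lm |" ark:- |
  lattice-lmrescore --lm-scale=1.0 ark:- \
    "fstproject --project_output=true $new_lm |" \
    "ark:|gzip -c > decode_rescored/lat.1.gz"
```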
When people talk about pruning in rescoring, they mean that the lattice is not rescored exhaustively; instead, some intelligent search method avoids going down all paths. This is unavoidable with an RNNLM: since the RNNLM's history is in principle unbounded, every distinct word history in the lattice would have to be expanded into its own state, so fully rescoring a lattice created from decoding with, for example, a lattice beam of 8 would take a very long time.
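In the Kaldi RNNLM setup this pruned rescoring is wrapped by rnnlm/lmrescore_pruned.sh. A sketch of a typical invocation follows; the directory names are assumptions rather than taken from any particular recipe:

```bash
# Args: <old-lang-dir> <rnnlm-dir> <data-dir> <input-decode-dir> <output-decode-dir>
# --weight is the RNNLM interpolation weight; --max-ngram-order caps how
# much RNNLM history is used to distinguish lattice states, which is what
# keeps the expanded lattice from blowing up.
rnnlm/lmrescore_pruned.sh --weight 0.5 --max-ngram-order 4 \
  data/lang_test exp/rnnlm data/test \
  exp/chain/tdnn/decode_test exp/chain/tdnn/decode_test_rnnlm
```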
In Kaldi you cannot rescore with an LM whose vocabulary differs from that of the initial one; the new LM has to be built with the same words.txt symbol table.
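One quick way to catch a vocabulary mismatch before rescoring is to compare the new LM's unigrams against the existing words.txt. A sketch with placeholder file names:

```bash
# List words in the new ARPA LM that are missing from the old symbol
# table (newlm.arpa and the lang dir are placeholders).
new_arpa=newlm.arpa
words=data/lang_test/words.txt

# Take the word field from the \1-grams: section of the ARPA file
# (lines are "logprob word [backoff]") and compare against words.txt.
awk '/\\1-grams:/{f=1;next} /^\\/{f=0} f && NF>=2 {print $2}' "$new_arpa" |
  sort -u > new_vocab.txt
awk '{print $1}' "$words" | sort -u > old_vocab.txt
comm -23 new_vocab.txt old_vocab.txt   # words only in the new LM
```

Any non-empty output (beyond sentence-boundary symbols, depending on how the lang directory was prepared) means the new LM introduces words the lattices cannot represent.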