boosting detection of a set of words


Wonkyum Lee

Aug 10, 2016, 6:17:30 PM
to kaldi-help
Hi folks,

I recently built a general-purpose ASR system (chain model, trigram LM with a 200k vocabulary).
I am applying it to a specific recognition task where I am given a set of words that I must not miss in the audio. The words do not appear in every utterance, but when one is there I don't want to miss it, even if my overall WER is degraded as a result.
I know that if I built an LM biased toward those words, they would be recognized more often. However, I am wondering whether there is a way to boost detection of the word set without re-training the LM, because the set of words is given dynamically (possibly per utterance).

Thanks,

Daniel Povey

Aug 10, 2016, 6:23:29 PM
to kaldi-help
You could modify [boost] the probabilities of the arcs in the HCLG FST
that have those words on them. This isn't ideal, but it would help.
If you pre-compute a data structure consisting of lists of state/arc
indexes giving the arcs for each word in your vocabulary, you could quite
efficiently modify the probabilities of given words in the FST and
then revert them after processing the utterance.
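A minimal sketch of the approach Dan describes, not code from the thread: pre-index the HCLG arcs by the word on their output label once, then temporarily lower the cost of the arcs for the given keyword set and restore it after decoding the utterance. It assumes HCLG is held in memory as a mutable fst::StdVectorFst (in practice it is often read as a ConstFst and would need converting), and the function names and boost mechanics are invented for illustration.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

#include <fst/fstlib.h>  // OpenFst; HCLG is assumed to be a mutable StdVectorFst.

typedef std::pair<int, size_t> StateArc;  // (state id, arc index within that state)

// One-time pass over HCLG: word (output label) -> all arcs carrying that word.
std::unordered_map<int, std::vector<StateArc>> IndexArcsByWord(
    const fst::StdVectorFst &hclg) {
  std::unordered_map<int, std::vector<StateArc>> index;
  for (fst::StateIterator<fst::StdVectorFst> siter(hclg); !siter.Done();
       siter.Next()) {
    int s = siter.Value();
    size_t i = 0;
    for (fst::ArcIterator<fst::StdVectorFst> aiter(hclg, s); !aiter.Done();
         aiter.Next(), ++i) {
      int word = aiter.Value().olabel;
      if (word != 0)  // 0 is <eps>; words sit on the output labels.
        index[word].push_back({s, i});
    }
  }
  return index;
}

// Add `delta` to the cost of every arc carrying one of `word_ids`.
// Call with a negative delta to boost before decoding an utterance, then
// with the opposite sign afterwards to restore the original graph.
void AdjustWordCosts(
    const std::unordered_map<int, std::vector<StateArc>> &index,
    const std::vector<int> &word_ids, float delta, fst::StdVectorFst *hclg) {
  for (int w : word_ids) {
    auto it = index.find(w);
    if (it == index.end()) continue;
    for (const StateArc &sa : it->second) {
      fst::MutableArcIterator<fst::StdVectorFst> aiter(hclg, sa.first);
      aiter.Seek(sa.second);
      fst::StdArc arc = aiter.Value();
      arc.weight = fst::TropicalWeight(arc.weight.Value() + delta);
      aiter.SetValue(arc);
    }
  }
}
```

Reverting by re-applying the opposite delta, rather than copying the whole FST, is what keeps per-utterance keyword lists cheap to handle.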


Dan

Anis Chihi

Aug 11, 2016, 10:29:43 AM
to kaldi-help
Maybe you could adapt your DNN by retraining it on data containing this set of words. This can be fragile, since you might lose some of the information already stored in the network, so try small learning rates and a few epochs, and monitor the results closely. It could probably help.

Cheers


Jan Trmal

Aug 11, 2016, 10:53:33 AM
to kaldi-help
Another thing you could try is simply rescoring with a boosted LM
(either with a boosted G, or by modifying the ARPA model and recomputing
the const-arpa; I also have some code that uses the SRILM libraries to
rescore with the ARPA model directly).
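The "modify the ARPA model" option could be realized roughly as in the sketch below (not code from the thread): add a constant to the unigram log10 probabilities of the keywords and write a new ARPA file, which can then be turned back into a const-arpa (e.g. with Kaldi's arpa-to-const-arpa) or used to rebuild a boosted G.fst for rescoring. The file handling and boost value are made up, and the result is no longer a properly normalized LM, which is usually tolerable for a biasing hack.

```cpp
#include <fstream>
#include <iostream>
#include <set>
#include <sstream>
#include <string>

int main(int argc, char **argv) {
  if (argc < 4) {
    std::cerr << "Usage: boost-arpa <in.arpa> <out.arpa> <word> [<word> ...]\n";
    return 1;
  }
  std::set<std::string> keywords(argv + 3, argv + argc);
  const double boost = 1.0;  // added to the log10 prob, i.e. roughly 10x more likely

  std::ifstream in(argv[1]);
  std::ofstream out(argv[2]);
  out.precision(7);
  std::string line;
  bool in_unigrams = false;
  while (std::getline(in, line)) {
    if (!line.empty() && line[0] == '\\') {
      // Section headers such as "\1-grams:", "\2-grams:", "\end\".
      in_unigrams = (line.rfind("\\1-grams:", 0) == 0);
      out << line << "\n";
      continue;
    }
    if (in_unigrams) {
      std::istringstream iss(line);
      double logprob;
      std::string word;
      if (iss >> logprob >> word) {
        std::string rest;
        std::getline(iss, rest);  // optional back-off weight, copied verbatim
        if (keywords.count(word)) logprob += boost;
        out << logprob << "\t" << word << rest << "\n";
        continue;
      }
    }
    out << line << "\n";
  }
  return 0;
}
```

Boosting only the unigrams is the crudest possible bias; it mainly acts through back-off, but it keeps the edit trivial enough to redo for each new keyword list.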

Yes, it's just an ugly hack, as it won't help you get more
(key)words into the lattice. On the other hand, the lattice recall
(or STWV) is usually quite OK and the issue is just that the (key)word
scores are low. I was playing with this a couple of years back and saw
roughly 3% absolute improvement in ATWV for most of the Babel languages.
For lack of bandwidth, I never got to use it in a real system.
YMMV -- the possible improvement probably also depends on how good
your calibration technique is.
y.

Wonkyum Lee

Aug 11, 2016, 2:22:19 PM
to kaldi-help, dpo...@gmail.com
Thanks, Dan.
That doesn't look entirely straightforward to me, but I can see it would boost those words in the recognition process.
Let me try that approach.

Alternatively, I was thinking of another approach: boosting the log-likelihood of the pre-defined words in the lattice so that the 1-best path is pushed to include those words. Do you think that would help as well?
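For reference, a minimal sketch of that lattice-boosting idea (not code from the thread), assuming the decoder output is available as a Kaldi CompactLattice and the keyword word-ids are known; the function name and boost value are invented for illustration.

```cpp
#include <set>

#include "fst/fstlib.h"
#include "lat/kaldi-lattice.h"  // CompactLattice and the lattice weight types.

// Subtract a fixed cost from every arc whose word label is in the keyword
// set, so the 1-best path is more likely to contain those words.
void BoostWordsInLattice(const std::set<kaldi::int32> &keyword_ids,
                         kaldi::BaseFloat boost_cost,  // e.g. 2.0, in -log space
                         kaldi::CompactLattice *clat) {
  using namespace kaldi;
  for (fst::StateIterator<CompactLattice> siter(*clat); !siter.Done();
       siter.Next()) {
    for (fst::MutableArcIterator<CompactLattice> aiter(clat, siter.Value());
         !aiter.Done(); aiter.Next()) {
      CompactLatticeArc arc = aiter.Value();
      if (keyword_ids.count(arc.ilabel)) {  // in a CompactLattice, ilabel == olabel == word id
        LatticeWeight w = arc.weight.Weight();
        // Lower the graph (LM) cost; the acoustic cost stays unchanged.
        arc.weight = CompactLatticeWeight(
            LatticeWeight(w.Value1() - boost_cost, w.Value2()),
            arc.weight.String());
        aiter.SetValue(arc);
      }
    }
  }
}
```

Running the usual best-path over the modified lattice then favors the keywords; as Yenda notes above, this only helps when the words already made it into the lattice.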

Best,
Wonkyum

Wonkyum Lee

Aug 11, 2016, 2:38:53 PM
to kaldi-help
Thanks, Anis, for the suggestion.

That might help. However, I am not planning to re-train the acoustic model, since the set of words to be detected is given dynamically and I can't train a model to cover every case.

Thanks

Wonkyum Lee

Aug 11, 2016, 3:53:41 PM
to kaldi-help
Thanks, Yenda.
If we can rescore with the ARPA model directly, without recomputing the const-arpa, that sounds great. Could you share the code, if you don't mind?
I also remember that STWV was OK for the Babel languages even when ATWV was low. In my application the oracle WER is around 5%, while the 1-best path WER is 25%, so I guess rescoring with a boosted LM would improve the number of hits.

Thanks,
Wonkyum

Jan Trmal

Aug 12, 2016, 3:40:21 PM
to kaldi-help
OK, I'll try to look it up and send it to you directly. I'm not
sure we want to make this part of the Kaldi libraries, due to the
SRILM license.
y.

Wonkyum Lee

Aug 12, 2016, 3:51:56 PM
to kaldi-help
Thanks, Yenda. I really appreciate it.

Wonkyum

lianrzh

Oct 8, 2020, 10:58:30 PM
to kaldi-help
Hi Wonkyum,

I wonder whether you found a good solution to this problem. If so, would you mind sharing it?

Best,
lianrzh

Vishay Raina

Mar 18, 2021, 4:28:04 AM
to kaldi-help
"the arcs in the HCLG FST that have those words on them"
Does that mean either on ilabel, or olabel, or both?

nshm...@gmail.com

Mar 18, 2021, 4:21:13 PM
to kaldi-help
Words in the graph are on the olabels; the ilabels are transition-ids, which map to senones (pdf-ids).