beam width and lattice beam width


SudKol

Jun 26, 2019, 11:14:41 AM
to kaldi-help
Hi Kaldi-ers, 

I am using the aspire chain model to transcribe medical-domain dictations in Irish English. I built a pronunciation model and a language model from my data and am using them with the aspire acoustic model. I noticed that with the default beam (13) and lattice-beam (6) values from steps/decode.sh, the model was skipping part of the audio. But when I increase the beam to 30 and the lattice-beam to 9, it transcribes the full dictation, although decoding takes somewhat longer. Are there any heuristics for choosing good values of these parameters, or do I need to find them by trial and error?
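For context, both options can be overridden when invoking the decode script; a sketch of the kind of command involved (the graph, data, and decode directory names here are placeholders, not taken from this thread):

```shell
# Sketch: override the default beam (13) and lattice-beam (6) of
# steps/decode.sh from the command line. Directory names are illustrative.
steps/decode.sh --nj 4 --beam 30 --lattice-beam 9 \
  exp/chain/tdnn/graph data/dictations exp/chain/tdnn/decode_dictations
```

A wider beam keeps more partial hypotheses alive at each frame, trading decoding time for robustness; the lattice-beam controls how much of that search space survives into the output lattice.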

Thanks
Sudheer

Daniel Povey

Jun 26, 2019, 12:56:12 PM
to kaldi-help
If it skips a long part of the audio at the end, it might be that the decoder got stuck in a part of the graph from which it couldn't reach the rest of the graph. Increasing the min-active should help prevent that.
Or possibly there is something weird about your LM (e.g. some kind of grammar).
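A sketch of how min-active might be raised (the option name is as in the Kaldi decoder binaries; whether the wrapper script forwards it depends on your copy of steps/decode.sh — an assumption, so check your script):

```shell
# Sketch: --min-active forces the decoder to keep at least this many active
# tokens per frame regardless of the beam; the decoder default is 200.
# If steps/decode.sh in your checkout does not expose this option, pass it
# to the latgen-faster-mapped command inside the script instead.
steps/decode.sh --beam 13 --lattice-beam 6 --min-active 1000 \
  exp/chain/tdnn/graph data/dictations exp/chain/tdnn/decode_dictations
```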

--
Go to http://kaldi-asr.org/forums.html to find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/3d286a3a-3b61-45af-a66b-0976b122a328%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sudheer Kolachina

Jun 26, 2019, 1:29:14 PM
to kaldi...@googlegroups.com
Hi Dan,

It does skip a big part at the end. I will try increasing the min-active. It also skips two sentences in the middle of the dictation. I am using a regular pruned ARPA trigram LM trained on a large corpus of post-edited dictations. What do you mean by "some kind of grammar"? If these parts were transcribed correctly with a higher beam width, does it still mean something's wrong with the LM?

Thanks 
Sudheer 


Daniel Povey

Jun 26, 2019, 6:13:11 PM
to kaldi-help
An ARPA LM should be fine.
I still suspect there may be something weird about your setup.
The default min-active is 200, which is normally enough to prevent these kinds of errors, and in any case I don't expect this to happen at all in systems with conventional ARPA LMs and no right context (chain systems generally have left-biphone context).
Dan



Sudheer Kolachina

Jul 15, 2019, 8:03:42 AM
to kaldi...@googlegroups.com
I dug into this issue over the last few days. You were correct that there was something weird about my LM. Previously, the corpus file I used to train the LM was in a one-sentence-per-line format. As a result, punctuation tokens like new-line and new-paragraph either had no right context or were followed only by </s>, the default sentence-boundary marker in ARPA LMs. I noticed that most of the skipped audio involved the new-line or new-paragraph token. I trained a new LM using longer contexts, with one document per line. I no longer run into the skipped-audio problem, even with the default beam width of 15.
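A minimal sketch of the reformatting step described above, under the assumption (not stated in the thread) that documents in the corpus file are separated by blank lines; adapt the delimiter to however your dictations are actually marked:

```shell
# Sketch: collapse a one-sentence-per-line corpus into one document per line,
# assuming a blank line separates documents. This gives tokens such as
# new-line / new-paragraph real right context instead of an immediate </s>.
awk '
  NF  { doc = doc ? doc " " $0 : $0; next }  # append sentence to current doc
  doc { print doc; doc = "" }                # blank line: flush the document
  END { if (doc) print doc }                 # flush the final document
' corpus.txt > corpus_docs.txt
```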
