Pruning active tokens on the current frame


re...@speechmatics.com

May 17, 2024, 12:45:28 PM
to kaldi-help
Hi,

I'm looking at the incremental lattice code, and sometimes the number of active tokens is too big to be determinized and stays like that for a while: https://github.com/kaldi-asr/kaldi/blob/master/src/decoder/lattice-incremental-decoder.cc#L113

I would like to increase the pruning if that happens for too long. 
- I've tried to call `PruneForwardLinks` after reducing lattice_beam, but I still have too many tokens. I think that happens because we keep all the tokens of the current frame, so we end up keeping at least one path per token on the final frame, which can be a lot.

- I've tried to set `extra_cost` of the tokens of the last frame to `inf`:
template <typename FST, typename Token>
void LatticeIncrementalDecoderTpl<FST, Token>::MarkTokensForPruning(BaseFloat beam) {
  auto &last_frame_active = active_toks_.back();
  // Find the best (lowest) total cost among the tokens of the last frame.
  BaseFloat best_cost = std::numeric_limits<BaseFloat>::infinity();
  for (Token *tok = last_frame_active.toks; tok != NULL; tok = tok->next) {
    if (tok->tot_cost < best_cost) best_cost = tok->tot_cost;
  }

  // Mark every token more than `beam` worse than the best for pruning.
  for (Token *tok = last_frame_active.toks; tok != NULL; tok = tok->next) {
    if (tok->tot_cost - best_cost > beam) {
      tok->extra_cost = std::numeric_limits<BaseFloat>::infinity();
    }
  }
}
And then call the pruning code again, however I end up with errors such as:
`Error tracing best-path back (likely bug in token-pruning algorithm)`

How can I prune these tokens in a way that doesn't cause problems down the line?

Daniel Povey

May 18, 2024, 6:49:45 AM
to kaldi...@googlegroups.com
can't you temporarily reduce the decoder beam or max_active?
--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/6064b20c-507b-44df-9cc1-57440b0e2012n%40googlegroups.com.
Message has been deleted
Message has been deleted

re...@speechmatics.com

May 21, 2024, 2:18:09 AM
to kaldi-help
Somehow my message was marked as deleted on the Google group; I'm not sure if it was actually sent.
I was saying that it's much nicer to have the flexibility to prune tokens when we need to, rather than having to reduce the beam for the next frames and hope for the best.

I found what was wrong in my code: I needed to update `toks_` as well, as this is what is used to decode the next frame.

However, I have a question: when pruning links in `PruneForwardLinks`, we use `next_tok->extra_cost`; but when `next_tok` is on the last frame, this value is always 0.
Why don't we compute that value as the difference between the cost of the token and the cost of the best token of that frame?
In the header I see:
  // extra_cost is >= 0.  After calling PruneForwardLinks, this equals the
  // minimum difference between the cost of the best path that this link is a
  // part of, and the cost of the absolute best path, under the assumption that
  // any of the currently active states at the decoding front may eventually
  // succeed (e.g. if you were to take the currently active states one by one
  // and compute this difference, and then take the minimum).
However it's not clear to me why we do it that way.

re...@speechmatics.com

May 21, 2024, 2:18:13 AM
to kaldi-help
It would be more convenient to be able to drop some tokens after decoding a frame, because then if the user requests a lattice right now I can produce one, instead of having to wait for the next chunk to be decoded with a reduced beam.
But it shouldn't be too hard to modify GetCutoff to reduce the beam when there are too many active tokens in the past.

On Saturday 18 May 2024 at 11:49:45 UTC+1 Daniel Povey wrote: