mkgraph / DeterminizeStar very slow for zerogram LM with loops


Johannes Hennrich

Oct 17, 2019, 5:27:58 AM
to kaldi-help
I added back-loops to my zerogram LM to recognize sequences of words instead of just single words. The resulting HCLG and the decoding work just fine, but the DeterminizeStar step of mkgraph gets really slow (30 minutes instead of 10 seconds without loops).

The "loops" are just "<eps>" arcs from the final (word) states back to the initial state. My G.fst looks like this (the real one looks the same but has 20,000 words):

Gs.png

Is there something wrong with my LM or is it normal that the determinization is THAT slow?

Daniel Povey

Oct 17, 2019, 2:53:40 PM
to kaldi-help
It might help to just make those arcs self-loops to the start state. 
I suspect it is having to compile 20,000 different versions of the word-entry state, because of that structure.

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/67a1a894-bc3d-4db6-ba48-f31d93b2e239%40googlegroups.com.

Johannes Hennrich

Oct 18, 2019, 9:12:46 AM
to kaldi-help
What exactly do you mean by "self-loops to the start state"?
If I replace the "<eps>:<eps>" arcs from the accepting states (1, 2, 3, 4, 5) to the start state 0 with a self-loop on state 0, then I will not be able to recognize sequences of words (e.g. "Cat House").




Jan Trmal

Oct 18, 2019, 10:14:44 AM
to kaldi-help
I think Dan meant
0 0 X X 0.1
0 0 Y Y 0.3333
0
You get the idea, even if it's cryptic, right?
There will be only one state, and it will be both the start state and the final state.
y.
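
For a large vocabulary, the single-state grammar sketched above can be generated with a short script. The following is a minimal sketch, assuming a uniform zerogram (every word gets the cost -ln(1/N)) and the OpenFst text format used above. The function name zerogram_fst_lines and the three-word vocabulary are illustrative and do not come from the thread.

```python
import math


def zerogram_fst_lines(words):
    """Return OpenFst text-format lines for a one-state zerogram acceptor.

    State 0 is both the start state and the only final state; every word
    is a self-loop on it. Uniform costs are an assumption of this sketch.
    """
    cost = -math.log(1.0 / len(words))  # negative log probability per word
    lines = [f"0 0 {w} {w} {cost:.4f}" for w in words]
    lines.append("0")  # mark state 0 as final (no explicit final cost)
    return lines


if __name__ == "__main__":
    # Hypothetical vocabulary; a real setup would read it from words.txt.
    print("\n".join(zerogram_fst_lines(["Cat", "House", "Dog"])))
```

The printed text can then be compiled with fstcompile, passing the word symbol table via --isymbols and --osymbols, in the same way as any other text-format G.fst.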


Johannes Hennrich

Oct 21, 2019, 5:29:23 AM
to kaldi-help
Thanks Yenda, that did the trick. Now it finishes in a few seconds.

I thought it was a problem if the start state is accepting, but apparently that's not an issue.
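
Why an accepting start state still allows word sequences can be checked with a toy simulation. This is only a sketch, not Kaldi or OpenFst code: the helper accepts and the dictionary layout are hypothetical, weights are ignored, and the two-word vocabulary is made up.

```python
def accepts(arcs, start, finals, words):
    """Return True if the acceptor consumes `words` and ends in a final state.

    `arcs` maps (state, word) -> next state, so the machine is deterministic.
    """
    state = start
    for w in words:
        key = (state, w)
        if key not in arcs:
            return False  # no arc for this word from the current state
        state = arcs[key]
    return state in finals


vocab = ["Cat", "House"]
# One state (0) that is both start and final, with a self-loop per word.
loop_arcs = {(0, w): 0 for w in vocab}

print(accepts(loop_arcs, 0, {0}, ["Cat", "House"]))  # True
print(accepts(loop_arcs, 0, {0}, []))                # True
print(accepts(loop_arcs, 0, {0}, ["Bird"]))          # False
```

The one visible difference from a graph whose start state is not final is that the empty word sequence is accepted as well.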