Word probabilities to FST


Phil

Apr 4, 2019, 5:31:21 PM
to kaldi-help
Hi, I'm working on an experimental word-CTC acoustic model. For each frame, the model generates top-k probs:

For example:

t | index_0 | ... | index_k
0 | 75500 | ... | 3550
1 | 0 | ... | 243
2 | 0 | ... | 300
...
T | 755 | ... | 40

Each index has an associated log probability, and the indexes refer to words in an 85k-word dictionary.
I want to rescore the output of the model with a grammar FST (G.fst). Is there a straightforward way to convert this format into an FST that can be rescored and from which I can extract the shortest path?

Daniel Povey

Apr 4, 2019, 5:38:23 PM
to kaldi-help
Yeah, the FST (actually it would be an acceptor, hence an FSA) would have one start state and one state per frame, and you would have an arc for each word that was output, plus one for epsilon. I assume those arcs would have costs on them, reflecting the probabilities from your model. Rescoring that by composing with G.fst would be possible. However, you need to understand the basics of FSTs first. Do the tutorial at openfst.org.
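
The construction described above can be sketched in a few lines of Python that write OpenFst's text format. This is a minimal sketch, not code from the thread: the function name `topk_to_fsa_text`, the layout of `frames`, and the choice to map the blank index to label 0 (epsilon) are assumptions for illustration. Costs are negated log probabilities, the usual convention for tropical-semiring weights.

```python
import math


def topk_to_fsa_text(frames, blank_id=0):
    """Build OpenFst text-format acceptor lines from per-frame top-k output.

    frames: one list per frame of (word_id, log_prob) pairs (hypothetical layout).
    State t is the state before frame t; the last state is final.
    """
    lines = []
    for t, candidates in enumerate(frames):
        for word_id, log_prob in candidates:
            # Assumption: the CTC blank is emitted as epsilon (label 0).
            label = 0 if word_id == blank_id else word_id
            cost = -log_prob  # cost = negative log probability
            lines.append(f"{t} {t + 1} {label} {cost:.6g}")
    lines.append(str(len(frames)))  # final state, no final cost
    return lines


# Two frames with two candidates each (made-up probabilities).
frames = [
    [(0, math.log(0.9)), (13801, math.log(0.1))],
    [(13801, math.log(0.8)), (0, math.log(0.2))],
]
fsa_lines = topk_to_fsa_text(frames)
print("\n".join(fsa_lines))
```

The printed text could then be compiled with `fstcompile --acceptor`, provided the word IDs use the same numbering as the symbols in G.fst.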


--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/9f513f96-32cf-4038-a6d1-dd5cb78b1dd8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Phil

Apr 4, 2019, 5:50:39 PM
to kaldi-help
Oh that makes perfect sense. Thank you.


Phil

Apr 4, 2019, 8:44:53 PM
to kaldi-help
Hi Dan,

My composition produces no output and I'm wondering what's wrong. Here's an example sequence from the word-CTC FST I generated:

0 1 0 0 0.992731512
1 2 0 0 0.987134516
2 3 13403 13403 0.0216468126
2 3 13801 13801 0.957984328
3 4 0 0 0.990761697
4 5 0 0 0.985004902
5 6 0 0 0.979174674
6 7 0 0 0.985455453
7 8 0 0 0.899928987
7 8 1879 1879 0.0119872307
7 8 121582 121582 0.00832993444
7 8 125717 125717 0.00510402676
7 8 148843 148843 0.0194917303
7 8 327499 327499 0.00882347766
8 9 0 0 0.996489584
9 10 0 0 0.98880744
10 11 0 0 0.99289459
11 12 0 0 0.4248586
11 12 1879 1879 0.393045187
11 12 41127 41127 0.0264167953
11 12 214354 214354 0.00775688235
11 12 267149 267149 0.0181218777
11 12 274056 274056 0.0107227825
12 13 0 0 0.998292804
13

My LM fst:

fstrandgen --select=log_prob G.fst | fstprint
0 1 234775 234775
1 2 188644 188644
2 3 375228 0
3 4 329316 329316
4 5 375228 0
5 6 375228 0
6 7 371495 371495
7 8 360801 360801
8 9 375228 0
9

However, running the following:
fstcompose test.fst G.fst
gives me an empty result. Can you see anything incorrect in what I'm doing?

Daniel Povey

Apr 4, 2019, 8:46:25 PM
to kaldi-help
I suspect that 375228 is some kind of disambiguation symbol in your G.fst.  You should remove that first, e.g. by projecting on the output.
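
Composition matches the output labels of test.fst against the input labels of G.fst, so arcs in G.fst whose input label is 375228 can never be matched by the word-CTC FST, which is the likely reason the result is empty. Output projection can be sketched on the `fstprint` text format as below. This is an illustration under assumptions: `project_output` is a hypothetical helper, and it expects transducer-format lines (`src dst ilabel olabel [weight]`).

```python
def project_output(text_lines):
    """Copy each arc's output label onto its input label (output projection)."""
    projected = []
    for line in text_lines:
        fields = line.split()
        if len(fields) >= 4:  # arc line: src dst ilabel olabel [weight]
            fields[2] = fields[3]
        # Lines with 1 or 2 fields are final states and are left unchanged.
        projected.append(" ".join(fields))
    return projected


# A few arcs in the style of the G.fst sample from the thread.
g_lines = ["0 1 234775 234775", "1 2 188644 188644", "2 3 375228 0", "3"]
projected_lines = project_output(g_lines)
print("\n".join(projected_lines))
```

After projection the `375228 0` arc becomes a plain epsilon arc (`0 0`). OpenFst also ships an `fstproject` tool for this; the name of its output-projection flag differs between releases, so `fstproject --help` is the place to check.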



Phil

Apr 4, 2019, 9:23:54 PM
to kaldi-help
Removing the disambiguation symbol fixed it.
Since CTC can produce repeated outputs, would it make sense to merge them by composing the word-model output with a "CTC FST", e.g.:
0 0 blank blank
0 1 a a 
1 1 a <eps>
1 0 blank blank
0
(I don't know if this compiles, but the idea is that repeated outputs map back to the same word state.)



Daniel Povey

Apr 4, 2019, 9:31:41 PM
to kaldi-help
Hm, you could try that I guess.  I'm not sure how common those repeated outputs are in practice.
It might make it hard to recognize repetitions of a word.
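
The trade-off can be seen in a small Python sketch of the standard CTC collapsing rule (merge adjacent repeats, then drop blanks). It is not the FST from the thread, only an illustration of the same mapping on label sequences; `ctc_collapse` and the blank ID of 0 are assumptions.

```python
from itertools import groupby


def ctc_collapse(labels, blank=0):
    """Merge adjacent repeated labels, then remove blanks."""
    merged = [label for label, _ in groupby(labels)]
    return [label for label in merged if label != blank]


# A word repeated on adjacent frames collapses to a single word.
single = ctc_collapse([0, 1879, 1879, 0, 0])
# A genuinely repeated word survives only if a blank separates the two.
with_blank = ctc_collapse([1879, 0, 1879])
without_blank = ctc_collapse([1879, 1879])
print(single, with_blank, without_blank)
```

A word spoken twice in a row is kept as two words only when the model emits a blank between the two occurrences, which is the difficulty with repetitions mentioned above.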


Phil

Apr 5, 2019, 8:12:32 PM
to kaldi-help
Couple of follow-up questions on this:
- is there a way to rescale the LM FST outside of Kaldi? 
- are there any optimizations to be made with OpenFST for this specific problem?
Right now, I am running:
fstcompose --compose_filter=sequence in.fst G_projected.fst | fstshortestpath > out.fst
but this doesn't give super great performance.

Are the operations in lattice-lmrescore significantly faster? I was also thinking of converting the FST to a Kaldi lattice to take advantage of the optimizations in that script.

Daniel Povey

Apr 5, 2019, 10:00:19 PM
to kaldi-help

> Couple of follow-up questions on this:
> - is there a way to rescale the LM FST outside of Kaldi?

Not sure what you mean by this, but probably not.  If you want to scale the LM costs you can probably do it with something like
fstprint foo.fst | awk '{ if(NF>4) { $5 *= 0.5} ;print;}' | fstcompile > bar.fst
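
The awk one-liner scales the weight on arc lines only. A Python sketch of the same idea that also scales final-state weights is shown below. It is an illustration under assumptions: `scale_weights` is a hypothetical helper, and the input is assumed to be transducer-format `fstprint` output, where arcs have 4 or 5 fields and final states have 1 or 2.

```python
def scale_weights(text_lines, scale):
    """Multiply every arc cost and final cost in fstprint text output by `scale`."""
    scaled = []
    for line in text_lines:
        fields = line.split()
        if len(fields) == 5:    # arc with explicit weight
            fields[4] = f"{float(fields[4]) * scale:.6g}"
        elif len(fields) == 2:  # final state with explicit weight
            fields[1] = f"{float(fields[1]) * scale:.6g}"
        # Arcs with 4 fields and final states with 1 field have zero cost.
        scaled.append(" ".join(fields))
    return scaled


lm_lines = ["0 1 5 5 2.5", "1 2 7 7", "2 1.2"]
scaled_lines = scale_weights(lm_lines, 0.5)
print("\n".join(scaled_lines))
```

The scaled text can be piped back through `fstcompile`, as in the command above.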

 
> - are there any optimizations to be made with OpenFST for this specific problem?
> Right now, I am running:
> fstcompose --compose_filter=sequence in.fst G_projected.fst | fstshortestpath > out.fst
> but this doesn't give super great performance.

Not sure if you refer to speed or WER.
 

Philip Cerles

Apr 5, 2019, 10:39:20 PM
to kaldi...@googlegroups.com
Thanks, that makes sense.
Ah, sorry for being unclear. I was referring to speed.

--
Philip A. Cerles
B.A. Applied Mathematics, 2016
University of California, Berkeley