Grammars with on-the-fly parts.

Ďoďo Ivanecký

Apr 20, 2022, 5:42:10 PM

to kaldi-help

Hi,

I am trying to build a dynamic grammar as described here:

https://kaldi-asr.org/doc/grammar.html

and in simple_demo.sh

I created a small test.

The main grammar is:

0 1 Hallo Hallo 1.69314694
1 2 or or 1.38629401
1 2 and and 1.38629401
1
2 3 again again
3 4 dear dear
3 5 #nonterm:Dlist <eps>
4 5 Mike Mike
5

The dynamic one is:

0 1 #nonterm_begin <eps>
1 2 <eps> <eps> 5
1 2 Carlos Carlos 1.60943794
1 2 Josef Josef 1.60943794
1 2 Sam Sam 1.60943794
1 2 John John 1.60943794
1 2 Wei Wei 1.60943794
2 3 #nonterm_end <eps>
3

I used prepare_lang.sh to prepare the language data with the nonterminal symbols:

tail phones.txt
}:_E 682
}:_I 683
}:_S 684
#0 685
#1 686
#nonterm_bos 687
#nonterm_begin 688
#nonterm_end 689
#nonterm_reenter 690
#nonterm:Dlist 691

tail words.txt
or 10
and 11
again 12
NOISE 13
#0 14
<s> 15
</s> 16
#nonterm_begin 17
#nonterm_end 18
#nonterm:Dlist 19
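
For reference, the two integers passed to make-grammar-fst in the next step (687 and 691) are the ids of #nonterm_bos and #nonterm:Dlist in phones.txt. The following is a minimal Python sketch of that lookup, assuming the usual two-column "symbol id" layout shown above; the helper name read_symbol_table is hypothetical and not part of Kaldi.

```python
def read_symbol_table(lines):
    """Parse 'symbol id' lines (the phones.txt / words.txt layout) into a dict."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            table[parts[0]] = int(parts[1])
    return table


# Tail of phones.txt as posted in this thread.
PHONES_TAIL = """\
}:_E 682
}:_I 683
}:_S 684
#0 685
#1 686
#nonterm_bos 687
#nonterm_begin 688
#nonterm_end 689
#nonterm_reenter 690
#nonterm:Dlist 691
""".splitlines()

phones = read_symbol_table(PHONES_TAIL)

# The offset is the id of #nonterm_bos; the sub-grammar is paired with
# the id of its own nonterminal symbol.
nonterm_phones_offset = phones["#nonterm_bos"]
dlist_id = phones["#nonterm:Dlist"]

print(f"--nonterm-phones-offset={nonterm_phones_offset}")  # 687
print(f"nonterminal id for Dlist: {dlist_id}")             # 691
```

Looking the values up this way avoids retyping them when the phone set changes.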

In the final step, after compiling the grammars, I run make-grammar-fst and get an error:

/opt/kaldi/bin/make-grammar-fst --write-as-grammar=true --nonterm-phones-offset=687 MainC.fst 691 DynC.fst aaa.fst

ERROR (make-grammar-fst[5.5]:InitEntryOrReentryArcs():decoder/grammar-fst.cc:143) There is something wrong with the graph; did you forget to add #nonterm_begin and #nonterm_end to the non-top-level FSTs before compiling?

But I did not forget. I also checked the LG.fst of the non-top-level FST, and the symbols are there.

Am I missing something?

Thanks for any hint.

Josef

Ďoďo

Apr 25, 2022, 11:02:45 AM

to kaldi...@googlegroups.com

OK, so I found out what the issue was. Actually, there were two.

1. During CLG creation, "fstcomposecontext" was called without the "--nonterm-phones-offset" parameter. That was critical, but it was still crashing because of #2.

2. I used a demo script to build HCLG.fst, and that script builds the disambiguation list with "grep '#'". Such a grep also picks up all the #nonterm symbols, which results in a crash in grammar-context-fst. The fix is to replace "grep '#'" with "grep '[0-9]+'".
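
To illustrate the second point, here is a minimal Python sketch of the filtering, assuming the intent is to keep only the plain disambiguation symbols (#0, #1, ...) and drop every #nonterm symbol. The anchored pattern "# followed by digits only" is an assumption about what the fix is meant to select, since the grep pattern quoted above looks abbreviated; the function name is hypothetical.

```python
import re

# Plain disambiguation symbols look like "#0", "#1", ...; the #nonterm
# symbols also start with '#', so matching on '#' alone is too broad.
DISAMBIG_RE = re.compile(r"^#\d+$")


def plain_disambig_symbols(phones_lines):
    """Return (symbol, id) pairs for '#<digits>' symbols only."""
    selected = []
    for line in phones_lines:
        parts = line.split()
        if len(parts) == 2 and DISAMBIG_RE.match(parts[0]):
            selected.append((parts[0], int(parts[1])))
    return selected


# Tail of phones.txt as posted earlier in this thread.
PHONES_TAIL = """\
}:_E 682
}:_I 683
}:_S 684
#0 685
#1 686
#nonterm_bos 687
#nonterm_begin 688
#nonterm_end 689
#nonterm_reenter 690
#nonterm:Dlist 691
""".splitlines()

# What a bare "contains '#'" test would select: 7 symbols, 5 of them wrong.
naive = [line.split()[0] for line in PHONES_TAIL if "#" in line]

print(naive)
print(plain_disambig_symbols(PHONES_TAIL))
```

On the posted table the naive test also returns the five #nonterm symbols, while the anchored pattern returns only #0 and #1.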

Josef

--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/0585a5c6-a373-4a0a-ac98-2ba9836a9ae1n%40googlegroups.com.

Anh Nguyễn Mạnh Tiến

Mar 23, 2023, 5:41:24 AM

to kaldi-help

I have the same problem too, but I can't find a demo script that contains "grep '#'". Can you tell me where it is?

On Monday, April 25, 2022 at 22:02:45 UTC+7, Ďoďo wrote:

Ďoďo

Mar 23, 2023, 10:03:47 AM

to kaldi...@googlegroups.com

Uff, I can't find it anymore after a year. But tell me what is in your phones/disambig.txt file, just to see whether it's really the same problem.

Jozef

Anh Nguyễn Mạnh Tiến

Mar 23, 2023, 10:57:26 PM

to kaldi-help

Thank you very much for your response. I've actually found the reason. I used SRILM to create an ARPA LM (for the sub-LM) and arpa2fst to create G.fst, so G.fst does not contain the symbols #nonterm_begin and #nonterm_end. I think that is the reason. Is there a way to add #nonterm_begin and #nonterm_end to a G.fst created with arpa2fst? Thanks in advance!

On Thursday, March 23, 2023 at 21:03:47 UTC+7, Ďoďo wrote:

Ďoďo

Mar 24, 2023, 3:15:27 AM

to kaldi...@googlegroups.com

Please read https://kaldi-asr.org/doc/grammar.html

There is a section which says:

The user should never need to explicitly add these symbols to the words.txt and phones.txt files; they are automatically added by utils/prepare_lang.sh. All the user has to do is to create the file 'nonterminals.txt' in the 'dict dir' (the directory containing the dictionary, as validated by validate_dict_dir.pl).

I have not worked with LMs (ARPA), only with grammars. So I just generated 'nonterminals.txt' from my simple grammar compiler, one nonterminal per line:

cat graph/nonterminals.txt
#nonterm:Dlist

Jozef

Anh Nguyễn Mạnh Tiến

Mar 25, 2023, 3:07:31 AM

to kaldi-help

But what if my grammar is not a unigram (e.g. Dlist contains multi-word names such as Lionel Messi, Bruno Fernandes, ...)? How could I build a grammar for such n-grams without building an LM (using SRILM or KenLM)?

On Friday, March 24, 2023 at 14:15:27 UTC+7, Ďoďo wrote:

Daniel Povey

Mar 25, 2023, 3:50:51 AM

to kaldi...@googlegroups.com

You have to understand the basic concepts of FSTs. It's a simple graph with a loop state (say state 0), and you'd have

"lionel" state 0->2, "messi" state 2->1

"bruno" state 0->3, "fernandes" state 3->1

and so on

Daniel Povey

Mar 25, 2023, 3:50:59 AM

to kaldi...@googlegroups.com

.. and state 1 would be final
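
The graph sketched in this reply can be written out in the same text format as the dynamic grammar in the first post. Below is a minimal Python sketch, under these assumptions: the sub-grammar is wrapped in #nonterm_begin / #nonterm_end as in the first post, every name is equally likely, and the cost is placed on the first word of each name. The function name build_subgrammar is hypothetical, and the state numbering differs from the one in the reply (each multi-word name simply gets its own intermediate states).

```python
import math


def build_subgrammar(names):
    """Return OpenFst text-format lines for a sub-grammar accepting one name."""
    names = [n for n in names if n.split()]  # ignore empty entries
    if not names:
        raise ValueError("need at least one name")

    lines = ["0 1 #nonterm_begin <eps>"]
    start, end, final = 1, 2, 3
    next_state = 4  # fresh intermediate states for multi-word names
    cost = -math.log(1.0 / len(names))  # uniform choice over the names

    for name in names:
        words = name.split()
        src = start
        for i, word in enumerate(words):
            if i == len(words) - 1:
                dst = end  # last word of every name joins the shared end state
            else:
                dst = next_state
                next_state += 1
            arc = f"{src} {dst} {word} {word}"
            if i == 0:
                arc += f" {cost:.8f}"  # cost only on the first word
            lines.append(arc)
            src = dst

    lines.append(f"{end} {final} #nonterm_end <eps>")
    lines.append(str(final))  # final state
    return lines


print("\n".join(build_subgrammar(["Lionel Messi", "Bruno Fernandes"])))
```

Every word used here still has to be present in words.txt (and in the lexicon) before the text can be compiled into G.fst.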
