Creating a grammar in Kaldi from a grammar file used in HTK


Muhammad Salem

Nov 3, 2015, 5:10:31 AM
to kaldi-help
I was wondering if there is a way to convert the lattice-file grammar used in HTK into the .fst file used in Kaldi.

Thanks in advance

Daniel Povey

Nov 3, 2015, 2:31:22 PM
to kaldi-help
You would have to understand both formats and write a script to create the 'G.fst' used for the language model in Kaldi; it probably wouldn't be that difficult.  openfst.org has a good explanation of the FST format.
Read the section on decoding graph creation in the Kaldi documentation.
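
A minimal sketch of what such a script has to emit, assuming the OpenFst text format: one arc per line as "source destination input-label output-label cost", followed by one line per final state. The two-word grammar and the function name arcs_to_fst_text are hypothetical and only serve as illustration.

```python
import math


def arcs_to_fst_text(arcs, final_states):
    """Render a word acceptor in OpenFst text format.

    arcs: list of (src, dst, word, cost) tuples.
    final_states: dict mapping a final state to its final cost.
    The source state of the first arc line is taken as the start state.
    """
    lines = []
    for src, dst, word, cost in arcs:
        # An acceptor repeats the word as both input and output label.
        lines.append(f"{src} {dst} {word} {word} {cost:.4f}")
    for state, cost in sorted(final_states.items()):
        lines.append(f"{state} {cost:.4f}")
    return "\n".join(lines) + "\n"


# Hypothetical grammar: "yes" or "no", equally likely.
# Costs are negative natural-log probabilities.
cost = -math.log(0.5)
arcs = [(0, 1, "yes", cost), (0, 1, "no", cost)]
text = arcs_to_fst_text(arcs, {1: 0.0})
print(text)
```

The resulting text could then be compiled with OpenFst's fstcompile, passing words.txt through --isymbols and --osymbols.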

--
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Muhammad Salem

Nov 4, 2015, 4:15:00 AM
to kaldi-help
I appreciate your answer. I have another question, if I may.
I found a Perl script attached to a similar question in the old discussion group.

the script (slf2fsm.pl) usage is
slf2fsm.pl symbol_file [silence_symbol] < slf_file > fsm_file

I just want to check two things:

1- Is slf_file the output file created by the HParse command?
2- What exactly is the symbol file? I assume it is the phoneme file?

I have attached the script I am referring to.

Sorry for bothering you again and thanks in advance
slf2fsm.pl

Daniel Povey

Nov 4, 2015, 3:08:26 PM
to kaldi-help
I think 'slf' is the so-called 'standard lattice format' which means HTK-format lattices.  It's probably not the same as the output of HParse.  The symbol table is probably words.txt or phones.txt.
Dan
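
For reference, a sketch of what a Kaldi-style words.txt symbol table looks like: one "symbol integer-id" pair per line, with <eps> as id 0. Putting #0 last follows the usual Kaldi convention and is an assumption about what slf2fsm.pl expects. The function name build_symbol_table and the word list are hypothetical.

```python
def build_symbol_table(words):
    """Return words.txt-style lines: '<symbol> <integer id>'.

    <eps> gets id 0; the disambiguation symbol #0 is appended last
    (assumed here, following the usual Kaldi layout).
    """
    symbols = ["<eps>"] + sorted(set(words)) + ["#0"]
    return [f"{sym} {idx}" for idx, sym in enumerate(symbols)]


# Hypothetical vocabulary for a two-word grammar.
table = build_symbol_table(["yes", "no"])
print("\n".join(table))
```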


--

Tony Robinson

Nov 4, 2015, 3:46:46 PM
to kaldi...@googlegroups.com
HTK's SLF is documented at http://www1.icsi.berkeley.edu/Speech/docs/HTKBook/node247_tf.html

HParse is documented at http://www1.icsi.berkeley.edu/Speech/docs/HTKBook/node247.html

The acoustic likelihoods and language model probabilities are optional in an SLF.   If they are included, the term "lattice" is often used; if they are not, the term is "network".

HParse does not know about probabilities so produces a network.

Your link to slf2fsm.pl was broken for me.   Using Google's cache of the web I see the message:

    You can get it with
    wget www.danielpovey.com/files/slf2fsm.tgz
    Dan

A very quick look at the code says that it works with lattices with acoustic likelihoods and language model probabilities.   In particular it says:

    undef $cost;

so my guess is that if it's given a network then $cost will remain undef and that's what you'll see in the output - which won't be pretty.

Really you should just give it a go and if it doesn't work use Google to get the links above.


Tony
--
Speechmatics is a trading name of Cantab Research Limited
We are hiring: www.speechmatics.com/careers
Dr A J Robinson, Founder, Cantab Research Ltd
Phone direct: 01223 794096, office: 01223 794497
Company reg no GB 05697423, VAT reg no 925606030
51 Canterbury Street, Cambridge, CB4 3QG, UK

Muhammad Salem

Nov 5, 2015, 8:16:30 AM
to kaldi-help

Thanks a lot for your replies.
As per your answers, I revised the old system and could obtain lattices instead of networks, i.e. costs are now defined in the file, like (J=2 S=2 E=6 l=-0.942).
I guess now I can use slf2fsm.pl!
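
To illustrate the arc line quoted above, here is a sketch of reading its fields. It assumes that l= holds a natural-log language model probability and that the FST cost is its negation, which is the usual convention for OpenFst tropical weights. The function names are hypothetical, and this is not taken from slf2fsm.pl.

```python
def parse_slf_fields(line):
    """Split an SLF line such as 'J=2 S=2 E=6 l=-0.942' into a dict.

    Assumes fields are whitespace-separated and values contain no spaces.
    """
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def slf_arc_to_fst_arc(line):
    """Return (start_node, end_node, cost) for one SLF arc line."""
    fields = parse_slf_fields(line)
    # Cost is the negated log probability; arcs without l= cost nothing.
    cost = -float(fields["l"]) if "l" in fields else 0.0
    return int(fields["S"]), int(fields["E"]), cost


print(slf_arc_to_fst_arc("J=2 S=2 E=6 l=-0.942"))  # (2, 6, 0.942)
```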

 
 
 

d.suv...@gmail.com

May 20, 2016, 8:21:19 PM
to kaldi-help
Hello!

I also want to use a grammar written for HParse with Kaldi, but slf2fsm gives me a lot of errors like the following:
slf2fsm: Unknown \320\273\320\265\320\272\321\201\320\270 mapped to #0
slf2fsm: Unknown !NULL mapped to #0
I ran the following commands:
HParse grammar.gr wnet
./slf2fsm.pl words.txt < wnet > test.fst
What am I doing wrong?

Tony Robinson

May 20, 2016, 11:53:00 PM
to kaldi...@googlegroups.com
This looks more like an HTK error to me than a Kaldi issue.

HTK was written before UTF-8 was widely adopted.   It has many issues with non-ASCII characters.

Chances are you have a UTF-8 string in grammar.gr and it's being written to wnet as "\320\273\320\265\320\272\321\201\320\270".   Check this!

code6 tonyr: echo -e "\0320\0273\0320\0265\0320\0272\0321\0201\0320\0270"
лекси

If your wnet file is not UTF-8 as expected then your post belongs on eng-ht...@lists.cam.ac.uk (which could do with some traffic).
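
The same check can be scripted. This is a sketch, assuming the escapes in wnet are backslash followed by three octal digits, one escape per byte, as in the error message above. The function name decode_octal_escapes is hypothetical.

```python
import re


def decode_octal_escapes(text):
    """Turn backslash-octal escapes (one per byte) back into UTF-8 text."""
    raw = re.sub(
        rb"\\([0-3][0-7]{2})",                  # \000 .. \377, one byte each
        lambda m: bytes([int(m.group(1), 8)]),
        text.encode("utf-8"),
    )
    return raw.decode("utf-8")


escaped = r"\320\273\320\265\320\272\321\201\320\270"
print(decode_octal_escapes(escaped))  # prints лекси
```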

Also, as general advice, in order to receive help you need to help your audience respond to you.   So you should do what you can to debug in advance.  In this case that means producing a minimal grammar.gr file that shows your problem, then attaching that, wnet and words.txt.   They should only be a few lines each.


Tony

Dmitry Suvorov

May 21, 2016, 5:44:53 AM
to kaldi...@googlegroups.com
Tony, I solved errors like "Unknown \320\273\320\265\320\272\321\201\320\270 mapped to #0" by converting the output file of HParse to a proper UTF-8 file (it is attached). But I still get error messages like "slf2fsm: Unknown !NULL mapped to #0". The HTK book says that !NULL nodes are only used to reduce the number of arcs. What should I do with them?



words.txt
gr.gr
wnet

Muhammad Salem

May 21, 2016, 7:19:21 AM
to kaldi...@googlegroups.com

I mapped !NULL to <eps>, and it worked fine for me.
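
A sketch of that mapping applied while converting a small network to OpenFst text format. It assumes the network carries its words on nodes (I= ... W= ...), lists its arcs as J= S= E= lines, and has a !NULL start node. The function name and the example network are hypothetical; this is not how slf2fsm.pl is known to work.

```python
def slf_to_fst_text(slf_text, null_word="!NULL", eps="<eps>"):
    """Convert a node-labelled SLF network to OpenFst text format."""
    node_word, arcs = {}, []
    for line in slf_text.splitlines():
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if "I" in fields:            # node definition, e.g. "I=1 W=yes"
            word = fields.get("W", null_word)
            node_word[int(fields["I"])] = eps if word == null_word else word
        elif "J" in fields:          # arc definition, e.g. "J=0 S=0 E=1"
            cost = -float(fields["l"]) if "l" in fields else 0.0
            arcs.append((int(fields["S"]), int(fields["E"]), cost))
    has_incoming = {dst for _, dst, _ in arcs}
    has_outgoing = {src for src, _, _ in arcs}
    # The source of the first line becomes the start state, so arcs that
    # leave a node with no incoming arcs are written first.
    arcs.sort(key=lambda arc: arc[0] in has_incoming)
    # Each arc is labelled with the word of the node it enters.
    out = [f"{s} {e} {node_word[e]} {node_word[e]} {c}" for s, e, c in arcs]
    # Nodes without outgoing arcs are written as final states.
    out += [str(n) for n in sorted(node_word) if n not in has_outgoing]
    return "\n".join(out) + "\n"


example = """VERSION=1.0
N=4 L=4
I=0 W=!NULL
I=1 W=yes
I=2 W=no
I=3 W=!NULL
J=0 S=0 E=1
J=1 S=0 E=2
J=2 S=1 E=3
J=3 S=2 E=3
"""
print(slf_to_fst_text(example))
```

With <eps> on the arcs into the !NULL end node, those arcs consume no word, which is what lets the !NULL nodes disappear from the recognised word sequence.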

Dmitry Suvorov

May 21, 2016, 7:46:05 AM
to kaldi...@googlegroups.com
Muhammad, you wrote that you added costs to the output file of HParse. Did you just call HParse with the -l parameter, or did you do something more?

Muhammad Salem

May 21, 2016, 7:50:36 AM
to kaldi...@googlegroups.com

I didn't use HParse. I am using my own grammar tool to generate SLF files for my specific application, similar to the output of HParse.

Dmitry Suvorov

May 25, 2016, 3:59:05 PM
to kaldi...@googlegroups.com
Muhammad, is your tool open source or proprietary? I want to compare the quality and speed of recognition in Kaldi using an n-gram language model against recognition using a grammar.

Muhammad Salem

May 26, 2016, 4:18:26 AM
to kaldi...@googlegroups.com

Unfortunately it is the property of the university, but it wouldn't help anyway because it is dedicated solely to generating grammar files for Quranic verses.
