Online KWS with low resource consumption


Lucian Georgescu

Sep 29, 2015, 9:44:47 AM
to kaldi-help
Hi,

I'm working on a project that is supposed to develop a keyword spotting system able to decode and search for terms online. For the moment, I have an LVCSR system and I'm testing the Kaldi KWS scripts (I managed to do basic KWS using the lattices generated by the decoding process).

I have two questions:

1. I use a 3-gram language model with the following n-gram counts: ngram 1=64002, ngram 2=17939923, ngram 3=24912946. My HCLG.fst graph is around 10 GB. When I run "steps/decode.sh", it uses around 25 GB of memory (an enormous amount).
Because I only want to search for keywords, my idea is to create a rule-based grammar that contains only the keyword terms I'm looking for. I don't need to decode everything; I only want the searched terms, and I want to map all other words to an <unknown> word.
Is that a viable solution, or do you have other ideas? If it is viable, how do I build this special grammar? (I know how to use Thrax to create a rule-based FST, but I have no idea how to transcribe the rest of the words as <unknown>.)

2. Do you think it is possible to build an online KWS system? I need a very low real-time factor. In the "Online decoding" example in the Kaldi documentation, I saw a real-time factor of 1.6. That value covers decoding only, and the total real-time factor will increase if I run KWS afterwards.

Thanks,

Lucian

Jan Trmal

Sep 29, 2015, 9:59:00 AM
to kaldi-help
ad 1) I'm not sure your idea would work well. Also, you would essentially limit yourself to a single set of keywords and would have to run decoding again for each new keyword set.
If you are asking only because you are worried about memory consumption, I'd approach it from a different angle -- prune the LM so that memory consumption during decoding becomes reasonable, and then rescore using the original LM -- there is functionality for rescoring directly with carpa models, which I believe is faster and more memory-efficient than the "old" way of rescoring using G.fst.
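
The two-pass idea (decode with a small LM, then swap in the scores of the full LM) can be illustrated on a toy n-best list instead of real lattices. This is only a sketch: the `Hypothesis` class, the `rescore` function, and all cost values are made up for illustration and are not Kaldi's API. In Kaldi itself the rescoring operates on lattices with dedicated tools.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    words: List[str]
    acoustic_cost: float  # negative log-likelihood from the acoustic model
    lm_cost: float        # negative log-probability from the pruned first-pass LM


def rescore(nbest: List[Hypothesis],
            full_lm_cost: Callable[[List[str]], float],
            lm_scale: float = 1.0) -> List[Hypothesis]:
    """Replace the pruned-LM cost by the full-LM cost and re-rank (lowest total first)."""
    rescored = [
        Hypothesis(h.words, h.acoustic_cost, full_lm_cost(h.words))
        for h in nbest
    ]
    return sorted(rescored, key=lambda h: h.acoustic_cost + lm_scale * h.lm_cost)


# Hypothetical full-LM costs for the two toy hypotheses.
FULL_LM = {("call", "john"): 2.0, ("cold", "john"): 6.0}

nbest = [
    Hypothesis(["cold", "john"], acoustic_cost=10.0, lm_cost=3.0),  # first-pass winner
    Hypothesis(["call", "john"], acoustic_cost=10.5, lm_cost=3.5),
]
best = rescore(nbest, lambda words: FULL_LM[tuple(words)])[0]
print(" ".join(best.words))
```

The first pass only needs the pruned LM in memory; the full LM is consulted afterwards, on a much smaller set of candidates.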

ad 2) It is definitely possible, but it would take some, probably substantial, work. The Kaldi KW search pipeline is set up for a different scenario -- decode and index once, search many times (possibly with a different KW list each time) -- so it was optimized for fast search times once the index has been generated. Decoding/lattice generation is usually the slowest part. The KW index is created from the lattice, and the actual search is really quick.
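
The "index once, search many times" design can be pictured with a far simpler structure than the one Kaldi actually builds. The sketch below assumes word occurrences have already been extracted from the lattices as tuples; the tuple layout, the function names, and the score threshold are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (utterance id, word, start time in s, end time in s, posterior-like score)
Occurrence = Tuple[str, str, float, float, float]


def build_index(occurrences: Iterable[Occurrence]) -> Dict[str, List[Occurrence]]:
    """Expensive step, done once: group all occurrences by word."""
    index: Dict[str, List[Occurrence]] = defaultdict(list)
    for occ in occurrences:
        index[occ[1]].append(occ)
    return index


def search(index: Dict[str, List[Occurrence]],
           keyword: str,
           min_score: float = 0.5) -> List[Occurrence]:
    """Cheap step, repeated for any keyword list: a dictionary lookup plus a filter."""
    return [occ for occ in index.get(keyword, []) if occ[4] >= min_score]


# Made-up occurrences standing in for what a decoder would produce.
occurrences = [
    ("utt1", "kaldi", 0.5, 0.9, 0.92),
    ("utt1", "lattice", 1.2, 1.7, 0.40),
    ("utt2", "kaldi", 3.0, 3.4, 0.81),
]
index = build_index(occurrences)
for kw in ["kaldi", "lattice"]:
    print(kw, search(index, kw))
```

For a single fixed keyword list searched once, the indexing step buys little, which is why it can be skipped in a spotting scenario.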

y.


 


Daniel Povey

Sep 29, 2015, 3:31:16 PM
to kaldi-help
What you are trying to do is quite hard. As Yenda says, the Kaldi
keyword search tools are geared more towards an audio-indexing type
of application. What you want sounds more like keyword spotting or
wake-word spotting. There are neural-net approaches to this, but they
would require quite a bit of work; it wouldn't be trivial with the
current tools.

Dan

Lucian Georgescu

Sep 30, 2015, 8:49:08 AM
to kaldi-help, dpo...@gmail.com
Thank you for your answers.

@Yenda: Being limited to one set of keywords is not a problem for me. I would still like to try my idea, but I don't know how to transcribe all the words that are not in the keyword list as a single generic word. I really don't need the full transcription, and I think it will be much faster if I decode only the keywords. Could you give me a hint?

@Dan: I'm aware that Kaldi KWS is a bit different from my task. Yes, it is something more like wake-word spotting. What method would you recommend in this case? If lattice generation is the slowest part, how could I optimize it for my project? Is it possible to obtain the lattice only for my keywords?
Also, in the KWS module, how fast is lattice indexing compared to lattice generation in the decoding step?

Thanks.

Guoguo Chen

Sep 30, 2015, 9:48:51 AM
to kaldi...@googlegroups.com, Daniel Povey
There's a family of keyword spotting methods called keyword/filler models; you may want to look into those. Basically, you'll have to build a graph that has paths for keywords and paths for non-keywords.
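
As a rough picture of such a graph at the word level, here is a sketch that writes a single-state unigram loop in OpenFst text format, with one self-loop per keyword and one for a filler word. The `<FILLER>` symbol, the probabilities, and the function name are assumptions for illustration. The output would still need to be compiled (for example with `fstcompile` and a matching symbol table), and how the filler is modelled acoustically is not covered here.

```python
import math
from typing import Dict, List


def unigram_loop_fst(word_probs: Dict[str, float], final_prob: float) -> List[str]:
    """Return OpenFst text-format lines for a one-state unigram grammar.

    Arc lines are "src dst ilabel olabel cost" and the final-state line is
    "state cost", where cost = -log(probability).
    """
    total = sum(word_probs.values()) + final_prob
    if not math.isclose(total, 1.0):
        raise ValueError(f"probabilities sum to {total}, expected 1.0")
    lines = []
    for word, prob in sorted(word_probs.items()):
        lines.append(f"0 0 {word} {word} {-math.log(prob):.4f}")
    lines.append(f"0 {-math.log(final_prob):.4f}")
    return lines


# Made-up probabilities: most of the mass goes to the filler path.
probs = {"kaldi": 0.1, "lattice": 0.1, "<FILLER>": 0.7}
fst_lines = unigram_loop_fst(probs, final_prob=0.1)
print("\n".join(fst_lines))
```

Shifting probability between the keyword arcs and the filler arc is one simple knob for trading false alarms against misses.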

Alternatively, if you really want to do wake-word or hotword, you may want to have a look at my papers, e.g., "Small-footprint keyword spotting using deep neural networks".

Guoguo

Jan Trmal

Oct 1, 2015, 5:56:32 AM
to kaldi-help, Daniel Povey
Yes, Guoguo is right -- I guess some kind of filler/mumble model plus your words would be exactly what you need.
Ad grammar: you don't need Thrax or anything like that -- I think that would just be overkill. You could write a unigram grammar by hand in the FST format (or replace all non-keyword words in the training text with some meta-word, say "<FILLER>") and use normal LM tools to train the LM. There might be smarter ways to prepare a grammar/LM for filler/mumble models, as I think you might need some manual tweaking of the probabilities to avoid too many false alarms or missed detections.
You could of course use Thrax, but I don't recall anyone reporting using it together with Kaldi.
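
The text-replacement route could look roughly like the sketch below. It assumes the LM training text is plain whitespace-tokenized lines without utterance IDs and that keywords are single words matched exactly; the function name is hypothetical.

```python
from typing import Iterable, List, Set


def map_to_filler(lines: Iterable[str],
                  keywords: Set[str],
                  filler: str = "<FILLER>") -> List[str]:
    """Replace every token that is not a keyword with the filler meta-word."""
    mapped_lines = []
    for line in lines:
        tokens = [tok if tok in keywords else filler for tok in line.split()]
        mapped_lines.append(" ".join(tokens))
    return mapped_lines


training_text = [
    "please call john now",
    "the weather is nice today",
]
for mapped in map_to_filler(training_text, keywords={"call", "john"}):
    print(mapped)
```

The mapped text can then be fed to whatever LM toolkit is already in use, and "<FILLER>" has to be present in the lexicon like any other word.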

y.

Jan Trmal

Oct 1, 2015, 6:04:04 AM
to kaldi-help, Daniel Povey
Ad the indexing: you don't have to go through that stage, as it just converts the lattice into a structure for fast repeated search. For your case it might be more effective to search directly on the lattice (or perhaps the "sausages") or just the 1-best output, at least for starters.
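
Searching the 1-best output can be as simple as a sliding-window match over a time-aligned transcript. The sketch below assumes the 1-best output is available as CTM-style lines (`<utt> <channel> <start> <duration> <word>`), in time order within each utterance; the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

TimedWord = Tuple[str, float, float]  # (word, start in s, duration in s)


def parse_ctm(lines: Iterable[str]) -> Dict[str, List[TimedWord]]:
    """Group CTM entries by utterance, keeping the order in which they appear."""
    utts: Dict[str, List[TimedWord]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        utt, _channel, start, dur, word = parts[:5]
        utts.setdefault(utt, []).append((word, float(start), float(dur)))
    return utts


def find_keyword(utts: Dict[str, List[TimedWord]],
                 keyword: str) -> List[Tuple[str, float, float]]:
    """Return (utt, start, end) for each exact match of a possibly multi-word keyword."""
    target = keyword.split()
    hits = []
    for utt, words in utts.items():
        for i in range(len(words) - len(target) + 1):
            window = words[i:i + len(target)]
            if [w for w, _, _ in window] == target:
                end = window[-1][1] + window[-1][2]
                hits.append((utt, window[0][1], end))
    return hits


ctm = [
    "utt1 1 0.00 0.30 please",
    "utt1 1 0.30 0.40 call",
    "utt1 1 0.70 0.50 john",
]
print(find_keyword(parse_ctm(ctm), "call john"))
```

Unlike a lattice search, this only sees the single best path, so a keyword that lost to a competing word is missed.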
y.

Lucian Georgescu

Oct 2, 2015, 10:25:59 AM
to kaldi-help, dpo...@gmail.com
Hi,

@Yenda: What type of output can the online decoding module provide? I mean online2-wav-gmm-latgen-faster or online2-wav-nnet2-latgen-faster. Can it provide lattice parts every 10 seconds or at some other preset interval? As you said, I don't need lattice indexing, but I wonder whether it is possible to get lattice parts or the text transcription incrementally. Specifically, I would like each word to be searched for in the keyword list as soon as it is decoded.

Daniel Povey

Oct 2, 2015, 4:26:07 PM
to Lucian Georgescu, kaldi-help
It can provide the whole lattice at any interval you want, but not
parts of the lattice.
Dan
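
Since each request returns the whole lattice so far rather than an increment, a polling loop has to remember what it has already reported. The sketch below assumes each snapshot has been reduced to a best path of (word, start time) pairs; the snapshot format, the function name, and the rounding of start times to 0.1 s are illustrative assumptions. Earlier words may also change between snapshots, which this simple de-duplication does not handle.

```python
from typing import List, Set, Tuple

Word = Tuple[str, float]  # (word, start time in seconds)


def new_detections(snapshot: List[Word],
                   keywords: Set[str],
                   already_reported: Set[Tuple[str, float]]) -> List[Word]:
    """Return keyword hits in this snapshot that were not reported before."""
    fresh = []
    for word, start in snapshot:
        key = (word, round(start, 1))  # coarse key, tolerant of small timing jitter
        if word in keywords and key not in already_reported:
            already_reported.add(key)
            fresh.append((word, start))
    return fresh


# Two made-up snapshots, as if taken at successive polling intervals.
snapshots = [
    [("please", 0.0), ("call", 0.3)],
    [("please", 0.0), ("call", 0.3), ("john", 0.7)],
]
reported: Set[Tuple[str, float]] = set()
detections_per_poll = [
    new_detections(snap, {"call", "john"}, reported) for snap in snapshots
]
print(detections_per_poll)
```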