how to read a kws index file


Michael Capizzi

Mar 28, 2017, 12:33:47 PM
to kaldi-help
I built an `index` file for `kws`, and outputted it as `text`, but I don't understand what I'm looking at.

Can someone help explain to me the columns of this file?

global 
0       1       669573  0       0.0078125,0,18
0       2       304284  0       4.36719,0,3
0       3       669522  0       7.10938,0,17
0       4       741310  0       0,16,30
0       5       0       1       -1.79395,3,108
0       6       326352  0       8.31641,4,14
0       7       57452   0       0,30,47
0       1       312278  0       5.11035,5,18
0       8       27874   0       0.000976562,43,49
0       9       311365  0       0,47,86
0       10      514461  0       0,86,108
0       11      490438  0       0,108,116
0       12      0       0       -2.78223,40,426
1       333     741310  0       0,0,12
1       5       1       1
2       1       669573  0       0,0,15
2       5       2       1
3       333     741310  0       0,0,13
3       5       3       1
4       7       57452   0       0,0,17
4       5       4       1

Jan Trmal

Mar 29, 2017, 4:04:46 AM
to kaldi-help
The structure is not really suitable for people to analyze. It's an inverted index: when you compose the index FST with the FST of the query, you get an FST that represents the utterances (with times and scores) in which the given phrase or keyword occurs.
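
To make the inverted-index idea concrete, here is a toy sketch in plain Python. It is a dictionary-based stand-in, not FST composition and not how Kaldi implements it: a word maps to the places it occurs, and a lookup returns utterance, times, and score. The names `Hit`, `build_index`, and `search`, and all the sample entries, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One occurrence of a word (hypothetical record layout)."""
    utterance: str
    start: int
    end: int
    score: float


def build_index(entries: Iterable[Tuple[str, Hit]]) -> Dict[str, List[Hit]]:
    """Group hits by word, which is the 'inverted' part of the index."""
    index: Dict[str, List[Hit]] = defaultdict(list)
    for word, hit in entries:
        index[word].append(hit)
    return index


def search(index: Dict[str, List[Hit]], word: str) -> List[Hit]:
    """Return every recorded occurrence of a single word, or an empty list."""
    return list(index.get(word, []))


# Made-up sample data, for illustration only.
index = build_index([
    ("hello", Hit("utt1", 0, 18, 0.5)),
    ("hello", Hit("utt2", 3, 20, 1.0)),
    ("world", Hit("utt1", 18, 30, 0.0)),
])
for hit in search(index, "hello"):
    print(hit.utterance, hit.start, hit.end, hit.score)
```

The real index does the equivalent lookup through composition with a query FST, which also handles multi-word phrases; this sketch covers single words only.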

Off the top of my head (you should look at the code to get an authoritative answer):
isymbols = words
osymbols = the utterance id encoded in the upper 32 bits; the lower 32 bits will typically be zero (or will contain a disambiguation symbol)
cost = score, start time, end time
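
As a rough illustration of that reading, here is a minimal Python sketch that splits one line of the text dump into named fields. It assumes the dump follows the usual OpenFst text layout (source state, destination state, input label, output label, optional weight) and that the weight is the comma-separated triple described above. `IndexArc` and `parse_index_line` are hypothetical names, not part of Kaldi.

```python
from typing import NamedTuple, Optional


class IndexArc(NamedTuple):
    """One arc of the text dump (field names are assumptions)."""
    src: int
    dst: int
    ilabel: int             # word id, per the description above
    olabel: int             # utterance-related label, per the description above
    score: Optional[float]  # None when the dump omits the weight
    start: Optional[float]
    end: Optional[float]


def parse_index_line(line: str) -> Optional[IndexArc]:
    """Parse one arc line; return None for lines that are not arcs."""
    fields = line.split()
    if len(fields) < 4:
        # For example the leading "global" line, or a blank line.
        return None
    src, dst, ilabel, olabel = (int(f) for f in fields[:4])
    if len(fields) >= 5:
        # Assumed weight layout: "score,start,end".
        score, start, end = (float(x) for x in fields[4].split(","))
        return IndexArc(src, dst, ilabel, olabel, score, start, end)
    return IndexArc(src, dst, ilabel, olabel, None, None, None)


print(parse_index_line("0       1       669573  0       0.0078125,0,18"))
print(parse_index_line("1       5       1       1"))
```

Lines with only four fields, such as `1       5       1       1`, are kept as arcs with no weight rather than guessed at.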

It's based on Can and Saraclar's paper.
y.

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Michael Capizzi

Mar 29, 2017, 4:46:44 PM
to kaldi-help
Thanks for your response @Yenda

Michael Capizzi

Mar 31, 2017, 11:21:17 AM
to kaldi-help
I guess I have one more question that is somewhat related to this.

I am interested in implementing keyword search on a large scale. So I'm wondering if there is any way to take these `index` files and move them into something like `elasticsearch` (or `neo4j`?) to do the search there?

I'm aware of the paper that was used to model this approach, and I'll be honest that I haven't fully grasped what it's doing yet, so a completely fair answer to my question is: "Go read the paper again".

But I thought I'd ask if anyone has any ideas on how that could work.  Or if it could work.

Thanks in advance for your thoughts.

Daniel Povey

Mar 31, 2017, 12:09:32 PM
to kaldi-help
I see no reason why not.
I have a vague long-term plan to build a new, easier-to-use search capability, one that's more intuitive and much less technical than the current kws-index approach.  But that won't be happening in the next few months.

Dan

Michael Capizzi

Mar 31, 2017, 1:09:48 PM
to kaldi...@googlegroups.com
That sounds great, Dan.  If I can figure out how to do it, I'll happily submit a PR for anything I develop.

-M