Voice Activity Detection (VAD) in Kaldi


Sarvesh Gupta

Apr 6, 2016, 9:57:52 AM
to kaldi-help

How does Kaldi implement VAD while running run.sh with all the parameters set to default?

David Snyder

Apr 6, 2016, 11:00:26 AM
to kaldi-help
Which example are you trying to run? AFAIK, the only VAD that is currently checked into Kaldi master is the frame-level, energy-based VAD used in the speaker and language ID examples. Is that what you're referring to?
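For readers unfamiliar with the term, a frame-level, energy-based VAD can be sketched roughly as follows. This is an illustrative Python sketch, not Kaldi's code: the function names, parameter names, and default values are assumptions made for the example. Each frame is marked as speech when enough frames in a small window around it have log energy above a cutoff derived from the recording's mean log energy.

```python
import math


def frame_log_energies(samples, frame_len=400, frame_shift=160):
    """Log energy per frame (400/160 samples = 25 ms / 10 ms at 16 kHz)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_shift):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        # Floor the energy so log() is defined for all-zero frames.
        energies.append(math.log(max(energy, 1e-10)))
    return energies


def energy_vad(log_energies, threshold=5.0, mean_scale=0.5,
               context=0, proportion=0.6):
    """Return one 0/1 decision per frame (1 = speech).

    All defaults are illustrative assumptions, not documented Kaldi values.
    """
    if not log_energies:
        return []
    n = len(log_energies)
    # The cutoff adapts to the recording: fixed offset plus scaled mean.
    cutoff = threshold + mean_scale * sum(log_energies) / n
    decisions = []
    for t in range(n):
        # Look at up to `context` frames on each side of frame t.
        window = log_energies[max(0, t - context):min(n, t + context + 1)]
        above = sum(1 for e in window if e > cutoff)
        decisions.append(1 if above >= proportion * len(window) else 0)
    return decisions
```

Calling `energy_vad(frame_log_energies(samples))` on a list of audio samples gives one decision per frame; no language model or acoustic model is involved, which is why such a detector is cheap but crude.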

Sarvesh Gupta

Apr 6, 2016, 12:09:07 PM
to kaldi...@googlegroups.com

Yes. I want to know at which stage Kaldi uses it, and exactly how.


David Snyder

Apr 6, 2016, 1:13:03 PM
to kaldi-help
Which example are you trying to run?

Sarvesh Gupta

Apr 6, 2016, 11:26:31 PM
to kaldi...@googlegroups.com
For now, I am just running the default example provided in Kaldi. I mean to alter the VAD technique later on.

Sarvesh Gupta

Apr 8, 2016, 6:13:07 AM
to kaldi...@googlegroups.com
My current position is:
1) I have built Kaldi on my system.
2) Installed the TIMIT dataset.
3) Ran the run.sh script in /Kaldi-trunk/egs/timit/s5.

Now I intend to modify the VAD, but I don't know where or how Kaldi uses it. Any help will be much appreciated.

David Snyder

Apr 8, 2016, 10:15:59 AM
to kaldi-help, Vimal Manohar
The VAD I mentioned earlier is for speaker and language ID; it is very simple, and is not going to work well for ASR.

Vimal has been working on a VAD for ASR, but it probably won't be ready for a few months. Vimal (cc'd) may be able to comment on its status. 

Daniel Povey

Apr 8, 2016, 10:17:45 AM
to kaldi-help, Vimal Manohar
Also, VAD is not needed for TIMIT, as there is no extended silence. In the 'aspire' recipe there is a lot of silence, so we could potentially use VAD there (however, in the checked-in recipe we don't actually do so; we just recognize all the data with our full model).
Dan



Sarvesh Gupta

Apr 16, 2016, 4:42:43 AM
to kaldi...@googlegroups.com
Hi,

The src directory contains the file 'voice-activity-detection.cc' along with its .h header. How can I use them when running the TIMIT example? I just want to know whether using VAD on TIMIT changes the word error rate.

alex1.g...@gmail.com

Apr 16, 2016, 5:25:08 AM
to kaldi-help
Hi

I do not believe that applying VAD to the TIMIT files will change the WER, but you can try. I assume you would like to delete the silence at the beginning and end of every utterance. In that case you can simply apply VAD to produce new wave files with the leading and trailing silence removed, and then run the TIMIT recipe again. Again, I do not expect you will see any change in the results.

Good luck.
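
The trimming step suggested above could look something like this minimal Python sketch. It assumes you already have one 0/1 speech decision per frame from some VAD; the function name `trim_silence` and the frame length and shift (in samples) are hypothetical choices for illustration, not part of any Kaldi script.

```python
def trim_silence(samples, decisions, frame_len=400, frame_shift=160):
    """Drop samples before the first and after the last speech frame.

    `decisions` holds one 0/1 value per frame (1 = speech). Silence in
    the middle of the utterance is kept; only the two ends are cut.
    """
    speech = [t for t, d in enumerate(decisions) if d]
    if not speech:
        # No speech detected: leave the audio untouched.
        return samples
    start = speech[0] * frame_shift
    end = min(len(samples), speech[-1] * frame_shift + frame_len)
    return samples[start:end]
```

The trimmed samples would then be written back out as new wave files before rerunning the recipe on them.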

David Snyder

Apr 16, 2016, 9:42:36 AM
to kaldi-help, alex1.g...@gmail.com
> Given in the src directory is the file 'voice-activity-detection.cc' along with its .h extension.

This VAD is a simple energy-based frame-level VAD used in language ID and speaker ID systems. Generally, it is used to filter out nonspeech frames prior to i-vector extraction. If you want to see how it is used, look at the scripts in egs/sre10/v1/sid.
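
As a rough illustration of what "filtering out nonspeech frames" means, here is a small Python sketch. It is not the code from the sid scripts; the function name and the plain-list data layout are assumptions for the example. Given a feature vector and a 0/1 VAD decision for each frame, it keeps only the frames marked as speech, so later stages never see the silence.

```python
def select_voiced_frames(features, decisions):
    """Keep only the feature vectors whose VAD decision is 1 (speech).

    `features` is a list of per-frame feature vectors and `decisions`
    is a list of 0/1 values of the same length.
    """
    if len(features) != len(decisions):
        raise ValueError("need exactly one VAD decision per feature frame")
    return [f for f, d in zip(features, decisions) if d]
```

Note that the output has fewer frames than the input, so the time alignment of the utterance is lost. That is harmless when the features are pooled over the whole utterance, but it is one reason this kind of filtering does not carry over directly to ASR.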

I do not believe this VAD is suitable for ASR. People (mostly Vimal, cc'd earlier) are working on a VAD for ASR, but it isn't ready yet.