Can the WebRTC voice detection implementation distinguish speech from any other loud sound?

Danail Kirov

Nov 6, 2017, 5:55:03 PM
to discuss-webrtc
I am trying to use the voice detection implementation in the WebRTC audio processing module to detect human speech and distinguish it from any other sound.
When I enable voice detection, I observe the following call stack:
AudioProcessingImpl::ProcessCaptureStreamLocked()
VoiceDetectionImpl::ProcessCaptureAudio()
WebRtcVad_Process() 
WebRtcVad_CalcVad16khz()
WebRtcVad_CalcVad8khz()
GmmProbability()

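In case it helps to reproduce this, here is roughly how I enable the detector and read its decision (a minimal sketch against the 2017-era AudioProcessing interface; the include path, buffer setup, and error checking depend on your checkout and are omitted):

#include "webrtc/modules/audio_processing/include/audio_processing.h"

// Sketch: enable the built-in voice detection submodule and query its
// per-frame decision. The apm and frame objects are assumed to be created
// and configured elsewhere (sample rate, channels, etc.).
void DetectVoice(webrtc::AudioProcessing* apm, webrtc::AudioFrame* frame) {
  apm->voice_detection()->Enable(true);
  apm->ProcessStream(frame);  // Produces the call stack above.
  bool has_voice = apm->voice_detection()->stream_has_voice();
  // has_voice reflects the GmmProbability() decision: true = speech, false = noise.
}
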
Apparently GmmProbability() is the workhorse of the voice detection, and it is documented as:
// Calculates the probabilities for both speech and background noise using
// Gaussian Mixture Models (GMM). A hypothesis-test is performed to decide which
// type of signal is most probable.
// - returns              : the VAD decision (0 - noise, 1 - speech).

Unfortunately, GmmProbability() does not seem able to distinguish true human speech from any other loud noise or sound.
For example, if I just scratch the mic I get "1 - speech", and if I set my coffee cup down on the table I again get "1 - speech", yet I would like to distinguish these sounds from when I am actually speaking.
Is there any way to configure or change GmmProbability() to make such distinction?
Has anyone experimented with this?
Thanks,
Danail Kirov 

nauroz nausherwani

May 16, 2018, 2:30:41 AM
to discuss-webrtc
Try setting the aggressiveness mode to 3 and see if it helps.
best regards,
Nauroz
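If you are calling the VAD directly rather than through AudioProcessing, the mode is set on the VAD handle, roughly like this (just a sketch; the 16 kHz rate and the caller-supplied frame are my assumptions, and the VAD only accepts 10/20/30 ms frames at 8/16/32/48 kHz):

#include <stdint.h>
#include "webrtc/common_audio/vad/include/webrtc_vad.h"

// Sketch: classify one audio frame with the standalone WebRTC VAD in its
// most aggressive mode. Returns 1 for speech, 0 for non-speech, -1 on error.
int ClassifyFrame(const int16_t* frame, size_t samples_per_frame) {
  VadInst* vad = WebRtcVad_Create();
  if (vad == nullptr || WebRtcVad_Init(vad) != 0) {
    return -1;
  }
  WebRtcVad_set_mode(vad, 3);  // 0 = least aggressive, 3 = most aggressive.
  int decision = WebRtcVad_Process(vad, 16000, frame, samples_per_frame);
  WebRtcVad_Free(vad);
  return decision;
}

If you go through the AudioProcessing interface instead, the corresponding knob should be VoiceDetection::set_likelihood(); as far as I can tell, kVeryLowLikelihood corresponds to the most aggressive mode.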