I am trying to use the WebRTC audio processing implementation of voice detection to detect and distinguish human speech from any other sound.
When I enable voice detection, I observe the following call stack:
AudioProcessingImpl::ProcessCaptureStreamLocked()
VoiceDetectionImpl::ProcessCaptureAudio()
WebRtcVad_Process()
WebRtcVad_CalcVad16khz()
WebRtcVad_CalcVad8khz()
GmmProbability()
GmmProbability() appears to be the workhorse of the voice detection, and it is documented as:
// Calculates the probabilities for both speech and background noise using
// Gaussian Mixture Models (GMM). A hypothesis-test is performed to decide which
// type of signal is most probable.
// - returns : the VAD decision (0 - noise, 1 - speech).
Unfortunately, GmmProbability() does not seem able to distinguish true human speech from any other loud noise or sound.
For example, if I just scratch the mic, I get "1 - speech"; if I set my coffee cup down on the table, I again get "1 - speech". I would like to distinguish sounds like these from when I am actually speaking.
Is there any way to configure or change GmmProbability() to make such distinction?
Has anyone experimented with this?
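For completeness, the only tuning knob I could find is the aggressiveness mode. Below is a minimal sketch of how I am driving the VAD, assuming the public C API in common_audio/vad/include/webrtc_vad.h (the include path, sample rate, and frame size here are my assumptions, not part of the stack trace above):

```c
#include <stdint.h>
#include <stddef.h>
#include "webrtc_vad.h"  // assumed include path into the WebRTC source tree

// Runs one frame through the VAD. frame_length must correspond to a
// 10, 20, or 30 ms frame at the given sample rate (here 16 kHz).
int run_vad(const int16_t* frame, size_t frame_length) {
  VadInst* vad = WebRtcVad_Create();
  if (vad == NULL || WebRtcVad_Init(vad) != 0) {
    return -1;
  }
  // Mode 0..3: higher values are more aggressive, i.e. more frames are
  // classified as noise. As far as I can tell this only shifts the GMM
  // decision thresholds; it does not change the features, so a scratched
  // mic can still come back as "speech" even in mode 3.
  WebRtcVad_set_mode(vad, 3);
  int is_speech = WebRtcVad_Process(vad, 16000, frame, frame_length);
  WebRtcVad_Free(vad);
  return is_speech;  // 1 = speech, 0 = noise, -1 = error
}
```

Even at the most aggressive mode I still get "speech" for the non-speech sounds described above, which is why I am asking whether GmmProbability() itself can be changed.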
Thanks,
Danail Kirov