Next talk: "Analysis-by-Synthesis for Source Separation and Speech Recognition" by Michael Mandel, Sept 8th

Colin Raffel

Sep 4, 2015, 1:24:25 PM

to cu-neu...@googlegroups.com

Hi all, the next talk we'll be hosting will be on Tuesday, September 8th, at 4pm, in CEPSR 414. Michael Mandel, previously a PhD student here at Columbia in LabROSA and now an assistant professor at Brooklyn College, will be discussing his work on using deep networks for audio source separation and speech recognition. A talk abstract follows. Please redistribute, and see you there!

"Analysis-by-Synthesis for Source Separation and Speech Recognition"
Michael Mandel
Tuesday, September 8th, 4pm, CEPSR 414

Separating speech from noise with a single microphone is a very underdetermined task, requiring a strong model of speech to be successful. This talk will present two such models: the first combines neural networks with exemplar-based approaches for speech separation and recognition, and the second provides a novel method for linking noise suppression using spectral masks with speech recognition using cepstral-domain features. The first system aims to reconstruct damaged or obscured speech using a concatenative speech synthesizer. This synthesizer is driven by a deep neural network-based selection function that predicts the similarity between pairs of noisy and clean speech "chunks". On the small-vocabulary CHiME2-GRID corpus, the resulting noise-free syntheses have speech quality much higher than that of similar approaches, almost as high as the original clean speech, but with lower intelligibility. The second system uses a full large-vocabulary continuous speech recognition system as a structured prior model of speech and poses the estimation of the mel-frequency cepstral coefficients (MFCCs) of the clean speech as an optimization problem. It thus finds the clean MFCCs that minimize a combination of the distance to reliable regions of the noisy observation and the negative log-likelihood under the recognizer. This approach reduces speech recognition errors on the medium-vocabulary AURORA4 task.

Colin Raffel

Sep 7, 2015, 12:02:23 PM

to cu-neu...@googlegroups.com

Hi all, a friendly reminder that Michael Mandel's talk is tomorrow. See you all there,