What about starting media capture (navigator.getUserMedia, or its unfortunate Chrome equivalent navigator.webkitGetUserMedia) at the same time as you start the Web Speech API? It looks like both can capture from the microphone simultaneously.
Then you can use the Media Recorder API or the Web Audio API to further process the captured audio.
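Here is a rough sketch of that idea, not a tested recipe. The helper names (pickGetUserMedia, pickSpeechRecognition, startSpeechAndCapture) and the onResult/onChunk callbacks are made up for illustration. It also assumes the browser lets the recognizer and getUserMedia hold the microphone at the same time, which I have not verified everywhere.

```javascript
// Return a promise-based getUserMedia from a navigator-like object.
// Prefers navigator.mediaDevices.getUserMedia and falls back to the
// callback-style navigator.getUserMedia / navigator.webkitGetUserMedia.
function pickGetUserMedia(nav) {
  if (nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === "function") {
    return (constraints) => nav.mediaDevices.getUserMedia(constraints);
  }
  const legacy = nav.getUserMedia || nav.webkitGetUserMedia;
  if (typeof legacy !== "function") return null;
  return (constraints) =>
    new Promise((resolve, reject) => legacy.call(nav, constraints, resolve, reject));
}

// Return the SpeechRecognition constructor (still prefixed in Chrome), or null.
function pickSpeechRecognition(win) {
  return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// Start speech recognition and a separate microphone recording side by side.
// onResult and onChunk are hypothetical callbacks supplied by the caller.
async function startSpeechAndCapture(win, onResult, onChunk) {
  const getUserMedia = pickGetUserMedia(win.navigator);
  const Recognition = pickSpeechRecognition(win);
  if (!getUserMedia || !Recognition) {
    throw new Error("getUserMedia or SpeechRecognition is not available");
  }

  // Ask for the microphone first, so a denied permission fails early.
  const stream = await getUserMedia({ audio: true });

  // The recognizer opens the microphone on its own; it is NOT fed `stream`.
  const recognition = new Recognition();
  recognition.onresult = onResult;
  recognition.start();

  // Record the same microphone independently. Without a timeslice argument,
  // the recorded Blob is delivered once recorder.stop() is called.
  const recorder = new win.MediaRecorder(stream);
  recorder.ondataavailable = (event) => onChunk(event.data);
  recorder.start();

  return { recognition, recorder, stream };
}
```

In a page you would call something like startSpeechAndCapture(window, handleResult, handleChunk) from a click handler, where handleResult and handleChunk are your own functions. For Web Audio processing instead of recording, the same stream could be passed to AudioContext.createMediaStreamSource.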
However, if your intention is to feed the Web Speech API some pre-processed (or post-processed, depending on your perspective ;)) audio, then I believe that is not currently possible.
Note that there was an intent to implement (and it may actually be implemented, but disabled by default) support for something similar to this, though not quite the same. According to the comments on the W3C bug (linked within that thread), however, it is intended for original microphone sources only, so I guess post-processing is not supposed to work.