As far as I am aware, on Linux there is no Web API, and nothing in the Chromium or Chrome source code, that provides a means to capture system audio ("What-U-Hear") from speakers or headphones.
You can use navigator.mediaDevices.getDisplayMedia() to capture Tab audio (and video; the video track can be removed from the MediaStream) on YouTube, and on the same Tab use navigator.mediaDevices.getUserMedia() to capture microphone input. Note that even though there is a systemAudio option, Chrome and Chromium do not actually capture the entire system audio, only the audio output of the captured Tab.
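The capture step could look roughly like the sketch below. The function name captureTabAndMicAudio and its mediaDevices parameter are hypothetical (the parameter exists only so the function can be tried against a stub); the constraints shown are an assumption about a reasonable minimal request, not something tested on every Chrome version.

```javascript
// Sketch: capture Tab audio and microphone input as two separate MediaStreams.
// Assumption: called from a user gesture in a secure context.
async function captureTabAndMicAudio(mediaDevices = navigator.mediaDevices) {
  // getDisplayMedia() must request video; the user picks the Tab and
  // enables audio sharing in the browser's picker.
  const tabStream = await mediaDevices.getDisplayMedia({ video: true, audio: true });

  // Only the Tab audio is wanted, so detach the video track(s) from the stream.
  for (const track of tabStream.getVideoTracks()) {
    tabStream.removeTrack(track);
  }

  // Microphone input, captured on the same Tab.
  const micStream = await mediaDevices.getUserMedia({ audio: true });

  return { tabStream, micStream };
}
```

If the user does not enable audio sharing in the picker, tabStream will simply contain no audio track, so it is worth checking tabStream.getAudioTracks().length before going further.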
Connect each respective MediaStreamTrack (by way of a MediaStreamAudioSourceNode) to a single Web Audio API MediaStreamAudioDestinationNode; the MediaStreamTrack exposed by that MediaStreamAudioDestinationNode is a merged stream of all audio nodes connected to it. That track can then be connected to a MediaStreamAudioSourceNode for audio output (the last time I checked, MediaStreamAudioDestinationNode on Chrome and Chromium did not output silence as the specification describes). Alternatively, you can use MediaStreamTrackGenerator to manually create a single MediaStreamTrack with multiple inputs.
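A minimal sketch of the Web Audio API mixing step follows. The helper name mixAudioStreams is hypothetical, and the AudioContext is passed in by the caller; this shows only the MediaStreamAudioDestinationNode route, not the MediaStreamTrackGenerator alternative.

```javascript
// Sketch: mix the audio of several MediaStreams into one MediaStreamTrack.
// `audioContext` is an AudioContext created by the caller;
// `streams` is an array of MediaStreams that each contain an audio track.
function mixAudioStreams(audioContext, streams) {
  // One destination node collects everything connected to it.
  const destination = audioContext.createMediaStreamDestination();

  for (const stream of streams) {
    // Each input stream gets its own MediaStreamAudioSourceNode.
    audioContext.createMediaStreamSource(stream).connect(destination);
  }

  // destination.stream holds a single audio track carrying the mix.
  const [mixedTrack] = destination.stream.getAudioTracks();
  return mixedTrack;
}
```

Something like mixAudioStreams(new AudioContext(), [tabStream, micStream]) would then give one track, which can be wrapped in new MediaStream([track]) for recording or playback. The AudioContext has to stay alive (and running) for as long as the mixed track is in use.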