WebCodecs AudioEncoder sample rate input for Opus

275 views
Skip to first unread message

guest271314

unread,
Jul 30, 2022, 11:29:33 AM7/30/22
to media-dev
Let's say we capture audio input from microphone that is 2 channel, 44100 sample rate.

In order to avoid resampling the input data to 48000 we set the sample rate of encoder configuration to 44100.

    {
       numberOfChannels: 2,
       sampleRate: 44100,
        codec: 'opus'
    }

AFAICT Opus does not support 44100 sample rate https://github.com/xiph/opus/issues/43, though we proceed anyway.

We create AudioData objects with the input sample rate and pass to AudioEncoder encode().

The encoder will encode Opus at 44100, as evidenced at Media tab in DevTools.

The problem is that there is an audible pop when playback with Media Source Extensions commences, and the audio playback completes 1 second before it is supposed to.

To reproduce simply upload a WAV file having 2 channels, 44100 sample rate. 

<!doctype html>
<html>
  <head></head>
  <body>
    <input type="file" accept=".wav">
    <script>
      // Expects a canonical WAV: 44-byte RIFF header, then interleaved
      // 16-bit signed PCM, 2 channels, 44100 Hz.
      document.querySelector('input[type=file]').onchange = async (e) => {
        // Skip the 44-byte header; the remainder is raw interleaved s16 PCM.
        const wav = e.target.files[0].slice(44);

        // 441 * 6 frames = 60 ms at 44100 Hz; 4 bytes per stereo s16 frame.
        const CHUNK_FRAMES = 441 * 6;
        const CHUNK_BYTES = CHUNK_FRAMES * 4;

        let timestamp = 0;
        const array = [];
        const chunks = [];
        const config = {
          numberOfChannels: 2,
          sampleRate: 44100,
          codec: 'opus',
        };

        for (let i = 0; i < wav.size; i += CHUNK_BYTES) {
          const int16 = new Int16Array(await wav.slice(i, i + CHUNK_BYTES).arrayBuffer());
          // The final chunk may be shorter than CHUNK_FRAMES; size the planar
          // buffers from the data actually read so AudioData's numberOfFrames
          // matches its payload instead of over-reporting the tail.
          const frames = int16.length / 2;
          const channels = [new Float32Array(frames), new Float32Array(frames)];
          for (let i = 0, j = 0, n = 1; i < int16.length; i++) {
            const int = int16[i];
            // Int16Array already yields signed values (-32768..32767), so the
            // old `int >= 0x8000` wrap-around branch could never fire. Scale
            // negatives by 0x8000 and positives by 0x7fff to map into [-1, 1].
            const float = int < 0 ? int / 0x8000 : int / 0x7fff;
            // deinterleave L/R
            channels[(n = ++n % 2)][!n ? j++ : j - 1] = float;
          }
          // Pack as f32-planar: left plane followed by right plane.
          const data = new Float32Array(frames * 2);
          const left = channels.shift();
          const right = channels.shift();
          data.set(left, 0);
          data.set(right, left.length);
          const frame = new AudioData({
            timestamp,
            data,
            sampleRate: 44100,
            format: 'f32-planar',
            numberOfChannels: 2,
            numberOfFrames: frames,
          });
          // duration is derived by the UA from numberOfFrames / sampleRate.
          timestamp += frame.duration;
          array.push(frame);
        }
        console.log(array);
        const encoder = new AudioEncoder({
          error(e) {
            console.log(e);
          },
          output: async (chunk, metadata) => {
            // The first output carries the decoder config (Opus extradata);
            // keep it so MSE can be configured with the same description.
            if (metadata.decoderConfig) {
              config.description = metadata.decoderConfig.description;
            }
            chunks.push(chunk);
          },
        });
        console.log(await AudioEncoder.isConfigSupported(config));
        encoder.configure(config);
        for (const audioData of array) {
          encoder.encode(audioData);
        }
        await encoder.flush();
        console.log(chunks);

        const audio = new Audio();
        audio.controls = true;
        audio.addEventListener('canplay', async (e) => {
          await audio.play();
        }, { once: true });
        document.body.appendChild(audio);

        const ms = new MediaSource();
        ms.addEventListener('sourceopen', async (e) => {
          console.log(e.type, config);
          URL.revokeObjectURL(audio.src);
          // NOTE(review): addSourceBuffer({ audioConfig }) and
          // appendEncodedChunks() are experimental (Chromium origin trial),
          // not part of the MSE standard — confirm availability before use.
          const sourceBuffer = ms.addSourceBuffer({
            audioConfig: config,
          });
          console.log(ms.activeSourceBuffers);
          sourceBuffer.onupdate = (e) => console.log(e.type);
          sourceBuffer.mode = 'sequence';
          for (const chunk of chunks) {
            await sourceBuffer.appendEncodedChunks(chunk);
          }
        });
        audio.src = URL.createObjectURL(ms);
      };
    </script>
  </body>
</html>

No resampling Web API exists that I am aware of.

How can we get rid of the initial pop when playback commences and play back the last 1 second of audio when the input AudioData and the AudioEncoder configuration are both 44100?

Or is resampling to 48000 input necessary to avoid the pop at initial playback and 1 second less playback duration?

guest271314

unread,
Jul 30, 2022, 8:12:37 PM7/30/22
to media-dev, guest271314
Another example where we avoid the initial pop at playback yet still lose 1 second of playback on the HTMLMediaElement timeline when we don't resample the input to 48000. When playing the WAV file in an HTMLMediaElement after resampling to 48000, the timeline ends at 1:35. When the input is 44100, playback ends at 1:34.

<!DOCTYPE html>
<html>
  <head></head>
  <body>
    <input type="file" accept=".wav" />
    <script type="module">
      // Expects a canonical WAV: 44-byte RIFF header, then interleaved
      // 16-bit signed PCM, 2 channels, 44100 Hz.
      document.querySelector('input[type=file]').onchange = async (e) => {
        // Slice the 44-byte header off the ArrayBuffer *before* constructing
        // the Int16Array: Int16Array.prototype.slice(44) would drop 44
        // elements (88 bytes), not the 44-byte WAV header.
        const data = new Int16Array(
          (await e.target.files[0].arrayBuffer()).slice(44)
        );

        // Collects EncodedAudioChunk outputs for MSE appending; was
        // previously referenced without ever being declared (ReferenceError).
        const chunks = [];

        const config = {
          numberOfChannels: 2,
          sampleRate: 44100,
          codec: 'opus',
        };

        const encoder = new AudioEncoder({
          error(e) {
            console.log(e);
          },
          output: async (chunk, metadata) => {
            // The first output carries the decoder config (Opus extradata);
            // keep it so MSE can be configured with the same description.
            if (metadata.decoderConfig) {
              config.description = metadata.decoderConfig.description;
            }
            chunks.push(chunk);
          },
        });
        console.log(await AudioEncoder.isConfigSupported(config));
        encoder.configure(config);
        // NOTE(review): as posted, nothing is ever passed to encoder.encode()
        // before this flush — the AudioData construction/encode loop appears
        // to be missing from the original message.
        await encoder.flush();

        const audio = new Audio();
        audio.controls = audio.autoplay = true;
        const events = [
          'loadedmetadata',
          'loadeddata',
          'canplay',
          'canplaythrough',
          'play',
          'playing',
          'pause',
          'waiting',
          'progress',
          'seeking',
          'seeked',
          'ended',
          'stalled',
          'timeupdate',
        ];
        for (const event of events) {
          audio.addEventListener(event, async (e) => {
            if (e.type === 'timeupdate') {
              // Keep the SourceBuffer's timestampOffset pinned to the current
              // playback position while no append is in flight.
              if (!ms.activeSourceBuffers[0].updating) {
                ms.activeSourceBuffers[0].timestampOffset = audio.currentTime;
              }
            }
          });
        }
        document.body.appendChild(audio);

        const ms = new MediaSource();
        ms.addEventListener('sourceopen', async (e) => {
          console.log(e.type, config);
          URL.revokeObjectURL(audio.src);
          // NOTE(review): addSourceBuffer({ audioConfig }) and
          // appendEncodedChunks() are experimental (Chromium origin trial),
          // not part of the MSE standard — confirm availability before use.
          const sourceBuffer = ms.addSourceBuffer({
            audioConfig: config,
          });
          console.log(ms.activeSourceBuffers);
          sourceBuffer.onupdate = (e) => console.log(e.type);

          sourceBuffer.mode = 'sequence';
          for (const chunk of chunks) {
            await sourceBuffer.appendEncodedChunks(chunk);
          }
        });
        audio.src = URL.createObjectURL(ms);
      };
    </script>
  </body>
</html>

blade_runner.wav

guest271314

unread,
Jul 30, 2022, 8:44:28 PM7/30/22
to media-dev, guest271314
Observing the 1st EncodedAudioChunk, its byteLength is 8 and its duration is set to 60000, whereas the 2nd EncodedAudioChunk's byteLength is 409 and its duration is 60000. The 1st EncodedAudioChunk's duration does not appear to be accurate.
Reply all
Reply to author
Forward
0 new messages