Creating an audio service for mojo.


Dale Curtis

Oct 14, 2014, 8:18:22 PM
to mojo...@chromium.org
Hi all,

I put together basic audio support for media in mojo here:


Long term, this isn't really the right approach, though. Instead, we'll want a service which allows clients to create and render audio input and output streams. Before talking about any design, I'd like to share what the current audio output system looks like; it's a bit simplified, but this is the gist of what happens today:

-----> is an IPC
#####> is a SyncSocket write.
<===== is an action by the OS level AudioOutputStream.


AudioOutputDevice (Renderer)     AudioOutputController (Browser)

Create                -------->  Create ShMem, SyncSocket, AudioOutputStream.
                      <--------  Creation okay!
         
Play                  -------->  Start AudioOutputStream (asynchronous)
    |                 <########  Request first buffer. (delay=0)
 Prepare #0      
    |        
 Prepared #0          ########>  Buffer #0 complete.
   ...         
   ...         
                      <========  AudioOutputStream reads buffer #0 from ShMem.
    |                 <########  Request next buffer.  (delay=hardware delay + received)

....................................
... Process repeats ~ every 20ms ...
....................................

    |
 Prepare #N
    |
   ...                <========  Observe #N not ready, AudioOutputStream times out after ~10ms.
   ...                <########  Request next buffer.  (delay=hardware delay + silence)   
    |
 Prepared #N          ########>  Buffer #N complete.
    |
 Prepare #N+1
    |
 Prepared #N+1        ########>  Buffer #N+1 complete (overwrites buffer #N).
   ...
   ...
                      <========  Observe #N, #N+1 completed.  Read ShMem.

I realize that's a bit complicated, so here are the most notable aspects:
  • The first buffer is requested before the OS level stream starts. The first OS level callback gets the previously requested buffer.
  • We perform a "timed wait" (via select()) if buffers are not ready when the OS level callback comes in.
  • Each buffer fulfilled by the renderer sends a counter over the SyncSocket to the browser so the browser knows which buffer has been filled.
  • Without timeouts, the flow is a simple producer-consumer problem: the browser requests a buffer, the renderer provides it.
  • When a timeout occurs, counter values are discarded until the received counter matches the expected counter or another timeout occurs (a rough sketch of this loop follows the list).
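To make the timeout handling concrete, here's a rough C++ sketch of the browser-side loop; ReadCounterWithTimeout() is just a placeholder for the real SyncSocket timed read, not an actual API:

// Rough sketch only -- not the actual Chromium code.
#include <cstdint>
#include <optional>

class OutputStreamHandshake {
 public:
  // Called from the OS-level stream callback when buffer |expected| is due.
  // Returns true if the renderer filled it in time; false means timeout and
  // the caller plays silence for this period.
  bool WaitForBuffer(uint32_t expected) {
    while (true) {
      std::optional<uint32_t> received = ReadCounterWithTimeout(10 /* ms */);
      if (!received)
        return false;  // Renderer missed its deadline.
      if (*received == expected)
        return true;   // Caught up; the requested buffer is ready in ShMem.
      // Otherwise the counter is stale (from a previously missed buffer);
      // discard it and keep reading until we catch up or time out again.
    }
  }

 private:
  // Placeholder for the SyncSocket-style timed read; stubbed out here.
  std::optional<uint32_t> ReadCounterWithTimeout(int timeout_ms) {
    return std::nullopt;  // Real code would block up to |timeout_ms| ms.
  }
};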
I'd expect any API we design in Mojo to support the concept of failed fulfillment (a.k.a. client timeouts). Practically, and as it is today, this likely means we'll need both a shared memory interface and a message passing interface. Toward that end, for output only, I think we'll want a mojom kind of like this:

[Client=AudioOutputStreamClient]
interface AudioOutputStream {
  Create(AudioParameters params) => (bool success);
 
  // Starts or stops the OS level stream from reading from the shared memory.
  Start();
  Stop();

  // Client has written the next buffer to the shared memory.
  OnNextBufferPrepared();
};

interface AudioOutputStreamClient {
  // OS or other type error occurred.
  OnError();

  // Client should write the next buffer into the shared memory.
  OnPrepareNextBuffer(uint32 audio_delay_bytes);
};

If all this sounds reasonable I'll start codifying it into a design document of sorts.

- dale

Viet-Trung Luu

Oct 14, 2014, 10:50:13 PM
to Dale Curtis, mojo...@chromium.org
This doesn't seem quite right to me. I think you probably want to avoid inflicting (direct use of) shared memory on generic Mojo applications. Instead, you probably want to use data pipes.

Also, I'd argue that we should move to a playback-scheduling system, wherein you can (if you want) tell the service when it should start/stop playing (and the system can report when it actually happened).

I suggest something like:

interface AudioService {
  // [[no need for "success"; on failure, the output_stream message pipe
  // is just closed]]
  CreateAudioOutputStream(AudioOutputStreamParameters params,
                          handle<data_pipe_consumer> data,
                          AudioOutputStream& output_stream);
};

interface AudioOutputStream {
  // |when_to_start| and |when_started| are on the same clock as given by
  // |MojoGetTimeTicksNow()|; set the former to some suitably invalid value
  // (which we should add) to mean "ASAP".
  Start(int64 when_to_start) => (int64 when_started);

  // [[or should it be "pause" instead of "stop"?]]
  Stop(int64 when_to_stop) => (int64 when_stopped);
};

Note that the form of |CreateAudioOutputStream()| allows commands (e.g., |Start()|) to be pipelined to |output_stream| immediately.
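As a rough illustration of that pipelining from the client side (the generated bindings types, mojo::GetProxy(), and kStartASAP are assumed names following the usual conventions, not verified API):

// Sketch only; a real client would keep |stream| alive for the stream's
// lifetime instead of letting it go out of scope.
void OnStarted(int64_t when_started) { /* record the actual start time */ }

void CreateAndStart(AudioService* audio_service,
                    AudioOutputStreamParametersPtr params,
                    mojo::ScopedDataPipeConsumerHandle data) {
  AudioOutputStreamPtr stream;
  audio_service->CreateAudioOutputStream(params.Pass(), data.Pass(),
                                         mojo::GetProxy(&stream));
  // |stream| is usable immediately: Start() is queued on the new message
  // pipe and delivered once the service binds the request, so no round trip
  // is needed before issuing commands.
  stream->Start(kStartASAP, base::Bind(&OnStarted));
}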

(Possibly |AudioOutputStream| should have a client interface to report errors, or whatever else.)





Dale Curtis

Oct 15, 2014, 12:58:50 PM
to Viet-Trung Luu, mojo...@chromium.org
Thanks for the feedback! There are a couple of problems with using a data pipe: one is the concept of cancellation, the other is clock skew; media time (in Flash and HTML5) is driven by a clock based on the number of audio frames rendered.

On clock skew first, since it dovetails with cancellation: the frequency at which data is generated by the client should match the actual frequency of hardware callbacks (approximately in the short term, but always without cumulative drift), for reasons of audio/video synchronization.

What that means is that we can't just have a client which fills a data pipe until it's full; the client needs to deliver data on a schedule, with awareness of what the current hardware delay is. We can make approximations and corrections, but it'd be better if we didn't have to; see FakeAudioConsumer for an example of correcting for clock skew (a minimal sketch of the idea follows).
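The correction amounts to scheduling off the total frames rendered since start, rather than "last callback + buffer duration", so timer jitter never accumulates into media-time drift. A minimal C++ sketch of that idea (not FakeAudioConsumer itself; the class and names here are hypothetical):

// Hypothetical illustration of drift-free callback scheduling.
#include <chrono>
#include <cstdint>

class DriftFreeScheduler {
 public:
  DriftFreeScheduler(int sample_rate, int frames_per_buffer)
      : sample_rate_(sample_rate),
        frames_per_buffer_(frames_per_buffer),
        start_(std::chrono::steady_clock::now()) {}

  // Time at which the next buffer should be requested. Derived from the
  // absolute frame count, so rounding and timer jitter in any one period
  // can't drift media time relative to real time.
  std::chrono::steady_clock::time_point NextCallbackTime() const {
    const int64_t next_frames = frames_rendered_ + frames_per_buffer_;
    const auto elapsed =
        std::chrono::microseconds(next_frames * 1000000 / sample_rate_);
    return start_ + std::chrono::duration_cast<
                        std::chrono::steady_clock::duration>(elapsed);
  }

  void OnBufferRendered() { frames_rendered_ += frames_per_buffer_; }

 private:
  const int sample_rate_;
  const int frames_per_buffer_;
  const std::chrono::steady_clock::time_point start_;
  int64_t frames_rendered_ = 0;
};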

Secondly, on cancellation: when a client misses its deadline, the service needs to drop one or more chunks of data that may not have arrived yet. A single-chunk-sized data pipe would stall, while a pipe sized for more than one chunk would trigger the first two callbacks too fast (moving time in odd ways). It also risks the pipeline getting stuck permanently behind if the client ever falls more than one buffer behind.
 

Also, I'd argue that we should move to a playback-scheduling system, wherein you can (if you want) tell the service when it should start/stop playing (and the system can report when it actually happened).


Do you have a use case for this? None of today's clients would use something like this.
 
I suggest something like:

interface AudioService {
  // [[no need for "success"; on failure, the output_stream message pipe
  // is just closed]]
  CreateAudioOutputStream(AudioOutputStreamParameters params,
                          handle<data_pipe_consumer> data,
                          AudioOutputStream& output_stream);
};

interface AudioOutputStream {
  // |when_to_start| and |when_started| are on the same clock as given by
  // |MojoGetTimeTicksNow()|; set the former to some suitably invalid value
  // (which we should add) to mean "ASAP".
  Start(int64 when_to_start) => (int64 when_started);

  // [[or should it be "pause" instead of "stop"?]]
  Stop(int64 when_to_stop) => (int64 when_stopped);
};

Note that the form of |CreateAudioOutputStream()| allows commands (e.g., |Start()|) to be pipelined to |output_stream| immediately.

Neat! I didn't know that was possible.
 

(Possibly |AudioOutputStream| should have a client interface to report errors, or whatever else.)


According to UMA statistics, errors happen 0.01% of the time, so that could definitely be a v2 feature, or something that we auto-recover from on the service side.

Andrew Scherkus

Oct 15, 2014, 1:35:53 PM
to Dale Curtis, Viet-Trung Luu, mojo...@chromium.org
I feel there's a reasonable case for inflicting shared memory on certain Mojo applications, but perhaps not all.

For example, live-audio production applications (e.g., Ableton, Reason, WebAudio) typically generate audio on demand when asked.

Now, not everyone needs that level of access (it's a PITA to work with), so perhaps it'd be possible to layer a data-pipe-style API on top.

Andrew

Dale Curtis

Oct 15, 2014, 2:08:14 PM
to Viet-Trung Luu, mojo...@chromium.org
Apologies, it took me a bit longer to grok what you were suggesting here. It could work well for media playback, but it may have issues with clients that can't prebuffer data (real-time communication). Synchronization also depends on how well MojoGetTimeTicksNow() tracks the hardware clock frequency. The clock could also skew if the initial playback delay isn't just a function of a single stream's audio data (it isn't on Linux, though it is on OS X and Windows).

Dale Curtis

Oct 15, 2014, 2:11:57 PM
to Andrew Scherkus, Viet-Trung Luu, mojo...@chromium.org
Yeah, I think this is the right approach: the primitive API should be the best we can design, informed by what we've learned from the existing Chrome audio stack; but not everyone needs that.

For clients that don't need that (audio-only, non-live, etc.), we can provide an abstraction on top of the primitives which allows simpler use via a data pipe, or whatever the friendliest interface is.
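As a rough illustration of that layering (a hypothetical class, assuming the OnPrepareNextBuffer()/OnNextBufferPrepared() primitive from earlier in the thread): an adapter could accept data whenever the client has it and service the shared-memory callbacks on the client's behalf:

// Hypothetical adapter sketch; not an existing class.
#include <cstdint>
#include <cstring>
#include <vector>

class BufferedOutputAdapter {
 public:
  BufferedOutputAdapter(uint8_t* shmem, size_t buffer_size)
      : shmem_(shmem), buffer_size_(buffer_size) {}

  // Friendly API: the client just appends audio whenever it has some.
  void PushData(const uint8_t* data, size_t size) {
    pending_.insert(pending_.end(), data, data + size);
  }

  // Called when the service requests the next buffer (OnPrepareNextBuffer in
  // the mojom sketch above). Returns true if a full buffer was written, in
  // which case the caller would reply with OnNextBufferPrepared().
  bool OnPrepareNextBuffer() {
    if (pending_.size() < buffer_size_)
      return false;  // Underrun: let the service play silence instead.
    std::memcpy(shmem_, pending_.data(), buffer_size_);
    pending_.erase(pending_.begin(), pending_.begin() + buffer_size_);
    return true;
  }

 private:
  uint8_t* const shmem_;   // Shared memory region provided by the service.
  const size_t buffer_size_;
  std::vector<uint8_t> pending_;
};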

- dale