Notes from SG13 meeting at Kona


Roger Orr

Feb 27, 2019, 5:37:03 PM
to SG 13 (I/O)
Hello all,
Here are the notes from the SG13 afternoon session at Kona, where we looked at http://wg21.link/p1386 and a draft response from Apple (D1501R0).

Regards,
Roger.
--
Notes from SG13 meeting during WG21 meeting in Kona, 2019-02-22

There were 15 people attending for some or all of the afternoon.

The session began with a presentation by the authors of P1386R0 (about 2/3 of the room had read the paper).

The authors are working on D1386R1, which should be in the post-meeting mailing.

The most significant change is to use mdspan (see http://wg21.link/P0009 - in LEWG, targeting C++23) in the buffer implementation.

The interface covers audio data representation, device selection and configuration, and a real-time audio thread that talks to them.
Some APIs are purely callback based, such as Core Audio on Mac and ASIO on Windows: you give them a callback and they run it on a system-provided high-priority thread. Others are polling based, such as WASAPI on Windows: you can block on an event, and when it is signalled a buffer is available; in that case you have to create your own high-priority audio thread. On an embedded device without threads you can't wait at all, and instead need to poll for the buffer.
The authors are hoping for an interface that lets you easily do either approach.
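
A minimal sketch of the two models, using made-up names (audio_device, connect(), wait()) rather than anything from the paper:

    #include <functional>
    #include <utility>

    struct audio_buffer {};  // stands in for the 2-D sample data

    class audio_device {
    public:
        // Callback model (Core Audio / ASIO style): the implementation invokes
        // the callback on a system-provided high-priority thread.
        void connect(std::function<void(audio_buffer&)> cb) { callback_ = std::move(cb); }

        // Polling model (WASAPI style): the application blocks until a buffer
        // is ready and processes it on a thread it manages itself.
        bool wait() { return true; }              // stub: pretend a buffer is ready
        audio_buffer& buffer() { return buf_; }

    private:
        std::function<void(audio_buffer&)> callback_;
        audio_buffer buf_;
    };

    int main() {
        audio_device dev;

        // Callback style: hand over a callback and let the system drive it.
        dev.connect([](audio_buffer& b) { (void)b; /* fill or consume samples */ });

        // Polling style: drive the loop from our own audio thread.
        if (dev.wait()) {
            audio_buffer& b = dev.buffer();
            (void)b;                              // fill or consume samples
        }
    }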

Audio buffers can be either row-major or column-major.

Q: Does the interleaving change at runtime?
A: It won't change device-to-device. e.g. when the user switches from built-in speakers to Bluetooth speakers, we don't need the layout to change. It might vary between drivers, though; e.g. ASIO might give a different interleaving than WASAPI on Windows.
Note: mdspan does give the option to decide the layout at runtime.
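
For illustration, the two orderings could map onto mdspan layouts roughly as below (assuming a frames-by-channels extents order and float samples; not proposal wording):

    #include <cstddef>
    #include <mdspan>   // std::mdspan (P0009, C++23)

    // Assumed extents order: frames x channels.
    using extents_2d = std::dextents<std::size_t, 2>;

    // Interleaved: the channel index varies fastest, so all channels of one
    // frame are adjacent in memory (row-major over frames x channels).
    using interleaved_view   = std::mdspan<float, extents_2d, std::layout_right>;

    // Deinterleaved: the frame index varies fastest, so each channel is a
    // contiguous run of frames (column-major over frames x channels).
    using deinterleaved_view = std::mdspan<float, extents_2d, std::layout_left>;

    // std::layout_stride would additionally allow the choice to be made at run time.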

At the driver layer, things vary a lot. The OS layer gives you *A* buffer type. The proposed std::audio is just a wrapper around the native buffer type with a good C++ interface.

Note: when audio people say "buffer", they don't mean that you are buffering; they mean a 2-D array of data that is not necessarily used for buffering.
LEWG may help with naming for the proposal, noting that "buffer" is a term of art.

There was a discussion about whether buffers needed to be volatile. After discussion the authors, and others, felt that this was not needed.
The paper should say explicitly that we've considered volatile access, and we think it's not needed.

The paper authors should attempt to contact embedded developers to confirm this.

Q: On the interleaved/deinterleaved aspect: at the audio system layer it'll be one or the other, so how does the client figure out which is used? Libraries might want to provide both options and use the one that matches the environment.
A: We expose it: see the device class.

Audio developers expect certain names to appear in their code, so those would be provided as extensions to the mdspan interface by some mechanism. Making mdspan an implementation detail also means that some of the BufferOrderTypes could allow the buffers to be non-contiguous.
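
A hedged sketch of that idea -- an mdspan kept as an implementation detail behind audio-flavoured names; buffer_order, audio_buffer and the member names are placeholders, not the paper's API:

    #include <cstddef>
    #include <mdspan>

    enum class buffer_order { interleaved, deinterleaved };

    class audio_buffer {
    public:
        // layout_stride keeps the storage order (and possible non-contiguity)
        // an implementation detail of the view.
        using view_type =
            std::mdspan<float, std::dextents<std::size_t, 2>, std::layout_stride>;

        audio_buffer(view_type data, buffer_order order) : data_(data), order_(order) {}

        std::size_t size_frames()   const { return data_.extent(0); }
        std::size_t size_channels() const { return data_.extent(1); }
        buffer_order order()        const { return order_; }

        // Named access hides whether the underlying storage is interleaved.
        float& sample(std::size_t frame, std::size_t channel) {
            return data_[frame, channel];   // C++23 multidimensional operator[]
        }

    private:
        view_type data_;
        buffer_order order_;
    };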

Q: How do you deal with channel layout changes?
A: Won't happen in the middle of a callback. You'll get some notification that the channel layout has changed, and then you'll deal with the new number of channels on the next callback.

Discussion about the various ways to use mdspan to implement buffer -- LEWG to be consulted for guidance.

There was discussion about offering a list of input and a list of output buffers.
- Every known implementation provides 0 or 1 input buffer and 0 or 1 output buffer.

Q: Could this be modelled using some sort of concept hierarchy?
A: When this comes from the driver, it comes already type-erased, so the implementation would need to do the runtime check to dispatch.

Another open question is whether SampleType and BufferOrderType might be different between input and output buffers.
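
A sketch of how the "0 or 1 buffers per direction" observation and the possibly-different input/output types could be expressed; device_io and the template parameters are illustrative assumptions only:

    #include <cstdint>
    #include <optional>

    // Placeholder buffer; SampleType and BufferOrderType mirror the open
    // question about whether input and output may use different ones.
    template <class SampleType, class BufferOrderType>
    struct audio_buffer { /* mdspan-based 2-D view of SampleType */ };

    struct interleaved_tag {};
    struct deinterleaved_tag {};

    template <class InBuffer, class OutBuffer>
    struct device_io {
        // Every known implementation provides 0 or 1 buffer per direction,
        // hence optional rather than a list.
        std::optional<InBuffer>  input;    // absent for output-only devices
        std::optional<OutBuffer> output;   // absent for input-only devices
    };

    // An instantiation where the two directions differ, purely for illustration:
    using example_io =
        device_io<audio_buffer<float, interleaved_tag>,
                  audio_buffer<std::int16_t, deinterleaved_tag>>;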

For the device discovery API we should look at SG1's work on executors and hardware discovery. Whatever we come up with should be consistent with that.
There are design questions over the device interface and how to query a consistent set of values.

There is another design question about querying the valid values to pass to a setter.

There are other issues too: settings that may apply at the physical or logical level, sharing devices, and cases where setting a value does not 'take'.

The description of settings such as sample rate needs to be more nuanced, at least for CoreAudio.
When the user or API sets the hardware device settings, that sets it for everyone. An application can *also* request to interact with the device in a certain format and the OS would adapt that to the hardware's setting.

So there are two ways to set settings: one that is global, and another that applies just to the way the local application interacts with the device.

There are separate APIs to set the client sample rate vs the hardware sample rate. You might want them to be the same, but the paper needs to specify exactly what happens.
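
A minimal sketch of that two-level model; the member names and return types are assumptions for illustration, not the paper's interface:

    class audio_device {
    public:
        // Global: reconfigures the hardware itself, so the change is visible
        // to every client of the device.
        bool set_hardware_sample_rate(double rate) { hw_rate_ = rate; return true; }

        // Local: only changes the format this application uses to talk to the
        // device; the OS converts between this and the hardware rate.
        bool set_client_sample_rate(double rate) { client_rate_ = rate; return true; }

        double hardware_sample_rate() const { return hw_rate_; }
        double client_sample_rate()   const { return client_rate_; }

    private:
        double hw_rate_     = 44100.0;
        double client_rate_ = 44100.0;
    };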

The model in the current paper needs revision after checking against implementation practice.
Need to look at other abstraction layers and do what they do.

When settings change there is (typically) a callback. A description of its behaviour needs to be provided.

There are also various ways to manage the callback (event loops being one). The paper will need to specify something in terms of sequenced/happens-before.
There also needs to be a statement of the constraints on user code.
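
One possible shape for such a notification hook, purely as an assumption to anchor the discussion:

    #include <functional>
    #include <utility>

    class audio_device {
    public:
        // Invoked when the device configuration (sample rate, channel layout,
        // ...) changes. Which thread calls it, and how the call is ordered
        // relative to the audio callback (sequenced/happens-before), is what
        // still needs to be specified.
        void set_config_changed_callback(std::function<void()> cb) {
            on_config_changed_ = std::move(cb);
        }

    private:
        std::function<void()> on_config_changed_;
    };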

In WASAPI, you register a notification callback object, and that gets called on an unspecified thread that is not the main thread. It is suspected to be the audio thread, because the documentation says the methods should not block, etc.

There was a comment about the partial attempt to discuss 'real time' in the paper. This is *not* the authors' intent.
The motivation is to provide a unified thin layer around existing practice, and not define audio processing.

The authors then presented the device callback interface.

This has connect (with a callback) and a polling method. Some platforms only support connect(), some only support process(), some support both.

Q: How do I figure out which one to use?
A: It depends on the platform; it is an open question whether connect/wait should throw when not supported.
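
One way a client could discover which model is available, sketched with hypothetical capability queries (the alternative being to throw, as noted above):

    #include <stdexcept>

    class audio_device {
    public:
        // Capability queries: one way a client could discover which model the
        // platform supports (stub values here).
        bool can_connect() const { return true;  }
        bool can_process() const { return false; }

        template <class Callback>
        void connect(Callback&& cb) {
            // The alternative design is for this to throw on platforms that
            // only support the polling model.
            if (!can_connect())
                throw std::runtime_error("callback model not supported on this device");
            (void)cb;   // a real implementation would store and use the callback
        }
    };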

Needs some specification of, for example, what happens if I start on one thread and stop on another.

"is_running" has the same issue as shared_ptr::is_unique (now deprecated) -- see the notes for use_count()

We decided to template the device on the "driver type", which would then tell you what the native sample type is, and so on.
Another design question is whether to use an enumeration or a tag mechanism that you can specialize yourself for your own tag types.

There was a suggestion to take any callable, constrained with concepts, instead of directly taking a std::function, and to do the type erasure internally.
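
A sketch combining the two ideas -- a device templated on a driver tag that supplies the native sample type, and a concept-constrained callback; the driver tags and their sample types are guesses for illustration:

    #include <concepts>
    #include <cstdint>

    // Hypothetical driver tags; the native sample types are illustrative only.
    struct wasapi_driver { using native_sample_type = float;        };
    struct asio_driver   { using native_sample_type = std::int32_t; };

    template <class Driver>
    class basic_audio_device {
    public:
        using sample_type = typename Driver::native_sample_type;

        struct buffer { /* 2-D view of sample_type */ };

        // Accept any callable invocable with buffer&, constrained by a
        // concept; any type erasure happens inside the implementation.
        template <std::invocable<buffer&> Callback>
        void connect(Callback&& cb) {
            (void)cb;   // a real implementation would store / dispatch this
        }
    };

    using wasapi_device = basic_audio_device<wasapi_driver>;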

The paper needs to discuss exception handling, even if it decides to mandate nothrow callbacks
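
If nothrow callbacks were mandated, one way to express that as a constraint (illustrative only):

    #include <concepts>

    // Require that invoking the callback with a buffer is itself a noexcept
    // expression; Buffer is whatever buffer type the device hands out.
    template <class F, class Buffer>
    concept nothrow_audio_callback =
        std::invocable<F, Buffer&> &&
        requires(F f, Buffer& b) {
            { f(b) } noexcept;
        };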

There was a question about support for spatial audio formats, and systems like Dolby Atmos where you render to an encoded spatial format, and the OS renders that to the underlying audio hardware. These formats are a mixture of streams with different meaning.
(There are examples of this in the Android SDK)

There was further discussion about registering callbacks: multiple callbacks, and callbacks with incompatible sampling rates or encodings.

The next version of the paper should discuss how to write *portable* code.

You can have multiple devices simultaneously. Typically the OS would have one as default input and another as default output. For a simple player, you just ask for the default output and use that, otherwise you need to enumerate and pick which one to use.
More discussion about what sort(s) of API are required to query available devices and drivers.

It was thought that the reasoning needs to go in the paper.

Guidance from LEWGI was to return an optional when there might not be a default device.
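
A sketch of that discovery shape -- enumeration plus an optional default -- with placeholder names that echo the discussion rather than quote the paper:

    #include <optional>
    #include <string>
    #include <vector>

    struct audio_device { std::string name; };

    // Stub free functions standing in for the discovery API.
    std::vector<audio_device> get_audio_output_devices() { return {}; }
    std::optional<audio_device> get_default_audio_output_device() { return std::nullopt; }

    void simple_player() {
        if (auto dev = get_default_audio_output_device()) {
            // A simple player just uses the default output device.
            (void)*dev;
        } else {
            // No default output: enumerate and pick one instead.
            for (const auto& d : get_audio_output_devices()) { (void)d; }
        }
    }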

Additional data structures

We would also need a lock-free queue. There are a couple of proposals in flight in this area.

Another thing we might need is timestamps, which can be used to e.g. synchronize with video.
For offline processing there might need to be a special device type.
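
A tiny sketch of what a timestamped buffer might look like; the clock choice and field names are assumptions:

    #include <chrono>

    struct audio_buffer { /* sample data */ };

    // A buffer paired with the time at which its first frame reaches (or was
    // captured by) the hardware, on a clock that a video pipeline could share.
    struct timestamped_buffer {
        audio_buffer buffer;
        std::chrono::steady_clock::time_point presentation_time;
    };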

We moved on to a discussion of D1501R0, a draft document giving Apple's response to the paper.

One concern is around policy-based audio routing. In iOS some audio applications can interrupt and preempt other audio, so the availability of devices can come and go, and we are wondering how this fits in with this proposal.
For example, when you plug in a Bluetooth device and the output switches to it, or when you are listening to music and a call interrupts it.

This is something that needs looking into. If we put something into the standard, we need to make sure it is reasonably implementable.
On the other hand, I don't think we need to cover everything, e.g. things that only make sense on one platform. We could do the same as std::thread, where the native handle is exposed and then the user can do whatever they need with it.

Another thing you could look at is browsers, WebAudio, and AudioWorklet. I think a lot of this is done through SDL compiled to JavaScript.

Another concern is privacy policies, like microphone privacy. This is another one of those things that maybe does not explicitly belong in this API.
I understand that this isn't really the concern of the stdlib, but it's important to make sure that the specification does not make these scenarios impossible.

--- Polls ---

Do we want to pursue audio at the WG21 level, given that it will take away effort from other activity?

SF F N A SA
4 3 2 0 0

Do we want to pursue the general direction of D1386R1 as presented?

SF F N A SA
5 2 2 0 0

Someone felt that this is the lowest possible level and that the vast majority of people would work at a higher level.
The response was that there's a lot of pain around writing these lower-level parts portably, and that there is value in making all the other interesting higher-level layers portable.

There was discussion over whether exploring a high-level API in a proposal would be more useful.
It is an interesting question because it might be useful to standardize the high-level first and then work on this low-level scope.

Part of making this decision would be an investigation of how many developers come into contact with audio APIs at this level.

Do we prefer a TS as the ship vehicle for a standard audio API?

SF F N A SA
2 6 0 1 0

The 'against' vote felt that the amount of material we're looking at doesn't leave many questions that would be answered by a TS.
