Help me analyze this systrace / Low Latency Audio / Crackling on Nexus 4

Nils Schneider

Nov 28, 2013, 4:06:30 AM

to andro...@googlegroups.com

Hello there,

I can't get rid of crackling audio on my Nexus 4, although I cannot find a mistake in my app. I've set up everything properly: the native sample rate of 48 kHz, the buffer size of 240, and lock-free processing. Even so, I still get crackles that I can't explain when looking at the systrace.

I've uploaded it here: http://heatvst.com/temp/n4_240_inCallback_crackles.html

The callback that does the computation is "OSLcbk"; I gave it a name so that I can find it easily in the systrace. The other threads of my app are "eider.heat.full".

There are a few things that I do not understand:

fRdy2 is never above 240, although I use three buffers (one more for safety).
The write times of the FastMixer are not evenly spread just before the crackles come up. What does this indicate?
The core on my Nexus 4 is only 50% used and is therefore being downclocked. I'd like to prevent any downclocking to avoid crackles. Any hints on how to do that? I even thought about burning the core with dummy calculations for as long as there is time left in the current frame.
I tried to set the affinity of the callback thread "OSLcbk" to the second core. Although the call doesn't report any error, Android apparently ignores it.
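
One way to check whether an affinity request took effect is to read the mask back right after setting it. The following is only a minimal sketch, assuming a Linux/Android target where `sched_setaffinity` and `sched_getaffinity` are available; `first_allowed_cpu` and `pin_and_verify` are hypothetical helper names, not part of any API.

```c
#define _GNU_SOURCE   /* exposes cpu_set_t and the CPU_* macros on glibc */
#include <assert.h>
#include <sched.h>

/* Returns the lowest CPU index in the calling thread's current affinity
 * mask, or -1 if the mask could not be read. */
int first_allowed_cpu(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &mask))
            return cpu;
    return -1;
}

/* Pins the calling thread (pid 0 means "the caller") to a single CPU and
 * reads the mask back.
 * Returns 1 if the kernel now reports exactly that CPU,
 *         0 if the reported mask differs,
 *        -1 if one of the calls failed. */
int pin_and_verify(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    cpu_set_t want, got;
    CPU_ZERO(&want);
    CPU_SET(cpu, &want);
    if (sched_setaffinity(0, sizeof(want), &want) != 0)
        return -1;

    CPU_ZERO(&got);
    if (sched_getaffinity(0, sizeof(got), &got) != 0)
        return -1;

    return (CPU_ISSET(cpu, &got) && CPU_COUNT(&got) == 1) ? 1 : 0;
}
```

Calling `pin_and_verify(1)` from inside the callback thread would show whether the request is accepted on that thread. A result of 1 only describes the mask at the moment of the call; it cannot tell whether the platform changes the mask later.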

Any help is highly appreciated.

Regards,

Nils

Nils Schneider

Nov 28, 2013, 10:25:35 AM

to andro...@googlegroups.com

I did some further investigation and added sections to my native code to help identify when OpenSL ES calls me to fill data into the played buffers.

This is the new systrace, on my Nexus 4 with a block size of 240 again:

http://heatvst.com/temp/n4_240_bufferfilling.html

You can see here how I fill my 3 buffers from the thread "ptexecute". After that I call play and you can see that OpenSL calls me almost immediately to do a refill of the first buffer that has been played. But as you can see it takes a lot of time until OpenSL calls me again to request the second buffer and so on.

Would be nice if an expert could tell me if this looks okay or something is wrong here.

All of this crackles, of course: I need to call into Java to record the sections, and there is no question that this is slow. I'm just asking whether the order of the events makes sense.

Athos Bacchiocchi

Nov 28, 2013, 11:41:42 AM

to andro...@googlegroups.com

Hi Nils,

Sorry, but your links lead to blank pages (at least for me). What are you feeding to the output queue? Some internally generated signal, or samples taken from the input queue?

--
You received this message because you are subscribed to the Google Groups "android-ndk" group.

Nils Schneider

Nov 28, 2013, 12:10:22 PM

to andro...@googlegroups.com

Hi again,

Having heard that the systrace output only works in Chrome, I'll try it with some screenshots.

I use a generated signal, which I create inside the OpenSL callback, as suggested by Raph Levien. He does the same in his synthesizer and presented it as "the solution" for low latency audio. It makes sense: anything else would add at least one block of latency.

Here are the shots:

These are from the first plot.

http://heatvst.com/temp/systrace_shot1.png

http://heatvst.com/temp/systrace_shot2.png

The write times of the FastMixer are weird; I do not know exactly what that means. The value of fRdy2 is never above 240, although with 3 buffers I'd expect 720.

This is the second run:

http://heatvst.com/temp/systrace2_shot1.png

http://heatvst.com/temp/systrace2_shot2.png

You see here how I fill the buffers before playing. After starting playback, OpenSL requests one new buffer, but after that, it takes a long time until OpenSL requests a new buffer again. Seeing 3 fills and 1 refill before fRdy2 even goes up to its value of 240, then dropping down again immediately is confusing. An explanation would be nice.

Meanwhile I did a third run with a very large block size of 2048 just for testing purposes (3 buffers again).

http://heatvst.com/temp/systrace3_shot1.png

You can see here that OpenSL apparently calls me far too late to ask for a refill. When OpenSL asks for a new buffer, there should still be two times 2048 samples left for the audio driver to keep playback smooth, i.e. 92 ms before crackling should begin. But the fRdy2 counter drops immediately; it wants me to calculate 2048 samples so late that I can't keep up.
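
The headroom estimate is plain arithmetic and can be checked for either sample rate. A minimal sketch, assuming that "2048 samples" means 2048 frames per buffer; `queued_headroom_ms` is an illustrative name only.

```c
#include <assert.h>

/* Playback time in milliseconds covered by `buffers` queued buffers of
 * `frames_per_buffer` frames each, at `sample_rate_hz`. */
double queued_headroom_ms(int buffers, int frames_per_buffer, int sample_rate_hz)
{
    assert(buffers >= 0 && frames_per_buffer > 0 && sample_rate_hz > 0);
    return 1000.0 * buffers * frames_per_buffer / sample_rate_hz;
}
```

Two buffers of 2048 frames come to roughly 92.9 ms at 44.1 kHz and roughly 85.3 ms at 48 kHz, so the 92 ms figure corresponds to a 44.1 kHz stream.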

Regards,

Nils

Brendon Costa

Nov 28, 2013, 2:24:47 PM

to andro...@googlegroups.com

I don't have time to look at the systrace graphs and understand what is happening, but I have successfully used low latency audio on the Nexus 4 running 4.2.2 (I can't remember what the block size and sample rate were on that device, but I assume you got those right). Here are some issues that came up, which you may already have looked into:

1) Regardless of how many blocks are queued into OpenSL, or the block size used at that level (you don't have to use the native block size; it is just optimal), if the processing time for generating a block of any size ever exceeds the duration of one *NATIVE* block (even if you are not using that size), you will hear a discontinuity in the audio stream.

This got me at first. I thought it may be possible to add extra latency to the queue to soak up some processing jitter. We only had audio available every 20msec so if using 5msec native blocks we would do a lot of work once every 4 blocks. This work was about 7msec worth every 20msec. But because it was over the 5msec boundary it would click.

I tried increasing the queue size to account for the processing jitter and tried increasing the block size. But the limitation seems to be on the native pulling thread which MUST always complete in 5 msec.

2) Obviously, as mentioned in the talk, you need to review your code for any memory allocations/deallocations, use of locks, or other possibly bad function calls that could also allocate or lock (from memory, they used snprintf as an example).
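
The rule in point 1) is about the worst single callback, not the average load. A minimal sketch of that check, using the 5 msec native block and the "7 msec of work once every 4 blocks" pattern described above; the function names and the 0.5 msec figure for the cheap blocks are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Duration of one native block in microseconds. */
double native_period_us(int native_frames, int sample_rate_hz)
{
    assert(native_frames > 0 && sample_rate_hz > 0);
    return 1e6 * native_frames / sample_rate_hz;
}

/* Counts the callbacks whose processing time exceeded one native block
 * period. Any non-zero result would be heard as a click according to
 * point 1), no matter how low the average load is. */
size_t count_deadline_misses(const double *work_us, size_t n, double period_us)
{
    size_t misses = 0;
    for (size_t i = 0; i < n; ++i)
        if (work_us[i] > period_us)
            ++misses;
    return misses;
}
```

With per-callback times of 7000, 500, 500 and 500 microseconds and a 5000 microsecond period, the average is about 2.1 msec, yet `count_deadline_misses` reports one miss. For reference, `native_period_us(240, 48000)` gives 5000 microseconds, the Nexus 4 configuration mentioned at the start of the thread.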

Nils Schneider

Nov 28, 2013, 2:59:59 PM

to andro...@googlegroups.com

Thank you for your answer!

It is good to know that 1) is not a bug but a "feature". I initially had the same idea for reducing jitter, and it always failed. For larger block sizes this is okay, but for a higher number of buffers it is a pity, because on lower-end devices with a single core I have no way to compensate for interrupts. A good idea would then in fact be to move processing into a separate thread and add additional queues, but since I cannot raise the priority of my own thread, I am left choosing between two options that are both suboptimal.

What is driving me mad is that my processing is usually not so expensive that it won't fit into the native block size, but because I do not use up a complete core, the system decides to throttle the core down, and *THAT* gets me into huge trouble because I then exceed the 5 ms limit. Is there no way to keep a core at its maximum frequency? I even thought about doing some spinning after audio generation has completed, just to keep the CPU core busy.

Another thing I noticed is that even though the Nexus 4 has 4 cores, I often run on core 0, together with a bunch of services. If, for example, the sensor service comes into play, the OpenSL callback is called late, which is another source of glitches.

Setting the thread affinity is unfortunately ignored.

Point 2) is clear, of course. Only very raw math processing is done in the callback, a mix of computation and a good amount of table lookups :) using lock-free ring buffers for everything...
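
For readers unfamiliar with the technique, a lock-free ring buffer for exactly one writer thread and one reader thread can be sketched with C11 atomics as follows. This is not the code used in the app; the type and function names, the `float` payload and the capacity of 256 are all assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_CAPACITY 256u   /* must be a power of two */
_Static_assert((RING_CAPACITY & (RING_CAPACITY - 1u)) == 0u,
               "RING_CAPACITY must be a power of two");

typedef struct {
    float data[RING_CAPACITY];
    atomic_size_t head;   /* advanced only by the consumer */
    atomic_size_t tail;   /* advanced only by the producer */
} ring_t;

void ring_init(ring_t *r)
{
    assert(r != NULL);
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side. Returns 1 on success, 0 if the ring is full.
 * Never blocks and never allocates. */
int ring_push(ring_t *r, float value)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAPACITY)
        return 0;
    r->data[tail & (RING_CAPACITY - 1u)] = value;
    /* Release: the stored element becomes visible before the new tail. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}

/* Consumer side. Returns 1 and writes *out on success, 0 if empty. */
int ring_pop(ring_t *r, float *out)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return 0;
    *out = r->data[head & (RING_CAPACITY - 1u)];
    /* Release: the slot is handed back only after it has been read. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}
```

A typical use would be a UI thread calling `ring_push` with parameter changes while the audio callback drains them with `ring_pop`. The sketch is only valid for a single producer and a single consumer; more writers or readers would need a different design.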

Raph Levien

Nov 28, 2013, 5:17:36 PM

to andro...@googlegroups.com

I took a quick look but not a deep analysis. Here's what I see. Starting with your first systrace:

1. I also wonder why fRdy2 is at 240 even though you're asking for 3 buffers - it really should be 480 most of the time in this case, and occasionally fall to 240 (when you have two buffers like this, falling to one is a sign of a potential problem but should not be an audible discontinuity). Here I'm talking about the value in the SLDataLocator_AndroidSimpleBufferQueue object.

2. What's causing the delays seems to be a kernel migration (461.864 ms). These happen for power management reasons (in this case to shut CPU 2 down) and are definitely a source of added delays on N4, and a good reason to run an additional buffer. In my experience, running two buffers handles pretty much all migration events without clicks.

3. In your OSLcbk thread, I see it computing for a while (2ms or so) in response to FastMixer waking it up, but I also see a blip about 3ms later. I don't see this in my app and wonder where it's coming from.

In your second systrace, fRdy2 is all over the place (it seems to be multiples of 80), and this leads me to wonder if you're requesting a buffer size other than native.

It is not possible to reserve a certain amount of CPU. That's an interesting feature request for the future. I would not recommend running a dummy load, as it will really hurt battery and is certainly not a guarantee for being run at a higher clock rate (and could have the effect of slowing you down due to thermal throttling). For now the best approach to this is to run highly optimized algorithms. That's basically why I'm doing careful NEON optimization for all the CPU-critical functions - FM core, resonant filter.

Take another careful look at your simple buffer queue locator and then I can take a closer look if there are still some questions.

Nils Schneider

Nov 28, 2013, 6:03:57 PM

to andro...@googlegroups.com

In contrast to you, I run a stereo signal; everything else in the initialization looks more or less the same. I can guarantee that my buffer is 240 in size, but stereo, so I enqueue blocks of 240*sizeof(short)*2 bytes.
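
The byte count in that expression can be checked in both directions. A minimal sketch, assuming interleaved 16-bit PCM (so `int16_t` stands in for `short`); both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes in one buffer of interleaved 16-bit PCM. */
size_t pcm16_buffer_bytes(size_t frames, size_t channels)
{
    assert(channels > 0);
    return frames * channels * sizeof(int16_t);
}

/* Frames contained in `bytes` of interleaved 16-bit PCM.
 * The byte count must be a whole number of frames. */
size_t pcm16_frames_in(size_t bytes, size_t channels)
{
    assert(channels > 0);
    size_t frame_bytes = channels * sizeof(int16_t);
    assert(bytes % frame_bytes == 0);
    return bytes / frame_bytes;
}
```

For 240 stereo frames this gives 960 bytes per enqueued block, and 960 bytes read back as stereo is again 240 frames.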

Could the value of 240 for fRdy2 be caused by KitKat? I now have these problems on my Nexus 7a as well (512/44100 native), but this device ran without any trouble before.

The algorithm has been tweaked a lot and has a clock-per-instruction value of 0.33-0.7. There is a lot of computational work to be done for each sample, but I have my guidelines for producing a good sound and therefore cannot reduce the workload. For example, all modulations and parameter changes are completely smoothed, so an oscillator can change its frequency on every sample. Another example is that all oscillators are fully band-limited, i.e. free of aliasing at any frequency they have to produce.

Looking at the load of the core that does the computation, there is, in theory, still room left for more features. For example, reverberation is planned. But first I want it to be stable on as many devices as possible before adding features.

Nils Schneider

Nov 28, 2013, 7:11:57 PM

to andro...@googlegroups.com

One thing I'm noticing right now in the second systrace: my callback gets called four times every 20 ms. If my block size were 80 rather than 240, shouldn't I get called more often?
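
The expected callback rate follows directly from the block size. A minimal sketch, assuming one callback per consumed buffer and the 48 kHz rate of the Nexus 4; `callbacks_per_interval` is an illustrative name.

```c
#include <assert.h>

/* Number of buffer callbacks expected within `interval_ms`, assuming one
 * callback each time a buffer of `frames_per_buffer` frames is consumed. */
double callbacks_per_interval(int frames_per_buffer, int sample_rate_hz,
                              double interval_ms)
{
    assert(frames_per_buffer > 0 && sample_rate_hz > 0);
    return interval_ms * sample_rate_hz / (1000.0 * frames_per_buffer);
}
```

At 48 kHz, 240-frame buffers give 4 callbacks per 20 ms, while 80-frame buffers would give 12, so four callbacks per 20 ms is what a block size of 240 predicts.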

Nils Schneider

Nov 29, 2013, 5:17:23 AM

to andro...@googlegroups.com

Hello Raph,

I did a systrace of your synthesizer and I can see that your fRdy2 now also peaks at only 240. Could this be an Android bug in 4.4 KitKat?

Cheers,

Nils

Raph Levien

Nov 29, 2013, 11:08:15 AM

to andro...@googlegroups.com

That's also the next thing I was going to try. It's not what I was expecting, but at this point I'm not sure if it's an actual bug or whether it's an expected diff based on the changes made in KitKat.

I will investigate further and get back to this thread. It's a holiday in the US so it will probably take a few days.

Raph

Nils Schneider

Nov 30, 2013, 6:04:44 AM

to andro...@googlegroups.com

Good to hear that, looking forward to an answer.

This one:

http://heatvst.com/temp/systrace4_clockfreqproblems.png

is a good example of why the kernel gets me into trouble. You can clearly see that I miss the time frame just because Android decides to reduce the clock of the core. As soon as the clock frequency is raised again, everything works as expected.

There is high overhead here because of my custom systrace sections, but the problem persists even when that overhead is gone. Other things come into play then, for example delayed execution of the callback thread because the same core processes the sensor data, and clock speed is still an issue.

The terrible thing about this is that it worked better before I did my latest round of code optimizations. Because there was more work to do, the clock speed was higher and there were fewer problems.
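
Overruns like these could also be counted inside the callback itself, without custom systrace sections, by timestamping the start and end of each call. A minimal sketch, assuming `clock_gettime(CLOCK_MONOTONIC)` is cheap enough to call twice per callback on the target device; `monotonic_ns`, `cb_stats_t` and `cb_stats_record` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime in strict -std modes */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
int64_t monotonic_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Running statistics, written by the callback and read elsewhere later. */
typedef struct {
    int64_t  worst_ns;   /* longest single callback seen so far */
    uint32_t overruns;   /* callbacks that took longer than the deadline */
} cb_stats_t;

/* Records one callback that ran from start_ns to end_ns against a
 * deadline, e.g. 5000000 ns for 240 frames at 48 kHz. */
void cb_stats_record(cb_stats_t *s, int64_t start_ns, int64_t end_ns,
                     int64_t deadline_ns)
{
    int64_t elapsed = end_ns - start_ns;
    if (elapsed > s->worst_ns)
        s->worst_ns = elapsed;
    if (elapsed > deadline_ns)
        s->overruns++;
}
```

The callback would take `monotonic_ns()` on entry and again after the last sample is written, then call `cb_stats_record`. The counters only measure the time spent inside the callback, so a callback that is started late by the scheduler would not show up here.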

Omri

Nov 30, 2013, 10:42:06 PM

to andro...@googlegroups.com

Not sure if related or not.
Are you getting this when touch sounds are off as well?
https://groups.google.com/forum/#!topic/android-ndk/3-pznLfH4gQ

Glenn Kasten

Dec 4, 2013, 8:19:11 PM

to andro...@googlegroups.com

Thanks for reporting this issue, and for your analysis.

I’ve prepared a changelist that appears to improve performance for me, but I would appreciate your testing also if you can build the platform from source (or know someone else who can). Unfortunately, I cannot post binaries.

The source code patch is here: https://android-review.googlesource.com/#/c/71421

I believe that the root causes of this issue are a combination of:

1. In Android 4.4 (“KitKat”), the number of client-to-server buffers for fast tracks changed from 2 to 1, in order to further reduce latency. This is unrelated to the OpenSL ES buffer count.

2. On at least two devices (Nexus 4 and Nexus 5), there is scheduling jitter, e.g. the migration delay that you noted.

3. Some apps themselves contribute timing jitter, and need to be able to tolerate such jitter.

The above patch only addresses #1 by restoring the number of client-to-server buffers for fast tracks back to 2. It does not address #2 or #3.

I would appreciate feedback from original poster or others on the effectiveness of this patch.

If/when the changelist or a derivative is submitted to AOSP, I can’t commit to when it would appear in a binary build.

In the longer term, I would like to continue to work together with our OEM and SoC partners to reduce scheduling jitter, and to provide more effective ways for apps to negotiate with the platform what level of app jitter should be tolerated.

Nils Schneider

Dec 5, 2013, 4:39:17 PM

to andro...@googlegroups.com

Hi Glenn,

great to hear from you.

I have no idea how to build from source but I would be willing to wipe my Nexus 7 just to test your patch.

Just trying... so far my VM with Ubuntu 12.04 is building the source... I don't know yet how to apply the patch though, but that can't be that complicated. I'll keep you updated!