How to find out which are the fast cores on a device?

John Dallman

Feb 13, 2024, 10:18:30 AM
to andro...@googlegroups.com
Preface: I am not producing an Android app. I work for a software component business, creating shared libraries, compiled from C and C++ code, for use in third-party customers' apps. I test my libraries in a command-line test harness, which I run in the ADB shell. I am only producing software for 64-bit ARM, because none of the customers want 32-bit code. 

The libraries I produce can use multi-threading to improve performance, by creating additional "worker" threads and splitting the work between them. The threads are pthreads, or WIN32 threads on Windows. I have not implemented multi-threading on Android yet, but that time is approaching, and it's a good example of a problem I have: how do I discover the identities of the fast, medium and slow cores on a device?

On Windows, I can use the PROCESSOR_RELATIONSHIP data structure to do this. On macOS, I can use sysctlbyname() to find out how many cores there are at each performance level, but I can't get their identities. On Android (and Linux), I can look into /sys/devices/system/cpu/cpufreq. There I can see various policies, and find out which cores they apply to. There's documentation at https://docs.kernel.org/admin-guide/pm/cpufreq.html which tells me about the power scaling and suchlike. But it was written by Intel, when all the cores in any processor Intel built were the same, and doesn't cover systems with a mixture of core types.

I haven't found anything that tells me which policy applies to the fastest cores, which to the medium, and which to the slow. I can deduce this on my current devices: they have only one top-speed core, so the policy that applies to only one core must be it. But I don't know whether that generalises to all devices.

Thanks in advance,

John

enh

Feb 13, 2024, 11:41:52 AM
to andro...@googlegroups.com
i think you need to look in /sys/devices/system/cpu/cpu*/cpu_capacity,
group cores with the same value, and then sort those groups. (and note
that if i look at those on a pixel 8, for example, i actually get
three big-middle-little groups, not just big-little.)

i think in practice you'll get the same results from
/sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq, but that's not
necessarily guaranteed. i think the cpu_capacity stuff is the
_intended_ way for you to query "which are the good ones?".
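
A minimal C++ sketch of that approach, assuming the sysfs layout described above: read cpu_capacity for each configured CPU, group the CPUs that share a value, and order the groups from fastest to slowest. The function names (read_sysfs_long, read_cpu_capacities, group_by_capacity) are illustrative only, and -1 is an arbitrary marker chosen here for a file that is missing or unreadable.

```cpp
#include <cstddef>
#include <fstream>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>
#include <unistd.h>

// Reads one integer from a sysfs file. Returns -1 (a marker chosen for this
// sketch) if the file is missing or does not start with a number.
long read_sysfs_long(const std::string& path) {
    std::ifstream in(path);
    long value = -1;
    if (!(in >> value)) return -1;
    return value;
}

// Reads cpu_capacity for every configured CPU; the vector index is the CPU number.
std::vector<long> read_cpu_capacities() {
    std::vector<long> caps;
    const long n = sysconf(_SC_NPROCESSORS_CONF);
    for (long cpu = 0; cpu < n; ++cpu) {
        caps.push_back(read_sysfs_long("/sys/devices/system/cpu/cpu" +
                                       std::to_string(cpu) + "/cpu_capacity"));
    }
    return caps;
}

// Groups CPU numbers by capacity value, highest capacity first.
// CPUs with an unknown capacity (-1) are left out.
std::vector<std::pair<long, std::vector<int>>> group_by_capacity(
        const std::vector<long>& caps) {
    std::map<long, std::vector<int>, std::greater<long>> groups;
    for (std::size_t cpu = 0; cpu < caps.size(); ++cpu) {
        if (caps[cpu] >= 0) groups[caps[cpu]].push_back(static_cast<int>(cpu));
    }
    return std::vector<std::pair<long, std::vector<int>>>(groups.begin(),
                                                          groups.end());
}
```

Calling group_by_capacity(read_cpu_capacities()) gives one entry per distinct capacity value. If no CPU exposes cpu_capacity, every reading is -1 and the result is empty, so a caller needs some fallback for that case.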
> --
> You received this message because you are subscribed to the Google Groups "android-ndk" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to android-ndk...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/android-ndk/CAH1xqgn1ANOf98GGovWaFuNdpu%3DmNW7xrqeksVW6njmQZq10Ow%40mail.gmail.com.

John Dallman

Feb 13, 2024, 1:25:07 PM
to andro...@googlegroups.com
Yup, that works. The feature went into the Linux kernel in December 2016 (https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-devices-system-cpu). It's in the Android 12 on my main test devices, but not in the Android 9 on the device I keep for validating on the oldest standard I support. 

It's also in the ARM Linuxes I have running (Amazon Linux 2 and RHEL 8.x), but not any of the Intel Linuxes I have. 

Thanks very much.

John    

yu...@unity3d.com

Feb 14, 2024, 7:19:27 AM
to android-ndk
Your questions are spot on, John.

If you are able to feed your worker threads with small and even chunks of workload (say 1ms), then you will probably be fine loading all cores. Or maybe even only the small ones if the deadline is not super tight, like 16ms for a 60fps game (small cores are much, much slower though).

For manual control over core affinity, I am not aware of an easy solution which takes into account different workloads (short and long), different core types, throttling and power efficiency.

We are using a multi-tiered detection system, with cpu_capacity being the most reliable one (but broken on some devices). cpuinfo_max_freq is worth using too. The older the devices you have to support, the more potential issues.

As a result, you usually get a mask covering roughly the top half of the total number of CPUs.
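
The detection system mentioned here is not described in the thread, so the following is only a guess at the general shape of a tiered fallback, not that implementation. The sketch assumes per-CPU readings have already been collected into vectors (index = CPU number, -1 for an unreadable value), prefers cpu_capacity when every CPU reported a positive value, and otherwise falls back to cpuinfo_max_freq. The mask policy (every CPU above the slowest tier, or all CPUs when there is only one tier) and the names choose_metric and performance_cpu_mask are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Picks the per-CPU metric to trust: cpu_capacity if every CPU reported a
// positive value, otherwise cpuinfo_max_freq. -1 marks an unreadable value.
std::vector<long> choose_metric(const std::vector<long>& capacity,
                                const std::vector<long>& max_freq) {
    bool capacity_ok = !capacity.empty();
    for (long c : capacity) {
        if (c <= 0) capacity_ok = false;
    }
    return capacity_ok ? capacity : max_freq;
}

// Builds a bitmask (bit i = CPU i, first 64 CPUs only) of the CPUs whose
// metric is above the slowest tier. If all readable CPUs share one value,
// every readable CPU is included.
std::uint64_t performance_cpu_mask(const std::vector<long>& metric) {
    long lo = -1, hi = -1;
    for (long m : metric) {
        if (m <= 0) continue;  // skip unreadable entries
        if (lo < 0 || m < lo) lo = m;
        if (m > hi) hi = m;
    }
    std::uint64_t mask = 0;
    for (std::size_t cpu = 0; cpu < metric.size() && cpu < 64; ++cpu) {
        const long m = metric[cpu];
        if (m <= 0) continue;
        if (lo == hi || m > lo) mask |= std::uint64_t(1) << cpu;
    }
    return mask;
}
```

With three tiers this selects the big and medium cores together. Whether that is the right set for a given workload is something only measurement can settle.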

Yury

enh

Feb 14, 2024, 10:28:07 AM
to andro...@googlegroups.com
On Wed, Feb 14, 2024 at 4:19 AM 'yu...@unity3d.com' via android-ndk
<andro...@googlegroups.com> wrote:
> We are using a multi-tiered detection system, with cpu_capacity being the most reliable one (but broken on some devices).

if you can think of a CTS test that would prevent more of whatever
breakage you've seen, let me know...

John Dallman

Feb 14, 2024, 11:41:34 AM
to andro...@googlegroups.com
On Wed, Feb 14, 2024 at 12:19 PM 'yu...@unity3d.com' via android-ndk <andro...@googlegroups.com> wrote:
> Your questions are very spot on John.

Thanks!
 
> If you are able to feed your worker threads with small and even chunks of workload (say 1ms), then you can probably be fine by loading all cores. Or maybe even only the small ones if the deadline is not super tight, like 16ms for a 60fps game. (small cores are much much slower though)

It's not a question of frame-rate deadlines, but of getting mathematical modelling operations that can take whole seconds through as fast as possible. I'm producing technical computing software for Android: it is usually used for handling data from Windows/Mac/Linux applications out on construction sites or on factory shop-floors. People want it on mobile devices rather than the cloud because Wi-Fi tends not to exist, or not be reliable, in the places where they want to use Android or iOS.    
 
> The older devices you have to support, the more potential issues.

I think I'm prepared to say "Android 12 or later for using multiple threads." 

John

yu...@unity3d.com

Feb 15, 2024, 4:49:01 AM
to android-ndk
On Wednesday, February 14, 2024 at 4:28:07 PM UTC+1 enh wrote:
> if you can think of a CTS test that would prevent more of whatever
> breakage you've seen, let me know...

I'd be happy to!
Searching through Slack archives, I found this from a colleague, dated 2019:
> But as far as I can see cpu_capacity is only affected if scaling_max_freq is lowered.
> And so far I have only seen this happening on Exynos model of S10 
So we've seen cpu_capacity vary on this device when it overheats (which it should not, should it? It's DMIPS/MHz or something like that, a measure of maximum CPU performance).
It may have been caused by scaling_max_freq changing; I have no idea whether that is supposed to change either. It may have something to do with gaming modes on phones, for example.
Unfortunately, I no longer have the device at hand, so I can't provide any more details than these. It may even have been fixed in a software update.

I don't see a full list of CTS test cases, so I can't think of one that fits right away, but something along the lines of "foreach core query cpu_capacity, do some work to heat up, foreach core query cpu_capacity again, fail if changed".
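
The comparison step of that idea could be sketched in C++ roughly as follows. This is not an actual CTS test: the warm-up workload and the sysfs reads are left out, and the sketch assumes two per-CPU snapshots (index = CPU number) have already been taken. The name changed_cpus is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Returns the CPU numbers whose reading differs between two snapshots.
// A CPU that appears in only one snapshot is reported as changed.
std::vector<int> changed_cpus(const std::vector<long>& before,
                              const std::vector<long>& after) {
    std::vector<int> changed;
    const std::size_t n =
        before.size() > after.size() ? before.size() : after.size();
    for (std::size_t cpu = 0; cpu < n; ++cpu) {
        const bool in_both = cpu < before.size() && cpu < after.size();
        if (!in_both || before[cpu] != after[cpu]) {
            changed.push_back(static_cast<int>(cpu));
        }
    }
    return changed;
}
```

A test built on this would fail when the returned list is non-empty after the warm-up.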

yu...@unity3d.com

Feb 15, 2024, 4:52:38 AM
to android-ndk
I think you should be quite safe with cpu_capacity, or even with using all cores and letting the governor/scheduler decide what to do, if you can split your work into chunks of meaningful size. Microsecond-sized chunks are suboptimal. Millisecond-sized chunks could be fine; measuring the performance is always a good idea!
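
One way to use all cores and leave placement to the scheduler is to have workers pull fixed-size chunks from a shared counter, so faster cores simply end up taking more chunks. A minimal C++ sketch, using std::thread rather than raw pthreads for brevity; the name parallel_chunks and its parameters are made up for this example, and the right chunk size has to come from measurement.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs body(begin, end) over the index range [0, total) in chunks of
// chunk_size. Each worker repeatedly claims the next unclaimed chunk, so the
// split between fast and slow cores is decided at run time.
void parallel_chunks(std::size_t total, std::size_t chunk_size,
                     unsigned num_threads,
                     const std::function<void(std::size_t, std::size_t)>& body) {
    if (chunk_size == 0) chunk_size = 1;
    if (num_threads == 0) num_threads = 1;
    std::atomic<std::size_t> next(0);

    auto worker = [&]() {
        for (;;) {
            const std::size_t begin = next.fetch_add(chunk_size);
            if (begin >= total) break;
            body(begin, std::min(begin + chunk_size, total));
        }
    };

    std::vector<std::thread> threads;
    for (unsigned i = 0; i + 1 < num_threads; ++i) threads.emplace_back(worker);
    worker();  // the calling thread takes part as well
    for (std::thread& t : threads) t.join();
}
```

The body must be safe to call concurrently on disjoint ranges. std::thread::hardware_concurrency() is a reasonable starting point for num_threads.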

John Dallman

Feb 15, 2024, 7:09:40 AM
to andro...@googlegroups.com
> we've seen cpu_capacity varying on this device when overheating. (while it should not, 
> should it? it's DMips/MHz or something like that, a measure of max CPU performance)

It seems to be a relative measure across the different cores in one system. The hardware I've looked at is:
  1. An Ampere Altra Linux server, where all the cores are identical, and all have a capacity figure of 1024.
  2. A Snapdragon 8 Gen 1 Mobile HDK from Qualcomm, where the fastest core has a capacity of 1024, the medium cores have 805, and the small ones 261.
It is not plausible that the cores in the Ampere box have exactly the same performance as the fastest core in the MHDK: the Ampere box is much faster, judging by how long our testing takes to run.

I have not overheated any of these to see if or how cpu_capacity varies. 

John


enh

Feb 15, 2024, 8:44:12 PM
to andro...@googlegroups.com
sadly, i suspect _that's_ the point where my test would get kicked out
of CTS :-(

(...or be completely ineffective because the fact the device has been
running CTS for the last 5 hours means it's already hot!)

> foreach core query cpu_capacity again, fail if changed"

yu...@unity3d.com

Feb 16, 2024, 3:45:27 AM
to android-ndk