AOSP build time vs CPU core count (up to 56)

Christian Gagneraud

Jun 1, 2022, 4:22:01 PM
to Android Building
Hi all,

We're looking into reducing our build time.
The first benchmark we did was to measure build time of aosp_arm64 versus number of parallel jobs.
We have a machine with 28 cores (56 threads), so I ran a script to build AOSP with 4 to 56 jobs, in increments of 4.
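
A sweep like this could be scripted roughly as follows. This is only a minimal sketch, not the script that was actually used: the `aosp_arm64-userdebug` lunch target is borrowed from later in this thread, and the helper names, the `m clobber` before each run, and the assumption that it is started from the top of an AOSP checkout are all illustrative.

```python
import subprocess
import time

def job_counts(start=4, stop=56, step=4):
    """Job counts to benchmark: 4, 8, ..., 56."""
    return list(range(start, stop + 1, step))

def build_command(jobs, target="aosp_arm64-userdebug"):
    """Shell command line for one build with the given -j value."""
    # lunch and m are shell functions defined by envsetup.sh, so the
    # whole sequence has to run inside a single bash invocation.
    return f"source build/envsetup.sh && lunch {target} && m -j{jobs}"

def clean_command(target="aosp_arm64-userdebug"):
    """Shell command line that wipes the output directory (assumed step)."""
    return f"source build/envsetup.sh && lunch {target} && m clobber"

def time_build(jobs):
    """Clean, then build once, returning the build's wall-clock seconds."""
    subprocess.run(["bash", "-c", clean_command()], check=True)
    start = time.perf_counter()
    # The timed span also includes envsetup.sh and lunch, which take seconds.
    subprocess.run(["bash", "-c", build_command(jobs)], check=True)
    return time.perf_counter() - start
```

Calling `time_build(jobs)` for each value returned by `job_counts()` and logging the result would give one data point per `-j` setting.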

The results surprised me. In a nutshell, with ccache enabled and a 100% hit rate:
4 cores: 1h
8 cores: 42m
12 cores: 35m
16+ cores: 32m

Throwing more than 16 cores at the build doesn't reduce build time any further; the curve is flat!
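
To put numbers on that flattening, the timings above can be turned into a speedup and a parallel efficiency relative to the 4-job run. This is a small sketch for illustration; the function name and the choice of 4 jobs as the baseline are assumptions, and the "16+" row is treated as exactly 16 jobs.

```python
# Build times reported above (ccache enabled), in minutes, keyed by job count.
build_minutes = {4: 60, 8: 42, 12: 35, 16: 32}

def relative_scaling(times, baseline=4):
    """Return {jobs: (speedup, efficiency)} relative to the baseline job count."""
    base_time = times[baseline]
    result = {}
    for jobs, minutes in sorted(times.items()):
        speedup = base_time / minutes
        # Ideal speedup would be jobs / baseline; efficiency is the share achieved.
        efficiency = speedup / (jobs / baseline)
        result[jobs] = (speedup, efficiency)
    return result

for jobs, (speedup, efficiency) in relative_scaling(build_minutes).items():
    print(f"{jobs:>2} jobs: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

By this measure, quadrupling the job count from 4 to 16 yields less than a 2x speedup.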

I just can't explain what is going on here. Does anyone know why the build process doesn't scale?

I'm currently running the same benchmark with ccache disabled and will post an update once I have the data.

Does anyone know how to disable some of the build tasks? For example, I've noticed that there is a lot of Java source processing, and I wonder whether all of it is needed. I'm pretty sure that some documentation is generated, which we don't need in our case.

Also, any tips or feedback on reducing build times would be more than welcome.

Thanks,
Chris

Dan Willemsen

Jun 6, 2022, 3:44:11 PM
to android-...@googlegroups.com
This is probably one of two things:

1. Disk speed. Once you saturate your CPUs (and have the associated memory required; somewhere around 2GB/core is usually decent), the next thing you'll often hit is disk bandwidth. This is where ccache tends to hurt more than it helps, unless the ccache is on a completely separate (and also fast) disk: full cache hits may get better, but writing the cache ends up costing 2x the disk bandwidth. Remote caching/execution plus Bazel's "Build without the Bytes" concept is a better caching solution for this case, but that's not available for Android at this point.

2. That -j isn't a great limiter. It worked "okay" when every build action was single-threaded, but more and more tools are multithreaded these days. So if we have 64 CPU threads and use -j64, and you get really unlucky with a tool that spawns a process thread for every CPU, we could end up with 4096 (64*64) process threads running, well over the 64 specified by `-j`. In a more limited case, this may mean that on average you're still using a good portion of your CPU, even with smaller -j values.
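
The two rules of thumb above reduce to simple arithmetic, sketched here for illustration. The function names are hypothetical, and whether "core" in the 2GB/core figure means a physical core or a hardware thread is not stated, so both readings are shown for the 28-core (56-thread) machine from the original post.

```python
def worst_case_threads(jobs, threads_per_action):
    """Upper bound on running threads if every action spawns this many threads."""
    return jobs * threads_per_action

def suggested_memory_gb(cores, gb_per_core=2):
    """RAM suggested by the roughly-2GB-per-core rule of thumb."""
    return cores * gb_per_core

# The unlucky case from point 2: -j64 with tools that spawn 64 threads each.
print(worst_case_threads(64, 64))

# Point 1 applied to the 28-core / 56-thread machine, under both readings.
print(suggested_memory_gb(28), suggested_memory_gb(56))
```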

Does anyone know how to disable some of the build tasks? For example, I've noticed that there is a lot of Java source processing, and I wonder whether all of it is needed. I'm pretty sure that some documentation is generated, which we don't need in our case.
 
You're probably seeing metalava/etc. While that is used to generate documentation, it is also used to generate the API stubs used later in the build, so it can't just be turned off.

If you haven't seen it yet, Soong's Performance doc is a good overview on how some performance issues can be diagnosed.

Hopefully you're doing many more incremental builds than full builds, but I know that gets really complicated with some use cases.

---

For reference, the newer desktops at Google end up being either 2x 18-core Intel Xeon Gold 6154 (so 72 threads) or the AMD Ryzen Threadripper PRO 3995WX (128 threads), with at least NVMe storage, often extended with additional SSDs. We also run builds on a variety of GCE-based machines, from 32 to 128 vCPUs, and the difference there between pd-standard and pd-ssd can be substantial.

On my Intel desktop (with NVMe), without ccache, I'm seeing about a 10% improvement from -j36 to -j74, so a little better than your comparison, but not a ton:

`lunch aosp_arm64-userdebug; m` (effectively -j74) takes 32m46s
`lunch aosp_arm64-userdebug; m -j54` takes 34m00s
`lunch aosp_arm64-userdebug; m -j36` takes 36m19s

With RBE (Google's implementation of https://bazel.build/community/remote-execution-services):

`lunch aosp_arm64-userdebug; USE_RBE=true m` takes 18m46s
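
For comparison, the four timings above can be expressed as a percentage reduction against the -j36 run. This is an illustrative sketch; the helper names and the dictionary labels are made up for the example, and only the durations come from the measurements above.

```python
import re

def to_seconds(duration):
    """Convert a duration such as '32m46s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds

def percent_faster(slow, fast):
    """Percentage reduction in wall-clock time going from slow to fast."""
    return 100 * (to_seconds(slow) - to_seconds(fast)) / to_seconds(slow)

# Timings quoted above, keyed by the command that produced them.
timings = {
    "m -j36": "36m19s",
    "m -j54": "34m00s",
    "m (effectively -j74)": "32m46s",
    "USE_RBE=true m": "18m46s",
}

baseline = timings["m -j36"]
for command, duration in timings.items():
    print(f"{command}: {percent_faster(baseline, duration):.1f}% less time than -j36")
```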

- Dan


Christian Gagneraud

Jun 6, 2022, 3:46:12 PM
to Android Building
Here are my results with ccache disabled. These benchmarks were run on the Android 11 branch.

4 cores: 4h15m
8 cores: 2h20m
12 cores: 1h40m
16 cores: 1h25m
20+ cores: from 1h15m down to 55m

Again, the curve flattens as the number of cores increases.
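
One rough way to quantify the flattening is to fit Amdahl's law, T(n) = T1 * (s + (1 - s) / n), to two of the ccache-disabled data points and solve for the serial fraction s. This is a sketch under a strong assumption: Amdahl's law is brought in here purely as a simple model, it treats the job count as the number of processors, and it ignores disk and memory limits such as those raised in the reply above.

```python
def amdahl_serial_fraction(n_a, t_a, n_b, t_b):
    """Serial fraction s from two (jobs, time) points, assuming
    T(n) = T1 * (s + (1 - s) / n), i.e. Amdahl's law."""
    r = t_a / t_b
    return (r / n_b - 1 / n_a) / ((1 - 1 / n_a) - r * (1 - 1 / n_b))

def predicted_minutes(n, n_ref, t_ref, s):
    """Predicted build time at n jobs, scaled from a reference measurement."""
    t1 = t_ref / (s + (1 - s) / n_ref)  # implied single-job time
    return t1 * (s + (1 - s) / n)

# ccache-disabled timings from above: 4 jobs -> 4h15m, 16 jobs -> 1h25m.
s = amdahl_serial_fraction(4, 255, 16, 85)
print(f"estimated serial fraction: {s:.1%}")
print(f"predicted time at 56 jobs: {predicted_minutes(56, 4, 255, s):.0f} min")
```

A two-point fit like this is crude, so the result is best read as an order-of-magnitude indication of how much of the build does not parallelise, not as a prediction.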

I am now exploring RBE (remote build execution). I found some information on the web (and joined the magic group to access the docs), and I have enabled the API on gcloud.
I haven't tried a build yet, because I'm having a hard time understanding how to set all of this up.
Ultimately, we're interested in reducing the build time in our CI.
Our builds run on AWS, but we'll consider moving to gcloud if it's worth it.

In another thread, I found this table:
[Attached image: Screenshot_20220605_104826.png]
The local build timings in it don't really match my benchmarks, which is odd.

Does anyone know more about what a typical CI setup should look like?
E.g., how many cores are needed for the machine that runs ninja, and how should the worker pool be sized (number of workers and worker machine spec)?

Looking at logs from ci.android.com, I can see NINJA_REMOTE_NUM_JOBS="500".
I wonder whether the worker pool can be auto-scaled.

Thanks,
Chris


Christian Gagneraud

Jun 7, 2022, 2:52:55 PM
to Android Building
Hi Dan,

Thanks a lot for your detailed answers; they are really helpful.
The machine I used for benchmarking build time does indeed have a less-than-ideal mass storage configuration.
I will do more benchmarking on another machine that has NVMe.

I have also started to look into RBE (see my other email).

Thanks again,
Chris