build profile 'gaps'

105 views
Skip to first unread message

lakiere

unread,
Feb 11, 2019, 10:38:57 AM2/11/19
to bazel-discuss
Dear all

I'm currently working on an existing project to migrate a GNU make system to a bazel one. I've been analyzing a build profile from a module we migrated so far with the experimental profiling tool provided by bazel, and compared it to a full 'drake' build (a software package which i'm comparing our builds with). Both builds are done using remote cache on the same remote caching server.

When comparing these two profiles, it is noticeable that our builds show a lot of 'gaps' in between every action where i can't see what bazel is doing. The gaps in our build vary from 100-200ms, while the gaps for the drake build vary from 10-20ms which is significantly faster due to the amount of total actions in the full build.
We also noticed that our 'computeMerkleDigests' take significantly longer compared to the ones from the drake build (~300ms compared to ~5ms), we believe this is due the way we include our headers since its a little bit of a mess for now.

Has anyone else had a similar experience or an idea on what might be the cause of the issue?

The repo for drake can be found here

Kind regards
Henning Lakiere


Screenshot from 2019-02-11 16-27-18.png
Screenshot from 2019-02-11 16-28-08.png

Lukács T. Berki

unread,
Feb 11, 2019, 10:42:06 AM2/11/19
to lakiere, bazel-discuss, Tobias Werth
--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/07141627-b628-40c5-acee-7a3df44b86ab%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Lukács T. Berki | Software Engineer | lbe...@google.com | 

Google Germany GmbH | Erika-Mann-Str. 33  | 80636 München | Germany | Geschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891

Tobias Werth

unread,
Feb 11, 2019, 11:11:49 AM2/11/19
to Lukács T. Berki, lakiere, bazel-discuss
Hi Henning,

we exclude certain task types from the JSON profile by default which might explain the gaps.
You currently have to change Bazel's source to change what we include in the profile:
(I have an open PR to add an option to configure this from the command line, but it needs more discussion.)

Jakob also noticed the slow computeMerkleDigests computations and started rewriting the merkle tree construction:
This implementation should be significantly faster.

Tobi

  

lakiere

unread,
Feb 14, 2019, 4:32:52 AM2/14/19
to bazel-discuss
Hi Tobias

I've followed your information and the results were really promising. The digests took 5 times less execution time which cuts around 6-7s (20%) of our build time which is great.

The profiling related change I was hoping to get more information after the changes i made:

       'src/main/java/com/google/devtools/build/lib/runtime/commands/ProfileCommand.java'
boolean isProfiling(ProfilerTask type) {
-        return !type.isVfs()
+        return !type.isVfs();
             
// CRITICAL_PATH corresponds to writing the file.
-            && type != ProfilerTask.CRITICAL_PATH
-            && type != ProfilerTask.SKYFUNCTION
-            && type != ProfilerTask.ACTION_EXECUTE
-            && type != ProfilerTask.ACTION_COMPLETE
-            && !type.isStarlark();
       
}
     
},

All this shows me are all ACTION_EXECUTION blocks and everything i already saw without the change, and i assume compiling is just a subpart of the ACTION_EXECUTION. Its something, but not really what i was looking for since all spaces in between are still undeclared on what is happening there and noticeably take longer compared to other builds.
There are also ACTION_EXECUTION blocks where nothing seems to be happening.

If im missing something, or if the changes i made are wrong please let me know.
On the merkletree part, good job!

Kind regards,
Henning
Screenshot from 2019-02-14 10-24-58.png

Tobias Werth

unread,
Feb 14, 2019, 5:26:41 AM2/14/19
to lakiere, bazel-discuss
Is there an easy way to reproduce your build?

lakiere

unread,
Feb 14, 2019, 5:39:48 AM2/14/19
to bazel-discuss
Unfortenately not as this is a closed source project.

I've been looking to build other projects with bazel to find any with similar problems but i havent found one yet.

lakiere

unread,
Feb 15, 2019, 9:19:26 AM2/15/19
to bazel-discuss
Eureka

I was able to reproduce the issue with a simple 30 hello worlds within our repository, building it the same way we build our own module.

Building the hello worlds outside the repo would take about ~1 second without extraction of the bazel binary and starting up blaze.
Building it within the repository just with regular bazel build has a similar result and the same for building within the docker that is being used to build our own module.

For reasons, we try to have as little bazel in the front end as possible so we have a 'make' wrapper around bazel itself. Using that to build the hello world package takes a
whopping 44 seconds. Compiling a single hello world source file takes up to 6 seconds. So i'll look into how the wrapped is calling bazel and where the problem lays.

Furthermore the profile doesnt show any major gaps when being build without a remote cache, everything is nice and lined up just like when building with drake.
It is when a remote cache is being used that there is unspecified time during the ACTION_EXECUTION phases. Although the drake build didnt seem that affected by the remote cache
i might have missed a detail when building it so i will look into that again.

With some more building and thinking, i concluded that it is probably the combination of using a docker and using remote cache what causes the gaps.

I will keep this thread updated if i find out more.

Kind regards,
Henning
Reply all
Reply to author
Forward
0 new messages