[ANNOUNCE] BuildBuddy RBE experiment - Data review

Álvaro Vilaplana García

Feb 27, 2024, 8:52:49 AM
to Repo and Gerrit Discussion
Dear community, 

On December 29th, we announced the start of our experimentation phase with BuildBuddy [1], a new RBE provider, to explore alternatives to our current RBE provider, GCP.

Today, I am thrilled to share with you the results of our efforts: the data collected throughout this period.

I would like to highlight two important points:

- Differences between the two approaches: BuildBuddy RBE self-hosted [2] and Google Cloud Remote Build Execution (RBE) both optimize software builds but differ in deployment and management. BuildBuddy RBE self-hosted offers control and customization and can be deployed on your own servers or in the cloud, whereas Google Cloud RBE is a service fully managed by Google. BuildBuddy integrates with existing workflows, providing flexibility, while Google Cloud RBE integrates seamlessly with other GCP services. Finally, BuildBuddy RBE self-hosted requires the operator to handle maintenance and scaling, whereas Google Cloud RBE offloads these tasks to Google; each option suits different organizational needs.

- For this experiment, we've set up an additional Jenkins server (not publicly accessible) dedicated to executing RBE builds with BuildBuddy, operating alongside the current Jenkins server (https://gerrit-ci.gerritforge.com/).

Data collection

Before analysing the data, it is important to explain how it was collected. To obtain the build times, we used two APIs:

- The Query Changes API [3], to list the changes.

- The checks API [4], to list the checks for a specific change and revision number:
https://gerrit-review.googlesource.com/changes/{change_number}/revisions/{revision_number}/checks
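For readers who want to reproduce the data collection, the two calls can be sketched in Python as below. This is a minimal sketch, not the script used for this report: the `n` (limit) and `S` (skip) parameters are standard Query Changes options, while the helper names are hypothetical. Gerrit prefixes its JSON responses with `)]}'`, which has to be stripped before decoding.

```python
import json
import urllib.parse
import urllib.request

GERRIT_URL = "https://gerrit-review.googlesource.com"
# Gerrit prefixes JSON responses with this line as XSSI protection.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body: str):
    """Strip Gerrit's XSSI-protection prefix and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def query_changes_url(query: str, limit: int = 100, start: int = 0) -> str:
    """Query Changes URL: `n` caps the result size, `S` skips `start` changes."""
    params = urllib.parse.urlencode({"q": query, "n": limit, "S": start})
    return f"{GERRIT_URL}/changes/?{params}"


def checks_url(change_number: int, revision_number: int) -> str:
    """Checks URL for one build, i.e. one (change number, revision number)."""
    return f"{GERRIT_URL}/changes/{change_number}/revisions/{revision_number}/checks"


def fetch_json(url: str):
    """GET a Gerrit REST endpoint (anonymously) and decode the response."""
    with urllib.request.urlopen(url) as response:
        return parse_gerrit_json(response.read().decode("utf-8"))
```

For example, `fetch_json(query_changes_url("project:gerrit branch:master after:2023-12-28"))` would list master changes since the start of Phase 1; that query string is a hypothetical illustration, and depending on the server configuration the checks endpoint may require authentication.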

Notes:
- A build is uniquely identified by the tuple (change number, revision number).
- All the graphs show builds in chronological order.
- The build numbers are not shown in the graphs, for readability.
- Builds labelled "RUNNING", or whose state is not specified by the API, have been excluded from the calculations.
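The notes above can be sketched as a small filtering step. The row fields reuse the column names of the outlier table in Phase 1 (`change_number`, `revision_number`, `rbe_gcp_started`, `rbe_gcp_state`, `rbe_bb_state`); treating only "SUCCESSFUL" and "FAILED" as final states is an assumption about how the exclusion was applied.

```python
from typing import Dict, Iterable, Tuple

# Assumption: only these states are final; "RUNNING", any other state, or a
# missing state means the build is excluded from the calculations.
FINAL_STATES = {"SUCCESSFUL", "FAILED"}

# A build is identified by the tuple (change number, revision number).
BuildKey = Tuple[int, int]


def build_key(row: dict) -> BuildKey:
    return (row["change_number"], row["revision_number"])


def is_complete(row: dict) -> bool:
    """True when both the GCP and the BuildBuddy check reached a final state."""
    return (
        row.get("rbe_gcp_state") in FINAL_STATES
        and row.get("rbe_bb_state") in FINAL_STATES
    )


def index_builds(rows: Iterable[dict]) -> Dict[BuildKey, dict]:
    """Keep complete builds only, in chronological order, keyed by build."""
    complete = [r for r in rows if is_complete(r)]
    # "YYYY-MM-DD HH:MM:SS" timestamps sort correctly as plain strings.
    complete.sort(key=lambda r: r["rbe_gcp_started"])
    # Dicts preserve insertion order, so iteration stays chronological.
    return {build_key(r): r for r in complete}
```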

Key Performance Indicators

- Average Build Time: Calculate the average build time for each platform (GCP RBE and BuildBuddy RBE) to understand the typical time it takes to complete a build on each platform.

- Percentage of Builds Faster: Determine the percentage of builds that are completed faster on BuildBuddy RBE compared to GCP RBE. This helps assess which platform is more efficient in terms of build time.

- Overall Success Rate / Failure Rate: Calculate the overall success and failure rates of builds on BuildBuddy RBE. This considers both successful and failed builds to provide a comprehensive view of platform reliability.

- Outliers (>60 minutes): Identify the percentage of BuildBuddy RBE builds that exceed a threshold, here 60 minutes. This helps pinpoint builds that take exceptionally long and may require investigation or optimization.

- Average Build Time Reduction: Determine the average reduction in build time when using BuildBuddy RBE compared to GCP RBE. This quantifies the efficiency improvement gained by using the BuildBuddy platform.
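The KPIs above can be expressed as a short Python sketch over per-build rows. The function and key names are hypothetical, and the denominators are inferred from the published figures (faster builds over builds successful on both platforms, as in 376 out of 390; outliers over all builds, as in 6 out of 489), so treat them as assumptions.

```python
from statistics import mean
from typing import Dict, List

OUTLIER_THRESHOLD_MINUTES = 60.0


def both_successful(rows: List[dict]) -> List[dict]:
    """Builds that succeeded on GCP RBE and on BuildBuddy RBE."""
    return [
        r for r in rows
        if r["rbe_gcp_state"] == "SUCCESSFUL" and r["rbe_bb_state"] == "SUCCESSFUL"
    ]


def compute_kpis(rows: List[dict]) -> Dict[str, float]:
    ok = both_successful(rows)
    gcp_avg = mean(r["rbe_gcp_time_minutes"] for r in ok)
    bb_avg = mean(r["rbe_bb_time_minutes"] for r in ok)
    bb_faster = sum(1 for r in ok if r["rbe_bb_time_minutes"] < r["rbe_gcp_time_minutes"])
    bb_successful = sum(1 for r in rows if r["rbe_bb_state"] == "SUCCESSFUL")
    # Outliers are successful BuildBuddy builds slower than the threshold.
    outliers = sum(1 for r in ok if r["rbe_bb_time_minutes"] > OUTLIER_THRESHOLD_MINUTES)
    total = len(rows)
    return {
        "gcp_average_minutes": gcp_avg,
        "bb_average_minutes": bb_avg,
        "average_reduction_minutes": gcp_avg - bb_avg,
        "percent_bb_faster": 100.0 * bb_faster / len(ok),
        "bb_success_rate": 100.0 * bb_successful / total,
        "bb_failure_rate": 100.0 * (total - bb_successful) / total,
        "percent_outliers": 100.0 * outliers / total,
    }
```

Because both averages are taken over the same set of builds, the difference of the averages equals the average of the per-build reductions, which is consistent with the reported 18.69 - 10.2 = 8.49 minutes.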

Phases

The experiment has been divided into two phases:

Phase 1: From December 28th, 2023, to February 9th, 2024, during which BuildBuddy RBE ran against the Gerrit master branch.
Phase 2: From February 10th, 2024, to the present day, with BuildBuddy RBE running against the Gerrit master, stable-3.7, stable-3.8, and stable-3.9 branches.


Phase 1
To make the data easier to read, I have split it into two graphs:
- Figure 1: RBE Successful Build time for Gerrit master from 28th December 2023 to 18th January 2024
- Figure 2: RBE Successful Build time for Gerrit master from 19th January 2024 to 9th February 2024

Total number of builds
           | master | stable-3.7 | stable-3.8 | stable-3.9
GCP Builds | 489    | ?          | ?          | ?
BB Builds  | 489    | NA         | NA         | NA


Build status
               | BB Successful | BB Failed
GCP Successful | 390           | 17
GCP Failed     | 0             | 82

It's worth noting that 3.47% of BuildBuddy builds failed while the corresponding GCP build succeeded; the cause was a misconfiguration of the number of executor containers.
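A status cross-tabulation like the one above (and the per-branch tables in Phase 2) can be produced by counting state combinations. This is a minimal sketch, assuming hypothetical rows keyed by the column names of the outlier table (`branch`, `rbe_gcp_state`, `rbe_bb_state`):

```python
from collections import Counter
from typing import Iterable, Optional


def build_status_table(rows: Iterable[dict]) -> Counter:
    """Count builds per (branch, GCP state, BuildBuddy state) combination."""
    return Counter(
        (r["branch"], r["rbe_gcp_state"], r["rbe_bb_state"]) for r in rows
    )


def cell(table: Counter, gcp_state: str, bb_state: str,
         branch: Optional[str] = None) -> int:
    """Read one cell of the cross-tab, optionally restricted to one branch."""
    return sum(
        count
        for (row_branch, row_gcp, row_bb), count in table.items()
        if row_gcp == gcp_state
        and row_bb == bb_state
        and (branch is None or row_branch == branch)
    )
```

As a sanity check, the four Phase 1 cells add up to 390 + 17 + 0 + 82 = 489, matching the total number of master builds.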

Average build time when GCP and BB Successful
            | Minutes
GCP Average | 18.69
BB Average  | 10.2

The average build time reduction is 8.49 minutes, and 96.4% (376 out of 390 builds) of BB builds are faster than GCP builds.

We found 6 outlier BB successful builds (1.2%), which occurred because the Jenkins server required a restart, causing temporary disruptions.
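The Phase 1 figures can be cross-checked with simple arithmetic; the inputs below are copied from the tables and text in this section:

```python
# Phase 1 inputs, copied from the tables and text in this section.
GCP_AVERAGE_MINUTES = 18.69  # builds successful on both platforms
BB_AVERAGE_MINUTES = 10.2
BOTH_SUCCESSFUL = 390        # "GCP Successful / BB Successful" cell
BB_FASTER = 376              # BB builds faster than their GCP counterpart
TOTAL_BUILDS = 489           # all master builds in Phase 1
BB_OUTLIERS = 6              # successful BB builds above 60 minutes

# Average reduction: difference of the two averages, in minutes.
reduction = round(GCP_AVERAGE_MINUTES - BB_AVERAGE_MINUTES, 2)
# Share of builds successful on both platforms where BB was faster.
percent_faster = round(100 * BB_FASTER / BOTH_SUCCESSFUL, 1)
# Outliers as a share of all Phase 1 builds.
percent_outliers = round(100 * BB_OUTLIERS / TOTAL_BUILDS, 2)

print(reduction, percent_faster, percent_outliers)  # prints: 8.49 96.4 1.23
```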

change_number | revision_number | total_revisions | branch | rbe_gcp_started     | rbe_gcp_state | rbe_gcp_time_minutes | rbe_bb_started      | rbe_bb_state | rbe_bb_time_minutes
400398        | 1               | 2               | master | 2024-01-03 07:05:28 | SUCCESSFUL    | 6.7                  | 2024-01-03 07:05:28 | SUCCESSFUL   | 868.68
399657        | 11              | 36              | master | 2024-01-05 02:25:28 | SUCCESSFUL    | 13.7                 | 2024-01-05 02:25:28 | SUCCESSFUL   | 1293.55
399657        | 14              | 36              | master | 2024-01-05 10:55:29 | SUCCESSFUL    | 21.45                | 2024-01-05 21:41:32 | SUCCESSFUL   | 137.47
400958        | 2               | 2               | master | 2024-01-05 13:30:29 | SUCCESSFUL    | 14.52                | 2024-01-05 21:41:48 | SUCCESSFUL   | 154.3
247812        | 7               | 7               | master | 2024-01-17 16:15:30 | SUCCESSFUL    | 26.62                | 2024-01-17 16:15:29 | SUCCESSFUL   | 67.17
406597        | 1               | 2               | master | 2024-02-02 20:05:14 | SUCCESSFUL    | 14.18                | 2024-02-02 20:05:26 | SUCCESSFUL   | 79.55


Average time when GCP and BB Failed
            | Minutes
GCP Average | 17.68
BB Average  | 23.29

Conclusions:
Assessing performance and stability, the results are promising: the BuildBuddy platform showed superior performance, as highlighted in the table "Average build time when GCP and BB Successful". Additionally, the BuildBuddy failures on builds that succeeded on GCP have been addressed, as they stemmed primarily from configuration problems that have since been resolved. Outliers represent only 1.23% of builds, so their significance is negligible. However, despite these favourable outcomes, caution is warranted because GCP ran a higher volume of builds than BuildBuddy, as GCP also operated across the stable branches.

Phase 2
To make the data easier to read, I have split it into four graphs:
- Figure 3: RBE Successful Build time for Gerrit master
- Figure 4: RBE Successful Build time for Gerrit stable-3.9
- Figure 5: RBE Successful Build time for Gerrit stable-3.8
- Figure 6: RBE Successful Build time for Gerrit stable-3.7

Successful BB Build status / Successful GCP Build status
       | master | stable-3.9 | stable-3.8 | stable-3.7 | Total
Builds | 119    | 26         | 6          | 11         | 162


Average time when GCP and BB Successful
            | Minutes
GCP Average | 13.91
BB Average  | 8.45

The average build time reduction is 5.46 minutes, and 90.74% (147 out of 162 builds) of BB builds are faster than GCP builds.

Failed BB Build status / Failed GCP Build status
masterstable-3.9stable-3.8stable-3.7Total
Builds30121144


Failed BB Build status / Successful GCP Build status
       | master | stable-3.9 | stable-3.8 | stable-3.7 | Total
Builds | 1      | 2          | 0          | 0          | 3

It's worth noting that 1.85% of BuildBuddy builds failed while the corresponding GCP build succeeded.

Average time when GCP and BB Failed
            | Minutes
GCP Average | 10.96
BB Average  | 9.43

Conclusions:
The findings indicate that BuildBuddy delivers more consistent performance, owing to its dedicated on-premises resources, as shown in the table "Average time when GCP and BB Successful", with comparable build volumes on both platforms. Stability also remains highly consistent, as shown by the table "Failed BB Build status / Successful GCP Build status" and by the absence of outliers.

References:

______________________________

Álvaro Vilaplana García
Figure_1_RBE_Successful_build_time.png
Figure_2_RBE_Successful_build_time.png
Figure_4_RBE_Successful_build_time.png
Figure_5_RBE_Successful_build_time.png
Figure_3_RBE_Successful_build_time.png
Figure_6_RBE_Successful_build_time.png