Optimizing the initial repo sync


Marc Herbert

May 26, 2016, 8:39:44 PM
to Chromium OS dev
Hi,

Tentative short story:

Could repo be made smart enough not to download the same (large) git bundle several times in a row?


Long story:

I performed various repo --trace sync experiments in order to locate "hotspots" in the initial, full repo checkout and see if it could be made faster. Think a fully scripted build from scratch.

First I naively tried repo sync --current-branch --no-tags. This made me realize that, for the very first sync, repo sync uses git bundles in many places.

So next I tried repo sync --no-clone-bundle --current-branch --no-tags, which saved a bit of time but not much. It looks like the git bundle optimizations (whatever they are) are pretty good at offsetting the cost of including all tags and branches.

Bundles or not, and unless you're building the browser (who does that in os-dev? :-), the third_party/kernel/vX.Y checkouts take the lion's share. Not a surprise.

What did come as a surprise is repo downloading the same 1.3G https://chromium.googlesource.com/chromiumos/third_party/kernel/clone.bundle FIVE TIMES, once per kernel. More disappointing: repo does not even perform these five downloads in parallel but *consecutively*. I guess that's because they all eventually share the same .repo/project-objects/chromiumos/third_party/kernel.git backend and repo knows that from the manifest. The shared backend is a great optimization for disk space and for incremental repo sync, but it seems to hurt the initial repo sync fairly badly.
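
To see which projects are affected, one could group the manifest's project entries by name: every name that appears under more than one path shares a single backend and is a candidate for a repeated bundle download. The following is only a minimal sketch. It assumes a flat manifest with <project name="..." path="..."/> elements, and the sample manifest, paths and revisions are made up for illustration rather than taken from the real Chromium OS manifest.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Hypothetical manifest excerpt; the paths and revisions are illustrative only.
SAMPLE_MANIFEST = """\
<manifest>
  <project name="chromiumos/third_party/kernel"
           path="src/third_party/kernel/v3.18" revision="branch-a"/>
  <project name="chromiumos/third_party/kernel"
           path="src/third_party/kernel/v4.4" revision="branch-b"/>
  <project name="chromiumos/platform/example" path="src/platform/example"/>
</manifest>
"""


def shared_projects(manifest_xml: str) -> Dict[str, List[str]]:
    """Map each project name that is checked out more than once to its paths."""
    root = ET.fromstring(manifest_xml)
    paths_by_name = defaultdict(list)
    for project in root.iter("project"):
        name = project.get("name")
        # Assumption: a project without an explicit path is checked out
        # at a path equal to its name.
        paths_by_name[name].append(project.get("path", name))
    return {n: p for n, p in paths_by_name.items() if len(p) > 1}


if __name__ == "__main__":
    for name, paths in shared_projects(SAMPLE_MANIFEST).items():
        print(f"{name}: {len(paths)} checkouts -> {', '.join(paths)}")
```

A real manifest may be split across included files, so this would need to run on the fully resolved manifest to be meaningful.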

One workaround could be to hack the manifest before the repo sync and remove all but the desired kernel. Temporary and not very pretty.
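
A rough sketch of that manifest hack, for illustration only: it drops every checkout of a given project except the one at the desired path. It assumes the project entries are direct children of the <manifest> element; the function name keep_one_checkout, the sample paths and the revisions are hypothetical and not part of repo.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest excerpt; the paths and revisions are illustrative only.
EXAMPLE_MANIFEST = """\
<manifest>
  <project name="chromiumos/third_party/kernel"
           path="src/third_party/kernel/v3.18" revision="branch-a"/>
  <project name="chromiumos/third_party/kernel"
           path="src/third_party/kernel/v4.4" revision="branch-b"/>
  <project name="chromiumos/platform/example" path="src/platform/example"/>
</manifest>
"""


def keep_one_checkout(manifest_xml: str, project_name: str, keep_path: str) -> str:
    """Return the manifest with all checkouts of project_name removed
    except the one whose path equals keep_path."""
    root = ET.fromstring(manifest_xml)
    # Copy the list first so that removing elements does not disturb iteration.
    for project in list(root.findall("project")):
        if project.get("name") == project_name and project.get("path") != keep_path:
            root.remove(project)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(keep_one_checkout(
        EXAMPLE_MANIFEST,
        "chromiumos/third_party/kernel",
        "src/third_party/kernel/v4.4",
    ))
```

The trimmed manifest would only be used for the initial sync; restoring the original afterwards lets a later repo sync add the remaining kernels from the already populated shared backend, although whether repo then skips the bundle download is untested here.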

Comments, ideas, thoughts, encouragements, pointers to source, documentation,...? Thanks!


Marc

Mike Frysinger

May 27, 2016, 12:26:14 AM
to Marc Herbert, Chromium OS dev
support for checking out the same repo to multiple locations with diff branches is something we added.  git bundles weren't really around at that time.  so we probably missed edge cases like this.  feel free to dive in :).


Marc Herbert

May 27, 2016, 1:37:38 AM
to Chromium OS dev, marc.h...@gmail.com
On Thursday, 26 May 2016 21:26:14 UTC-7, Mike Frysinger wrote:
> support for checking out the same repo to multiple locations with diff branches is something we added.  git bundles weren't really around at that time.  so we probably missed edge cases like this.

Thanks for the confirmation(s).

> feel free to dive in :).

Dive, not just dip? Saving ~5 Gigabytes is quite tempting but time is unfortunately not infinite...

https://chromium.googlesource.com/external/repo/

"Upstream first" or not for something like this? https://gerrit.googlesource.com/git-repo ?

Does anyone know whether Android ever uses the same "shared object" feature (https://gerrit.googlesource.com/git-repo/+/8d20116038ff78), or is it just Chromium?

Mike Frysinger

May 27, 2016, 11:43:18 AM
to Marc Herbert, Chromium OS dev
i wasn't sure if the shared repo CLs made it upstream.  i recall there being some friction.  but if it's all in upstream, then yes, sending CLs to the repo guys directly would be preferable.
-mike
