Qualcomm 2.7 -> 3.4+ Data Upgrade Status, Issues, and Plans


Martin Fick

Aug 18, 2021, 4:59:27 PM
to repo-d...@googlegroups.com
I am happy to report that we (Qualcomm) have made great progress in speeding
up upgrades from older Gerrit versions. More specifically, we have made enough
improvements that we can now perform an offline NoteDb migration of
our largest instance's latest dataset (3.6M+ changes, 16K+ projects, 4+TB) on
NFS in less than 3 hours using the latest tip of Gerrit v2.16! This is
particularly great since our previous attempt was on an older dataset (about
1/3 smaller) and on local disks (not NFS), and it took over 3 days!!!

------------- HIGHLIGHT -------------
For anyone considering upgrading from anything before 2.16, we STRONGLY
suggest trying the latest 2.16-stable tip to see whether upgrades and the
migration are much better for you now!

------------- PERFORMANCE OBJECTIVE -------------
As mentioned in the previous thread, our objective is to perform the entire
2.7 -> 3.4+ upgrade in under 4 hours. While many improvements are still needed
to get there, this is a huge leap towards that goal, and we believe there are
still many things we can potentially improve. For us, the current full upgrade
timeline for the following 3 phases looks like this:

* PHASE 1 - Upgrade from 2.7 to 2.16 (schema migrations)
-> ~3.5 hours

* PHASE 2 - NoteDB migration
-> ~2.5 hours + 1 hour to repack All-Users (we hope to inline this in the
migration soon)

* PHASE 3 - Upgrade from 2.16 to 3.4+ (schema migrations + indexing)
-> ~22 hours

That is still over a day, and not realistically fast enough yet even if we
relax our objective a bit. However, note that the previously longest phase is
now the shortest phase, so I am optimistic that there is still some
low-hanging fruit in the other phases.
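For reference, the phase timings above add up as follows. This is only the arithmetic behind "still over a day", counting the 1-hour All-Users repack as part of Phase 2; the dictionary keys are just labels.

```python
# Current wall-clock hours per step, as reported above.
current_hours = {
    "phase1_schema_2.7_to_2.16": 3.5,
    "phase2_notedb_migration": 2.5,
    "phase2_all_users_repack": 1.0,
    "phase3_schema_and_indexing_to_3.4": 22.0,
}

OBJECTIVE_HOURS = 4.0  # the stated goal for the full 2.7 -> 3.4+ upgrade

total_hours = sum(current_hours.values())
overshoot = total_hours / OBJECTIVE_HOURS

print(f"total: {total_hours} h ({total_hours / 24:.2f} days)")
print(f"that is {overshoot:.2f}x the {OBJECTIVE_HOURS} h objective")
```

The total comes to 29 hours, about 7.25 times the 4-hour objective.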

------------- FUTURE IMPROVEMENTS -------------
--- Workflow ---
We would like to reduce the number of user steps required to perform such a
long upgrade, and we will likely be submitting changes soon that add
switches to make things simpler. For example:

-An "init" switch to make draft changes become private changes instead of the
WIP default (just completed this)

-An "init" switch to control the thread counts for indexing since it seems to
default to 1 currently

--- Performance ---
To meet our 4-hour objective, we would like to have something like:

* 0.5 hrs -> PHASE 1 - Upgrade from 2.7 to 2.16 (schema migrations)
* 2.0 hrs -> PHASE 2 - NoteDB migration
* 1.5 hrs -> PHASE 3 - Upgrade from 2.16 to 3.4+ (schema migrations +
indexing)
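Dividing each current phase time by its target shows where most of the speedup has to come from. This is plain arithmetic on the numbers in this message, again counting the 1-hour All-Users repack as part of Phase 2.

```python
# Hours per phase: current numbers vs. the 4-hour budget above.
current = {"phase1": 3.5, "phase2": 2.5 + 1.0, "phase3": 22.0}
target = {"phase1": 0.5, "phase2": 2.0, "phase3": 1.5}

budget = sum(target.values())  # should match the 4-hour objective

# Required speedup factor for each phase to hit its target.
speedups = {phase: current[phase] / target[phase] for phase in current}

print(f"budget: {budget} h")
for phase, factor in speedups.items():
    print(f"{phase}: {current[phase]} h -> {target[phase]} h, needs ~{factor:.1f}x")
```

Phase 3 needs roughly a 15x speedup and Phase 1 about 7x, while Phase 2 needs less than 2x.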

To get there, we will try:

* PHASE 1 - Upgrade from 2.7 to 2.16 (schema migrations)

-making schema 139 create an initial empty commit so that the user ref
does not get rewritten in schema 146
-not creating an empty commit in schema 146 if it is not going to be used
-continuing to optimize All-Users.git inefficiencies
-parallelizing some of the longer schema upgrades

* PHASE 2 - NoteDB migration

-adding repacking of All-Users to the NoteDb migration after a certain number
of draft changes have been written to the repo (currently repos are only
repacked after many changes have been written to them, so All-Users is
generally unlikely to get repacked)

-ordering the "migration" phase slices the same way that the rebuild phase
slices are ordered (by project, instead of by change#s) to hopefully get some
better caching results (for reading change ref values).

-investigating why the diff caches seem to get overwritten during the migration,
to see if we can eventually gain some more speed by pre-populating them

* PHASE 3 - Upgrade from 2.16 to 3.4+ (schema migrations + indexing)

-parallelizing the initial per project change ref scanning used just to split
the indexing up into slices (this alone can take more than an hour)
-re-using the results of the initial change ref scanning for each slice
instead of rescanning the changes for each slice
-investigating using the diff caches better
-investigating if there are bulk indexing options or APIs that can speed
things up
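The two Phase 2 ideas (repacking All-Users during the migration, and ordering migration slices by project) can be sketched roughly as below. This is a hypothetical illustration in Python, not Gerrit code: `RepackTrigger`, `slices_by_project`, `git_repack`, the threshold, and the slice size are all assumptions. The first counts writes and calls an injected repack function every `threshold` writes (in practice that could be something like `git repack -a -d` on All-Users.git); the second sorts changes by project and then by change number, so consecutive slices stay within one repository instead of jumping across projects the way a pure change-number order would.

```python
import subprocess
from itertools import groupby
from typing import Callable, List, NamedTuple


def git_repack(repo_path: str) -> None:
    """Assumed repack step: pack all objects into one pack and drop redundant packs."""
    subprocess.run(["git", "-C", repo_path, "repack", "-a", "-d"], check=True)


class RepackTrigger:
    """Hypothetical helper: call `repack` after every `threshold` recorded writes."""

    def __init__(self, threshold: int, repack: Callable[[], None]) -> None:
        if threshold <= 0:
            raise ValueError("threshold must be positive")
        self.threshold = threshold
        self.repack = repack
        self.pending = 0  # writes since the last repack

    def record_write(self) -> None:
        self.pending += 1
        if self.pending >= self.threshold:
            self.repack()
            self.pending = 0


class Change(NamedTuple):
    project: str
    number: int


def slices_by_project(changes: List[Change], slice_size: int) -> List[List[Change]]:
    """Order by (project, change number) and cut slices that never span two projects."""
    ordered = sorted(changes)  # NamedTuples compare field by field: project, then number
    slices: List[List[Change]] = []
    for _, group in groupby(ordered, key=lambda c: c.project):
        members = list(group)
        for start in range(0, len(members), slice_size):
            slices.append(members[start:start + slice_size])
    return slices


# Usage with a fake repack function instead of git_repack:
repacks: List[str] = []
trigger = RepackTrigger(threshold=1000, repack=lambda: repacks.append("All-Users"))
for _ in range(2500):
    trigger.record_write()

changes = [Change("b", 3), Change("a", 5), Change("a", 1), Change("b", 2), Change("a", 4)]
slices = slices_by_project(changes, slice_size=2)
```

With a threshold of 1000, the 2500 writes trigger two repacks and leave 500 pending; the five example changes end up in three single-project slices (a:1,4 / a:5 / b:2,3).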

We have spent very little time so far on phases 1 and 3, so hopefully we will
get more ideas, especially for phase 3! We welcome other suggestions!
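For the Phase 3 items about the initial change ref scan, one way to picture "scan once in parallel, then reuse the result for every slice" is the sketch below. It is Python for illustration only: `scan_change_refs` stands in for whatever reads a project's change refs, and `scan_all_projects`, `make_slices`, and the thread count are hypothetical names and parameters, not Gerrit APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def scan_all_projects(
    projects: Sequence[str],
    scan_change_refs: Callable[[str], List[int]],
    threads: int = 8,
) -> Dict[str, List[int]]:
    """Scan every project's change refs exactly once, spread over a thread pool."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so results line up with `projects`.
        results = list(pool.map(scan_change_refs, projects))
    return dict(zip(projects, results))


def make_slices(
    scan: Dict[str, List[int]], slice_size: int
) -> Iterator[Tuple[str, List[int]]]:
    """Cut indexing slices from the cached scan instead of rescanning per slice."""
    for project, numbers in scan.items():
        for start in range(0, len(numbers), slice_size):
            yield project, numbers[start:start + slice_size]


# Usage with a fake scanner that records how often each project is scanned.
calls: List[str] = []
fake_refs = {"p1": [1, 2, 3], "p2": [4]}


def fake_scan(project: str) -> List[int]:
    calls.append(project)  # list.append is safe to call from several threads in CPython
    return fake_refs[project]


scan = scan_all_projects(["p1", "p2"], fake_scan, threads=2)
slices = list(make_slices(scan, slice_size=2))
```

Each project is scanned once, and every slice is cut from the same in-memory result. Threads are a plausible fit on the assumption that ref scanning is dominated by I/O (especially on NFS); a real implementation would of course live in Gerrit's Java code.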

Thanks,

-Martin

--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation

doug.r...@wandisco.com

Aug 19, 2021, 3:16:32 PM
to Repo and Gerrit Discussion
Congratulations!  That's a GREAT reduction in NoteDB migration timing!

Patrick Hiesel

Aug 24, 2021, 7:27:38 AM
to Martin Fick, repo-d...@googlegroups.com
Fantastic results, congratulations!

I recently looked at how we do index version upgrades, and I was wondering whether, instead of re-indexing all changes from scratch for every new schema version, we could just index the new fields and add them to the existing docs, as well as drop the fields that were removed from the schema.

Some fields - for example anything related to diffs - are expensive to compute but also don't change often.

WDYT?
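A minimal sketch of that idea, assuming an index document can be read back as a field map and that each field has its own compute function (both assumptions; `field_delta`, `upgrade_doc`, and the field names are hypothetical). Only fields that are new in the target schema are computed; dropped fields are removed and everything else is copied over.

```python
from typing import Any, Callable, Dict, Set, Tuple

Doc = Dict[str, Any]


def field_delta(old_fields: Set[str], new_fields: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (fields to add, fields to drop) between two schema versions."""
    return new_fields - old_fields, old_fields - new_fields


def upgrade_doc(
    doc: Doc,
    old_fields: Set[str],
    new_fields: Set[str],
    compute: Dict[str, Callable[[Doc], Any]],
) -> Doc:
    """Upgrade one document, computing only the fields the new schema adds."""
    to_add, to_drop = field_delta(old_fields, new_fields)
    upgraded = {name: value for name, value in doc.items() if name not in to_drop}
    for name in to_add:
        upgraded[name] = compute[name](doc)  # the expensive part, now only for new fields
    return upgraded


old_schema = {"project", "status", "legacy_field"}
new_schema = {"project", "status", "new_field"}
doc = {"project": "tools/gerrit", "status": "NEW", "legacy_field": 1}
upgraded = upgrade_doc(doc, old_schema, new_schema, {"new_field": lambda d: d["project"].upper()})
```

One caveat: a field that keeps its name but changes its definition would still need recomputing, so a real version would need more than a name diff. It also only helps if the backend can read existing values back; with Lucene, for example, an update generally rewrites the whole document and only stored fields can be retrieved, so the benefit may depend on the index backend.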

--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/repo-discuss/2046916.jXAeu0JPfY%40mfick-lnx.

Martin Fick

Aug 24, 2021, 3:59:47 PM
to Patrick Hiesel, repo-d...@googlegroups.com
On 2021-08-24 05:27, Patrick Hiesel wrote:
> Fantastic results, congratulations!

Thanks!

> I recently looked at how we do index version upgrades and was
> wondering if instead of re-indexing all changes from scratch for every
> new schema version we could just index the new fields and add them to
> existing docs as well as drop fields that were dropped from the
> schema.

If that is faster, it would be nice. Whether this would help may depend on
which index (Lucene or another) is being used.

> Some fields - for example anything related to diffs - are expensive to
> compute but also don't change often.

It would be good if we could ensure that the diff caches can be
used; so far we have not been able to figure out how to get pre-populated
diff caches to be re-used during indexing or migration.

Nguyen Tuan Khang Phan

Feb 3, 2022, 4:09:35 PM
to Repo and Gerrit Discussion
Hi,

We just did a test upgrade from 2.14 to 2.16 on our way to 3.4. However, during the 2.16 upgrade we hit a severe slowdown at the step "Migrating data to schema 154 ..", which took 39499.524 s (about 11 hours), compared to the other steps, which usually take less than a second. We hadn't even started indexing yet.

Nasser Grainawi

Feb 3, 2022, 5:10:35 PM
to Nguyen Tuan Khang Phan, Repo and Gerrit Discussion

This might fit better in a new thread. Can you start one and provide some info on your setup? Notably, Schema 154 migrates accounts to NoteDb, so information on the number of accounts you have and on what kind of disk (local or NFS; spinning or SSD) your All-Users git repo lives on would be especially helpful. Please also include the exact 2.16 version you used (anything earlier than 2.16.28 does not have the latest optimizations).

Nasser



David Ostrovsky

Feb 6, 2022, 3:46:12 AM
to Repo and Gerrit Discussion
Can you break down the numbers in PHASE 3, and also clarify how often
you do reindexing? If it's something like 1 hour of schema migrations + 21 hours
of reindexing at 3.4+ only, have you considered a delta reindexing approach?
Back up the prod data and perform the full migration on a staging machine. Copy
the index directory to the production site, skip the offline reindex step, and
perform online reindexing of the changes that changed during the migration process.
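The selection step of such a delta approach might look like this: given each change's last-updated time and the moment the production backup was taken, pick the changes to reindex online. The function name, the tuple input, and the safety margin (meant to cover writes in flight around the backup time) are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple


def changes_to_reindex(
    changes: Iterable[Tuple[int, datetime]],
    backup_time: datetime,
    margin: timedelta = timedelta(hours=1),
) -> List[int]:
    """Return change numbers updated at or after (backup_time - margin)."""
    cutoff = backup_time - margin
    return sorted(number for number, updated in changes if updated >= cutoff)


utc = timezone.utc
backup = datetime(2022, 2, 6, 0, 0, tzinfo=utc)
changes = [
    (1, datetime(2022, 2, 5, 0, 0, tzinfo=utc)),    # well before the backup
    (2, datetime(2022, 2, 5, 23, 30, tzinfo=utc)),  # inside the safety margin
    (3, datetime(2022, 2, 7, 9, 0, tzinfo=utc)),    # changed during the migration
]
selected = changes_to_reindex(changes, backup)
```

Deleted changes are not covered by this selection; they would still have to be removed from the copied index separately.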

Luca Milanesio

Feb 6, 2022, 5:12:37 AM
to Repo and Gerrit Discussion, Luca Milanesio, David Ostrovsky
That’s actually a good idea, thanks for sharing it.

The high-availability plugin [1] also has a delta-reindex mode: you just set the start date in the $GERRIT_SITE/data/high-availability/change file, and it will reindex all changes created or edited after that date.
The only issue is that changes removed after that date will still exist in the index; however, they should be traceable from the httpd_log and can be removed separately.
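As an alternative to tracing deletions through the httpd_log (a different technique, shown only as a sketch), one could compare the change numbers present in the copied index with those still present in the repositories; anything indexed but no longer present is a candidate for deletion. How the two sets are obtained is left open, and `stale_index_entries` is a hypothetical name.

```python
from typing import Iterable, List


def stale_index_entries(indexed: Iterable[int], existing: Iterable[int]) -> List[int]:
    """Change numbers that are still in the index but no longer exist."""
    return sorted(set(indexed) - set(existing))


indexed_changes = [101, 102, 103, 104]
existing_changes = [101, 103, 104, 105]  # 102 was deleted, 105 is new
to_delete = stale_index_entries(indexed_changes, existing_changes)
```

The reverse difference (105 here) is the set of changes missing from the index, which the delta reindex should already pick up.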

HTH

Luca.



Martin Fick

Feb 10, 2022, 3:23:23 PM
to David Ostrovsky, Repo and Gerrit Discussion
On 2022-02-06 01:46, David Ostrovsky wrote:
> MartinFick wrote on Wednesday, August 18, 2021 at 22:59:27 UTC+2:
>
>> ------------- PERFORMANCE OBJECTIVE -------------
>> As mentioned in the previous thread, our objective is to perform the
>> entire 2.7-3.4+ upgrade in under 4 hours. While many improvements are
>> still needed to get there, this is huge leap towards that goal, and we
>> believe we still have many things we can potentially improve. The
>> current full upgrade timeline for the following 3 phases, for us,
>> looks like :
>>
>> * PHASE 1 - Upgrade from 2.7 to 2.16 (schema migrations)
>> -> ~3.5 hours
>>
>> * PHASE 2 - NoteDB migration
>> -> ~2.5hours + 1 hour to repack All-Users (we hope to inline this in
>> the migration soon)
>>
>> * PHASE 3 - Upgrade from 2.16 to 3.4+ (schema migrations + indexing)
>> -> ~22hrs
>
> Can you break down the numbers in PHASE 3, and also clarify how often
> you do reindexing? If it's something: 1 hour schema migrations + 21
> hrs reindexing at 3.4+ only,

These numbers are actually quite out of date now. We did get the
opportunity to work on the schema migrations and the indexing.
Indexing is indeed the bulk of the upgrade from 2.16 to 3.5
(our latest target). The schema migrations take only a few minutes,
not even close to an hour, fortunately! We gave a presentation at the
last user summit with more up-to-date numbers, and they were much
better. Indexing is quite fast now, only around 2 hours! Since
these numbers are very much out of date, and not everyone has
watched our presentation (which is also a bit out of date now), I
will try to send another email soon summarizing our latest
results, but we are still improving them!

The indexing could use some fixes because 3.5 can't seem to
handle the old format of the auto-merge refs properly. Kaushik is
working on fixes for that. We are also exploring the Elasticsearch (ES)
approach now since our IT would like to use it, and our analysis
suggests that ES provides read-after-write consistency, which the
current Lucene approach doesn't seem to. We are working on a fix
to disable that consistency during offline re-indexing, since it
isn't needed then, and this seems to bring the ES implementation
up to speed with the Lucene implementation, to the point that
neither is now the bottleneck for indexing. Reading the git data
from the repos currently seems to be the bottleneck.
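For the offline case, one generic way to avoid paying for read-after-write consistency with Elasticsearch is to send bulk requests without forcing a refresh. The sketch below only builds the request path and newline-delimited JSON body of the standard `_bulk` API; whether Gerrit's ES support works this way is an assumption, and the index name and document fields are hypothetical.

```python
import json
from typing import Dict, List, Tuple

Doc = Dict[str, object]


def bulk_request(index: str, docs: List[Tuple[str, Doc]], offline: bool) -> Tuple[str, str]:
    """Build (path, NDJSON body) for an Elasticsearch _bulk request.

    Offline reindexing does not need each write to be searchable right away,
    so it skips the refresh; online writes wait until they are visible instead.
    """
    refresh = "false" if offline else "wait_for"
    path = f"/{index}/_bulk?refresh={refresh}"
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action/metadata line
        lines.append(json.dumps(doc))  # document source line
    body = "\n".join(lines) + "\n"  # a bulk body must end with a newline
    return path, body


path, body = bulk_request(
    "changes_hypothetical",
    [("1", {"project": "a", "status": "NEW"}), ("2", {"project": "b", "status": "MERGED"})],
    offline=True,
)
```

The body would be POSTed with `Content-Type: application/x-ndjson`, and a single `_refresh` call on the index at the end of an offline run would make everything searchable at once.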


> have you considered delta reindexing approach:
> backup the prod data, perform full migration on staging machine. Copy
> index directory to production site, skip offline reindex step and
> perform online reindexing of changes that changed during migration
> process.

Thank you for the suggestion, David. We really want to avoid this
approach, as it is more complicated and potentially error-prone.
However, we will likely use a similar approach to at least
pre-populate Gerrit's persistent caches, since that is a very simple
thing to do, hard to get wrong, and thus unlikely to accidentally
introduce out-of-date info. We really want to make this fast
and easy for everyone!