[RFC] Cache on patch-sets?

82 views
Skip to first unread message

lucamilanesio

unread,
Apr 24, 2015, 4:11:07 AM4/24/15
to repo-d...@googlegroups.com
Hi all,
I am writing to request a poll / opinion on whether we should introduce a new cache on patch-sets by id.

We are experiencing high volumes of SQL queries on patch_sets to retrieve records by change_id / patch_set_id
(over 160K per day, over 30% of the overall time spent on ReviewDB) ... I reckon most of them are actually caused by people hitting the list of open changes in Gerrit home page ... and I reckon that most of the 160K queries are returning the same items ;-)

As we do have pagination and the list of first page changes wouldn't typically be more than 100 items, would it make sense to have those items cached to avoid the explosion of SQL queries for patch-sets?

P.S. Of course we wouldn't use the cached value when actually looking at the details of a patch-set or when performing on-line edit. This wouldn't generate however 160K queries I guess ;-)

Feedback is more than welcome :-)

Luca.




Martin Fick

unread,
Apr 24, 2015, 2:48:38 PM4/24/15
to repo-d...@googlegroups.com
On Friday, April 24, 2015 01:11:07 AM lucamilanesio wrote:
> Hi all,
> I am writing to request a poll / opinion on whether we
> should introduce a new cache on patch-sets by id.
>
> We are experiencing high volumes of SQL queries on
> patch_sets to retrieve records by change_id /
> patch_set_id
> (over 160K per day, over 30% of the overall time spent on
> ReviewDB) ... I reckon most of them are actually caused
> by people hitting the list of open changes in Gerrit home
> page ... and I reckon that most of the 160K queries are
> returning the same items ;-)

I would be curious to find out exactly what is causing those
queries before commenting any more,

-Martin

--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation

Dave Borowitz

unread,
Apr 24, 2015, 3:11:59 PM4/24/15
to lucamilanesio, repo-discuss
On Fri, Apr 24, 2015 at 1:11 AM, lucamilanesio <luca.mi...@gmail.com> wrote:
Hi all,
I am writing to request a poll / opinion on whether we should introduce a new cache on patch-sets by id.

-1 until we know that there is no better way to understand/eliminate these queries, as Martin suggests.
 
We are experiencing high volumes of SQL queries on patch_sets to retrieve records by change_id / patch_set_id

My #1 guess is GetRelated. This is a huge source of pain for us, and I plan to rewrite it from scratch in the next couple of weeks. If your server doesn't have https://gerrit-review.googlesource.com/66289 in it then it's even worse.
 
(over 160K per day, over 30% of the overall time spent on ReviewDB) ... I reckon most of them are actually caused by people hitting the list of open changes in Gerrit home page ... and I reckon that most of the 160K queries are returning the same items ;-)

These really should not be. All the info on search result pages (with limited exceptions like stars) should be coming straight from the secondary index, that's why we have stored fields. If it needs to make additional queries then we screwed something up.
 
As we do have pagination and the list of first page changes wouldn't typically be more than 100 items, would it make sense to have those items cached to avoid the explosion of SQL queries for patch-sets?

See above, what doesn't make sense is why the change list would be touching the SQL DB _at all_ in the first place.
 
P.S. Of course we wouldn't use the cached value when actually looking at the details of a patch-set

I don't know that we are currently doing so  but we could mitigate some of this by using an ETag when listing revisions, so the client does more caching.
 
or when performing on-line edit. This wouldn't generate however 160K queries I guess ;-)

Feedback is more than welcome :-)

Luca.




--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dave Borowitz

unread,
Apr 24, 2015, 3:14:34 PM4/24/15
to lucamilanesio, repo-discuss
If you want to get some more details as to where this is coming from, you can run a patched build with some static methods to collect sample stack traces:

ResultSet<PatchSet> byChange(ReviewDb db, Change.Id id) {
  if (random < 0.01) {
    new Exception().printStackTrace();
  }
  return db.patchSets().byChange()
}

Luca Milanesio

unread,
Apr 25, 2015, 6:02:28 PM4/25/15
to Dave Borowitz, repo-discuss
According to JavaMelody, the HTTP call that triggers the SQL query is:

Will insert now the code suggested by Dave to get the full stack-trace.
As the REST API /changes is executed when someone navigates the common changes screen … it is quite obvious to get hundreds of thousands of SQL queries :-O

Luca.

Dave Borowitz

unread,
Apr 27, 2015, 12:51:31 AM4/27/15
to Luca Milanesio, repo-discuss
Yeah, that's the standard set of options for search results that aren't the user dashboard. So there's some computation in ChangeJson that isn't managing to use the stored fields.

Thanks for pointing this out, I will try to take a look at it this week.

Luca Milanesio

unread,
Apr 27, 2015, 2:43:30 AM4/27/15
to Dave Borowitz, repo-discuss
Hi Dave,
thanks for your feedback and for looking into it.

In a nutshell then we shouldn’t need any cache as in theory SQL access shouldn’t be triggered at all for the home changes screen, correct?

Luca.

lucamilanesio

unread,
May 5, 2015, 3:32:29 AM5/5/15
to repo-d...@googlegroups.com, dbor...@google.com
Hi Dave,
did you have any chance to look at the Lucene usage on the home screen?

Luca.

--
--
To unsubscribe, email repo-discuss+unsubscribe@googlegroups.com

More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss+unsubscribe@googlegroups.com.

lucamilanesio

unread,
May 7, 2015, 3:35:21 AM5/7/15
to repo-d...@googlegroups.com, dbor...@google.com
Just an update of my stats from last week:
1,165,379 SQL queries for loading patch-sets by ID (!!!) - Taking 22% of time spent on DB
683,535 SQL queries for loading changes by ID (!!) - Taking 25% of the time spent on DB
223,424 SQL queries for loading the patch-sets associated to a change (!) - Taking 8% of the time spent on DB

I have noticed that we currently have a cache for changes (com.google.gerrit.server.git.ChangeCache) but is used only for the Git server part and is not indexed by ID.

As people are experiencing problems and increased stress on the DB (remember the discussion thread about the increase of thread pool when upgrading to 2.10.x ?) ... by temporarily caching changes and patch-sets when people are hitting their home screen it would reduce the load on the DB by 55% altogether, which is *HUGE*.

Is anyone interested in working on:
- Introducing a change/patch-set cache OR
- Avoiding the use of DB (using only Lucene indexes) for retrieving change/patch-set 

Luca.

Shawn Pearce

unread,
May 7, 2015, 12:48:05 PM5/7/15
to lucamilanesio, repo-discuss, David Borowitz
On Thu, May 7, 2015 at 12:35 AM, lucamilanesio <luca.mi...@gmail.com> wrote:
Just an update of my stats from last week:
1,165,379 SQL queries for loading patch-sets by ID (!!!) - Taking 22% of time spent on DB
683,535 SQL queries for loading changes by ID (!!) - Taking 25% of the time spent on DB
223,424 SQL queries for loading the patch-sets associated to a change (!) - Taking 8% of the time spent on DB

I have noticed that we currently have a cache for changes (com.google.gerrit.server.git.ChangeCache) but is used only for the Git server part and is not indexed by ID.

As people are experiencing problems and increased stress on the DB (remember the discussion thread about the increase of thread pool when upgrading to 2.10.x ?) ... by temporarily caching changes and patch-sets when people are hitting their home screen it would reduce the load on the DB by 55% altogether, which is *HUGE*.

Something is wrong. Users hitting their "home screen" are running 3 search results on the secondary index. None of those should be touching the database. They should be running on Lucene (or whatever the index is) and dumping data cached by the secondary index.

Are there complex submit rules on the projects in question?

Did you try doing a randomized log statement to record stack traces for e.g. 1% of database calls to see the Java stack trace at the server that is causing this traffic to the SQL database?


Is anyone interested in working on:
- Introducing a change/patch-set cache OR

We (Googlers) are loathe to add a new cache. We have actually been turning caches off as we are fighting with pauses caused by the Java GC being unable to keep up with the load on our servers.

- Avoiding the use of DB (using only Lucene indexes) for retrieving change/patch-set 

This is already supposed to be working. If its not, its a bug that needs to be fixed.

Dave Borowitz

unread,
May 7, 2015, 12:57:06 PM5/7/15
to Shawn Pearce, repo-discuss, luca milanesio

Shawn just said all the same things I did :)

I haven't had a chance to look yet but I will today. I was going to be doing some index plumbing work the next few days anyway.

Dave Borowitz

unread,
May 7, 2015, 2:25:36 PM5/7/15
to Shawn Pearce, repo-discuss, luca milanesio
I really should have been able to guess this without having to actually instrument it :)

        at com.google.gerrit.server.query.change.ChangeData.ensureCurrentPatchSetLoaded(ChangeData.java:137)
at com.google.gerrit.server.change.ChangeJson.formatQueryResults(ChangeJson.java:281)
at com.google.gerrit.server.query.change.QueryChanges.query0(QueryChanges.java:145)
at com.google.gerrit.server.query.change.QueryChanges.query(QueryChanges.java:132)
at com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:99)
at com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:1)
at com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:328)

Two points:
1. I don't think we need to load patch sets as aggressively.
2. The index work I was literally about to start on today includes storing patch sets in the index.

So I think this should be pretty easy to work around.

Dave Borowitz

unread,
May 7, 2015, 3:04:25 PM5/7/15
to Shawn Pearce, repo-discuss, luca milanesio
I have a topic to eliminate these. It was a bit more subtle than expected:

Thanks again for noticing, Luca.

David Pursehouse

unread,
May 8, 2015, 12:42:16 AM5/8/15
to Dave Borowitz, Shawn Pearce, repo-discuss, luca milanesio
On 05/08/2015 04:04 AM, 'Dave Borowitz' via Repo and Gerrit Discussion
wrote:
> I have a topic to eliminate these. It was a bit more subtle than expected:
> https://gerrit-review.googlesource.com/#/q/topic:dont-load-patch-sets
>

The topic name was changed:

https://gerrit-review.googlesource.com/#/q/topic:search-no-lazy-load
> <mailto:s...@google.com>> wrote:
>
> On Thu, May 7, 2015 at 12:35 AM, lucamilanesio
> <luca.mi...@gmail.com <mailto:luca.mi...@gmail.com>>
> --
> --
> To unsubscribe, email repo-discuss...@googlegroups.com
> More info at http://groups.google.com/group/repo-discuss?hl=en
>
> ---
> You received this message because you are subscribed to the Google
> Groups "Repo and Gerrit Discussion" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to repo-discuss...@googlegroups.com
> <mailto:repo-discuss...@googlegroups.com>.

Luca Milanesio

unread,
May 8, 2015, 9:22:20 AM5/8/15
to repo-discuss
Thanks Dave, thank you all for looking into it :-)
It would be nice as suggested by David to get if ported as well to 2.10/2.11.

Luca.

David Pursehouse

unread,
May 8, 2015, 9:53:16 AM5/8/15
to Luca Milanesio, repo-discuss

On 8 May 2015 22:22, "Luca Milanesio" <luca.mi...@gmail.com> wrote:
>
> Thanks Dave, thank you all for looking into it :-)
> It would be nice as suggested by David to get if ported as well to 2.10/2.11.
>

I've already cherrypicked the series to stable-2.11

> To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.

Luca Milanesio

unread,
May 8, 2015, 9:59:30 AM5/8/15
to David Pursehouse, repo-discuss
Is the 2.10 affected as well? Or is our policy just to cherry pick on the version-1?

Luca.

Saša Živkov

unread,
May 8, 2015, 11:26:02 AM5/8/15
to Luca Milanesio, David Pursehouse, repo-discuss
On Fri, May 8, 2015 at 3:59 PM, Luca Milanesio <luca.mi...@gmail.com> wrote:
Is the 2.10 affected as well?

I just debugged it and can confirm that the 2.10 is indeed affected.
 
Or is our policy just to cherry pick on the version-1?

I can produce another 2.10.
The 4 changes are small and I guess it will not be too difficult to cherry-pick them to the stable-2.10 too.

Saša Živkov

unread,
May 12, 2015, 5:43:30 AM5/12/15
to Luca Milanesio, David Pursehouse, repo-discuss
On Fri, May 8, 2015 at 5:25 PM, Saša Živkov <ziv...@gmail.com> wrote:


On Fri, May 8, 2015 at 3:59 PM, Luca Milanesio <luca.mi...@gmail.com> wrote:
Is the 2.10 affected as well?

I just debugged it and can confirm that the 2.10 is indeed affected.
 
Or is our policy just to cherry pick on the version-1?

I can produce another 2.10.
The 4 changes are small and I guess it will not be too difficult to cherry-pick them to the stable-2.10 too.
 
It turns out that cherry-picking these series on stable-2.10 is more complex than I expected.
There are many hard to follow/resolve conflicts... which likely means that many other commits
need also to be cherry-picked... So I am not sure if the effort is justified for the stable-2.10.
How many 2.10 users are really affected by this issue?

Luca Milanesio

unread,
May 12, 2015, 6:51:18 AM5/12/15
to Saša Živkov, David Pursehouse, repo-discuss
See my short answer in-line :-)

Sent from my iPhone

On 12 May 2015, at 10:43, Saša Živkov <ziv...@gmail.com> wrote:



On Fri, May 8, 2015 at 5:25 PM, Saša Živkov <ziv...@gmail.com> wrote:


On Fri, May 8, 2015 at 3:59 PM, Luca Milanesio <luca.mi...@gmail.com> wrote:
Is the 2.10 affected as well?

I just debugged it and can confirm that the 2.10 is indeed affected.
 
Or is our policy just to cherry pick on the version-1?

I can produce another 2.10.
The 4 changes are small and I guess it will not be too difficult to cherry-pick them to the stable-2.10 too.
 
It turns out that cherry-picking these series on stable-2.10 is more complex than I expected.
There are many hard to follow/resolve conflicts... which likely means that many other commits
need also to be cherry-picked... So I am not sure if the effort is justified for the stable-2.10.
How many 2.10 users are really affected by this issue?

Everyone :-) ... How many can still live with the problem? Most of them by they'll complain of the slowdown (if DB is not fast enough)

Saša Živkov

unread,
May 12, 2015, 8:45:08 AM5/12/15
to Luca Milanesio, David Pursehouse, repo-discuss
On Tue, May 12, 2015 at 12:51 PM, Luca Milanesio <luca.mi...@gmail.com> wrote:
See my short answer in-line :-)

Sent from my iPhone

On 12 May 2015, at 10:43, Saša Živkov <ziv...@gmail.com> wrote:



On Fri, May 8, 2015 at 5:25 PM, Saša Živkov <ziv...@gmail.com> wrote:


On Fri, May 8, 2015 at 3:59 PM, Luca Milanesio <luca.mi...@gmail.com> wrote:
Is the 2.10 affected as well?

I just debugged it and can confirm that the 2.10 is indeed affected.
 
Or is our policy just to cherry pick on the version-1?

I can produce another 2.10.
The 4 changes are small and I guess it will not be too difficult to cherry-pick them to the stable-2.10 too.
 
It turns out that cherry-picking these series on stable-2.10 is more complex than I expected.
There are many hard to follow/resolve conflicts... which likely means that many other commits
need also to be cherry-picked... So I am not sure if the effort is justified for the stable-2.10.
How many 2.10 users are really affected by this issue?

Everyone :-) ... How many can still live with the problem? Most of them by they'll complain of the slowdown (if DB is not fast enough)

I now also checked the 2.9.4 and the same issue is also present there.
So, this is not something introduced with 2.10 but probably first observed on 2.10.
I guess this means that 2.10 is not slower than 2.9? Why no one has this issue with 2.9?

Luca Milanesio

unread,
May 12, 2015, 8:46:55 AM5/12/15
to Saša Živkov, David Pursehouse, repo-discuss
Actually one of our clients is experiencing as well on 2.9 :-)
However, we cannot fix the world and if people lived with it until now … it would be then the improved performance will be one more reason to upgrade to 2.11 :-)

Let’s not bother then too much ;-)

Luca.
Reply all
Reply to author
Forward
0 new messages