Lots of DB SQL Queries on Gerrit changes home page

329 views
Skip to first unread message

lucamilanesio

unread,
Nov 23, 2015, 9:02:06 AM11/23/15
to Repo and Gerrit Discussion
Hi Gerrit devs,
we are running on GerritHub.io the "latest and greatest" of Gerrit master (2.12-rc0-247-g6a86b1b) but I still see quite a lot of SQL queries when an anonymous user is accessing https://review.gerrithub.io.

It all seems caused by this REST API execution:
curl 'https://review.gerrithub.io/changes/?n=25&O=81' -H 'Pragma: no-cache' -H 'Accept: application/json' -H 'Referer: https://review.gerrithub.io/' -H 'Cache-Control: no-cache'

I was hoping that after all the fixes made on master in improving the use of Gerrit caches ... the # of generated SQL queries would have gone down.
It actually did go down but not enough :-(

That specific HTTP REST API is actually still responsible for the 87% of our HTTP latency :-O
My understanding is that the list of open changes should be 100% resolved using Lucene Indexes, but I am possibly wrong :-(

The "top" SQL queries are:

SELECT T.change_key,T.created_on,T.last_updated_on,T.owner_account_id,T.dest_project_name,T.dest_branch_name,T.status,T.current_patch_set_id,T.subject,T.topic,T.original_subject,T.submission_id,T.row_version,T.change_id FROM changes T WHERE T.change_id=?

Is this SQL query triggered by the "O=81" on the query string? Is this the reason why we are triggering one SQL query for each change we find in the index? Do we really need to call from GWT UX the "O=81" for anonymous users? Can we just display a bit less and save CPU time?

Any feedback / experience on this topic is highly appreciated :-)

Luca.




Saša Živkov

unread,
Nov 23, 2015, 9:22:25 AM11/23/15
to lucamilanesio, Repo and Gerrit Discussion
Looking at the QueryChanges, the "-O" flag means:

  @Option(name = "-O", usage = "Output option flags, in hex") 

81 hex == 1000 0001, which means that we are setting the LABELS | DETAILED_ACCOUNTS options with O=81.

Looks like the DETAILED_ACCOUNTS option requires loading of each change in order to display
full name of the change owner.



Any feedback / experience on this topic is highly appreciated :-)

Luca.




--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Luca Milanesio

unread,
Nov 23, 2015, 10:23:52 AM11/23/15
to Saša Živkov, Repo and Gerrit Discussion
Thanks Saša, 
that really explains the reason behind the SQL queries.

Do we really need the DETAILED_ACCOUNTS for rendering the listing?

Luca.

lucamilanesio

unread,
Nov 23, 2015, 10:53:10 AM11/23/15
to Repo and Gerrit Discussion, ziv...@gmail.com
Will try to disable it and see if it still works :-)

Luca.

-- 
-- 
To unsubscribe, email repo-discuss+unsub...@googlegroups.com

More info at http://groups.google.com/group/repo-discuss?hl=en

--- 
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss+unsubscribe@googlegroups.com.

David Ostrovsky

unread,
Nov 23, 2015, 2:57:28 PM11/23/15
to Repo and Gerrit Discussion

Am Montag, 23. November 2015 15:02:06 UTC+1 schrieb lucamilanesio:
Hi Gerrit devs,
we are running on GerritHub.io the "latest and greatest" of Gerrit master (2.12-rc0-247-g6a86b1b) but I still see quite a lot of SQL queries when an anonymous user is accessing https://review.gerrithub.io.

It all seems caused by this REST API execution:
curl 'https://review.gerrithub.io/changes/?n=25&O=81' -H 'Pragma: no-cache' -H 'Accept: application/json' -H 'Referer: https://review.gerrithub.io/' -H 'Cache-Control: no-cache'


How you correlate this REST API call with SQL query? Javamelody plugin?
 
I was hoping that after all the fixes made on master in improving the use of Gerrit caches ... the # of generated SQL queries would have gone down.
It actually did go down but not enough :-(

That specific HTTP REST API is actually still responsible for the 87% of our HTTP latency :-O
My understanding is that the list of open changes should be 100% resolved using Lucene Indexes, but I am possibly wrong :-(


No. You are correct. We have even unit test that verifies that this
query doesn't touch the database: [1]. I just re-checked on most
recent master (v2.12-rc0-264-g6f7ccf5). It works here as expected.

The "top" SQL queries are:

SELECT T.change_key,T.created_on,T.last_updated_on,T.owner_account_id,T.dest_project_name,T.dest_branch_name,T.status,T.current_patch_set_id,T.subject,T.topic,T.original_subject,T.submission_id,T.row_version,T.change_id FROM changes T WHERE T.change_id=?

The better question is to corelate the top SQL query with the actual action that triggers it.
I think that database is not touched when list of changes is rendered. At least not on most
recent master.


lucamilanesio

unread,
Nov 25, 2015, 6:36:24 PM11/25/15
to Repo and Gerrit Discussion
Hi David, see my feedback below.


On Monday, November 23, 2015 at 7:57:28 PM UTC, David Ostrovsky wrote:

Am Montag, 23. November 2015 15:02:06 UTC+1 schrieb lucamilanesio:
Hi Gerrit devs,
we are running on GerritHub.io the "latest and greatest" of Gerrit master (2.12-rc0-247-g6a86b1b) but I still see quite a lot of SQL queries when an anonymous user is accessing https://review.gerrithub.io.

It all seems caused by this REST API execution:
curl 'https://review.gerrithub.io/changes/?n=25&O=81' -H 'Pragma: no-cache' -H 'Accept: application/json' -H 'Referer: https://review.gerrithub.io/' -H 'Cache-Control: no-cache'


How you correlate this REST API call with SQL query? Javamelody plugin?

Yes, and it lists the SQL queries executed on the /changes/ URI invocation.
 
 
I was hoping that after all the fixes made on master in improving the use of Gerrit caches ... the # of generated SQL queries would have gone down.
It actually did go down but not enough :-(

That specific HTTP REST API is actually still responsible for the 87% of our HTTP latency :-O
My understanding is that the list of open changes should be 100% resolved using Lucene Indexes, but I am possibly wrong :-(


No. You are correct. We have even unit test that verifies that this
query doesn't touch the database: [1]. I just re-checked on most
recent master (v2.12-rc0-264-g6f7ccf5). It works here as expected.

However I've seen this code fragment in ChangeData.java:

  public static void ensureChangeLoaded(Iterable<ChangeData> changes)

      throws OrmException {

    ChangeData first = Iterables.getFirst(changes, null);

    if (first == null) {

      return;

    } else if (first.notesMigration.readChanges()) {

      for (ChangeData cd : changes) {

        cd.change();

      }

    }


    Map<Change.Id, ChangeData> missing = Maps.newHashMap();

    for (ChangeData cd : changes) {

      if (cd.change == null) {

        missing.put(cd.getId(), cd);

      }

    }

    if (missing.isEmpty()) {

      return;

    }

    for (Change change : first.db.changes().get(missing.keySet())) {

      missing.get(change.getId()).change = change;

    }

  }


It seems that if ChangeData.change is null (how can it be? Can Lucene return a null object?) then the changes with that corresponding IDs are loaded from the DB.


Is it possible that I have problems on my Lucene Index? Is there any situation where the Lucene index returns the ID but not the object?

 

The "top" SQL queries are:

SELECT T.change_key,T.created_on,T.last_updated_on,T.owner_account_id,T.dest_project_name,T.dest_branch_name,T.status,T.current_patch_set_id,T.subject,T.topic,T.original_subject,T.submission_id,T.row_version,T.change_id FROM changes T WHERE T.change_id=?

The better question is to corelate the top SQL query with the actual action that triggers it.
I think that database is not touched when list of changes is rendered. At least not on most
recent master.



One thing is the integration test, execute with simulated mocks and data ... another thing is real life production with real data.

There are still lots of places in ChangeData where the DB is accessed in the methods invoked during the /changes/ API call, it could possibly be that those conditions are not tested yet in any integration test.

Luca. 

David Ostrovsky

unread,
Dec 2, 2015, 10:31:16 AM12/2/15
to Repo and Gerrit Discussion

Am Donnerstag, 26. November 2015 00:36:24 UTC+1 schrieb lucamilanesio:
Hi David, see my feedback below.

On Monday, November 23, 2015 at 7:57:28 PM UTC, David Ostrovsky wrote:

Am Montag, 23. November 2015 15:02:06 UTC+1 schrieb lucamilanesio:
Hi Gerrit devs,
we are running on GerritHub.io the "latest and greatest" of Gerrit master (2.12-rc0-247-g6a86b1b) but I still see quite a lot of SQL queries when an anonymous user is accessing https://review.gerrithub.io.

It all seems caused by this REST API execution:
curl 'https://review.gerrithub.io/changes/?n=25&O=81' -H 'Pragma: no-cache' -H 'Accept: application/json' -H 'Referer: https://review.gerrithub.io/' -H 'Cache-Control: no-cache'


How you correlate this REST API call with SQL query? Javamelody plugin?

Yes, and it lists the SQL queries executed on the /changes/ URI invocation.

Can you reproduce it on staging instance? Can you attach a debugger to it?
As i said in my previous comment, i debugged my test instance, and I haven't
seen any database operations. The last option would be to instrument the code
and deploy it on your production instance. Dave did something similar here: [1].



Luca Milanesio

unread,
Dec 2, 2015, 10:36:22 AM12/2/15
to David Ostrovsky, Repo and Gerrit Discussion
Hi David,
I have some more background: if I try to reproduce the problem with an *identical* version of Gerrit with a brand-new index and changes ... the problem does not happen.

I am trying now to add piece by piece to make it "similar" to the production environment:
- adding plugins
- adding changes
- reindexing

The investigation continues ... but once resolved will definitely relieve our production servers from over 80% of their load over HTTP :-D

Luca.

--
--
To unsubscribe, email repo-discuss...@googlegroups.com

More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.

lucamilanesio

unread,
Dec 15, 2015, 7:02:51 PM12/15/15
to Repo and Gerrit Discussion, david.o...@gmail.com
I'm starting to get some clarity on the extra queries generated:

Top#1 - Starred changes

When an authenticated user go to the Gerrit home page, its list of starred changes isn't indexed (understandable) and isn't cached :-( 

Top#2 - Draft changes

Draft changes aren't indexed (why?) and not cached either :-(

... will keep on investigating ...

Luca.

More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss+unsubscribe@googlegroups.com.

lucamilanesio

unread,
Dec 16, 2015, 4:36:10 AM12/16/15
to Repo and Gerrit Discussion, david.o...@gmail.com
Forgot to include the SQL statements:

#1 - Starred changes:
SELECT T.account_id,T.change_id FROM starred_changes T WHERE T.account_id=?

#2 - Draft changes:
SELECT T.line_nbr,T.author_id,T.written_on,T.status,T.side,T.message,T.parent_uuid,T.range_start_line,T.range_start_character,T.range_end_line,T.range_end_character,T.change_id,T.patch_set_id,T.file_name,T.uuid FROM patch_comments T WHERE T.status='d' AND T.author_id=?

Those two queries are always triggered when someone (authenticated) is visiting the Gerrit home page ... which (specifically for #2) sometimes comes at a quite expensive price (max execution time is over 300 msec).

I was wondering why the draft changes aren't indexed, can anyone clarify?

Luca.

David Ostrovsky

unread,
Dec 16, 2015, 5:10:15 AM12/16/15
to Repo and Gerrit Discussion

Am Mittwoch, 16. Dezember 2015 10:36:10 UTC+1 schrieb lucamilanesio:
Forgot to include the SQL statements:

#1 - Starred changes:
SELECT T.account_id,T.change_id FROM starred_changes T WHERE T.account_id=?


This is a known problem. Edwin's recent work should improve it.

#2 - Draft changes:
SELECT T.line_nbr,T.author_id,T.written_on,T.status,T.side,T.message,T.parent_uuid,T.range_start_line,T.range_start_character,T.range_end_line,T.range_end_character,T.change_id,T.patch_set_id,T.file_name,T.uuid FROM patch_comments T WHERE T.status='d' AND T.author_id=?


This query is not for draft changes but for draft comments. Sounds like a bug
to me. Why should draft comments be retrieved when change list screen is
rendered. Can you provide a stack trace where this happens?

Luca Milanesio

unread,
Dec 16, 2015, 5:19:11 AM12/16/15
to David Ostrovsky, Repo and Gerrit Discussion
Hi David,
... gosh my sleep deprivation starts impacting my communication skills :-)

Yes, it is the draft comments the query, not changes :-)
Let me see if I can manage to get some call stack ...

More queries to come ... when you customise the fields you want to see on the change screen, situation gets much worse ! :-O

P.S. I'm so interested in getting this problems sorted out ... as they are responsible for over the 80% of the HTTP load time :-)

Luca.

--
--
To unsubscribe, email repo-discuss...@googlegroups.com

More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.

Luca Milanesio

unread,
Dec 16, 2015, 6:22:43 AM12/16/15
to David Ostrovsky, Repo and Gerrit Discussion
Hi David,
the draft query is possibly triggered by the use of "draftBy:<user>" search operator.

See the code below:

  @Override
  public boolean match(final ChangeData object) throws OrmException {
    return !args.plcUtil
        .draftByChangeAuthor(args.db.get(), object.notes(), accountId)
        .isEmpty();
  }

If a user defines this query as its own "home page" on Gerrit ... my poor DB will be hammered every time he comes to his home page.
Does not seem a bug ... but rather a missing possible optimisation.

The query is for filtering the changes that have draft comments on them.

Luca.

Luca Milanesio

unread,
Dec 16, 2015, 10:54:16 AM12/16/15
to David Ostrovsky, Repo and Gerrit Discussion
I am getting a clearer overall picture now: it seems that our change index is not used in a lot of situations, depending on the query string.

In particular looking at today's stats, my top query has become since midnight:

#1 => 45,648 hits 
#2 => 19,405 hits

I thought (naively) that our Lucene code were resolving the predicate using the data present in the index ... and I was *actually wrong* !
Apart from basic queries, most of them are still going to the DB :-(

If you think that in GerritHub.io we only have 45,415 changes, in only one day we select *ALL* of them one by one (and more) by change-id, even though all of them are in the Lucene index.
IMHO this isn't an effective use of our indexes, we can do much better than that :-)

Think about query #2: we are retrieving *ALL* the patch-sets of a change and sort them by patch-set-id ... from a search query? Why should we do that?

I am trying now to dig into the HTTP logs to get all the different types of Gerrit queries that are performed in one day, so to identify the ones that are triggering a DB query instead of a Lucene Index lookup.

... will keep the mailing list posted :-)

Luca.

Luca Milanesio

unread,
Dec 16, 2015, 11:28:07 AM12/16/15
to David Ostrovsky, Repo and Gerrit Discussion
I am digging down now on the query killer #1 and I think I've found a point where it gets triggered *a lot* of times:

    public ChangeControl controlFor(Change.Id changeId, CurrentUser user)
        throws NoSuchChangeException, OrmException {
      Change change = db.get().changes().get(changeId);
      if (change == null) {
        throw new NoSuchChangeException(changeId);
      }
      return controlFor(change, user);
    }

Whenever you need to check if a user has permissions to see a change ... we ruthlessly hammer the poor DB asking over and over and over again to fetch the FULL record of a change.
I would like to change that piece of code and use a "changes wrapper" that before going to the DB just checks if that change can be possibly be found on Lucene and thus avoid hammering down.

Does anybody know WHY are we going down the DB without asking Lucense? Is there any problem you can foresee in going the index-way?

Luca.

Luca Milanesio

unread,
Dec 16, 2015, 12:02:09 PM12/16/15
to David Ostrovsky, Repo and Gerrit Discussion
There is another likely suspect and is called "ensureChangeLoaded":

  public static void ensureChangeLoaded(Iterable<ChangeData> changes)
      throws OrmException {
    ChangeData first = Iterables.getFirst(changes, null);
    if (first == null) {
      return;
    } else if (first.notesMigration.readChanges()) {
      for (ChangeData cd : changes) {
        cd.change();
      }
    }

    Map<Change.Id, ChangeData> missing = Maps.newHashMap();
    for (ChangeData cd : changes) {
      if (cd.change == null) {
        missing.put(cd.getId(), cd);
      }
    }
    if (missing.isEmpty()) {
      return;
    }
    for (Change change : first.db.changes().get(missing.keySet())) {
      missing.get(change.getId()).change = change;
    }
  }

This is actually called in two situations:

a) before rendering the Query results
b) when a query string with other predicates needs to be evaluated

In the above case, even if Lucene has actually found the entries and they have been read ... they are retrieved (for some reason) again from the DB.
Was this made for recovering index inconsistencies and "fix-them-on-the-fly" with the DB data?

Luca.

Luca Milanesio

unread,
Dec 16, 2015, 12:58:23 PM12/16/15
to David Ostrovsky, Dave Borowitz, Repo and Gerrit Discussion
... and now getting even closer! See the code below:

  private ChangeData toChangeData(Document doc, Set<String> fields,
      String idFieldName) {
    ChangeData cd;
    // Either change or the ID field was guaranteed to be included in the call
    // to fields() above.
    BytesRef cb = doc.getBinaryValue(CHANGE_FIELD);
    if (cb != null) {
      cd = changeDataFactory.create(db.get(),
          ChangeProtoField.CODEC.decode(cb.bytes, cb.offset, cb.length));
    } else {
      int id = doc.getField(idFieldName).numericValue().intValue();
      cd = changeDataFactory.create(db.get(), new Change.Id(id));
    }

Even if the change is indexed BUT the CHANGE_FIELD binary value (protobuf-encoded value of the ChangeData object) is null ... the ChangeData object gets returned by the Lucene index with ONLY the reference to the change (change-id) but not the data.

This then triggers a terrible shooting of SQL queries on the underlying DB for every change returned by the Lucene index!!!
So ... if this is really my case, introducing a "lookup on Lucene" logic in front of the Change query layer wouldn't help at all.

@Dave as you are the Lucene expert here :-) Do you know any reason why the CHANGE_FIELD in Lucene could be null? Is there a way to analyse at low level the index files and understand what's going on?

In the meantime I am going to submit a patch to log at WARN level such situations as they should be noticed and investigated by the Gerrit sysadmin IMHO.

Luca.

lucamilanesio

unread,
Dec 16, 2015, 1:11:19 PM12/16/15
to Repo and Gerrit Discussion, david.o...@gmail.com, dbor...@google.com
Pushed [2] with some more warnings when this problem happens. Will try to roll-out this on GerritHub.io and see the outcome :-)

More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss+unsubscribe@googlegroups.com.

David Ostrovsky

unread,
Dec 17, 2015, 2:04:05 AM12/17/15
to Repo and Gerrit Discussion

On Wednesday, December 16, 2015 at 5:28:07 PM UTC+1, lucamilanesio wrote:
I am digging down now on the query killer #1 and I think I've found a point where it gets triggered *a lot* of times:

@Luca, again, do you think, you have found the reason for database queries?
Do you have a proof in form of stack trace for us? As pointed out in my previous
comment, if you have problem to isolate the place, where the queries are triggered,
why don't we instrument gerrit or gwtorm (or both) to dump the stack traces? Let's
not assume any broken places in the code, but only talk about facts (stack traces).

Luca Milanesio

unread,
Dec 17, 2015, 3:18:12 AM12/17/15
to David Ostrovsky, Repo and Gerrit Discussion
Hi David,
I can't possibly generate 41K stack-traces in production, without shooting myself in the foot :-)
You know that generating a stack-trace is quite resource intensive :-O

That's why I started contributing some initial "warning" logs on the most suspicious places and after that I could start tracking down the function calls by reproducing the same cases in a test system.

As usual ... will keep the mailing list posted on the progress of my findings ...

Luca.

Luca Milanesio

unread,
Dec 17, 2015, 3:41:20 AM12/17/15
to David Ostrovsky, Repo and Gerrit Discussion
Just a picture of the past 24h of the SQL hits per minute:


You may notice the 4500 sql calls / minute (75 SQL queries per second) around 7 AM CET, and there wasn't so much traffic going to the site at the moment.
Can't generate 4500 stack traces in production :-(

I have to proceed with code-inspection first ...

Luca.

Matthias Sohn

unread,
Dec 17, 2015, 4:16:34 AM12/17/15
to Luca Milanesio, David Ostrovsky, Repo and Gerrit Discussion
On Thu, Dec 17, 2015 at 9:41 AM, Luca Milanesio <luca.mi...@gmail.com> wrote:
Just a picture of the past 24h of the SQL hits per minute:


You may notice the 4500 sql calls / minute (75 SQL queries per second) around 7 AM CET, and there wasn't so much traffic going to the site at the moment.
Can't generate 4500 stack traces in production :-(

I have to proceed with code-inspection first ...

you could log stack-traces for a small fraction of the requests in order to reduce the overhead.
At this high rate of db queries you should still get enough information from these samples.

-Matthias

Luca Milanesio

unread,
Dec 17, 2015, 4:21:39 AM12/17/15
to Matthias Sohn, David Ostrovsky, Repo and Gerrit Discussion
True, possibly ONLY the ones coming from the changes table.
Let's say one in a thousand would be good enough :-)

Luca.

On 17 Dec 2015, at 09:16, Matthias Sohn <matthi...@gmail.com> wrote:

On Thu, Dec 17, 2015 at 9:41 AM, Luca Milanesio <luca.mi...@gmail.com> wrote:
Just a picture of the past 24h of the SQL hits per minute:

<Screen Shot 2015-12-17 at 08.36.56.png>

You may notice the 4500 sql calls / minute (75 SQL queries per second) around 7 AM CET, and there wasn't so much traffic going to the site at the moment.
Can't generate 4500 stack traces in production :-(

I have to proceed with code-inspection first ...

you could log stack-traces for a small fraction of the requests in order to reduce the overhead.
At this high rate of db queries you should still get enough information from these samples.

-Matthias

Luca Milanesio

unread,
Dec 17, 2015, 4:40:50 AM12/17/15
to Matthias Sohn, David Ostrovsky, Repo and Gerrit Discussion
Saw the work of starred changes cache started by Edwin [1], why not indexing them as well?

Luca Milanesio

unread,
Dec 17, 2015, 5:50:02 AM12/17/15
to Matthias Sohn, David Ostrovsky, Repo and Gerrit Discussion
... instrumenting now ... will post laster today the stack-traces so that we can comment on "facts" rather than code-inspection-based guesses :-)

Luca.

On 17 Dec 2015, at 09:21, Luca Milanesio <Luca.Mi...@gmail.com> wrote:

David Ostrovsky

unread,
Dec 17, 2015, 5:53:33 AM12/17/15
to Repo and Gerrit Discussion

Am Donnerstag, 17. Dezember 2015 11:50:02 UTC+1 schrieb lucamilanesio:
... instrumenting now ... will post laster today the stack-traces so that we can comment on "facts" rather than code-inspection-based guesses :-)


+1. Can't wait to see the "facts" ;-)

Luca Milanesio

unread,
Dec 17, 2015, 6:51:41 AM12/17/15
to David Ostrovsky, Repo and Gerrit Discussion
Stack-traces coming ... there are actuall 10 different scenarios where a Gerrit query (which isn't supposed to be resolved on the DB) triggers a Get change by ID query on the DB:

#1 - First conflicts query 

>> com.google.gerrit.server.query.change.ChangeQueryBuilder.parseChange(ChangeQueryBuilder.java:975) >> com.google.gerrit.server.query.change.ChangeQueryBuilder.conflicts(ChangeQueryBuilder.java:479) >> sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source) >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> java.lang.reflect.Method.invoke(Method.java:606) >> com.google.gerrit.server.query.QueryBuilder$ReflectionFactory.create(QueryBuilder.java:352) >> com.google.gerrit.server.query.QueryBuilder.operator(QueryBuilder.java:273) >> com.google.gerrit.server.query.QueryBuilder.operator(QueryBuilder.java:258) >> com.google.gerrit.server.query.QueryBuilder.toPredicate(QueryBuilder.java:227) >> com.google.gerrit.server.query.QueryBuilder.children(QueryBuilder.java:311) >> com.google.gerrit.server.query.QueryBuilder.toPredicate(QueryBuilder.java:217) >> com.google.gerrit.server.query.QueryBuilder.parse(QueryBuilder.java:189) >> com.google.gerrit.server.query.QueryBuilder.parse(QueryBuilder.java:208) >> com.google.gerrit.server.query.change.QueryChanges.query0(QueryChanges.java:144) >> com.google.gerrit.server.query.change.QueryChanges.query(QueryChanges.java:133) >> com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:99) >> com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:40) >> com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:329)

#2 - Access control on changes

>> com.google.gerrit.server.project.ChangeControl$GenericFactory.controlFor(ChangeControl.java:60) >> com.google.gerrit.server.ChangeUtil.findChanges(ChangeUtil.java:350) >> com.google.gerrit.server.change.ChangesCollection.parse(ChangesCollection.java:88) >> com.google.gerrit.server.change.ChangesCollection.parse(ChangesCollection.java:42) >> com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:240) >> javax.servlet.http.HttpServlet.service(HttpServlet.java:729) >> com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) >> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) >> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)

#3 - Another conflicts query 

>> com.google.gerrit.server.query.change.ChangeQueryBuilder.parseChange(ChangeQueryBuilder.java:975) >> com.google.gerrit.server.query.change.ChangeQueryBuilder.conflicts(ChangeQueryBuilder.java:479) >> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> java.lang.reflect.Method.invoke(Method.java:606) >> com.google.gerrit.server.query.QueryBuilder$ReflectionFactory.create(QueryBuilder.java:352) >> com.google.gerrit.server.query.QueryBuilder.operator(QueryBuilder.java:273) >> com.google.gerrit.server.query.QueryBuilder.operator(QueryBuilder.java:258) >> com.google.gerrit.server.query.QueryBuilder.toPredicate(QueryBuilder.java:227) >> com.google.gerrit.server.query.QueryBuilder.children(QueryBuilder.java:311) >> com.google.gerrit.server.query.QueryBuilder.toPredicate(QueryBuilder.java:217) >> com.google.gerrit.server.query.QueryBuilder.parse(QueryBuilder.java:189) >> com.google.gerrit.server.query.QueryBuilder.parse(QueryBuilder.java:208) >> com.google.gerrit.server.query.change.QueryChanges.query0(QueryChanges.java:144) >> com.google.gerrit.server.query.change.QueryChanges.query(QueryChanges.java:133) >> com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:99) >> com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:40) >> com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:329) 

... the list is long, but we can start from here :-)

As I said ... all generated by Gerrit queries, I have the list as well, which is long as it is taken from our production logs.

Luca.

Luca Milanesio

unread,
Dec 17, 2015, 7:14:52 AM12/17/15
to David Ostrovsky, Repo and Gerrit Discussion
My first comments and ideas of improvements (for facilitating the discussion):

On 17 Dec 2015, at 11:51, Luca Milanesio <Luca.Mi...@gmail.com> wrote:

Stack-traces coming ... there are actuall 10 different scenarios where a Gerrit query (which isn't supposed to be resolved on the DB) triggers a Get change by ID query on the DB:

#1 - First conflicts query 

>> com.google.gerrit.server.query.change.ChangeQueryBuilder.parseChange(ChangeQueryBuilder.java:975) >> com.google.gerrit.server.query.change.ChangeQueryBuilder.conflicts(ChangeQueryBuilder.java:479) >> sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source) >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> java.lang.reflect.Method.invoke(Method.java:606) >> com.google.gerrit.server.query.QueryBuilder$ReflectionFactory.create(QueryBuilder.java:352) >> com.google.gerrit.server.query.QueryBuilder.operator(QueryBuilder.java:273) >> com.google.gerrit.server.query.QueryBuilder.operator(QueryBuilder.java:258) >> com.google.gerrit.server.query.QueryBuilder.toPredicate(QueryBuilder.java:227) >> com.google.gerrit.server.query.QueryBuilder.children(QueryBuilder.java:311) >> com.google.gerrit.server.query.QueryBuilder.toPredicate(QueryBuilder.java:217) >> com.google.gerrit.server.query.QueryBuilder.parse(QueryBuilder.java:189) >> com.google.gerrit.server.query.QueryBuilder.parse(QueryBuilder.java:208) >> com.google.gerrit.server.query.change.QueryChanges.query0(QueryChanges.java:144) >> com.google.gerrit.server.query.change.QueryChanges.query(QueryChanges.java:133) >> com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:99) >> com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:40) >> com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:329)

We could easily get the change data from the index here, without having to go to the DB.
The code is actually there but it seems that we are not indexing by the change number (PATH_LEGACY_ID)?

  private static final Pattern PAT_LEGACY_ID = Pattern.compile("^[1-9][0-9]*$");
  private static final Pattern PAT_CHANGE_ID = Pattern.compile("^[iI][0-9a-f]{4,}.*$");



#2 - Access control on changes

>> com.google.gerrit.server.project.ChangeControl$GenericFactory.controlFor(ChangeControl.java:60) >> com.google.gerrit.server.ChangeUtil.findChanges(ChangeUtil.java:350) >> com.google.gerrit.server.change.ChangesCollection.parse(ChangesCollection.java:88) >> com.google.gerrit.server.change.ChangesCollection.parse(ChangesCollection.java:42) >> com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:240) >> javax.servlet.http.HttpServlet.service(HttpServlet.java:729) >> com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) >> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) >> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)

This is the one I have already found by code-inspection: again I can't see any reason why we couldn't get the same info from the Index without having to go to the DB.


#3 - Another conflicts query 

>> com.google.gerrit.server.query.change.ChangeQueryBuilder.parseChange(ChangeQueryBuilder.java:975) >> com.google.gerrit.server.query.change.ChangeQueryBuilder.conflicts(ChangeQueryBuilder.java:479) >> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> java.lang.reflect.Method.invoke(Method.java:606) >> com.google.gerrit.server.query.QueryBuilder$ReflectionFactory.create(QueryBuilder.java:352) >> com.google.gerrit.server.query.QueryBuilder.operator(QueryBuilder.java:273) >> com.google.gerrit.server.query.QueryBuilder.operator(QueryBuilder.java:258) >> com.google.gerrit.server.query.QueryBuilder.toPredicate(QueryBuilder.java:227) >> com.google.gerrit.server.query.QueryBuilder.children(QueryBuilder.java:311) >> com.google.gerrit.server.query.QueryBuilder.toPredicate(QueryBuilder.java:217) >> com.google.gerrit.server.query.QueryBuilder.parse(QueryBuilder.java:189) >> com.google.gerrit.server.query.QueryBuilder.parse(QueryBuilder.java:208) >> com.google.gerrit.server.query.change.QueryChanges.query0(QueryChanges.java:144) >> com.google.gerrit.server.query.change.QueryChanges.query(QueryChanges.java:133) >> com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:99) >> com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:40) >> com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:329) 

Seems the same as case #1, apologies for duplicating it :-)

Feedback is more than welcome !!!

Luca.

lucamilanesio

unread,
Dec 17, 2015, 7:36:11 AM12/17/15
to Repo and Gerrit Discussion, david.o...@gmail.com
Another stack-trace coming ...

#4 - Is starred

com.google.gerrit.server.query.change.IsStarredByPredicate.read(IsStarredByPredicate.java:70) >> com.google.gerrit.server.query.change.AndSource.readImpl(AndSource.java:115) >> com.google.gerrit.server.query.change.AndSource.read(AndSource.java:99) >> com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:186) >> com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:123) >> com.google.gerrit.server.query.change.QueryChanges.query0(QueryChanges.java:144) >> com.google.gerrit.server.query.change.QueryChanges.query(QueryChanges.java:133) >> com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:99) >> com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:40) >> com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:329)


#5 - Changes by project (when calculating the e-tag?)

com.google.gerrit.server.query.change.ChangeData.reloadChange(ChangeData.java:581) >> com.google.gerrit.server.query.change.ChangeData.change(ChangeData.java:571) >> com.google.gerrit.server.git.ChangeSet.changesByProject(ChangeSet.java:89) >> com.google.gerrit.server.git.MergeSuperSet.completeChangeSetWithoutTopic(MergeSuperSet.java:100) >> com.google.gerrit.server.git.MergeSuperSet.completeChangeSet(MergeSuperSet.java:91) >> com.google.gerrit.server.change.GetRevisionActions.getETag(GetRevisionActions.java:71) >> com.google.gerrit.server.change.GetRevisionActions.getETag(GetRevisionActions.java:39) >> com.google.gerrit.httpd.restapi.RestApiServlet.addResourceStateHeaders(RestApiServlet.java:489) >> com.google.gerrit.httpd.restapi.RestApiServlet.configureCaching(RestApiServlet.java:469) >> com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:345)


More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss+unsubscribe@googlegroups.com.

Luca Milanesio

unread,
Dec 17, 2015, 5:26:53 PM12/17/15
to Repo and Gerrit Discussion, david.o...@gmail.com
ping ... any feedback on the "facts" ?

More stack-traces of the intensive queries are coming in a few mins ...

Luca.

To unsubscribe, email repo-discuss...@googlegroups.com

More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.

David Ostrovsky

unread,
Dec 17, 2015, 5:32:25 PM12/17/15
to Repo and Gerrit Discussion, david.o...@gmail.com

On Thursday, December 17, 2015 at 11:26:53 PM UTC+1, lucamilanesio wrote:
ping ... any feedback on the "facts" ?

What would be very helpful, when you could correlate those stack traces
with the action on the UI. Surely we could optimize database usage in many
places, but your original statement (that i agreed with) was: the list changes
screen shouldn't perform 1 + n SQL anti pattern queries. When we (still) do
that, this is a serious bug.

Luca Milanesio

unread,
Dec 17, 2015, 5:41:24 PM12/17/15
to David Ostrovsky, Repo and Gerrit Discussion
Hi David,
I have a list of Gerrit queries that people have configured on their "My" home screen: the execution of those had the mentioned stack-traces as consequences.
At the moment the list is still too wide and I need to narrow them to the "minimal set".

Yes, next step is to identify "which query generated what" and I am planning to get to that point between today and tomorrow, so that we can agree on a set of fixes.

My surprise on the number of queries was that in a single day I have *more SQL queries* to fetch changes by ID on the DB than the overall number of changes in the DB itself.
That sounded strange to me as the index should have them already *and* the search should be (almost entirely) based on index.

For instance, now that we are approaching the end of the day in CET, the "select * from changes where change_id=?" has passed the 50K !!! Really to much DB, CPU and latency for nothing IMHO.
A simple cache or a better index utilisation can increase our server performance by a factor of 4x (at least!)

Luca.

Luca Milanesio

unread,
Dec 17, 2015, 6:51:03 PM12/17/15
to David Ostrovsky, Repo and Gerrit Discussion
I have found another stack-trace with a possibly very easy fix:

PatchLineCommentAccess.draftByAuthor(<userId>) -  > com.google.gerrit.server.PatchLineCommentsUtil.draftByAuthor(PatchLineCommentsUtil.java:275) > com.google.gerrit.server.query.change.HasDraftByPredicate.read(HasDraftByPredicate.java:54) > com.google.gerrit.server.query.change.AndSource.readImpl(AndSource.java:115) > com.google.gerrit.server.query.change.AndSource.read(AndSource.java:99) > com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:186) > com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:123) > com.google.gerrit.server.query.change.QueryChanges.query0(QueryChanges.java:144) > com.google.gerrit.server.query.change.QueryChanges.query(QueryChanges.java:133) > com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:99) > com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:40) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:329)

It seems that the "hasDraftBy" predicate is widely used but the list of drafts comments isn't cached anywhere and is loaded over and over again from the DB all the times.

The Gerrit query that triggers the above stack trace is typically something like:
/changes/?q=has:draft&n=25&O=81

The hit rate is not very high on GerritHub.io, around 10K calls/day, but the caching wouldn't occupy that much memory (most of the people don't leave draft comments opened) *and* will still save quite a few DB calls.

Luca.

David Ostrovsky

unread,
Dec 18, 2015, 2:34:47 AM12/18/15
to Repo and Gerrit Discussion

On Friday, December 18, 2015 at 12:51:03 AM UTC+1, lucamilanesio wrote:
I have found another stack-trace with a possibly very easy fix:

PatchLineCommentAccess.draftByAuthor(<userId>) -  > com.google.gerrit.server.PatchLineCommentsUtil.draftByAuthor(PatchLineCommentsUtil.java:275) > com.google.gerrit.server.query.change.HasDraftByPredicate.read(HasDraftByPredicate.java:54) > com.google.gerrit.server.query.change.AndSource.readImpl(AndSource.java:115) > com.google.gerrit.server.query.change.AndSource.read(AndSource.java:99) > com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:186) > com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:123) > com.google.gerrit.server.query.change.QueryChanges.query0(QueryChanges.java:144) > com.google.gerrit.server.query.change.QueryChanges.query(QueryChanges.java:133) > com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:99) > com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:40) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:329)

It seems that the "hasDraftBy" predicate is widely used but the list of drafts comments isn't cached anywhere and is loaded over and over again from the DB all the times.

The Gerrit query that triggers the above stack trace is typically something like:
/changes/?q=has:draft&n=25&O=81

The hit rate is not very high on GerritHub.io, around 10K calls/day, but the caching wouldn't occupy that much memory (most of the people don't leave draft comments opened) *and* will still save quite a few DB calls.


lucamilanesio

unread,
Dec 18, 2015, 5:11:49 AM12/18/15
to Repo and Gerrit Discussion
@David thanks for https://gerrit-review.googlesource.com/#/c/73470 that would save *A LOT* of SQL calls :-)

Dave Borowitz

unread,
Dec 18, 2015, 12:37:35 PM12/18/15
to Luca Milanesio, David Ostrovsky, Repo and Gerrit Discussion
On Thu, Dec 17, 2015 at 7:14 AM, Luca Milanesio <luca.mi...@gmail.com> wrote:
My first comments and ideas of improvements (for facilitating the discussion):

On 17 Dec 2015, at 11:51, Luca Milanesio <Luca.Mi...@gmail.com> wrote:

Stack-traces coming ... there are actuall 10 different scenarios where a Gerrit query (which isn't supposed to be resolved on the DB) triggers a Get change by ID query on the DB:

#1 - First conflicts query 

>> com.google.gerrit.server.query.change.ChangeQueryBuilder.parseChange(ChangeQueryBuilder.java:975) >> com.google.gerrit.server.query.change.ChangeQueryBuilder.conflicts(ChangeQueryBuilder.java:479) >> sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source) >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> java.lang.reflect.Method.invoke(Method.java:606) >> com.google.gerrit.server.query.QueryBuilder$ReflectionFactory.create(QueryBuilder.java:352) >> com.google.gerrit.server.query.QueryBuilder.operator(QueryBuilder.java:273) >> com.google.gerrit.server.query.QueryBuilder.operator(QueryBuilder.java:258) >> com.google.gerrit.server.query.QueryBuilder.toPredicate(QueryBuilder.java:227) >> com.google.gerrit.server.query.QueryBuilder.children(QueryBuilder.java:311) >> com.google.gerrit.server.query.QueryBuilder.toPredicate(QueryBuilder.java:217) >> com.google.gerrit.server.query.QueryBuilder.parse(QueryBuilder.java:189) >> com.google.gerrit.server.query.QueryBuilder.parse(QueryBuilder.java:208) >> com.google.gerrit.server.query.change.QueryChanges.query0(QueryChanges.java:144) >> com.google.gerrit.server.query.change.QueryChanges.query(QueryChanges.java:133) >> com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:99) >> com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:40) >> com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:329)

We could easily get the change data from the index here, without having to go to the DB.
The code is actually there but it seems that we are not indexing by the change number (PATH_LEGACY_ID)?

  private static final Pattern PAT_LEGACY_ID = Pattern.compile("^[1-9][0-9]*$");
  private static final Pattern PAT_CHANGE_ID = Pattern.compile("^[iI][0-9a-f]{4,}.*$");



#2 - Access control on changes

>> com.google.gerrit.server.project.ChangeControl$GenericFactory.controlFor(ChangeControl.java:60) >> com.google.gerrit.server.ChangeUtil.findChanges(ChangeUtil.java:350) >> com.google.gerrit.server.change.ChangesCollection.parse(ChangesCollection.java:88) >> com.google.gerrit.server.change.ChangesCollection.parse(ChangesCollection.java:42) >> com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:240) >> javax.servlet.http.HttpServlet.service(HttpServlet.java:729) >> com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) >> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) >> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)

This is the one I have already found by code-inspection: again I can't see any reason why we couldn't get the same info from the Index without having to go to the DB.

This is 100% intentional. We need the result of  GET /changes/X/detail to return the latest values for the change even if the index is stale.

Dave Borowitz

unread,
Dec 18, 2015, 12:43:39 PM12/18/15
to lucamilanesio, Repo and Gerrit Discussion, David Ostrovsky

General comment: these stack traces are interesting, but please do not assume that we should *always* trust the secondary index. It's called secondary for a reason. In general the only places where we definitely want to trust the secondary index is when the performance of looking everything up in the database would be prohibitive.

Examples of places we are sure we want to trust the index for performance reasons:
-Search results page
-Mapping Change-Id (alone or in a triplet) to change number. (*Just* for the mapping--not necessarily for reading the change once we've done the mapping. See my previous reply.)

Pretty short list, huh? :)

On Thu, Dec 17, 2015 at 7:36 AM, lucamilanesio <luca.mi...@gmail.com> wrote:
Another stack-trace coming ...

#4 - Is starred

com.google.gerrit.server.query.change.IsStarredByPredicate.read(IsStarredByPredicate.java:70) >> com.google.gerrit.server.query.change.AndSource.readImpl(AndSource.java:115) >> com.google.gerrit.server.query.change.AndSource.read(AndSource.java:99) >> com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:186) >> com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:123) >> com.google.gerrit.server.query.change.QueryChanges.query0(QueryChanges.java:144) >> com.google.gerrit.server.query.change.QueryChanges.query(QueryChanges.java:133) >> com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:99) >> com.google.gerrit.server.query.change.QueryChanges.apply(QueryChanges.java:40) >> com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:329)


#5 - Changes by project (when calculating the e-tag?)

com.google.gerrit.server.query.change.ChangeData.reloadChange(ChangeData.java:581) >> com.google.gerrit.server.query.change.ChangeData.change(ChangeData.java:571) >> com.google.gerrit.server.git.ChangeSet.changesByProject(ChangeSet.java:89) >> com.google.gerrit.server.git.MergeSuperSet.completeChangeSetWithoutTopic(MergeSuperSet.java:100) >> com.google.gerrit.server.git.MergeSuperSet.completeChangeSet(MergeSuperSet.java:91) >> com.google.gerrit.server.change.GetRevisionActions.getETag(GetRevisionActions.java:71) >> com.google.gerrit.server.change.GetRevisionActions.getETag(GetRevisionActions.java:39) >> com.google.gerrit.httpd.restapi.RestApiServlet.addResourceStateHeaders(RestApiServlet.java:489) >> com.google.gerrit.httpd.restapi.RestApiServlet.configureCaching(RestApiServlet.java:469) >> com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:345)

I think for this one we also want to reload the change. The ETag of the revision actions resource, which includes the submit button, is dependent on the full set of changes that will be merged together. If the approvals or a topic of a dependent change brought in by MergeSuperSet changes, we need to know about that.

We might be able to tweak the set of requested fields in this case to reduce db requests? Haven't thought it through fully.
 
To unsubscribe, email repo-discuss...@googlegroups.com

More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.


--
--
To unsubscribe, email repo-discuss...@googlegroups.com

More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.

Luca Milanesio

unread,
Dec 18, 2015, 1:05:37 PM12/18/15
to Dave Borowitz, Repo and Gerrit Discussion, David Ostrovsky
Hi Dave,
thanks for clarifying this :-) this helps a lot in deciding which direction to take to improve the DB performance.
Can we just then introduce some extra caching for the worst hit queries?

... again, it impacts over the 80% of our servers load and many times we are making the same exact queries over and over again, thousands of times :-(

Luca.

Dave Borowitz

unread,
Dec 18, 2015, 1:15:48 PM12/18/15
to Luca Milanesio, Repo and Gerrit Discussion, David Ostrovsky
On Fri, Dec 18, 2015 at 1:05 PM, Luca Milanesio <luca.mi...@gmail.com> wrote:
Hi Dave,
thanks for clarifying this :-) this helps a lot in deciding which direction to take to improve the DB performance.
Can we just then introduce some extra caching for the worst hit queries?

... again, it impacts over the 80% of our servers load and many times we are making the same exact queries over and over again, thousands of times :-(

Well yeah, I just said we definitely want to trust the index when the performance is prohibitive, so yes, we need to fix this. I am a little scared at the thought of "extra caching" if you're talking about something other than using data stored in the secondary index, but I would need to see some concrete proposals.

I understand that you're just throwing out interesting-looking stack traces from your instrumented server, and that's fine. We just need to focus on the particular stack traces that are actually accounting for all the load, and not get distracted by the ones that aren't.

Luca Milanesio

unread,
Dec 18, 2015, 1:25:15 PM12/18/15
to Dave Borowitz, Repo and Gerrit Discussion, David Ostrovsky
The problem of extra-caching is the memory footprint and potentially a further slowdown of the server because of GCs :-(
That's why I was thinking about trusting the index as much as possible, but I see the point of the fact that index is an index, isn't a data-store or a cache.

There are definitely three top queries (see the monthly numbers associated):

#1 - 1.5M calls / month => select * from changes where change_id=?
#2 - 750K calls / month => select * from patch_sets where change_id=? order by patch_set_id
#3 - 600K calls / month => select * from patch_set_approvals where change_id=?

I can work out the stack traces starting from #1 and *most* of them are pointing to the same directions: stack #1 and #2.

I've seen regular use-cases of the search (invoked as well on the change-screen multiple times) causing the fetch of the same change over and over again, up to 5 times for the same screen.
Even very short-lived cache (let's say only 5 to 10 minutes) would *ease the pain* of the DB quite a lot !

Again, introducing a cache does not mean that you have to necessarily use it :-)

Luca.

Dave Borowitz

unread,
Dec 18, 2015, 1:45:19 PM12/18/15
to Luca Milanesio, Repo and Gerrit Discussion, David Ostrovsky
On Fri, Dec 18, 2015 at 1:25 PM, Luca Milanesio <luca.mi...@gmail.com> wrote:
The problem of extra-caching is the memory footprint and potentially a further slowdown of the server because of GCs :-(
That's why I was thinking about trusting the index as much as possible, but I see the point of the fact that index is an index, isn't a data-store or a cache.

There are definitely three top queries (see the monthly numbers associated):

#1 - 1.5M calls / month => select * from changes where change_id=?
#2 - 750K calls / month => select * from patch_sets where change_id=? order by patch_set_id
#3 - 600K calls / month => select * from patch_set_approvals where change_id=?

How do these numbers compare to the number non-304 change screen views per month? Reloading the change screen must reload the change, and it actually probably reloads the change several times. 1M/month is less than 0.5qps of page views which honestly sounds pretty reasonable for a site with thousands of users.
 
I can work out the stack traces starting from #1 and *most* of them are pointing to the same directions: stack #1 and #2.

Unless it's somehow on every user dashboard, #1 (conflicts predicate) is only loaded from the change screen. I would be willing to entertain the idea of caching just the response to the conflicts search request for a fixed period using HTTP cache headers. But whether this will be at all useful really depends on your request pattern.

#2 I just don't see why you would want to return cached data. This is a REST API request for /changes/<#>. You want to return the latest values stored in the database.
 
I've seen regular use-cases of the search (invoked as well on the change-screen multiple times) causing the fetch of the same change over and over again, up to 5 times for the same screen.
Even very short-lived cache (let's say only 5 to 10 minutes)

That is an enormously long cache length. You realllly do not want anything approaching that cache length unless you're 100% sure you've done the cache invalidation correctly.

One way I might be more ok with a cache is if we split up Change into "immutable" parts (id, key, createdOn, owner, dest) and mutable parts (everything else), so we didn't have to worry about invalidation. (I only say "immutable" because of Qualcomm's new move change feature, but that's a small surface area.)

Dave Borowitz

unread,
Dec 18, 2015, 2:04:46 PM12/18/15
to Luca Milanesio, Repo and Gerrit Discussion, David Ostrovsky
I also want to get a handle on what are the actual problems you're trying to solve here.

"Lots of database queries" is not a problem in and of itself; it's an expected side effect of having more traffic. Are the database queries causing your database server to fall over?

Similarly, I'm not sure what you mean by "87% of our HTTP latency" going to search results. If you mean you added up all the time to service all requests and 87% of that time is in /changes/?q= requests, that doesn't necessarily indicate a problem either. If search results take 200ms to render and a change screen takes 10ms to render, and there is a 1:1 ratio between search result pages and change screens, then yeah, 95% of your latency (200/210) is in rendering search result pages, but so what? 200ms is fast enough users probably still don't care.

Now if we're talking multi-second search result page latency or it's bringing the server to its knees, that's a problem. But reading your first few messages in this thread, that's not what I've seen.

luca.mi...@gmail.com

unread,
Dec 18, 2015, 2:23:33 PM12/18/15
to Dave Borowitz, Repo and Gerrit Discussion, David Ostrovsky
Hi Dave,
The problem is not the total number of queries but the way they are distributed.

There are peaks of thousands of queries without any specific reason, mostly associated with some specific search patterns.

This is a symptom of a bug IMHO.

If I search for change X I am expecting 1x query for this change, not dozens or hundreds for the same key.

Yes, server performance could still be bearable, but this isn't the point. I am aiming on founding and fixing problems that are not impacting only GerritHub.io but potentially other people using Gerrit.

I saw the same peaks at some of our clients installations and I need to understand and focuses gin resolving them.

With regards to response times of the search page, we are talking abounds seconds (up to 5-10 sometimes) not msec :-(

Hope this clarifies.

Luca

Sent from my iPhone

Dave Borowitz

unread,
Dec 18, 2015, 2:29:48 PM12/18/15
to Luca Milanesio, Repo and Gerrit Discussion, David Ostrovsky
On Fri, Dec 18, 2015 at 2:23 PM, <luca.mi...@gmail.com> wrote:
Hi Dave,
The problem is not the total number of queries but the way they are distributed.

There are peaks of thousands of queries without any specific reason, mostly associated with some specific search patterns.

Thanks for pointing this out. This does not sound like the steady high QPS that we would expect from a normally loaded server.
 
This is a symptom of a bug IMHO.

I agree.
 
If I search for change X I am expecting 1x query for this change, not dozens or hundreds for the same key.

By "search for" do you mean enter a search in the search box, or view the change screen? In the search box you should get zero queries for this change.

On the change screen it's actually closer to a dozen than to 1, because we make a dozen requests:

(Hm. Maybe that's an argument for a cache.)

And even if it's a 10-query spike for loading the change screen, it would take a spike of hundreds or more concurrent change screen loads to get up to the thousands of queries you're seeing.
 
Yes, server performance could still be bearable, but this isn't the point. I am aiming on founding and fixing problems that are not impacting only GerritHub.io but potentially other people using Gerrit.

I saw the same peaks at some of our clients installations and I need to understand and focuses gin resolving them.

With regards to response times of the search page, we are talking abounds seconds (up to 5-10 sometimes) not msec :-(

Yep, this is bad. I think some of the stuff I have not commented on (drafts etc.) might be contributing.

Dave Borowitz

unread,
Dec 18, 2015, 2:36:13 PM12/18/15
to Luca Milanesio, Repo and Gerrit Discussion, David Ostrovsky
Trace #2 (reload change to do permissions check on every REST API call) is not a recent regression. Here is the equivalent code in 2.10:

So if we're seeing more of this particular stack trace than we had been in previous versions, there must be something else going on, like in how many requests the UI is making.

luca.mi...@gmail.com

unread,
Dec 18, 2015, 3:23:07 PM12/18/15
to Dave Borowitz, Repo and Gerrit Discussion, David Ostrovsky
I'll start to make some of the instrumentation permanent in production with the reference of the incoming HTTP Url and time stamp, so that I can get a more accurate picture over time.

In parallel will start pushing some proposal of caching if some scenarios, like the trace #2.

It could well be an old regression that we didn't notice in the past because of the more limited audience and load on our servers.

Thanks for your time and help !

Luca 

Sent from my iPhone

Luca Milanesio

unread,
Dec 18, 2015, 7:57:08 PM12/18/15
to Dave Borowitz, Repo and Gerrit Discussion, David Ostrovsky
Will use https://gerrit-review.googlesource.com/#/c/73475 for adding the stack trace of the REST API triggering the "suspected" query methods.

This will generate:
- Thread ID with time-stamp
- REST API URL and parameters
- Review DB method called with parameters
- Overall execution time of the REST API (nsec)

This data will be useful to identify problems and possible optimisations.

Feel free to improve the instrumentation patch :-)

Luca.

David Ostrovsky

unread,
Dec 20, 2015, 5:53:24 AM12/20/15
to Repo and Gerrit Discussion, dbor...@google.com, david.o...@gmail.com

On Friday, December 18, 2015 at 9:23:07 PM UTC+1, lucamilanesio wrote:
I'll start to make some of the instrumentation permanent in production with the reference of the incoming HTTP Url and time stamp, so that I can get a more accurate picture over time.


Applied your patch, and at least on my test instance, no database hit,
when accessing change list.

However, after accessing the change screen and leavng it open for a while,
I'm seeing repeated database queries:

ChangeAccess.get(1927) -  > com.google.gerrit.server.project.ChangeControl$GenericFactory.controlFor(ChangeControl.java:60) > com.google.gerrit.server.ChangeUtil.findChanges(ChangeUtil.java:351) > com.google.gerrit.server.change.ChangesCollection.parse(ChangesCollection.java:88) > com.google.gerrit.server.change.ChangesCollection.parse(ChangesCollection.java:1) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:244) - /changes/1927/detail?O=10004 - 130335803001

This query is needed to be able to perform ETag comparisson and decide
if new response should be send to the client or just 304 (NOT_MODIFIED):

Thread [HTTP-63] (Suspended)
owns: GuiceFilter$Context  (id=118)
RestApiServlet.notModified(HttpServletRequest, RestResource, RestView<RestResource>) line: 461
ChangesRestApiServlet(RestApiServlet).service(HttpServletRequest, HttpServletResponse) line: 320
ChangesRestApiServlet(HttpServlet).service(ServletRequest, ServletResponse) line: 729
ServletDefinition.doServiceImpl(HttpServletRequest, HttpServletResponse) line: 287
ServletDefinition.doService(ServletRequest, ServletResponse) line: 277
ServletDefinition.service(ServletRequest, ServletResponse) line: 182
ManagedServletPipeline.service(ServletRequest, ServletResponse) line: 91
FilterChainInvocation.doFilter(ServletRequest, ServletResponse) line: 85
GetUserFilter.doFilter(ServletRequest, ServletResponse, FilterChain) line: 82
FilterChainInvocation.doFilter(ServletRequest, ServletResponse) line: 82
CacheControlFilter.doFilter(ServletRequest, ServletResponse, FilterChain) line: 73
FilterChainInvocation.doFilter(ServletRequest, ServletResponse) line: 82
RunAsFilter.doFilter(ServletRequest, ServletResponse, FilterChain) line: 117
FilterChainInvocation.doFilter(ServletRequest, ServletResponse) line: 82
RequestMetricsFilter.doFilter(ServletRequest, ServletResponse, FilterChain) line: 60
FilterChainInvocation.doFilter(ServletRequest, ServletResponse) line: 82
AllRequestFilter$FilterProxy$1.doFilter(ServletRequest, ServletResponse) line: 136
GerritMonitoringFilter$JavamelodyFilter(MonitoringFilter).doFilter(FilterChain, HttpServletRequest, HttpServletResponse) line: 203
GerritMonitoringFilter$JavamelodyFilter(MonitoringFilter).doFilter(ServletRequest, ServletResponse, FilterChain) line: 181
GerritMonitoringFilter.doFilter(ServletRequest, ServletResponse, FilterChain) line: 65
AllRequestFilter$FilterProxy$1.doFilter(ServletRequest, ServletResponse) line: 132
AllRequestFilter$FilterProxy.doFilter(ServletRequest, ServletResponse, FilterChain) line: 138
FilterChainInvocation.doFilter(ServletRequest, ServletResponse) line: 82
RequestContextFilter.doFilter(ServletRequest, ServletResponse, FilterChain) line: 75
FilterChainInvocation.doFilter(ServletRequest, ServletResponse) line: 82
ManagedFilterPipeline.dispatch(ServletRequest, ServletResponse, FilterChain) line: 119
GuiceFilter$1.call() line: 133
GuiceFilter$1.call() line: 130
GuiceFilter$Context.call(Callable<T>) line: 203
GuiceFilter.doFilter(ServletRequest, ServletResponse, FilterChain) line: 130
ServletHandler$CachedChain.doFilter(ServletRequest, ServletResponse) line: 1652
ServletHandler.doHandle(String, Request, HttpServletRequest, HttpServletResponse) line: 585
SessionHandler.doHandle(String, Request, HttpServletRequest, HttpServletResponse) line: 221
ServletContextHandler(ContextHandler).doHandle(String, Request, HttpServletRequest, HttpServletResponse) line: 1127
ServletHandler.doScope(String, Request, HttpServletRequest, HttpServletResponse) line: 515
SessionHandler.doScope(String, Request, HttpServletRequest, HttpServletResponse) line: 185
ServletContextHandler(ContextHandler).doScope(String, Request, HttpServletRequest, HttpServletResponse) line: 1061
ServletContextHandler(ScopedHandler).handle(String, Request, HttpServletRequest, HttpServletResponse) line: 141
RequestLogHandler(HandlerWrapper).handle(String, Request, HttpServletRequest, HttpServletResponse) line: 97
RequestLogHandler.handle(String, Request, HttpServletRequest, HttpServletResponse) line: 95
Server(HandlerWrapper).handle(String, Request, HttpServletRequest, HttpServletResponse) line: 97
Server.handle(HttpChannel<?>) line: 499
HttpConnection$HttpChannelOverHttp(HttpChannel<T>).handle() line: 310
HttpConnection.onFillable() line: 257
AbstractConnection$2.run() line: 540
QueuedThreadPool.runJob(Runnable) line: 635
QueuedThreadPool$3.run() line: 555
Thread.run() line: 745

But, let say we have 10k active users on heavy loaded gerrit site,
each of them open one single change and wait, we would get 10k
queries every 30 seconds (default refresh rate).

I would say, there is a lot room for improvements here. Immediately
meassure would be to shut down change refresh feature per configuration.
Quoting the documentation here:

[[change.updateDelay]]change.updateDelay::
+
How often in seconds the web interface should poll for updates to the
currently open change.  The poller relies on the client's browser
cache to use If-Modified-Since and respect `304 Not Modified` HTTP
responses.  This allows for fast polls, often under 8 milliseconds.
+
With a configured 30 second delay a server with 4900 active users will
typically need to dedicate 1 CPU to the update check.  4900 users
divided by an average delay of 30 seconds is 163 requests arriving per
second.  If requests are served at \~6 ms response time, 1 CPU is
necessary to keep up with the update request traffic.  On a smaller
user base of 500 active users, the default 30 second delay is only 17
requests per second and requires ~10% CPU.
+
If 0 the update polling is disabled.
+
Default is 30 seconds.

 

luca.mi...@gmail.com

unread,
Dec 20, 2015, 8:45:13 AM12/20/15
to David Ostrovsky, Repo and Gerrit Discussion, dbor...@google.com
Hi David,
Thanks for the input: I've rolled out to prod the instrumented patch and will have some feedback by mid next week.

The change you suggested should ease a bit the traffic, maybe by raising to 5 minutes the refresh interval, assuming that is the cause of the overload.

I doubt however that it could generate the peaks I was showing in the graphs, as you should have thousands of users having their browsers opened on a specific change exactly at the same time: possible but unlikely IMHO.

Logs will tell the real story :-)

Luca

Sent from my iPhone
--

David Ostrovsky

unread,
Dec 21, 2015, 2:50:03 AM12/21/15
to Repo and Gerrit Discussion, dbor...@google.com

On Sunday, December 20, 2015 at 2:45:13 PM UTC+1, lucamilanesio wrote:
Hi David,
Thanks for the input: I've rolled out to prod the instrumented patch and will have some feedback by mid next week.

The change you suggested should ease a bit the traffic, maybe by raising to 5 minutes the refresh interval, assuming that is the cause of the overload.

I doubt however that it could generate the peaks I was showing in the graphs, as you should have thousands of users having their browsers opened on a specific change exactly at the same time: possible but unlikely IMHO.

Not quite.

I guess you have misunderstood this feature. Change refresh should catch all kind of updates,
that why it doesn't matter what changes other user opened (if at all). Once a change is opened,
the polling is started every 30 sec, no matter what other users are doing. So to correct your
statement: "thousands of users having their browsers opened on some changes in any time".

1k user opened any changes and leave them for say 1 hour opened, would generate
120k queries with default gerrit setup.


Luca Milanesio

unread,
Dec 21, 2015, 3:25:02 AM12/21/15
to David Ostrovsky, Repo and Gerrit Discussion, dbor...@google.com
Hi David,
I know the feature, but I was saying that I believe is unlikely that all the users have their browsers constantly opened to a page change for hours :-)

However I am extracting the first set of logs from the week-end, around 8K hits, not many as over the week-end the activity is quite low.

Luca.

Luca Milanesio

unread,
Dec 21, 2015, 3:58:49 AM12/21/15
to David Ostrovsky, Repo and Gerrit Discussion, dbor...@google.com
Hi David,
I have actually found some of the users having a single change opened for hours :-O

See below the TS of the REST API associated to a specific change-id:

2015-12-21 05:13:56 INFO  
2015-12-21 05:14:26 INFO  
2015-12-21 05:14:56 INFO  
2015-12-21 05:15:26 INFO  
2015-12-21 05:15:56 INFO  
2015-12-21 05:16:26 INFO  
2015-12-21 05:16:56 INFO  
2015-12-21 05:17:26 INFO  
2015-12-21 05:18:01 INFO  
2015-12-21 05:18:26 INFO  
2015-12-21 05:18:56 INFO  
2015-12-21 05:19:26 INFO  

Those calls create however a sort of "background noise" not really any peak. For sure I can ease the background noise by increasing the refresh time and possibly some of the redundant calls I see are due to this noise.

Luca.

Luca Milanesio

unread,
Dec 21, 2015, 4:26:15 AM12/21/15
to David Ostrovsky, Repo and Gerrit Discussion, dbor...@google.com
I was wondering, even in this scenario: imagine you have change Nr. 123 opened and you've already been notified that there is something new available.

From the first notification, you you know that your screen is outdated: why on earth does the client keep on polling? Even if there are even more new updates available, is your UX going to change because of that?
Once the screen has been notified of being outdated, we should automatically stop polling IMHO.

Any disagreement on that?

Luca.

lucamilanesio

unread,
Dec 21, 2015, 4:28:52 AM12/21/15
to Repo and Gerrit Discussion, david.o...@gmail.com, dbor...@google.com
... and if there are no updates to a change, the polling time should automatically and gradually increase: so if you left the window opened for hours, it is very unlikely that you are closely watching at it.

Polling every 30' for a change opened for 2 hours, is going to be acceptable anyway. On the other side if you just opened the change, first polling could remain at 30s as you are very likely to look at it.

Luca.

More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss+unsubscribe@googlegroups.com.

lucamilanesio

unread,
Dec 21, 2015, 4:49:48 AM12/21/15
to Repo and Gerrit Discussion, david.o...@gmail.com, dbor...@google.com
Looking at the code there is already a check if the user is active or not on the screen:

See the UpdateCheckTime method:

  @Override

  public void run() {

    if (!screen.isAttached()) {

      // screen should have cancelled this timer.

      cancel();

      return;

    } else if (running) {

      return;

    }


    running = true;

    screen.loadChangeInfo(false, new AsyncCallback<ChangeInfo>() {

      @Override

      public void onSuccess(ChangeInfo info) {

        running = false;

        screen.showUpdates(info);


        int d = UserActivityMonitor.isActive()

            ? POLL_PERIOD

            : IDLE_PERIOD;

        if (d != delay) {

          delay = d;

          schedule();

        }

      }


Idle period is 2h.


Something isn't quite working as I see the polling going on for hours for some users.

Will try to reproduce the problem locally as there is possibly some bug somewhere ...


Luca

lucamilanesio

unread,
Dec 21, 2015, 5:09:44 AM12/21/15
to Repo and Gerrit Discussion, david.o...@gmail.com, dbor...@google.com
Based on the code, a user should become "inactive" if he has no actions for 10 minutes:

  private static final long TIMEOUT = 10 * 60 * 1000;


    @Override

    public boolean execute() {

      long now = System.currentTimeMillis();

      if (recent) {

        if (!active) {

          ValueChangeEvent.fire(this, active);

        }

        recent = false;

        active = true;

        last = now;

      } else if (active && (now - last) > TIMEOUT) {

        active = false;

        ValueChangeEvent.fire(this, false);

      }

      return true;

    }


Luca.

lucamilanesio

unread,
Dec 21, 2015, 5:36:16 AM12/21/15
to Repo and Gerrit Discussion, david.o...@gmail.com, dbor...@google.com
It actually works: after 10' of user's inactivity, the polling stops.

The background noise generated by the polling is thus negligible, as quite unlikely 7K users will have a single change opened all the time for hours and keep on interacting with it once every 10'.

I can see the pattern of the polls and exclude them from my analysis as well as it wasn't a query after all :-)
(triggers a /changes/NNNN/detail?O=10004 REST call, quite inexpensive anyway, less than 4msec to get executed).

Luca.

David Ostrovsky

unread,
Dec 21, 2015, 5:47:39 AM12/21/15
to Repo and Gerrit Discussion

Am Montag, 21. Dezember 2015 11:36:16 UTC+1 schrieb lucamilanesio:
It actually works: after 10' of user's inactivity, the polling stops.


OK, I missed that polling stops after 10'. This inactivity timeout could be reduced or made configurable.

The background noise generated by the polling is thus negligible, as quite unlikely 7K users will have a single change opened all the time for hours and keep on interacting with it once every 10'.

I can see the pattern of the polls and exclude them from my analysis as well as it wasn't a query after all :-)

Why it wasn't a query? As it can be seen in my stack trace, 304 response will be generated
for /changes/NNNN/detail?O=10004 REST call, and ETag must be computed for that and thus
db record must be retrieved.

Luca Milanesio

unread,
Dec 21, 2015, 5:54:15 AM12/21/15
to David Ostrovsky, Repo and Gerrit Discussion
Sorry, I see one query at the beginning:
/changes/?q=status:open+is:mergeable+conflicts:NNNN&O=a

During the polling, no further queries though: this is what the logs tell me :-)

Luca.

--
--
To unsubscribe, email repo-discuss...@googlegroups.com

More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.

Luca Milanesio

unread,
Dec 21, 2015, 9:06:43 AM12/21/15
to David Ostrovsky, Repo and Gerrit Discussion
Just to clarify: during the refresh time (first 10') there was no Gerrit query triggered, but only one SQL query related to the refresh REST call: /changes/NNNN/detail?O=10004.
That query isn't expensive at all (less than 10msec) and it does not represent a problem to us.

Luca.

On 21 Dec 2015, at 10:47, David Ostrovsky <david.o...@gmail.com> wrote:

--
--
To unsubscribe, email repo-discuss...@googlegroups.com

More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.

Dave Borowitz

unread,
Dec 21, 2015, 1:56:28 PM12/21/15
to Luca Milanesio, David Ostrovsky, Repo and Gerrit Discussion
On Mon, Dec 21, 2015 at 4:26 AM, Luca Milanesio <luca.mi...@gmail.com> wrote:
I was wondering, even in this scenario: imagine you have change Nr. 123 opened and you've already been notified that there is something new available.

From the first notification, you you know that your screen is outdated: why on earth does the client keep on polling? Even if there are even more new updates available, is your UX going to change because of that?
Once the screen has been notified of being outdated, we should automatically stop polling IMHO.

Any disagreement on that?

No disagreement here.

An additional potential feature on top of that might be to actually reload the change screen if the user is inactive. That is, don't surprise them by reloading in the middle of doing something, but if they haven't looked in a while, it's ok to reload behind their back. (This assumes the code to detect inactivity is working of course.)

Shawn Pearce

unread,
Dec 21, 2015, 3:18:33 PM12/21/15
to Dave Borowitz, Luca Milanesio, David Ostrovsky, Repo and Gerrit Discussion
On Mon, Dec 21, 2015 at 10:56 AM, 'Dave Borowitz' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:


On Mon, Dec 21, 2015 at 4:26 AM, Luca Milanesio <luca.mi...@gmail.com> wrote:
I was wondering, even in this scenario: imagine you have change Nr. 123 opened and you've already been notified that there is something new available.

From the first notification, you you know that your screen is outdated: why on earth does the client keep on polling? Even if there are even more new updates available, is your UX going to change because of that?
Once the screen has been notified of being outdated, we should automatically stop polling IMHO.

Any disagreement on that?

No disagreement here.

It keeps polling to get more names to show updates from. So just "Update from Jenkins" may not be interesting, but "Update from Jekins, Dave Borowitz" now has more of my interest because a human has replied.


An additional potential feature on top of that might be to actually reload the change screen if the user is inactive. That is, don't surprise them by reloading in the middle of doing something, but if they haven't looked in a while, it's ok to reload behind their back. (This assumes the code to detect inactivity is working of course.)

True, I've thought about this and kind of wanted it, but it the loss of context may be confusing if I switch tabs to research something (making this tab idle), but I come back thinking I'm going to navigate to a file so I can add a comment or delete a draft. Refreshing just because Jenkins voted or there is a new patch set could be undesirable.

Luca Milanesio

unread,
Dec 21, 2015, 5:32:26 PM12/21/15
to Shawn Pearce, Dave Borowitz, David Ostrovsky, Repo and Gerrit Discussion
On 21 Dec 2015, at 20:18, Shawn Pearce <s...@google.com> wrote:

On Mon, Dec 21, 2015 at 10:56 AM, 'Dave Borowitz' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:


On Mon, Dec 21, 2015 at 4:26 AM, Luca Milanesio <luca.mi...@gmail.com> wrote:
I was wondering, even in this scenario: imagine you have change Nr. 123 opened and you've already been notified that there is something new available.

From the first notification, you you know that your screen is outdated: why on earth does the client keep on polling? Even if there are even more new updates available, is your UX going to change because of that?
Once the screen has been notified of being outdated, we should automatically stop polling IMHO.

Any disagreement on that?

No disagreement here.

It keeps polling to get more names to show updates from. So just "Update from Jenkins" may not be interesting, but "Update from Jekins, Dave Borowitz" now has more of my interest because a human has replied.

OK, makes sense.
Anyway as the information isn't really critical (aka not a read for update) just having the input from an index or a cached entry could be good enough.

Looking at our logs however, is unlikely that the polling is the cause of the spikes and the fetches are quite quick, simple straight select by primary key, very quick.



An additional potential feature on top of that might be to actually reload the change screen if the user is inactive. That is, don't surprise them by reloading in the middle of doing something, but if they haven't looked in a while, it's ok to reload behind their back. (This assumes the code to detect inactivity is working of course.)

True, I've thought about this and kind of wanted it, but it the loss of context may be confusing if I switch tabs to research something (making this tab idle), but I come back thinking I'm going to navigate to a file so I can add a comment or delete a draft. Refreshing just because Jenkins voted or there is a new patch set could be undesirable.

I personally don't like pages that auto-refresh by themselves :-) Better to be notified that something has changed as it does today. Possibly in a more visible way :-)

Luca.

lucamilanesio

unread,
Dec 22, 2015, 5:48:28 AM12/22/15
to Repo and Gerrit Discussion, s...@google.com, dbor...@google.com, david.o...@gmail.com
I collected all the logs from yesterday: I ran in production with the instrumented patch, the performance penalty was still acceptable and I got very useful information back :-)

I've noticed some REST API calls that seem to trigger the fetching of the same change from DB multiple times, over and over again. This may be an initial justification of the peaks I see at times on the number of SQL calls.

For instance I saw a single call to /changes/256052/reviewers that triggered (in the order):

 ChangeAccess.get(256052) 

 ChangeAccess.get(256052) 

 ChangeAccess.get(256052) 

 PatchSetAccess.byChange(256052) 

 PatchLineCommentAccess.byChange(256052) 

 PatchSetAccess.byChange(256052) 

 PatchSetAccess.byChange(256052) 

 PatchSetAccess.byChange(256052) 

 PatchSetAccess.byChange(256052) 

 PatchSetAccess.byChange(256052) 

 PatchSetAccess.byChange(256052) 

 PatchSetAccess.byChange(256052) 

 PatchSetAccess.byChange(256052) 

 PatchSetAccess.byChange(256052) 

 ChangeAccess.get(256052) 

 ChangeAccess.get(256052) 

 PatchSetAccess.byChange(256052) 

 ChangeAccess.get(256052) 

 ChangeAccess.get(256052) 

 PatchSetAccess.byChange(256052) 

 ChangeAccess.get(256052) 

 ChangeAccess.get(256052) 

 PatchSetAccess.byChange(256052) 

 ChangeAccess.get(256052) 

 ChangeAccess.get(256052) 

 PatchSetAccess.byChange(256052) 

 ChangeAccess.get(256052) 

 ChangeAccess.get(256052) 

 PatchSetAccess.byChange(256052) 

 ChangeAccess.get(256052) 

 ChangeAccess.get(256052) 

 PatchSetAccess.byChange(256052) 

 ChangeAccess.get(256052) 

 ChangeAccess.get(256052) 

 PatchSetAccess.byChange(256052) 

 ChangeAccess.get(256052) 

 ChangeAccess.get(256052) 

 PatchSetAccess.byChange(256052) 

 ChangeAccess.get(256052) 

 ChangeAccess.get(256052) 

 PatchSetAccess.byChange(256052) 

 ChangeAccess.get(256052) 

 ChangeAccess.get(256052)


... that sounds a bug to me :-)


Luca.

lucamilanesio

unread,
Dec 22, 2015, 5:59:17 AM12/22/15
to Repo and Gerrit Discussion, s...@google.com, dbor...@google.com, david.o...@gmail.com
The total call took around 0.6 seconds to execute (still quite fast :-O) but regarding peak, it generated a 86 TPS because of a single REST call!

If we are unlucky, and we get ten users doing the same thing at the same time (with around 7K users on the platform, it may happen), then a peak of DB queries is quite likely to happen.

In *this particular case*, do we need to reload the same change (and the list of patches) over 40 times in a single call?
Even a very simple thread-local temporary cache on changes would save *hundreds* of TPS and ease the load on the DB by a couple of order of magnitudes!

Feedback is more than welcome as usual :-)

Luca.

Luca Milanesio

unread,
Dec 22, 2015, 6:46:07 AM12/22/15
to Repo and Gerrit Discussion, Shawn Pearce, dbor...@google.com, david.o...@gmail.com
Some details of the stack-traces of each calls:

Call #1 - REST API collection resolver

 ChangeAccess.get(256052)
  > com.google.gerrit.server.project.ChangeControl$GenericFactory.controlFor(ChangeControl.java:60) > com.google.gerrit.server.ChangeUtil.findChanges(ChangeUtil.java:351) > com.google.gerrit.server.change.ChangesCollection.parse(ChangesCollection.java:88) > com.google.gerrit.server.change.ChangesCollection.parse(ChangesCollection.java:42) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:244)

Call #2 - Adding reviewer to a change

 ChangeAccess.get(256052)
  > com.google.gerrit.server.ChangeUtil.bumpRowVersionNotLastUpdatedOn(ChangeUtil.java:137) > com.google.gerrit.server.change.PostReviewers.addReviewers(PostReviewers.java:236) > com.google.gerrit.server.change.PostReviewers.putGroup(PostReviewers.java:214) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:141) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:68) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:340)
 ChangeAccess.get(256052)

Call #3 - Update Lucene Index

 ChangeAccess.get(256052)
  > com.google.gerrit.server.query.change.ChangeData.reloadChange(ChangeData.java:581) > com.google.gerrit.server.query.change.ChangeData.change(ChangeData.java:571) > com.google.gerrit.server.index.ChangeField$3.get(ChangeField.java:96) > com.google.gerrit.server.index.ChangeField$3.get(ChangeField.java:92) > com.google.gerrit.server.index.Schema$1.apply(Schema.java:169) > com.google.gerrit.server.index.Schema$1.apply(Schema.java:164) > com.google.common.collect.Iterators$8.transform(Iterators.java:817) > com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48) > com.google.common.collect.Iterators$7.computeNext(Iterators.java:674) > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) > com.google.gerrit.lucene.LuceneChangeIndex.toDocument(LuceneChangeIndex.java:597) > com.google.gerrit.lucene.LuceneChangeIndex.replace(LuceneChangeIndex.java:316) > com.google.gerrit.server.index.ChangeIndexer.index(ChangeIndexer.java:158) > com.google.gerrit.server.index.ChangeIndexer$IndexTask.call(ChangeIndexer.java:242) > com.google.gerrit.server.index.ChangeIndexer$IndexTask.call(ChangeIndexer.java:203) > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) > com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:310) > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132) > com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61) > com.google.gerrit.server.index.ChangeIndexer.submit(ChangeIndexer.java:200) > com.google.gerrit.server.index.ChangeIndexer.indexAsync(ChangeIndexer.java:133) > com.google.gerrit.server.change.PostReviewers.addReviewers(PostReviewers.java:247) > com.google.gerrit.server.change.PostReviewers.putGroup(PostReviewers.java:214) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:141) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:68) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:340)
 PatchSetAccess.byChange(256052)

Call #4 - Update Lucene Index

 PatchSetAccess.byChange(256052)
  > com.google.gerrit.server.query.change.ChangeData.patchSets(ChangeData.java:688) > com.google.gerrit.server.query.change.ChangeData.currentPatchSet(ChangeData.java:598) > com.google.gerrit.server.query.change.ChangeData.currentFilePaths(ChangeData.java:445) > com.google.gerrit.server.index.ChangeField.getFileParts(ChangeField.java:254) > com.google.gerrit.server.index.ChangeField$16.get(ChangeField.java:294) > com.google.gerrit.server.index.ChangeField$16.get(ChangeField.java:290) > com.google.gerrit.server.index.Schema$1.apply(Schema.java:169) > com.google.gerrit.server.index.Schema$1.apply(Schema.java:164) > com.google.common.collect.Iterators$8.transform(Iterators.java:817) > com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48) > com.google.common.collect.Iterators$7.computeNext(Iterators.java:674) > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) > com.google.gerrit.lucene.LuceneChangeIndex.toDocument(LuceneChangeIndex.java:597) > com.google.gerrit.lucene.LuceneChangeIndex.replace(LuceneChangeIndex.java:316) > com.google.gerrit.server.index.ChangeIndexer.index(ChangeIndexer.java:158) > com.google.gerrit.server.index.ChangeIndexer$IndexTask.call(ChangeIndexer.java:242) > com.google.gerrit.server.index.ChangeIndexer$IndexTask.call(ChangeIndexer.java:203) > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) > com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:310) > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132) > com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61) > com.google.gerrit.server.index.ChangeIndexer.submit(ChangeIndexer.java:200) > com.google.gerrit.server.index.ChangeIndexer.indexAsync(ChangeIndexer.java:133) > com.google.gerrit.server.change.PostReviewers.addReviewers(PostReviewers.java:247) > com.google.gerrit.server.change.PostReviewers.putGroup(PostReviewers.java:214) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:141) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:68) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:340)

Call #5 - Update Lucene Index

 PatchLineCommentAccess.byChange(256052)
  > com.google.gerrit.server.PatchLineCommentsUtil.publishedByChange(PatchLineCommentsUtil.java:151) > com.google.gerrit.server.query.change.ChangeData.publishedComments(ChangeData.java:735) > com.google.gerrit.server.index.ChangeField$27.get(ChangeField.java:559) > com.google.gerrit.server.index.ChangeField$27.get(ChangeField.java:554) > com.google.gerrit.server.index.Schema$1.apply(Schema.java:169) > com.google.gerrit.server.index.Schema$1.apply(Schema.java:164) > com.google.common.collect.Iterators$8.transform(Iterators.java:817) > com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48) > com.google.common.collect.Iterators$7.computeNext(Iterators.java:674) > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) > com.google.gerrit.lucene.LuceneChangeIndex.toDocument(LuceneChangeIndex.java:597) > com.google.gerrit.lucene.LuceneChangeIndex.replace(LuceneChangeIndex.java:316) > com.google.gerrit.server.index.ChangeIndexer.index(ChangeIndexer.java:158) > com.google.gerrit.server.index.ChangeIndexer$IndexTask.call(ChangeIndexer.java:242) > com.google.gerrit.server.index.ChangeIndexer$IndexTask.call(ChangeIndexer.java:203) > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) > com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:310) > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132) > com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61) > com.google.gerrit.server.index.ChangeIndexer.submit(ChangeIndexer.java:200) > com.google.gerrit.server.index.ChangeIndexer.indexAsync(ChangeIndexer.java:133) > com.google.gerrit.server.change.PostReviewers.addReviewers(PostReviewers.java:247) > com.google.gerrit.server.change.PostReviewers.putGroup(PostReviewers.java:214) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:141) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:68) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:340)

Call #6 to #14 - Format JSON result (repeated 8 times)

 PatchSetAccess.byChange(256052)
  > com.google.gerrit.server.query.change.ChangeData.patchSets(ChangeData.java:688) > com.google.gerrit.server.query.change.ChangeData.currentPatchSet(ChangeData.java:598) > com.google.gerrit.server.change.ReviewerJson.format(ReviewerJson.java:110) > com.google.gerrit.server.change.PostReviewers.addReviewers(PostReviewers.java:251) > com.google.gerrit.server.change.PostReviewers.putGroup(PostReviewers.java:214) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:141) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:68) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:340)

Call #15 - Send e-mail notification to reviewer

 ChangeAccess.get(256052)
  > com.google.gerrit.server.query.change.ChangeData.reloadChange(ChangeData.java:581) > com.google.gerrit.server.query.change.ChangeData.change(ChangeData.java:571) > com.google.gerrit.server.mail.ChangeEmail.<init>(ChangeEmail.java:74) > com.google.gerrit.server.mail.NewChangeSender.<init>(NewChangeSender.java:35) > com.google.gerrit.server.mail.AddReviewerSender.<init>(AddReviewerSender.java:32) > com.google.gerrit.server.mail.AddReviewerSender$$FastClassByGuice$$efc519d2.newInstance(<generated>) > com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40) > com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:61) > com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:105) > com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85) > com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:267) > com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016) > com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092) > com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012) > com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:770) > com.sun.proxy.$Proxy61.create(Unknown Source) > com.google.gerrit.server.change.PostReviewers.emailReviewers(PostReviewers.java:285) > com.google.gerrit.server.change.PostReviewers.addReviewers(PostReviewers.java:258) > com.google.gerrit.server.change.PostReviewers.putGroup(PostReviewers.java:214) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:141) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:68) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:340)

Call #16 - Initialise the Change e-mail object

 ChangeAccess.get(256052)
  > com.google.gerrit.server.patch.PatchSetInfoFactory.get(PatchSetInfoFactory.java:75) > com.google.gerrit.server.mail.ChangeEmail.init(ChangeEmail.java:151) > com.google.gerrit.server.mail.NewChangeSender.init(NewChangeSender.java:48) > com.google.gerrit.server.mail.AddReviewerSender.init(AddReviewerSender.java:37) > com.google.gerrit.server.mail.OutgoingEmail.send(OutgoingEmail.java:93) > com.google.gerrit.server.change.PostReviewers.emailReviewers(PostReviewers.java:288) > com.google.gerrit.server.change.PostReviewers.addReviewers(PostReviewers.java:258) > com.google.gerrit.server.change.PostReviewers.putGroup(PostReviewers.java:214) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:141) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:68) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:340)

Call #17 - Init change hook runner

 PatchSetAccess.byChange(256052)
  > com.google.gerrit.server.query.change.ChangeData.patchSets(ChangeData.java:688) > com.google.gerrit.server.query.change.ChangeData.currentPatchSet(ChangeData.java:598) > com.google.gerrit.server.query.change.ChangeData.loadCommitData(ChangeData.java:665) > com.google.gerrit.server.query.change.ChangeData.commitMessage(ChangeData.java:628) > com.google.gerrit.server.events.EventFactory.asChangeAttribute(EventFactory.java:133) > com.google.gerrit.common.ChangeHookRunner.doReviewerAddedHook(ChangeHookRunner.java:629) > com.google.gerrit.server.change.PostReviewers.addReviewers(PostReviewers.java:263) > com.google.gerrit.server.change.PostReviewers.putGroup(PostReviewers.java:214) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:141) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:68) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:340)

Call #18 - Compile change hook object

 ChangeAccess.get(256052)
  > com.google.gerrit.server.patch.PatchSetInfoFactory.get(PatchSetInfoFactory.java:75) > com.google.gerrit.server.events.EventFactory.asPatchSetAttribute(EventFactory.java:449) > com.google.gerrit.common.ChangeHookRunner.asPatchSetAttribute(ChangeHookRunner.java:342) > com.google.gerrit.common.ChangeHookRunner.doReviewerAddedHook(ChangeHookRunner.java:630) > com.google.gerrit.server.change.PostReviewers.addReviewers(PostReviewers.java:263) > com.google.gerrit.server.change.PostReviewers.putGroup(PostReviewers.java:214) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:141) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:68) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:340)

Call #19 - Compile change hook object

 ChangeAccess.get(256052)
  > com.google.gerrit.server.events.EventFactory.asPatchSetAttribute(EventFactory.java:459) > com.google.gerrit.common.ChangeHookRunner.asPatchSetAttribute(ChangeHookRunner.java:342) > com.google.gerrit.common.ChangeHookRunner.doReviewerAddedHook(ChangeHookRunner.java:630) > com.google.gerrit.server.change.PostReviewers.addReviewers(PostReviewers.java:263) > com.google.gerrit.server.change.PostReviewers.putGroup(PostReviewers.java:214) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:141) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:68) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:340)

Call #20 - Compile change hook object

 PatchSetAccess.byChange(256052)
  > com.google.gerrit.server.query.change.ChangeData.patchSets(ChangeData.java:688) > com.google.gerrit.server.query.change.ChangeData.currentPatchSet(ChangeData.java:598) > com.google.gerrit.server.query.change.ChangeData.loadCommitData(ChangeData.java:665) > com.google.gerrit.server.query.change.ChangeData.commitMessage(ChangeData.java:628) > com.google.gerrit.server.events.EventFactory.asChangeAttribute(EventFactory.java:133) > com.google.gerrit.common.ChangeHookRunner.doReviewerAddedHook(ChangeHookRunner.java:629) > com.google.gerrit.server.change.PostReviewers.addReviewers(PostReviewers.java:263) > com.google.gerrit.server.change.PostReviewers.putGroup(PostReviewers.java:214) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:141) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:68) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:340)

Call #21 - Compile change hook object

 ChangeAccess.get(256052)
  > com.google.gerrit.server.patch.PatchSetInfoFactory.get(PatchSetInfoFactory.java:75) > com.google.gerrit.server.events.EventFactory.asPatchSetAttribute(EventFactory.java:449) > com.google.gerrit.common.ChangeHookRunner.asPatchSetAttribute(ChangeHookRunner.java:342) > com.google.gerrit.common.ChangeHookRunner.doReviewerAddedHook(ChangeHookRunner.java:630) > com.google.gerrit.server.change.PostReviewers.addReviewers(PostReviewers.java:263) > com.google.gerrit.server.change.PostReviewers.putGroup(PostReviewers.java:214) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:141) > com.google.gerrit.server.change.PostReviewers.apply(PostReviewers.java:68) > com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:340)

... the remaining 22 calls are all about finishing up to compile a very nice Change object for a "potential hook" that will never be fired, as there are no hook listeners to this project !

I believe  there is a lot of space for improvement :-)

Luca.

lucamilanesio

unread,
Dec 23, 2015, 5:23:29 AM12/23/15
to Repo and Gerrit Discussion, s...@google.com, dbor...@google.com, david.o...@gmail.com
Posted [1] to start a "short-lived" cache exercise on changes, that would start ease the pain of peaks during REST API operations.
Reviews and feedback are more than welcome :-)


Luca.

More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss+unsubscribe@googlegroups.com.

David Ostrovsky

unread,
Dec 23, 2015, 5:35:05 AM12/23/15
to Repo and Gerrit Discussion

Thanks, Luca. You picked really interesting REST handler case,
adding reviewers as a group. The code, wasn't optimized for
accepting a group, and what happens there is N-queries anti
pattern for one single database record.

Am Dienstag, 22. Dezember 2015 11:59:17 UTC+1 schrieb lucamilanesio:
The total call took around 0.6 seconds to execute (still quite fast :-O) but regarding peak, it generated a 86 TPS because of a single REST call!

If we are unlucky, and we get ten users doing the same thing at the same time (with around 7K users on the platform, it may happen), then a peak of DB queries is quite likely to happen.


This is  even worse: the amount of SQL queries isn't constant with some upper limit N,
but depends on the member number in the group: 1k members: 1k queries for single
REST call.
 
In *this particular case*, do we need to reload the same change (and the list of patches) over 40 times in a single call?
Even a very simple thread-local temporary cache on changes would save *hundreds* of TPS and ease the load on the DB by a couple of order of magnitudes!


Thread local wouldn't actually help here (not to mention this is always ugly).
Let's check the code:

    update.commit();

=> Here the mutation of the database record was completed.
and the caching of the corresponding ChangeData instance could
be implemented.

    CheckedFuture<?, IOException> indexFuture =
        indexer.indexAsync(rsrc.getId());

=> However, indexing (and retrieval of ChangeData instance) done asynchronously,
but ChangeData instance (and its loading machinery) is not thread safe:

    result.reviewers = Lists.newArrayListWithCapacity(added.size());

    for (PatchSetApproval psa : added) {
      // New reviewers have value 0, don't bother normalizing.
      result.reviewers.add(json.format(
          new ReviewerInfo(psa.getAccountId()),
          reviewers.get(psa.getAccountId()),
          ImmutableList.of(psa)));
    }


=> To render JSON result another ChangeData instance is created:
    accountLoaderFactory.create(true).fill(result.reviewers);

    indexFuture.checkedGet();

=> Email machinery creates another new fresh ChangeData instance:
    emailReviewers(rsrc.getChange(), added);
    if (!added.isEmpty()) {
      PatchSet patchSet = dbProvider.get().patchSets().get(rsrc.getChange().currentPatchSetId());

=> This loop is the most annoying part here and causes N times loading of new ChangeData
instances for exactly the same database record:

      for (PatchSetApproval psa : added) {
        Account account = accountCache.get(psa.getAccountId()).getAccount();
        hooks.doReviewerAddedHook(rsrc.getChange(), account, patchSet, dbProvider.get());
      }
    }

Very naive approach to optimize for adding a group as reviewer use case would be to expose
new method in ChangeHooks interface doReviewersAddedHook:

    void doReviewersAddedHook(Change change, Collection<Account> account,
      PatchSet patchSet, ReviewDb db) throws OrmException;

Even more annoying in the hooks/event triggering code above is the fact, that all those N
different ChangeData retrievals were in vain, at least on my test instance, on most recent
master with a couple plugins installed. There are no hooks deployed at all, and only reviewers
plugin and core MetricListener contribute Change listeners, but the reviewers plugin only
cares about PatchSetCreatedEvent:

@Override
  public void onEvent(Event event) {
    if (!(event instanceof PatchSetCreatedEvent)) {
      return;
    }
    [...]

And MetricListener only counts and increments the type of the event, without actually consuming
the event itself. So i wonder if we could use some smart guava Suppliers approach here, that only
retrieves the data, when it is actually used and not only to keep the database, Git and GC busy.


Luca Milanesio

unread,
Dec 23, 2015, 5:47:12 AM12/23/15
to David Ostrovsky, Repo and Gerrit Discussion
Hey David, thanks for your analysis ... I did not actually realised that was made for a group of people :-O
That explains even more the peaks I see, the are as high as the cardinality of the group.

On 23 Dec 2015, at 10:35, David Ostrovsky <david.o...@gmail.com> wrote:


Thanks, Luca. You picked really interesting REST handler case,
adding reviewers as a group. The code, wasn't optimized for
accepting a group, and what happens there is N-queries anti
pattern for one single database record.

Ouch !


Am Dienstag, 22. Dezember 2015 11:59:17 UTC+1 schrieb lucamilanesio:
The total call took around 0.6 seconds to execute (still quite fast :-O) but regarding peak, it generated a 86 TPS because of a single REST call!

If we are unlucky, and we get ten users doing the same thing at the same time (with around 7K users on the platform, it may happen), then a peak of DB queries is quite likely to happen.


This is  even worse: the amount of SQL queries isn't constant with some upper limit N,
but depends on the member number in the group: 1k members: 1k queries for single
REST call.

See how many SQL queries peaks in the past 24h


Many of them are around 500 calls/min but some of them reach 4.5K calls/min (which is 75 SQL queries per second, similar to the situation I've shown in my case).
From the call pattern I suspected there was a loop somewhere :-O


Very naive approach to optimize for adding a group as reviewer use case would be to expose
new method in ChangeHooks interface doReviewersAddedHook:

    void doReviewersAddedHook(Change change, Collection<Account> account,
      PatchSet patchSet, ReviewDb db) throws OrmException;

Even more annoying in the hooks/event triggering code above is the fact, that all those N
different ChangeData retrievals were in vain, at least on my test instance, on most recent
master with a couple plugins installed. There are no hooks deployed at all, and only reviewers
plugin and core MetricListener contribute Change listeners, but the reviewers plugin only
cares about PatchSetCreatedEvent:

@Override
  public void onEvent(Event event) {
    if (!(event instanceof PatchSetCreatedEvent)) {
      return;
    }
    [...]

And MetricListener only counts and increments the type of the event, without actually consuming
the event itself. So i wonder if we could use some smart guava Suppliers approach here, that only
retrieves the data, when it is actually used and not only to keep the database, Git and GC busy.



Reply all
Reply to author
Forward
0 new messages