New search engine (secondary index) live on gerrit.googlesource.com


Dave Borowitz

Sep 6, 2013, 3:57:26 PM
to repo-discuss
We just turned on secondary index searches for gerrit.googlesource.com; it's live in the western US and should be rolling out globally in the next few minutes.

This means you can take all the new search operators for a spin, like file, message, etc. Try it out and let us know what you think, or if you run into anything that looks like an inconsistency.

(We know latency is still an issue due to the slowness of our primary database, but we hope to improve this in the not-too-distant future by storing enough data in the secondary index to render the dashboard without hitting the database.)

Dave Borowitz

Sep 6, 2013, 3:59:12 PM
to repo-discuss
I may have spoken too soon; it looks like we have an issue rendering user dashboards, so I will roll back temporarily.

Dave Borowitz

Sep 6, 2013, 5:48:08 PM
to repo-discuss

David Pursehouse

Sep 9, 2013, 11:34:01 PM
to repo-d...@googlegroups.com
On 09/07/2013 04:57 AM, Dave Borowitz wrote:
> We just turned on secondary index searches for gerrit.googlesource.com
> <http://gerrit.googlesource.com>; it's live in the western US and should
> be rolling out globally in the next few minutes.
>

When do you plan to switch it on for android-review.googlesource.com?

Dave Borowitz

Sep 9, 2013, 11:34:55 PM
to David Pursehouse, repo-discuss

Hopefully this week, maybe next.

--
--
To unsubscribe, email repo-discuss+unsubscribe@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

--- You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

David Pursehouse

Sep 10, 2013, 1:31:22 AM
to Dave Borowitz, repo-discuss
On 09/10/2013 12:34 PM, Dave Borowitz wrote:
> Hopefully this week, maybe next.
>
Great. I know several people who will find that extremely useful...

David Ostrovsky

Sep 10, 2013, 2:17:38 AM
to repo-d...@googlegroups.com, repo-discuss


On Friday, September 6, 2013 21:57:26 UTC+2, Dave Borowitz wrote:
> We just turned on secondary index searches for gerrit.googlesource.com; it's live in the western US and should be rolling out globally in the next few minutes.
>
> This means you can take all the new search operators for a spin, like file, message, etc. Try it out and let us know what you think, or if you run into anything that looks like an inconsistency.

It seems like file:^regex is still broken. Gerrit-review seems to disable it entirely:

"regular expression queries not supported"

And running it on a local system with latest master and enabled secondary index support produces the following error:

"line 1:17 no viable alternative at character '%'"

I think it worked already for me.

Shawn Pearce

Sep 10, 2013, 5:32:37 PM
to David Ostrovsky, repo-discuss
On Mon, Sep 9, 2013 at 11:17 PM, David Ostrovsky
<david.o...@gmail.com> wrote:
> Am Freitag, 6. September 2013 21:57:26 UTC+2 schrieb Dave Borowitz:
>>
>> We just turned on secondary index searches for gerrit.googlesource.com;
>> it's live in the western US and should be rolling out globally in the next
>> few minutes.
>>
>> This means you can take all the new search operators for a spin, like
>> file, message, etc. Try it out and let us know what you think, or if you run
>> into anything that looks like an inconsistency.
>
>
> It seems like file:^regex is still broken. Gerrit-review seems to disable it
> entirely:
>
> "regular expression queries not supported"

This is correct; our secondary indexing implementation is not able to
efficiently yield candidate terms for a regex so regex searches are
disabled in our installation.
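
The thread does not describe how the index on googlesource.com stores its terms, so the following is only a toy sketch of the general problem, assuming a sorted term dictionary. The term list and both helper functions are hypothetical. A prefix query can jump straight to its candidates with two binary searches, while an arbitrary regex gives no such starting point and every term has to be tested.

```python
import bisect
import re

# Hypothetical term dictionary: file paths kept in sorted order.
TERMS = sorted([
    "gerrit-server/BUCK",
    "gerrit-server/src/main/java/Publish.java",
    "sw/BUCK",
    "sw/lib/util.c",
    "tools/BUCK",
])


def prefix_candidates(terms, prefix):
    """Return all terms starting with prefix using two binary searches."""
    lo = bisect.bisect_left(terms, prefix)
    # "\uffff" sorts after any character used in these paths, so it marks
    # the end of the range of terms sharing the prefix.
    hi = bisect.bisect_left(terms, prefix + "\uffff")
    return terms[lo:hi]


def regex_candidates(terms, pattern):
    """Return all terms matching pattern; each term must be checked."""
    compiled = re.compile(pattern)
    return [term for term in terms if compiled.search(term)]


print(prefix_candidates(TERMS, "sw/"))
print(regex_candidates(TERMS, r"^.*/BUCK"))
```

In this sketch the prefix lookup touches only a logarithmic number of terms before slicing, whereas the regex lookup is linear in the size of the dictionary, which is one plausible reading of "not able to efficiently yield candidate terms".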

Once it's live on android-review Dave is going to look at some
alternatives that might get closer to supporting regex.

> And running it on a local system with latest master and enabled secondary
> index support produces the following error:
>
> "line 1:17 no viable alternative at character '%'"
>
> I think it worked already for me.

This sounds like a bug in master that should be tracked down and fixed.

Edwin Kempin

Sep 11, 2013, 8:37:09 AM
to Shawn Pearce, David Ostrovsky, repo-discuss



2013/9/10 Shawn Pearce <s...@google.com>
Have you tried it with a regexp that doesn't contain '%'? At least this works for me.

> This sounds like a bug in master that should be tracked down and fixed.


David Ostrovsky

Sep 11, 2013, 10:16:35 AM
to repo-d...@googlegroups.com, repo-discuss

On Wednesday, September 11, 2013 2:37:09 PM UTC+2, Edwin Kempin wrote:

>> And running it on a local system with latest master and enabled secondary
>> index support produces the following error:
>>
>> "line 1:17 no viable alternative at character '%'"
>>
>> I think it worked already for me.
> Have you tried it with a regexp that doesn't contain '%'? At least this works for me.

I haven't used '%' at all; patterns like

file:^sw/.*
or
file:^.*/BUCK

are failing for me with the error above.

Dave Borowitz

Sep 11, 2013, 10:18:06 AM
to David Ostrovsky, repo-discuss
Sounds like an issue with your browser or reverse proxy encoding those special chars. What is the request URL that search corresponds to?



David Ostrovsky

Sep 11, 2013, 4:11:39 PM
to repo-d...@googlegroups.com, repo-discuss


On Wednesday, September 11, 2013 16:18:06 UTC+2, Dave Borowitz wrote:
> Sounds like an issue with your browser or reverse proxy encoding those special chars. What is the request URL that search corresponds to?


So it doesn't hurt to use the right browser: I had activated the buck_daemon_ui_firefox debug configuration but was using Chrome.

As a result, the query was issued twice:

1: [file:^.*BUCK]
2: [file:%5E.*BUCK]

With the right browser the "file:^regex" pattern works as expected.
I also noticed that `reindex` throws a lot of errors for me, but it still generates the index data [1].
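
The two query strings above differ only in that '^' has been percent-encoded as '%5E'. The following is a minimal sketch of that relationship using Python's standard library. It does not reproduce Gerrit's code; where the second form was produced is not established in this thread, and the helper `looks_percent_encoded` is hypothetical.

```python
from urllib.parse import quote, unquote

QUERY = "file:^.*BUCK"

# Percent-encode the query. ':' and '*' are marked safe only so that the
# result matches the second request shown in the message above.
encoded = quote(QUERY, safe=":*")


def looks_percent_encoded(query):
    """Heuristic: the string changes when percent-decoding is applied."""
    return unquote(query) != query


print(encoded)           # the form containing '%5E'
print(unquote(encoded))  # decoding once restores the original query
```

A query parser that receives the encoded form without decoding it would see a literal '%' character, which is consistent with the "no viable alternative at character '%'" error quoted earlier.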


Dave Borowitz

Sep 11, 2013, 4:15:04 PM
to David Ostrovsky, repo-discuss
Thanks for the report. This one AFAICT is harmless:
  org.h2.jdbc.JdbcSQLException: Error opening database: "Sleep interrupted" [8000-173]
I think it's what happens if you open and then close a database connection too quickly.

The ones about loading plugins we should address. I'm assuming this is built from master as of this morning, when I submitted a change to load plugins in pgm.Reindex. Apparently that doesn't work for everyone. Blame Guice.

David Ostrovsky

Sep 11, 2013, 4:40:57 PM
to repo-d...@googlegroups.com, Dave Borowitz, repo-discuss


On Wednesday, September 11, 2013 22:15:04 UTC+2, Dave Borowitz wrote:
> The ones about loading plugins we should address. I'm assuming this is built from master as of this morning, when I submitted a change to load plugins in pgm.Reindex. Apparently that doesn't work for everyone. Blame guice.
 

Yes, reverting 05b254feb29c83f73f23410edbe104bcf37e164b helps here [1].


David Ostrovsky

Sep 12, 2013, 1:52:06 AM
to repo-d...@googlegroups.com, Dave Borowitz, repo-discuss


On Wednesday, September 11, 2013 22:15:04 UTC+2, Dave Borowitz wrote:
> The ones about loading plugins we should address.
 

David Ostrovsky

Sep 13, 2013, 4:41:36 PM
to repo-d...@googlegroups.com, repo-discuss


On Friday, September 6, 2013 21:57:26 UTC+2, Dave Borowitz wrote:
> We just turned on secondary index searches for gerrit.googlesource.com; it's live in the western US and should be rolling out globally in the next few minutes.
>
> This means you can take all the new search operators for a spin, like file, message, etc. Try it out and let us know what you think, or if you run into anything that looks like an inconsistency.


That looks like an inconsistency:

Lucene seems to behave differently from the index behind Gerrit-review.
Its default analyzer uses the English stop word set and deliberately ignores all of these words:

"a", "an", "and", "are", "as", "at", "be", "but", "by",
[...]
"they", "this", "to", "was", "will", "with"

As mentioned in the comment on this change [1], the "message" predicate itself and all other
predicates that depend on it, i.e. "patch", are currently affected.
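
To illustrate the effect, here is a toy sketch of stop word filtering. It is not Lucene's analyzer: the `analyze` function is hypothetical, tokenization is a plain whitespace split, and the set contains only the words quoted above (the elided part of the list is deliberately left out).

```python
# Only the stop words quoted in the message above; the part elided with
# "[...]" is not reconstructed here.
QUOTED_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by",
    "they", "this", "to", "was", "will", "with",
}


def analyze(text, stop_words=QUOTED_STOP_WORDS):
    """Toy analyzer: lowercase, split on whitespace, drop stop words."""
    return [token for token in text.lower().split() if token not in stop_words]


print(analyze("This will be fixed by a revert"))
print(analyze("to be"))
```

Under these assumptions a search consisting only of stop words, such as "to be", is reduced to no terms at all, so it cannot match anything by those words, while an index that keeps them could.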

Dave Borowitz

Sep 13, 2013, 4:45:38 PM
to David Ostrovsky, repo-discuss
I am fine tweaking the Lucene analyzer to produce better results, assuming we can agree on such a definition, but it's a non-goal to make Lucene match our Google-internal proprietary search engine exactly on full-text searches (obviously exact matches are a different story). I think this is probably also true of any other secondary index implementation that may arise.

Similarly if you have any queries that you think produce poor results on googlesource.com I'll be happy to discuss them with the search engine team.



David Ostrovsky

Sep 30, 2013, 5:15:28 PM
to repo-d...@googlegroups.com, repo-discuss
It seems that reindexing of published draft changes (with only one patch set) is broken.
(The code is there, though: Publish.java:78.) After publishing a draft change on CS2,
it still appears as Draft in the change list on Gerrit-Review [1][2]. And if I log out,
I cannot see it at all.

Dave Borowitz

Sep 30, 2013, 5:31:27 PM
to David Ostrovsky, repo-discuss
There's an issue with our indexing system unrelated to draft changes. Fixing it now, thanks for the report.
