Searchfox Update: Searchfox searches are now faster!

Andrew Sutherland

Jul 19, 2022, 2:37:33 AM
to dev-pl...@mozilla.org
Have you found yourself saying this very recently?: "Wow, the
statistical distribution of the latencies of my searchfox searches feels
like it has shifted noticeably towards zero, particularly at the higher
percentiles!  But how is it possible for searchfox to be any faster? 
Doesn't that pose some kind of risk to the space-time continuum?  Sure,
I want fast searches, but at what cost?  At what cost?!"

Good news!  Your uncanny intuition for statistical distributions has
been proved right!  Your searchfox mozilla-central searches should now
be noticeably faster!

Here's the OLD search request distribution for mozilla-central for last
Thursday's UTC10-ish run which was up from ~18 UTC to ~6 UTC:

cache_status  _count  p50   p66   p75   p90   p95   p99
-------------------------------------------------------
MISS          2025    0.18  0.38  0.73  2.76  4.23  9.12
HIT           114     0     0     0     0     0     0

And here's the NEW search request distribution for this Monday's
UTC22-ish run which was up from ~6 UTC to ~18 UTC:

cache_status  _count  p50   p66   p75   p90   p95   p99
-------------------------------------------------------
MISS          3396    0.06  0.09  0.17  0.79  1.92  2.74
HIT           219     0     0     0     0     0     0
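
For anyone who wants to build a table like the ones above from their own request logs, here is a minimal sketch. It assumes each log entry reduces to a (cache_status, latency) pair with the latency in seconds, and it uses the nearest-rank percentile definition; the original tables may have been computed differently, and the sample data is made up.

```python
import math
from collections import defaultdict

# Same percentile columns as the tables above.
PERCENTILES = (50, 66, 75, 90, 95, 99)


def nearest_rank(sorted_values, pct):
    """Return the nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(requests):
    """Group (cache_status, seconds) pairs and compute per-status percentiles."""
    by_status = defaultdict(list)
    for status, seconds in requests:
        by_status[status].append(seconds)

    summary = {}
    for status, values in by_status.items():
        values.sort()
        row = {"_count": len(values)}
        for pct in PERCENTILES:
            row[f"p{pct}"] = nearest_rank(values, pct)
        summary[status] = row
    return summary


# Made-up sample data, NOT real searchfox logs.
sample = [
    ("MISS", 0.06),
    ("MISS", 0.09),
    ("MISS", 0.17),
    ("MISS", 2.74),
    ("HIT", 0.0),
]

for status, row in summarize(sample).items():
    print(status, row)
```

With so few samples the high percentiles all collapse onto the slowest request, which is a good reminder that p95 and p99 only mean much once there are a few thousand requests behind them.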

Potentially interesting context:

- The bulk of the search time is always the fulltext (regexp-enabled)
livegrep codesearch (because this is a hard problem and livegrep is
magic but still constrained by the rules of our universe until
searchfox's new fast searches open a gateway into others).  If you just
want an (exact!) identifier search, you can just prefix your query with
`id:` and bypass the fulltext search.  Surprisingly, this also makes the
path filter mechanism work against semantic results, whereas if you use
a path filter for a default search, it disables identifier lookup. 
There's more detail on how the existing "search" endpoint's parsing
works at https://bugzilla.mozilla.org/show_bug.cgi?id=1762817#c0.  The
forthcoming "query" mechanism will allow some more flexibility in this
space.

- Searchfox fulltext queries have always been made to the livegrep
codesearch server with a 10 second deadline, which is why the 99th
percentile for the old day can't really get above 10 seconds; work
just stops.  That said, anecdotally, queries rarely hit the deadline,
because the 1000 result limit would usually intervene first.  The
queries that did hit it commonly also involved I/O delay, and were
commonly incremental search queries made as typing occurred, so
subsequent queries would no longer be I/O limited thanks to caching.
For mozilla-central we no longer expect to see such delays because we
now eagerly cache the entire fulltext database into memory and have
increased effective parallelism by a factor of 4-8.

- Searchfox caches all dynamic queries, both search and otherwise, for
the remainder of the server's 12 hour life, so that if you experience
slowness on a query and you share the link, whoever you share it with
should not experience that same slowness!

- More details on the optimizations and investigations can be found at
https://bugzilla.mozilla.org/show_bug.cgi?id=1779672 for the first pass
at optimization and https://bugzilla.mozilla.org/show_bug.cgi?id=1779899
for the 2nd pass.
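
To make the `id:` prefix from the first bullet concrete, here is a small sketch that builds a shareable identifier-search link. The URL layout (a `search` endpoint taking `q` and `path` parameters) is an assumption based on what searchfox search links usually look like, the helper name is hypothetical, and `nsIObserver` and `dom/` are just example values; compare against a real link from your browser before relying on it.

```python
from urllib.parse import urlencode

# Assumed URL layout; verify against a real searchfox search link.
BASE = "https://searchfox.org/mozilla-central/search"


def identifier_search_url(identifier, path_filter=""):
    """Build a link for an exact identifier search, skipping fulltext search.

    Hypothetical helper: it only prefixes the query with `id:` as described
    in the first bullet and optionally adds a path filter.
    """
    params = {"q": f"id:{identifier}"}
    if path_filter:
        params["path"] = path_filter
    return f"{BASE}?{urlencode(params)}"


# Example values only.
print(identifier_search_url("nsIObserver", "dom/"))
```

Since dynamic queries are cached for the life of the server, a link built this way should also be cheap for whoever you send it to.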

Andrew
