Solr - Speed improvements?


Eric Larson

Jul 18, 2012, 5:40:10 PM
to blacklight-...@googlegroups.com
Hi all,

Here at UW-Madison we were recently crawled so heavily by Baidu and Yandex (simultaneously) that our catalog was brought down -- in fact the user running Solr just flat ran out of processes on the box.

We learned a few interesting things during that incident:
1) Cutting off non-US traffic saved the day (Apache config)
2) You can use iptables to throttle traffic (since implemented)
3) During the incident our ruby processes were timing out waiting for solr to return results
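For anyone curious about point 2, connection throttling with iptables can be sketched roughly like this. This is a hedged example, not the actual UW-Madison rules; the port, connection cap, and rate limits are made up:

```
# Cap each source IP at 20 concurrent connections to port 80:
iptables -A INPUT -p tcp --syn --dport 80 -m connlimit --connlimit-above 20 -j DROP

# Rate-limit new connections overall, with a small burst allowance:
iptables -A INPUT -p tcp --syn --dport 80 -m limit --limit 25/minute --limit-burst 100 -j ACCEPT
iptables -A INPUT -p tcp --syn --dport 80 -j DROP
```

The connlimit rule targets a single aggressive client; the limit rules protect the box when many crawlers arrive at once.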

That last point has stuck with me.  We've since instrumented our app with New Relic (highly recommended) and are seeing quite a few slow solr transaction traces -- especially for queries with lots of tokens, and double especially during garbage collection.

I'm wondering if anyone in the community has spent time fine-tuning their Solr setup?  Anyone load balancing across multiple slave indices?  Anyone sharding their index into multiple cores, leveraging more CPU?  Or playing with CommonGrams as a solution for queries with many tokens?  Anyone stress testing their Solr index?
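For context, CommonGrams is enabled per field type in schema.xml. A hedged sketch (the field type name and words file are illustrative, not from anyone's actual schema): the filter glues common words onto their neighbors at index time, so phrase queries full of stopwords hit fewer, rarer postings lists.

```xml
<fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- index-side: emit both single tokens and common-word bigrams -->
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- query-side: prefer the bigrams, dropping the common single tokens -->
    <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```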

Basically, is anyone proud of their Solr solution and willing to chat about it?  What have you learned and what works well at your institutions?

Cheers,
- Eric

PS: For the record, yes we're _still_ using Solr 1.4, and we've got ~8.6M docs (27GB) in our index.

--
Eric Larson
Digital Library Consultant
UW Digital Collections Center


Naomi Dushay

Jul 18, 2012, 6:15:16 PM
to blacklight-...@googlegroups.com
Eric,

We've experienced some outages fairly recently; our best guess is that they were due to crawlers. What we are doing / have done:

Solr boxes:
  we have a master which we use only for indexing
  we have two slaves which we use only for searching.

  we are at Solr 3.5+; we have 6.8 million docs; our index is 50G, but about 30G of that is marcxml. I've been pretty careful about which fields are indexed and which are stored.
  in the past I put in a little effort to ensure our Solr caches are OK:
      fieldValueCache and filterCache have a 98% or higher hit ratio and no evictions. I followed advice and prewarm these with the appropriate facet fields and values.
      documentCache has about a 60% hit rate
      queryResultCache is turned off -- I could never get a hit ratio over 20%
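For reference, these caches and the prewarming live in solrconfig.xml. A hedged sketch follows; the sizes, the facet field, and the warming query are illustrative, not Stanford's actual settings:

```xml
<filterCache class="solr.FastLRUCache" size="10000" initialSize="2000" autowarmCount="2000"/>
<documentCache class="solr.LRUCache" size="50000" initialSize="5000"/>
<!-- queryResultCache omitted/disabled when its hit ratio stays low -->

<!-- prewarm facet fields so the first request on a new searcher doesn't pay -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">format</str>
    </lst>
  </arr>
</listener>
```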

   here are our current JVM args: -Xms12g -Xmx12g -server -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5
   Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03, mixed mode)

   run under tomcat5

Until very recently, we had a  single box with production rails app only, pointing to a single solr slave.   

The above setup worked great for us, with a few exceptions:
  1.  we discovered that certain queries can tank solr:   e.g. asking for the last page in a huge set of search results   (we took that capability out of our UI)
  2.  we have experienced some outages in the last few months due to crawlers.

We recently spun up load balancing:

Our blacklight application is now load balanced, and we gave each of the 3 app servers more RAM to avoid outages at the app level.

We continued to experience a few outages, so we now have 2 of the app servers pointed at one Solr slave, and 1 app server pointed at the other Solr slave.


We are working on moving our Solr slaves to a load-balanced system. Chris Beer is currently (as in right now) doing load testing. We have to load test our old slaves, our new slaves, and the load balancer going to the slaves.

We are using the load testing to determine how much RAM we will need on the new boxes, etc.
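A minimal way to start that kind of load test is Apache Bench; a hedged one-liner (the host, port, core path, and query are illustrative):

```
# Fire 1000 requests, 10 concurrently, at one slave and watch the latency percentiles.
ab -n 1000 -c 10 "http://solr-slave-1:8983/solr/select?q=history+of+wisconsin&rows=20"
```

Replaying real queries harvested from Solr logs (with a tool like JMeter or siege) gives more representative numbers than a single repeated query, since the caches will otherwise flatter you.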

We are also not yet sure how to best tune our load-balancing algorithm.  

We should know more in about a week.


Thanks for the tip about New Relic -- that could be really helpful.  We've had to resort to analyzing Solr logs in the past … and have learned that the queries with the long response times aren't always the queries that *cause* the long response times.

- Naomi


--
You received this message because you are subscribed to the Google Groups "Blacklight Development" group.
To post to this group, send email to blacklight-...@googlegroups.com.
To unsubscribe from this group, send email to blacklight-develo...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/blacklight-development?hl=en.

Justin Coyne

Jul 18, 2012, 6:18:35 PM
to blacklight-...@googlegroups.com
I've seen improved performance under GC using Java 7 with the G1 garbage collector.  I'd take a look at that as well.

-Justin
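For reference (not from Justin's post): G1 is switched on with a couple of JVM flags under Java 7. A hedged sketch; the heap sizes and pause target are placeholders, not anyone's production settings:

```
# G1 replaces -XX:+UseParallelGC; the pause target is a goal, not a guarantee.
java -server -Xms12g -Xmx12g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 ...
```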

Erik Hatcher

Jul 19, 2012, 8:27:18 AM
to blacklight-...@googlegroups.com

On Jul 18, 2012, at 17:40 , Eric Larson wrote:
> PS: For the record, yes we're _still_ using Solr 1.4, and we've got ~8.6M docs (27GB) in our index.

The best advice I can give here is to upgrade Solr.

Solr 4.0-alpha was recently released. Without a doubt there are dramatic improvements across the board with memory usage, performance, and scalability.

And here's how things shake out memory-wise in comparison to 3x: <http://www.lucidimagination.com/blog/2012/04/06/memory-comparisons-between-solr-3x-and-trunk/>

And most definitely, load balancing is strongly recommended in your situation. Even two round-robined slaves can be a great benefit: offload indexing to one (master) server and let the slaves share the search load.
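For reference, the master/slave setup described here is configured via the ReplicationHandler in solrconfig.xml (Solr 1.4+). A hedged sketch; the host name, poll interval, and conf files are illustrative:

```xml
<!-- On the master (indexing box): -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each search slave: -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://solr-master:8983/solr/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```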

Great advice regarding the JVM and GC was already offered by the others, too.

Erik

Eric Larson

Jul 19, 2012, 10:44:23 AM
to blacklight-...@googlegroups.com
Hi Naomi,

Thank you *so much* for the detailed reply.  This will help us out greatly as we use the remainder of the summer to dive into replication, load balancing and upgrading Solr.

Immediately (and I apologize) more questions come to mind:

* Migrating from Solr 1.4 to Solr 3.X
- Any important gotchas you recall from that effort?
- Have you posted your latest schema and config online? or mind sharing it?

* App hosting environment
- Running Ruby 1.9.3 right? or have you guys gone JRuby?
- Are you serving the app via Apache / Passenger? (we do for Shibboleth support)
- If you're on Passenger, any extra arguments set in your Apache config? (MaxPoolSize, MinInstances, etc.)

* Load balancing between apps
- How is that being done? (haproxy perhaps?)
- How does logging work across these balanced apps? (curious because we rescue and log a lot of bad MARC holdings data in our app)

* Load balancing between solr slaves
- How will that be done?
- Still not sharding the index, right?  Just adding more slaves?

* Is this more-or-less the general load-balanced "request flow"?
- Public-Facing Web Server > Load Balancer > Rails Application Server(s) > Load Balancer > Apache Solr(s)

- - - -

We've got a ton to learn from your production setup.  If it's not already formally documented somewhere, it would be triumphant if it was online.

Thanks again!
- Eric

Eric Larson

Jul 19, 2012, 10:45:16 AM
to blacklight-...@googlegroups.com
Thanks for the tip!  We'll give that GC a look.
- Eric

Jonathan Rochkind

Jul 19, 2012, 10:57:22 AM
to blacklight-...@googlegroups.com
I don't have much of value to add, just chiming in anyway. :)

You mentioned slow solr queries with lots of tokens -- that's definitely
something we've noticed too, and lots of token searches is definitely
something our users do (such as copying and pasting a title into the
search box, without using double quotes).

We're still on Solr 1.4 too. I kind of expect if we upgrade to newer
Solr it would improve things to some extent; maybe by the time I find
time to upgrade, configure, test, etc., Solr 4 will be avail. :)

We also are planning on trying out Solr on a machine with a local SSD
disk, figuring that would have pretty huge benefits. We've actually
already got the machine requisitioned, just haven't moved things yet.
Curious if anyone has any experience with that.

We already, from the start, have a separate Solr instance for indexing
vs searching, with replication from one to the other. At the moment, we
actually have them both on the _same machine_, but I think it's still
beneficial to do this for a variety of reasons, including keeping the
JVMs separate so the indexing process can't OOM the searcher and vice
versa. Once we move to the SSD, that'll be just the searcher, leaving
the indexer on the machine with an ordinary moving disk.

Michael J. Giarlo

Jul 19, 2012, 10:59:13 AM
to blacklight-...@googlegroups.com
Hi Eric,

Though I am sure Naomi's Solr configs are more finely tuned than ours, I'd be glad to point you at our configs (for Solr 3.5, to which we migrated about 2 months ago):

https://github.com/psu-stewardship/scholarsphere/tree/master/solr_conf

I love this thread! We're taking lots of notes over here in PA.

-Mike

Naomi Dushay

Jul 19, 2012, 2:14:06 PM
to blacklight-...@googlegroups.com
On Jul 19, 2012, at 7:44 AM, Eric Larson wrote:

> Hi Naomi,
>
> Thank you *so much* for the detailed reply.  This will help us out greatly as we use the remainder of the summer to dive into replication, load balancing and upgrading Solr.
>
> Immediately (and I apologize) more questions come to mind:
>
> * Migrating from Solr 1.4 to Solr 3.X
> - Any important gotchas you recall from that effort?
> - Have you posted your latest schema and config online? or mind sharing it?

> * App hosting environment

Note that we like separate VMs for the app vs. Solr -- they don't compete for the same resources.

> - Running Ruby 1.9.3 right? or have you guys gone JRuby?

SearchWorks is currently using ruby-1.8.7-p358.

> - Are you serving the app via Apache / Passenger? (we do for Shibboleth support)

Passenger, yes.

> - If you're on Passenger, any extra arguments set in your Apache config? (MaxPoolSize, MinInstances, etc.)

    PassengerMaxPoolSize 250
    PassengerPoolIdleTime 300
    PassengerMinInstances 3
    PassengerMaxInstancesPerApp 0
    PassengerSpawnMethod smart-lv2

> * Load balancing between apps
> - How is that being done? (haproxy perhaps?)
> - How does logging work across these balanced apps? (curious because we rescue and log a lot of bad MARC holdings data in our app)

(from our sysadmin)
#1 - we aren't using load-balancing software, we are using BigIP F5 hardware load balancers.
#2 - least connections
#3 - we log across the individual app servers

> * Load balancing between solr slaves
> - How will that be done?

Working on it -- not sure yet. We are testing different algorithms and configurations.

> - Still not sharding the index, right?  Just adding more slaves?

Right.

> * Is this more-or-less the general load-balanced "request flow"?
> - Public-Facing Web Server > Load Balancer > Rails Application Server(s) > Load Balancer > Apache Solr(s)

(url for) Load Balancer for Rails App Servers > specific Rails App Server > (url for) Load Balancer for Solr > specific Solr

> - - - -
>
> We've got a ton to learn from your production setup.  If it's not already formally documented somewhere, it would be triumphant if it was online.

Maybe after it settles down ...
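The "least connections" policy the sysadmin mentions routes each request to whichever backend has the fewest in-flight connections. A hedged illustration in Python (backend names and counts are made up; the real balancing happens inside the BigIP F5 hardware, not in application code):

```python
from collections import Counter

def pick_backend(active, backends):
    """Return the backend currently serving the fewest active connections."""
    return min(backends, key=lambda b: active[b])

# Simulate a burst of 5 requests against two Solr slaves that start
# with 3 and 1 open connections respectively.
active = Counter({"solr-slave-1": 3, "solr-slave-2": 1})
order = []
for _ in range(5):
    b = pick_backend(active, ["solr-slave-1", "solr-slave-2"])
    active[b] += 1  # a connection opens on the chosen backend
    order.append(b)

print(order)
# → ['solr-slave-2', 'solr-slave-2', 'solr-slave-1', 'solr-slave-2', 'solr-slave-1']
```

Unlike round-robin, least-connections automatically steers traffic away from a slave that is bogged down in a slow query or a GC pause, since its connections stay open longer.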

Michael J. Giarlo

Jul 19, 2012, 2:16:38 PM
to blacklight-...@googlegroups.com
I smell an excellent topic for LibDevConX^4, incidentally.

-Mike


----- Original Message -----
> From: "Naomi Dushay" <ndu...@stanford.edu>
> To: blacklight-...@googlegroups.com
> Sent: Thursday, July 19, 2012 2:14:06 PM
> Subject: Re: [Blacklight-development] Solr - Speed improvements?

Tom Cramer

Jul 28, 2012, 11:25:27 AM
to blacklight-...@googlegroups.com
++

Personally, I'd love to see configs, stats and even New Relic demo'ed, plus comparison of strategies of lb and sharding.

- Tom

Typed with my thumbs.

James Stuart

Jul 28, 2012, 11:31:36 AM
to blacklight-...@googlegroups.com
Yeah. We're talking about what our production setup is going to look
like, and would love to have a chat about reasons to use load
balancers (on the web side) and what not.

Michael J. Giarlo

Jul 28, 2012, 3:55:51 PM
to blacklight-...@googlegroups.com
We will soon be deploying a load-balanced Blacklight- and Hydra-based Rails app (with Solr & Fedora behind haproxy). Glad to discuss what worked and what didn't at LDCX4, and we'd be eager to compare notes!
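For anyone sketching the haproxy side of a setup like this, a hedged minimal config for two Solr slaves (the names, addresses, and health-check path are illustrative):

```
listen solr-read 0.0.0.0:8983
    balance leastconn
    option httpchk GET /solr/admin/ping
    server solr-slave-1 10.0.0.11:8983 check
    server solr-slave-2 10.0.0.12:8983 check
```

The `check` keyword plus `option httpchk` makes haproxy poll each slave's ping handler and drop a dead one from rotation automatically.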