threading issues with rsolr

nt94043

Feb 6, 2012, 9:05:52 PM

to rsolr

Can anyone comment on whether rsolr is expected to work well in a
multithreaded environment? I have yet to debug this, but I am getting
an "uninitialized constant RSolr::Connection" error when I spin up a
bunch of threads and open RSolr connections quickly.

In addition, I get much, much better indexing performance if I index
using N different processes than if I use N different threads in the
same process, which also raises an eyebrow (example: a test with 10
threads is 30x slower than 10 processes). This could also be a Ruby
green-threading issue, so further investigation on my part is
warranted.

I just wanted to solicit some opinions from the group as I dive in. I
know many gems are never used anywhere except in rails, so thread
safety / multithreaded performance is never even a consideration.

Thanks in advance for any comments.

matt mitchell

Feb 12, 2012, 3:01:35 PM

to rs...@googlegroups.com

There's nothing in RSolr (that I know of) that would be problematic when used in a multi-threaded setup. But I have yet to see better results when indexing with more threads in Ruby.

Can you post a sample of how you attempted to do this?

Multiple processes would definitely be better and could truly run in parallel; you'll just use more memory. We use this approach at work quite a bit.
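A rough sketch of the multi-process approach, assuming a platform where `fork` is available (MRI on Linux or macOS, not Windows or JRuby). The names `index_slice` and `index_with_processes` are hypothetical, and the body of `index_slice` is only a placeholder for where each child would open its own Solr connection and add its batches.

```ruby
PROCESS_BATCH_SIZE = 100

# Hypothetical per-process work. A real worker would create its own
# Solr client here and call solr.add(batch) for each batch.
def index_slice(docs)
  docs.each_slice(PROCESS_BATCH_SIZE) do |batch|
    batch.size # placeholder for solr.add(batch)
  end
end

# Split the documents into one slice per worker, fork a child for each
# slice, and return one success flag per child process.
def index_with_processes(docs, workers: 4)
  slice_size = [(docs.size / workers.to_f).ceil, 1].max

  pids = docs.each_slice(slice_size).map do |slice|
    fork do
      ok = begin
        index_slice(slice)
        true
      rescue StandardError
        false
      end
      exit!(ok) # exit without running the parent's at_exit handlers
    end
  end

  # Wait for every child and report whether it exited cleanly.
  pids.map { |pid| Process.wait2(pid).last.success? }
end
```

Each child gets a copy of the parent's memory, which is where the extra memory use comes from.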

Here's a great post on using JRuby btw: http://robotlibrarian.billdueber.com/indexing-data-into-solr-via-jruby-with-threads/

- Matt

nt94043

Feb 12, 2012, 4:47:29 PM

to rs...@googlegroups.com

Well, I did find a threading problem in rsolr itself: rsolr uses autoload, which is not thread-safe. That completely explains the exception I was seeing.
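One possible workaround, sketched here under some assumptions, is to force the autoloaded constant to load on the calling thread before any workers start. `LazyLib` and its `Connection` class are hypothetical stand-ins for an autoloading library so the example runs without rsolr installed; with rsolr the equivalent step would presumably be referencing `RSolr::Connection` once after `require 'rsolr'` and before the first `Thread.new`.

```ruby
require "tmpdir"

# Write a tiny library file to a temp directory (hypothetical stand-in
# for a gem whose classes are registered with autoload).
lib_dir = Dir.mktmpdir
lib_file = File.join(lib_dir, "lazy_connection.rb")
File.write(lib_file, <<~RUBY)
  module LazyLib
    class Connection
      def ping
        :ok
      end
    end
  end
RUBY

module LazyLib
end

# Register the constant lazily: the file is only loaded on first reference.
LazyLib.autoload(:Connection, lib_file)

# Resolve the constant on the calling thread first, then let the worker
# threads use it. By the time they start, no autoload is pending.
def resolve_in_threads(count)
  LazyLib.const_get(:Connection) # triggers the autoload exactly once
  threads = Array.new(count) do
    Thread.new { LazyLib::Connection.new.ping }
  end
  threads.map(&:value)
end
```

The key line is the `const_get` call before the threads are created; everything else is scaffolding to make the example self-contained.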

Regarding the threaded performance, I don't have the code handy, but it wasn't anything exotic: there's a queue of documents ready to be indexed, and each thread simply sits in a loop shifting 100 documents at a time out of the queue and calling solr.add(docarray). If I switch from "Thread.new" to "fork" for the worker block, performance increases dramatically. In most multithreaded environments I've worked in, multiple workers spending most of their time blocked waiting for IO from a remote system are a great application for threads, but perhaps that is not true in Ruby. I was hoping it was one of the libs involved that was causing the problem.
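The worker loop looked roughly like the following. This is a reconstruction, not the original code: `FakeSolr` is a hypothetical stand-in for the RSolr client that only models `add`, the documents are pre-sliced into batches of 100 rather than shifted off the queue inside each worker, and `index_with_threads` is an illustrative name.

```ruby
require "thread" # harmless on modern Rubies, needed for Queue on older ones

BATCH_SIZE = 100

# Hypothetical stand-in for an RSolr client. It only counts documents and
# sleeps briefly to imitate waiting on an HTTP round trip.
class FakeSolr
  attr_reader :added

  def initialize
    @added = 0
    @lock = Mutex.new
  end

  def add(docs)
    sleep 0.001
    @lock.synchronize { @added += docs.size }
  end
end

# Fill a queue with batches, then let each worker thread pop batches and
# send them until it sees a :done marker.
def index_with_threads(docs, solr, workers: 10)
  queue = Queue.new
  docs.each_slice(BATCH_SIZE) { |batch| queue << batch }
  workers.times { queue << :done } # one stop marker per worker

  threads = Array.new(workers) do
    Thread.new do
      while (batch = queue.pop) != :done
        solr.add(batch)
      end
    end
  end

  threads.each(&:join)
  solr
end
```

For example, `index_with_threads((1..1050).to_a, FakeSolr.new, workers: 5)` pushes 11 batches through 5 workers.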

I did read a bit about net/http not being a particularly fast HTTP client, so I tried reimplementing RSolr::Connection with a couple of different clients that do a lot better (curb and EventMachine::HttpClient2), but the performance was exactly the same.

At this point, I've spent enough time on this diversion that I'm giving up and just forking multiple processes so I can solve some more important problems, but it's not very satisfying, and it seems like there's an artificial bottleneck somewhere in there.

matt mitchell

Feb 12, 2012, 5:16:10 PM

to rs...@googlegroups.com

Wow, autoload is not thread safe! http://bugs.ruby-lang.org/issues/921

Too bad. That's probably causing the class not found errors?

So you didn't see any difference with the other http drivers? Did you try typhoeus?

- Matt
