Can you post a sample of how you attempted to do this?
Multiple processes would definitely be better and could truly run in parallel; you'll just use more memory. We use this approach at work quite a bit.
Here's a great post on using JRuby btw: http://robotlibrarian.billdueber.com/indexing-data-into-solr-via-jruby-with-threads/
- Matt
Regarding the threaded performance, I don't have the code handy, but it wasn't anything exotic: there's a queue of documents ready to be indexed, and each thread simply sits in a loop shifting 100 documents at a time out of the queue and calling solr.add(docarray). If I switch from "Thread.new" to "fork" for the worker block, performance increases dramatically. In most multithreaded environments I've worked in, multiple workers that spend most of their time blocked waiting for IO from a remote system are a great application for threads, but perhaps that is not true in Ruby. I was hoping the problem was caused by one of the libraries involved.
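For anyone following along, the worker loop described above might look roughly like the sketch below. This is not the original code (which wasn't posted). It assumes a thread-safe Queue rather than a plain Array, and index_batch is a hypothetical stand-in for solr.add(docarray) that just sleeps briefly to simulate waiting on the remote server.

```ruby
require 'thread'

BATCH_SIZE = 100

# Pull up to `size` documents off the queue without blocking.
# A non-blocking pop raises ThreadError once the queue is empty.
def take_batch(queue, size)
  batch = []
  while batch.size < size
    begin
      batch << queue.pop(true)
    rescue ThreadError
      break
    end
  end
  batch
end

# Hypothetical stand-in for solr.add(docarray): a real client would
# send an HTTP request here. The sleep simulates remote IO wait.
def index_batch(docs)
  sleep 0.001
  docs
end

# Start `worker_count` threads that drain the queue in batches and
# return every document that was "indexed".
def run_workers(docs, worker_count)
  queue = Queue.new
  docs.each { |doc| queue << doc }

  indexed = []
  lock = Mutex.new

  threads = Array.new(worker_count) do
    Thread.new do
      loop do
        batch = take_batch(queue, BATCH_SIZE)
        break if batch.empty?
        done = index_batch(batch)
        # Guard the shared result array against concurrent writes.
        lock.synchronize { indexed.concat(done) }
      end
    end
  end

  threads.each(&:join)
  indexed
end
```

Calling run_workers((1..1000).to_a, 4) drains 1000 documents with four threads. Note that swapping Thread.new for fork here is not a drop-in change: each child process gets its own copy of the queue, so the documents have to be divided among the workers up front.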
I did read a bit about the fact that net/http is not a particularly fast HTTP client, so I tried reimplementing RSolr::Connection with a couple of different clients that do a lot better (curb and EventMachine::HttpClient2), but the performance was exactly the same.
At this point, I've spent enough time on this diversion that I'm giving up and just forking multiple processes so I can solve some more important problems, but it's not very satisfying, and it seems like there's an artificial bottleneck somewhere in there.
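A minimal sketch of the forking approach, for comparison. It assumes the documents are split into one slice per child process up front, and index_batch is again a hypothetical stand-in for solr.add(docarray). Each child reports how many documents it handled back over a pipe. Kernel#fork is not available on JRuby or Windows, so this only applies to MRI on Unix-like systems.

```ruby
BATCH_SIZE = 100

# Hypothetical stand-in for solr.add(docarray); returns the number
# of documents it was given instead of talking to a real server.
def index_batch(docs)
  docs.size
end

# Fork one child per slice of documents and return the total number
# of documents the children report having indexed.
def run_forked(docs, worker_count)
  # Avoid a zero slice size when there are no documents.
  slice_size = [(docs.size.to_f / worker_count).ceil, 1].max

  children = docs.each_slice(slice_size).map do |slice|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      count = 0
      slice.each_slice(BATCH_SIZE) { |batch| count += index_batch(batch) }
      writer.puts(count)
      writer.close
    end
    writer.close # the parent only reads from this pipe
    [pid, reader]
  end

  children.sum do |pid, reader|
    count = reader.read.to_i
    reader.close
    Process.wait(pid)
    count
  end
end
```

The trade-off is the one mentioned earlier in the thread: the children can truly run in parallel, but each one carries its own copy of the interpreter and its slice of the data, so memory use goes up with the number of workers.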
Too bad. That's probably causing the class not found errors?
So you didn't see any difference with the other http drivers? Did you try typhoeus?
- Matt