Re: [Blacklight-development] Blacklight and multi-cores in Solr

James Stuart

Sep 18, 2012, 2:20:09 PM
to blacklight-...@googlegroups.com
Hi there! So, this is not related to SolrMARC at all. Any given
blacklight search can only access one index, but by cleverly setting
configuration variables, you can have a single web application search
a variety of cores.

http://cliobeta.columbia.edu works with two separate solr indices. The
academic commons 'source' is searching one solr, the catalog 'source'
is searching another. You can search them separately, or the quick
search does both.

To do this (and quite possibly there are better ways), we have a method
that runs before any search and does this:

# Point Blacklight at the Solr index that matches the selected source.
if source == "Academic Commons"
  Blacklight.solr = RSolr::Ext.connect(:url => APP_CONFIG[:ac2_solr_url])
else
  Blacklight.solr = RSolr::Ext.connect(:url => APP_CONFIG[:catalog_solr_url])
end
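
A table-driven variant of the same switch, as a minimal sketch only. The SOLR_URLS hash, the solr_url_for helper, and the example URLs are hypothetical stand-ins for the APP_CONFIG entries, and the catalog is assumed to be the fallback, as in the if/else:

```ruby
# Hypothetical map from source name to Solr URL (stand-in for APP_CONFIG).
SOLR_URLS = {
  "Academic Commons" => "http://localhost:8983/solr/ac2",
  "Catalog"          => "http://localhost:8983/solr/catalog"
}.freeze

# Return the Solr URL for a source, falling back to the catalog index,
# which mirrors the else branch of the original snippet.
def solr_url_for(source)
  SOLR_URLS.fetch(source, SOLR_URLS["Catalog"])
end

puts solr_url_for("Academic Commons")
```

The returned URL would then be handed to RSolr::Ext.connect(:url => ...) exactly as before; adding a third source becomes a one-line change to the hash.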

As far as actually combining results from multiple solr indices in one
result set, solr, as far as I know, doesn't have any capacity to
search across multiple cores.

On Tue, Sep 18, 2012 at 2:09 PM, Sheila <sbla...@utk.edu> wrote:
> Hi, I am working on a project that is using Blacklight with Solr. We have
> not been able to get Blacklight to work with multi-cores in Solr. We can
> only get it to work with a single core. Can Blacklight work with multi-cores
> in Solr? If not, can you tell me why it can't or what the problem is? Is it
> related to SolrMARC at all? Thanks for your help!

Jonatan Fournier

Sep 18, 2012, 2:17:32 PM
to blacklight-...@googlegroups.com
Hi Sheila,

What I've been doing right now (not the most elegant I would say) is:

I have many cores (some with same schema, some with different schema)

I made sure my schemas at least have one field in common (title),
which is used for the short list display after search set in the
catalog_controller.

Then I list all the fields from all my schemas combined in the
catalog_controller for the show/document view.

To get search across cores working, I enabled Solr shards covering all
my cores, then created a "distrib" request handler. I point Blacklight
at this distrib request handler, so it queries all my cores and lists
the results using the common title field!
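
A rough sketch of what such a distributed query looks like on the wire. In the setup described here the shard list would live in the handler's defaults in solrconfig.xml, so Blacklight never has to send it; this example passes Solr's `shards` parameter on the request instead, purely for illustration. The host, port, core names, and the distributed_query_url helper are all assumptions:

```ruby
require "uri"

# Hypothetical core locations, written as host:port/path without the
# scheme, which is the form the Solr `shards` parameter takes.
SHARDS = [
  "localhost:8983/solr/core0",
  "localhost:8983/solr/core1"
].freeze

# Build a select URL against one core that fans the query out to all
# listed shards and asks only for the fields every schema shares.
def distributed_query_url(base_url, query, shards)
  params = { "q" => query, "shards" => shards.join(","), "fl" => "id,title" }
  "#{base_url}/select?#{URI.encode_www_form(params)}"
end

puts distributed_query_url("http://localhost:8983/solr/core0", "title:solr", SHARDS)
```

Requesting only the shared fields (here `id` and `title`) matches the point above about every schema needing at least one field in common.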

That works for me :)

Cheers

/jonatan

James Stuart

Sep 18, 2012, 2:24:20 PM
to blacklight-...@googlegroups.com
Oh, interesting! See, I learn something every day. :)

Jonathan Rochkind

Sep 18, 2012, 2:51:44 PM
to blacklight-...@googlegroups.com, Sheila
Sure, I'm using Blacklight (and SolrMarc) with multi-core. I haven't had
any problems at all.

What is the nature of your problems? What happens that's different than
what you expect/want to happen?

I think I do sort of recall that I had to switch SolrMarc to 'http
communications' mode; SolrMarc didn't do well with multi-core in its
default 'direct file writing' mode. I think there are reasons to prefer
'http communications' mode in general. And sorry, I can never remember
how to configure SolrMarc.

Jonathan Rochkind

Sep 18, 2012, 2:52:38 PM
to blacklight-...@googlegroups.com
Jonatan Fournier -- what do you think you get out of this that's an
advantage, vs just using one single Solr index? Why are you doing it
this way, what is the benefit?

Jonatan Fournier

Sep 18, 2012, 2:59:44 PM
to blacklight-...@googlegroups.com
On Tue, Sep 18, 2012 at 2:52 PM, Jonathan Rochkind <roch...@jhu.edu> wrote:
> Jonatan Fournier -- what do you think you get out of this that's an
> advantage, vs just using one single Solr index? Why are you doing it this
> way, what is the benefit?

My indices combined are in the terabytes, and indexing time became really painful.

And the cost of distributing the search across many cores (even on the
same machine) was better than a query within one big TB blob.

Robert Haschart

Sep 18, 2012, 3:46:46 PM
to blacklight-...@googlegroups.com
I also wasn't aware of the capability to combine cores dynamically. We
have been using multiple cores for storing each of the different classes
of data in our system, (one for the Marc records from our ILS, one for
the records from Hathi Trust, one for our Digitized Books, etc.) and
part of the nightly update process takes the multiple cores and combines
them into a single index that is then used by our Blacklight instance.
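
The post does not say how the nightly combine step is done. One possible mechanism, offered only as an assumption, is Solr's CoreAdmin MERGEINDEXES action; the sketch below just builds the request URL. The core names and the merge_indexes_url helper are hypothetical, and the `srcCore` parameter may not be available in older Solr releases:

```ruby
require "uri"

# Build a CoreAdmin MERGEINDEXES request that merges the source cores
# into the target core. All core names here are hypothetical.
def merge_indexes_url(solr_root, target_core, source_cores)
  params = {
    "action"  => "mergeindexes",
    "core"    => target_core,
    "srcCore" => source_cores # an Array becomes repeated srcCore=... pairs
  }
  "#{solr_root}/admin/cores?#{URI.encode_www_form(params)}"
end

puts merge_indexes_url("http://localhost:8983/solr", "combined", %w[marc hathi digitized])
```

The target core typically needs a commit after the merge before the combined documents become searchable.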

But we have been running both Blacklight and SolrMarc against a
MultiCore Solr installation for years. I'm convinced that even if you
only want to use a single index, you should still use the MultiCore
Solr, and simply configure it to use 'n' cores (where n = 1).

-Bob Haschart

Eric Larson

Sep 18, 2012, 3:50:39 PM
to blacklight-...@googlegroups.com

Jonatan Fournier

Sep 18, 2012, 3:58:14 PM
to blacklight-...@googlegroups.com
On Tue, Sep 18, 2012 at 3:50 PM, Eric Larson <ela...@library.wisc.edu> wrote:
> http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/
>
> See: Shard that Sh*t.

I'm glad this confirms what I've been doing but was too lazy to fully
test and write up ;)

I simply used sharding over multicore because I was too lazy to set up
multiple Tomcat instances; multicore gave me this nice unique URL per
core for free, so I sharded that sh*t :)

I also had the assumption that Solr wasn't using all my physical CPU
cores; now it does!

For those interested in doing so, don't forget that you need to
implement your own distributed indexing (as simple as id.hashcode() %
n) to decide where to index/update.
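
A minimal sketch of that hash-mod-n routing. The shard URLs and helper names are hypothetical, and CRC32 is swapped in for the hashcode() call mentioned above, because Ruby's String#hash is randomized per process and would not route an id consistently across runs:

```ruby
require "zlib"

# Hypothetical update endpoints, one per shard/core.
SHARD_URLS = [
  "http://localhost:8983/solr/core0/update",
  "http://localhost:8983/solr/core1/update",
  "http://localhost:8983/solr/core2/update"
].freeze

# Map a document id to a shard index in 0...shard_count.
# Zlib.crc32 gives the same value for the same id in every process.
def shard_for(id, shard_count)
  Zlib.crc32(id.to_s) % shard_count
end

# Pick the update URL a document with this id should be sent to.
def shard_url_for(id, urls = SHARD_URLS)
  urls[shard_for(id, urls.size)]
end

puts shard_url_for("doc-123")
```

Updates and deletes for a given id must go through the same function so they land on the shard that holds the document. Changing n remaps most ids, so adding a shard generally means reindexing.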

Cheers,

/jonatan

Jonathan Rochkind

Sep 18, 2012, 4:26:54 PM
to blacklight-...@googlegroups.com
Yeah, I'm not talking about multi-core vs multi-tomcat, I'm still
wondering what the benefit of splitting your stuff over multiple "solr
indexes" (either way) and then figuring out how to cross-search them is,
compared to just putting everything into one big Solr index.

Basically performance? Yeah, Jonatan Fournier says he was having
indexing and searching performance problems before he split things up,
cool. With an index size of terabytes, which is def a lot bigger than
mine.

There is definitely an added complexity to splitting over multiple solr
indexes but then cross-searching 'em (whether with actual shards or
manually) in various ways; I'd want to check to make sure I was actually
having performance problems that would be solved by that solution before
pursuing it.

Jonatan Fournier

Sep 18, 2012, 4:31:51 PM
to blacklight-...@googlegroups.com
Some visual of what I did:

http://blog.shutupandcode.net/?p=1136

Jonathan Rochkind

Sep 18, 2012, 5:04:31 PM
to blacklight-...@googlegroups.com, Jonatan Fournier
Thanks, that blog post is super helpful.

But okay, so honestly, I'm still wondering.

I understand the comparison between your "Multiple Index Scenario (One
Server)" and "Distributed Multicore Scenario (Multi-Server)".

What's not clear from that blog post is why you'd have a _multi-core_
scenario in the first place --- put "Users", "Movies", "Music",
"Images", and "Blogs" all in different cores.

If you never need to search across them, then of course put them in
different cores. But in that blog post, he DOES need to search across
them, and puts a /distrib request handler in front of them all to make
it possible. But what do you gain here over having all your data
(Movies, Music, Books, etc.) in one Solr index? (Which you distribute
across shards if you need to for performance, or keep on one server if
you don't.)

That's what's still not clear to me. Is the answer still performance
gain?

It's a lot simpler to have them all in the same solr index. Different
elements in a solr index can have different fields. They just need to
have common fields when you want to search across all of them -- which
you need in the multi-core-with-distrib-handler architecture anyway. So
what's the point of doing multi-core-with-distrib handler instead of
just one single solr index? (I understand the point of multi-server
sharding or not).

Tom Cramer

Sep 18, 2012, 8:58:21 PM
to blacklight-...@googlegroups.com, Tom Cramer
Jonatan,

Can you share any more details on your Blacklight installation? A multi-TB index would make (I think) an interesting referent on the projectblacklight.org site, especially if the site is publicly accessible.

- Tom


James Stuart

Oct 2, 2012, 6:09:46 PM
to blacklight-...@googlegroups.com, Tom Cramer
Does anybody know if there's a way to pass a hash-function to
solrmarc, to get it to split across multiple shards in a consistent
way?

--James