If you can make HTTP requests directly to your Solr server, and
they work (e.g. http://localhost:8983/solr/select?q=blah ),
then the next place to look is your solr.yml file -- are you
talking to the right Solr instance? Is Solr running when you fire
up Blacklight?
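For reference, a minimal solr.yml along these lines (the URL and environment names here are just an example; point each entry at the Solr instance you tested by hand):

```yaml
# config/solr.yml -- one entry per Rails environment.
# The url must match the Solr instance that answered your manual
# /select?q=blah request above.
development:
  url: http://localhost:8983/solr
test:
  url: http://localhost:8983/solr
production:
  url: http://localhost:8983/solr
```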
--
You received this message because you are subscribed to the Google Groups "Blacklight Development" group.
To post to this group, send email to blacklight-...@googlegroups.com.
To unsubscribe from this group, send email to blacklight-develo...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/blacklight-development?hl=en.
If you have a working Solr you're happy with already, then it is just
a matter of configuring Blacklight as others have suggested.
Jason
On Tue, Jan 18, 2011 at 8:25 PM, kfoley <kfol...@gmail.com> wrote:
And we're continually trying to make Blacklight easier to use for
newbies, so feel free to give us suggestions of things you think could
be improved, whether documentation or the way configuration works or
anything else. [Can't promise we'll implement em of course, but
suggestions welcome.]
If you set group.main=true you'll have an easier time: grouped
results will pretty much be formatted like ungrouped results.
(This limitation comes mainly from rsolr(-ext) rather than Blacklight.)
It does mean that you'll only have access to the top matching document
for each group. I've gotten around this limitation by indexing some
aggregation-level data along with each individual document within that
aggregation.
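As a sketch, the Solr parameters involved look something like this (the field name "groups" is just an illustration, borrowed from the response example later in the thread; the method name is made up):

```ruby
# Hypothetical Solr request parameters for field collapsing with
# group.main=true, so the response comes back formatted like a flat,
# ungrouped result list that rsolr(-ext) and Blacklight can parse.
def grouping_params(group_field)
  {
    "q"           => "*:*",
    "group"       => true,        # turn grouping on
    "group.field" => group_field, # e.g. "groups", the aggregation field
    "group.main"  => true         # flatten: top doc per group only
  }
end
```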
Other issues include pagination. There's no way (at least in the
nightly snapshot I saw) to determine how many groups you'd have for
any search. This means that regular pagination helpers won't work,
since you don't know the total number of "documents" being paginated.
To work around this I've used the per_page + 1 trick: I request one
more document than I intend to show on the page, and if the extra
record comes back I know there is at least one more page.
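The per_page + 1 trick is simple enough to sketch (names here are illustrative, not actual Blacklight API):

```ruby
# Ask Solr for one more row than we intend to display; if the extra row
# comes back, at least one more page exists, even though Solr won't
# tell us the total number of groups.
def rows_to_request(per_page)
  per_page + 1
end

def page_of(docs, per_page)
  {
    docs:       docs.first(per_page),     # what we actually show
    more_pages: docs.length > per_page    # extra record present?
  }
end
```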
In the UI I'm developing, clicking a collapsed result sends the
user to a search where the group is atomized into individual documents,
with the same faceted search interface available. So things like links
from a document in a search result need to be overridden to provide
this new behavior, where some (atomized) documents link to show views
while clicking on an aggregation continues to take you to a search
results page. I've also added another index view so that collapsed and
uncollapsed results look different enough. We'll hopefully be doing
some testing of this interface to see what works best for our
users. That may mean ditching field collapsing altogether, or it may
give me the chance to do more work on making field collapsing
easier to work with.
Jason
[FieldCollapsing definitely doesn't give you the total number of
collapsed 'pages' you'll get, and probably never will -- the Solr
developers don't see any way to do that without destroying performance,
because right now Solr only 'collapses' documents in the visible page,
it doesn't go through the entire result set and collapse it, which is
what it would need to do to know the total number of post-collapsed items.]
I wouldn't use it as a first resort to your problem, if you can find
another way around it. Which you may not be able to do, Solr isn't great
at certain things. (In many cases, I think the best solution is to
pre-process your records and merge them _before_ you add them to the
Solr index, but that's not easy to do either in a typical library
environment).
What are you actually thinking of using field collapsing for, Karen?
Anyone know if there is a way to determine how many unique values are
in a single facet for any arbitrary search, without it being a big
performance problem? I figured that if pagination wasn't already part of
field collapsing and, as Jonathan mentions, likely never would be,
then there might not be a way to get this information, but I
figured I'd ask in any case. Since it seems it is possible to
group on more than one field (group.field can be specified more than
once), maybe that is the factor that prevents pagination from working?
I'm only grouping by one field so far, so maybe there is a way to
calculate how many pages I'd have?
Jason
Sadly, there's also no way to do that in Solr, and the Solr developers
seem uninterested in figuring one out, believing that it may not be
possible without performance implications.
Although I have looked at the code a _bit_, and in my _completely_
not-familiar-with-solr not-a-solr-developer opinion, I thought I saw
some ways maybe I could add it in -- but it gets confusing, as there are
3 or 4 different paths faceting can take depending on the nature of
your data. I thought I saw a way to put it in without performance
problems for the strategy Solr uses for facet.method=fc on a
multi-valued field, which is pretty much always what I have. So if you
feel like doing some Java hacking, you could try to write some Java to
do this, a custom version of the SimpleFacet component. (And/or patch
suggested back to Solr of course). When I looked a bit, it did seem
possible to me, but I could be wrong, and don't really have time to get
into it right now compared to how much I need it.
If you don't need "for an arbitrary search", but just across your
entire corpus, you can do it. But within an arbitrary search, nope.
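For the whole-corpus case, one approach (a sketch, and it assumes you can afford facet.limit=-1 on that field) is to run a zero-row query requesting every facet value, then count the entries. In the default JSON response a facet_fields entry is a flat [value, count, value, count, ...] array:

```ruby
# Count unique facet values from an already-parsed Solr response for
# something like: q=*:*&rows=0&facet=true&facet.field=groups
#                 &facet.limit=-1&facet.mincount=1
# The flat array alternates value and count, so unique values = pairs.
def unique_facet_values(response, field)
  flat = response["facet_counts"]["facet_fields"][field]
  flat.each_slice(2).count
end
```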
What Solr does is allow you to ask for the number of groups (as rows)
you want returned. So if you have more than 10 aggregations and you
request 10 groups, it will return 10 groups. In the group.main
representation you only get the first matching document of each group
back, which means the response can be parsed by most Solr response
parsers. Getting
back grouped results has a completely different syntax. Using that new
grouping syntax you could get back 10 groups and then display more
than one result within each group. Exactly how the relevance of groups
and documents within groups is determined, I'm not certain of yet. So
the first document within a group may be very relevant, while the
second document may be much less relevant in the current search
context--I don't know. Certainly each document in the group must show
up somewhere in the search results, but may not be as relevant. Does
relevancy with grouping work like that? I don't know.
The implementation of field collapsing in current Solr trunk is
different from the patches to previous versions of Solr. It may be
that those previous patches worked more similarly to how Google does it.
Jason
Check out call numbers and pub dates at
http://searchworks.stanford.edu
( you have to select a top level facet value to see what's underneath,
in our implementation).
If this is the desired behavior, I can provide more details. I
believe there is now a "hierarchical facets" patch for Solr, if it
isn't already included in 4.0.
- Naomi
"grouped": {
  "groups": {
    "matches": 188,
    "groups": [
      {
        "groupValue": "zinc concentrations set II",
        "doclist": {
          "numFound": 4,
          "start": 0,
          "docs": [
            { "id": "625", "name": "Zn_0.000_vs_NRC-1d.sig", "groups": ["zinc concentrations set II"] },
            { "id": "626", "name": "Zn_0.005_vs_NRC-1d.sig", "groups": ["zinc concentrations set II"] },
            { "id": "627", "name": "Zn_0.010_vs_NRC-1d.sig", "groups": ["zinc concentrations set II"] },
            { "id": "628", "name": "Zn_0.015_vs_NRC-1d.sig", "groups": ["zinc concentrations set II"] }
          ]
        }
      },
      {
        "groupValue": "ZnSO4 0.015mM step time series rep-1",
        "doclist": {
          "numFound": 8,
          "start": 0,
          "docs": [
            { "id": "652", "name": "ZnSO4_ts_set-1_-005min_vs_NRC-1h1.sig", "groups": ["ZnSO4 0.015mM step time series rep-1"] },
            { "id": "653", "name": "ZnSO4_ts_set-1_000min_vs_NRC-1h1.sig", "groups": ["ZnSO4 0.015mM step time series rep-1"] },
            { "id": "654", "name": "ZnSO4_ts_set-1_005min_vs_NRC-1h1.sig", "groups": ["ZnSO4 0.015mM step time series rep-1"] },
            { "id": "655", "name": "ZnSO4_ts_set-1_010min_vs_NRC-1h1.sig", "groups": ["ZnSO4 0.015mM step time series rep-1"] },
            { "id": "656", "name": "ZnSO4_ts_set-1_020min_vs_NRC-1h1.sig", "groups": ["ZnSO4 0.015mM step time series rep-1"] },
            { "id": "657", "name": "ZnSO4_ts_set-1_040min_vs_NRC-1h1.sig", "groups": ["ZnSO4 0.015mM step time series rep-1"] },
            { "id": "658", "name": "ZnSO4_ts_set-1_080min_vs_NRC-1h1.sig", "groups": ["ZnSO4 0.015mM step time series rep-1"] },
            { "id": "659", "name": "ZnSO4_ts_set-1_160min_vs_NRC-1h1.sig", "groups": ["ZnSO4 0.015mM step time series rep-1"] }
          ]
        }
      },
      ....
For the doclist, I only have it returning the id, name and groups for testing purposes. In reality there would be much more info
there (the metadata).
You'll notice that it says "numFound" and then displays that number of "docs" for that groupValue. This is similar to what I'm
wanting to do in the UI: have the groupValue be clickable (expand/collapse) such that, depending on the state, it will
display or hide the member conditions (docs) to the user.
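A sketch of pulling the grouped response apart for that kind of expand/collapse UI (the hash keys follow the grouped-response JSON above; the method name and return shape are made up):

```ruby
require 'json'

# Turn the "grouped" section of a parsed Solr response into a simple
# list of { value:, total:, docs: } hashes -- one per clickable group
# heading, with its member docs for the expanded state.
def collapsible_groups(response, field)
  response["grouped"][field]["groups"].map do |g|
    {
      value: g["groupValue"],           # the clickable heading
      total: g["doclist"]["numFound"],  # member count for that group
      docs:  g["doclist"]["docs"]       # members shown when expanded
    }
  end
end
```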
Apologies for resurrecting a dead thread, but I’ve started a Solr 4 blacklight-jetty branch (mainly to run the tests against).
Branch: https://github.com/projectblacklight/blacklight-jetty/tree/solr-4
Diff: https://github.com/projectblacklight/blacklight-jetty/compare/master...solr-4
The majority of the Blacklight tests passed using stock Blacklight; however, there were some spellchecking failures (which might actually be bad tests, as the intended behavior isn’t entirely clear to me).
To run the Blacklight tests, I had to self-compile SolrMarc using the latest SolrJ library. I can’t remember off-hand if the embedded solr feature worked, or if I had to run against the http endpoint.
I tried to leave the out-of-the-box solr config alone as much as possible and just add in the appropriate Blacklight configuration. I wasn’t entirely successful in this attempt, but should do better next time (and, note to self, also commit the stock Solr configs for ease-of-diffing later)
https://github.com/projectblacklight/blacklight-jetty/blob/solr-4/solr/development-core/conf/solrconfig.xml
https://github.com/projectblacklight/blacklight-jetty/blob/solr-4/solr/development-core/conf/schema.xml
The two major differences are:
- using the Solr multicore configuration with a development core and a test core. I certainly like the multicore configuration better and, if there are no objections, would like to proceed with it.
- Using the new (in Solr 3.1) ICU tokenizers and filters to replace the schema.UnicodeNormalizationFilterFactory, schema.CJKFilterFactory, etc. See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory and http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory
As the tests pass, I assume nothing terrible happens when I do that. I’d love for someone who actually knows what they are doing with CJK languages or unicode normalization to take a look sometime.
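For the curious, the ICU-based analyzer chain is roughly this shape in schema.xml (a sketch of the replacement, not the exact committed config):

```xml
<!-- Replaces schema.UnicodeNormalizationFilterFactory,
     schema.CJKFilterFactory, etc. with the Solr 3.1+ ICU analysis. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Unicode-aware tokenization, including CJK -->
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- NFKC normalization + case folding + diacritic removal -->
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```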
On a related note — hopefully as part of this work, I’ll be able to cobble together some HOWTO documentation about going from a stock Solr config to what Blacklight expects and add it to the wiki.
Indeed. But to be clear, I think either way SolrMarc is using "SolrJ",
just a question of whether it's in embedded mode, or HTTP mode.
>> On a related note — hopefully as part of this work, I’ll be able to
>> cobble together some HOWTO documentation about going from a stock
>> Solr config to what Blacklight expects and add it to the wiki.
I was thinking about doing this too, though perhaps not exactly the
same thing as you, which might make our approaches complementary. What
I was thinking: there are actually a bunch of different choices for how
you set up your solrconfig.xml and corresponding Blacklight config. I
was thinking of making a list of "scenarios", each with solrconfig.xml
set up a certain way, the Blacklight config set up the simplest way
that works for that config, and then bonus advanced search config for
that scenario too.
It's simply not true that it only collapses documents visible in the page. It's some seriously magic Lucene Collector work across values of a field to "group" them, and it definitely runs over the entire result set from the q/fq's.
Note that the feature is really called Field *Grouping*, not collapsing, in case there's any semantic confusion about that. It's explained a little here: <http://wiki.apache.org/solr/FieldCollapsing>
Erik