Solr is drunk, doesn't know how to sort <dates>, any thoughts?


bro...@barnard.edu

Apr 19, 2016, 11:04:37 AM
to islandora
Hey all,

So our most amazing digital archivist realized that our date sorting for some collections is behaving strangely.

Here is a firsthand look at a weird sort: https://digitalcollections.barnard.edu/islandora/search?type=dismax&islandora_solr_search_navigation=0&sort=mods_originInfo_dateCreated_sort%20desc&f[0]=mods_genre_ms%3A%22yearbook%22&f[1]=mods_name_corporate_namePart_ms%3A%22Barnard%5C%20College%22
The first result, at least for me, is from 1913; however, we have yearbooks as recent as 2013 (or 2014) that are not on page 1! Here is the crazy thing: the ascending sort on Date Created appears to work just fine...

We created a field called mods_originInfo_dateCreated_sort in schema.xml that looks like this:

<!-- One date field that unifies the dateCreated variants for search and sort -->
<field name="mods_originInfo_dateCreated_sort" type="date" indexed="true" stored="true"/>
<copyField source="mods_originInfo_encoding_w3cdtf_keyDate_yes_point_start_qualifier_approximate_dateCreated_dt" dest="mods_originInfo_dateCreated_sort"/>
<copyField source="mods_originInfo_encoding_w3cdtf_keyDate_yes_dateCreated_dt" dest="mods_originInfo_dateCreated_sort"/>
<copyField source="mods_originInfo_encoding_w3cdtf_dateCreated_dt" dest="mods_originInfo_dateCreated_sort"/>
<copyField source="mods_originInfo_encoding_iso8601_dateCreated_dt" dest="mods_originInfo_dateCreated_sort"/>


We did this because we wanted one field to unify them all for search and sort.

Does anyone have any thoughts? Suggestions? Is the answer something so simple that I've just overlooked it? Any help is greatly appreciated.


Thank you!
Ben

Nelson Hart

Apr 19, 2016, 11:33:31 AM
to isla...@googlegroups.com
Ben,

On object http://digitalcollections.barnard.edu/object/yearbook-2013/mortarboard-2013, which Solr field contains the dateCreated value? It appears you're not copying that particular field to mods_originInfo_dateCreated_sort.


I'm not sure which XSLTs you are using for the transform, but based on what you have above, it would be something like: mods_originInfo_encoding_iso8601_keyDate_yes_dateCreated_dt
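To see why a single missing copyField matters: every distinct combination of attributes on a MODS date element appears to produce its own Solr field. The function below is hypothetical and is not the Islandora XSLT. It only reproduces the pattern visible in the field names quoted in this thread (attributes written as name_value pairs ahead of the element name, sorted by attribute name), and that ordering is an assumption.

```python
def build_field_name(parent: str, element: str, attrs: dict, suffix: str) -> str:
    """Hypothetical: rebuild an attribute-laden Solr field name.

    Mirrors the pattern seen in this thread's field names; the real
    Islandora XSLT may order or encode attributes differently.
    """
    parts = ["mods", parent]
    # Assumption: attributes are emitted as name_value pairs, sorted by name.
    for name in sorted(attrs):
        parts.extend([name, attrs[name]])
    parts.append(element)
    return "_".join(parts) + "_" + suffix


# Two encodings of the "same" dateCreated end up in two different fields,
# so a schema that copies only one of them misses the other.
approx = build_field_name(
    "originInfo",
    "dateCreated",
    {"encoding": "w3cdtf", "keyDate": "yes", "point": "start", "qualifier": "approximate"},
    "dt",
)
iso = build_field_name(
    "originInfo", "dateCreated", {"encoding": "iso8601", "keyDate": "yes"}, "dt"
)
print(approx)  # mods_originInfo_encoding_w3cdtf_keyDate_yes_point_start_qualifier_approximate_dateCreated_dt
print(iso)     # mods_originInfo_encoding_iso8601_keyDate_yes_dateCreated_dt
```

The second name is not among the copyField sources in the original schema, which would explain why records indexed under it never reach the sort field.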

Nelson




--
For more information about using this group, please read our Listserv Guidelines: http://islandora.ca/content/welcome-islandora-listserv
---
You received this message because you are subscribed to the Google Groups "islandora" group.
To unsubscribe from this group and stop receiving emails from it, send an email to islandora+...@googlegroups.com.
Visit this group at https://groups.google.com/group/islandora.
To view this discussion on the web visit https://groups.google.com/d/msgid/islandora/b6ed3ee2-5e5d-42e1-9d53-427c58fe0a9f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

bro...@barnard.edu

Apr 19, 2016, 9:52:29 PM
to islandora
Dear Nelson,

Thank you for the prompt reply! You've set me on the right course and I think I've narrowed down the problem. 

Thanks again! I will post my solution back to the group once it is confirmed, in case anyone else runs into the same problem.

Best,
B

bro...@barnard.edu

Apr 20, 2016, 3:32:03 PM
to islandora
So Nelson hit it on the head. I added that field to the schema and will think about doing a reindex in the next few...

Thanks again!

Brandon Weigel

May 18, 2016, 12:06:26 PM
to islandora
That is really not ideal behaviour... I'm running into lots of problems with fields that contain attributes. I cited this thread in my own post as an example of Islandora's poor (and inconsistent) handling of attributes: https://groups.google.com/forum/#!topic/islandora/WSIfnMA61po

I'm not sure exactly what the solution is, but it would probably involve a fair bit of work on core.

Benjamin Rosner

May 18, 2016, 2:29:34 PM
to isla...@googlegroups.com
I want to see what your Solr debug output looks like! I love and hate these issues ~ Solr is such a challenge. Did you create a field similar to ours? What does it look like? Did you force FGS to reindex after creating it? Do you want me to move this discussion to your post? Should I stop asking questions?

We're here to help!




--
Benjamin Rosner

Instructional Applications Developer
Library and Academic Information Services
Sulzberger Hall Annex, Barnard College
p: 212-854-9005

Brandon Weigel

May 18, 2016, 2:40:26 PM
to isla...@googlegroups.com
I don’t have a solution yet… I think that if we can’t get Solr to ignore qualifiers when you set them up, Islandora should instead create another version of each field (which I guess would have to be done for EVERY field), something like mods_originInfo_dateIssued_allVersions_ss, harmonizing all versions of the given field with the attributes stripped out. But I’m not sure what the performance cost of adding an extra version of every field for every object would be: it could be negligible, or it could not; I’ve no idea. (Does anyone more knowledgeable than me know? Say you have five million objects… what would the impact be?)
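A minimal sketch of the "allVersions" idea, applied to a document before it is sent to Solr. Nothing like this exists in Islandora as far as this thread shows; the function name, the dict-of-lists document shape, and the variant naming pattern are all assumptions for illustration. It says nothing about the performance question, beyond the fact that it adds one extra field per harmonized element per object.

```python
def harmonize(doc, parent, element, suffix):
    """Hypothetical: merge every attribute variant of one MODS element.

    doc maps Solr field names to lists of values. Variants are assumed to be
    named mods_<parent>_[<attr>_<value>_...]<element>_<suffix>.
    """
    prefix = f"mods_{parent}_"
    tail = f"_{element}_{suffix}"
    merged = []
    for name, values in doc.items():
        if name.startswith(prefix) and name.endswith(tail):
            for value in values:
                if value not in merged:  # keep first-seen order, drop duplicates
                    merged.append(value)
    out = dict(doc)  # leave the original variant fields untouched
    out[f"{prefix}{element}_allVersions_{suffix}"] = merged
    return out


doc = {
    "mods_originInfo_encoding_w3cdtf_dateIssued_ss": ["1913"],
    "mods_originInfo_dateIssued_ss": ["1913", "1914"],
    "mods_titleInfo_title_ss": ["Mortarboard"],
}
result = harmonize(doc, "originInfo", "dateIssued", "ss")
print(result["mods_originInfo_dateIssued_allVersions_ss"])  # ['1913', '1914']
```

Because the merged field ends in _allVersions_ss rather than _dateIssued_ss, running the function twice does not fold the harmonized field back into itself.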

Brandon Weigel
Librarian
BC Electronic Library Network

bro...@barnard.edu

May 18, 2016, 8:11:21 PM
to islandora
I do understand that this is not ideal, and that Islandora, or at least the recommended schema.xml, could be amended to handle the variety of metadata fields containing dates that might be used for sorting, though I don't know that one size fits all.

The issue we faced at Barnard was that we had multiple fields that were valid representations of 'date created', and we wanted to use them all for the date created sort. Instead of modifying any record-level metadata, we created a Solr field that is essentially a copy of every *_dt field we wanted to use for this sorting. As a result, the Solr index now holds that data for every record FGS sends it.
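Because each attribute combination yields its own *_dt field, it can help to generate the copyField lines from the list of dateCreated variants you actually see in your index instead of writing them by hand. The sketch below assumes you have already gathered those field names yourself; the input list and the helper names are hypothetical. It also assumes each object populates at most one variant, since the destination field in the schema above is single-valued.

```python
from xml.sax.saxutils import quoteattr

SORT_FIELD = "mods_originInfo_dateCreated_sort"


def date_created_variants(field_names):
    """Return every mods_originInfo_..._dateCreated_dt field, sorted."""
    return sorted(
        name
        for name in field_names
        if name.startswith("mods_originInfo_") and name.endswith("_dateCreated_dt")
    )


def copy_field_lines(field_names, dest=SORT_FIELD):
    """Build one <copyField> element per dateCreated variant.

    Assumption: each object populates at most one of these sources; the
    destination is single-valued, so two populated variants on one object
    would likely be rejected at index time.
    """
    return [
        f"<copyField source={quoteattr(src)} dest={quoteattr(dest)}/>"
        for src in date_created_variants(field_names)
    ]


# Hypothetical input: field names observed in your own index.
seen = [
    "mods_originInfo_encoding_w3cdtf_dateCreated_dt",
    "mods_originInfo_encoding_iso8601_keyDate_yes_dateCreated_dt",
    "mods_titleInfo_title_ms",
]
for line in copy_field_lines(seen):
    print(line)
```

The output lines can be pasted into schema.xml; after changing the schema, the objects still need to be reindexed before the sort field is populated.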

If you look at my OP you'll see the additions we made to the Solr schema. Note that when we added them, and again when we later amended them, we did a Solr "reindex" (a misnomer, but you get the point). In short, our solution is that Solr does the heavy lifting, Islandora uses our new field for sorting, and it works great. I'm not sure of the specs of our development server, but the 'reindex' test I ran on 50k objects took under 10 hours.

I hope this helps, and that my explanation of how we arrived at a solution lets you patch your issue, at least temporarily. If I can be of any help, you can reach out to me on IRC (arebenji) or just send me a private message from the group.