How to reindex Solr authority


Peter Dietz

Oct 28, 2016, 11:45:29 AM
to DSpace Technical Support
Hi All,

We have a Solr authority index that needs to be reindexed. Some of the data is out of date, and for unknown reasons some records are missing the external authority ID (the equivalent of orcid_id). I found org.dspace.authority.UpdateAuthorities, which looks like a stub for finding records in the Solr authority index and processing them, but it doesn't appear to do anything. Using it as a starting point, I find the records and, since they are missing data, search the external authority system with the data that is present to find matches.

My question is how to actually reindex a record in Solr. I can find an authority value in Solr, but I don't see how to write it back to Solr with updates.

Here's what I'm thinking for the reindex:

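// Look up one authority record in Solr by its UID.
String selectedID = "1f8e4a15-4631-4b82-8289-5e92463b776b";
AuthorityValue byUID = authorityValueFinder.findByUID(context, selectedID);
// Search the external authority system for a match on the stored value.
MyAuthorityValue searchValue = myAuthoritySource.queryAuthorities(byUID.getValue(), 1);
// Copy the missing remote ID onto the record found in Solr.
byUID.setRemoteID(searchValue.getRemoteID());
byUID.update();   // Doesn't do anything.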
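
To make the intent concrete, here is a minimal sketch of the find, patch, write-back flow I'm after. It does not use the DSpace or SolrJ APIs: AuthorityIndexSketch, fillRemoteId, and the remote_id field are made-up names, and a plain in-memory map stands in for the Solr core. The one Solr behavior it assumes is that adding a document whose unique key already exists replaces the stored document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Solr authority core: documents keyed by their unique id.
class AuthorityIndexSketch {
    private final Map<String, Map<String, String>> docs = new HashMap<>();

    // Like a Solr add: a document whose id already exists replaces the stored one.
    void add(Map<String, String> doc) {
        docs.put(doc.get("id"), new HashMap<>(doc));
    }

    // Returns a copy, so changing the result alone never touches the index.
    Map<String, String> findById(String id) {
        Map<String, String> doc = docs.get(id);
        return doc == null ? null : new HashMap<>(doc);
    }

    // Find the record, fill in the remote id, and write the whole document back.
    // Returns false when no record with that id exists.
    static boolean fillRemoteId(AuthorityIndexSketch index, String id, String remoteId) {
        Map<String, String> doc = index.findById(id);
        if (doc == null) {
            return false;
        }
        doc.put("remote_id", remoteId); // "remote_id" is an assumed field name
        index.add(doc);                 // the explicit write-back step
        return true;
    }
}
```

The point of the sketch is that setting a field on the fetched object changes nothing in the index until the whole document is added again under the same id.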

I wonder whether, instead of trying to write to Solr, I need to update the item metadata to trigger a reindex... It might be easier to declare DSpace Solr authority bankruptcy and write a third-party script that gets the Solr authority index into the state I need...



Anyway, the Solr documentation lists using Solr as a data source as a huge no-no:
Using Solr as a Data Source

Don't do this unless you have no other option. Solr is not really designed for this role. Every attempt is made to ensure that Solr is stable, but indexes do get corrupted by unanticipated situations, and by things completely outside developer control.


Storing text_value="Dietz, Peter" and authority=orcid:0012-1234-4567 in the DSpace database metadatavalue table would have been better than setting authority=abcdef-12345-ghijk-6789 and then having to trust that nothing has happened to Solr and that the ID still points to the data you are looking for. In our case, many entries have nothing behind that ID. The DSpace database should be the rock-solid, trustworthy data source, and Solr should just have a reindex function that rebuilds the index from the database values. Perhaps an even better solution could be found, such as a table dedicated to authority keys.
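
As a rough illustration of that idea, the sketch below rebuilds an authority lookup purely from database-style rows, so the index holds nothing that cannot be regenerated. Row, rebuild, and AuthorityReindexSketch are hypothetical names, not DSpace classes, and a map stands in for Solr.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AuthorityReindexSketch {
    // Hypothetical row shape: display text plus the external key kept in the database.
    static final class Row {
        final String textValue;
        final String authority;

        Row(String textValue, String authority) {
            this.textValue = textValue;
            this.authority = authority;
        }
    }

    // Builds a fresh index from the rows alone; rows without an authority key are skipped.
    static Map<String, String> rebuild(List<Row> rows) {
        Map<String, String> index = new LinkedHashMap<>();
        for (Row row : rows) {
            if (row.authority != null && !row.authority.isEmpty()) {
                index.put(row.authority, row.textValue);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("Dietz, Peter", "orcid:0012-1234-4567"));
        rows.add(new Row("No Authority, Author", null));
        // The index can be dropped and recreated at any time from the same rows.
        System.out.println(rebuild(rows));
    }
}
```

With this shape, a corrupted or stale index is only an inconvenience: dropping it and calling rebuild again restores every key the database knows about.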






________________
Peter Dietz
Longsight
www.longsight.com
pe...@longsight.com
p: 740-599-5005 x809