Re: [VuFind-Tech] Conditional Indexing based on Solr id and .mrc filename convention


Demian Katz

Nov 28, 2017, 8:17:48 PM
to Leila Gonzales, vufin...@lists.sourceforge.net, solrma...@googlegroups.com

Leila,


I'm copying this thread over to solrmarc-tech in case Bob (or anyone else) has any suggestions. I'll give this some thought myself, but tomorrow is going to be crazy (post-vacation catch-up plus substantial hiring committee duties), so it may be a couple of days before I'll be able to sit down and write a coherent response. 😊


- Demian




From: Leila Gonzales <l...@agiweb.org>
Sent: Tuesday, November 28, 2017 1:01 PM
To: vufin...@lists.sourceforge.net
Subject: [VuFind-Tech] Conditional Indexing based on Solr id and .mrc filename convention
 

Hi all,

 

We’d like to do some conditional indexing based on the filename and on whether or not a record exists in the Solr database.

 

So, for instance, if a MARC input filename contains the text “ADD”, we want to make sure that none of the records in that .mrc input file already exist in the Solr database. For any that do, we want to log their IDs in a warning message and skip those records rather than index them.
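
[Editor's illustration, not code from this thread.] The rule described above could be sketched in Python roughly as follows. The function names, the sample paths, and the idea of passing in an already-known set of Solr IDs are assumptions made for the example; in practice that set would come from a Solr lookup. The "ADD" check is case-sensitive and looks only at the file name, not at its directory.

```python
import logging
from pathlib import Path

logger = logging.getLogger("conditional_index")


def is_add_file(marc_path):
    """True when the file name (not its directory) contains the text ADD."""
    return "ADD" in Path(marc_path).name


def plan_import(marc_path, record_ids, existing_ids):
    """Return (ids_to_index, ids_skipped), preserving input order.

    Files that are not ADD files are passed through untouched.
    """
    if not is_add_file(marc_path):
        return list(record_ids), []

    to_index, skipped = [], []
    for rec_id in record_ids:
        if rec_id in existing_ids:
            skipped.append(rec_id)
        else:
            to_index.append(rec_id)

    if skipped:
        # One warning per file, listing every ID that was already present.
        logger.warning(
            "%s: skipping %d record(s) already in Solr: %s",
            marc_path, len(skipped), ", ".join(map(str, skipped)),
        )
    return to_index, skipped
```

For example, plan_import("/data/batch_ADD_2017.mrc", ["1", "2", "3"], {"2"}) would keep records 1 and 3 and report record 2 as skipped.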

 

I’ve started a preliminary custom indexing .java file in my local/import/index_java/src/customcode/index/ directory. Basically, I’ve created a recValid_str variable in my marc_local.properties that calls the custom validation method; the method returns 0 or 1 to decide whether DeleteRecordIfFieldEmpty should be used to skip indexing of the record. I figure I can pass the MARC_FILE variable from the import-marc.sh script into a global variable, so the remaining issue is figuring out how to query Solr to see whether the id already exists. I looked over the documentation on using getfromsolr here: https://github.com/solrmarc/solrmarc/issues/17

 

Is it possible to call getfromsolr from within my custom indexing Java file, so that it calls $VUFIND_HOME/import/bin/getfromsolr, reads the url and default_core variables from config.ini’s [Index] section, and constructs a command something like the one below, which reports whether that id is already in the Solr database?

 

$VUFIND_HOME/import/bin/getfromsolr SolrURL/default_core id:1234567
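
[Editor's illustration, not code from this thread.] Independently of getfromsolr, the "read url and default_core from the [Index] section, then build a lookup" step could be sketched in Python as below. The sample values (http://localhost:8080/solr, biblio) and the function names are assumptions for the example. Python's configparser is not a PHP INI parser, so a real config.ini may need extra handling; strict=False and interpolation=None are used here to tolerate repeated keys and literal % signs.

```python
import configparser
from urllib.parse import urlencode


def solr_core_url(config_text):
    """Combine url and default_core from the [Index] section."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(config_text)
    index = parser["Index"]
    # Values may or may not be quoted in the INI file.
    url = index["url"].strip().strip('"').rstrip("/")
    core = index["default_core"].strip().strip('"')
    return f"{url}/{core}"


def id_query_url(core_url, record_id):
    """Build a select URL that asks only for the hit count of one id."""
    # Escape backslashes and quotes so the id is safe inside a quoted phrase.
    escaped = record_id.replace("\\", "\\\\").replace('"', '\\"')
    params = urlencode({"q": f'id:"{escaped}"', "rows": 0, "wt": "json"})
    return f"{core_url}/select?{params}"
```

With rows=0 the response carries no documents, only the numFound count, which is all that is needed to tell whether the id is already indexed.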

 

Thanks for any guidance you can provide.

 

Cheers,
Leila

 

Robert Haschart

Nov 29, 2017, 5:15:08 PM
to solrma...@googlegroups.com
Leila,

I'll think about this for a little bit. The code for getfromsolr hasn't yet been implemented in the 3.x version of SolrMarc (partly because it was quite ugly).
There is a custom indexing routine that I helped Tod Olson with that does lookups in Solr in the middle of indexing.
Although in that case it was looking up information that had been pre-loaded into a separate Solr index, the code for doing the lookup might be able to do what you want.

I'll look to see whether that code is in a github repo somewhere, although I suspect it isn't.

-Bob Haschart

Robert Haschart

Nov 29, 2017, 6:34:04 PM
to solrma...@googlegroups.com, Leila Gonzales, vufin...@lists.sourceforge.net
Leila,

I have something I think will work, I'll send it to you tomorrow.


-Bob Haschart


Leila Gonzales

Nov 29, 2017, 8:20:18 PM
to Robert Haschart, solrma...@googlegroups.com, vufin...@lists.sourceforge.net

Thank you very much Bob!

 

-Leila

Robert Haschart

Nov 30, 2017, 5:26:56 PM
to solrma...@googlegroups.com, Leila Gonzales, vufin...@lists.sourceforge.net
Leila,

I have generalized the custom indexing method that I helped Tod Olson with so that it can accomplish what you need while still being usable for purposes similar to what Tod's institution needed. I will include a description of how to use it to do what you need. I'll add the source file to the GitHub repo solrmarc/custom_methods

https://github.com/solrmarc/custom_methods

and then add some documentation there as well.

An example that would accomplish what you seem to want, using the attached custom Java code, is:

exists1=custom, skipIfDataExists("http://myserver.machine.edu:8983/solr/bib", "id")

where "http://myserver.machine.edu:8983/solr/bib" is the URL of the Solr index in which you want to check whether the record exists,
and "id" is simply the name of a field that will appear in the record if it exists.

This method assumes that your ids are found in the 001 field of your records, and that they are stored in the "id" field in the Solr index. If these assumptions are not valid, you can use a different method in the Java file, like this:

exists=custom, skipIfDataExists("http://myserver.machine.edu:8983/solr/bib", "001", "id", "id")

with the 2nd and 3rd parameters specifying where to find the id in the record being indexed, and what searchable field should be used for the lookup.

Let me know if you manage to get this method working for you.

-Bob Haschart

Leila Gonzales

Dec 1, 2017, 9:59:35 AM
to Robert Haschart, solrma...@googlegroups.com, vufin...@lists.sourceforge.net

Thank you very much Bob. I really appreciate this. I’ll let you know how it goes with getting this method working.

 

Kind regards,
Leila

 


Leila Gonzales

Dec 1, 2017, 1:23:55 PM
to Robert Haschart, solrma...@googlegroups.com, vufin...@lists.sourceforge.net

Thanks again for this code Bob. I have a quick question on the compiling of this mixin, and my apologies for a newbie question about this.

 

I put the code in /vufind/MyDirectory/local/import/index_java/src/org/solrmarc/mixin/ and when I went to run the import-marc.sh script, I got this error:

 

/vufind/MyDirectory/local/import/index_java/src/org/solrmarc/mixin/SolrRecordLookupMixin.java:199: error: cannot find symbol
                resp = server.query(params);
                             ^
  symbol:   method query(org.apache.solr.client.solrj.SolrQuery)
  location: variable server of type org.solrmarc.solr.SolrProxy

 

Both org.apache.solr.client.solrj.SolrQuery and org.solrmarc.solr.SolrProxy are declared in the import section of SolrRecordLookupMixin, so I’m not sure why this is erroring out. Could it be that, since this file “extends SolrIndexerMixin”, it’s not seeing the SolrIndexerMixin class that is in the solrmarc_core_3.0.6.jar file (/vufind/MyDirectory/import/solrmarc_core_3.0.6.jar::org/solrmarc/index/SolrIndexerMixin.class)? If this is the case, how would I specify the location of that file?

 

Thanks for your help on this,

Leila

 

 


Robert Haschart

Dec 1, 2017, 2:41:32 PM
to solrma...@googlegroups.com, vufin...@lists.sourceforge.net
Leila,

I just found that the commit that added support for querying a Solr index to the SolrProxy interface class was done on Jan 27 of this year, while the current latest release was created on Jan 12.   So I'll need to put together a new release of SolrMarc for that feature to actually work for you.   Sorry about that.

I'll add in a fix for the problem that Demian just found for working with Solr 7.1 and try to create a release in the next day or so.

-Bob Haschart

Leila Gonzales

Dec 1, 2017, 2:45:14 PM
to solrma...@googlegroups.com, vufin...@lists.sourceforge.net

Thanks Bob. I also figured out an alternate way to do the query: calling a Python script that queries the Solr database from within my custom Java method. I'm not sure it’s the best option, but it’s pretty quick and works like a charm. If you think it would be useful, let me know and I’ll send you the code. I’ll switch over to the new SolrMarc release when it’s available so I can take advantage of the new code you wrote.
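
[Editor's illustration, not the script mentioned above, which was never posted to the thread.] A standalone lookup of this kind might look roughly like the sketch below. The function names, the exit-code convention (0 = found, 1 = not found, 2 = bad usage), and the script name in the usage message are assumptions. The query uses Solr's term query parser, {!term f=id}, so the ID needs no Lucene escaping; the HTTP call is passed in as a parameter so the parsing logic can be exercised without a running Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(core_url, record_id, id_field="id"):
    """Build a count-only select URL using the term query parser."""
    query = "{!term f=%s}%s" % (id_field, record_id)
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return "%s/select?%s" % (core_url.rstrip("/"), params)


def num_found(response_text):
    """Extract the hit count from a Solr JSON response."""
    return int(json.loads(response_text)["response"]["numFound"])


def http_get(url):
    """Fetch a URL and return the body as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def id_exists(core_url, record_id, fetch=http_get):
    """True when at least one document has this id."""
    return num_found(fetch(select_url(core_url, record_id))) > 0


def main(argv):
    """Return 0 if the id exists, 1 if it does not, 2 on bad usage."""
    if len(argv) != 2:
        print("usage: solr_id_exists.py CORE_URL RECORD_ID")
        return 2
    return 0 if id_exists(argv[0], argv[1]) else 1
```

To use it as a command-line script, one would end the file with sys.exit(main(sys.argv[1:])) after importing sys; the calling Java code could then branch on the process exit status instead of parsing output.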


Thanks again for your help!

Leila

Demian Katz

Dec 1, 2017, 3:48:57 PM
to solrma...@googlegroups.com, vufin...@lists.sourceforge.net

Wonderful! I'll update VuFind and give this another test whenever the release is ready. Thanks for your help with this!


- Demian




Robert Haschart

Dec 1, 2017, 5:24:50 PM
to solrma...@googlegroups.com, vufin...@lists.sourceforge.net
Leila,

I just uploaded a build from the current head of the github repo for SolrMarc to the dropbox location you shared.
It includes the changes that you will need for the custom method I sent, as well as the fix for the problem Demian reported.

Have a good weekend.

-Bob

Leila Gonzales

Dec 2, 2017, 9:06:57 PM
to solrma...@googlegroups.com, vufin...@lists.sourceforge.net

Thank you very much Bob!

 

- Leila

Robert Haschart

Dec 7, 2017, 2:02:23 PM
to solrma...@googlegroups.com
Leila,

Another new feature that has been added to the version I gave you is support for new keywords similar to DeleteRecordIfFieldEmpty.
They are: SkipRecordIfFieldEmpty, SkipRecordIfFieldNotEmpty, and DeleteRecordIfFieldNotEmpty.

With them you can use the new mixin code that I copied to your dropbox location and have an index specification like this:

exists1=custom, getExtraSolrDataByID("http://myserver.machine.edu:8983/solr/bib", "id"), SkipRecordIfFieldNotEmpty 
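
[Editor's illustration, not SolrMarc code.] One way to read the four keywords is as a small decision table over "is the field empty?". The sketch below models that reading in Python; the behavior is inferred from the keyword names alone and has not been checked against the SolrMarc source, and the Action names are invented for the example.

```python
from enum import Enum


class Action(Enum):
    INDEX = "index"    # index the record normally
    SKIP = "skip"      # leave the record out of this run
    DELETE = "delete"  # remove the record from the index


# keyword -> (fires when the field is empty?, action taken when it fires)
# Assumed semantics, derived from the keyword names only.
RULES = {
    "SkipRecordIfFieldEmpty": (True, Action.SKIP),
    "SkipRecordIfFieldNotEmpty": (False, Action.SKIP),
    "DeleteRecordIfFieldEmpty": (True, Action.DELETE),
    "DeleteRecordIfFieldNotEmpty": (False, Action.DELETE),
}


def decide(keyword, field_values):
    """Return the action implied by a keyword for the given field values."""
    fires_on_empty, action = RULES[keyword]
    is_empty = len(field_values) == 0
    return action if is_empty == fires_on_empty else Action.INDEX
```

Under this reading, a lookup field that comes back non-empty (the id was found in Solr) combined with SkipRecordIfFieldNotEmpty yields Action.SKIP, which matches the "skip records that already exist" goal of this thread.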


-Bob Haschart 

Leila Gonzales

Dec 8, 2017, 4:22:14 PM
to solrma...@googlegroups.com

Thank you very much Bob. These features will be immensely helpful.

 

Kind regards,
Leila

 

