Detection of illegal skip indicators?


Demian Katz

Apr 15, 2019, 2:40:23 PM
to solrma...@googlegroups.com

Hello, everyone –

 

In trying to help someone troubleshoot a SolrMarc error message, I discovered that illegal skip indicators can cause error dumps like this:

 

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(String.java:1931)
        at org.solrmarc.index.extractor.formatter.FieldFormatterBase.cleanData(FieldFormatterBase.java:408)
        at org.solrmarc.index.extractor.formatter.FieldFormatterBase.prepData(FieldFormatterBase.java:579)
        at org.solrmarc.index.specification.SingleDataFieldSpecification.addFieldValues(SingleDataFieldSpecification.java:89)
        at org.solrmarc.index.extractor.impl.direct.FieldMatch.addValuesTo(FieldMatch.java:42)
        at org.solrmarc.index.extractor.impl.direct.DirectMultiValueExtractor.extract(DirectMultiValueExtractor.java:59)
        at org.solrmarc.index.extractor.AbstractMultiValueExtractor.extract(AbstractMultiValueExtractor.java:14)
        at org.solrmarc.index.extractor.AbstractMultiValueExtractor.extract(AbstractMultiValueExtractor.java:1)
        at org.solrmarc.index.indexer.AbstractValueIndexer.getFieldData(AbstractValueIndexer.java:72)
        at org.solrmarc.index.SolrIndexerShim.getFieldListCollector(SolrIndexerShim.java:138)
        at org.solrmarc.index.SolrIndexerShim.getSortableTitle(SolrIndexerShim.java:620)
        at org.solrmarc.index.SolrIndexer.getSortableTitle(SolrIndexer.java:331)
        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.solrmarc.index.extractor.methodcall.SingleValueExtractorMethodCall.invoke(SingleValueExtractorMethodCall.java:50)
        at org.solrmarc.index.extractor.methodcall.SingleValueExtractorMethodCall.invoke(SingleValueExtractorMethodCall.java:1)
        at org.solrmarc.index.extractor.methodcall.AbstractExtractorMethodCall.invoke(AbstractExtractorMethodCall.java:37)
        at org.solrmarc.index.extractor.methodcall.MethodCallSingleValueExtractor.extract(MethodCallSingleValueExtractor.java:44)
        at org.solrmarc.index.extractor.AbstractMultiValueExtractor.extract(AbstractMultiValueExtractor.java:1)
        at org.solrmarc.index.indexer.MultiValueIndexer.getFieldData(MultiValueIndexer.java:103)
        at org.solrmarc.driver.Indexer.indexToSolrDoc(Indexer.java:314)
        at org.solrmarc.driver.Indexer.getIndexDoc(Indexer.java:211)
        at org.solrmarc.driver.IndexerWorker.run(IndexerWorker.java:68)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

 

It appears that there is an unguarded substring operation here:

 

https://github.com/solrmarc/solrmarc/blob/master/src/org/solrmarc/index/extractor/formatter/FieldFormatterBase.java#L408

 

Is this intentional? Would it be better to do some more controlled checking and emit a more targeted error message in this scenario? I’m not sure about all the contexts in which this code is called, but if there’s a need for improvement here and Bob wants to provide a little guidance, I’m open to putting together a pull request….
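
For illustration only, a guarded version of the substring call might look roughly like the sketch below. The class name SkipIndicatorGuard and the method stripNonFiling are hypothetical and are not taken from FieldFormatterBase; the sketch also assumes that any indicator outside 0-9 should be reported rather than silently ignored.

```java
/** Hypothetical helper for illustration; not part of SolrMarc. */
final class SkipIndicatorGuard {
    private SkipIndicatorGuard() {}

    /**
     * Removes the leading non-filing characters named by a skip indicator.
     * Throws IllegalArgumentException with a targeted message when the
     * indicator is not a digit or asks to skip more characters than exist.
     */
    static String stripNonFiling(String tag, char indicator, String data) {
        // Assumption: only the digits 0-9 are legal skip indicators.
        if (indicator < '0' || indicator > '9') {
            throw new IllegalArgumentException(String.format(
                "Field %s: skip indicator '%c' is not a digit; data: \"%s\"",
                tag, indicator, data));
        }
        int skip = indicator - '0';
        // Check the length before calling substring so the failure is descriptive.
        if (skip > data.length()) {
            throw new IllegalArgumentException(String.format(
                "Field %s: skip indicator %d exceeds data length %d; data: \"%s\"",
                tag, skip, data.length(), data));
        }
        return data.substring(skip);
    }
}
```

Whether an illegal indicator should raise an exception, log a warning, or fall back to the unstripped value is a design choice this sketch does not settle.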

 

- Demian

Tod Olson

Apr 15, 2019, 3:45:36 PM
to solrmarc-tech, Tod Olson
I would agree that some error handling would be useful. I see a few things here:

First, as you say, a more targeted error message would be great. Or even catch the java.lang.StringIndexOutOfBoundsException and re-throw some sort of IllegalIndicatorException whose message would contain the field information: tag, indicators, and data. That would be a huge help in tracking down data problems.

Second, the 740 uses indicator 1 for the non-filing indicator. It would be great to extend the code to be aware of the 740.
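
A rough sketch of the catch-and-re-throw idea, under stated assumptions: IllegalIndicatorException does not exist in SolrMarc and is defined here only for illustration, and stripByInd2 is a hypothetical stand-in for the unguarded substring call rather than the real cleanData code.

```java
/** Hypothetical example; none of these names exist in SolrMarc. */
final class IllegalIndicatorExample {
    private IllegalIndicatorExample() {}

    /** Carries the field context that a bare StringIndexOutOfBoundsException lacks. */
    static final class IllegalIndicatorException extends RuntimeException {
        IllegalIndicatorException(String tag, char ind1, char ind2,
                                  String data, Throwable cause) {
            super(String.format(
                "Illegal skip indicator in field %s (ind1='%c', ind2='%c'); data: \"%s\"",
                tag, ind1, ind2, data), cause);
        }
    }

    /** Stand-in for an unguarded strip that uses the second indicator. */
    static String stripByInd2(String tag, char ind1, char ind2, String data) {
        try {
            // Character.digit returns -1 for a non-digit, which substring rejects.
            return data.substring(Character.digit(ind2, 10));
        } catch (StringIndexOutOfBoundsException e) {
            // Re-throw with tag, indicators, and data; keep the original as the cause.
            throw new IllegalIndicatorException(tag, ind1, ind2, data, e);
        }
    }
}
```

Keeping the original exception as the cause preserves the stack trace while the message identifies the offending field.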

-Tod

Haschart, Robert J (rh9ec)

Apr 15, 2019, 5:28:56 PM
to solrma...@googlegroups.com

> Is this intentional?


Yeah, that's the ticket, I meant to do that.   :-)


You are right, it would be much better to perform some checking and return a targeted error message or exception or something. On a somewhat related matter, I was thinking about how one could make it possible, when a translation map is applied and a data item doesn't match any of its entries, to specify that the unmatched item should be flagged or noted somehow.
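
One possible shape for the translation-map idea, purely as a sketch: the class FlaggingTranslationMap below is hypothetical and does not reflect how SolrMarc actually loads or applies translation maps. It simply wraps a plain java.util.Map and remembers every value that had no entry.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical wrapper for illustration; not SolrMarc's translation map API. */
final class FlaggingTranslationMap {
    private final Map<String, String> entries;
    // Sorted so that a report of unmatched values is stable and easy to read.
    private final Set<String> unmatched = new TreeSet<>();

    FlaggingTranslationMap(Map<String, String> entries) {
        this.entries = entries;
    }

    /** Returns the mapped value, or null when no entry matches; misses are recorded. */
    String translate(String value) {
        String mapped = entries.get(value);
        if (mapped == null) {
            unmatched.add(value);
        }
        return mapped;
    }

    /** Values seen so far that matched no entry. */
    Set<String> unmatchedValues() {
        return Collections.unmodifiableSet(unmatched);
    }
}
```

The recorded values could then be logged once at the end of a run instead of once per record.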


-Bob




Haschart, Robert J (rh9ec)

Apr 15, 2019, 5:32:55 PM
to solrmarc-tech

First,   I agree with Tod here.  


Second, the arcane jumbled mess that is the MARC record standard sometimes makes me want to cry or scream.


-Bob




john.p...@gmail.com

Apr 16, 2019, 8:20:10 AM
to solrma...@googlegroups.com
It’s done that to all of us at one time or another.

John

Demian Katz

Apr 16, 2019, 8:26:13 AM
to solrma...@googlegroups.com

Thanks, Bob!

 

I’m about to head to a conference for the rest of the week, but if you’d like me to take any action toward a solution, let me know and I’ll see if I can carve out a little time when I get back… not that I currently have anything approaching a well-formed plan yet. 😊

 

- Demian

Haschart, Robert J (rh9ec)

Apr 16, 2019, 1:25:21 PM
to solrma...@googlegroups.com

The actual fix should not be that difficult.   I may need guidance or opinions or votes on how it should work.


Given that there are some fields where the second indicator gives the number of "non-filing" characters (e.g. 245) and some fields where the first indicator gives the number of "non-filing" characters (e.g. 740), which solution would make more sense?


1)  Add a "specification modifier keyword" that says "Use the first indicator to strip non-filing characters" in addition to the existing one that means "Use the second indicator to strip non-filing characters"


2) Add a parameter to the existing specification keyword (somehow) to allow the specification of which  indicator should be so-used.


3) Add a list of fields that would be consulted to determine which indicator (if any) should be used for stripping non-filing characters.


Given that the existing "specification modifier keyword" is stripInd2, I'm leaning towards solution 1, but maybe also implementing solution 3, so that there would be three "specification modifier keywords":


stripInd1, stripInd2, and stripInd


The first two would unambiguously specify which indicator to use. The third (which might then be made the default; opinions?) would consult the list of fields, which would say: for 245a (and these others) use the 2nd indicator; for 740a (and these others) use the 1st indicator. For any other fields/subfields, the modifier keyword would be blithely ignored.
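
To make the three-keyword proposal concrete, here is a minimal sketch of how the choice of indicator might be resolved. Everything in it is an assumption for discussion: the class StripIndicatorChooser, the Modifier enum, and the lookup table are hypothetical, and the table holds only the two fields mentioned in this thread (245 and 740), not a complete MARC21 list.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the stripInd1 / stripInd2 / stripInd proposal. */
final class StripIndicatorChooser {
    private StripIndicatorChooser() {}

    /** Stands in for the three proposed specification modifier keywords. */
    enum Modifier { STRIP_IND1, STRIP_IND2, STRIP_IND }

    // Tag -> indicator number holding the non-filing count.
    // Deliberately incomplete: only the examples named in this thread.
    private static final Map<String, Integer> NON_FILING_INDICATOR = new HashMap<>();
    static {
        NON_FILING_INDICATOR.put("245", 2);
        NON_FILING_INDICATOR.put("740", 1);
    }

    /** Returns 1 or 2 for the indicator to consult, or 0 when nothing should be stripped. */
    static int indicatorFor(Modifier modifier, String tag) {
        switch (modifier) {
            case STRIP_IND1:
                return 1; // explicit: always the first indicator
            case STRIP_IND2:
                return 2; // explicit: always the second indicator
            default:
                // STRIP_IND: consult the table; unknown tags are ignored.
                Integer which = NON_FILING_INDICATOR.get(tag);
                return which == null ? 0 : which;
        }
    }
}
```

With this shape, stripInd1 and stripInd2 never touch the table, so sites following a different MARC convention keep full control.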


-Bob Haschart



Demian Katz

Apr 16, 2019, 1:35:46 PM
to solrma...@googlegroups.com

Bob,

 

I agree that if we’re going to keep calling the modifier stripInd2, it makes more sense to add a stripInd1 than to change the behavior of stripInd2. Having an automatic stripInd option would be useful, but we can’t assume everyone is using the same MARC standards, so the explicit ind1/ind2 options will be valuable for an extra level of control when needed.

 

If you do want to implement the automatic stripping, you might find this code a useful reference:

 

https://github.com/pear/File_MARC/blob/master/File/MARC/Lint.php#L740

 

This is my PHP port of the MARC::Lint module, which contains identical logic for figuring out which strip indicator to use. I believe this is up to date with MARC21 standards.
