Hello, everyone –
In trying to help someone troubleshoot a SolrMarc error message, I discovered that illegal skip indicators can cause error dumps like this:
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1931)
at org.solrmarc.index.extractor.formatter.FieldFormatterBase.cleanData(FieldFormatterBase.java:408)
at org.solrmarc.index.extractor.formatter.FieldFormatterBase.prepData(FieldFormatterBase.java:579)
at org.solrmarc.index.specification.SingleDataFieldSpecification.addFieldValues(SingleDataFieldSpecification.java:89)
at org.solrmarc.index.extractor.impl.direct.FieldMatch.addValuesTo(FieldMatch.java:42)
at org.solrmarc.index.extractor.impl.direct.DirectMultiValueExtractor.extract(DirectMultiValueExtractor.java:59)
at org.solrmarc.index.extractor.AbstractMultiValueExtractor.extract(AbstractMultiValueExtractor.java:14)
at org.solrmarc.index.extractor.AbstractMultiValueExtractor.extract(AbstractMultiValueExtractor.java:1)
at org.solrmarc.index.indexer.AbstractValueIndexer.getFieldData(AbstractValueIndexer.java:72)
at org.solrmarc.index.SolrIndexerShim.getFieldListCollector(SolrIndexerShim.java:138)
at org.solrmarc.index.SolrIndexerShim.getSortableTitle(SolrIndexerShim.java:620)
at org.solrmarc.index.SolrIndexer.getSortableTitle(SolrIndexer.java:331)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.solrmarc.index.extractor.methodcall.SingleValueExtractorMethodCall.invoke(SingleValueExtractorMethodCall.java:50)
at org.solrmarc.index.extractor.methodcall.SingleValueExtractorMethodCall.invoke(SingleValueExtractorMethodCall.java:1)
at org.solrmarc.index.extractor.methodcall.AbstractExtractorMethodCall.invoke(AbstractExtractorMethodCall.java:37)
at org.solrmarc.index.extractor.methodcall.MethodCallSingleValueExtractor.extract(MethodCallSingleValueExtractor.java:44)
at org.solrmarc.index.extractor.AbstractMultiValueExtractor.extract(AbstractMultiValueExtractor.java:1)
at org.solrmarc.index.indexer.MultiValueIndexer.getFieldData(MultiValueIndexer.java:103)
at org.solrmarc.driver.Indexer.indexToSolrDoc(Indexer.java:314)
at org.solrmarc.driver.Indexer.getIndexDoc(Indexer.java:211)
at org.solrmarc.driver.IndexerWorker.run(IndexerWorker.java:68)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
It appears that there is an unguarded substring operation here:
Is this intentional? Would it be better to do some more controlled checking and emit a more targeted error message in this scenario? I’m not sure about all the contexts in which this code is called, but if there’s a need for improvement here and Bob wants to provide a little guidance, I’m open to putting together a pull request….
- Demian
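For illustration, here is a minimal sketch of the "more controlled checking" idea. It assumes the skip count comes from a single indicator digit and that a blank indicator means "skip nothing". The class name, `stripNonFiling`, and `BadSkipIndicatorException` are hypothetical. This is not SolrMarc's actual `FieldFormatterBase.cleanData` code, which isn't shown in this thread.

```java
/**
 * Hypothetical sketch (not SolrMarc code): strip non-filing characters with
 * explicit checks, so a bad indicator produces a targeted message instead of
 * a bare StringIndexOutOfBoundsException from String.substring.
 */
public class NonFilingStripSketch {

    /** Hypothetical exception type carrying a targeted error message. */
    public static class BadSkipIndicatorException extends IllegalArgumentException {
        public BadSkipIndicatorException(String message) {
            super(message);
        }
    }

    /**
     * Removes the first N characters of data, where N is the digit in indicator.
     * Assumption: a blank indicator is treated like '0' (nothing to skip).
     */
    public static String stripNonFiling(String tag, char indicator, String data) {
        if (indicator == ' ' || indicator == '0') {
            return data; // nothing to skip
        }
        if (indicator < '0' || indicator > '9') {
            throw new BadSkipIndicatorException("Field " + tag + ": non-filing indicator '"
                    + indicator + "' is not a digit; data: \"" + data + "\"");
        }
        int skip = indicator - '0';
        if (skip > data.length()) {
            throw new BadSkipIndicatorException("Field " + tag + ": non-filing indicator "
                    + skip + " exceeds data length " + data.length() + "; data: \"" + data + "\"");
        }
        return data.substring(skip);
    }

    public static void main(String[] args) {
        System.out.println(stripNonFiling("245", '4', "The cat in the hat")); // cat in the hat
        try {
            stripNonFiling("245", '9', "Cats");
        } catch (BadSkipIndicatorException e) {
            System.out.println(e.getMessage()); // names the field, the indicator, and the data
        }
    }
}
```

Including the tag and the offending data in the message is a design choice. It lets whoever reads the log find the bad record without having to decode a stack trace.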
> Is this intentional?
Yeah, that's the ticket, I meant to do that. :-)
You are right: it would be much better to perform some checking and return a targeted error message, an exception, or something similar. I was also thinking about a somewhat related matter. When a translation map is applied and a data item is found that doesn't match any of the map's entries, it should be possible to specify that the item be flagged or noted somehow.
-Bob
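One way the translation-map idea could look is a map that records every input that matched no entry, so those inputs can be reported after indexing. This is only a sketch. `FlaggingTranslationMap`, its methods, and the sample map are hypothetical and are not SolrMarc's translation-map API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not SolrMarc's API): a translation map that records
 * values with no matching entry instead of silently mapping them away.
 */
public class FlaggingTranslationMap {
    private final Map<String, String> entries = new HashMap<>();
    private final String defaultValue;                 // may be null: no default
    private final List<String> unmatched = new ArrayList<>();

    public FlaggingTranslationMap(Map<String, String> entries, String defaultValue) {
        this.entries.putAll(entries);
        this.defaultValue = defaultValue;
    }

    /** Returns the mapped value; an unmatched input is recorded and the default returned. */
    public String translate(String value) {
        String mapped = entries.get(value);
        if (mapped != null) {
            return mapped;
        }
        unmatched.add(value);
        return defaultValue;
    }

    /** Inputs seen so far that matched no entry (a copy, safe to log). */
    public List<String> getUnmatched() {
        return new ArrayList<>(unmatched);
    }

    public static void main(String[] args) {
        Map<String, String> formats = new HashMap<>();
        formats.put("a", "Book");
        formats.put("g", "Video");
        FlaggingTranslationMap map = new FlaggingTranslationMap(formats, "Unknown");
        System.out.println(map.translate("a"));  // Book
        System.out.println(map.translate("z"));  // Unknown
        System.out.println(map.getUnmatched());  // [z]
    }
}
```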
First, I agree with Tod here.
Second, the arcane jumbled mess that is the MARC record standard sometimes makes me want to cry or scream.
-Bob
Thanks, Bob!
I’m about to head to a conference for the rest of the week, but if you’d like me to take any action toward a solution, let me know and I’ll see if I can carve out a little time when I get back… not that I currently have anything approaching a well-formed plan yet. 😊
- Demian
The actual fix should not be that difficult. I may need guidance or opinions or votes on how it should work.
Given that in some fields the second indicator gives the number of "non-filing" characters (e.g. 245), and in others the first indicator does (e.g. 740), which solution would make more sense?
1) Add a "specification modifier keyword" that says "Use the first indicator to strip non-filing characters" in addition to the existing one that means "Use the second indicator to strip non-filing characters"
2) Add a parameter to the existing specification keyword (somehow) to allow the specification of which indicator should be so-used.
3) Add a list of fields that would be consulted to determine which indicator (if any) should be used for stripping non-filing characters.
Given that the existing "specification modifier keyword" is stripInd2, I'm leaning towards solution 1, but maybe also implementing solution 3. That would give three "specification modifier keywords": stripInd1, stripInd2, and stripInd.
The first two would unambiguously specify which indicator should be used. The third (which could then be made the default; opinions?) would consult a list of fields. That list would say to use the 2nd indicator for 245a (and similar fields) and the 1st indicator for 740a (and similar fields). For any other fields/subfields, the modifier keyword would be blithely ignored.
-Bob Haschart
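Here is a rough sketch of how the three proposed modifiers (stripInd1, stripInd2, stripInd) might decide which indicator to use. Only 245 (second indicator) and 740 (first indicator) come from this thread. The other tags in the two sets are my reading of MARC 21 and the MARC::Lint logic, and they should be verified. The enum, the method names, and the choice to leave data unchanged when an indicator is non-digit or out of range are all hypothetical. A real fix might raise a targeted error in those cases instead.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch (not SolrMarc code) of the stripInd1 / stripInd2 / stripInd modifiers. */
public class StripIndSketch {

    /** The three proposed specification modifier keywords. */
    public enum StripMode { STRIP_IND1, STRIP_IND2, STRIP_IND }

    // First indicator holds the non-filing count. 740 is from the thread; the rest are assumed, please verify.
    private static final Set<String> IND1_TAGS =
            new HashSet<>(Arrays.asList("130", "630", "730", "740"));
    // Second indicator holds the non-filing count. 245 is from the thread; the rest are assumed, please verify.
    private static final Set<String> IND2_TAGS =
            new HashSet<>(Arrays.asList("222", "240", "242", "243", "245", "440", "830"));

    /** Returns 1 or 2 for the indicator to use, or 0 if the modifier should be ignored. */
    public static int indicatorToUse(StripMode mode, String tag) {
        switch (mode) {
            case STRIP_IND1:
                return 1;
            case STRIP_IND2:
                return 2;
            default: // STRIP_IND: look the tag up
                if (IND1_TAGS.contains(tag)) return 1;
                if (IND2_TAGS.contains(tag)) return 2;
                return 0; // unlisted field: modifier is ignored
        }
    }

    /** Strips non-filing characters using the indicator chosen for this mode and tag. */
    public static String strip(StripMode mode, String tag, char ind1, char ind2, String data) {
        int which = indicatorToUse(mode, tag);
        if (which == 0) {
            return data;
        }
        char ind = (which == 1) ? ind1 : ind2;
        if (ind < '0' || ind > '9') {
            return data; // non-digit indicator: leave unchanged (assumption)
        }
        int skip = ind - '0';
        return skip <= data.length() ? data.substring(skip) : data; // out of range: unchanged (assumption)
    }

    public static void main(String[] args) {
        System.out.println(strip(StripMode.STRIP_IND, "245", '1', '4', "The title"));      // title
        System.out.println(strip(StripMode.STRIP_IND, "740", '2', ' ', "A related work")); // related work
        System.out.println(strip(StripMode.STRIP_IND, "100", '1', ' ', "Smith, John"));    // Smith, John
    }
}
```

Keeping stripInd1 and stripInd2 as explicit overrides means a local practice that differs from the built-in tag list can still be handled field by field.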
Bob,
I agree that if we’re going to keep calling the modifier stripInd2, it makes more sense to add a stripInd1 than to change the behavior of stripInd2. Having an automatic stripInd option would be useful, but we can’t assume everyone is using the same MARC standards, so the explicit ind1/ind2 options will be valuable for an extra level of control when needed.
If you do want to implement the automatic stripping, you might find this code a useful reference:
https://github.com/pear/File_MARC/blob/master/File/MARC/Lint.php#L740
This is my PHP port of the MARC::Lint module, which contains identical logic for figuring out which strip indicator to use. I believe this is up to date with MARC21 standards.