Bernd,
It seems SolrMarc may need a new modifier to complement the existing stripInd/stripInd1/stripInd2 options, perhaps accompanied by new configuration properties to control which characters are stripped. I haven't worked with the modifier code before, so I'm not exactly sure how it works, but this seems like a possible approach. I'm copying solrmarc-tech back into the thread in case others there have thoughts on this.
There is no need for any patches to marc4j; it already implements everything according to the MARC rules.
But some work could be done in SolrMarc, and it must be very flexible because every catalog system has its own non-sorting characters or even character sequences. Your Pica uses "\u009b" and "\u009c", Alma uses "<<" and ">>".
The question is: should SolrMarc rely on the non-sorting rules of MARC records, or should it ignore any MARC rules and just do a replaceFirst on a special character or character sequence?
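To make the second option concrete, here is a minimal sketch of a configurable stripper in plain Java. It is not SolrMarc or marc4j code: the class name NonSortStripper, the nested MarkerPair type and the strip method are hypothetical names chosen for illustration, and the marker pairs are simply the ones mentioned in this thread. It removes one leading block enclosed by any configured begin/end pair and leaves everything else untouched.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of SolrMarc or marc4j.
class NonSortStripper {

    /** One configurable begin/end marker pair, e.g. "<<" and ">>". */
    static final class MarkerPair {
        final String begin;
        final String end;

        MarkerPair(String begin, String end) {
            this.begin = begin;
            this.end = end;
        }
    }

    private final List<MarkerPair> pairs;

    NonSortStripper(List<MarkerPair> pairs) {
        this.pairs = pairs;
    }

    /**
     * Removes a single leading non-sorting block if the value starts with
     * one of the configured begin markers and the matching end marker
     * follows. Values without a complete leading block are returned as-is.
     */
    String strip(String value) {
        if (value == null || value.isEmpty()) {
            return value;
        }
        for (MarkerPair p : pairs) {
            if (value.startsWith(p.begin)) {
                int endPos = value.indexOf(p.end, p.begin.length());
                if (endPos >= 0) {
                    // Drop the block and any whitespace that followed it.
                    return value.substring(endPos + p.end.length()).stripLeading();
                }
            }
        }
        return value;
    }

    /** Example configuration using the marker pairs named in this thread. */
    static NonSortStripper threadDefaults() {
        return new NonSortStripper(Arrays.asList(
                new MarkerPair("<<", ">>"),           // Alma
                new MarkerPair("\u009b", "\u009c"))); // Pica, as quoted above
    }
}
```

With that configuration, strip("<<The>> quick fox") would yield "quick fox", and a value with no leading marker comes back unchanged. In a real setup the pairs would presumably come from configuration properties rather than being hard-coded.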
Regards
Bernd
On 04.11.21 at 14:00, Uwe Reh wrote:
> Adding a dedicated method to SolrMarc sounds like a good approach. A patch
> to marc4j seems 'too' general. (marc4j shouldn't change the
> content.)
>
> In our HDS project we are using dedicated fields for sorting:
>
> * 'author_sort', 'title_sort': filled respecting the rules for
> non-sorting prefixes.
>
> * 'date_sort_asc', 'date_sort_desc': filled with first/last date of
> publication. (needed for journals and series)
>
> Since we fill our index with our own 'SolrPica', I can't share a
> SolrMarc solution. But the generic code is quite simple.
>
>> private String removeNonSortingLeader(String out) {
>>     if (out == null) return null;
>>     if (out.isEmpty()) return "";
>>     // Remove '@' in the first position.
>>     if (out.charAt(0) == '@') return out.substring(1);
>>     // Remove non-sorting markers per RAK
>>     out = out.replaceFirst("^\\w+ @", "");
>>     // Remove non-sorting markers per MARC
>>     out = out.replaceFirst("^˜\u009b.*\u009c", "");
>>     return out;
>> }
>
> Note 1) In my original code I'm using the control chars directly. With
> the encoded chars, the last regex might not work.
> Note 2) Yes, the code isn't optimized. I tend to leave this task to the JVM.
>
> Uwe
>
>
>
> On 04.11.21 at 13:10, Bernd Fehling wrote:
>> Hi Demian,
>>
>> I was looking for a general solution to enhance SolrMarc and
>> therefore digging into marc4j.
>> It is generally possible to have this in SolrMarc because of marc4j.
>> But as I mentioned, the librarians have their own view and have only
>> just left the age of punched-card readers. ;-)
>>
>> I'll write a class for it and that's it. QAE (quick and easy)
>
--
*************************************************************
Bernd Fehling Bielefeld University Library
Dipl.-Inform. (FH) LibTec - Library Technology
Universitätsstr. 25 and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060 bernd.fehling(at)uni-bielefeld.de
https://www.ub.uni-bielefeld.de/~befehl/
BASE - Bielefeld Academic Search Engine -
http://www.base-search.net/
*************************************************************
https://lists.sourceforge.net/lists/listinfo/vufind-tech