Control characters 0x1D-0x1F

Demian Katz

Mar 30, 2021, 10:25:58 AM
to solrma...@googlegroups.com

Hello,

Since its earliest days, VuFind has had code that string-replaces the entities #29;, #30; and #31; in MARC data with their actual character equivalents, 0x1D, 0x1E and 0x1F. The comment in the code suggests that this compensates for something SolrMarc is doing.

A pull request has recently been opened to remove this code, since it seems that it is no longer needed:

https://github.com/vufind-org/vufind/pull/1900

I believe this is true, and I plan to merge the PR… but I also don’t know the history behind this, so I wanted to check and see if this rings a bell for anyone here!

Thanks,
Demian

Levy, Michael

Mar 30, 2021, 11:34:57 AM
to solrma...@googlegroups.com
Wow, Demian, this brought back some memories! Jonathan Rochkind figured this out, and here is his blog post, which includes a workaround for Blacklight. Scroll down to "Blacklight: Deal with weird escaping of Marc21 binary".

https://bibwild.wordpress.com/2013/06/18/upgrading-a-blacklight-app-from-solr-1-4-to-solr-4-3/

If we don't need to do that any more, how nice! Thank you, Demian, and also again thanks to Jonathan!

--
You received this message because you are subscribed to the Google Groups "solrmarc-tech" group.
To unsubscribe from this group and stop receiving emails from it, send an email to solrmarc-tec...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/solrmarc-tech/DM5PR03MB3082965F0DC97AF2819DA7DEE87D9%40DM5PR03MB3082.namprd03.prod.outlook.com.

Demian Katz

Mar 30, 2021, 2:00:14 PM
to solrma...@googlegroups.com

Thanks, Michael! That’s interesting, though if anything, that just increases the mystery around what happened. Did Solr change? Did SolrMarc change? It’s unclear without playing with a bunch of different legacy versions of things to see where the behavior changed – a project I’m considering, but probably won’t have the time to tackle. I’ll let you know if I learn anything more, and I’ll be interested to know if anyone else can narrow this down any further.

- Demian

Levy, Michael

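Mar 30, 2021, 8:40:01 PM
to solrma...@googlegroups.com

I have found some postings circa 2010-2011 that might shed some light, because they contain references to escaping these characters.

* https://groups.google.com/g/blacklight-development/c/HlbI7hzZF8E/m/5EAGAXPRD8gJ
* https://groups.google.com/g/blacklight-development/c/uP-cdvs7SbQ/m/AbPEcFDVcD0J
* https://groups.google.com/g/blacklight-development/c/O255zjZPlcA/m/RFZwAielrkYJ
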
Demian Katz

Mar 31, 2021, 9:20:27 AM
to solrma...@googlegroups.com

Thanks for these helpful links; there’s also been some more discussion on the VuFind PR.

From all of this, it seems that two different things have changed, either of which might explain the change in behavior (and again, both could be tested experimentally if time permitted, but mine currently does not):

  1. The links below talk about XML; perhaps this is a quirk of the Solr XML output handler. VuFind has subsequently switched to using JSON instead of XML, so the XML quirks would no longer be relevant.
  2. SolrMarc used to default to writing to Solr in binary mode through the SolrJ API, but that support was subsequently removed in favor of writing via HTTP transactions. If the strange encoding was related to write mode, that could also be a factor.

I think option 1 is perhaps more probable, but like I said, I haven’t had a chance to prove it. 😊

Michael Lackhoff

Mar 31, 2021, 11:24:14 AM
to solrma...@googlegroups.com
On 31.03.2021 at 15:20, Demian Katz wrote:
> Thanks for these helpful links; there’s also been some more discussion on the VuFind PR.
>
> From all of this, it seems that there are two different things which have changed, either of which might explain the change in behavior (and again, both of which could be tested experimentally if time permitted – but mine currently does not):
>
>
> 1. The links below talk about XML; perhaps this is a quirk of the Solr XML output handler. VuFind has subsequently switched to use JSON instead of XML, so the XML quirks would no longer be relevant.
> 2. SolrMarc used to default to writing to Solr in binary mode through the SolrJ API, but that support was subsequently removed in favor of writing via HTTP transactions. If the strange encoding was related to write mode, that could also be a factor.
>
> I think option 1 is perhaps more probable, but like I said, I haven’t had a chance to prove it. 😊

I am quite sure it is option 1, and it would be very helpful if it kept
working that way.
The reason is that the very powerful Indexdata tools, such as Metaproxy
and Simpleserver, rely heavily on SRU, which only works with XML.
At my request, they even built in support for this solrmarc peculiarity:

http://lists.indexdata.dk/pipermail/yazlist/2014-March/003909.html
has the request and
http://lists.indexdata.dk/pipermail/yazlist/2014-March/003911.html
the solution.

- Michael

> - Demian
>
> From: solrma...@googlegroups.com <solrma...@googlegroups.com> On Behalf Of Levy, Michael
> Sent: Tuesday, March 30, 2021 8:40 PM
> To: solrma...@googlegroups.com
> Subject: Re: [EXTERNAL] Re: [solrmarc-tech] Control characters 0x1D-0x1F
>
> I have found some postings circa 2010-2011 that might possibly shed some light because there are some references to escaping these characters.
>
> * https://groups.google.com/g/blacklight-development/c/HlbI7hzZF8E/m/5EAGAXPRD8gJ
> * https://groups.google.com/g/blacklight-development/c/uP-cdvs7SbQ/m/AbPEcFDVcD0J
> * https://groups.google.com/g/blacklight-development/c/O255zjZPlcA/m/RFZwAielrkYJ
>
> From: solrma...@googlegroups.com <solrma...@googlegroups.com> On Behalf Of Levy, Michael
> Sent: Tuesday, March 30, 2021 11:34 AM
> To: solrma...@googlegroups.com
> Subject: [EXTERNAL] Re: [solrmarc-tech] Control characters 0x1D-0x1F
>
> Wow, Demian, this brought back some memories! Jonathan Rochkind figured this out and here is his blog which includes a workaround for Blacklight. Scroll down to "Blacklight: Deal with weird escaping of Marc21 binary"
>
> https://bibwild.wordpress.com/2013/06/18/upgrading-a-blacklight-app-from-solr-1-4-to-solr-4-3/
>

Till Kinstler

Mar 31, 2021, 1:32:37 PM
to solrma...@googlegroups.com
On 31.03.21 at 17:23, Michael Lackhoff wrote:
> On 31.03.2021 at 15:20, Demian Katz wrote:
>> Thanks for these helpful links; there’s also been some more discussion on the VuFind PR.
>>
>> From all of this, it seems that there are two different things which have changed, either of which might explain the change in behavior (and again, both of which could be tested experimentally if time permitted – but mine currently does not):
>>
>>
>> 1. The links below talk about XML; perhaps this is a quirk of the Solr XML output handler. VuFind has subsequently switched to use JSON instead of XML, so the XML quirks would no longer be relevant.
>> 2. SolrMarc used to default to writing to Solr in binary mode through the SolrJ API, but that support was subsequently removed in favor of writing via HTTP transactions. If the strange encoding was related to write mode, that could also be a factor.
>>
>> I think option 1 is perhaps more probable, but like I said, I haven’t had a chance to prove it. 😊
>
> I am quite sure it is option 1 and it would be very helpful if it would
> keep working that way.

Yes, I also seem to remember something involving Solr's XML input or
output and MARC control characters.
I just tested this: I posted a Solr JSON document containing an ISO MARC
record, without replacing the control characters, to a collection in
Solr 8.6.3, then retrieved it through the XML response writer. The ISO
MARC record comes back with the control characters replaced by #29;,
#30; and #31; in the XML-ish output, while the JSON output has the
original MARC control characters (see the relevant snippets of the
original XML and JSON responses below). I have no idea when Solr
learned to do this...
I'd still vote for keeping the replacement of the control characters in
solrmarc, at least as an option, for backwards compatibility (it seems
that it was once necessary, and there might be software setups out
there that expect or need it).

Till

Experiment with Solr 8.6.3, posted a MARC record without touching the
control characters and retrieved it through XML and JSON writers
respectively:

JSON writer result:
"fullrecord":"01734cam a22004932
4500001001000000003000700010005001700017007000300034008004100037015002100078016002200099020004800121035002200169035002500191035002000216040003100236041000800267041000800275044001000283084001300293100010100306240002300407245006400430250001200494264005000506300001700556336002600573337004600599338002500645655009800670689007600768689007500844689008600919689001101005700002001016700004601036912001501082912001401097912001201111951000701123980006201130984002501192985002301217\u001e124282148\u001eDE-627\u001e20190119145426.0\u001etu\u001e930526s1993
gw ||||| 00| ||ger c\u001e \u001fa93,A17,2161\u001f2dnb\u001e7
\u001fa930523938\u001f2DE-101\u001e \u001fa3458161937\u001fc: DM 38.00
(Pp.)\u001f93-458-16193-7\u001e \u001fa(DE-627)124282148\u001e
\u001fa(DE-599)GBV124282148\u001e \u001fa(OCoLC)75316021\u001e
\u001faDE-627\u001fbger\u001fcDE-627\u001ferakwb\u001e \u001fager\u001e
\u001fhheb\u001e \u001fcXA-DE\u001e \u001fa59\u001f2sdnb\u001e1
\u001faʿOz,
Amos\u001fd1939-2018\u001feverfasserin\u001f0(DE-588)118855379\u001f0(DE-627)079631088\u001f0(DE-576)165099275\u001f4aut\u001e10\u001faLada'at
ischa <dt>\u001e15\u001faEine Frau erkennen\u001fcAmos Oz. Aus dem Hebr.
von Ruth Achlama\u001e \u001fa2. Aufl\u001e 1\u001faFrankfurt am
Main\u001faLeipzig\u001fbInsel-Verl.\u001fc1993\u001e \u001fa318
S\u001fc21 cm\u001e \u001faText\u001fbtxt\u001f2rdacontent\u001e
\u001faohne Hilfsmittel zu benutzen\u001fbn\u001f2rdamedia\u001e
\u001faBand\u001fbnc\u001f2rdacarrier\u001e 7\u001faFiktionale
Darstellung\u001f0(DE-588)1071854844\u001f0(DE-627)82648378X\u001f0(DE-576)43337439X\u001f2gnd-content\u001e00\u001fDg\u001f0(DE-588)4027808-6\u001f0(DE-627)104767804\u001f0(DE-576)208972358\u001faIsrael\u001f2gnd\u001e01\u001fDs\u001f0(DE-588)4182341-2\u001f0(DE-627)105311219\u001f0(DE-576)210010959\u001faSpion\u001f2gnd\u001e02\u001fDs\u001f0(DE-588)4123184-3\u001f0(DE-627)104577681\u001f0(DE-576)209557559\u001faFamilienkonflikt\u001f2gnd\u001e0
\u001f5DE-101\u001e1 \u001f00590333232\u001f4oth\u001e12\u001faʿOz,
Amos\u001fd1939-2018\u001ftLada'at ischa <dt>\u001e
\u001faGBV_ILN_31\u001e \u001faSYSFLAG_1\u001e \u001faGBV_KXP\u001e
\u001faBO\u001e \u001f231\u001f101\u001fb220884811\u001ffMag\u001fd95 A
4813/1\u001feu\u001fx0027\u001fyn\u001fz14-07-00\u001e
\u001f231\u001f101\u001fa27$006869785\u001e
\u001f231\u001f101\u001faB93/R/4249\u001e\u001d",


XML writer result:
<str name="fullrecord">01734cam a22004932
4500001001000000003000700010005001700017007000300034008004100037015002100078016002200099020004800121035002200169035002500191035002000216040003100236041000800267041000800275044001000283084001300293100010100306240002300407245006400430250001200494264005000506300001700556336002600573337004600599338002500645655009800670689007600768689007500844689008600919689001101005700002001016700004601036912001501082912001401097912001201111951000701123980006201130984002501192985002301217#30;124282148#30;DE-627#30;20190119145426.0#30;tu#30;930526s1993
gw ||||| 00| ||ger c#30; #31;a93,A17,2161#31;2dnb#30;7
#31;a930523938#31;2DE-101#30; #31;a3458161937#31;c: DM 38.00
(Pp.)#31;93-458-16193-7#30; #31;a(DE-627)124282148#30;
#31;a(DE-599)GBV124282148#30; #31;a(OCoLC)75316021#30;
#31;aDE-627#31;bger#31;cDE-627#31;erakwb#30; #31;ager#30; #31;hheb#30;
#31;cXA-DE#30; #31;a59#31;2sdnb#30;1 #31;aʿOz,
Amos#31;d1939-2018#31;everfasserin#31;0(DE-588)118855379#31;0(DE-627)079631088#31;0(DE-576)165099275#31;4aut#30;10#31;aLada'at
ischa <dt>#30;15#31;aEine Frau erkennen#31;cAmos Oz. Aus dem Hebr. von
Ruth Achlama#30; #31;a2. Aufl#30; 1#31;aFrankfurt am
Main#31;aLeipzig#31;bInsel-Verl.#31;c1993#30; #31;a318 S#31;c21 cm#30;
#31;aText#31;btxt#31;2rdacontent#30; #31;aohne Hilfsmittel zu
benutzen#31;bn#31;2rdamedia#30; #31;aBand#31;bnc#31;2rdacarrier#30;
7#31;aFiktionale
Darstellung#31;0(DE-588)1071854844#31;0(DE-627)82648378X#31;0(DE-576)43337439X#31;2gnd-content#30;00#31;Dg#31;0(DE-588)4027808-6#31;0(DE-627)104767804#31;0(DE-576)208972358#31;aIsrael#31;2gnd#30;01#31;Ds#31;0(DE-588)4182341-2#31;0(DE-627)105311219#31;0(DE-576)210010959#31;aSpion#31;2gnd#30;02#31;Ds#31;0(DE-588)4123184-3#31;0(DE-627)104577681#31;0(DE-576)209557559#31;aFamilienkonflikt#31;2gnd#30;0
#31;5DE-101#30;1 #31;00590333232#31;4oth#30;12#31;aʿOz,
Amos#31;d1939-2018#31;tLada'at ischa <dt>#30; #31;aGBV_ILN_31#30;
#31;aSYSFLAG_1#30; #31;aGBV_KXP#30; #31;aBO#30;
#31;231#31;101#31;b220884811#31;fMag#31;d95 A
4813/1#31;eu#31;x0027#31;yn#31;z14-07-00#30;
#31;231#31;101#31;a27$006869785#30;
#31;231#31;101#31;aB93/R/4249#30;#29;</str>
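
For anyone who wants to repeat the experiment, the requests involved can be sketched roughly as follows. This is only an illustration under stated assumptions: the core URL (`http://localhost:8983/solr/biblio`), the field list and both helper names are made up for this example, and the block only builds the URLs and the JSON body so that it runs without a Solr instance.

```python
import json
from typing import Tuple
from urllib.parse import urlencode

# Assumption: a local Solr core called "biblio"; adjust to your own setup.
SOLR_BASE = "http://localhost:8983/solr/biblio"


def build_update_request(doc: dict) -> Tuple[str, bytes]:
    """Return the URL and JSON body for posting one document to /update."""
    url = f"{SOLR_BASE}/update?{urlencode({'commit': 'true'})}"
    body = json.dumps([doc]).encode("utf-8")
    return url, body


def build_select_url(record_id: str, wt: str) -> str:
    """Return a /select URL that fetches one record via the given response writer."""
    params = {"q": f'id:"{record_id}"', "fl": "fullrecord", "wt": wt}
    return f"{SOLR_BASE}/select?{urlencode(params)}"


# A tiny stand-in for a MARC record, with real 0x1F, 0x1E and 0x1D characters.
doc = {"id": "124282148", "fullrecord": "\x1faIsrael\x1e\x1d"}
url, body = build_update_request(doc)
print(url)
print(body)  # the JSON encoder writes the control characters as \u001f etc.
print(build_select_url("124282148", "xml"))
print(build_select_url("124282148", "json"))
```

Sending the body with a `Content-Type: application/json` header (for example with curl or `urllib.request`) and then fetching both select URLs should let you compare the two response writers side by side.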



--
Till Kinstler
Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG)
Platz der Göttinger Sieben 1, D 37073 Göttingen
kins...@gbv.de, +49 (0) 551 39-31414, http://www.gbv.de/

Demian Katz

Mar 31, 2021, 5:33:08 PM
to solrma...@googlegroups.com
Thanks, Till, that's very helpful!

I don't think SolrMarc has ever done this replacement... I think some discovery layers have just added code to do the decoding based on the XML handler behavior... and it sounds like that code can be removed if wt=json is being used in place of wt=xml. (And I'd advocate for its removal in those situations, since it can cause unintended side effects when it is not actually needed).
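
To make the trade-off concrete, here is a minimal Python sketch of the kind of decoding step that discovery layers added. It is an illustration only, not the actual VuFind or Blacklight code, and the name `decode_solr_xml_escapes` is invented for this example.

```python
# Hypothetical helper, not taken from VuFind or Blacklight.
# Maps the literal sequences seen in wt=xml output back to the
# MARC control characters they stand for.
ESCAPES = {
    "#29;": "\x1d",  # record terminator
    "#30;": "\x1e",  # field terminator
    "#31;": "\x1f",  # subfield delimiter
}


def decode_solr_xml_escapes(text: str) -> str:
    """Replace #29;, #30; and #31; with 0x1D, 0x1E and 0x1F."""
    for escaped, char in ESCAPES.items():
        text = text.replace(escaped, char)
    return text


# Needed when the record came back through the XML response writer:
from_xml = "#31;aEine Frau erkennen#30;#29;"
print(repr(decode_solr_xml_escapes(from_xml)))

# Harmful when the record already has real control characters (wt=json)
# and a subfield happens to contain the literal text "#31;":
from_json = "\x1faSee note #31; in the appendix\x1e\x1d"
print(repr(decode_solr_xml_escapes(from_json)))
```

In the second call the literal text "#31;" inside the subfield is turned into an extra subfield delimiter, which is the sort of unintended side effect that argues for dropping the replacement when it is not needed.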

- Demian


Till Kinstler

Apr 1, 2021, 3:01:43 AM
to solrma...@googlegroups.com
It seems that solrmarc < 3 did the replacement, but solrmarc 3 does not;
see the two Solr JSON output results below. I think it actually is (or
was) marc4j doing this. Is anyone up for some code archaeology? (I am
not sure whether I'll find the time in the next few days.)

Till

Posted a MARC record to Solr 8.6.3 through FullRecordAsMarc with
solrmarc 3 (latest master code from GitHub), then retrieved it through
the JSON response writer:

"fullrecord":"01734cam a22004932
4500001001000000003000700010005001700017007000300034008004100037015002100078016002200099020004800121035002200169035002500191035002000216040003100236041000800267041000800275044001000283084001300293100010100306240002300407245006400430250001200494264005000506300001700556336002600573337004600599338002500645655009800670689007600768689007500844689008600919689001101005700002001016700004601036912001501082912001401097912001201111951000701123980006201130984002501192985002301217\u001e124282148\u001eDE-627\u001e20190119145426.0\u001etu\u001e930526s1993
gw ||||| 00| ||ger c\u001e \u001fa93,A17,2161\u001f2dnb\u001e7
\u001fa930523938\u001f2DE-101\u001e \u001fa3458161937\u001fc: DM 38.00
(Pp.)\u001f93-458-16193-7\u001e \u001fa(DE-627)124282148\u001e
\u001fa(DE-599)GBV124282148\u001e \u001fa(OCoLC)75316021\u001e
\u001faDE-627\u001fbger\u001fcDE-627\u001ferakwb\u001e \u001fager\u001e
\u001fhheb\u001e \u001fcXA-DE\u001e \u001fa59\u001f2sdnb\u001e1
\u001faʿOz,
Amos\u001fd1939-2018\u001feverfasserin\u001f0(DE-588)118855379\u001f0(DE-627)079631088\u001f0(DE-576)165099275\u001f4aut\u001e10\u001faLada'at
ischa <dt>\u001e15\u001faEine Frau erkennen\u001fcAmos Oz. Aus dem Hebr.
von Ruth Achlama\u001e \u001fa2. Aufl\u001e 1\u001faFrankfurt am
Main\u001faLeipzig\u001fbInsel-Verl.\u001fc1993\u001e \u001fa318
S\u001fc21 cm\u001e \u001faText\u001fbtxt\u001f2rdacontent\u001e
\u001faohne Hilfsmittel zu benutzen\u001fbn\u001f2rdamedia\u001e
\u001faBand\u001fbnc\u001f2rdacarrier\u001e 7\u001faFiktionale
Darstellung\u001f0(DE-588)1071854844\u001f0(DE-627)82648378X\u001f0(DE-576)43337439X\u001f2gnd-content\u001e00\u001fDg\u001f0(DE-588)4027808-6\u001f0(DE-627)104767804\u001f0(DE-576)208972358\u001faIsrael\u001f2gnd\u001e01\u001fDs\u001f0(DE-588)4182341-2\u001f0(DE-627)105311219\u001f0(DE-576)210010959\u001faSpion\u001f2gnd\u001e02\u001fDs\u001f0(DE-588)4123184-3\u001f0(DE-627)104577681\u001f0(DE-576)209557559\u001faFamilienkonflikt\u001f2gnd\u001e0
\u001f5DE-101\u001e1 \u001f00590333232\u001f4oth\u001e12\u001faʿOz,
Amos\u001fd1939-2018\u001ftLada'at ischa <dt>\u001e
\u001faGBV_ILN_31\u001e \u001faSYSFLAG_1\u001e \u001faGBV_KXP\u001e
\u001faBO\u001e \u001f231\u001f101\u001fb220884811\u001ffMag\u001fd95 A
4813/1\u001feu\u001fx0027\u001fyn\u001fz14-07-00\u001e
\u001f231\u001f101\u001fa27$006869785\u001e
\u001f231\u001f101\u001faB93/R/4249\u001e\u001d",


solrmarc < 3 ("solrmarc 2.something"; must be the latest pre-3 code):
"fullrecord":"01734cam a22004932
4500001001000000003000700010005001700017007000300034008004100037015002100078016002200099020004800121035002200169035002500191035002000216040003100236041000800267041000800275044001000283084001300293100010100306240002300407245006400430250001200494264005000506300001700556336002600573337004600599338002500645655009800670689007600768689007500844689008600919689001101005700002001016700004601036912001501082912001401097912001201111951000701123980006201130984002501192985002301217#30;124282148#30;DE-627#30;20190119145426.0#30;tu#30;930526s1993
gw ||||| 00| ||ger c#30; #31;a93,A17,2161#31;2dnb#30;7
#31;a930523938#31;2DE-101#30; #31;a3458161937#31;c: DM 38.00
(Pp.)#31;93-458-16193-7#30; #31;a(DE-627)124282148#30;
#31;a(DE-599)GBV124282148#30; #31;a(OCoLC)75316021#30;
#31;aDE-627#31;bger#31;cDE-627#31;erakwb#30; #31;ager#30; #31;hheb#30;
#31;cXA-DE#30; #31;a59#31;2sdnb#30;1 #31;aʿOz,
Amos#31;d1939-2018#31;everfasserin#31;0(DE-588)118855379#31;0(DE-627)079631088#31;0(DE-576)165099275#31;4aut#30;10#31;aLada'at
ischa <dt>#30;15#31;aEine Frau erkennen#31;cAmos Oz. Aus dem Hebr. von
Ruth Achlama#30; #31;a2. Aufl#30; 1#31;aFrankfurt am
Main#31;aLeipzig#31;bInsel-Verl.#31;c1993#30; #31;a318 S#31;c21 cm#30;
#31;aText#31;btxt#31;2rdacontent#30; #31;aohne Hilfsmittel zu
benutzen#31;bn#31;2rdamedia#30; #31;aBand#31;bnc#31;2rdacarrier#30;
7#31;aFiktionale
Darstellung#31;0(DE-588)1071854844#31;0(DE-627)82648378X#31;0(DE-576)43337439X#31;2gnd-content#30;00#31;Dg#31;0(DE-588)4027808-6#31;0(DE-627)104767804#31;0(DE-576)208972358#31;aIsrael#31;2gnd#30;01#31;Ds#31;0(DE-588)4182341-2#31;0(DE-627)105311219#31;0(DE-576)210010959#31;aSpion#31;2gnd#30;02#31;Ds#31;0(DE-588)4123184-3#31;0(DE-627)104577681#31;0(DE-576)209557559#31;aFamilienkonflikt#31;2gnd#30;0
#31;5DE-101#30;1 #31;00590333232#31;4oth#30;12#31;aʿOz,
Amos#31;d1939-2018#31;tLada'at ischa <dt>#30; #31;aGBV_ILN_31#30;
#31;aSYSFLAG_1#30; #31;aGBV_KXP#30; #31;aBO#30;
#31;231#31;101#31;b220884811#31;fMag#31;d95 A
4813/1#31;eu#31;x0027#31;yn#31;z14-07-00#30;
#31;231#31;101#31;a27$006869785#30; #31;231#31;101#31;aB93/R/4249#30;#29;",
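
A quick way to tell the two cases above apart in an existing index is to check which kind of separator a stored `fullrecord` value contains once the JSON response has been decoded. The following is a minimal sketch; the function name `separator_style` and its return labels are invented for this example.

```python
import re

# Literal placeholders as written by older indexing setups.
ESCAPED = re.compile(r"#(29|30|31);")
# Real MARC record, field and subfield separators.
RAW = re.compile(r"[\x1d\x1e\x1f]")


def separator_style(fullrecord: str) -> str:
    """Classify a stored record as 'raw', 'escaped', 'mixed' or 'none'."""
    has_escaped = bool(ESCAPED.search(fullrecord))
    has_raw = bool(RAW.search(fullrecord))
    if has_raw and has_escaped:
        return "mixed"
    if has_raw:
        return "raw"
    if has_escaped:
        return "escaped"
    return "none"


print(separator_style("\x1faIsrael\x1e\x1d"))     # like the solrmarc 3 result
print(separator_style("#31;aIsrael#30;#29;"))     # like the solrmarc < 3 result
```

A "mixed" result deserves a closer look, since it may simply be a record with real separators whose text happens to contain something like "#30;".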


Demian Katz

Apr 1, 2021, 10:29:46 AM
to solrma...@googlegroups.com
On the VuFind PR, Ere Maijala pointed out this code in Solr which appears to be doing the escaping:

https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/common/util/XML.java#L35
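
As a rough Python approximation of what that escaping appears to do, judging only from the wt=xml output Till posted: control characters below 0x20 (other than tab, line feed and carriage return) come out as plain-text placeholders of the form `#NN;`. The handling of `&`, `<` and `>` below is ordinary XML escaping and is an assumption on my part; the table in XML.java remains the authority, and `escape_char_data` is a made-up name.

```python
def escape_char_data(text: str) -> str:
    """Approximate how control characters end up as '#NN;' in wt=xml output."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif code < 0x20 and ch not in "\t\n\r":
            # XML 1.0 cannot carry these characters, so a readable
            # placeholder is written instead of the raw character.
            out.append(f"#{code};")
        else:
            out.append(ch)
    return "".join(out)


print(escape_char_data("\x1faText\x1ftxt\x1e\x1d"))
```

Note that the placeholder is not a real XML character reference (there is no leading `&`), so an XML parser hands it to the client as ordinary text, which is why consuming applications had to convert it back themselves.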

Is it possible that SolrMarc did the escaping for a time, and then when Solr started doing it internally, the code was removed?

If wt=xml responses from recent Solr versions are returning the escaped values without the escaping being done by SolrMarc itself, that would seem to suggest that we don't need the logic in SolrMarc, and are better off letting Solr take care of it. Consuming applications shouldn't care who did the escaping as long as they get valid XML responses back.

Ere also points out that if Solr could be updated to support XML 1.1, this whole problem with illegal control characters might be able to be eliminated -- but that's a different can of worms.
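
The XML 1.0 limitation is easy to see with any conforming parser. The sketch below uses Python's standard library parser purely as an example; the helper name `parses_as_xml` is made up.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(document: str) -> bool:
    """Return True if Python's XML 1.0 parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml("<str>a\x1eb</str>"))   # raw control character
print(parses_as_xml("<str>a&#30;b</str>"))  # numeric character reference
print(parses_as_xml("<str>a#30;b</str>"))   # plain-text placeholder
```

With CPython's expat-based parser, only the third document parses: XML 1.0 forbids 0x1E both as a raw character and as a character reference, which would explain why a plain-text placeholder is used instead.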