"arFindingAidJob": ERROR(SAXON): SXXP0003: Error reported by XML parser: Premature end of file.

urph...@gmail.com

Dec 7, 2021, 10:38:18 AM
to AtoM Users
Hi Support
We are getting the following error when generating a finding aid:

[info] [2021-12-07 11:00:45] Job 988236 "arFindingAidJob": Job started.
[info] [2021-12-07 11:00:45] Job 988236 "arFindingAidJob": Generating finding aid (40-21)...
[info] [2021-12-07 11:01:21] Job 988236 "arFindingAidJob": Running: java -jar '/usr/share/nginx/atom/lib/task/pdf/saxon9he.jar' -s:'/tmp/phpLUtu4d' -xsl:'/usr/share/nginx/atom/lib/task/pdf/ead-pdf-inventory-summary.xsl' -o:'/tmp/phpuEEnDQ' 2>&1
[info] [2021-12-07 11:01:23] Job 988236 "arFindingAidJob": Transforming the EAD with Saxon has failed.
[info] [2021-12-07 11:01:23] Job 988236 "arFindingAidJob": ERROR(SAXON): Error on line 2 column 1 of phpLUtu4d:
[info] [2021-12-07 11:01:23] Job 988236 "arFindingAidJob": ERROR(SAXON): SXXP0003: Error reported by XML parser: Premature end of file.
[info] [2021-12-07 11:01:23] Job 988236 "arFindingAidJob": ERROR(SAXON): Transformation failed: Run-time errors were reported
[info] [2021-12-07 11:01:23] Job 988236 "arFindingAidJob": Job finished.

We have:
FOP Version 2.1

openjdk version "1.8.0_292"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.292-b10, mixed mode)

PHP 7.2.33-1+0~20200807.47+debian10~1.gbpcb3068 (cli) (built: Aug  7 2020 14:56:51) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.2.0, Copyright (c) 1998-2018 Zend Technologies
    with Zend OPcache v7.2.33-1+0~20200807.47+debian10~1.gbpcb3068, Copyright (c) 1999-2018, by Zend Technologies
Please, we need your help.
Thanks.

Jimena

Dan Gillean

Dec 7, 2021, 12:00:40 PM
to ICA-AtoM Users
Hi Jimena, 

A couple of quick diagnostic questions: 
  1. Can you export the target descriptive hierarchy as EAD XML?
  2. Can you generate finding aids for other archival units?
Hopefully these questions will help us narrow down the issue. 

We also have a few troubleshooting suggestions in the documentation. In particular, I suggest you check for the following, taken from the documentation's recommendations:

This means your EAD may fail to export properly if:
  • You’ve used unescaped special characters, such as ampersands & or < and >.
  • You’ve used inline HTML elements to style the display of some fields in AtoM - for example, using <em> or <i> elements for emphasis or italics.
  • You’ve cut and pasted non UTF-8 encoded characters into AtoM - a common example would be the curvy quotation marks used in many word processing applications like Microsoft Word, instead of the standard UTF-8 straight quotes "
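
The three conditions listed above can be screened for before attempting an export. The following is a minimal Python sketch, not part of AtoM: the helper name `find_risky_content` and the regular expressions are assumptions made for illustration, and the checks are deliberately rough. It also shows the standard-library `xml.sax.saxutils.escape` function, which converts `&`, `<` and `>` to their XML entities.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical helper for illustration; this is not AtoM's own validation.
CURLY_QUOTES = "\u2018\u2019\u201c\u201d"
# An ampersand that does not begin an entity such as &amp; or &#233;
BARE_AMPERSAND = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")
# Something that looks like an inline HTML tag, e.g. <em>, </i>, <br/>
INLINE_TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def find_risky_content(text):
    """Return a list of warnings for one description field value."""
    warnings = []
    if BARE_AMPERSAND.search(text):
        warnings.append("unescaped ampersand")
    if INLINE_TAG.search(text):
        warnings.append("inline HTML element")
    elif "<" in text or ">" in text:
        warnings.append("unescaped angle bracket")
    if any(ch in CURLY_QUOTES for ch in text):
        warnings.append("curly quotation mark")
    return warnings


print(find_risky_content("Smith & Sons, <em>Letters</em>"))
# escape() replaces &, < and > with &amp;, &lt; and &gt;
print(escape("Smith & Sons <1887>"))
```

A field that returns an empty list here may still cause problems for other reasons; the sketch only covers the three cases named in the list.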


Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him


--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/012f0afb-758e-4fde-b7d2-324f8763f05en%40googlegroups.com.

Jimena Vaca Jurado

Dec 7, 2021, 12:46:48 PM
to ica-ato...@googlegroups.com
Hi Dan,
Here are my answers to the quick diagnostic questions:

1.  Can you export the target descriptive hierarchy as EAD XML?
Yes, we can. 
<ead>
<eadheader langencoding="iso639-2b" countryencoding="iso3166-1" dateencoding="iso8601" repositoryencoding="iso15511" scriptencoding="iso15924" relatedencoding="DC">
<eadid identifier="grm-mych" countrycode="BO" mainagencycode="ABNB," url="http://34.122.142.167/index.php/grm-mych" encodinganalog="identifier">GRM MyCh</eadid>
<filedesc>
<titlestmt>
<titleproper encodinganalog="title">Mojos y Chiquitos</titleproper>
</titlestmt>
<publicationstmt>
<publisher encodinganalog="publisher">Archivo y Biblioteca Nacionales de Bolivia</publisher>
<address>
<addressline>Calle Dalence #4</addressline>
<addressline>Sucre</addressline>
<addressline>Bolivia</addressline>
</address>
<date normal="2021-10-14" encodinganalog="date">2021-10-14</date>
</publicationstmt>
</filedesc>
<profiledesc>
<creation>
Generado por Access to Memory (AtoM) 2.6.4\n
<date normal="2021-12-07">2021-12-07 16:43 UTC</date>
</creation>
<langusage>
<language langcode="spa">español</language>
</langusage>
</profiledesc>
</eadheader>
<archdesc level="collection" relatedencoding="ISAD(G)v2">
<did>
<unittitle encodinganalog="3.1.2">Mojos y Chiquitos</unittitle>
<unitid encodinganalog="3.1.1" countrycode="BO" repositorycode="ABNB,">GRM MyCh</unitid>
<unitdate normal="1758/1887" encodinganalog="3.1.3">1758 - 1887</unitdate>
<physdesc encodinganalog="3.1.5"> 939 unidades documentales, papel. </physdesc>
<repository>
<corpname>Archivo y Biblioteca Nacionales de Bolivia</corpname>
<address>
<addressline>Calle Dalence #4</addressline>
<addressline>Sucre</addressline>
<addressline>Bolivia</addressline>
</address>
</repository>
<origination encodinganalog="3.2.1">
<persname id="atom_981589_actor">Moreno, Gabriel René</persname>
</origination>
</did>
<bioghist id="md5-16c9e0177c4fc62cb1b736e4a54c439d" encodinganalog="3.2.2">
<note>
<p>(Santa Cruz 1836 – Valparaíso-Chile 1908) Historiador y bibliógrafo. Hijo de Gabriel José Moreno Roca y de Sinforosa del Rivero, estudió secundaria en el colegio Junín de Sucre (1851-1855) y Derecho en la Universidad de Chile (1858-1865), recibiéndose de abogado en 1866. Trabajó como profesor y bibliotecario en el Instituto Nacional de Santiago, incidentalmente desempeñó la secretaría de la legación de Bolivia (1871-1873). Estuvo en Sucre en dos oportunidades, una para acopiar información (1871 y 1874-1875) y otra para defenderse de las acusaciones que caían sobre él (1880). La ausencia física fue colmada ampliamente con la dedicación de toda su obra de investigador a Bolivia, esta se puede agrupar en los rubros principales siguientes: 1) Bibliográfico, en que recogió los catálogos de su biblioteca boliviana personal (libros, folletos y periódicos); el de la Biblioteca Nacional chilena en materia peruana; 2) archivística, con el catálogo de la documentación de Mojos y Chiquitos que salvó de la destrucción, ordenó y devolvió al Estado boliviano; 3) Historiográfica, cuya cumbre es “Últimos días coloniales en el alto Perú”; 4) Crítica, en su juventud se dedicó a analizar a los poetas románticos. Su aporte al conocimiento del país ha sido trascendental sin que hasta el momento de su muerte nadie pueda igualarlo.</p>
</note>
</bioghist>

2. Can you generate finding aids for other archival units?
 
No, we can't generate other finding aids. We have only two archival units.

I have been trying to resolve this issue for about three days. Currently I am following this advice:


But when I execute this command, I get the message below:

java -jar lib/task/pdf/saxon9he.jar -s:'/tmp/test1.xml' -xsl:'lib/task/pdf/ead-pdf-full-details.xsl' -o:'./test.fo'
Error on line 2 column 112 of test1.xml:
  SXXP0003: Error reported by XML parser: White spaces are required between publicId and systemId.

Transformation failed: Run-time errors were reported

I hope you can help and guide me, please.

Thanks.

Jimena



--
Jimena Vaca Jurado

Dan Gillean

Dec 7, 2021, 4:30:25 PM
to ICA-AtoM Users
Hi again Jimena, 

Is that the whole of the EAD XML output when you try to export the description?

If so, then the original error message makes sense to me: this EAD file is incomplete. Several elements (such as <archdesc> and the opening <ead> element) are opened but never closed in the file. That could well trigger a message like the one you reported: "Error reported by XML parser: Premature end of file."
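
One way to confirm that an exported file is truncated, independently of Saxon, is to run it through any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `check_well_formed` and the shortened sample strings are assumptions for illustration. Python's parser words the error differently from the Java parser behind Saxon, but the underlying condition (elements opened and never closed) is the same.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Shortened stand-in for the exported EAD: <archdesc> and <ead> are never closed.
truncated = (
    '<ead><archdesc level="collection">'
    "<did><unittitle>Mojos y Chiquitos</unittitle></did>"
)
complete = truncated + "</archdesc></ead>"

print(check_well_formed(truncated))
print(check_well_formed(complete))
```

To check a real export, read the file's contents and pass them to the same function.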

As for why the EAD is being truncated:

I would recommend that you revisit the last suggestion from my first email, and check the contents of your description. Did you copy and paste this content from another source, such as a Microsoft Word document for example? If yes, then it may contain non UTF-8 characters that break the XML rendering. 

For example, in the <bioghist> element I can see some of the non UTF-8 "curly" quotation marks: 

...ordenó y devolvió al Estado boliviano; 3) Historiográfica, cuya cumbre es “Últimos días coloniales en el alto Perú”; 4) Crítica, en su juventud se dedicó a analizar a los poetas románticos. Su aporte al conocimiento del país ha sido trascendental sin que hasta el momento de su muerte nadie pueda igualarlo.</p>

If you need to cut and paste content from another source, I recommend finding a way to ensure it is UTF-8 first. For example, you'll find many online guides on how to ensure that Word documents are saved in UTF-8; here are two resources I found quickly: 
There are also online encoding converters where you can paste content. 
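
As a rough illustration of what such a conversion does, here is a minimal Python sketch. It assumes the pasted text arrived as Windows-1252 bytes, a common encoding for older Word content; that assumption, the `cp1252` fallback, and the helper names `decode_to_text` and `straighten_quotes` are hypothetical and not taken from AtoM. Note that curly quotes can be stored as valid UTF-8; the second helper simply replaces them with straight quotes for cases where you want to rule them out.

```python
def decode_to_text(raw, fallback="cp1252"):
    """Decode bytes as UTF-8, falling back to another encoding if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the source was Windows-1252; adjust if yours differs.
        return raw.decode(fallback, errors="replace")


# Map curly single and double quotes to their straight equivalents.
STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})


def straighten_quotes(text):
    """Replace curly quotation marks with straight ones."""
    return text.translate(STRAIGHT)


# Simulate text pasted from a Windows-1252 source.
word_bytes = "cuya cumbre es \u201c\u00daltimos d\u00edas coloniales\u201d".encode("cp1252")
text = decode_to_text(word_bytes)
print(straighten_quotes(text))
```

The cleaned string can then be pasted back into the relevant AtoM field and the export retried.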

Don't forget the other suggestions in the first email as well - check for < > characters or HTML elements in the description and remove them if possible, since they can break the conversion to XML. 

I hope this helps to resolve the issue! Let us know how it goes. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him
