arFindingAidJob": ERROR(SAXON): Transformation failed: Run-time errors were reported

shiva naik

Nov 17, 2021, 11:55:39 AM
to AtoM Users
Hi Support

We are getting the following error while generating a finding aid. Can anyone please help?

[info] [2021-11-17 08:46:01] Job 68668 "arFindingAidJob": Job started.
[info] [2021-11-17 08:46:01] Job 68668 "arFindingAidJob": Generating finding aid (d-bpc)...
[info] [2021-11-17 08:46:02] Job 68668 "arFindingAidJob": Running: java -jar '/usr/share/nginx/atom/lib/task/pdf/saxon9he.jar' -s:'/tmp/phpGaoeAS' -xsl:'/usr/share/nginx/atom/lib/task/pdf/ead-pdf-inventory-summary.xsl' -o:'/tmp/phpbLXP7V' 2>&1
[info] [2021-11-17 08:46:05] Job 68668 "arFindingAidJob": Transforming the EAD with Saxon has failed.
[info] [2021-11-17 08:46:05] Job 68668 "arFindingAidJob": ERROR(SAXON): Error
[info] [2021-11-17 08:46:06] Job 68668 "arFindingAidJob": ERROR(SAXON): I/O error reported by XML parser processing file:/tmp/phpGaoeAS: Server returned HTTP
[info] [2021-11-17 08:46:06] Job 68668 "arFindingAidJob": ERROR(SAXON): response code: 403 for URL: http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd
[info] [2021-11-17 08:46:06] Job 68668 "arFindingAidJob": ERROR(SAXON): Transformation failed: Run-time errors were reported
[info] [2021-11-17 08:46:06] Job 68668 "arFindingAidJob": Job finished.

Many thanks
Shiva

Dan Gillean

Nov 17, 2021, 2:27:53 PM
to ICA-AtoM Users
Hi Shiva, 

We have seen a similar issue to this in the past, but not with this exact output - namely the 403 error. 

The problem appears to be occurring when the SAXON parser attempts to follow the URI provided in the EAD file's DOCTYPE header to the canonical EAD 2002 DTD maintained by the Library of Congress, at: http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd

Previously, during EAD imports we sometimes had issues with this - the Library of Congress servers would be down or unavailable, and then EAD import would fail. Eventually, we addressed this by storing a local copy of the EAD 2002 DTD in AtoM itself, so it no longer needs LoC to be available to complete an import. 

However, when we export EAD, we want it to be valid and usable outside of AtoM, so following the expected conventions, we add the LoC DTD URI to the DOCTYPE and EAD header information, rather than a path to a locally stored file in AtoM. 

When AtoM generates a finding aid, it first generates the EAD, and then it uses XSLT stylesheets to transform that XML into a PDF (or an RTF document, depending on your settings). So, the EAD XML is the basis used for generating the finding aid. 

We have seen cases where the same timeout issues occur trying to reach the LoC DTD during finding aid generation, and have an issue filed for this (issue #13247).
However, I see that the error returned in your case is not exactly the same - rather than the process timing out while trying to reach the Library of Congress DTD, it actually returned a 403 error. A 403 HTTP status code typically means "forbidden" - i.e. access denied. 

Now, the EAD DTD *should* be public so this is a strange outcome. There are two possible reasons I can think of immediately: 
  1. The Library of Congress server hosting the EAD 2002 DTD is down, or the DTD has been moved and the old address is now forbidden
  2. Your site is on a VPN or behind a firewall that does not allow access to the public web

Regarding 1:

I have checked, and it seems that the URI used in AtoM is in fact out of date - when you try to follow http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd it actually redirects you to: https://memory.loc.gov/xmlcommon/dtds/ead2002/ead.dtd. Nevertheless the DTD is there and remains publicly accessible, and the redirect works as expected in a web browser.  

Additionally:

A) I have checked, and finding aid generation is working for me in 2.6. If the redirect were the issue, then no one using AtoM would be able to generate finding aids without encountering it.

B) A 403 is not a typical response when a redirect is encountered, even if there is some reason why the CLI task can't follow it. It's also not the response you would expect if the LoC server were temporarily down: such a failure wouldn't return a 403 Forbidden status code, but would look more like the timeout output shown on issue #13247.

This suggests to me that there is something else going on.

So far, based on this, my guess is that reason 2 is the more likely cause. Does your AtoM instance have access to the public web?
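
One way to answer that question is to request the DTD from the AtoM server itself and look at the HTTP status code that comes back. The following is only a rough sketch, assuming curl is installed on the server; the function names classify_status and check_dtd are made up for illustration and are not part of AtoM.

```shell
#!/bin/sh
# Hypothetical helper: map an HTTP status code to a short diagnosis.
classify_status() {
  case "$1" in
    000) echo "unreachable" ;;  # curl prints 000 when no HTTP response was received
    2??) echo "ok" ;;
    3??) echo "redirect" ;;
    403) echo "forbidden" ;;
    *)   echo "other" ;;
  esac
}

# Hypothetical helper: fetch the DTD, discard the body, and report the status.
# Defaults to the URI from the EAD DOCTYPE; pass another URL as the first argument.
check_dtd() {
  url="${1:-http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd}"
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 15 "$url")
  echo "$url -> $code ($(classify_status "$code"))"
}

# Example (performs a network request, so it is left commented out):
# check_dtd
```

Run check_dtd on the same host, and ideally as the same user, that runs the AtoM job worker. A "redirect" result would match what a browser sees, "forbidden" would reproduce the 403 from your log, and "unreachable" would point to a firewall, proxy, or DNS problem. Adding -L to the curl call makes it follow the redirect and report the final status instead.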

Regardless of your answer or whether or not hypothesis 2 is even correct, I have another idea that might help work around the issue. 

Returning to issue #13247 for a moment: we haven't yet found an ideal solution to this issue, since we still want to conform to the EAD and XML conventions and properly reference the publicly available DTD in the EAD's header information. 

In the meantime, there is a proposed workaround on the ticket (see note-4) that may work in your case, as it removes the need to contact LoC: 

Workaround (removing the 2nd line in plugins/sfEadPlugin/modules/sfEadPlugin/templates/indexSuccessHeader.xml.php ) helped with the issue.

Here is the file: 
As you can see, it's just 2 lines. You can disable the second one locally and see if that solves the issue. Note that the second line sits outside the <?php ... ?> tags, so it is emitted as literal template output: prefixing it with two slashes would not comment it out, but would write the slashes into the XML. Either delete the line, or wrap it in a PHP block comment, like so:

<?php echo '<?xml version="1.0" encoding="'.sfConfig::get('sf_charset', 'UTF-8')."\" ?>\n" ?>
<?php /* <!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN" "http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd"> */ ?>
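
If you would rather confirm that the DOCTYPE lookup is what fails before touching the template, you could strip the DOCTYPE from a copy of an exported EAD file and run Saxon on that copy by hand. This is only a sketch: strip_doctype is a made-up helper name, and it assumes the DOCTYPE declaration sits on a single line, as it does in the template above.

```shell
#!/bin/sh
# Hypothetical helper: copy stdin to stdout, dropping a single-line
# <!DOCTYPE ead ...> declaration. A DOCTYPE spanning several lines
# would need a different approach.
strip_doctype() {
  sed '/^[[:space:]]*<!DOCTYPE ead /d'
}

# Example usage (file names are placeholders, not real AtoM paths):
# strip_doctype < exported-ead.xml > ead-no-doctype.xml
```

You can then point the -s: option of the java command shown in your job log at the stripped copy. If the transformation succeeds without the DOCTYPE, the failure is coming from the attempt to fetch the DTD rather than from the EAD content or the stylesheet.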


Let us know if this helps, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him

