arFindingAidJob": ERROR(SAXON): Transformation failed: Run-time errors were reported


shiva naik

Nov 17, 2021, 11:55:39 AM
to AtoM Users
Hi Support

We are getting the following error while generating a finding aid. Can anyone please help?

[info] [2021-11-17 08:46:01] Job 68668 "arFindingAidJob": Job started.
[info] [2021-11-17 08:46:01] Job 68668 "arFindingAidJob": Generating finding aid (d-bpc)...
[info] [2021-11-17 08:46:02] Job 68668 "arFindingAidJob": Running: java -jar '/usr/share/nginx/atom/lib/task/pdf/saxon9he.jar' -s:'/tmp/phpGaoeAS' -xsl:'/usr/share/nginx/atom/lib/task/pdf/ead-pdf-inventory-summary.xsl' -o:'/tmp/phpbLXP7V' 2>&1
[info] [2021-11-17 08:46:05] Job 68668 "arFindingAidJob": Transforming the EAD with Saxon has failed.
[info] [2021-11-17 08:46:05] Job 68668 "arFindingAidJob": ERROR(SAXON): Error
[info] [2021-11-17 08:46:06] Job 68668 "arFindingAidJob": ERROR(SAXON): I/O error reported by XML parser processing file:/tmp/phpGaoeAS: Server returned HTTP
[info] [2021-11-17 08:46:06] Job 68668 "arFindingAidJob": ERROR(SAXON): response code: 403 for URL: http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd
[info] [2021-11-17 08:46:06] Job 68668 "arFindingAidJob": ERROR(SAXON): Transformation failed: Run-time errors were reported
[info] [2021-11-17 08:46:06] Job 68668 "arFindingAidJob": Job finished.

Many thanks
Shiva

Dan Gillean

Nov 17, 2021, 2:27:53 PM
to ICA-AtoM Users
Hi Shiva, 

We have seen a similar issue to this in the past, but not with this exact output - namely the 403 error. 

The problem appears to be occurring when the SAXON parser attempts to follow the URI provided in the EAD file's DOCTYPE header to the canonical EAD 2002 DTD maintained by the Library of Congress, at: http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd

Previously, during EAD imports we sometimes had issues with this - the Library of Congress servers would be down or unavailable, and then EAD import would fail. Eventually, we addressed this by storing a local copy of the EAD 2002 DTD in AtoM itself, so it no longer needs LoC to be available to complete an import. 

However, when we export EAD, we want it to be valid and usable outside of AtoM, so following the expected conventions, we add the LoC DTD URI to the DOCTYPE and EAD header information, rather than a path to a locally stored file in AtoM. 

When AtoM generates a finding aid, it first generates the EAD, and then it uses XSLT stylesheets to transform that XML into a PDF (or an RTF document, depending on your settings). So, the EAD XML is the basis used for generating the finding aid. 
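
For anyone who wants to rerun the transformation step by hand while debugging, here is a minimal Python sketch that rebuilds the Saxon command shown in the job log above. The jar and stylesheet paths are copied from that log. The function name and the input and output paths are placeholders made up for illustration, so adjust them to your own install.

```python
import shlex

# Paths copied from the job log earlier in this thread; they may differ on your server.
SAXON_JAR = "/usr/share/nginx/atom/lib/task/pdf/saxon9he.jar"
STYLESHEET = "/usr/share/nginx/atom/lib/task/pdf/ead-pdf-inventory-summary.xsl"


def saxon_command(ead_path, output_path):
    """Build the argument list for the Saxon transformation seen in the log.

    ead_path and output_path are hypothetical file locations chosen by the caller.
    """
    return [
        "java",
        "-jar",
        SAXON_JAR,
        f"-s:{ead_path}",       # source EAD XML
        f"-xsl:{STYLESHEET}",   # XSLT stylesheet applied by Saxon
        f"-o:{output_path}",    # where Saxon writes its result
    ]


# Print a copy-pasteable shell command without running it.
print(shlex.join(saxon_command("/tmp/exported-ead.xml", "/tmp/saxon-output")))
```

Running the printed command against an exported EAD file should show the same Saxon error as the job log if the DTD cannot be fetched.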

We have seen cases where the same timeout issues occur trying to reach the LoC DTD during finding aid generation, and have an issue filed for this: 
However, I see that the error returned in your case is not exactly the same - rather than the process timing out while trying to reach the Library of Congress DTD, it actually returned a 403 error. A 403 HTTP status code typically means "forbidden" - i.e. access denied. 

Now, the EAD DTD *should* be public so this is a strange outcome. There are two possible reasons I can think of immediately: 
  1. The Library of Congress server hosting the EAD 2002 DTD is down, or the DTD has been moved and the old address is now forbidden
  2. Your site is on a VPN or behind a firewall that does not allow public access
Regarding 1:

I have checked, and it seems that the URI used in AtoM is in fact out of date - when you try to follow http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd it actually redirects you to: https://memory.loc.gov/xmlcommon/dtds/ead2002/ead.dtd. Nevertheless the DTD is there and remains publicly accessible, and the redirect works as expected in a web browser.  

Additionally: A) I have checked, and finding aid generation is working for me in 2.6. If the redirect were the issue, then no one using AtoM should be able to generate finding aids without encountering this issue. B) a 403 is not a typical response when a redirect is encountered, even if there is some reason why the CLI task can't follow it. It's also not the response you would expect if the problem was from the LoC server being temporarily down - such an error wouldn't return a 403 Forbidden status code, but would look more like the timeout output shown on issue #13247.  This suggests to me that there is something else going on. 

So far, based on this, my guess is that hypothesis 2 is the more likely cause. Does your AtoM instance have access to the public web?

Regardless of your answer or whether or not hypothesis 2 is even correct, I have another idea that might help work around the issue. 

Returning to issue #13247 for a moment: we haven't yet found an ideal solution to this issue, since we still want to conform to the EAD and XML conventions and properly reference the publicly available DTD in the EAD's header information. 

In the meantime, there is a proposed workaround on the ticket (see note-4) that may work in your case, as it removes the need to contact LoC: 

Workaround (removing the 2nd line in plugins/sfEadPlugin/modules/sfEadPlugin/templates/indexSuccessHeader.xml.php ) helped with the issue.

Here is the file: 
As you can see, it's just 2 lines. You can disable the second one locally and see if that solves the issue. Note that this line sits outside the <?php ... ?> tags, so simply adding two slashes before it would not comment it out; the slashes would be printed into the XML. Either delete the line, or wrap it in a PHP comment block, like so: 

<?php echo '<?xml version="1.0" encoding="'.sfConfig::get('sf_charset', 'UTF-8')."\" ?>\n" ?>
<?php /* <!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN" "http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd"> */ ?>
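
To see what dropping the DOCTYPE does to an exported file without touching the AtoM template, here is a rough Python sketch that strips the declaration from an EAD string and confirms the result is still well-formed XML. The sample document is a heavily shortened, made-up EAD fragment, and the function name is my own; this only illustrates the idea and is not how AtoM itself handles it.

```python
import re
import xml.etree.ElementTree as ET

# Matches a DOCTYPE declaration that has no internal subset (no [...] block),
# which is the form used in the template line shown above.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>\[]*>\s*", re.IGNORECASE)


def strip_doctype(xml_text):
    """Return xml_text with its first DOCTYPE declaration removed."""
    return DOCTYPE_RE.sub("", xml_text, count=1)


# Hypothetical, shortened EAD document for illustration only.
SAMPLE_EAD = (
    '<?xml version="1.0" encoding="UTF-8" ?>\n'
    '<!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd '
    '(Encoded Archival Description (EAD) Version 2002)//EN" '
    '"http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd">\n'
    '<ead><eadheader/><archdesc level="fonds"/></ead>\n'
)

cleaned = strip_doctype(SAMPLE_EAD)
root = ET.fromstring(cleaned)  # parses only if the result is well-formed XML
print(root.tag)
```

With no DOCTYPE in the file, the XML parser has no external DTD URI to resolve, which is why the workaround avoids the request to the Library of Congress.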


Let us know if this helps, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him



Raphaël Barman

Dec 8, 2021, 5:45:52 AM
to AtoM Users
Hi,

We are currently experiencing the same issue.
It seems that the LoC is blocking, through Cloudflare, the user agent used by Saxon:
curl -vvv --user-agent "Java/1.8.0_275" http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd
is returning a 403, whereas without the user agent it works fine.
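
The same comparison can be scripted. Below is a minimal Python sketch, using only the standard library, that fetches the DTD with different User-Agent headers and reports the HTTP status for each. The function names are my own, and any second user agent value you pass in is an arbitrary choice; only the URL and the Java user agent come from the curl command above.

```python
import urllib.error
import urllib.request

DTD_URL = "http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd"


def probe(url, user_agent, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code received when fetching url with user_agent.

    Network-level failures (DNS errors, timeouts) are not caught and will
    propagate to the caller as exceptions.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises 4xx/5xx responses as HTTPError; report the code instead.
        return error.code


def compare_agents(url, user_agents, opener=urllib.request.urlopen):
    """Map each user agent string to the status code it receives."""
    return {agent: probe(url, agent, opener=opener) for agent in user_agents}
```

Calling compare_agents(DTD_URL, ["Java/1.8.0_275", "curl/7.68.0"]) from the AtoM server should show whether only the Java user agent is being rejected. The opener parameter exists so the function can be exercised without network access.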

I don't have a solution except contacting LoC.
For now, we will mitigate the issue by removing the header.

Best,
Raphaël

Dan Gillean

Dec 8, 2021, 9:08:59 AM
to ICA-AtoM Users
Thanks for this additional information, Raphaël! 

FYI, our team has filed an issue for this and is now looking into options for addressing it in the next release. See: 
I've shared this thread with the developer doing the investigation. 

Cheers, 


maxi...@gmail.com

Jan 22, 2024, 9:25:44 AM
to AtoM Users
Hello Dan, I have the same issue, but I don't know where I need to remove the DOCTYPE declaration from the EAD export so that it avoids the timeout in all cases. Can you guide me, please? Thank you! 

Dan Gillean

Jan 22, 2024, 9:45:58 AM
to ica-ato...@googlegroups.com
Hi there, 

What version of AtoM do you have installed? This issue should be fixed in version 2.7.0 and later, per the linked issue above. If you are on a lower version, the easiest fix will be to upgrade. 

Cheers, 


maxi...@gmail.com

Jan 22, 2024, 3:36:29 PM
to AtoM Users
I have version 2.6, but I can't upgrade yet. What can I do?

Dan Gillean

Jan 23, 2024, 9:11:26 AM
to ica-ato...@googlegroups.com
Right now, I can only think of 2 options: 

1) Easiest option: don't use EAD XML exports for now. Use CSV instead. Prioritize upgrading as soon as possible. 

2) If you have some familiarity working with git, you could try to take the commit that fixes the issue, and apply it locally to your 2.6 instance as a patch. The commit is here: 
You can read up on how to apply a git commit as a patch, but if you are not a developer with at least some experience working with git or similar distributed version control systems, then I don't recommend this! No matter what, please make sure that you make a backup before proceeding.

Cheers, 

