Exporting archival description to PDF

Thomas Debesse

Jan 17, 2022, 5:59:18 PM
to AtoM Users
Hi, I'm looking for a way to turn archival descriptions into ready-to-print documents, such as PDF files. Printing the page does not really fit the need; it also does not automatically expand long texts before printing.

Is there an export-to-pdf plugin or something like that?

Best regards,

Dan Gillean

Jan 18, 2022, 8:33:46 AM
to ICA-AtoM Users
Hi Thomas, 

We don't have an export to PDF option, but that's essentially what the Finding Aid generation can do - take an archival unit (i.e. a description and all of its descendants) and generate a version optimized for printing and offline reading, in PDF or RTF formats. There are two different layouts supported as well - a Full Details one that includes all fields at all levels of description, and an Inventory List that will use a briefer table-based summary layout for lower-level descriptions like files and items, where there is often less descriptive metadata. 

You can also upload your own finding aids - so you could potentially generate a finding aid in RTF format, edit it to your preferences, save it as a PDF, delete the generated finding aid, and then upload your modified one instead. 

For more information, see: 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him



Thomas Debesse

Jan 18, 2022, 10:54:15 AM
to AtoM Users
Hi, thank you for your prompt and clear answer. I now get an error generating finding aids:

----------
[info] [2022-01-18 16:43:58] Job 131642 "arFindingAidJob": Job started.
[info] [2022-01-18 16:43:58] Job 131642 "arFindingAidJob": Generating finding aid ([redacted])...
[info] [2022-01-18 16:43:59] Job 131642 "arFindingAidJob": Running: java -jar '/var/www/[redacted]/lib/task/pdf/saxon9he.jar' -s:'/tmp/phpSSCF6V' -xsl:'/var/www/[redacted]/lib/task/pdf/ead-pdf-inventory-summary.xsl' -o:'/tmp/phpChDeZW' 2>&1
[info] [2022-01-18 16:44:00] Job 131642 "arFindingAidJob": Transforming the EAD with Saxon has failed.
[info] [2022-01-18 16:44:00] Job 131642 "arFindingAidJob": ERROR(SAXON): Error
[info] [2022-01-18 16:44:00] Job 131642 "arFindingAidJob": ERROR(SAXON):   I/O error reported by XML parser processing file:/tmp/phpSSCF6V: Server returned HTTP
[info] [2022-01-18 16:44:00] Job 131642 "arFindingAidJob": ERROR(SAXON):   response code: 403 for URL: http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd
[info] [2022-01-18 16:44:00] Job 131642 "arFindingAidJob": ERROR(SAXON): Transformation failed: Run-time errors were reported
[info] [2022-01-18 16:44:01] Job 131642 "arFindingAidJob": Job finished.
----------

If I download the file manually with curl or wget, it works, so I don't know why it gets a 403 HTTP error. Would you know what's happening? Is there a way for me to work around the bug by downloading the file manually and storing it myself where it is expected?

Thomas Debesse

Jan 18, 2022, 11:05:34 AM
to AtoM Users
For more information: I tried to work around the bug by hosting this ead.dtd file on my own server under the same path, and configuring my domains so that lcweb2.loc.gov resolves to my own server. Downloading the file from that URL with wget works and reaches my server, but I still get the same 403 HTTP error when I try to generate a finding aid from the AtoM interface.

Dan Gillean

Jan 18, 2022, 3:29:55 PM
to ICA-AtoM Users
Hi Thomas, 

I think you've run into a known issue that appears intermittently, and which we have fixed in the upcoming 2.7 release. See: 
Essentially, I think that the Library of Congress is throttling or blocking automated access to the DTD due to unexpectedly high levels of traffic. Many applications (including AtoM) use the URIs in the header of XML documents for basic validation, and this causes bandwidth issues for the hosts. There's a rather snarky post from the W3C about this very issue from 2008 that likely explains a lot in this case as well: 
The argument is that DTD URIs should be used as identifiers, not locators; software that does fetch them should identify itself with an appropriate user agent and cache the result, so that the request does not need to be repeated. 
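As a hedged illustration of that advice (fetch once, identify yourself, cache the result), here is a minimal Python sketch. It is not AtoM's or Saxon's code: the `USER_AGENT` string, the cache layout (one file per SHA-256 hash of the system identifier) and the function names are all hypothetical.

```python
import hashlib
import os
import urllib.request

# Hypothetical, descriptive user agent; a real tool would name itself and give a contact.
USER_AGENT = "example-ead-tool/0.1 (+https://example.org/contact)"


def cached_dtd_path(system_id: str, cache_dir: str) -> str:
    """Map a DTD system identifier to a stable file name inside cache_dir."""
    digest = hashlib.sha256(system_id.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest + ".dtd")


def fetch_dtd(system_id: str, cache_dir: str) -> bytes:
    """Return the DTD bytes, downloading them at most once per cache directory."""
    path = cached_dtd_path(system_id, cache_dir)
    if os.path.isfile(path):
        # Cache hit: no network request at all.
        with open(path, "rb") as f:
            return f.read()
    # Cache miss: fetch once, with an explicit User-Agent header.
    request = urllib.request.Request(system_id, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        data = response.read()
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return data
```

Every later call for the same identifier is served from disk, so the remote host sees one request per cache directory rather than one per validation.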

Someone in a related forum thread pointed out the following: 

It seems that the LoC is blocking, through Cloudflare, the user agent used by Saxon:
curl -vvv --user-agent "Java/1.8.0_275" http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd
is returning a 403, whereas without the user agent it works fine.

So it sounds like a very similar situation. 

While AtoM has had a local EAD 2002 DTD in place for a while, it was only used in specific cases, and these did not include the Java Saxon XSLT transformation used during parsing and import. It's this Saxon agent that has been blocked by the LoC, causing the error. 

In the 2.7 release, we've added a new Java library to act as a resolver, and have restructured the XML import to use the local DTD file. Long term, this should resolve the issue. 
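The 2.7 fix itself is a Java resolver library. As a rough, hypothetical analogue of the same idea (not AtoM's implementation), the Python standard library's SAX `EntityResolver` can map the EAD 2002 system identifier from the export header to a local copy of `ead.dtd`. The mapping, the class and function names, and the choice to refuse unknown identifiers are assumptions for illustration. Note that the sketch has to switch on `feature_external_ges`, which recent Python versions leave off by default; the resolver then refuses any identifier it has no local file for, so nothing is fetched from the network.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges

# System identifier from the DOCTYPE in AtoM's EAD 2002 export header.
EAD_2002_SYSTEM_ID = "http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd"


class LocalDtdResolver(EntityResolver):
    """Hypothetical resolver: serve known DTDs from local files, never from the network."""

    def __init__(self, mapping):
        super().__init__()
        self.mapping = mapping  # system identifier -> local file path
        self.requests = []      # (publicId, systemId) pairs, recorded for inspection

    def resolveEntity(self, publicId, systemId):
        self.requests.append((publicId, systemId))
        if systemId not in self.mapping:
            raise xml.sax.SAXException(f"no local copy of {systemId!r}")
        return self.mapping[systemId]  # a local path; the parser opens it


class TextCollector(ContentHandler):
    """Collect character data so the effect of the DTD can be observed."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_local_dtd(xml_bytes, mapping):
    parser = xml.sax.make_parser()
    # The expat reader skips external entities (including the external DTD
    # subset) by default; enable them so that the resolver is consulted.
    parser.setFeature(feature_external_ges, True)
    resolver = LocalDtdResolver(mapping)
    handler = TextCollector()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks), resolver.requests
```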

In the meantime, you can either try again (it seems to work intermittently; perhaps it starts blocking requests once a certain threshold is reached), or delete the header information in your EAD file (so that it starts with an <ead> element with no attributes), which also bypasses the problem. 
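For EAD files handled outside AtoM's template, the header removal can be scripted. The minimal Python sketch below assumes the file has a DOCTYPE declaration without an internal subset, like the EAD 2002 header AtoM emits, and deletes only that declaration. It leaves the `<ead>` element's attributes untouched, so it covers only the DOCTYPE part of the suggestion above; the function name is hypothetical.

```python
import re

# A DOCTYPE for the <ead> root with no internal subset ("[...]"); this matches
# the PUBLIC/SYSTEM header found in EAD 2002 exports.
_EAD_DOCTYPE = re.compile(r"<!DOCTYPE\s+ead\b[^>\[]*>\s*")


def strip_ead_doctype(xml_text: str) -> str:
    """Remove the first EAD DOCTYPE declaration and the whitespace after it."""
    return _EAD_DOCTYPE.sub("", xml_text, count=1)
```

Applied to a whole file, it would be used as `strip_ead_doctype(open(path, encoding="utf-8").read())`, writing the result to a new file before re-importing it.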

You can also see the changes added to the 2.7 development branch in this commit: 
Regards, 

Dan Gillean, MAS, MLIS

Thomas Debesse

Jan 26, 2022, 12:28:55 PM
to AtoM Users
Hi, thank you for the advice. I did this:

--- a/plugins/sfEadPlugin/modules/sfEadPlugin/templates/indexSuccessHeader.xml.php
+++ b/plugins/sfEadPlugin/modules/sfEadPlugin/templates/indexSuccessHeader.xml.php
@@ -1,2 +1 @@
 <?php echo '<?xml version="1.0" encoding="'.sfConfig::get('sf_charset', 'UTF-8')."\" ?>\n" ?>
-<!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN" "http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd">

This worked around the first error (I now get another one, see below), but it also seems to confirm something: some part of the backend is not using my server's local DNS resolver. I had spoofed lcweb2.loc.gov on my local domain, yet before my template edit AtoM was still getting 403 errors from the real lcweb2.loc.gov server. So the code may not only fetch that DTD needlessly on every validation, but also query some unknown DNS server instead of the system one (and its local caches), adding even more unnecessary traffic on the Internet.

The new error I get is this one:

[info] [2022-01-19 00:42:45] Job 131645 "arFindingAidJob": Job started.
[info] [2022-01-19 00:42:45] Job 131645 "arFindingAidJob": Generating finding aid (xxxxxxxx)...
[info] [2022-01-19 00:42:45] Job 131645 "arFindingAidJob": Running: java -jar '/var/www/xxxxxxxx/lib/task/pdf/saxon9he.jar' -s:'/tmp/phpjnGICP' -xsl:'/var/www/xxxxxxxx/lib/task/pdf/ead-pdf-inventory-summary.xsl' -o:'/tmp/phplyZfiP' 2>&1
[info] [2022-01-19 00:42:47] Job 131645 "arFindingAidJob": Running: fop -r -q -fo '/tmp/phplyZfiP' -pdf '/var/www/xxxxxxxx/downloads/xxxxxxxx.pdf' 2>&1
[info] [2022-01-19 00:42:47] Job 131645 "arFindingAidJob": Converting the EAD FO to PDF has failed.
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP): [warning] /usr/bin/fop: JVM flavor 'sun' not understood
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP): [warning] /usr/bin/fop: Unable to locate avalon-framework in /usr/share/java
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP): Exception in thread "main" java.lang.NoSuchMethodError: java.nio.CharBuffer.limit(I)Ljava/nio/CharBuffer;
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.fo.FOText.characters(FOText.java:143)
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.fo.FObjMixed.characters(FObjMixed.java:74)
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.fo.FOTreeBuilder$MainFOHandler.characters(FOTreeBuilder.java:390)
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.fo.FOTreeBuilder.characters(FOTreeBuilder.java:136)
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.xalan.transformer.TransformerIdentityImpl.characters(TransformerIdentityImpl.java:1126)
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.parsers.AbstractSAXParser.characters(Unknown Source)
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.xinclude.XIncludeHandler.characters(Unknown Source)
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[info] [2022-01-19 00:42:49] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
[info] [2022-01-19 00:42:49] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
[info] [2022-01-19 00:42:49] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
[info] [2022-01-19 00:42:49] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:485)
[info] [2022-01-19 00:42:49] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:293)
[info] [2022-01-19 00:42:49] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.cli.InputHandler.renderTo(InputHandler.java:116)
[info] [2022-01-19 00:42:49] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.cli.Main.startFOP(Main.java:183)
[info] [2022-01-19 00:42:49] Job 131645 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.cli.Main.main(Main.java:214)
[info] [2022-01-19 00:42:49] Job 131645 "arFindingAidJob": Job finished.

My Java version is OpenJDK 1.8.0 (from Ubuntu 20.04 LTS):

$ java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~20.04-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)

I see two warnings and then an exception:

[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP): [warning] /usr/bin/fop: JVM flavor 'sun' not understood
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP): [warning] /usr/bin/fop: Unable to locate avalon-framework in /usr/share/java
[info] [2022-01-19 00:42:48] Job 131645 "arFindingAidJob": ERROR(FOP): Exception in thread "main" java.lang.NoSuchMethodError: java.nio.CharBuffer.limit(I)Ljava/nio/CharBuffer;


I don't know if the first one is significant.

For the second one, I installed libavalon-framework-java. The second warning is now gone, but I still get an exception after the first warning:

[info] [2022-01-26 18:22:11] Job 131759 "arFindingAidJob": Job started.
[info] [2022-01-26 18:22:11] Job 131759 "arFindingAidJob": Generating finding aid (xxxxxxxx)...
[info] [2022-01-26 18:22:11] Job 131759 "arFindingAidJob": Running: java -jar '/var/www/xxxxxxxx/lib/task/pdf/saxon9he.jar' -s:'/tmp/php00pefN' -xsl:'/var/www/xxxxxxxx/lib/task/pdf/ead-pdf-inventory-summary.xsl' -o:'/tmp/php9rtzPN' 2>&1
[info] [2022-01-26 18:22:13] Job 131759 "arFindingAidJob": Running: fop -r -q -fo '/tmp/php9rtzPN' -pdf '/var/www/xxxxxxxx/downloads/xxxxxxxx.pdf' 2>&1
[info] [2022-01-26 18:22:14] Job 131759 "arFindingAidJob": Converting the EAD FO to PDF has failed.
[info] [2022-01-26 18:22:14] Job 131759 "arFindingAidJob": ERROR(FOP): [warning] /usr/bin/fop: JVM flavor 'sun' not understood
[info] [2022-01-26 18:22:14] Job 131759 "arFindingAidJob": ERROR(FOP): Exception in thread "main" java.lang.NoSuchMethodError: java.nio.CharBuffer.limit(I)Ljava/nio/CharBuffer;
[info] [2022-01-26 18:22:14] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.fo.FOText.characters(FOText.java:143)
[info] [2022-01-26 18:22:14] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.fo.FObjMixed.characters(FObjMixed.java:74)
[info] [2022-01-26 18:22:14] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.fo.FOTreeBuilder$MainFOHandler.characters(FOTreeBuilder.java:390)
[info] [2022-01-26 18:22:14] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.fo.FOTreeBuilder.characters(FOTreeBuilder.java:136)
[info] [2022-01-26 18:22:14] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.xalan.transformer.TransformerIdentityImpl.characters(TransformerIdentityImpl.java:1126)
[info] [2022-01-26 18:22:14] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.parsers.AbstractSAXParser.characters(Unknown Source)
[info] [2022-01-26 18:22:14] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.xinclude.XIncludeHandler.characters(Unknown Source)
[info] [2022-01-26 18:22:14] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
[info] [2022-01-26 18:22:14] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
[info] [2022-01-26 18:22:14] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
[info] [2022-01-26 18:22:14] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[info] [2022-01-26 18:22:14] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[info] [2022-01-26 18:22:15] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
[info] [2022-01-26 18:22:15] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
[info] [2022-01-26 18:22:15] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
[info] [2022-01-26 18:22:15] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:485)
[info] [2022-01-26 18:22:15] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:293)
[info] [2022-01-26 18:22:15] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.cli.InputHandler.renderTo(InputHandler.java:116)
[info] [2022-01-26 18:22:15] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.cli.Main.startFOP(Main.java:183)
[info] [2022-01-26 18:22:15] Job 131759 "arFindingAidJob": ERROR(FOP):  at org.apache.fop.cli.Main.main(Main.java:214)
[info] [2022-01-26 18:22:15] Job 131759 "arFindingAidJob": Job finished.

The issue looks similar to this one faced by another software: https://github.com/netty/netty/issues/10593

Except that, on my end, it looks like the opposite case: Java 8 reports that there is no such signature for a method that only exists in newer Java versions. Maybe I'll have to update to a newer JDK.
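For context: starting with Java 9, `java.nio.CharBuffer` overrides `limit(int)` with a covariant return type (`CharBuffer` instead of `Buffer`). Bytecode compiled against a newer JDK without targeting the Java 8 API can therefore reference `CharBuffer.limit(I)Ljava/nio/CharBuffer;`, which does not exist on a Java 8 runtime and fails exactly as in the log above. One way to catch this before running `fop` is to check the runtime's major version. The hypothetical helper below parses `java -version` output, handling both the legacy `1.8.0_312` scheme and the Java 9+ `11.0.x` scheme; which minimum version to require is an assumption based on this thread, not a documented FOP requirement.

```python
import re


def java_major_version(version_output: str) -> int:
    """Return the major version from `java -version` output.

    Legacy scheme (Java 8 and earlier): "1.8.0_312" -> 8.
    Modern scheme (Java 9 and later):  "11.0.13" -> 11, "17" -> 17.
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no version string found in java -version output")
    parts = match.group(1).split(".")
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    # Tolerate suffixes such as "11-ea" by keeping only the leading digits.
    leading = re.match(r"\d+", parts[0])
    if leading is None:
        raise ValueError(f"unrecognised version {match.group(1)!r}")
    return int(leading.group(0))
```

`java -version` writes to standard error, so with `subprocess.run(["java", "-version"], capture_output=True, text=True)` the text to parse is in `.stderr`.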

Thomas Debesse

Jan 26, 2022, 12:49:31 PM
to AtoM Users
So I installed openjdk-11-jre-headless and it fixed the issue. The finding aid PDFs are now generated properly. Thank you very much for the help!

I noticed another bug in the meantime. I have worked around it here, but you may want to know about it, because it seems to be a bug in AtoM itself:

When I hit that last remaining issue (java.lang.NoSuchMethodError: java.nio.CharBuffer.limit(I)Ljava/nio/CharBuffer), a broken PDF was generated: it was listed in the archival description, but the file was unreadable. Because the PDF generation had failed, the AtoM UI showed no button to delete or regenerate the existing PDF (although the download link was displayed), and clicking the Generate button did nothing. I fixed the issue by deleting the faulty PDF files directly on the file system, in the downloads/ folder.

I assume the fix is to decide whether to display the Delete and Regenerate buttons based on the presence of a generated PDF file (complete or not), rather than on the success of the PDF generation.
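A minimal sketch of that proposed rule, in Python for illustration only (AtoM itself is PHP, and this is not its code): the actions offered depend on whether a PDF exists on disk, while the job's success flag is only used to warn that the file may be broken. The `FindingAidActions` type, the function name and its parameters are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FindingAidActions:
    download: bool
    delete: bool
    regenerate: bool
    generate: bool
    possibly_broken: bool  # a file exists but the last generation job failed


def finding_aid_actions(pdf_path: str, last_job_succeeded: bool) -> FindingAidActions:
    """Decide which finding aid buttons to show from the file on disk, not the job status."""
    exists = os.path.isfile(pdf_path)
    return FindingAidActions(
        download=exists,
        # Cleanup is always offered when something is on disk, even a partial file.
        delete=exists,
        regenerate=exists,
        generate=not exists,
        possibly_broken=exists and not last_job_succeeded,
    )
```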

I hope that upgrading to Java 11 will not break anything else. If you're interested, I now have a complete procedure for installing AtoM 2.6 (master) on Ubuntu 20.04 LTS.

Dan Gillean

Jan 26, 2022, 1:38:05 PM
to ICA-AtoM Users
Hi Thomas, 

I'm glad that you've gotten this working! 

Regarding the additional issue you've reported: 

It is a bit of an edge case that shouldn't generally happen if everything is properly configured, but I agree that if a relation to a finding aid is stored in the database and therefore showing up on the related description page, AtoM should allow you to delete it or re-generate a replacement finding aid. 

Consequently, I've filed the following bug ticket based on your report, so we can track this issue and potentially address it in a future release: 

Dan Gillean, MAS, MLIS
