Error importing EAC-CPF XML records


Stephanie Sapienza

May 19, 2021, 1:50:37 PM
to AtoM Users
Hi all,

I'm testing AtoM for a new NEH-funded grant project which will involve importing, editing, and exporting many EAC-CPF records in and out of AtoM. I asked our lead developer to install AtoM for testing with this project, which he did (using Ansible) today: https://github.com/umd-mith/mith-atom/.

I went in and attempted to test an initial import of three different EAC-CPF records and received the same error each time. These are archival authorities pulled from the Social Networks and Archival Context (SNAC) project, so the XML should be valid and well-formed. Log below, and screencaps attached. 

Log

[info] [2021-05-19 10:30:36] Job 565 "arFileImportJob": Job started.
[info] [2021-05-19 10:30:36] Job 565 "arFileImportJob": Importing XML file: Gillis-Don-EACCPF-8267832.xml.
[info] [2021-05-19 10:30:36] Job 565 "arFileImportJob": Indexing imported records.
[info] [2021-05-19 10:30:36] Job 565 "arFileImportJob": Update type: import-as-new
[info] [2021-05-19 10:30:38] Job 565 "arFileImportJob": Exception: Unable to execute INSERT statement. [wrapped: SQLSTATE[HY000]: General error: 1364 Field 'type_id' doesn't have a default value]
[info] [2021-05-19 10:30:38] Job 565 "arFileImportJob": File: /usr/share/nginx/atom/vendor/symfony/lib/plugins/sfPropelPlugin/lib/vendor/propel/util/BasePeer.php
[info] [2021-05-19 10:30:38] Job 565 "arFileImportJob": Line: 299
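For readers triaging job logs like the one above, here is a minimal Python sketch that splits each line into its parts and pulls out the MySQL error number (1364 here). The regular expressions are inferred only from the log lines quoted in this thread, and the names `parse_line` and `mysql_error_code` are hypothetical helpers, not part of AtoM.

```python
import re
from typing import NamedTuple, Optional

# Matches lines shaped like the ones quoted in this thread, e.g.
# [info] [2021-05-19 10:30:38] Job 565 "arFileImportJob": Exception: ...
# (Assumption: other AtoM log lines may use a different layout.)
LINE_RE = re.compile(
    r'^\[(?P<level>\w+)\] \[(?P<timestamp>[^\]]+)\] '
    r'Job (?P<job_id>\d+) "(?P<job_name>[^"]+)": (?P<message>.*)$'
)

# Matches the wrapped PDO error, e.g. "SQLSTATE[HY000]: General error: 1364 ..."
SQLSTATE_RE = re.compile(r'SQLSTATE\[(?P<state>\w+)\]: [^:]+: (?P<code>\d+) ')


class JobLogLine(NamedTuple):
    level: str
    timestamp: str
    job_id: int
    job_name: str
    message: str


def parse_line(line: str) -> Optional[JobLogLine]:
    """Return the parsed fields of one job log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return JobLogLine(
        match["level"],
        match["timestamp"],
        int(match["job_id"]),
        match["job_name"],
        match["message"],
    )


def mysql_error_code(message: str) -> Optional[int]:
    """Return the numeric MySQL error code embedded in a message, if any."""
    match = SQLSTATE_RE.search(message)
    return int(match["code"]) if match else None
```

Grouping parsed lines by `job_id` and error code makes it easier to see whether a batch of failed imports shares one cause or several.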

I did look through the group discussions and found similar errors, but nothing leading me to a solution. Is someone able to help troubleshoot? I'm going to need to batch import many more of these, so this one specific function really does need to work in order for me to use AtoM for the project!

Best,
Stephanie


  --
Stephanie Sapienza
Digital Humanities Archivist
Maryland Institute for Technology in the Humanities (MITH)
University of Maryland

atom-mith-job-mgmt-page.JPG
atom-mith-xml-import-error.JPG

José Raddaoui

May 19, 2021, 2:33:54 PM
to AtoM Users
Hi Stephanie,

It looks like that playbook is based on a development environment, and I think you're being affected by STRICT_TRANS_TABLES being enabled in the SQL mode. I'd suggest that you test with the stable/2.6.x branch instead of the development branch (qa/2.x). To do that, you'll need to change the following in the playbook vars:
I hope that solves the issue and gives you a more stable environment.
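As a side note on the SQL mode: running `SELECT @@sql_mode;` in MySQL returns a comma-separated list of mode flags. A minimal Python sketch for checking such a string for strict flags follows. The helper name `strict_flags_in` is hypothetical, and the mode strings in the usage note are illustrative values rather than output from the server in this thread.

```python
from typing import Set

# The two MySQL mode flags that enable strict handling of missing or
# invalid values on INSERT and UPDATE.
STRICT_FLAGS = {"STRICT_TRANS_TABLES", "STRICT_ALL_TABLES"}


def strict_flags_in(sql_mode: str) -> Set[str]:
    """Return the strict-mode flags present in a comma-separated sql_mode string."""
    modes = {part.strip().upper() for part in sql_mode.split(",") if part.strip()}
    return modes & STRICT_FLAGS
```

For example, `strict_flags_in("STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION")` reports the strict flag, while a mode string without either flag yields an empty set.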

Best regards,
Radda.

Stephanie Sapienza

May 20, 2021, 1:59:13 PM
to AtoM Users
Thank you Jose! I think the first type of error resolved with this fix. 

However, when I tried to re-import after the update, it's now telling me that the file is an "Unknown schema or import format" (see below/attached). 

This authority record was downloaded directly from SNAC, the #1 (I believe) adopter of the EAC-CPF format in the U.S., so I wonder why the schema wasn't recognized? Should that perhaps be a separate discussion post? 

Here's the link to the specific record I used in my test: https://snaccooperative.org/download/53212838?type=eac-cpf

Best,
Stephanie

Log

[info] [2021-05-20 10:51:22] Job 614 "arFileImportJob": Job started.
[info] [2021-05-20 10:51:22] Job 614 "arFileImportJob": Importing XML file: 28371107.xml.
[info] [2021-05-20 10:51:22] Job 614 "arFileImportJob": Indexing imported records.
[info] [2021-05-20 10:51:22] Job 614 "arFileImportJob": Update type: import-as-new
[info] [2021-05-20 10:51:22] Job 614 "arFileImportJob": Exception: Unknown schema or import format: ""
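Since the exception reports an empty schema name, one thing that can be checked locally is which root element and namespace the downloaded file actually declares. The Python sketch below does that with the standard library. The namespace URI is the one published for EAC-CPF; whether AtoM's importer relies on the root element or namespace to detect the format is an assumption, and `root_identity` and `looks_like_eac_cpf` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Namespace URI published for the EAC-CPF schema.
EAC_CPF_NS = "urn:isbn:1-931666-33-4"


def root_identity(xml_bytes: bytes) -> Tuple[str, str]:
    """Return (namespace, local name) of the document's root element."""
    root = ET.fromstring(xml_bytes)
    tag = root.tag
    if tag.startswith("{"):
        # ElementTree spells namespaced tags as "{namespace}local".
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


def looks_like_eac_cpf(xml_bytes: bytes) -> bool:
    """True if the root is <eac-cpf> in the EAC-CPF namespace."""
    namespace, local = root_identity(xml_bytes)
    return local == "eac-cpf" and namespace == EAC_CPF_NS
```

Reading the file with `open(path, "rb").read()` and passing the bytes to `root_identity` shows at a glance whether the download is EAC-CPF XML or something else, such as an HTML error page.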

atom-mith-xml-import-error-schema.JPG

José Raddaoui

May 20, 2021, 3:04:45 PM
to AtoM Users
Hi again Stephanie,

I am glad that worked. As for the new error, I'm not able to reproduce it (I tried with the XML from the first post). Could you provide more information about how you are doing the import? I couldn't get that error using `/object/importSelect?type=xml` and selecting EAC CPF. However, I got another error ...

[info] [2021-05-20 11:50:19] Job 2004276 "arFileImportJob": Exception: Unable to execute INSERT statement. [wrapped: SQLSTATE[23000]: Integrity constraint violation: 1452 Cannot add or update a child row: a foreign key constraint fails (atom.event, CONSTRAINT event_FK_2 FOREIGN KEY (type_id) REFERENCES term (id) ON DELETE CASCADE)]

This is probably caused by an unexpected event type in the XML.

Best,
Radda.

Stephanie Sapienza

May 20, 2021, 4:07:53 PM
to AtoM Users
That's interesting! Yes, I mistakenly linked you to a different authority record than I originally used this morning. But I'm glad that I made that error, because it showed me that it was not necessarily solely related to the schema itself and might be something else.

I made a short screencast demonstrating how I did the import. This time I used a third authority record and it yielded the same '1452 Cannot add or update a child row: a foreign key constraint fails' message that Radda received. 

So there appear to be at least two different types of errors that are produced when attempting an import - does anyone have an idea of what can be done to resolve these? For the first type of error, it seemed that AtoM's import module wasn't validating the particular flavor of EAC-CPF record used by SNAC. So I thought the fix was looking at the differences between SNAC's CPF and an 'official' ISAAR CPF file (which can't be THAT different ... I think? This is an archival standard so interoperability is important ...).

But perhaps it's a combination of issues?

To be clear, we're testing AtoM for use in BULK imports and exports of XML records. So eventually we'll be using the CLI to do bulk imports and exports of hundreds of CPF records. But if we can't get the software to import even just these 2-3 test records, then it might be a critical decision point. So I really want to make sure we've at least attempted to isolate the issue before giving up and moving on to another solution!

Best,
Stephanie

Dan Gillean

May 21, 2021, 11:11:07 AM
to ICA-AtoM Users
Hi Stephanie, 

As far as I can tell from your video, you're performing the import correctly. I haven't yet nailed down the specific source of the error message, but I suspect that this has more to do with supported EAC elements and mappings than anything else. 

EAC-CPF XML, much like EAD 2002, is flexible enough that there are a number of different ways of serializing the same data within the standard - each valid, yet mapped differently. AtoM's EAC mappings were first implemented in 2010, and while they've received a few updates and corrections since, they haven't undergone major revision. They were also developed specifically around the ISAAR-CPF fields AtoM supports and how those are implemented, rather than as a generalized tool that can import any EAC-CPF file. Meanwhile, the EAC-CPF specification has been significantly revised since AtoM's main mappings were first implemented - meaning AtoM is due for some significant review and updating! 

I ended up looking at the wrong file (this one instead of the later sample), but at a glance, I can see some elements in the EAC export that AtoM doesn't support, or that are mapped differently. Some examples: 
  • AtoM doesn't currently support the import of maintenance events. Everything in the maintenanceHistory elements would likely be ignored currently.
  • AtoM also doesn't currently import the structured <source> elements found in the SNAC EAC export. 
  • The SNAC EAC doesn't define what type of alternative name forms it includes - it just puts them in generic <part> elements, while AtoM will typically indicate whether it's the <authorizedForm> or an <alternativeForm> at minimum. I will have to test to see how AtoM handles mapping the name forms (and which ends up as the authorizedForm) when this information is not specified in the import file.
    • In fact, it looks like the SNAC EAC uses custom attributes (such as snac:preferenceScore="99"), which are not part of the EAC specification and are unlikely to be understood by any other application that can import EAC-CPF XML unless a specific custom mapping has been added.
  • The SNAC EAC file includes <citation> elements nested in <biogHist>. AtoM currently has no import support for these, so I suspect they would either be ignored, or their contents imported as plain text inside the biographical history
  • The SNAC EAC file also embeds a lot of descriptive data from related collections in elements nested under <resourceRelation> elements. I suspect most of the nested elements (such as <physdesc> and <unittitle>) would be ignored, and may in fact be part of what's causing the import errors, as these are not actually EAC-CPF XML elements as far as I'm aware. I also don't see any namespacing in this file that indicates that EAD elements will be nested inside the EAC - I'm not sure if this is a common or accepted practice in an EAC-CPF file, but I haven't seen it done before
  • Similarly, the SNAC file uses <cpfRelation> elements for relationships with other actors - but while the UI qualifies these relationships somewhat (associatedWith, etc), I see no indication of the relationship type in the export EAC. This too may be causing AtoM to throw errors, as it is unsure how to create the relationship between entities
    • Note that AtoM will typically create local stub records (authorities or descriptions) when encountering related entities in an import file - AtoM's not able to follow any links provided and scrape further information, nor does AtoM's relationship module currently allow you to create links to external resources, so the name is used
Unfortunately, I suspect that the kind of bulk importing you hope to do from SNAC might not be possible without either some development work to enhance AtoM's current level of support for EAC elements and how flexible it can be with different serializations, or else spending some time to create a local XSLT or script of some kind that transforms SNAC EAC exports into a format that AtoM can import. 
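To give a rough idea of what such a local script could look like, here is a minimal Python sketch of a pre-import cleanup pass. It is built on assumptions, not a tested fix: it strips attributes in namespaces outside EAC-CPF, xlink, and xml (such as the snac: attributes mentioned above), drops a caller-supplied set of element names (for example the nested <unittitle> and <physdesc> elements), and lists <cpfRelation> elements that lack a `cpfRelationType` attribute. All function names are hypothetical, and whether any of these changes resolves the import errors has not been verified.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

EAC_CPF_NS = "urn:isbn:1-931666-33-4"
XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"
# Attribute namespaces to keep; anything else namespaced is treated as custom.
KEEP_ATTR_NAMESPACES = {EAC_CPF_NS, XLINK_NS, XML_NS}


def local_name(tag: str) -> str:
    """Strip the "{namespace}" prefix ElementTree puts on qualified names."""
    return tag.rpartition("}")[2]


def strip_foreign_attributes(root: ET.Element) -> int:
    """Delete attributes in namespaces outside the keep list; return the count."""
    removed = 0
    for element in root.iter():
        for name in list(element.attrib):
            if name.startswith("{"):
                namespace = name[1:].partition("}")[0]
                if namespace not in KEEP_ATTR_NAMESPACES:
                    del element.attrib[name]
                    removed += 1
    return removed


def drop_elements(root: ET.Element, names: Iterable[str]) -> int:
    """Remove every element whose local name is in `names`; return the count."""
    targets = set(names)
    removed = 0
    # Materialize the traversal first so removals do not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) in targets:
                parent.remove(child)
                removed += 1
    return removed


def relations_missing_type(root: ET.Element) -> List[ET.Element]:
    """Return <cpfRelation> elements that have no cpfRelationType attribute."""
    return [
        element
        for element in root.iter()
        if local_name(element.tag) == "cpfRelation"
        and "cpfRelationType" not in element.attrib
    ]
```

A caller would parse the file with `ET.parse(path)`, run these functions on `tree.getroot()`, and write the result with `tree.write(out_path, encoding="utf-8", xml_declaration=True)`. An XSLT would be the more conventional tool for this kind of transformation; the Python version is shown only because it is easy to adapt while experimenting with a few test records.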

As you may know, major development work in AtoM depends on community support for Artefactual to be able to implement it, either in the form of community code contributions or via development sponsorship. You can learn more about the history of AtoM, as well as how we currently maintain and develop the project, here: 
If you have in-house developers who would like to enhance AtoM's EAC-CPF support and contribute this back to the public project, we provide a number of development resources on our wiki. Any local developers are also welcome to start a thread in our forum, and our team can offer some general suggestions and feedback. If, on the other hand, your institution might be interested in sponsoring work from our team to enhance AtoM's EAC support, or else would like our team to help develop a transformation script you could use, feel free to contact Artefactual off-list and we would be happy to discuss options and prepare estimates for you. 

Finally, if it would be helpful, I'd be happy to try and recreate a SNAC authority in AtoM, and export it as EAC-CPF XML, so you can compare the output. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him

