Updating accession records via CSV import in 2.4.0-156


Damian Bauder

Mar 6, 2018, 11:51:12 AM
to ica-ato...@googlegroups.com
I have discovered some unexpected, but welcome, behaviour in 2.4. It appears that when I import accession records, they are matched against existing records (based on accession number, I assume) and update them. We have a large number of placeholder accession records that we imported as part of a data migration, and we're looking to add detail to them in bulk. I've spent some time thinking about how to accomplish this, and assumed I would have to write some SQL to clean out duplicate records in the database or something. Imagine my surprise when I ran a test import through the web interface and found it updating existing records!

Is this doing what I think it's doing? I'd like to confirm the behaviour before I go ahead and import 1000+ accessions to our staging server.

I wish I'd known about this before! Is it possible I am missing it in the documentation? From what I can see, updates via import work for archival descriptions, archival institutions, and authority records. Those all have options in the web interface for changing update behaviours, while accession records do not, and yet the log output for the job shows the following (note the line "Update type: import-as-new"):

[info] [2018-03-06 08:05:25] Job 388054 "arFileImportJob": Job started.
[info] [2018-03-06 08:05:25] Job 388054 "arFileImportJob": Importing CSV file: accessions.csv.
[info] [2018-03-06 08:05:25] Job 388054 "arFileImportJob": Indexing imported records.
[info] [2018-03-06 08:05:25] Job 388054 "arFileImportJob": Update type: import-as-new
[info] [2018-03-06 08:05:27] Job 388054 "arFileImportJob": php '/usr/share/nginx/atom-2.4/symfony' 'csv:accession-import' --index     --quiet --source-name='accessions.csv' '/usr/share/nginx/atom-2.4/uploads/tmp/TMP4c01d2e4'
[info] [2018-03-06 08:05:27] Job 388054 "arFileImportJob": Found 89822
[info] [2018-03-06 08:05:27] Job 388054 "arFileImportJob": .Found 4191
[info] [2018-03-06 08:05:27] Job 388054 "arFileImportJob": .Found 4193
[info] [2018-03-06 08:05:27] Job 388054 "arFileImportJob": .Found 4385
[info] [2018-03-06 08:05:27] Job 388054 "arFileImportJob": .Found 4334
[info] [2018-03-06 08:05:27] Job 388054 "arFileImportJob": .Found 4387
[info] [2018-03-06 08:05:28] Job 388054 "arFileImportJob": .Found 4389
[info] [2018-03-06 08:05:28] Job 388054 "arFileImportJob": .Couldn't find accession # 00.008... creating.
[info] [2018-03-06 08:05:28] Job 388054 "arFileImportJob": .Found 4364
[info] [2018-03-06 08:05:28] Job 388054 "arFileImportJob": .Found 6215
[info] [2018-03-06 08:05:28] Job 388054 "arFileImportJob": .
[info] [2018-03-06 08:05:28] Job 388054 "arFileImportJob": Import complete.
[info] [2018-03-06 08:05:28] Job 388054 "arFileImportJob": Job finished.
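For anyone wanting to check a larger job the same way, here is a minimal Python sketch that tallies the "Found" and "Couldn't find ... creating." lines from a saved copy of the job log. The two patterns are copied from the output above and may be worded differently in other AtoM versions; the function name is made up for illustration, and reading the number after "Found" as an internal record id is an assumption.

```python
import re

# Patterns taken from the job log above; treat them as assumptions
# for any other AtoM version.
FOUND = re.compile(r"Found (\d+)\s*$")
CREATED = re.compile(r"Couldn't find accession # (.+?)\.\.\. creating\.")


def summarize_import_log(log_text):
    """Return (numbers reported as found, accession numbers reported as created)."""
    matched, created = [], []
    for line in log_text.splitlines():
        hit = CREATED.search(line)
        if hit:
            created.append(hit.group(1))
            continue
        hit = FOUND.search(line)
        if hit:
            # Assumed to be the internal id of the matched record.
            matched.append(int(hit.group(1)))
    return matched, created


if __name__ == "__main__":
    sample = (
        '[info] [2018-03-06 08:05:27] Job 388054 "arFileImportJob": .Found 4191\n'
        '[info] [2018-03-06 08:05:28] Job 388054 "arFileImportJob": '
        ".Couldn't find accession # 00.008... creating.\n"
    )
    matched, created = summarize_import_log(sample)
    print(len(matched), "matched;", len(created), "created:", created)
```

Any accession number that shows up in the created list when an update was expected is worth checking for a typo or formatting difference.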

Dan Gillean

Mar 6, 2018, 12:17:11 PM
to ICA-AtoM Users
Hi Damian, 

Good catch! I agree this could use some clarification in our documentation - I will try to get something put together for this soon. To be honest, I'll have to do some testing myself to figure out exactly what works and what doesn't! 

I think the reason it's not really covered in the most recent CSV import rewrite of the documentation is that this was not new functionality added with the other update options. Instead, I believe this is old behavior that has been in place for a while, so that there is a way to link descriptions and accessions via import, as you're currently doing. 

AtoM does require that Accession numbers be unique, so if a match is found, then I believe it will update the record. Note that this will likely work similarly to the other update options: removing or otherwise updating related entities (like Donors) will likely not be possible, since they are stored in different tables in the database. Where an update is attempted, older Donor records would likely be left in place and a new one added instead. 

I'm not sure how soon I'll have time to do some thorough testing and write updates to the docs, so in the meantime I would suggest importing the records in small batches, so that you can test the outcome yourself as you progress. 
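One way to prepare such batches is to split the source CSV into smaller files that each repeat the header row. The following is a rough Python sketch only: the function names, the default batch size of 50 and the output file prefix are illustrative choices, not anything AtoM provides or requires.

```python
import csv


def split_csv(path, batch_size=50, prefix="accessions_batch"):
    """Split `path` into numbered CSV files of at most `batch_size` data rows.

    Every output file gets a copy of the header row. Returns the list of
    file names written, in order.
    """
    written = []
    with open(path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to split
            return written
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) == batch_size:
                written.append(_write_batch(header, batch, prefix, len(written) + 1))
                batch = []
        if batch:  # leftover rows that did not fill a whole batch
            written.append(_write_batch(header, batch, prefix, len(written) + 1))
    return written


def _write_batch(header, rows, prefix, number):
    """Write one batch file and return its name."""
    name = f"{prefix}_{number:03d}.csv"
    with open(name, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(rows)
    return name
```

Calling `split_csv("accessions.csv", batch_size=25)` would then give a series of files that can be imported and checked one at a time.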

Keep us posted!  

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory


--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-users+unsubscribe@googlegroups.com.
To post to this group, send email to ica-atom-users@googlegroups.com.
Visit this group at https://groups.google.com/group/ica-atom-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/4338f179-5ddf-4010-9052-8dfef920f839%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dan Gillean

Mar 20, 2018, 11:53:26 AM
to ICA-AtoM Users
Hi Damian, 

Just a quick update to let you know that I've finally had some time to test this a little bit. 

It seems that updates to existing accession records *are* possible via CSV import, from both the CLI and the user interface, in 2.4 and 2.5. The Accession number must match exactly for updates to take place. 

The behavior is similar to the update behavior on archival descriptions. That is: you can update any field that belongs directly to the accession record, but you can't update linked entities, like Donors, or creators and events (e.g. dates of creation, etc). Attempts to add different donor/creator/event data result in new rows / entities being added without the old ones being removed/deleted - you would need to manually go in and delete the entities you no longer want. 
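To preview how a CSV would be treated under the behavior described above, something like the following Python sketch could be run before importing. It is not part of AtoM. It assumes the CSV uses `accessionNumber` and `donorName` column headers, and that you can supply a list of the accession numbers already in your system from some other source; the function name and the category labels are made up for illustration.

```python
import csv
import io


def plan_import(rows, existing_numbers, number_column="accessionNumber",
                linked_columns=("donorName",)):
    """Sort CSV rows by what an import would probably do with them.

    update      - accession number exactly matches an existing one
    adds_linked - an update row that also carries linked-entity data, which
                  would add a new entity next to the old one
    near_miss   - differs from an existing number only by case or surrounding
                  whitespace; check these by hand
    create      - no match at all, so a new record would be created
    """
    existing = set(existing_numbers)
    loose = {n.strip().lower() for n in existing}
    plan = {"update": [], "adds_linked": [], "near_miss": [], "create": []}
    for row in rows:
        number = row.get(number_column) or ""
        if number in existing:
            plan["update"].append(number)
            if any((row.get(col) or "").strip() for col in linked_columns):
                plan["adds_linked"].append(number)
        elif number.strip().lower() in loose:
            plan["near_miss"].append(number)
        else:
            plan["create"].append(number)
    return plan


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = io.StringIO(
        "accessionNumber,title,donorName\n"
        "00.001,Updated title,\n"
        "00.002 ,Trailing space,\n"
        "00.008,Brand new,\n"
    )
    print(plan_import(csv.DictReader(sample), ["00.001", "00.002"]))
```

Rows listed under `adds_linked` are the ones where the old donor would remain and need manual cleanup afterwards.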

When I have time, I'll try to add this to the documentation! 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

On Tue, Mar 6, 2018 at 3:04 PM, Damian Bauder <drba...@ucalgary.ca> wrote:
Thanks, Dan! I'll let you know if I discover any unexpected behaviour. In the meantime, hopefully this post will come in useful for anybody else interested in this functionality.

Cheers,

Damian

Damian Bauder

Mar 20, 2018, 11:57:29 AM
to ica-ato...@googlegroups.com

Thanks, Dan, I appreciate the followup!

Cheers,

Damian

