How to manage data


Georgie

Feb 1, 2024, 8:05:18 AM

to AtoM Users

I am new to AtoM and would like to ask a question.
How do you manage your data when importing with CSV?
I maintain a separate CSV for each fonds, and when I add data under a fonds, I add the new rows to the CSV for that fonds.
However, this causes one problem.
The image files listed in the digitalObjectPath column of the CSV have already been deleted from the server by the time I import it.
Because my method keeps adding new data to the same CSV, the digitalObjectPath values for the existing rows point to paths and images that are no longer on the server, and when I import that CSV, those images disappear from AtoM.
One solution is to delete the digitalObjectPath values for the existing rows, but in that case I would need to prepare two files: a "CSV for management" and a "CSV for import."
I would like to use only one CSV for each fonds.
For those of you who import via CSV, how do you manage your CSV files?

Dan Gillean

Feb 1, 2024, 4:01:55 PM

to ica-ato...@googlegroups.com

Hi Georgie,

Welcome to the AtoM community! I confess I am not sure I fully understand your workflow currently. However, I will try to provide some general information about CSV imports in AtoM, digital objects, and how to link the two, that might hopefully provide you with some options and answers along the way.

CSV import and digital objects - general links

First, if you have not reviewed them you can find the latest documentation for CSV import here:

CSV prep and CSV import via the user interface: https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/
CSV import via the command-line: https://www.accesstomemory.org/docs/latest/admin-manual/maintenance/cli-import-export/#csv-import-cli

Additionally, here is a slide deck we use in training sessions and AtoM Camps related to prepping archival description CSVs for import:

https://www.slideshare.net/accesstomemory/csv-import-in-atom

Finally, AtoM 2.7 and later now has a CSV validation module, so you can double-check that your CSV is well-formed before importing it. See:

https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-validation/

Description CSV templates and importing digital objects

There are essentially two different ways to import and link digital objects at the same time that you are importing descriptions via CSV:

Importing local files
Linking to external files that are available on the web

These two methods are supported by the two different digital object columns available in the CSV:

The digitalObjectPath column is used for adding the absolute path to local digital objects placed on the same server
The digitalObjectURI column is used for adding a link directly to a target digital object available on the public web

No matter what method you use, AtoM enforces a 1:1 relationship between a digital object and an information object (i.e. a description) - meaning you can only ever link 1 object to a description at a time. If you need to attach multiple objects (such as images of different sides of the same art object, or different pages of a book, etc), then you would need to make lower level descriptions, as ALL descriptive metadata (including just a title or brief description) must be associated with a description - only technical metadata (such as file size, dimensions, colorspace, format, etc) is directly associated with a digital object. Fortunately, levels of description in AtoM can be user-defined, so you can create whatever sub-item levels you need or prefer - we include a default "Part" level as an example sub-item level, but you could just as easily create levels called "View," "Page," "Component," "Element," "Side," "Facet," or whatever you need.

Practically this means that each row in your csv should only ever use one of those two digital object columns at a time, and you cannot put more than one digital object path or URI in a single cell. If you have a row that has data in both the digitalObjectPath and digitalObjectURI columns, AtoM will ignore the path value and use the URI value by default for that row/description.
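As a rough illustration of that rule, here is a minimal Python sketch that flags rows filling in both columns before you import. The function name and the sample rows are made up for this example; only the two column names come from the AtoM CSV template.

```python
import csv
import io


def rows_with_both_columns(csv_text: str) -> list[int]:
    """Return 1-based data row numbers that fill in BOTH digital object columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    flagged = []
    for row_number, row in enumerate(reader, start=1):
        path = (row.get("digitalObjectPath") or "").strip()
        uri = (row.get("digitalObjectURI") or "").strip()
        if path and uri:
            flagged.append(row_number)
    return flagged


# Hypothetical sample data: the second data row uses both columns.
sample = (
    "title,digitalObjectPath,digitalObjectURI\n"
    "Family photo,/usr/share/nginx/atom/import-objects/family-photo.jpg,\n"
    "Rabbit,/usr/share/nginx/atom/import-objects/rabbit.jpg,https://example.com/rabbit.jpg\n"
)
print(rows_with_both_columns(sample))  # prints [2]
```

Any row number it reports is a row where you would want to clear one of the two cells before importing.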

With that in mind, let's look at each method. See also:

https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#digital-object-related-import-columns

Using the digitalObjectPath column

This is used to add local digital objects stored on the same server to your descriptions as they are being created. Meaning: there is no way to use this feature without someone having access to the back end of your AtoM installation, as you will first need to make the digital objects available in a temporary directory to perform an import this way. However, because AtoM will actually COPY the digital object from the provided path and create its own derivatives, they should still be wherever you put them after the import completes.

Prepping your digital objects for import

We generally recommend making a temporary folder of your import images at the root AtoM installation level. AtoM already has a directory called "images" and another one called "uploads" each of which are used for different purposes, so I recommend you name this temp folder something different - let's say import-objects for now. So, you would first create a new directory (i.e. folder) called import-objects, and then add all the digital objects you will be importing to this directory. Then you would copy this directory to the root AtoM installation directory.

Putting the digital objects on your server and prepping your CSV

If you have followed our installation instructions, then your AtoM root directory is typically: /usr/share/nginx/atom. This means our example folder full of digital objects for import should be found at:

/usr/share/nginx/atom/import-objects

Now, let's say we have 3 digital objects to import with our CSV:

family-photo.jpg
vacation-video.mp4
correspondence-1946.pdf

In the description row related to the family photograph, under the digitalObjectPath column, I would now add:

/usr/share/nginx/atom/import-objects/family-photo.jpg

I would repeat this in the relevant rows for the two other descriptions, adding the full path to the related digital objects in the digitalObjectPath column like so:

/usr/share/nginx/atom/import-objects/vacation-video.mp4
/usr/share/nginx/atom/import-objects/correspondence-1946.pdf

Now, I can run my import - either via the user interface, or via the command-line. Either way, when processing the CSV rows related to these digital objects, AtoM will:

Create the archival description based on the CSV row data
Encounter the local file path in the digitalObjectPath column
Follow it to find the digital object
Copy that into the uploads directory
Generate two derivatives - a smaller resolution reference copy for the view page of the description, and a thumbnail for search and browse results
Move on to the next row in the CSV
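Since the second and third steps depend on the file actually being at the given path, it can be worth checking the paths before you start the import. The following is only a sketch of such a check, meant to be run on the AtoM server itself; the function name is hypothetical and it is not part of AtoM.

```python
import csv
import io
import os


def missing_digital_objects(csv_text: str) -> list[str]:
    """Return every non-empty digitalObjectPath value that is not a file on this machine."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = []
    for row in reader:
        path = (row.get("digitalObjectPath") or "").strip()
        # Rows with an empty cell are fine: they simply have no digital object.
        if path and not os.path.isfile(path):
            missing.append(path)
    return missing


# Typical use on the server (the CSV file name here is an assumption):
# with open("my-descriptions.csv", encoding="utf-8", newline="") as handle:
#     for path in missing_digital_objects(handle.read()):
#         print("Not found:", path)
```

An empty result means every path in the CSV currently resolves to a file, so AtoM should be able to copy each one during the import.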

Using the digitalObjectURI option

The other way to import digital objects as part of a CSV description import is to use the digitalObjectURI column.

In this case, instead of adding local files to your AtoM server for import, you are instead providing links to external digital objects on the web. AtoM will follow these links, fetch the object found, and generate its own derivatives. However, instead of storing the original object found at the link, AtoM will just save the URL that leads to it. This means that using a link like this will ultimately take up less space on your server, since only the smaller derivatives will be stored locally. If your institution already has a Digital Asset Management System (DAMS) or similar in place for managing digital objects, and it can create web links that meet AtoM's requirements, this can be a preferable option, since you don't need to store the original file twice (once in your DAMS and once in AtoM).

This method is only possible when your digital objects meet 3 conditions:

They must be publicly available on the web. This means no logins or passwords required, no VPNs or firewalls or captchas - it must be a link that anyone could access - so that AtoM can access them
It must be an HTTP or HTTPS link. FTP or SFTP links will not work, nor will links to local share drives or similar - again, it must be on the public web for AtoM to access.
The URI provided must end in the file extension. For example, .jpg, .pdf, .mp4, etc. You can often access this for images on a web page by right clicking on them and selecting the option to open the image (or whatever file type) in a new tab. If the URL provided hides the extension - like a YouTube video link, for example, which does not include a file extension (since Google does not want you to download videos without permission), then AtoM will not know where on the page to find the digital object, and the process will not work

For example:

This link will not work in the digitalObjectURI column: https://www.boredpanda.com/cute-rabbits/
This link WOULD work in the digitalObjectURI column: https://www.boredpanda.com/blog/wp-content/uploads/2021/11/cute-rabbits-307-618b7d91f120c__700.jpg
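The second and third conditions can be roughly pre-checked in a script. This is a sketch only: the function name is made up, it cannot tell whether a link is publicly reachable (the first condition), and it is my approximation of the rules above rather than the check AtoM itself performs.

```python
import posixpath
from urllib.parse import urlparse


def looks_importable(uri: str) -> bool:
    """Check that a URI is HTTP(S) and ends in a file extension."""
    parsed = urlparse(uri.strip())
    if parsed.scheme not in ("http", "https"):
        return False  # FTP, SFTP, local share paths, etc.
    if not parsed.netloc:
        return False  # no host name at all
    if parsed.query or parsed.fragment:
        return False  # the URI would not END in the file extension
    _, extension = posixpath.splitext(parsed.path)
    return extension != ""


print(looks_importable("https://www.boredpanda.com/cute-rabbits/"))  # prints False
print(
    looks_importable(
        "https://www.boredpanda.com/blog/wp-content/uploads/2021/11/"
        "cute-rabbits-307-618b7d91f120c__700.jpg"
    )
)  # prints True
```

A link that passes this check can still fail at import time if it sits behind a login, VPN, or firewall.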

Otherwise, the process is nearly identical to the process for using the Path column: Add the URI to the digitalObjectURI column for the relevant row in your CSV, and then run your import - either via the user interface, or via the command-line. In both cases, AtoM will do the following as it parses the CSV row by row:

Create the archival description based on the CSV row data
Encounter the URL included in the digitalObjectURI column for that row
Follow it to find the digital object
Generate two derivatives - a smaller resolution reference copy for the view page of the description, and a thumbnail for search and browse results
Store the URL in the place of the master / original digital object
Move on to the next row in the CSV

Importing digital objects in bulk to existing descriptions

Finally, there is a way to do this in two different steps - where you first create your descriptions in AtoM (either via CSV import, manually via the user interface, or via whatever combination of the two you need until you are satisfied with your descriptive hierarchy), and then bulk import the digital objects later.

To do this, you will require command-line access to your server, as this is a command-line task not currently supported via the user interface. Instructions can be found here:

https://www.accesstomemory.org/docs/latest/admin-manual/maintenance/cli-import-export/#digital-object-load-task

Generally we recommend this task for local files, but technically the task will work if you add external URLs to the CSV you prepare. This method can be especially useful for large files (like video files), where importing the digital object separately from the description CSV is less likely to cause import timeouts or other bottlenecks.

Much like the digitalObjectPath option described above, you would first prep a folder of your digital objects for import, and then place this somewhere on your AtoM server - we generally recommend the root AtoM installation directory (i.e. usually /usr/share/nginx/atom) for simplicity.

You would then create your own 2 column CSV file to use with the task.

Now, a quick warning about this: AtoM expects the CSV to be UTF-8 encoded, and to use unix-style line ending characters. Unfortunately, this means that we do NOT recommend using Microsoft Excel to create your CSV, since Microsoft insists on using its own custom character encodings and line ending characters by default - and these can cause all kinds of unexpected errors when importing into AtoM. In general, for all CSV preparation, we strongly recommend something like LibreOffice Calc - not only is it free and open source, but it will use UTF-8 and unix style line endings by default, and will also allow you to set your encoding and a number of other configurable details before opening any CSV. See this part of the CSV import documentation, or slides 10 and 11 of the CSV import slide deck for more information.
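If you are not sure how a CSV was saved, a quick check of the raw bytes can catch both problems before you import. This is a minimal sketch with a hypothetical function name; note that a carriage return inside a quoted multi-line cell would also be flagged, so treat the result as a hint rather than a verdict.

```python
def csv_encoding_problems(data: bytes) -> list[str]:
    """Return a list of likely encoding or line-ending problems in raw CSV bytes."""
    problems = []
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    # A carriage return suggests Windows (\r\n) or old Mac (\r) line endings.
    if b"\r" in data:
        problems.append("contains carriage returns (non-unix line endings)")
    return problems


print(csv_encoding_problems(b"title\nFamily photo\n"))      # prints []
print(csv_encoding_problems(b"title\r\nFamily photo\r\n"))  # flags the line endings
```

To check a real file, read it in binary mode (for example `open(path, "rb").read()`) and pass the bytes to the function.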

Back to preparing your digital object CSV:

This CSV will have 2 columns:

A filepath column that includes the absolute path to the related digital object you placed on your server (for example, /usr/share/nginx/atom/import-objects/family-photo.jpg)
A second column that will tell AtoM which description this digital object should be linked to

There are a few different options you can use for the second column, but the one I strongly recommend is the slug. Of the other two options supported, the objectID is difficult to get without using a MySQL query, and the identifier does not have to be unique in AtoM (so you could end up attaching your digital object to the wrong description!). By contrast, slugs in AtoM MUST be unique, and finding one doesn't require CLI access - a slug is just the unique part of an AtoM record's URL. So for example, if:

Your AtoM installation is at: https://archives.example.com
You have an item-level description called "Family photograph, 1946" you want to use
That description has a URL of https://archives.example.com/family-photograph-1946

Then the slug for that description would be:

family-photograph-1946
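If you have a list of description URLs copied from your browser, pulling the slugs out can be scripted. Here is a small sketch that assumes the slug is always the last segment of the URL path, as in the example above; the function name is made up for illustration.

```python
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Return the last non-empty path segment of a description URL."""
    path = urlparse(url.strip()).path
    segments = [segment for segment in path.split("/") if segment]
    return segments[-1] if segments else ""


print(slug_from_url("https://archives.example.com/family-photograph-1946"))
# prints family-photograph-1946
```

This only works for the view page of a description; URLs for edit pages or search results end in something other than the slug.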

So, your digital object CSV is 2 columns - one with the full filepath to the local object, and the other with the slug of the target description. Now you put the CSV on your AtoM server as well, and use the task to import your digital objects, as described in the task documentation - for example:

php symfony digitalobject:load --index /usr/share/nginx/atom/my-import-csv
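The two-column file itself could be generated with a short script instead of a spreadsheet, which also keeps it in UTF-8 with unix line endings. This is a sketch under assumptions: the header names `filename` and `slug` are my assumption about what the task expects and should be confirmed against the task documentation linked above, and the function name is hypothetical.

```python
import csv
import io


def build_load_csv(pairs, path_header="filename", link_header="slug") -> str:
    """Build the 2-column CSV text from (filepath, slug) pairs.

    The default header names are assumptions; check the task documentation.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")  # unix-style line endings
    writer.writerow([path_header, link_header])
    for filepath, slug in pairs:
        writer.writerow([filepath, slug])
    return buffer.getvalue()


text = build_load_csv(
    [("/usr/share/nginx/atom/import-objects/family-photo.jpg", "family-photograph-1946")]
)
print(text, end="")
# To save it: open(target, "w", encoding="utf-8", newline="\n").write(text)
```

Passing `newline="\n"` when saving stops Python from converting the line endings on Windows.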

Conclusion

I hope that helps clarify the options you have for including digital objects with your CSV imports! If that didn't help, perhaps you can try to clarify the request with this information, and I'll be happy to answer any remaining questions you have.

Cheers,

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056

@accesstomemory

he / him
