CSV Import fails to load .png digital objects


Scott Breeden

Oct 31, 2021, 3:43:20 PM
to AtoM Users
Hello,

To add three child archival descriptions and attach digital objects to them, I followed Method 1A as described in this post:


AtoM added the child descriptions just fine, but it seemed unable to find the image files specified in the CSV file's digitalObjectPath column.

The attached csvimport.jpg file is a screenshot showing output from php symfony csv:digital-object-path-check.  Also attached is the CSV file, atomdctest_v5add1.csv.  Does anybody know why the image files weren't processed?

I am using AtoM 2.6.4 on a virtual Ubuntu 18.04 LTS system, running in Oracle VirtualBox on a Windows 10 laptop.  I followed the recommended installation instructions, except for modifying a couple of steps to specify the correct location of PHP.

-Scott



atomdctest_v5add1.csv
csvimport.jpg

Dan Gillean

Nov 1, 2021, 10:57:49 AM
to ICA-AtoM Users
Hi Scott, 

First, just to confirm that all the dependencies are in place and working as expected - are you able to link a digital object to an existing description via the user interface?

The only thing that jumped out at me while looking at your attachments is the filesystem permissions. AtoM expects the root installation directory and everything below it to be owned by the www-data user. See the installation instructions here: 
You can try using the command listed there to reset the filesystem permissions to the www-data user: 
  • sudo chown -R www-data:www-data /usr/share/nginx/atom
Then, if you need to use sudo for any command, you can specify the www-data user like so: 
  • sudo -u www-data php symfony [...]
Let us know how a UI digital object upload goes, and if changing the file system permissions has any effect. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him


--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/9fd61852-489f-4496-99ee-b0b9c5fc4ef5n%40googlegroups.com.

Scott Breeden

Nov 2, 2021, 2:50:15 AM
to ica-ato...@googlegroups.com
Dan,

Thank you for that quick response. Yes, I can link both single and
multiple digital objects to existing descriptions via the user
interface. But I would like to do that via CSV import as well.

After I posted my question, I wondered if file permissions might be
the problem. So today I moved some files into a new directory having
different permissions but got the same result as before. I will try
making some more changes in the next day or two and see what happens.

Thank you for suggesting some commands to try. I had already run
"sudo chown -R www-data:www-data ..." since that was mentioned in the
installation instructions. I hadn't thought of trying "sudo -u
www-data ..."

By the way, your earlier post said to create the directory of image
files under the AtoM directory (normally /usr/share/nginx/atom). But
since the CSV file's digitalObjectPath values are full pathnames, not
relative pathnames, is that really necessary?

-Scott

Dan Gillean

Nov 2, 2021, 4:50:02 PM
to ICA-AtoM Users
Hi Scott, 

I've always had the best success with imports using absolute paths in the CSV template and making sure that the objects are somewhere in AtoM's root installation directory. I usually recommend creating a temporary subdirectory like "upload-images" or "import-files", just because it is easier to clean up later. That said, you could certainly experiment a bit. The key thing is that AtoM needs the necessary permissions to retrieve the files, so if you put them somewhere else, the directory and all upload files should still be updated so they are owned by the www-data user.

You could also experiment a bit with using a relative path in the CSV with a test import, and see if it changes anything, either for the import itself or the check task. 

It has been a while since I've used the command-line task to check digital object CSV columns before an import, and looking at your original post, I did wonder if the task asking for a directory path, combined with using an absolute path in the CSV, might have led to the task failing (i.e. I wondered if perhaps the task was trying to look for something like /usr/share/nginx/atom/testimages/usr/share/nginx/atom/testimages/CHM-BRO-HIS-003a.png and not finding anything due to the duplicated file path). I had hoped to have some time to run a few tests myself but haven't yet, and wanted to reply in the meantime. 

If you haven't re-tried updating the permissions since my response and running the import command as the www-data user, it's worth a try - in the terminal screenshot you provided it looks like all the contents of testimages are owned by the scott user, and when you use sudo, you're also running as that user. 

As a final aside - if you've manually created this environment, it's possible there's some configuration issue that might be hard for us to identify remotely. We do provide a Vagrant box that essentially pre-bundles an AtoM development environment for use with VirtualBox - you can read more about it here: 
A couple things to note about this: 

First, the Vagrant box we provide is intended for local testing and development - it is not intended to run persistently or act as a production-ready web accessible environment. 

Second, as development environments, by default the installation instructions will set up our latest development branch (qa/2.x in our code repository) - i.e. this is where we are actively developing the 2.7 release. This means there may be bugs or regressions as we continue with development. If you want to set up a box using our stable/2.6.x branch, then you'd want to explicitly specify an older vagrant box version during setup, since the PHP and MySQL versions are different between AtoM 2.6 and 2.7, and just changing the branch via git won't work. 

If you did want to follow this approach, there are instructions for how to install an older AtoM vagrant box version and make sure it's up to date for that version in this thread: 
Meanwhile, even if you install the latest version, we have done a lot of development to add features and fix regressions since we created the latest vagrant version, so even a newer environment should be updated. Instructions on how to do so (including how to preserve any data you want to save - which you could use to move data from the current VirtualBox installation to the Vagrant environment) can be found in this thread: 
Let me know how everything goes! 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him

Scott Breeden

Nov 4, 2021, 2:26:30 AM
to ica-ato...@googlegroups.com
Dan,

What I've found out so far:

The command line task to check digital object CSV columns has a
problem with absolute paths regardless of any file permission
problems.

In atom/lib/task/import/csvDigitalObjectPathsCheckTask.class.php, the
getCsvColumnValues() function returns values from the CSV file. If
these values are full path names, though, they won't match the output
of getImageFiles(), which consists of filenames only, not pathnames.

Suggested fix:

In getCsvColumnValues(), replace:

array_push($values, $row[$imageColumnIndex]);

with:

// Remove absolute path leading to image file.
$relativeFilePath = basename($row[$imageColumnIndex]);
array_push($values, $relativeFilePath);

When I created a modified version of
csvDigitalObjectPathsCheckTask.class.php that included this change, I
got expected results.
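
The mismatch described above can be illustrated outside AtoM. The following is a minimal standalone sketch in Python, not AtoM's PHP code: the function name unmatched_csv_values and the sample data are hypothetical, and the sample path is only the one mentioned earlier in this thread. It compares CSV values against bare filenames, first as-is and then after stripping the directory part, which is what the suggested basename() change does.

```python
import os


def unmatched_csv_values(csv_values, files_on_disk, strip_directories):
    """Return the CSV values that have no counterpart among the bare filenames."""
    on_disk = set(files_on_disk)
    unmatched = []
    for value in csv_values:
        # With strip_directories=True, compare only the final path component,
        # mirroring the effect of the suggested basename() fix.
        candidate = os.path.basename(value) if strip_directories else value
        if candidate not in on_disk:
            unmatched.append(value)
    return unmatched


# Hypothetical sample data: a full path as it might appear in the
# digitalObjectPath column, and the bare filename a directory listing yields.
csv_values = ["/usr/share/nginx/atom/testimages/CHM-BRO-HIS-003a.png"]
files_on_disk = ["CHM-BRO-HIS-003a.png"]

print(unmatched_csv_values(csv_values, files_on_disk, strip_directories=False))
print(unmatched_csv_values(csv_values, files_on_disk, strip_directories=True))
```

The first call reports the full path as unmatched; the second reports nothing. One caveat of comparing by filename alone: two files with the same name in different subdirectories become indistinguishable.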

Next I will go back to investigating the actual CSV import problem.
Now that my input files are readable by anyone (-rw-rw-r--) and in
directories that anyone can access (drwxrwxr-x), would you expect
running as the www-data user to make any difference?

About Vagrant: I tried using that originally, but I could never get
AtoM to store date values, like creation dates of archival
descriptions. No matter what I entered in the form, it was silently
ignored, and AtoM continued to complain that a required value was
missing. So I switched to a desktop version of Ubuntu in VirtualBox,
which did not have that problem.

Maybe someday I could try a different AtoM Vagrant box version. I
will take a look at those links that you provided if I decide to go
down that path again. Thank you.

-Scott

Dan Gillean

Nov 4, 2021, 9:12:19 AM
to ICA-AtoM Users
Hi Scott, 

Thanks so much for this! I will discuss your proposed change to the check task with our developers, but it sounds very helpful to me from the description. 

Regarding the Vagrant box and dates: 

I believe that this issue is now resolved - this is one of the challenges of working in a development environment based on the latest branch. We have been implementing CSRF support in AtoM to increase the security of the application, but in the process of doing so there was a period where updates to related tables were not being saved properly. You can see a ticket about this here, if you're curious: 
If your VirtualBox environment is still tracking our qa/2.x development branch, then it may just be that the timing of your switch coincided with the patch resolving the issue being added. 

> Now that my input files are readable by anyone (-rw-rw-r--) and in
> directories that anyone can access (drwxrwxr-x), would you expect
> running as the www-data user to make any difference?

I'd still recommend running AtoM related tasks as the www-data user whenever possible as a general rule, since this is what AtoM expects by default. Doing so may help avoid encountering other unexpected conflicts or issues we haven't uncovered yet. 

Let us know how it goes, though!

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him

Scott Breeden

Nov 4, 2021, 2:48:28 PM
to ica-ato...@googlegroups.com
Dan,

Of course, fixing only the check task code would not be advisable if
there were still a problem with the import code. In my case, the
saving grace of the original check code's failure was that it
correctly predicted the failure of the subsequent import. I will need
more time to look at the import code.

Thanks for the link to the problem that was affecting dates. Nice to
know that I wasn't the only person who noticed this, and that the
problem has now been fixed.

-Scott

Scott Breeden

Nov 9, 2021, 4:33:40 PM
to AtoM Users
Dan,

Latest news:

In one of my previous tests, I made a mistake when I moved my CSV file and image files into a different directory:  I forgot to change the values in the CSV file's digitalObjectPath column to match the new location.  So, AtoM still looked in the old location, which still failed, due to a file permission problem.  (It would have been nice to see an error message in the log, though.)

After I fixed the CSV file, importing the new child descriptions AND their digital object files worked just fine.  Hooray!
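
A stale digitalObjectPath column like the one described above can be caught before an import with a small standalone check. The following Python sketch is an illustration only and is not part of AtoM: the function name missing_digital_objects is hypothetical, and it assumes a UTF-8 CSV file with a header row containing a digitalObjectPath column. It reports every row whose path does not resolve to an existing file.

```python
import csv
import os


def missing_digital_objects(csv_path, column="digitalObjectPath"):
    """Return (row_number, path) pairs whose path is not an existing regular file."""
    missing = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Data rows start at 2 because row 1 is the header.
        for row_number, row in enumerate(reader, start=2):
            path = (row.get(column) or "").strip()
            # Rows with an empty path are skipped: they attach no digital object.
            if path and not os.path.isfile(path):
                missing.append((row_number, path))
    return missing
```

Calling missing_digital_objects("atomdctest_v5add1.csv") would return an empty list when every path resolves. Note that os.path.isfile() also returns False when a directory in the path cannot be traversed by the user running the script, so a reported row may indicate either a wrong path or a permission problem.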

Conclusions:

(1) There's nothing wrong with the import code.

(2) CSV and image files do not have to be under the AtoM installation directory:  They can be anywhere as long as AtoM has read permission for them plus execute permission for all directories in their paths.  That said, putting everything in subdirectories under AtoM and THEN running  sudo chown ... should be an easy way to eliminate permission problems, except maybe in bizarre cases like file owners not having read permission for their own files.

(3) My suggested fix to the check code in getCsvColumnValues() is still valid:  it eliminates certain bogus error messages when digitalObjectPath values contain absolute pathnames.

(4) I see that AtoM 2.7 will feature a new CSV validator in addition to  csv:digital-object-path-check.  I haven't tried running the new code, but I'm all in favor of being able to do as much as possible from the user interface, using background jobs where needed.  The documentation looks good.
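
Conclusion (2) above, read permission on the file plus execute permission on every directory in its path, can be checked with a short standalone script. This is a sketch in Python rather than anything shipped with AtoM, and the function name permission_problems is hypothetical. Because os.access() tests the permissions of the user running the script, it would need to be run as the www-data user to reflect what AtoM can see.

```python
import os


def permission_problems(file_path):
    """List the permission problems the current user would hit reading file_path."""
    problems = []
    absolute = os.path.abspath(file_path)

    # Collect every ancestor directory, from the file's directory up to the root.
    ancestors = []
    directory = os.path.dirname(absolute)
    while True:
        ancestors.append(directory)
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root
            break
        directory = parent

    # Each directory on the way to the file needs execute (search) permission.
    for directory in reversed(ancestors):
        if not os.access(directory, os.X_OK):
            problems.append("no execute permission on directory: " + directory)

    # The file itself must exist and be readable.
    if not os.access(absolute, os.R_OK):
        problems.append("file missing or not readable: " + absolute)
    return problems
```

An empty list means the current user can reach and read the file. Assuming python3 is installed and the function is saved in a script that calls it, running that script with "sudo -u www-data python3 ..." would test access as the www-data user.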

-Scott

Dan Gillean

Nov 9, 2021, 4:54:19 PM
to ICA-AtoM Users
Hi Scott, 

Thanks so much for the update! I confess I was racking my brain a bit trying to recreate the import issue and think of other possible causes, to the point that I have not yet reproduced the issue in the csv:digital-object-path-check task or filed an issue with your proposed fix. I'm glad to know it's working! 

I'll turn my attention to the task now. In the meantime, I'm glad you've noticed the CSV validation feature that's coming, and thanks for your kind words about the documentation - designing, refining, testing, and documenting that feature (sponsored by the Council of Nova Scotia Archives) was a lot of work, but I think we're all happy with how it turned out and hope that it will help end users avoid a lot of common errors during CSV preparation and import. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him
