Bug in php symfony digitalobject:load


Roberto Greiner

Mar 19, 2025, 4:33:29 PM
to ica-ato...@googlegroups.com
I'm trying to import some objects using the 'php symfony
digitalobject:load' command. This worked (to some extent) in AtoM 2.8,
but in 2.9 I got the following error:

Import file must contain a 'filename' column and one of the following:
'slug', 'identifier', 'information_object_id'

My file was the following:

information_object_id,filename
538,0103020001/MEMORANDO/000302000306000006306.pdf
539,0103020001/MEMORANDO/000302000307000006307.pdf
540,0103020001/MEMORANDO/000302000308000006308.pdf
541,0103020001/MEMORANDO/000302000309000006309.pdf
542,0103020001/MEMORANDO/000302000310000006310.pdf
543,0103020001/MEMORANDO/000302000316000006316.pdf
544,0103020001/MEMORANDO/000302000317000006317.pdf
545,0103020001/MEMORANDO/000302000318000006318.pdf
546,0103020001/MEMORANDO/000302000319000006319.pdf
547,0103020001/MEMORANDO/000302000320000006320.pdf

I've checked all I could (permissions, Linux/Windows formatting, UTF-8,
and other things). After a lot of testing, to my surprise, adding a
dummy column before the valid columns made the command work. The file
became the following:

dummy,information_object_id,filename
dummy,538,0103020001/MEMORANDO/000302000306000006306.pdf
dummy,539,0103020001/MEMORANDO/000302000307000006307.pdf
dummy,540,0103020001/MEMORANDO/000302000308000006308.pdf
dummy,541,0103020001/MEMORANDO/000302000309000006309.pdf
dummy,542,0103020001/MEMORANDO/000302000310000006310.pdf
dummy,543,0103020001/MEMORANDO/000302000316000006316.pdf
dummy,544,0103020001/MEMORANDO/000302000317000006317.pdf
dummy,545,0103020001/MEMORANDO/000302000318000006318.pdf
dummy,546,0103020001/MEMORANDO/000302000319000006319.pdf
dummy,547,0103020001/MEMORANDO/000302000320000006320.pdf

With this file, the command worked and the files were imported
correctly. All I did was add the 'dummy' column and the command worked.
This did not happen in 2.8 and, as far as I know, should not be
necessary. Could someone check this?

Thank you,

Roberto


--
-----------------------------------------------------
Marcos Roberto Greiner

Optimists think that we live in the best of worlds
Pessimists fear that this is true
James Branch Cabell
-----------------------------------------------------

Alberto Pereira

Mar 19, 2025, 5:47:10 PM
to ica-ato...@googlegroups.com
Hi,

I can't replicate this locally. It works as expected with those 2 columns (information_object_id,filename).
Can you remove the dummy column from the file and try again with the newly edited file? The last time this happened to me, the cause was a space between the comma and the field name (like this: information_object_id, filename).
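
A quick way to rule out this kind of header problem is to inspect the first line programmatically. The following is only a minimal Python sketch, not AtoM's own validation: the function name check_header is hypothetical, and the required column names are taken from the error message quoted earlier in the thread.

```python
import csv
import io

# Column names taken from the error message quoted in this thread.
REQUIRED_COLUMN = "filename"
KEY_COLUMNS = {"slug", "identifier", "information_object_id"}


def check_header(header_line: str) -> list[str]:
    """Return a list of human-readable problems found in a CSV header line.

    Hypothetical helper for illustration; this is not part of AtoM.
    """
    problems = []
    names = next(csv.reader(io.StringIO(header_line)))
    for name in names:
        # A space after the comma ends up as part of the column name.
        if name != name.strip():
            problems.append(f"column {name!r} has leading or trailing whitespace")
        # A byte order mark decoded as text shows up as U+FEFF.
        if name.startswith("\ufeff"):
            problems.append(f"column {name!r} starts with a byte order mark")
    if REQUIRED_COLUMN not in names:
        problems.append("missing 'filename' column")
    if not KEY_COLUMNS.intersection(names):
        problems.append("missing one of 'slug', 'identifier', 'information_object_id'")
    return problems


for problem in check_header("information_object_id, filename"):
    print(problem)
```

With the header "information_object_id, filename" the sketch reports both the stray whitespace and the resulting missing 'filename' column, because the parsed name is " filename" rather than "filename".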

--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-user...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/ica-atom-users/f7b40339-daf7-4aec-9af4-909cbaaab59c%40gmail.com.

Hans-Arno Mielsch

Mar 20, 2025, 6:26:26 AM
to AtoM Users
With my fresh installation of AtoM 2.9 on Ubuntu 24, the import via "sudo php symfony digitalobject:load /home/hans-arno/import/bilder.csv"
works, but not cleanly: for every JPG file it reports:
convert-im6.q16: missing required argument  @ error/convert.c/ConvertImageCommand/545.
identify-im6.q16: missing required argument  @ error/identify.c/IdentifyImageCommand/249.
But the items are loaded!
What am I missing?

Regards,
-Hans-Arno

Roberto Greiner

Mar 20, 2025, 7:35:08 AM
to ica-ato...@googlegroups.com
I've tried it. I deleted the imported objects, re-imported the descriptions, edited the file to remove the column, updated the slug numbers and ran the command again. I got exactly the same error. In any case, I double-checked and there are no stray spaces or typos in the first line.

Then I opened the file with a hex editor...

I found the following hex sequence at the start of the file: "ef bb bf". After removing it, the import worked. Those bytes are not printable ASCII characters. After some digging, I found that those three bytes are commonly written at the start of the file when exporting a UTF-8 CSV (see https://answers.microsoft.com/en-us/msoffice/forum/all/excel-seems-to-mess-up-my-csv-files/bc7598cf-afd7-4a2a-ac3e-1e7526ee29b7), and that's what Excel did when I created the import file. Since UTF-8 is needed to properly import accented text into AtoM, shouldn't some code be added to take that into consideration?
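
The three bytes would also explain why the dummy column helped. The Python sketch below is only an illustration under an assumption: that the importer decodes the file as plain UTF-8 and compares header names literally, which is not confirmed for AtoM's actual code. The helper header_names and the file name a.pdf are hypothetical. The point is that the BOM becomes part of the first column name, so it only matters when the first column is one the importer looks for.

```python
import csv
import io

BOM = b"\xef\xbb\xbf"  # the "ef bb bf" sequence found with the hex editor


def header_names(raw: bytes) -> list[str]:
    """Decode as plain UTF-8 (the BOM is NOT removed) and return the header names."""
    text = raw.decode("utf-8")
    return next(csv.reader(io.StringIO(text)))


# Hypothetical miniature versions of the files from this thread.
plain = b"information_object_id,filename\n538,a.pdf\n"
with_bom = BOM + plain
with_dummy = BOM + b"dummy,information_object_id,filename\ndummy,538,a.pdf\n"

print("information_object_id" in header_names(plain))       # True
print("information_object_id" in header_names(with_bom))    # False
print("information_object_id" in header_names(with_dummy))  # True
```

In the second case the first header is really "\ufeffinformation_object_id", which does not match the expected name. In the third case the BOM is attached to the unused dummy column, so the real columns are found.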

Thank you,

Roberto



Alberto Pereira

Mar 20, 2025, 8:13:47 AM
to ica-ato...@googlegroups.com
Hi,

Hans-Arno, that's probably the ImageMagick version. I'm not sure what version is in the repositories now, but for Ubuntu 22.04, it's version 6. Version 7 made some changes, notably removing the "convert" command but still maintaining the legacy command. Maybe they changed the signature of some function.

Roberto, the file is still UTF-8, or can be, just without the BOM characters at the beginning. There has been a discussion around the BOM and the Unicode standard for years, with Microsoft sticking to it, but it breaks in lots of different places.
When I use Excel to create CSV files, I usually just convert the encoding afterwards, to stay as far away as possible from any discussions around character encoding standards!
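
For anyone who wants to script that conversion step, here is one possible Python sketch. It is not an AtoM tool: the function name strip_utf8_bom is hypothetical, and it assumes the file is small enough to read into memory. It removes a leading UTF-8 BOM in place and leaves the rest of the file untouched.

```python
from pathlib import Path

BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark


def strip_utf8_bom(path: str) -> bool:
    """Remove a leading UTF-8 BOM from the file in place.

    Returns True if a BOM was found and removed, False if the file was
    left unchanged. Hypothetical helper for illustration only.
    """
    file_path = Path(path)
    data = file_path.read_bytes()
    if not data.startswith(BOM):
        return False
    file_path.write_bytes(data[len(BOM):])
    return True
```

It could be called as strip_utf8_bom("import.csv") before running the load task, where import.csv stands in for the real file. Working on a copy first is the safer choice.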


Dan Gillean

Mar 20, 2025, 8:18:57 AM
to ica-ato...@googlegroups.com
Hi Roberto, 

I have also been unable to reproduce this issue. A couple observations: 

First, we have long recommended that our users avoid Microsoft products for data preparation. While after much feedback they have vastly improved the more standardized options included in recent versions of Excel, MS has a long history of ignoring existing (and widely adopted) standards in favor of implementing things their own proprietary way, which can have all kinds of unexpected consequences for AtoM. This includes their default character encoding, and also how line ending characters are rendered in the file. I would not be surprised if there are other aspects of their UTF-8 export option that are non-standard. 

In any case, I did look a bit more into the hex sequence you found, and I think this is a good summary: 
There are other good answers on the page as well. Essentially, the Byte Order Mark (or BOM) that those characters represent is not recommended for UTF-8 files, and in fact can break many other implementations. I also noticed that the people objecting in the comments of that Stack Overflow thread tended to mention its usefulness with Microsoft products. Therefore, it seems that once again, despite it not being recommended, Microsoft has found a BOM to be an easier way for THEM to handle UTF-8 files and has persisted in using it.

In general, we recommend using something like LibreOffice Calc as an alternative. It is free, open source, tends to follow both de jure and de facto open standards wherever possible, and has some basic transparency that makes data prep and review much easier - such as the ability to select the desired separator, character encoding, and string delimiter in a widget with a preview prior to opening any CSV file: 

[Attached image: calc-csv-options.png]


I will note that some of AtoM's other import options do seem a bit more flexible - for example, read this section of the CSV validation documentation for archival description CSV imports, which will check for a BOM in an incoming file: 
This implies to me that a valid UTF-8 BOM would be acceptable - and in fact, though I still don't recommend its use (ESPECIALLY in a production instance), I have tested later versions of Excel with the UTF-8 CSV export option and they will import into AtoM. It may be possible to add similar logic to the digitalobject:load task, but at present I don't think this is a priority for the Maintainers, given the community consensus around the inclusion of BOM characters in UTF-8 files.

Cheers, 

Dan Gillean, MAS, MLIS
Business & User Experience Analyst
Artefactual Systems, Inc.
604-527-2056
he / him



Brandon Uhlman

Mar 20, 2025, 12:59:09 PM
to AtoM Users
Hans-Arno,

I'm experiencing the same problem as you.

I traced the problem to lines 138-139 of /plugins/sfThumbnailPlugin/lib/sfImageMagickAdapter.class.php, when the plugin is trying to determine whether ImageMagick is installed by calling convert and identify with either the specified parameters or (by default) no parameters.

With no parameters, convert and identify return a version tag and usage information. When the plugin does that at lines 141 and 146, it checks for the string 'ImageMagick' in the first line of the output, which is there. However, if you actually run those commands on the command line with no parameters on Ubuntu 20.04 (AtoM 2.8) and on Ubuntu 24.04 (AtoM 2.9), you can see that the spurious error messages are generated at the command line too, but only on 24.04.
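
To make that check concrete, here is a rough Python sketch of the same idea. It is not the plugin's PHP code: the function names are hypothetical, and it assumes the usage text is written to standard output while the "missing required argument" message goes to standard error, which I have not verified for every ImageMagick build.

```python
import shutil
import subprocess


def first_line_mentions_imagemagick(output: str) -> bool:
    """Mimic the described check: look for 'ImageMagick' in the first output line."""
    lines = output.splitlines()
    return bool(lines) and "ImageMagick" in lines[0]


def probe(command: str = "convert"):
    """Run the command with no parameters and report what came back.

    Returns None if the command is not installed, otherwise a tuple of
    (detected, stderr_text). Hypothetical helper for illustration only.
    """
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command], capture_output=True, text=True, errors="replace"
    )
    return first_line_mentions_imagemagick(result.stdout), result.stderr.strip()


if __name__ == "__main__":
    for name in ("convert", "identify"):
        print(name, probe(name))
```

Capturing standard error separately shows that detection can succeed while the warning is still emitted, which matches the observation that the items are loaded despite the messages.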

Ubuntu 20.04
buhlman@libatomdev02:~$ convert -help
Version: ImageMagick 6.9.10-23 Q16 x86_64 20190101 https://imagemagick.org
Copyright: © 1999-2019 ImageMagick Studio LLC
[snip]
By default, the image format of `file' is determined by its magic
number.  To specify a particular image format, precede the filename
with an image format name and a colon (i.e. ps:image) or specify the
image type as the filename suffix (i.e. image.ps).  Specify 'file' as
'-' for standard input or output.

Ubuntu 24.04
buhlman@libatomdev02:~$ convert -help
atom@libatomstg02:~$ convert
Version: ImageMagick 6.9.12-98 Q16 x86_64 18038 https://legacy.imagemagick.org
Copyright: (C) 1999 ImageMagick Studio LLC
[snip]
By default, the image format of `file' is determined by its magic
number.  To specify a particular image format, precede the filename
with an image format name and a colon (i.e. ps:image) or specify the
image type as the filename suffix (i.e. image.ps).  Specify 'file' as
'-' for standard input or output.

convert-im6.q16: missing required argument  @ error/convert.c/ConvertImageCommand/545.

Ubuntu 24.04 does include a slightly newer version of ImageMagick 6, and I think this was probably introduced there. Looks like it was in this commit.

Brandon

Hans-Arno Mielsch

Mar 21, 2025, 5:10:27 AM
to AtoM Users
Hi Brandon,
Nice analysis. Thanks for the pointer to the change in ImageMagick that probably caused this error.
Well, it is of course caused by AtoM not providing the parameters ;-)
But the error handling in ImageMagick looks suspicious, too.
If ThrowMagickException() really does throw, the next line will be unreachable, won't it?

Well, I installed AtoM on Ubuntu 24 as documented and I have no clue how to downgrade to ImageMagick 6.9.10-23.

-Hans-Arno