Characterization errors


Ashley Ray

Sep 27, 2022, 10:30:08 AM
to archivematica
Hi everyone,

I've been doing some testing in our staging version of Archivematica as we prepare to make our born-digital material accessible. This morning I put in 21 A/V files to check what duration and dimensions the METS records for each file. Everything seemed to go smoothly, but when I looked at the METS, only 1 of the 21 files had been characterized and had duration and dimensions. I looked in Archivematica and noticed an error on the "Characterize and extract metadata" job. The error log looks like this:

NalediPandorSpeech.WMA
get() returned more than one FormatVersion -- it returned 2!
Traceback (most recent call last):
  File "/src/MCPClient/lib/job.py", line 111, in JobContext
    yield
  File "/src/MCPClient/lib/clientScripts/characterize_file.py", line 131, in call
    job.set_status(main(job, *job.args[1:]))
  File "/src/MCPClient/lib/clientScripts/characterize_file.py", line 46, in main
    format = FormatVersion.active.get(fileformatversion__file_uuid=file_uuid)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/manager.py", line 127, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 338, in get
    (self.model._meta.object_name, num)
MultipleObjectsReturned: get() returned more than one FormatVersion -- it returned 2!

The same error is repeated for all 20 other files, even though they span a variety of formats: wav, m4a, wma, mp3, mp4, mpg, wmv, mov, etc. The one file that was characterized as expected was an old .wav file (audio/x-wav (Waveform Audio (PCMWAVEFORMAT)) fmt/141).

I tried searching around but couldn't find anyone else with this issue, so I'm hoping someone here can help!

Thanks!

Best,

Ashley

Sarah Romkey

Sep 28, 2022, 10:27:48 AM
to archiv...@googlegroups.com
Hi Ashley,

Can you check the file identification job for these files? Did it succeed or fail?

Cheers,

Sarah

Sarah Romkey, MAS,MLIS
Archivematica Program Manager
@archivematica / @accesstomemory





Ashley Ray

Sep 28, 2022, 11:13:20 AM
to archiv...@googlegroups.com

Hey Sarah!

Yes, file identification looks like it succeeded for all of them. For example:

IDCommand: Identify using Siegfried 1.7.10
IDCommand UUID: 75290b14-2931-455f-bdde-3b4b3f8b7f15
IDTool: Siegfried
IDTool UUID: 454df69d-5cc0-49fc-93e4-6fbb6ac659e7
File: (a7e3a8be-e79e-4825-98af-ee968c3760b6) /var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/selectFormatIDToolTransfer/AR_TST_9_27-16dd2bfd-e530-4ca5-8fc2-4be574779a9d/objects/Audio/NalediPandorSpeech.WMA
Command output: fmt/132
/var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/selectFormatIDToolTransfer/AR_TST_9_27-16dd2bfd-e530-4ca5-8fc2-4be574779a9d/objects/Audio/NalediPandorSpeech.WMA identified as a WMA file

Is there anything else you’d need to see?

Best,

Ashley

 

Ashley Ray

Digital Workflow Specialist

M 0788 921 1741

Pronouns: she/her


Ashley Ray

Sep 29, 2022, 9:30:53 AM
to archivematica
Hey Sarah,

I actually did have another ingest fail today, on both file format identification and characterization. I wouldn't expect some of these files, like the JPGs and TIFs, to cause any trouble, though there are some files of unknown type mixed in, I guess because they were given to us without extensions.

Examples of files that failed file identification but still passed characterization:

1409_George_Thudichum_copy_.jpg
Exit code: 255
Standard output (stdout)
IDCommand: Identify using Siegfried 1.7.10
IDCommand UUID: 75290b14-2931-455f-bdde-3b4b3f8b7f15
IDTool: Siegfried
IDTool UUID: 454df69d-5cc0-49fc-93e4-6fbb6ac659e7
File: (0c3d34a1-c663-4adc-9fa8-6c566003ad35) /var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/selectFormatIDToolTransfer/SABIOM8-8c6edd45-d758-498e-aee2-1a5f81e33c48/objects/Disc_2/1409_George_Thudichum_copy_.jpg

Errors and diagnostics (stderr)
Error: IDCommand with UUID 75290b14-2931-455f-bdde-3b4b3f8b7f15 exited non-zero.
Error: siegfried determined that the file format is UNKNOWN

Desktop_DF
Exit code: 255
Standard output (stdout)
IDCommand: Identify using Siegfried 1.7.10
IDCommand UUID: 75290b14-2931-455f-bdde-3b4b3f8b7f15
IDTool: Siegfried
IDTool UUID: 454df69d-5cc0-49fc-93e4-6fbb6ac659e7
File: (5fee15d8-40d7-492a-ab99-32e9dcee9f6f) /var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/selectFormatIDToolTransfer/SABIOM8-8c6edd45-d758-498e-aee2-1a5f81e33c48/objects/Disc_1/Desktop_DF

Errors and diagnostics (stderr)
Error: IDCommand with UUID 75290b14-2931-455f-bdde-3b4b3f8b7f15 exited non-zero.
Error: siegfried determined that the file format is UNKNOWN

1408_Marie_Thudichum.tif
Exit code: 255
Standard output (stdout)
IDCommand: Identify using Siegfried 1.7.10
IDCommand UUID: 75290b14-2931-455f-bdde-3b4b3f8b7f15
IDTool: Siegfried
IDTool UUID: 454df69d-5cc0-49fc-93e4-6fbb6ac659e7
File: (d749544e-834e-470f-87eb-6bacb00f353b) /var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/selectFormatIDToolTransfer/SABIOM8-8c6edd45-d758-498e-aee2-1a5f81e33c48/objects/Disc_2/1408_Marie_Thudichum.tif

Errors and diagnostics (stderr)
Error: IDCommand with UUID 75290b14-2931-455f-bdde-3b4b3f8b7f15 exited non-zero.
Error: siegfried determined that the file format is UNKNOWN
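
For anyone who wants to re-check files like these outside Archivematica, here is a minimal sketch, assuming you have saved the JSON report that Siegfried prints when run as `sf -json <path>`. It assumes the report has a top-level "files" list whose entries carry a "matches" list with an "id" field, and that unidentified files get the id "UNKNOWN". The sample report and its file names below are made up for illustration and are not taken from this ingest.

```python
import json


def unidentified_files(sf_json_text):
    """Return the filenames that a Siegfried JSON report marks as UNKNOWN.

    Assumes the report layout described above: "files" -> "matches" -> "id".
    """
    report = json.loads(sf_json_text)
    unknown = []
    for entry in report.get("files", []):
        ids = [match.get("id") for match in entry.get("matches", [])]
        # Treat a file as unidentified if it has no matches at all,
        # or if every match is the UNKNOWN placeholder.
        if not ids or all(i == "UNKNOWN" for i in ids):
            unknown.append(entry.get("filename"))
    return unknown


# Hypothetical report, hand-written to mirror the assumed layout.
SAMPLE_REPORT = json.dumps({
    "files": [
        {"filename": "objects/example_a.jpg",
         "matches": [{"ns": "pronom", "id": "fmt/41"}]},
        {"filename": "objects/example_b",
         "matches": [{"ns": "pronom", "id": "UNKNOWN"}]},
    ]
})

if __name__ == "__main__":
    for name in unidentified_files(SAMPLE_REPORT):
        print("unidentified:", name)
```

If a file is reported as UNKNOWN here too, that would suggest the identification result comes from Siegfried itself rather than from how Archivematica calls it.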

And then items in the same ingest that passed file identification but failed characterization:

1408_Marie_Thudichum.jpg
Standard output (stdout)
IDCommand: Identify using Siegfried 1.7.10
IDCommand UUID: 75290b14-2931-455f-bdde-3b4b3f8b7f15
IDTool: Siegfried
IDTool UUID: 454df69d-5cc0-49fc-93e4-6fbb6ac659e7
File: (70665056-56f2-46ac-966a-da3df6d8c748) /var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/selectFormatIDToolTransfer/SABIOM8-8c6edd45-d758-498e-aee2-1a5f81e33c48/objects/Disc_2/1408_Marie_Thudichum.jpg
Command output: fmt/41
/var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/selectFormatIDToolTransfer/SABIOM8-8c6edd45-d758-498e-aee2-1a5f81e33c48/objects/Disc_2/1408_Marie_Thudichum.jpg identified as a Raw JPEG Stream

Errors and diagnostics (stderr)
get() returned more than one FormatVersion -- it returned 2!
Traceback (most recent call last):
  File "/src/MCPClient/lib/job.py", line 111, in JobContext
    yield
  File "/src/MCPClient/lib/clientScripts/characterize_file.py", line 131, in call
    job.set_status(main(job, *job.args[1:]))
  File "/src/MCPClient/lib/clientScripts/characterize_file.py", line 46, in main
    format = FormatVersion.active.get(fileformatversion__file_uuid=file_uuid)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/manager.py", line 127, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 338, in get
    (self.model._meta.object_name, num)
MultipleObjectsReturned: get() returned more than one FormatVersion -- it returned 2!

1409_George_Thudichum
Standard output (stdout)
IDCommand: Identify using Siegfried 1.7.10
IDCommand UUID: 75290b14-2931-455f-bdde-3b4b3f8b7f15
IDTool: Siegfried
IDTool UUID: 454df69d-5cc0-49fc-93e4-6fbb6ac659e7
File: (2d6b144d-3add-43a0-b0d1-9c83d0e0f3ce) /var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/selectFormatIDToolTransfer/SABIOM8-8c6edd45-d758-498e-aee2-1a5f81e33c48/objects/Disc_1/1409_George_Thudichum
Command output: fmt/353
/var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/selectFormatIDToolTransfer/SABIOM8-8c6edd45-d758-498e-aee2-1a5f81e33c48/objects/Disc_1/1409_George_Thudichum identified as a TIFF

Errors and diagnostics (stderr)
get() returned more than one FormatVersion -- it returned 2!
Traceback (most recent call last):
  File "/src/MCPClient/lib/job.py", line 111, in JobContext
    yield
  File "/src/MCPClient/lib/clientScripts/characterize_file.py", line 131, in call
    job.set_status(main(job, *job.args[1:]))
  File "/src/MCPClient/lib/clientScripts/characterize_file.py", line 46, in main
    format = FormatVersion.active.get(fileformatversion__file_uuid=file_uuid)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/manager.py", line 127, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 338, in get
    (self.model._meta.object_name, num)
MultipleObjectsReturned: get() returned more than one FormatVersion -- it returned 2!

1416__Thudichum.jpg
Standard output (stdout)
IDCommand: Identify using Siegfried 1.7.10
IDCommand UUID: 75290b14-2931-455f-bdde-3b4b3f8b7f15
IDTool: Siegfried
IDTool UUID: 454df69d-5cc0-49fc-93e4-6fbb6ac659e7
File: (d6b5e0a5-7062-43de-8b6f-aa43d0ab45b1) /var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/selectFormatIDToolTransfer/SABIOM8-8c6edd45-d758-498e-aee2-1a5f81e33c48/objects/Disc_2/1416__Thudichum.jpg
Command output: fmt/41
/var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/selectFormatIDToolTransfer/SABIOM8-8c6edd45-d758-498e-aee2-1a5f81e33c48/objects/Disc_2/1416__Thudichum.jpg identified as a Raw JPEG Stream

Errors and diagnostics (stderr)
get() returned more than one FormatVersion -- it returned 2!
Traceback (most recent call last):
  File "/src/MCPClient/lib/job.py", line 111, in JobContext
    yield
  File "/src/MCPClient/lib/clientScripts/characterize_file.py", line 131, in call
    job.set_status(main(job, *job.args[1:]))
  File "/src/MCPClient/lib/clientScripts/characterize_file.py", line 46, in main
    format = FormatVersion.active.get(fileformatversion__file_uuid=file_uuid)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/manager.py", line 127, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 338, in get
    (self.model._meta.object_name, num)
MultipleObjectsReturned: get() returned more than one FormatVersion -- it returned 2!

I hope that makes some sort of sense to you, because it really doesn't make any sense to me. I've put through a few other ingests today without these file identification or characterization errors, so I'm not sure what the root cause is. It would make more sense, and be easier to track down, if the error happened every time, to every file, in every ingest, but so far it doesn't.

Best,

Ashley

Sarah Romkey

Sep 30, 2022, 8:03:44 AM
to archiv...@googlegroups.com
Hmmm... the reason I asked about file ID is that I saw "more than one FormatVersion" in the output, and thought that possibly characterization wasn't running as expected because Archivematica didn't know which rule to run. Which characterization tool is supposed to be running on these files? (I'm not sure whether you have adjusted the defaults in preservation planning.) Could you try running whichever tool you are expecting outside of Archivematica, and see if it works correctly there?
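
In case it helps to picture what that error means: Django's `get()` expects exactly one matching row and raises `MultipleObjectsReturned` when the lookup matches more than one. Below is a minimal, standalone sketch of that situation using SQLite rather than the real Archivematica database. The table name `file_format_link`, its columns, and the sample rows are all hypothetical and only illustrate the shape of the problem, not the actual schema or the actual cause here.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical file-to-format link table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file_format_link (file_uuid TEXT, format_id TEXT)")
    conn.executemany(
        "INSERT INTO file_format_link (file_uuid, format_id) VALUES (?, ?)",
        [
            ("file-a", "format-1"),  # file-a is linked to two format rows,
            ("file-a", "format-2"),  # which is the situation get() rejects
            ("file-b", "format-3"),  # file-b has exactly one, so a lookup works
        ],
    )
    return conn


def get_exactly_one(conn, file_uuid):
    """Mimic a get()-style lookup: succeed only if exactly one row matches."""
    rows = conn.execute(
        "SELECT format_id FROM file_format_link WHERE file_uuid = ?",
        (file_uuid,),
    ).fetchall()
    if len(rows) != 1:
        raise LookupError(
            "expected 1 format for %s, found %d" % (file_uuid, len(rows))
        )
    return rows[0][0]


def files_with_multiple_formats(conn):
    """Return {file_uuid: [format ids]} for files linked to more than one format."""
    rows = conn.execute(
        "SELECT file_uuid, format_id FROM file_format_link "
        "ORDER BY file_uuid, format_id"
    ).fetchall()
    by_file = {}
    for file_uuid, format_id in rows:
        by_file.setdefault(file_uuid, []).append(format_id)
    return {uuid: ids for uuid, ids in by_file.items() if len(ids) > 1}


if __name__ == "__main__":
    conn = build_demo_db()
    print(files_with_multiple_formats(conn))
```

A grouping query like the last function is one way to see which files are affected, once you know which real tables hold the link between a file and its identified format.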

Sarah Romkey, MAS,MLIS

Ashley Ray

Oct 5, 2022, 6:01:56 AM
to archivematica
Hey Sarah,

It looks like we do have all the default settings. I've asked a colleague to try running the files through the tools outside of Archivematica, but I think we might try updating Archivematica first, as we're still on 1.11. I'm hoping that will solve it. I'll post the results back here once we've done the update!

Best,

Ashley
