METS Generation fails for a SIP consisting of 9000 jpegs


arif....@gmail.com

Mar 15, 2021, 11:53:08 AM
to archivematica
Hello fellow AM users,

I have encountered a rather strange problem with AM ingest that doesn't quite match any of the previously reported issues here on the list.

I have been trying to ingest a SIP consisting of 9000+ JPEGs (all constituents of a single collection, with a total size of 7.6 GB). All steps in the workflow up to "Generate METS file" complete successfully. When it reaches "Generate METS file", however, the job shows "Executing command" for over 24 hours before eventually failing with no log or error message and an error code of "None".
The AM server has a reasonable amount of resources (8 CPUs, 32 GB RAM, 100 GB disk space), with MySQL accepting 5K+ connections. So I have ruled out insufficient compute resources as a potential cause.

Has anybody else experienced this? If so, how did you solve it?

Any help would be greatly appreciated.

Best
Arif

gclibrary...@gmail.com

Mar 15, 2021, 11:55:19 AM
to archivematica
Yes, I experienced an analogous issue with PDFs. Although I was on the Artefactual ArchivesDirect hosted server, I was advised to break the material down into smaller SIPs.

Best,
Stephen Klein
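
A minimal sketch of the "smaller SIPs" workaround mentioned above, assuming the collection is a flat directory of files and that each batch will be started as its own transfer afterwards. The function names, the batch size of 1000 and the `_partNNN` directory naming are all hypothetical choices for illustration; nothing here is part of Archivematica itself.

```python
from pathlib import Path
import shutil


def split_into_batches(files, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    files = list(files)
    for start in range(0, len(files), batch_size):
        yield files[start:start + batch_size]


def stage_batches(source_dir, target_dir, batch_size=1000):
    """Copy the files in source_dir into numbered subdirectories of target_dir.

    Returns the list of batch directories that were created. The originals
    are copied, not moved, so the source collection is left untouched.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    # Sort so that the batches are reproducible between runs.
    files = sorted(p for p in source.iterdir() if p.is_file())
    created = []
    for index, batch in enumerate(split_into_batches(files, batch_size), start=1):
        batch_dir = target / f"{source.name}_part{index:03d}"
        # exist_ok=False: fail loudly instead of mixing files into an old batch.
        batch_dir.mkdir(parents=True, exist_ok=False)
        for path in batch:
            shutil.copy2(path, batch_dir / path.name)
        created.append(batch_dir)
    return created
```

With roughly 9000 files and a batch size of 1000, this would produce about nine directories, each of which could then be ingested separately. What batch size is small enough for the METS step to finish is not something this thread establishes, so it would need to be found by experiment.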

Arif Shaon

Mar 15, 2021, 11:59:13 AM
to archiv...@googlegroups.com
Hi Stephen,

Thanks for the quick response. Yes, smaller SIPs would be a workaround, but I was hoping for a simpler solution.
Were you ever given an explanation of the root cause of the problem you faced with the PDFs?

Best
Arif


--
You received this message because you are subscribed to the Google Groups "archivematica" group.
To unsubscribe from this group and stop receiving emails from it, send an email to archivematic...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/archivematica/bf94f07a-ad8f-43ac-8cc2-35d90418e795n%40googlegroups.com.

gclibrary...@gmail.com

Mar 15, 2021, 12:00:36 PM
to archivematica
As I assumed, the computational resources needed to write a METS file covering 9K+ documents are enormous.

Arif Shaon

Mar 15, 2021, 12:22:35 PM
to archiv...@googlegroups.com
I see. So 32 GB of RAM and 8 CPUs wouldn't be sufficient for this, would they?
Interestingly, I checked the resource usage while "Generate METS file" was running and noticed the following:
Only 1 CPU was being utilised, at 100% capacity.
Only 20 GB of disk space was being used.
Only 30% of the 32 GB of RAM was in use.
I find this rather puzzling.

If anybody else has any insight into this, it would be really helpful.

Best
Arif
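
The figures reported above (one CPU at 100% while the others sit mostly idle) are the pattern one would expect from a step that runs in a single process, in which case extra cores and RAM would not shorten it. The following is a rough sketch for checking that pattern on a Linux host by comparing two snapshots of `/proc/stat`. The function names and the 90%/20% thresholds are hypothetical, and it says nothing about why the step is slow.

```python
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} for each 'cpuN' line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate 'cpu' line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user nice system idle iowait irq softirq steal; guest time is
        # already counted inside user/nice, so it is left out of the total.
        ticks = [int(value) for value in fields[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks)
        cores[fields[0]] = (total - idle, total)
    return cores


def busy_percent(before, after):
    """Per-core busy percentage between two parse_proc_stat() results."""
    result = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before.get(name, (0, 0))
        elapsed = total1 - total0
        result[name] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


def looks_single_core_bound(percentages, high=90.0, low=20.0):
    """True if exactly one core is at or above `high` and all others at or below `low`."""
    hot = [name for name, pct in percentages.items() if pct >= high]
    cool = [name for name, pct in percentages.items() if pct <= low]
    return len(hot) == 1 and len(cool) == len(percentages) - 1


def sample_live(interval=1.0):
    """Take two /proc/stat snapshots `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as handle:
        before = parse_proc_stat(handle.read())
    time.sleep(interval)
    with open("/proc/stat") as handle:
        after = parse_proc_stat(handle.read())
    return busy_percent(before, after)
```

Calling `looks_single_core_bound(sample_live())` while the job is running gives a quick yes/no. Because the scheduler can move a process between cores, a longer interval or several samples would be more reliable than a single one-second reading.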


Tatiana Canelhas

Mar 16, 2021, 11:35:05 AM
to archivematica
Hi Arif,

Have you tried installing more than one MCPClient to divide these tasks (microservices)?


Hope this helps,
Tati

Arif Shaon

Mar 16, 2021, 11:54:37 AM
to archiv...@googlegroups.com
Hi Tati,

No, I haven't, actually. I will give it a try, see what happens, and report back.

Thank you very much. 

Best
Arif
