Hi everyone,

I'm looking for suggestions on an issue we're having with FITS in Archivematica when it processes PDFs. For context, we manually installed Archivematica on RHEL 9 on a VM with 8 CPUs and 32 GB of RAM.
We are trying to process a 50 GB SIP with about 1,500 files, most of which are PDFs. It fails at two FITS steps: on the Transfer page, in the Characterize and Extract Metadata microservice; and on the Ingest page, in the "Characterize and extract metadata on submission documentation" job of the Process Submission Documentation microservice.
The Archivematica dashboard failure logs show some JVM out-of-memory errors. Digging deeper into the logs on the VM, the failing process appears to be a Perl process running exiftool as part of FITS, and it dies because it runs out of memory. I have included a screenshot of our resource usage monitoring so you can see the memory spike and the resulting crash.
Other SIPs that are larger or contain more files don't fail at this step or use nearly as much memory, so we believe the problem is specific to this SIP being mostly PDFs. Apart from this case, the rest of the microservices run well within the VM's limits, so we'd like to avoid adding more memory if we can. We also don't want to turn off FITS.
Given that, I have two questions:
- Is it normal for exiftool to consume so much memory when processing PDFs?
- Short of turning off FITS or adding memory to the VM, is there anything we can do to improve exiftool's performance (in particular, its memory use) on PDFs?
Would love to hear from anyone who has faced similar problems.
Thanks,
Nicole Currens
Senior Software Developer
University of Texas Libraries