Trouble with Ingest

frank....@gmail.com

Jan 19, 2021, 5:19:22 AM
to archivematica
For some time now we have found Archivematica generally very slow, but we attributed this to the (mostly) very large datasets we put in.

Now it seems the problem is more complex. We have observed that every time we open the empty Ingest-tab, it generates an almost never-ending series of queries against MySQL. These take up 100% CPU per view, so that two browsers showing the Ingest-tab result in 200% CPU on the server. As a consequence, when we start new jobs in Transfer, they are delayed for hours and sometimes days when they are sent to Ingest.

On the Ingest-tab, we have done "Remove all completed". We have also cleared out everything under Administration -> Processing storage usage.

Is there a way to find out what Ingest is doing in the background, so that we can stop it?

I include a series of screenshots to illustrate the problem, both in Archivematica and in MySQL (MariaDB).

On a side note, the Transfer-tab shows a red circle with "1" inside it, as if waiting for input. But there is nothing waiting. We tried solving this earlier, by rebuilding the index, but the mark remains. Index rebuild finished without any errors. I don't know if this is related, but it's worth mentioning.

Attachments: Skjermbilde2.PNG, Skjermbilde3.PNG, Skjermbilde4.PNG, Skjermbilde5.PNG

frank....@gmail.com

Jan 20, 2021, 7:42:15 AM
to archivematica
Hi again,

We also discovered that the /var/archivematica/sharedDirectory/currentlyProcessing/ directory isn't empty. There are lots of folders containing metadata and logs for previous runs, and one folder also contains a 7z-file of a run that failed. I would have expected this to be moved to the failed-directory.

Can this be what causes the delay in Ingest? And is it trivial to just delete it?

Regards
Frank Skagemo

Ross Spencer

Jan 20, 2021, 8:32:55 AM
to archivematica
Hi Frank,

That one is a known issue: https://github.com/archivematica/Issues/issues/1113. If nothing is currently processing, i.e. Archivematica is idle and all your transfers are complete, you can delete the folders in that directory with no issue. The volume should be low and, as noted in that ticket, shouldn't impact performance.
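Before deleting anything, it can help to see what is actually left behind. A minimal sketch, assuming GNU or BSD find and du, and assuming the directory path written elsewhere in this thread (pass your own path if it differs). The function name list_leftovers is made up for illustration:

```shell
# List each top-level entry in currentlyProcessing with its size on disk.
# The default path is the one quoted in this thread; it is an assumption
# about your install, so pass another directory as $1 if needed.
list_leftovers() {
    dir="${1:-/var/archivematica/sharedDirectory/currentlyProcessing}"
    # -mindepth/-maxdepth restrict the listing to the direct children only.
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sh {} +
}

# Usage:
#   list_leftovers
#   list_leftovers /path/to/sharedDirectory/currentlyProcessing
```

This only reads the directory. Deleting the listed folders is a separate, manual step, to be taken only once nothing is processing.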

Have a look at the storage usage options here too, for more areas that can be cleaned up via the Archivematica user interface: https://www.archivematica.org/en/docs/archivematica-1.12/user-manual/administer/dashboard-admin/#dashboard-usage

I have asked others who truncate the database more regularly than I do, and who might have the script handy (I don't), so hopefully we'll have a SQL snippet you can use to clean up the transfers/SIP tables.

What version are you running? Some indexing was added to these tables in 1.11 (if I recall the version correctly), which should make things faster, but there are still limits with lots of processing. There is more information in that ticket too. There should be a way to purge some of this very transitory data, since once transfer/ingest is complete we use other mechanisms to manage the result of processing, i.e. AIPs.

Hopefully back to you soon.
Best,
Ross

Ross Spencer

Jan 20, 2021, 8:47:05 AM
to archivematica
Hi again,

This might be useful to you Frank (and others watching this list with the same issues):

SET FOREIGN_KEY_CHECKS=0;
truncate table Derivations;
truncate table Events_agents;
truncate table Events;
truncate table FilesIdentifiedIDs;
truncate table FilesIDs;
truncate table Files;
truncate table Jobs;
truncate table main_fpcommandoutput;
truncate table SIPs;
truncate table Tasks;
truncate table Transfers;
truncate table UnitVariables;
SET FOREIGN_KEY_CHECKS=1;


The data in these tables is all an output of processing. It gets collected in the database and then output to your METS file in respective SIPs/AIPs. 

Running these commands removes this information from the database, so you can't access this historical processing data from the dashboard once it is cleared. If you have already cleared transfers from the UI, though, you are unlikely to want it anyway. You still have access to information about AIPs from the Archival Storage tab, the Storage Service, and of course the packages themselves moving forward, which is probably how you're managing Archivematica as we speak.
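Since TRUNCATE cannot be rolled back in MySQL/MariaDB, one cautious option is to dump the affected tables first. This is not something anyone in the thread describes doing. It is a minimal sketch, assuming the database is named MCP, that mysqldump can authenticate (for example through ~/.my.cnf), and an arbitrary output file name. The function name backup_processing_tables is made up for illustration:

```shell
# Tables listed in the message above (copied from it).
TABLES="Derivations Events_agents Events FilesIdentifiedIDs FilesIDs Files Jobs main_fpcommandoutput SIPs Tasks Transfers UnitVariables"

# Dump those tables to a file before truncating them.
# $1: database name (assumed to be MCP), $2: output file (arbitrary name).
# With DRY_RUN=1 (the default here) the command is only printed, not run.
backup_processing_tables() {
    db="${1:-MCP}"
    out="${2:-mcp-processing-tables.sql}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # $TABLES is intentionally unquoted so it splits into one word per table.
        echo mysqldump --single-transaction --result-file="$out" "$db" $TABLES
    else
        mysqldump --single-transaction --result-file="$out" "$db" $TABLES
    fi
}

# Usage:
#   backup_processing_tables                 # print the command only
#   DRY_RUN=0 backup_processing_tables MCP   # actually write the dump
```

With tables as large as the ones described later in this thread, the dump itself may take a while and need several gigabytes of free disk space.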

Perhaps restart the services once you have cleaned out currentlyProcessing, or performed this operation on the database, with:
for i in dashboard mcp-client mcp-server storage-service; do service archivematica-$i restart; done

And you should be good to go. I feel it will be interesting for this list to hear how it goes for you.

Let us know,
Best,
Ross

frank....@gmail.com

Jan 21, 2021, 7:58:22 AM
to archivematica
Hi Ross,

Thank you for the thorough reply. On your suggestion, I deleted all files and folders under /currentlyProcessing/, including the one that contained the 7z-file. This didn't immediately make a difference, so I logged into the MCP-database. After running all the commands you listed, something changed:

The number "1" disappeared from Transfer, and instead an old (and failed) job showed up again, seemingly waiting for input on "Examine contents". We chose "Skip", and then it immediately failed on "Create transfer metadata.XML". But the job itself didn't fail, and we can't remove it from the Transfer-tab. We have tried "Remove all completed" and "Remove" on the job itself, but neither responds.

I started digging into the filesystem under /var/archivematica/sharedDirectory/ to see if I could find anything out of place, and I think I did. The UUID on the stuck job in Transfer corresponds to a directory under /var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/examineContentsChoice/. This contains the whole original package. I can't find any other references to it in the files.

The next question is therefore: can I delete this folder? Or will that create more trouble in the UI/database?
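The kind of search described above can be sketched with find. This is an illustration rather than the exact commands used here: it assumes the shared directory path given in this message and that the directory name contains the job's UUID. The function name find_by_uuid and the SHARED_DIR variable are made up for the example:

```shell
# Root of the shared directory as written in this thread; override by
# exporting SHARED_DIR if your install differs.
SHARED_DIR="${SHARED_DIR:-/var/archivematica/sharedDirectory}"

# Print every directory under SHARED_DIR whose name contains the given UUID.
# -prune stops find from descending into a match, so only the top-most
# matching directory of each package is reported.
find_by_uuid() {
    uuid="$1"
    find "$SHARED_DIR" -type d -name "*${uuid}*" -prune -print
}

# Usage (replace the argument with the UUID shown for the stuck job):
#   find_by_uuid "<uuid-of-stuck-job>"
```

The function only prints matches, so it is safe to run while deciding what, if anything, to delete.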

Regards,
Frank

Ross Spencer

Jan 21, 2021, 8:38:42 AM
to archivematica
Hi Frank,

Good sleuthing. You should be able to delete that whole folder - restart the services - and then I expect it to show up as failed in the dashboard where you can delete it. You can then re-run that transfer from your transfer source if this still needs to be processed. 

Let me know how it goes. 
Best,
Ross

frank....@gmail.com

Jan 22, 2021, 4:12:31 AM
to archivematica
Hi Ross,

That's it!

Deleted the folder, ran all the mysql-commands again and restarted the services, and Archivematica is back in running order again.

Thanks for your help, Ross. We will use what we learned here as a start for a new routine in case of failed transfers in the future. I'm guessing similar things might happen again...

Regards,
Frank

Ross Spencer

Jan 22, 2021, 6:21:58 AM
to archivematica
Awesome! Thanks Frank! Best news of the day. 

Glad to have helped, and I'm glad this thread has provided the opportunity to put some of this information out there. It should be useful for others.

Have a good weekend.
Regards,
Ross

Sean Kalynuk

Jan 26, 2021, 3:24:53 PM
to archiv...@googlegroups.com

This cleaned up our development environment. Thanks Ross!

For some context, our dev MCP Server was taking 14 minutes to finish initializing at start-up due to the huge amount of data in the Tasks table (4GB of data and over 3 million rows). Now with the table cleaned up, start-up is no longer slow.
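For anyone wanting to check whether their own tables have grown in a similar way, here is a minimal sketch that prints a size query against information_schema. It assumes the database is named MCP and that the mysql client can authenticate (for example through ~/.my.cnf). The function name table_size_query is made up for illustration, and TABLE_ROWS is only an estimate for InnoDB tables:

```shell
# Print a query reporting estimated row counts and on-disk size per table,
# largest first. $1 is the schema name (assumed to be MCP).
table_size_query() {
    schema="${1:-MCP}"
    cat <<EOF
SELECT TABLE_NAME,
       TABLE_ROWS,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS size_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = '${schema}'
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;
EOF
}

# Usage:
#   table_size_query MCP | mysql --table
```

The query is read-only, so it can be run before and after a cleanup to compare.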

--

Sean
