Hi all,
We are running into some pre-archive issues. If a lot of Deletes are running or pending and we then start some Builds/Rebuilds, the tasks seem to get "stuck" once the (re)builds have completed: the Deletes that were already running or queued no longer make any progress.
In the JavaMelody monitoring, we see the builds running just fine: the JMS threads ("DefaultMessageListenerContainer-xx") are happily running along, doing I/O and so on. After all builds are finished and only the Delete tasks are left, nothing happens and all threads go idle. If I then schedule another Rebuild in the pre-archive, it also gets stuck. It almost seems like the message queue gets 'stuck' for some reason, where the listeners think it's empty but XNAT keeps adding to it?
We ran this xnatpy snippet to monitor the status of the sessions in our pre-archive:
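from collections import Counter
# `session` is an open xnatpy connection; disable caching so every call fetches fresh status
session.prearchive.caching = False
print(Counter([x.status for x in session.prearchive.sessions()]))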
Results:
Counter({'READY': 8395, 'ERROR': 486, 'QUEUED_DELETING': 97, '_BUILDING': 3})
Counter({'READY': 8396, 'ERROR': 486, 'QUEUED_DELETING': 97, '_BUILDING': 2})
Counter({'READY': 8397, 'ERROR': 486, 'QUEUED_DELETING': 97, '_BUILDING': 1})
Counter({'READY': 8398, 'ERROR': 486, 'QUEUED_DELETING': 97})
Counter({'READY': 8398, 'ERROR': 485, 'QUEUED_DELETING': 97, 'QUEUED_BUILDING': 1})
Counter({'READY': 8398, 'ERROR': 485, 'QUEUED_DELETING': 97, 'QUEUED_BUILDING': 1})
(The bottom line keeps repeating in perpetuity.)
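For anyone who wants to reproduce the monitoring, the check above can be wrapped in a polling loop that flags the stall automatically. This is only a rough sketch, not exactly what we ran: the helper names (status_counts, is_stalled, monitor), the 60-second interval and the "three identical snapshots" threshold are illustrative assumptions, and `session` is assumed to be an open xnatpy connection.

```python
import time
from collections import Counter


def status_counts(prearchive_sessions):
    """Tally prearchive sessions by their status string."""
    return Counter(s.status for s in prearchive_sessions)


def is_stalled(history, repeats=3):
    """True if the last `repeats` snapshots are identical and work is still queued."""
    if len(history) < repeats:
        return False
    recent = history[-repeats:]
    unchanged = all(snapshot == recent[0] for snapshot in recent)
    # Iterating a Counter yields its keys, i.e. the status names.
    queued = any(status.startswith("QUEUED") for status in recent[0])
    return unchanged and queued


def monitor(session, interval=60, repeats=3, max_polls=None):
    """Poll the prearchive; return the stuck counts, or None if max_polls is reached.

    `session` is assumed to be an xnatpy connection (hypothetical usage,
    this sketch does not open the connection itself).
    """
    session.prearchive.caching = False  # fetch fresh status on every poll
    history = []
    polls = 0
    while max_polls is None or polls < max_polls:
        counts = status_counts(session.prearchive.sessions())
        print(counts)
        history.append(counts)
        if is_stalled(history, repeats):
            return counts
        polls += 1
        time.sleep(interval)
    return None
```

With the output above, such a loop would return once the snapshot containing 'QUEUED_DELETING': 97 and 'QUEUED_BUILDING': 1 has been seen `repeats` times in a row.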
Restarting XNAT temporarily resolves the issue, as the pre-archive gets rebuilt, but we're looking for a more robust solution. Any ideas where to look? Is there an API we could use to cancel the queued tasks and then requeue them, so we can avoid having to restart XNAT?
We're still running XNAT 1.8.6.1. Upgrades are planned but we're first migrating off CentOS and Postgres 9.
Any input would be greatly appreciated.
Kind regards,
Mark Janse
Health-RI