XNAT pre-archive tasks stuck

Mark Janse

Jul 3, 2024, 12:01:38 PM
to xnat_discussion
Hi all,

We are running into some pre-archive issues. If we have a lot of Deletes running or pending and then start some Builds/Rebuilds, the tasks seem to get "stuck" once the (re)builds are completed, and the previously running Deletes no longer make any progress.

In the JavaMelody monitoring, we see the builds running just fine: the JMS threads ("DefaultMessageListenerContainer-xx") are happily running along, doing I/O and so on. After all builds have finished and only the delete tasks are left, nothing happens: all threads go idle. If I then schedule another Rebuild in the pre-archive, it also gets stuck. It almost seems as if the message queue gets "stuck" for some reason, where the listeners think it's empty but XNAT keeps adding to it.

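We ran this xnatpy snippet to monitor the status of the sessions in our pre-archive:
from collections import Counter
# `session` is an already-open xnatpy connection
session.prearchive.caching = False  # disable caching so each poll reflects the current state
print(Counter(x.status for x in session.prearchive.sessions()))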

Results:
Counter({'READY': 8395, 'ERROR': 486, 'QUEUED_DELETING': 97, '_BUILDING': 3})
Counter({'READY': 8396, 'ERROR': 486, 'QUEUED_DELETING': 97, '_BUILDING': 2})
Counter({'READY': 8397, 'ERROR': 486, 'QUEUED_DELETING': 97, '_BUILDING': 1})
Counter({'READY': 8398, 'ERROR': 486, 'QUEUED_DELETING': 97})
Counter({'READY': 8398, 'ERROR': 485, 'QUEUED_DELETING': 97, 'QUEUED_BUILDING': 1})
Counter({'READY': 8398, 'ERROR': 485, 'QUEUED_DELETING': 97, 'QUEUED_BUILDING': 1})
(The bottom line keeps repeating indefinitely.)
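
A rough sketch of how snapshots like the ones above could be checked automatically for this stalled state. The function names and the `min_repeats` threshold are made up for illustration, and treating a leading underscore (as in `_BUILDING`) as "actively being processed" is an assumption based only on the output above:

```python
from collections import Counter

def queued_counts(snapshot):
    """Keep only the statuses that represent work waiting in the queue."""
    return {s: n for s, n in snapshot.items() if s.startswith("QUEUED_")}

def queue_is_stalled(snapshots, min_repeats=2):
    """Return True if the last `min_repeats` snapshots are identical,
    still contain QUEUED_* sessions, and show no in-progress status.

    Assumption: statuses with a leading underscore (e.g. '_BUILDING')
    mean a listener is actively working on a session.
    """
    if len(snapshots) < min_repeats:
        return False
    tail = snapshots[-min_repeats:]
    last = tail[-1]
    if not queued_counts(last):
        return False  # nothing is waiting, so nothing can be stuck
    if any(status.startswith("_") for status in last):
        return False  # something is still being processed
    return all(snap == last for snap in tail)

# Snapshots taken from the output above
snapshots = [
    Counter({"READY": 8398, "ERROR": 486, "QUEUED_DELETING": 97}),
    Counter({"READY": 8398, "ERROR": 485, "QUEUED_DELETING": 97, "QUEUED_BUILDING": 1}),
    Counter({"READY": 8398, "ERROR": 485, "QUEUED_DELETING": 97, "QUEUED_BUILDING": 1}),
]
print(queue_is_stalled(snapshots))
```

In practice the list would be filled by polling at a fixed interval, and `min_repeats` would need to be large enough that a slow but healthy queue is not flagged.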

Restarting XNAT temporarily resolves the issue, as the pre-archive gets rebuilt, but we're looking for a more robust solution. Any ideas where to look? Is there an API we could use to cancel the queued tasks and then requeue them, to avoid having to restart XNAT?
We're still running XNAT 1.8.6.1. Upgrades are planned but we're first migrating off CentOS and Postgres 9.

Any input would be greatly appreciated.

Kind regards,
Mark Janse
Health-RI

Charlie Moore

Jul 3, 2024, 12:11:21 PM
to xnat_discussion
Hi Mark,

You can trigger a rebuild of the entire prearchive with a PUT to /data/prearchive as an admin. If you try that (on a dev server first, hopefully), does it resolve your issue? This is the function that ends up getting called (with force = true): https://bitbucket.org/xnatdev/xnat-web/src/09c36e7b39820501e26164dcae19304c6a7ec8bf/src/main/java/org/nrg/xnat/helpers/prearchive/PrearcDatabase.java?at=master#lines-487:508
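
For reference, a minimal sketch of that PUT using only the Python standard library. The host name and credentials are placeholders, the helper name is made up, and HTTP basic auth is an assumption about how the server is configured; adjust it to whatever authentication your site uses:

```python
import base64
import urllib.request

def build_prearchive_rebuild_request(base_url, user, password):
    """Build (but do not send) a PUT request to /data/prearchive.

    Hypothetical helper: the base URL and credentials are supplied by the
    caller, and basic auth is assumed to be accepted by the server.
    """
    url = base_url.rstrip("/") + "/data/prearchive"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method="PUT",
        headers={"Authorization": f"Basic {token}"},
    )

# Placeholder values; must be an admin account on the target server.
req = build_prearchive_rebuild_request("https://xnat.example.org/", "admin", "secret")
print(req.get_method(), req.full_url)

# To actually trigger the rebuild (try a dev server first):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

The request is built separately from sending it so the URL and method can be checked before anything touches the server.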

Thanks,
Charlie Moore

Mark Janse

Jul 3, 2024, 12:35:35 PM
to xnat_di...@googlegroups.com
Hi Charlie,

Thanks for the quick response. Unfortunately, this does not seem to resolve the issue. While the pending tasks all get removed and the pre-archive gets rebuilt nicely, if I then try to delete or build a session, it still gets stuck as pending or "QUEUED_".
I also don't see anything notable in the log files (besides the pre-archive rebuild messages), but if you know a good place to look, please let me know.

Kind regards,
Mark Janse
Health-RI
