Upload and Archive Issues

Duncan Smith

May 29, 2024, 12:32:46 PM
to xnat_discussion
Hi all,

We've run into a couple of issues while trying to automate DICOM uploads into our XNAT instance (1.8.10.1) using xnatpy.

Issue 1

prearchive_session = session.services.import_(
    zipped_dicoms, project=project, subject=subject, experiment=f"{experiment}-DICOM",
    destination="/prearchive")

We were able to run our script for 12+ hours, uploading multiple MRI sessions in the process, but eventually this line failed and gave us:

Exception in thread XNATpyKeepAliveThread:
Traceback (most recent call last):
File "/home/flitney_local/.conda/envs/xnat-relay/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
self.run()
File "/home/flitney_local/.conda/envs/xnat-relay/lib/python3.12/threading.py", line 1010, in run
self._target(*self._args, **self._kwargs)
File "/home/flitney_local/.conda/envs/xnat-relay/lib/python3.12/site-packages/xnat/session.py", line 249, in _keepalive_thread_run
self.heartbeat()
File "/home/flitney_local/.conda/envs/xnat-relay/lib/python3.12/site-packages/xnat/session.py", line 240, in heartbeat
self.get('/data/JSESSION', timeout=10)
File "/home/flitney_local/.conda/envs/xnat-relay/lib/python3.12/site-packages/xnat/session.py", line 376, in get
self._check_response(response, accepted_status=accepted_status, uri=uri) # Allow OK, as we want to get data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/flitney_local/.conda/envs/xnat-relay/lib/python3.12/site-packages/xnat/session.py", line 326, in _check_response
raise exceptions.XNATResponseError(
xnat.exceptions.XNATResponseError: Invalid status for response from XNATSession for url https://test-xnat.win.ox.ac.uk/data/JSESSION (status 502, accepted status: [200])

We haven't been able to reproduce it (the session in question uploaded without a problem when we re-tried it), and despite the error the original upload still succeeded.

Issue 2

prearchive_session.archive(overwrite="delete", trigger_pipelines=False)

We have also been running into an issue with the archive command: after almost exactly 10 minutes we get an error response that looks like some form of timeout, although the archive process does eventually complete.

2024-05-29 16:13:36,552 - siemens_xnat_upload - ERROR - Invalid status for response from XNATSession for url https://test-xnat.win.ox.ac.uk/data/services/archive?src=/data/prearchive/projects/2021_015/20240529_154643551/F3T_2021_015_043-DICOM&auto-archive=false&overwrite=delete&triggerPipelines=false (status 500, accepted status: [200, 201])
Traceback (most recent call last):
File "/home/flitney_local/git/twix-xnat-relay/tasks.py", line 158, in process_dicom_folder
dicom_upload(session, project, subject, experiment, zipped_dicoms)
File "/home/flitney_local/git/twix-xnat-relay/tasks.py", line 140, in dicom_upload
prearchive_session.archive(overwrite="delete", trigger_pipelines=False)
File "/home/flitney_local/.conda/envs/xnat-relay/lib/python3.12/site-packages/xnat/prearchive.py", line 241, in archive
response = self.xnat_session.post('/data/services/archive', query=query)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/flitney_local/.conda/envs/xnat-relay/lib/python3.12/site-packages/xnat/session.py", line 448, in post
self._check_response(response, accepted_status=accepted_status, uri=uri)
File "/home/flitney_local/.conda/envs/xnat-relay/lib/python3.12/site-packages/xnat/session.py", line 326, in _check_response
raise exceptions.XNATResponseError(
xnat.exceptions.XNATResponseError: Invalid status for response from XNATSession for url https://test-xnat.win.ox.ac.uk/data/services/archive?src=/data/prearchive/projects/2021_015/20240529_154643551/F3T_2021_015_043-DICOM&auto-archive=false&overwrite=delete&triggerPipelines=false (status 500, accepted status: [200, 201])

Other than increasing our nginx timeout values (which have previously fixed timeout issues for us and are already set much higher than 10 minutes), I'm not sure where else this limit might be coming from.

I'm not sure whether either of these is related to XNAT or xnatpy (or both), but any ideas or suggestions would be appreciated.

Thanks,
Duncan

John Flavin

Jun 2, 2024, 5:56:48 PM
to xnat_di...@googlegroups.com
Hi Duncan,

My general question on both issues is: did you look in the XNAT logs? Were any errors logged around the same time as the errors you received on your end? If you can't find anything in XNAT's logs, that could (possibly) point to nginx shutting down the connection.

As to the specific issues:

I'm not sure about issue 2; I'm not really familiar with the internal details of that archive API, so I wouldn't know what to look for.

Issue 1 is confusing, but I can see some interesting details. If you look through the traceback, you can see that it's coming from a heartbeat() method within xnatpy, which just ensures your session is still authenticated with XNAT by hitting the /data/JSESSION API to get a JSESSION token. I would expect that to be a very quick API call with minimal load on XNAT. So either that expectation is wrong and the heartbeat is causing more load than we intended, or something else was causing a spike of load and this call got slowed down as a side effect, or all of these guesses are wrong and something else entirely was happening.
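
For reference, stripped of the xnatpy internals, that heartbeat amounts to roughly the following; this is only an illustration, and the server URL and credentials are placeholders, not anything from your setup:

import requests

XNAT_URL = "https://xnat.example.org"  # placeholder server

with requests.Session() as s:
    s.auth = ("username", "password")  # placeholder credentials
    # The heartbeat is essentially a quick GET of /data/JSESSION with a short
    # timeout; a 502 here typically comes from the proxy rather than XNAT itself.
    response = s.get(f"{XNAT_URL}/data/JSESSION", timeout=10)
    response.raise_for_status()
    print("JSESSION token:", response.text)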

Logs would be helpful. I wouldn't recommend posting them unless you've thoroughly inspected them for any PHI or sensitive internal details, but you should try to look in them yourself and see if you find anything interesting.

John Flavin

Charlie Moore

Jun 3, 2024, 12:07:14 PM
to xnat_discussion
Hi Duncan,

For what it's worth, I've run into similar issues when scripting very long-running sequential uploads. The solution I ended up with was treating status codes 502, 503, and 504 as "server issue" codes: when a REST call ran into one of them, it would pause for a bit and then reissue the same call. My theory was that the load from the heavy ingest was temporarily overloading Tomcat and/or the proxy and causing some intermittent noise.
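
Roughly, the pattern looked something like the sketch below; this uses plain requests rather than our actual wrapper, and the URL, credentials, retry count, and pause length are placeholders:

import time
import requests

SERVER_ISSUE_CODES = {502, 503, 504}

def request_with_retry(session, method, url, retries=5, pause=30, **kwargs):
    # Issue the call; if the server looks temporarily overloaded, pause and
    # reissue the same call instead of failing the whole upload script.
    for attempt in range(retries):
        response = session.request(method, url, **kwargs)
        if response.status_code not in SERVER_ISSUE_CODES:
            return response
        time.sleep(pause)
    return response  # still failing after the last attempt; let the caller decide

with requests.Session() as s:
    s.auth = ("username", "password")  # placeholder credentials
    r = request_with_retry(s, "GET", "https://xnat.example.org/data/JSESSION", timeout=10)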

Thanks,
Charlie Moore

Duncan Smith

Jun 5, 2024, 8:22:00 AM
to xnat_discussion
Hi both,

Thanks for the replies; it's been interesting to hear your thoughts on these issues. I forgot to mention this in my original post, but unfortunately we have not been able to find any related logs from XNAT, Tomcat, or NGINX.

Issue 2, where we continue to hit a 10-minute limit on archive commands, is our main problem at the moment. Whenever a session takes longer than 10 minutes to archive we get the XNATResponseError back, but the archiving always completes successfully regardless; we don't see the error when archiving takes less than 10 minutes. Ideally we wouldn't just assume that an XNATResponseError means a successful archive, as this is going to be an automated system.
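
The best fallback we've come up with so far is something like the sketch below (untested; the polling interval, attempt count, and label matching are assumptions on our part), where we catch the error and then poll the project until the experiment actually appears in the archive:

import time
from xnat.exceptions import XNATResponseError

def archive_and_verify(session, prearchive_session, project, experiment_label,
                       wait=60, attempts=30):
    try:
        prearchive_session.archive(overwrite="delete", trigger_pipelines=False)
        return True
    except XNATResponseError:
        # The archive job seems to keep running server-side, so poll the
        # project and treat the upload as successful once the experiment
        # shows up in the archive.
        for _ in range(attempts):
            experiments = session.projects[project].experiments
            if any(exp.label == experiment_label for exp in experiments.values()):
                return True
            time.sleep(wait)
        return False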

We have tried a slightly different call and specified explicit timeout values, but it has not made a difference (although setting the timeouts to less than 10 minutes does make a difference and produces the error sooner, depending on the value provided):

query = {"src": prearchive_session.uri, "overwrite": "append", "auto-archive": "false"}
request = session.post('/data/services/archive', query=query, timeout=(1800, 1800))

We have settings such as these in our NGINX config:

proxy_connect_timeout 1800;
proxy_send_timeout 1800;
proxy_read_timeout 1800;

Is there any other setting that might affect this 10-minute archive timeout/error?

Thanks,
Duncan