Container launch from XNAT vs manual, GPU not found

Jens Petersen

Apr 27, 2020, 7:36:14 AM
to xnat_discussion
Hey everyone,

I'm running an instance of XNAT and the container service plugin on a server, where we do some processing, including applying some DL models on GPU. This has been working well up until a few weeks ago, but now the GPU is no longer recognized within the launched containers. I couldn't see any updates that looked relevant (I'm not maintaining the server myself) and I currently have no idea what the problem is. The reason I'm posting here is that I can launch those containers manually (as user xnat) and everything works fine, so I figured there must be a difference in how the container service launches containers vs. manual launching. The manually launched containers and the ones from XNAT look virtually identical when I use docker inspect, and both use the nvidia runtime (v2, set as default).
Do you have any idea what the issue could be here? Any help is much appreciated!
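
For anyone wanting to make the same comparison field by field rather than by eye, here is a minimal sketch that diffs two `docker inspect` outputs. The helper names (`flatten`, `diff_inspect`, `inspect_container`) and the container names in the usage comment are made up for illustration; the only thing assumed about Docker is that `docker inspect <name>` prints a JSON array with one object per container.

```python
import json
import subprocess

MISSING = "<missing>"  # marker for a field present on one side only


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a {dotted.path: leaf_value} mapping."""
    items = {}
    if isinstance(obj, dict) and obj:
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}.{key}" if prefix else str(key)))
    elif isinstance(obj, list) and obj:
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}[{index}]"))
    else:
        # Leaf value, or an empty dict/list, which is kept as a value.
        items[prefix] = obj
    return items


def diff_inspect(a, b):
    """Return {path: (value_in_a, value_in_b)} for every field that differs."""
    flat_a, flat_b = flatten(a), flatten(b)
    return {
        path: (flat_a.get(path, MISSING), flat_b.get(path, MISSING))
        for path in sorted(set(flat_a) | set(flat_b))
        if flat_a.get(path, MISSING) != flat_b.get(path, MISSING)
    }


def inspect_container(name):
    """Run `docker inspect` for one container and return its JSON object."""
    out = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)[0]


# Example usage (container names are placeholders):
# for path, (manual, xnat) in diff_inspect(
#         inspect_container("manual-run"), inspect_container("xnat-run")).items():
#     print(f"{path}: {manual!r} != {xnat!r}")
```

Lists are compared by position, so a reordered environment list shows up as several differences. IDs, timestamps, and names will always differ; the entries under `HostConfig` and `Config.Env` are probably the ones worth reading closely.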

Best,
Jens

Kelsey

Apr 27, 2020, 6:09:25 PM
to xnat_discussion
Hey Jens,
A couple of things come to mind regarding containers that stop working.  

Do any of your container definitions use the "latest" tag, and were any of them recently rebuilt? For example:

>FROM postgres:latest

This can result in your images changing (on a rebuild) without you explicitly changing them.
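
One quick way to check for this is to scan the Dockerfiles for base images that are not pinned. The sketch below is only a rough illustration: `floating_base_images` is a hypothetical helper, and it does not expand `ARG` variables used in `FROM` lines.

```python
import re

# Matches "FROM [--platform=...] image[:tag] [AS stage]".
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?",
    re.IGNORECASE,
)


def floating_base_images(dockerfile_text):
    """Return base images that use the 'latest' tag or no tag at all."""
    floating = []
    stages = set()  # names of earlier build stages (FROM ... AS name)
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image, alias = match.group(1), match.group(2)
        if alias:
            stages.add(alias.lower())
        # Skip the empty base image, earlier stages, and digest-pinned images.
        if image.lower() == "scratch" or image.lower() in stages or "@" in image:
            continue
        # Only look for a tag after the last "/", so a registry port
        # such as "localhost:5000/img" is not mistaken for a tag.
        last_part = image.rsplit("/", 1)[-1]
        tag = last_part.split(":", 1)[1] if ":" in last_part else None
        if tag is None or tag == "latest":
            floating.append(image)
    return floating


# Example usage:
# with open("Dockerfile") as fh:
#     print(floating_base_images(fh.read()))
```

An image with no tag is treated the same as `latest`, because Docker falls back to `latest` when the tag is omitted.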

The other thing that tends to cause problems with GPU containers is when the Docker server or GPU drivers are updated.  Can you determine if either of these has happened recently?
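
If you have shell access, something like the following sketch records the versions currently in place so they can be compared against what was installed when things last worked. `run_or_none` and `gpu_stack_versions` are hypothetical helper names; the two commands are the standard `docker version` and `nvidia-smi` query forms, and the script simply reports `None` when a tool is missing or fails.

```python
import shutil
import subprocess


def run_or_none(cmd):
    """Run a command and return its trimmed stdout, or None on any failure."""
    if shutil.which(cmd[0]) is None:
        return None  # tool not installed or not on PATH
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


def gpu_stack_versions():
    """Collect the Docker server version and the NVIDIA driver version."""
    return {
        "docker_server": run_or_none(
            ["docker", "version", "--format", "{{.Server.Version}}"]
        ),
        # With several GPUs, nvidia-smi prints one line per GPU.
        "nvidia_driver": run_or_none(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
        ),
    }


# Example usage:
# print(gpu_stack_versions())
```

The versions alone will not show when an update happened; for that, the system package manager's history on the server is the place to look.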


Otherwise, take a look at the containers.log file after attempting a launch.  There might be some clues in there.  Feel free to share it here or privately (as long as it is free of PHI).

Best,
Matt