How to deploy Caffe on a GPGPU cluster for loading via the environment modules system


Dennis Mungai

Dec 10, 2016, 11:45:48 AM
to Caffe Users
Hello guys,

Some of us who deploy Caffe for use on multi-user GPU clusters may need to install it in a custom location, then load it via the environment-modules system.

The gist linked below is a tutorial on how I accomplished this, with the following notes:

1. I built NVIDIA's branch of Caffe.
2. This version is also built with NVIDIA's NCCL library (pronounced "Nickel"), which speeds up communication between GPUs, whether across nodes on a cluster or across multiple cards on the same server board.
3. A custom installation location is used, as is often the case with deploying pipelines to multi-user clusters.
4. The application is loaded at runtime into the user's environment via the environment modules system. See this for more information on it.
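
To make notes 3 and 4 more concrete, here is a minimal sketch (not taken from the gist) of how a modulefile for a custom Caffe install could be generated. The install prefix, the version string, and the bin/lib/python layout under the prefix are all assumptions for illustration; adjust them to match what your build actually installs.

```python
from pathlib import Path


def render_modulefile(prefix: str, version: str) -> str:
    """Return the text of a Tcl modulefile for a Caffe install under `prefix`.

    Assumes (hypothetically) that the install placed executables in
    <prefix>/bin, shared libraries in <prefix>/lib, and the Python
    bindings in <prefix>/python.
    """
    lines = [
        "#%Module1.0",
        f'module-whatis "Caffe {version} (example module)"',
        # Refuse to load if another caffe module is already loaded.
        "conflict caffe",
        f"set root {prefix}",
        "prepend-path PATH $root/bin",
        "prepend-path LD_LIBRARY_PATH $root/lib",
        "prepend-path PYTHONPATH $root/python",
    ]
    return "\n".join(lines) + "\n"


def write_modulefile(modules_dir: str, prefix: str, version: str) -> Path:
    """Write the modulefile to <modules_dir>/caffe/<version> and return its path."""
    target = Path(modules_dir) / "caffe" / version
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_modulefile(prefix, version))
    return target
```

If the chosen modules directory is on the users' MODULEPATH (or added with `module use <dir>`), they should then be able to run `module load caffe/<version>` to pick up the installation.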

Gist URL (on GitHub): https://gist.github.com/Brainiarc7/410626c4cdd6a68770c8b7134c4bdeaf

Tested on Ubuntu Linux 16.04 LTS with CUDA 8.0 and the latest NVIDIA driver installed.

Regards,

Dennis Mungai.