Installation issues and workarounds, Ubuntu 16.10, CUDA 8.0

Rob Miller

Apr 18, 2017, 6:16:36 AM
to torch7
I upgraded my dev box from Ubuntu 15.10 to 16.10 and have had a variety of issues getting back to a functional installation.  I list my solutions here in the hopes that they may help others, as my searches did not reveal them.

(1) 
[ 60%] Generating random.c
/home/rob/torch/install/bin/luajit: /home/rob/torch/pkg/torch/random.lua:3: module 'torchcwrap' not found:

In this case, torchcwrap.lua is in the directory torch/pkg/torch, and the subsequent messages list all the places luajit looked for it, but these do not include the working directory.  I have a LUA_PATH variable defined in my login scripts referencing my own lua library, and this may have been the cause of the problem.  I added ';./?.lua;' to my LUA_PATH variable and could then complete this part of the build.
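For anyone who wants to make the same change in a login script, here is a minimal sketch.  The function name add_lua_cwd is made up for illustration.  It assumes LUA_PATH is already set, as it was in my case; if it is unset the sketch leaves it alone so the interpreter's built-in search path still applies.

```shell
# add_lua_cwd (hypothetical helper): append the working-directory pattern
# './?.lua' to an existing LUA_PATH, unless it is already listed.
add_lua_cwd() {
  # If LUA_PATH is unset or empty, do nothing: the interpreter then uses
  # its compiled-in default search path.
  [ -n "${LUA_PATH:-}" ] || return 0
  case ";${LUA_PATH};" in
    *";./?.lua;"*) ;;                          # already present, nothing to do
    *) LUA_PATH="${LUA_PATH};./?.lua" ;;       # append the working directory
  esac
  export LUA_PATH
}

add_lua_cwd
echo "LUA_PATH=${LUA_PATH:-<unset>}"
```

Calling it twice is harmless, since the second call finds the entry and leaves the variable unchanged.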

(2) 
optim 1.0.5-0 is now built and installed in /home/rob/torch/install/ (license: BSD)
...
cmake -E make_directory build && cd build && cmake .. -DLUALIB= -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS} -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/rob/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/rob/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j$jopts install
...
CMake Error at /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
  Could NOT find CUDA (missing: CUDA_CUDART_LIBRARY) (found suitable version
  "8.0", minimum required is "6.5")
Call Stack (most recent call first):
  /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:388 (_FPHSA_FAILURE_MESSAGE)
  /home/rob/torch/install/share/cmake/torch/FindCUDA.cmake:1009 (find_package_handle_standard_args)
  CMakeLists.txt:7 (FIND_PACKAGE)

[allow me to digress and note that 'Could NOT find CUDA (missing: CUDA_CUDART_LIBRARY) (found suitable version  "8.0", minimum required is "6.5")' is a remarkably unhelpful error message...]

At this point I had removed all Nvidia code, drivers, etc. provided by the Ubuntu apt install servers and used the network install .deb from https://developer.nvidia.com/cuda-downloads .  I could successfully run nvidia-smi and the Nvidia samples deviceQuery and bandwidthTest (see the post-installation instructions at developer.nvidia.com).  Even after pulling a fresh torch from github, the build still stopped with this error.

From inspecting torch/install/share/cmake/torch/FindCUDA.cmake:1009 (find_package_handle_standard_args), I worked out that adding -DCUDA_TOOLKIT_ROOT_DIR="/usr/local/cuda" to the specific failing cmake step got past the error, i.e.:

cmake .. -DLUALIB= -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS} -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/rob/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/rob/torch/install/lib/luarocks/rocks/cutorch/scm-1" -DCUDA_TOOLKIT_ROOT_DIR="/usr/local/cuda"

(Your working location will be different; please don't just cut and paste.)

After this I could run install.sh again and get cutorch built.
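Before passing the flag, it may be worth confirming that the CUDA runtime library is actually where you are about to point cmake.  The sketch below is only a sanity check, not part of the torch build.  The function name find_cudart and the CUDA_ROOT variable are made up for illustration, and it assumes the runtime library sits in lib64 or lib under the toolkit root.

```shell
# Toolkit root to check; /usr/local/cuda is an assumption, adjust as needed.
CUDA_ROOT="${CUDA_ROOT:-/usr/local/cuda}"

# find_cudart (hypothetical helper): print the first libcudart file found
# under the given toolkit root, or return non-zero if there is none.
find_cudart() {
  for dir in "$1/lib64" "$1/lib"; do
    for file in "$dir"/libcudart*; do
      # An unmatched glob stays literal, so test that the file exists.
      if [ -e "$file" ]; then
        echo "$file"
        return 0
      fi
    done
  done
  return 1
}

if lib=$(find_cudart "$CUDA_ROOT"); then
  echo "found $lib"
  echo "try adding: -DCUDA_TOOLKIT_ROOT_DIR=\"$CUDA_ROOT\""
else
  echo "no libcudart under $CUDA_ROOT; check where the toolkit was installed"
fi
```

If the check finds nothing, the flag is unlikely to help until the toolkit location is sorted out.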

I had to do the same for cudnn as well.  It may be that https://github.com/hughperkins/FindCUDA would have helped me here, but I did not find it until now.

With cudnn I still had the issue that (apparently) the Makefile was created in extra/cudnn while the call to make is in ./build, so I got:

-- Generating done
-- Build files have been written to: /home/rob/torch/extra/cudnn
make: *** No targets specified and no makefile found. Stop.

Error: Build error: Failed building.


Do you want to automatically prepend the Torch install location
to PATH and LD_LIBRARY_PATH in your /home/rob/.bashrc? (yes/no)
[yes] >>> 

I could do the 'make' in extra/cudnn but 'make install' hit some write permission errors so I gave up and used sudo (which I have not needed before).
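If you would rather find out why 'make install' cannot write before reaching for sudo, one possible check is to list anything under the install tree that your user does not own.  This is a sketch only: the function name not_owned_by_me is made up, the path is from my machine, and ownership is just one possible cause of a permission error.

```shell
# Install tree to inspect; this default is the path from my setup.
TORCH_PREFIX="${TORCH_PREFIX:-$HOME/torch/install}"

# not_owned_by_me (hypothetical helper): list files and directories under
# the given path that belong to a different user than the current one.
not_owned_by_me() {
  find "$1" ! -user "$(id -u)" -print
}

if [ -d "$TORCH_PREFIX" ]; then
  not_owned_by_me "$TORCH_PREFIX"
else
  echo "no such directory: $TORCH_PREFIX"
fi
```

An empty listing means ownership is not the problem and the cause lies elsewhere.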

I seem to have torch7 installed now, and can run 'th -lcutorch -e "cutorch.test()"' and 'th -lcunn -e "nn.testcuda()"' with only the expected memory/resource failures (I have a 1 gig GeForce GTX 460 GPU).

I hope that helps someone.  I submitted the LUA_PATH issue to github but not the others (FindCUDA and cudnn makefile / make install issues).

rob.

