I checked the script and found that /opt/gridengine/default doesn't
exist. It's as if the node installed the SGE RPM but didn't do any of the
configuration it does on the regular compute nodes.
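One way to compare a gpu-compute node with a regular compute node is a quick check for the SGE cell directory. The following is a minimal sketch, assuming SGE_ROOT is /opt/gridengine and the cell is named "default" (both taken from the path above). The function name sge_cell_status is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether the SGE cell directory exists on this node.
# Defaults assume SGE_ROOT=/opt/gridengine and cell "default", as in the
# path mentioned in the message; both can be overridden as arguments.
sge_cell_status() {
  root=${1:-/opt/gridengine}
  cell=${2:-default}
  if [ -d "$root/$cell" ]; then
    echo "configured: $root/$cell"
  else
    echo "missing: $root/$cell"
    return 1
  fi
}
```

Running `sge_cell_status` on both node types should print "configured" on a working compute node and "missing" on the gpu-compute node if the SGE configuration step was skipped.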
In /export/rocks/install/site-profiles/5.3/graphs/default/gpu-appliance.xml
I have:
<?xml version="1.0" standalone="no"?>
<graph>
<description>
</description>
<changelog>
</changelog>
<edge from="gpu-compute">
<to>compute</to>
</edge>
<order gen="kgen" head="TAIL">
<tail>gpu-compute</tail>
</order>
</graph>
And this is my gpu-compute.xml node file:
<?xml version="1.0" standalone="no"?>
<kickstart>
<description>
GPU compute nodes.
</description>
<changelog>
</changelog>
<package>libXi-devel</package>
<package>kernel-devel</package>
<package>kernel-headers</package>
<post>
<file name="/etc/motd" mode="append">
GPU Compute Node
</file>
echo "Downloading CUDA Driver and Toolkit"
wget --quiet http://127.0.0.1/install/devdriver_4.0_linux_64_270.41.19.run
wget --quiet http://127.0.0.1/install/cudatoolkit_4.0.17_linux_64_rhel5.5.run
echo "Installing Driver"
sh /devdriver_4.0_linux_64_270.41.19.run -s -n --no-runlevel-check --kernel-name=2.6.18-194.3.1.el5_lustre.1.8.4
echo "Installing Toolkit"
sh /cudatoolkit_4.0.17_linux_64_rhel5.5.run
echo "CUDA Installed"
<file name="/etc/ld.so.conf" mode="append">
/usr/local/cuda/lib64
/usr/local/cuda/lib
</file>
<file name="/etc/profile.d/cuda.sh" mode="create" perms="0755">
#!/bin/bash
export PATH=$PATH:/usr/local/cuda/bin
</file>
ldconfig
/share/apps/nvidia/files/deviceQuery -h > /tmp/nvidia.status
<file name="/etc/rc.local" mode="append">
</file>
sh /etc/rc.local
</post>
</kickstart>
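The download-and-install steps in the post section can also be written so that each installer is saved to an explicit location and a failed download stops the step. This is a minimal sketch, not the configuration used above. It assumes the installers are meant to land in / (the later `sh /...run` lines imply this, but plain `wget` saves into the current directory, and the thread does not say what that is during the post section). The helper names `run` and `fetch_and_run` and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch: download an installer into / and run it, stopping if the
# download fails. Saving with "-O /NAME" is an assumption based on the
# "sh /NAME" lines in the post section.
# DRY_RUN=1 prints each command instead of executing it.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# fetch_and_run URL [installer arguments...]
fetch_and_run() {
  url=$1
  shift
  file=/$(basename "$url")
  run wget --quiet -O "$file" "$url" || return 1
  run sh "$file" "$@"
}

# Example with the driver URL and flags from the post section:
# fetch_and_run http://127.0.0.1/install/devdriver_4.0_linux_64_270.41.19.run \
#   -s -n --no-runlevel-check --kernel-name=2.6.18-194.3.1.el5_lustre.1.8.4
```

The block uses no `<` or `&` characters, which would otherwise need escaping inside an XML `<post>` section.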
--
David Noriega
System Administrator
Computational Biology Initiative
High Performance Computing Center
University of Texas at San Antonio
One UTSA Circle
San Antonio, TX 78249
Office: BSE 3.112
Phone: 210-458-7100
http://www.cbi.utsa.edu
What is the output of:
# rocks list membership
- gb
--
On the frontend, try:
# rocks set appliance attr gpu-compute exec_host true
# rocks set appliance attr gpu-compute sge true
Then reinstall a gpu-compute node.
- gb
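gb's suggestion can be collected into a small script that runs on the frontend. This is a minimal sketch, assuming Rocks 5.x command syntax. The two `rocks set appliance attr` lines come from the message. `rocks list appliance attr`, `rocks set host boot ... action=install`, and `/boot/kickstart/cluster-kickstart-pxe` are assumptions that should be checked against your Rocks version. The host name gpu-compute-0-0 and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: set the SGE-related attributes on the gpu-compute appliance,
# show the result, then flag one node for reinstall.
# Assumes Rocks 5.x commands; the node name passed in is hypothetical.
# DRY_RUN=1 prints the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# enable_sge_on_appliance APPLIANCE NODE
enable_sge_on_appliance() {
  appliance=$1
  node=$2
  run rocks set appliance attr "$appliance" exec_host true || return 1
  run rocks set appliance attr "$appliance" sge true || return 1
  run rocks list appliance attr "$appliance" || return 1
  run rocks set host boot "$node" action=install || return 1
  run ssh "$node" /boot/kickstart/cluster-kickstart-pxe
}

# Example: DRY_RUN=1 enable_sge_on_appliance gpu-compute gpu-compute-0-0
```

Running it with DRY_RUN=1 first shows exactly what would be executed. The later message in this thread suggests that setting only the `sge` attribute may be enough.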
So just try this on the frontend:
# rocks set appliance attr gpu-compute sge true
Then reinstall the gpu-compute node.
--
Engine