[Rocks-Discuss] Created gpu appliance does not join sge queue

David Noriega

Feb 24, 2012, 2:54:07 PM
to Discussion of Rocks Clusters
I've created a new GPU compute appliance type for new nodes that have
GPUs in them. Everything works great, but they don't add themselves to
the queue. I checked them and /etc/init.d/sgeexecd.xxxx isn't there. I
copied over the script from another node, but it doesn't run. It says the
following:
can't determine path to Grid Engine binaries

I checked the script and found that /opt/gridengine/default doesn't
exist. It's as if the node installed the SGE RPM but didn't do any of the
configuration that happens on the regular compute nodes.
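
A minimal sketch of the check described above, for anyone who wants to run it on a node. It only looks for the two things mentioned in this post: an init script matching /etc/init.d/sgeexecd* and the /opt/gridengine/default directory. The function name `check_sge_node` and the optional root-prefix argument are assumptions made for this sketch, not part of Rocks or SGE.

```shell
#!/bin/bash
# Sketch: report whether the SGE pieces mentioned above are present on a node.
# check_sge_node and its optional root prefix are hypothetical, for illustration only.
check_sge_node() {
    local root="${1:-}"   # empty prefix = inspect the live filesystem
    local status=0

    # The exec daemon init script; the suffix after "sgeexecd" varies, hence the glob.
    if ls "$root"/etc/init.d/sgeexecd* >/dev/null 2>&1; then
        echo "init script: present"
    else
        echo "init script: missing"
        status=1
    fi

    # The directory the init script looks for.
    if [ -d "$root/opt/gridengine/default" ]; then
        echo "cell directory: present"
    else
        echo "cell directory: missing"
        status=1
    fi

    return "$status"
}

check_sge_node "${ROOT:-}" || echo "SGE does not look configured on this node"
```

If both lines report "missing" on a node built from the new appliance but "present" on a regular compute node, the SGE configuration step was probably skipped during the install rather than broken afterwards.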

In /export/rocks/install/site-profiles/5.3/graphs/default/gpu-appliance.xml
I have:
<?xml version="1.0" standalone="no"?>

<graph>

<description>
</description>

<changelog>
</changelog>

<edge from="gpu-compute">
<to>compute</to>
</edge>

<order gen="kgen" head="TAIL">
<tail>gpu-compute</tail>
</order>

</graph>

And this is my gpu-compute.xml node file:
<?xml version="1.0" standalone="no"?>

<kickstart>

<description>

GPU compute nodes.

</description>


<changelog>
</changelog>

<package>libXi-devel</package>
<package>kernel-devel</package>
<package>kernel-headers</package>
<post>

<file name="/etc/motd" mode="append">
GPU Compute Node
</file>
echo "Downloading CUDA Driver and Toolkit"
wget --quiet http://127.0.0.1/install/devdriver_4.0_linux_64_270.41.19.run
wget --quiet http://127.0.0.1/install/cudatoolkit_4.0.17_linux_64_rhel5.5.run

echo "Installing Driver"
sh /devdriver_4.0_linux_64_270.41.19.run -s -n --no-runlevel-check --kernel-name=2.6.18-194.3.1.el5_lustre.1.8.4
echo "Installing Toolkit"
sh /cudatoolkit_4.0.17_linux_64_rhel5.5.run
echo "CUDA Installed"

<file name="/etc/ld.so.conf" mode="append">
/usr/local/cuda/lib64
/usr/local/cuda/lib
</file>

<file name="/etc/profile.d/cuda.sh" mode="create" perms="0755">
#!/bin/bash
export PATH=$PATH:/usr/local/cuda/bin
</file>

ldconfig
/share/apps/nvidia/files/deviceQuery -h > /tmp/nvidia.status

<file name="/etc/rc.local" mode="append">

</file>

sh /etc/rc.local

</post>

</kickstart>

--
David Noriega
System Administrator
Computational Biology Initiative
High Performance Computing Center
University of Texas at San Antonio
One UTSA Circle
San Antonio, TX 78249
Office: BSE 3.112
Phone: 210-458-7100
http://www.cbi.utsa.edu

Greg Bruno

Feb 24, 2012, 5:29:37 PM
to Discussion of Rocks Clusters
On Fri, Feb 24, 2012 at 11:54 AM, David Noriega <tsk...@my.utsa.edu> wrote:
> I've created a new GPU compute appliance type for new nodes that have
> GPUs in them. Everything works great, but they don't add themselves to
> the queue. I checked them and /etc/init.d/sgeexecd.xxxx isn't there. I
> copied over the script from another node, but it doesn't run. It says the
> following:
> can't determine path to Grid Engine binaries
>
> I checked the script and found that /opt/gridengine/default doesn't
> exist. It's as if the node installed the SGE RPM but didn't do any of the
> configuration that happens on the regular compute nodes.

What is the output of:

# rocks list membership

- gb

David Noriega

Feb 24, 2012, 5:48:27 PM
to Discussion of Rocks Clusters
# rocks list membership
MEMBERSHIP               APPLIANCE
Frontend:                frontend
Compute:                 compute
NAS Appliance:           nas
Ethernet Switch:         network
Power Distribution Unit: power
IPMI:                    ipmi
GPU Compute:             gpu-compute

Greg Bruno

Feb 24, 2012, 6:29:20 PM
to Discussion of Rocks Clusters
On Fri, Feb 24, 2012 at 2:48 PM, David Noriega <tsk...@my.utsa.edu> wrote:
> # rocks list membership
> MEMBERSHIP               APPLIANCE
> Frontend:                frontend
> Compute:                 compute
> NAS Appliance:           nas
> Ethernet Switch:         network
> Power Distribution Unit: power
> IPMI:                    ipmi
> GPU Compute:             gpu-compute

On the frontend, try:

# rocks set appliance attr gpu-compute exec_host true
# rocks set appliance attr gpu-compute sge true

Then reinstall a gpu-compute node.

- gb
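
The two attribute settings above can be wrapped in a small script, sketched here under some assumptions. The `run` helper, the `enable_sge` function, and the `DRY_RUN` switch are hypothetical additions for this sketch; by default it only prints the commands it would run. The final `rocks list appliance attr` call is there to confirm the attributes before reinstalling a node.

```shell
#!/bin/bash
# Sketch: set the SGE-related attributes on a custom appliance, then list them.
# run, enable_sge and DRY_RUN are hypothetical helpers for this sketch.
APPLIANCE="${APPLIANCE:-gpu-compute}"   # appliance name from this thread
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands only, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enable_sge() {
    # The two settings suggested in this thread.
    run rocks set appliance attr "$1" exec_host true
    run rocks set appliance attr "$1" sge true
    # Show the resulting attributes so they can be checked before a reinstall.
    run rocks list appliance attr "$1"
}

enable_sge "$APPLIANCE"
```

Run it with `DRY_RUN=0` on the frontend to apply the settings for real. The attributes only take effect when a node is installed, so an existing gpu-compute node still needs to be reinstalled afterwards.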

Aijun Wang

Feb 25, 2012, 9:06:47 AM
to Discussion of Rocks Clusters
Judging by your question, you are running Rocks 5.3, and I think the output
of 'rocks list appliance attr gpu-compute' may be:

APPLIANCE    ATTR    VALUE
gpu-compute: managed true

So just try this on the frontend:


# rocks set appliance attr gpu-compute sge true

Then reinstall the gpu-compute node.

--
Engine

David Noriega

Feb 27, 2012, 12:23:55 PM
to Discussion of Rocks Clusters
Thanks, that did it. On this topic, is there a way to control which
queue the node gets added to?