I'm installing a Rocks 5.2 x86_64 cluster on identical ASUS P5Q SE/R
system boards, each with an Intel Core 2 Quad Q9550 CPU @ 2.83GHz and
16GB RAM.
This system board has an onboard Atheros (Attansic) 8121 PCI Express
gigabit ethernet controller which is not supported out-of-the-box by
Rocks 5.2.
A RealTek 8169 card has been added to the head node for its connection
to the public network.
The problem I'm having is that the compute node fails to PXE boot
because for some reason it doesn't have the driver for the Atheros 8121
in its /modules/modules.cgz.
I get the following messages on the Alt-F3 console of the compute node:
INFO : modules to insert libata pata_marvell ata_piix
INFO : loaded libata from /modules/modules.cgz
INFO : loaded pata_marvell from /modules/modules.cgz
INFO : loaded ata_piix from /modules/modules.cgz
INFO : inserted /tmp/libata.ko
INFO : inserted /tmp/
INFO : inserted /tmp/
INFO : load module set done
A bit later the installation goes interactive because there is no
network device to choose. The error is:
ERROR : ROCKS:rocksNetworkUp:no network devices in choose network device!
I have installed a driver for the Atheros 8121 card on the head node and
followed the instructions in "4.9. Adding a Device Driver"
(http://www.rocksclusters.org/roll-documentation/base/5.2/customization-driver.html)
to add the Atheros 8121 driver to the PXE boot image for the compute
nodes, but for some reason this has not worked.
I have complete logs of step 8 (Build the rocks-boot package) and step
10 (Rebuild the distro) from
http://www.rocksclusters.org/roll-documentation/base/5.2/customization-driver.html
if required.
I did have to download the 3.7GB Rocks 5.2 source code on a Windows
machine, so possibly there are file permission problems that need
correcting.
I hope someone can assist as I'm running out of time.
Detailed information on the configuration and installation process (note
that I am obliged to obscure network addresses and host names):
PCI id for the Atheros 8121 ethernet controller is:
# lspci -nn -v -s 02:00.0
02:00.0 0200: 1969:1026 (rev b0)
        Subsystem: 1043:8304
        Flags: bus master, fast devsel, latency 0, IRQ 66
        Memory at fe9c0000 (64-bit, non-prefetchable) [size=256K]
        I/O ports at cc00 [size=128]
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
        Capabilities: [58] Express Endpoint IRQ 0
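One way to confirm that a given module actually claims this PCI ID is to compare it against the module's alias list. The sketch below only builds the leading part of the modalias string from the vendor:device pair shown by lspci; the helper name pci_alias_prefix is made up for illustration.

```shell
#!/bin/sh
# Build the prefix of the kernel "modalias" string for a PCI vendor:device
# pair as printed by `lspci -nn` (for example 1969:1026).
# pci_alias_prefix is a hypothetical helper name, not part of any tool.
pci_alias_prefix() {
    vendor=${1%%:*}
    device=${1##*:}
    # Each ID is written as 8 upper-case hex digits in the modalias string.
    printf 'pci:v%08Xd%08X\n' "0x$vendor" "0x$device"
}

pci_alias_prefix 1969:1026   # prints pci:v00001969d00001026
```

On the head node, something like `modinfo -F alias atl1e | grep "$(pci_alias_prefix 1969:1026)"` should print a line if the installed module claims the device. The module name atl1e is assumed here from the modprobe.conf alias further down.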
I've found the source code for this ethernet controller from the vendor
(Atheros): "AR81Family-linux-v1.0.0.10.tar.gz"
This is found on the following web site:
http://partner.atheros.com/Drivers.aspx
I've installed Rocks 5.2 from the x86_64 DVD and selected the following
rolls:
base
ganglia
hpc
java
kernel
os
sge
The initial configuration of Rocks was done incorrectly because only
eth0 (the RealTek card) was available at install time, and I configured
it as the public interface.
This is the card we want for the public interface, but it should be
eth1 rather than eth0, so after installing the driver for the Atheros
8121 card on the head node I have had to reconfigure much of both the
Linux OS and Rocks.
After installation of Rocks 5.2 I added the
"AR81Family-linux-v1.0.0.10.tar.gz" driver to the head node (cd src;
make install) and I can now see both network interfaces on the head
node.
I've then had to carefully reconfigure:
/etc/sysconfig/networking/devices/ifcfg-eth0
/etc/sysconfig/networking/devices/ifcfg-eth1
I corrected the HWADDR lines and static IP address configuration,
removed trailing whitespace from each line and removed the blank line.
I then ensured that the hard links for the other two instances of these
files were correct:
/etc/sysconfig/networking/profiles/default/ifcfg-eth[01]
/etc/sysconfig/network-scripts/ifcfg-eth[01]
i.e.:
# cat /etc/sysconfig/networking/devices/ifcfg-eth0
DEVICE=eth0
HWADDR=01:02:03:04:05:06
IPADDR=192.168.100.1
NETMASK=255.255.255.0
BOOTPROTO=static
ONBOOT=yes
MTU=1500
# cat /etc/sysconfig/networking/devices/ifcfg-eth1
DEVICE=eth1
HWADDR=11:12:13:14:15:16
IPADDR=123.123.0.99
NETMASK=255.255.252.0
BOOTPROTO=static
ONBOOT=yes
MTU=1500
# ls -li /etc/sysconfig/networking/devices/ifcfg-eth[01] \
    /etc/sysconfig/networking/profiles/default/ifcfg-eth[01] \
    /etc/sysconfig/network-scripts/ifcfg-eth[01] | sort
325449 -rw-r--r-- 3 root root 118 Sep 22 14:40 /etc/sysconfig/networking/devices/ifcfg-eth1
325449 -rw-r--r-- 3 root root 118 Sep 22 14:40 /etc/sysconfig/networking/profiles/default/ifcfg-eth1
325449 -rw-r--r-- 3 root root 118 Sep 22 14:40 /etc/sysconfig/network-scripts/ifcfg-eth1
325450 -rw-r--r-- 3 root root 117 Sep 22 14:40 /etc/sysconfig/networking/devices/ifcfg-eth0
325450 -rw-r--r-- 3 root root 117 Sep 22 14:40 /etc/sysconfig/networking/profiles/default/ifcfg-eth0
325450 -rw-r--r-- 3 root root 117 Sep 22 14:40 /etc/sysconfig/network-scripts/ifcfg-eth0
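The same hard-link check can be scripted rather than read off the inode column by eye. This is only a sketch: same_inode is a made-up helper that compares the inode numbers reported by ls -di, and it assumes all three directories are on one filesystem.

```shell
#!/bin/sh
# Succeed only if every file argument has the same inode number as the
# first, i.e. all names are hard links to one file on the same filesystem.
# same_inode is a hypothetical helper name.
same_inode() {
    first=$(ls -di "$1" 2>/dev/null | awk '{print $1}')
    [ -n "$first" ] || return 1          # first file missing or unreadable
    for f in "$@"; do
        cur=$(ls -di "$f" 2>/dev/null | awk '{print $1}')
        [ "$cur" = "$first" ] || return 1
    done
    return 0
}

# Check both interfaces against the three locations listed above.
for n in 0 1; do
    if same_inode /etc/sysconfig/networking/devices/ifcfg-eth$n \
                  /etc/sysconfig/networking/profiles/default/ifcfg-eth$n \
                  /etc/sysconfig/network-scripts/ifcfg-eth$n; then
        echo "ifcfg-eth$n: hard links OK"
    else
        echo "ifcfg-eth$n: NOT hard-linked"
    fi
done
```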
I've also ensured that /etc/modprobe.conf is correct:
# cat /etc/modprobe.conf
alias scsi_hostadapter pata_marvell
alias scsi_hostadapter1 ata_piix
alias eth0 atl1e
alias eth1 r8169
I've then had to carefully reconfigure Rocks itself for the correct
networking:
# rocks list network
NETWORK  SUBNET        NETMASK       MTU
private: 192.168.100.0 255.255.255.0 1500
public:  123.123.0.0   255.255.252.0 1500
# rocks list host interface
SUBNET  IFACE MAC               IP            NETMASK       MODULE NAME                           VLANID
private eth0  01:02:03:04:05:06 192.168.100.1 255.255.255.0 atl1e  bigcluster                     ------
public  eth1  11:12:13:14:15:16 123.123.0.99  255.255.252.0 r8169  bigcluster.dsto.defence.gov.au ------
The major task was to ensure that the rocks database was correct:
# rocks list host attr
HOST        ATTR                                   VALUE                                SOURCE
bigcluster: HttpConf                               /etc/httpd/conf                      O
bigcluster: HttpConfigDirExt                       /etc/httpd/conf.d                    O
bigcluster: HttpRoot                               /var/www/html                        O
bigcluster: Info_CertificateCountry                AU                                   G
bigcluster: Info_CertificateLocality               Locality                             G
bigcluster: Info_CertificateOrganization           DSTO                                 G
bigcluster: Info_CertificateState                  State                                G
bigcluster: Info_ClusterContact                    matthe...@dsto.defence.gov.au        G
bigcluster: Info_ClusterName                       Big                                  G
bigcluster: Kickstart_DistroDir                    /export/rocks                        G
bigcluster: Kickstart_Keyboard                     us                                   G
bigcluster: Kickstart_Lang                         en_US                                G
bigcluster: Kickstart_Langsupport                  en_US                                G
bigcluster: Kickstart_Multicast                    238.102.163.221                      G
bigcluster: Kickstart_PrivateAddress               192.168.100.1                        H
bigcluster: Kickstart_PrivateBroadcast             192.168.100.255                      H
bigcluster: Kickstart_PrivateDNSDomain             local                                G
bigcluster: Kickstart_PrivateDNSServers            192.168.100.1                        H
bigcluster: Kickstart_PrivateGateway               192.168.100.1                        H
bigcluster: Kickstart_PrivateHostname              bigcluster                           G
bigcluster: Kickstart_PrivateKickstartBasedir      install                              G
bigcluster: Kickstart_PrivateKickstartCGI          sbin/kickstart.cgi                   G
bigcluster: Kickstart_PrivateKickstartHost         192.168.100.1                        H
bigcluster: Kickstart_PrivateNTPHost               192.168.100.1                        H
bigcluster: Kickstart_PrivateNetmask               255.255.255.0                        H
bigcluster: Kickstart_PrivateNetmaskCIDR           24                                   H
bigcluster: Kickstart_PrivateNetwork               192.168.100.0                        H
bigcluster: Kickstart_PrivatePortableRootPassword  #### OBSCURED ####                   G
bigcluster: Kickstart_PrivateRootPassword          #### OBSCURED ####                   G
bigcluster: Kickstart_PrivateSHARootPassword       #### OBSCURED ####                   G
bigcluster: Kickstart_PrivateSyslogHost            192.168.100.1                        H
bigcluster: Kickstart_PublicAddress                123.123.0.99                         H
bigcluster: Kickstart_PublicBroadcast              123.123.3.255                        H
bigcluster: Kickstart_PublicDNSDomain              dsto.defence.gov.au                  G
bigcluster: Kickstart_PublicDNSServers             123.123.0.32,123.123.4.56            G
bigcluster: Kickstart_PublicGateway                123.123.0.1                          G
bigcluster: Kickstart_PublicHostname               bigcluster.dsto.defence.gov.au       G
bigcluster: Kickstart_PublicKickstartHost          central.rocksclusters.org            G
bigcluster: Kickstart_PublicNTPHost                the-time-server.dsto.defence.gov.au  G
bigcluster: Kickstart_PublicNetmask                255.255.252.0                        H
bigcluster: Kickstart_PublicNetmaskCIDR            22                                   H
bigcluster: Kickstart_PublicNetwork                123.123.0.0                          H
bigcluster: Kickstart_Timezone                     Australia/City                       G
bigcluster: RootDir                                /root                                O
bigcluster: Server_Partitioning                    force-default-root-disk-only         G
bigcluster: hostname                               bigcluster                           I
bigcluster: managed                                false                                A
bigcluster: os                                     linux                                H
bigcluster: rack                                   0                                    I
bigcluster: rank                                   0                                    I
bigcluster: rocks_version                          5.2                                  G
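The netmask, CIDR and broadcast attributes have to agree with each other, and that can be cross-checked with a little shell arithmetic. A minimal sketch follows; mask_to_cidr and broadcast are made-up helper names, and the bit count does not verify that the mask is contiguous.

```shell
#!/bin/sh
# Count the set bits in a dotted-quad netmask to get the CIDR prefix length.
# The body runs in a subshell so the IFS change does not leak out.
mask_to_cidr() (
    IFS=.
    set -- $1                 # split the mask into its four octets
    bits=0
    for octet in "$@"; do
        while [ "$octet" -gt 0 ]; do
            bits=$((bits + (octet & 1)))
            octet=$((octet >> 1))
        done
    done
    echo "$bits"
)

# Derive the broadcast address from a network address and netmask by
# setting every host bit (the bits that are zero in the mask).
broadcast() (
    IFS=.
    set -- $1 $2              # $1..$4 = network octets, $5..$8 = mask octets
    echo "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
)

mask_to_cidr 255.255.252.0            # prints 22
broadcast 123.123.0.0 255.255.252.0   # prints 123.123.3.255
```

Both results match the Kickstart_PublicNetmaskCIDR and Kickstart_PublicBroadcast values in the listing.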
What is the name of the Atheros driver module?
Once you determine that, you could try adding the kernel boot
parameter to the installing kernel:
driverload=<atheros driver name>
See section "7.4.1. Adding Kernel Boot Parameters to the Installation
Kernel" in:
http://www.rocksclusters.org/roll-documentation/base/5.2/bootflags.html
If that doesn't work and you really are in a panic, rub a little money
on the problem: go buy Intel e1000 cards for each of your nodes.
- gb
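As a rough sketch of the driverload suggestion: the helper below appends the parameter to an existing kernel argument string unless it is already present. The module name atl1e is taken from the modprobe.conf alias earlier in the thread; add_driverload and the sample argument string are made up for illustration, and the real arguments should come from the install action described in section 7.4.1.

```shell
#!/bin/sh
# Append driverload=<module> to a kernel argument string unless it is
# already present. add_driverload is a hypothetical helper name.
add_driverload() {
    args=$1
    module=$2
    case " $args " in
        *" driverload=$module "*) printf '%s\n' "$args" ;;
        *) printf '%s\n' "$args driverload=$module" ;;
    esac
}

# Hypothetical existing argument string, for illustration only.
current_args="ks pxe kssendmac"
add_driverload "$current_args" atl1e   # prints ks pxe kssendmac driverload=atl1e
```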
Look for modules.cgz in the expanded initrd.img. That is, don't look
for modules.cgz in dd.img.
- gb
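A sketch of that check, assuming both initrd.img and modules.cgz are gzip-compressed cpio archives (the usual layout for Red Hat style installers of this vintage). The function names are made up, and the path to initrd.img is left as an argument because it depends on where the rebuilt boot image ends up.

```shell
#!/bin/sh
# List the member names of a gzip-compressed cpio archive.
list_cpio_gz() {
    gzip -dc "$1" | cpio -it --quiet
}

# Pull modules.cgz out of an installer initrd.img into a scratch directory
# and report whether it contains <module>.ko.
# Usage: initrd_has_module /path/to/initrd.img atl1e
initrd_has_module() {
    initrd=$1
    module=$2
    work=$(mktemp -d) || return 2
    # gzip reads the image from the caller's directory; cpio extracts only
    # the members matching the pattern into the scratch directory.
    gzip -dc "$initrd" | ( cd "$work" && cpio -idm --quiet '*modules.cgz' ) 2>/dev/null
    cgz=$(find "$work" -name modules.cgz | head -n 1)
    status=1
    if [ -n "$cgz" ] && list_cpio_gz "$cgz" 2>/dev/null | grep -Eq "(^|/)$module\.ko\$"; then
        status=0
    fi
    rm -rf "$work"
    return $status
}

# Run the check only when called with an image path and a module name.
if [ $# -eq 2 ]; then
    if initrd_has_module "$1" "$2"; then
        echo "$2.ko found in modules.cgz"
    else
        echo "$2.ko NOT found in modules.cgz"
    fi
fi
```

If atl1e.ko is missing from the modules.cgz inside the initrd.img that the compute node actually boots, the rebuilt boot image never made it into place.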