options lnet networks=o2ib0(ib0)
And these are the entries in my /etc/fstab:
172.16.34.1@o2ib:/home /lustre/home lustre auto,_netdev 1 2
172.16.34.1@o2ib:/scratch /lustre/scratch lustre auto,_netdev 1 2
I have a similar setup with a Lustre 1.6.7.2 client running on RHEL 4.5, and it loads fine there.
What am I missing?
Thanks,
Mike Robbert
_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
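As an illustration only: a Lustre client fstab device such as 172.16.34.1@o2ib:/home consists of the MGS NID, then ":/", then the file system name. The LNet network in that NID (o2ib) has to be one the client is configured for in the lnet options line (o2ib0). A network name with no trailing number means instance 0, so o2ib and o2ib0 are the same network. The bash function below, parse_lustre_spec, is a hypothetical helper that splits such a string and normalises the network name. It assumes a single MGS NID with no failover NIDs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a Lustre client fstab device spec of the form
#   <MGS NID>:/<fsname>     e.g. 172.16.34.1@o2ib:/home
# into "NID NETWORK FSNAME". Assumes a single MGS NID (no ":"-separated
# failover list before the ":/").
parse_lustre_spec() {
    local spec=$1
    local nid=${spec%%:/*}      # everything before the first ":/"
    local fsname=${spec#*:/}    # everything after the first ":/"
    local net=${nid#*@}         # LNet network part of the NID, e.g. o2ib
    # A network name with no trailing number refers to instance 0,
    # so "o2ib" and "o2ib0" name the same LNet network.
    [[ $net =~ [0-9]$ ]] || net=${net}0
    printf '%s %s %s\n' "$nid" "$net" "$fsname"
}

# Example: parse_lustre_spec 172.16.34.1@o2ib:/home
```

The normalised network can then be compared by eye (or by script) with the networks=... value in the lnet options line.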
Are there any messages in dmesg at boot? I've occasionally seen IB take
a moment to actually come up. If that's the case, you might need to add
the mounts to rc.local, or try to get openibd to start earlier.
- Kit
--
Kit Westneat
kwes...@datadirectnet.com
812-484-8485
[root@compute-2-1 ~]# mount -a
mount.lustre: mount 172.16.34.1@o2ib:/home at /lustre/home failed: No such device
Are the lustre modules loaded?
Check /etc/modprobe.conf and /proc/filesystems
Note 'alias lustre llite' should be removed from modprobe.conf
mount.lustre: mount 172.16.34.1@o2ib:/scratch at /lustre/scratch failed: No such device
Are the lustre modules loaded?
Check /etc/modprobe.conf and /proc/filesystems
Note 'alias lustre llite' should be removed from modprobe.conf
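The mount.lustre hint above suggests checking /proc/filesystems. The following is a minimal bash sketch of that check, assuming only that each line of /proc/filesystems ends with a filesystem type name. The function name lustre_fs_registered is hypothetical, and it takes the file path as an optional argument so it can be tried against a saved copy.

```bash
#!/usr/bin/env bash
# Sketch: succeed if the "lustre" filesystem type is registered with the
# kernel, which is what mount.lustre's "No such device" hint asks you to check.
# Lines in /proc/filesystems look like e.g. "nodev<TAB>proc" or "<TAB>ext3",
# so the type name is the last whitespace-separated field.
lustre_fs_registered() {
    local fsfile=${1:-/proc/filesystems}
    awk '$NF == "lustre" { found = 1 } END { exit !found }' "$fsfile"
}

# Hypothetical usage on a client:
#   lustre_fs_registered || echo "lustre not registered; check dmesg after modprobe lustre"
```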
Here are some dmesg entries from a boot that does not mount the file systems:
ADDRCONF(NETDEV_UP): eth0: link is not ready
bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
ADDRCONF(NETDEV_UP): ib0: link is not ready
ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
Lustre: OBD class driver, http://www.lustre.org/
Lustre: Lustre Version: 1.8.2
Lustre: Build Version: 1.8.2-20100122190848-PRISTINE-2.6.18-164.15.1.el5
ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
ko2iblnd: Unknown symbol ib_fmr_pool_unmap
... Lots more ko2iblnd errors here. (Is this part of the problem or a red herring?) ...
ko2iblnd: disagrees about version of symbol ib_fmr_pool_map_phys
ko2iblnd: Unknown symbol ib_fmr_pool_map_phys
LustreError: 3288:0:(api-ni.c:1043:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
LustreError: 3288:0:(events.c:729:ptlrpc_init_portals()) network initialisation failed
LustreError: 165-2: Nothing registered for client mount! Is the 'lustre' module loaded?
LustreError: 3381:0:(obd_mount.c:2042:lustre_fill_super()) Unable to mount (-19)
Thanks,
Mike
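To see every "disagrees about version of symbol" message at once, like the ko2iblnd lines in the log above, the kernel log can be filtered. Such messages generally indicate that a module was built against different symbol versions than the ones exported by the currently loaded modules. The bash function below, symbol_mismatches, is a hypothetical sketch. It reads kernel log text on stdin and prints de-duplicated "module symbol" pairs. It assumes the module name is the first field ending in ":", which also holds when a "[timestamp]" prefix is present.

```bash
#!/usr/bin/env bash
# Sketch: list "module symbol" pairs from kernel log lines such as
#   ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
# Reads log text on stdin; output is sorted and de-duplicated.
symbol_mismatches() {
    awk '/disagrees about version of symbol/ {
        mod = ""
        # The module name is taken to be the first field that ends in ":".
        for (i = 1; i <= NF; i++) if ($i ~ /:$/) { mod = $i; break }
        sub(/:$/, "", mod)
        print mod, $NF          # the symbol name is the last field
    }' | LC_ALL=C sort -u
}

# Hypothetical usage: dmesg | symbol_mismatches
```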
Robert,
I'm not sure of the root cause of your mount problems, but we were also
hitting a timing problem when mounting file systems over InfiniBand at
boot time. Because IB may still not be initialized even when rc.local
runs, I worked around it by adding the following to the "start)" section
of /etc/rc.d/init.d/netfs. You could put something similar in rc.local
if you prefer.
# Spin until we find an "ACTIVE" IB port
if [ -d /sys/class/infiniband ]; then
    tries=1
    maxtries=10
    delay=5
    while [ $tries -le $maxtries ]; do
        # Each port's state file reads e.g. "4: ACTIVE" once the link is up;
        # discard errors (e.g. no ports yet) instead of printing them.
        grep -q ACTIVE /sys/class/infiniband/*/ports/*/state 2>/dev/null && break
        logger -s -t netfs "WARNING: No 'ACTIVE' Infiniband ports found: try $tries/$maxtries, sleep $delay"
        sleep $delay
        (( tries++ ))
        # logger needs a tag after -t; without one the message becomes the tag
        [ $tries -gt $maxtries ] && logger -s -t netfs "ERROR: No 'ACTIVE' Infiniband ports found."
    done
fi
Hope this helps!
-Nathan
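For the rc.local route mentioned above, here is a hedged bash sketch of the same wait loop packaged as a function that gates the Lustre mounts. The function name wait_for_ib_active and its parameters (sysfs root, try count, delay) are hypothetical. The defaults mirror Nathan's values, and the parameters exist only so the loop can be tested against a fake directory tree. It writes warnings to stderr; logger -s -t could be used instead to also reach syslog.

```bash
#!/usr/bin/env bash
# Sketch: return 0 as soon as any InfiniBand port under ROOT reports ACTIVE,
# or 1 after MAXTRIES checks spaced DELAY seconds apart.
wait_for_ib_active() {
    local root=${1:-/sys/class/infiniband} maxtries=${2:-10} delay=${3:-5}
    local tries
    for (( tries = 1; tries <= maxtries; tries++ )); do
        # -s hides errors when no port state files exist yet.
        grep -qs ACTIVE "$root"/*/ports/*/state && return 0
        echo "WARNING: no ACTIVE InfiniBand ports: try $tries/$maxtries" >&2
        (( tries < maxtries )) && sleep "$delay"
    done
    return 1
}

# Hypothetical rc.local usage (mount only the fstab entries of type lustre):
#   wait_for_ib_active && mount -a -t lustre
```

Returning a status rather than just logging lets the caller skip the mounts cleanly when IB never comes up.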
That's pretty odd: it looks like the o2ib module (ko2iblnd) has a symbol
mismatch with the OFED driver. I'm surprised it works at all. Can you send
the dmesg output after running modprobe lustre and mounting, as well as the
output of lctl list_nids?
Thanks,
Kit
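When reading the requested lctl list_nids output, the question is whether the client has a NID on the o2ib network at all. The bash function below, has_nid_on_net, is a hypothetical sketch of that check. It reads one NID per line on stdin and treats o2ib and o2ib0 as the same network, since an unnumbered network name means instance 0.

```bash
#!/usr/bin/env bash
# Sketch: succeed if any NID on stdin (e.g. from `lctl list_nids`) is on the
# given LNet network (default o2ib). Names without a trailing number are
# normalised to instance 0 so "o2ib" and "o2ib0" compare equal.
has_nid_on_net() {
    local want=${1:-o2ib} nid net
    [[ $want =~ [0-9]$ ]] || want=${want}0
    while read -r nid; do
        [[ $nid == *@* ]] || continue       # skip blank or malformed lines
        net=${nid##*@}
        [[ $net =~ [0-9]$ ]] || net=${net}0
        [[ $net == "$want" ]] && return 0
    done
    return 1
}

# Hypothetical usage: lctl list_nids | has_nid_on_net o2ib
```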
Mike