[Lustre-discuss] Unable to activate OST


Dusty Marks

Jan 14, 2010, 9:31:27 PM
to lustre-...@lists.lustre.org
Greetings,

I'm trying to get Lustre 1.8.1.1 working, but have been running into nothing but trouble.

Long story short, I'm trying to mount the OST on the OSS, but I keep getting this error:

[root@oss ~]# mount -t lustre /dev/lustre/OST /lustre/OSS
mount.lustre: mount /dev/lustre/OST at /lustre/OSS failed: Input/output error
Is the MGS running?

What I don't understand is that, as far as I know, the MGS is running. I followed this guide: http://manual.lustre.org/manual/LustreManual16_HTML/ConfiguringLustreExamples.html. After that didn't work, I tried this guide: http://unixfoo.blogspot.com/2009/11/lustre-cluster-filesystem-quick-setup.html, and that didn't work either. I have the MDS and MGS on the same system.

In the walk-through below of everything I did from start to finish, you can see that I mounted the MGS partition, which, if I understand Lustre correctly, should start the MGS service. Yet the OSS still fails with an I/O error.

I've been struggling with this for some time and am close to giving up. Any help would be greatly appreciated.
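For anyone hitting the same error, a few sanity checks can narrow down whether the MGS is actually up and reachable. This is a sketch assuming the same layout as in the transcripts below (MGS at 192.168.0.2@tcp0); `lctl dl`, `lctl list_nids`, and `lctl ping` are standard lctl subcommands in Lustre 1.8:

```
# On the MDS/MGS node: is the MGS device mounted and its obd device UP?
[root@mds ~]# mount -t lustre        # should list /dev/lustre/MGS on /lustre/MGS
[root@mds ~]# lctl dl                # should show an MGS device in the "UP" state

# On the OSS: can LNET reach the MGS node at all?
[root@oss ~]# lctl list_nids         # the NID(s) this node advertises
[root@oss ~]# lctl ping 192.168.0.2@tcp0   # fails if LNET is misconfigured
```

If the `lctl ping` fails, the problem is LNET configuration rather than the MGS itself.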


Thanks,
Dusty

------------ Here is what I typed on the MDS server (and its output). -------------
[root@mds ~]# nano /etc/modprobe.conf
[root@mds ~]# pvcreate /dev/hdb1
  Physical volume "/dev/hdb1" successfully created
[root@mds ~]# vgcreate lustre /dev/hdb1
  Volume group "lustre" successfully created
[root@mds ~]# lvcreate -L 19G -n MGS lustre
  Logical volume "MGS" created
[root@mds ~]# mkfs.lustre --mgs /dev/lustre/MGS

   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:

checking for existing Lustre data: not found
device size = 19456MB
2 6 18
formatting backing filesystem ldiskfs on /dev/lustre/MGS
    target name  MGS
    4k blocks     0
    options        -J size=400 -q -O dir_index,extents,uninit_groups -F
mkfs_cmd = mke2fs -j -b 4096 -L MGS  -J size=400 -q -O dir_index,extents,uninit_groups -F /dev/lustre/MGS
Writing CONFIGS/mountdata
[root@mds ~]# mount -t lustre /dev/lustre/MGS /lustre/MGS/

[root@mds ~]# lvcreate -L 18G -n MDT lustre
  Logical volume "MDT" created
[root@mds ~]# mkfs.lustre --fsname=datafs --mdt --reformat --mgsnode=192.168.0.2@tcp0 /dev/lustre/MDT

   Permanent disk data:
Target:     datafs-MDTffff
Index:      unassigned
Lustre FS:  datafs
Mount type: ldiskfs
Flags:      0x71
              (MDT needs_index first_time update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=192.168.0.2@tcp mdt.group_upcall=/usr/sbin/l_getgroups

device size = 18432MB
2 6 18
formatting backing filesystem ldiskfs on /dev/lustre/MDT
    target name  datafs-MDTffff
    4k blocks     0
    options        -J size=400 -i 4096 -I 512 -q -O dir_index,extents,uninit_groups -F
mkfs_cmd = mke2fs -j -b 4096 -L datafs-MDTffff  -J size=400 -i 4096 -I 512 -q -O dir_index,extents,uninit_groups -F /dev/lustre/MDT
Writing CONFIGS/mountdata
[root@mds ~]# mkdir /lustre/MDT
[root@mds ~]# mount -t lustre /dev/lustre/MDT /lustre/MDT


------------ Here is exactly what I typed on the OSS server, and its output. -------------------
[root@oss ~]# pvcreate /dev/hdc1
  Physical volume "/dev/hdc1" successfully created
[root@oss ~]# vgcreate lustre /dev/hdc1
  /dev/hdb: open failed: No medium found
  Volume group "lustre" successfully created
[root@oss ~]# vgs
  VG         #PV #LV #SN Attr   VSize  VFree
  VolGroup00   1   2   0 wz--n- 74.41G     0
  lustre       1   0   0 wz--n- 37.27G 37.27G
[root@oss ~]# lvcreate -n OST -L 37GB lustre
  Logical volume "OST" created
[root@oss ~]# mkfs.lustre --fsname=datafs --ost --mgsnode=192.168.0.2@tcp0 /dev/lustre/OST

   Permanent disk data:
Target:     datafs-OSTffff
Index:      unassigned
Lustre FS:  datafs
Mount type: ldiskfs
Flags:      0x72
              (OST needs_index first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=192.168.0.2@tcp

checking for existing Lustre data: not found
device size = 37888MB
2 6 18
formatting backing filesystem ldiskfs on /dev/lustre/OST
    target name  datafs-OSTffff
    4k blocks     0
    options        -J size=400 -i 16384 -I 256 -q -O dir_index,extents,uninit_groups -F
mkfs_cmd = mke2fs -j -b 4096 -L datafs-OSTffff  -J size=400 -i 16384 -I 256 -q -O dir_index,extents,uninit_groups -F /dev/lustre/OST
Writing CONFIGS/mountdata
[root@oss ~]# mkdir -p /lustre/OSS
[root@oss ~]# mount -t lustre /dev/lustre/OST /lustre/OSS
mount.lustre: mount /dev/lustre/OST at /lustre/OSS failed: Input/output error
Is the MGS running?


--
The graduate with a Science degree asks, "Why does it work?" The graduate with an Engineering degree asks, "How does it work?" The graduate with an Accounting degree asks, "How much will it cost?" The graduate with an Arts degree asks, "Do you want fries with that?"

Andreas Dilger

Jan 14, 2010, 11:27:24 PM
to Dusty Marks, lustre-...@lists.lustre.org
On 2010-01-14, at 21:31, Dusty Marks wrote:
> [root@oss ~]# mkfs.lustre --fsname=datafs --ost --mgsnode=192.168.0.2@tcp0 /dev/lustre/OST
> [root@oss ~]# mount -t lustre /dev/lustre/OST /lustre/OSS
> mount.lustre: mount /dev/lustre/OST at /lustre/OSS failed: Input/output error
> Is the MGS running?


There is probably an error in /var/log/messages and/or "dmesg" that
will tell you what is going wrong.
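One way to pull just the relevant entries out of those logs on the failing OSS (default syslog paths; adjust if your distro logs elsewhere):

```shell
# Show the most recent Lustre/LNET kernel messages
dmesg | grep -iE 'lustre|lnet' | tail -n 20
grep -iE 'lustre|lnet' /var/log/messages 2>/dev/null | tail -n 20 || true
```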

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss

Andreas Dilger

Jan 15, 2010, 12:03:15 AM
to Dusty Marks, lustre-discuss@lists.lustre.org
On 2010-01-14, at 23:51, Dusty Marks wrote:
> You are correct, there is information in messages. The entries related to Lustre follow. The line saying 192.168.0.2@tcp is unreachable makes sense, but what exactly is the problem? I entered the line "options lnet networks=tcp" in modprobe.conf on the OSS and MDS. The only difference is that I entered that line AFTER I set up Lustre on the OSS. Could that be the problem? I don't see why it would be, as the OSS is trying to reach the MDS/MGS, which is 192.168.0.2.
>
> --------------------------------------- /var/log/messages -----------------------------------------------------------
> Jan 14 22:41:07 oss kernel: Lustre: 2846:0:(linux-tcpip.c:688:libcfs_sock_connect()) Error -113 connecting 0.0.0.0/1023 -> 192.168.0.2/988
> Jan 14 22:41:07 oss kernel: Lustre: 2846:0:(acceptor.c:95:lnet_connect_console_error()) Connection to 192.168.0.2@tcp at host 192.168.0.2 was unreachable: the network or that node may be down, or Lustre may be misconfigured.


Please read the chapter in the manual about network configuration. I
suspect the .0.2 network is not your eth0 network interface, and your
modprobe.conf needs to be fixed.
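A concrete example of the suggested fix: if the 192.168.0.2 address lives on, say, eth1 rather than eth0 (eth1 is a placeholder here; check with `ifconfig` or `ip addr`), the lnet line on every node would pin LNET to that interface:

```
# /etc/modprobe.conf -- eth1 is hypothetical; use whichever interface
# actually carries the 192.168.0.x addresses
options lnet networks=tcp0(eth1)
```

LNET only reads this option at module load time, so the Lustre modules have to be unloaded (unmount everything, then e.g. `lustre_rmmod`) and reloaded before a change takes effect.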
