I'm having trouble exporting a Lustre mount over NFS.
When mounting the NFS export on the client:
[root@nfsclient /]# mount 192.168.100.77:/mnt/lustre_mail_fs /mnt/lustre
(I have tried nearly every switch I can think of; I don't think the problem is related to how I attempt to mount.)
mount: 192.168.100.77:/mnt/lustre_mail_fs failed, reason given by server: Permission denied
Server shows log:
Jul 14 10:24:28 lustreclient mountd[9868]: authenticated mount request from 192.168.100.2:720 for /mnt/lustre_mail_fs (/mnt/lustre_mail_fs)
Jul 14 10:24:28 lustreclient mountd[9868]: can't stat exported dir /mnt/lustre_mail_fs: Permission denied
Everything from an NFS service/config perspective appears to be
functional, and I can export a local filesystem without errors.
Additionally, I have no trouble using the mounted Lustre filesystem; I
can even rsync data from the NFS client to the Lustre filesystem on the
NFS server.
Any clues? Your knowledge and experience would be greatly appreciated!
See below for config specifics..
Thanks!
Billy Olson
Server/Client Specifics:
Lustre Client/NFS Server: 192.168.100.77
CentOS release 5.4 (Final)
lustre-client-modules-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
lustre-client-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
Kernel: 2.6.18-164.11.1.el5
/etc/exports:
/mnt/lustre_mail_fs 192.168.100.0/24(ro,insecure)
I've tried various other options, all with the same outcome; these seemed
like the best to test with, though.
[root@lustreclient ]# rpcinfo -p
program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 645 status
100024 1 tcp 648 status
100011 1 udp 702 rquotad
100011 2 udp 702 rquotad
100011 1 tcp 705 rquotad
100011 2 tcp 705 rquotad
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100021 1 udp 37527 nlockmgr
100021 3 udp 37527 nlockmgr
100021 4 udp 37527 nlockmgr
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100021 1 tcp 52555 nlockmgr
100021 3 tcp 52555 nlockmgr
100021 4 tcp 52555 nlockmgr
100005 1 udp 716 mountd
100005 1 tcp 719 mountd
100005 2 udp 716 mountd
100005 2 tcp 719 mountd
100005 3 udp 716 mountd
100005 3 tcp 719 mountd
[root@lustreclient ]# exportfs -v -r
exporting 192.168.100.0/24:/mnt/lustre_mail_fs
NFS Client: 192.168.100.2
_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
I can't comment, since my NFS re-exporting (to a MacOS client) is working fine. Hopefully soon I can stop doing that and use the native Mac client, but I don't think my wife is willing to use alpha software yet.
Have you restarted the NFS server after changing the exports?
Are you sure the permissions are correct?
Have you verified that exporting a non-lustre filesystem from this server works?
My /etc/exports is:
/myth 192.168.10.160(rw,async,insecure)
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
Thanks for responding, Andreas!
I'm convinced that this is more of a Lustre-centric issue than an NFS
issue; there must be some interaction, at the kernel level or otherwise,
that's running into a permissions block somewhere.
# lctl clear
# (mount NFS client)
# lctl dk /tmp/debug
Then search through the logs for -1 (-EPERM) or -13 (-EACCES) errors. (Note that -2 is -ENOENT, not -EPERM.)
Cheers, Andreas
Dumb question, but have you checked the permissions on the NFS server's
Lustre mount point (before/after Lustre is mounted), and exported a
non-Lustre directory successfully?
Kevin
Andreas Dilger wrote:
> My only other suggestion is to dump the Lustre kernel debug log on the NFS server after a mount failure to see where/why it is getting the permission error.
>
> # lctl clear
> # (mount NFS client)
> # lctl dk /tmp/debug
>
> Then search through the logs for -1 (-EPERM) or -13 (-EACCES) errors.
>
> Cheers, Andreas
>
> On 2010-07-16, at 10:06, William Olson <lustre...@reachone.com> wrote:
>
>
>> On 7/15/2010 5:48 PM, Andreas Dilger wrote:
>>
>>> On 2010-07-15, at 08:33, William Olson wrote:
>>>
>>>
>>>> Somebody, anybody? I'm sure it's something fairly simple, but it
>>>> escapes me, assistance would be greatly appreciated!
>>>>
_______________________________________________
This is covered earlier in the thread.
Cheers, Andreas
Hmm, you must have a low debug level. Please try enabling full debug during the mount:
# lctl clear
# DBGSAVE=$(lctl get_param -n debug)
# lctl set_param debug=-1
# mount ...
# lctl dk /tmp/debug
# lctl set_param debug="$DBGSAVE"
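With full debug enabled the dump is mostly function entry/exit tracing, so one way to narrow it down is to search the exit lines for negative return codes. The sketch below assumes the trace lines have the form "Process leaving (rc=<unsigned> : <signed> : <hex>)"; that format is an assumption about this Lustre version's debug output and is not shown anywhere in this thread. The helper name find_errno_returns is hypothetical.

```shell
# Print trace lines whose signed return code matches one of the given
# errno numbers. Assumed line format (not confirmed in this thread):
#   ... Process leaving (rc=<unsigned> : <signed> : <hex>)
find_errno_returns() {
    dump=$1; shift
    for n in "$@"; do
        grep -E "Process leaving \(rc=[0-9]+ : -$n : " "$dump"
    done
}

# Example: look for -EPERM (1) and -EACCES (13) in the file written by
# "lctl dk /tmp/debug".
# find_errno_returns /tmp/debug 1 13
```

If nothing matches, that is consistent with the failure happening before the request ever reaches Lustre.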
No -43 errors, and I have group_upcall turned off anyhow (I think that's
what the -43 corresponds to).
I'm not having any issues with permissions when using the lustre mount
locally, or when rsyncing data from another client to the server hosting
the lustre fs.
> Dumb question, but have you checked the permissions on the NFS
> server's Lustre mount point (before/after Lustre is mounted), and
> exported a non-Lustre directory successfully?
>
Lustre mounted:
drwxrwxrwx 29 root root 4.0K Jul 12 17:03 lustre_mail_fs
Lustre not mounted:
drwxrwxrwx 2 root root 4.0K Jun 10 13:26 lustre_mail_fs
I have no trouble exporting a local fs..
Thanks Again!
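Since mountd has to traverse every directory above the export, it can help to list the permissions of each path component rather than only the mount point. A minimal sketch follows; walk_perms is a hypothetical helper name. On systems with SELinux-aware coreutils, adding -Z to the ls call should also show each directory's security context.

```shell
# List permissions for a path and every parent directory up to "/".
walk_perms() {
    p=$1
    while [ -n "$p" ]; do
        ls -ld "$p"
        # Stop at the root (or at "." for a relative path).
        if [ "$p" = "/" ] || [ "$p" = "." ]; then
            break
        fi
        p=$(dirname "$p")
    done
}

# Example (run as root on the NFS server, with Lustre mounted):
# walk_perms /mnt/lustre_mail_fs
```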
You could try "strace -f" on the mount process, to see which syscall is failing. It may be failing with something before it gets to Lustre.
Cheers, Andreas
[root@lustreclient mnt]# strace -f -p 15964
Process 15964 attached - interrupt to quit
select(1024, [3 4 5 6 7], NULL, NULL, NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(42303),
sin_addr=inet_addr("192.168.100.2")},
msg_iov(1)=[{"O^\240\350\0\0\0\0\0\0\0\2\0\1\206\245\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0"...,
8800}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=,
...}, msg_flags=0}, 0) = 40
stat("/etc/hosts.allow", {st_mode=S_IFREG|0644, st_size=189, ...}) = 0
stat("/etc/hosts.deny", {st_mode=S_IFREG|0644, st_size=347, ...}) = 0
sendmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(42303),
sin_addr=inet_addr("192.168.100.2")},
msg_iov(1)=[{"O^\240\350\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24}],
msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=, ...},
msg_flags=0}, 0) = 24
select(1024, [3 4 5 6 7], NULL, NULL, NULL) = 1 (in [6])
recvmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(1002),
sin_addr=inet_addr("192.168.100.2")},
msg_iov(1)=[{"{\243\34\22\0\0\0\0\0\0\0\2\0\1\206\245\0\0\0\3\0\0\0\1\0\0\0\1\0\0\0D"...,
8800}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=,
...}, msg_flags=0}, 0) = 132
stat("/etc/hosts.allow", {st_mode=S_IFREG|0644, st_size=189, ...}) = 0
stat("/etc/hosts.deny", {st_mode=S_IFREG|0644, st_size=347, ...}) = 0
open("/var/lib/nfs/etab", O_RDONLY) = 10
fstat(10, {st_mode=S_IFREG|0644, st_size=184, ...}) = 0
close(10) = 0
lstat("/mnt", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/mnt/lustre_mail_fs", 0x7fff4bd4b2b0) = -1 EACCES (Permission denied)
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2875, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2875, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2875, ...}) = 0
sendto(9, "<29>Jul 16 17:22:06 mountd[15964"..., 132, MSG_NOSIGNAL,
NULL, 0) = 132
stat("/mnt/lustre_mail_fs", 0x7fff4bd4b410) = -1 EACCES (Permission denied)
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2875, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2875, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2875, ...}) = 0
sendto(9, "<28>Jul 16 17:22:06 mountd[15964"..., 97, MSG_NOSIGNAL, NULL,
0) = 97
sendmsg(6, {msg_name(16)={sa_family=AF_INET, sin_port=htons(1002),
sin_addr=inet_addr("192.168.100.2")},
msg_iov(1)=[{"{\243\34\22\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\r",
28}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=,
...}, msg_flags=0}, 0) = 28
select(1024, [3 4 5 6 7], NULL, NULL, NULL
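To pull only the failing calls out of a trace like the one above, a small filter is enough. This is a sketch that assumes strace's usual "= -1 ERRNAME (description)" suffix and that each call sits on a single line; failed_syscalls is a hypothetical helper name, and the daemon name rpc.mountd in the usage comment is also an assumption.

```shell
# Keep only syscalls that returned -1 with an errno name.
failed_syscalls() {
    grep -E '= -1 E[A-Z0-9]+ \('
}

# Example: strace writes to stderr, so redirect it into the filter.
# strace -f -p "$(pidof rpc.mountd)" 2>&1 | failed_syscalls
```

Applied to the trace above, this would leave the two calls on /mnt/lustre_mail_fs that return EACCES.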
On Jul 16, 2010, at 6:23 PM, William Olson <lustre...@reachone.com> wrote:
> On 7/16/2010 5:12 PM, Andreas Dilger wrote:
>>
>>> Well that improved the debug level, but didn't reveal any -2
>>> errors.. In fact I can't seem to find a line with an error in
>>> it... Is there a specific verbiage used on error lines that I can
>>> grep for? 90% is "Process entered" or "Process leaving"...
>>>
>> You could try "strace -f" on the mount process, to see which
>> syscall is failing. It may be failing with something before it
>> gets to Lustre.
>>
> Results of strace below:
>
> [root@lustreclient mnt]# strace -f -p 15964
> Process 15964 attached - interrupt to quit
> lstat("/mnt", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> lstat("/mnt/lustre_mail_fs", 0x7fff4bd4b2b0) = -1 EACCES (Permission
> denied)
> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2875, ...}) = 0
Lustre mounted:
drwxrwxrwx 29 root root 4.0K Jul 12 17:03 lustre_mail_fs
Lustre not mounted:
drwxrwxrwx 2 root root 4.0K Jun 10 13:26 lustre_mail_fs
NFSClient mount dir:
drwxrwxrwx 2 root root 4.0K Jul 12 15:09 lustre
On Jul 16, 2010, at 6:50 PM, William Olson <lustre...@reachone.com> wrote:
I think my server names are confusing the situation..
Lustre client = NFS server
Lustre is mounted in /mnt/lustre_mail_fs
/mnt/lustre_mail_fs is exported with NFS
The NFS client is attempting to mount the /mnt/lustre_mail_fs export on its
/mnt/lustre directory.
mountd (on the NFS server/Lustre client) fails to stat the correctly
mounted and fully operational /mnt/lustre_mail_fs during an NFS client
connection attempt.
The NFS client authenticates properly according to the logs; it's only
when mountd attempts to stat the Lustre fs that problems arise.
Again, for clarity, I can successfully export and mount any other
directory from this same machine, to the same client..
We had such a problem when first experimenting with a Lustre re-export
via NFS. In our case the "permission denied" occurred because the
user (owner) of the directory on the re-exporting machine wasn't known
on the MDS.
Regards
Heiko
Thanks for all your help so far, any more ideas?
Have you tried adding "fsid=xxx" to your exports line? I think with recent
Lustre versions (I don't remember the implementation details) it should not be
required any more, and likewise not with recent nfs-utils and util-linux
(the filesystem UUID is automatically used with those, instead of the device
major/minor, as fsid), but maybe both types of workaround conflict on your
system?
You also might consider to simply use unfs3, although performance will be
limited to about 120MB/s, as unfs3 is only single threaded. It also does not
support NFS locks.
If it still does not work out, you should enable Lustre debugging and NFS
debugging, and you should probably use Wireshark to see what is going on.
Hope it helps,
Bernd
--
Bernd Schubert
DataDirect Networks
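For the fsid suggestion above, the change amounts to one extra option in the export's option list. The sketch below shows it as a small filter over an exports line; add_fsid is a hypothetical helper name and the value 1 is an arbitrary illustration, not a value anyone in this thread used.

```shell
# Append fsid=<value> to the first parenthesised option list on each line.
add_fsid() {
    sed -E "s/\(([^)]*)\)/(\1,fsid=$1)/"
}

# Example with the export from the original post:
# echo '/mnt/lustre_mail_fs 192.168.100.0/24(ro,insecure)' | add_fsid 1
# prints: /mnt/lustre_mail_fs 192.168.100.0/24(ro,insecure,fsid=1)
```

After editing /etc/exports by hand, "exportfs -v -r" (as used earlier in the thread) re-reads the file.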
When it comes to inexplicable permission problems, have you checked if
SELinux is turned off on the NFS server?
Regards,
Daniel.
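A quick way to answer that question on the NFS server is to ask for the current SELinux mode. This is a minimal sketch: selinux_mode is a hypothetical wrapper name, and it reports "unknown" when the getenforce tool is not installed.

```shell
# Print the current SELinux mode (Enforcing, Permissive or Disabled),
# or "unknown" if getenforce is unavailable or fails.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce 2>/dev/null || echo unknown
    else
        echo unknown
    fi
}

# Example:
# selinux_mode
```

If the mode is Enforcing and auditd is running, denials are typically recorded in /var/log/audit/audit.log, which is another place to look for the refused stat.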
Thanks.
Suvendra.
THANK YOU!!
So, I set SELinux to permissive mode, adjusted iptables (which wasn't part
of the original problem, but I didn't save my rules before rebooting), and
guess what? It works. :)
YAY!
I think my sysadmin badge needs to be revoked for a day...
Cheers, Andreas
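For anyone landing here later, the fix described above has a runtime part and a persistent part. The sketch below assumes the standard /etc/selinux/config location; the helper names are hypothetical, and the second helper only prints the modified file so the result can be reviewed before anything is overwritten.

```shell
# Runtime only (lasts until the next reboot), run as root:
#   setenforce 0

# Print the mode configured for boot time, e.g. "enforcing".
configured_selinux_mode() {
    sed -n 's/^SELINUX=//p' "${1:-/etc/selinux/config}"
}

# Print the config with the boot-time mode changed to permissive.
# This writes to stdout only; it does not modify the file.
permissive_config() {
    sed 's/^SELINUX=.*/SELINUX=permissive/' "${1:-/etc/selinux/config}"
}

# Example:
# configured_selinux_mode
# permissive_config > /tmp/selinux-config.new
```

Permissive mode still logs what would have been denied, so the audit log can be used afterwards to write a narrower policy instead of leaving enforcement off.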