Switch !O_CREAT case to use of do_last()
... and now we have all intents crap well localized
Signed-off-by: Al Viro <vi...@zeniv.linux.org.uk>
:040000 040000 53c1effc5b22746bb83cdbc6a419bf898067882d 9606f7275fde188a056fd174aa1d622ac0630893 M fs
I have a file corruption regression bisected to above. (The one step before def4af30cf945a3735ffca865788ea84b30b25d9
was fine)
The test:
- I have an exofs filesystem mounted on /mnt/exofs
- []$ cd /mnt/exofs/some_linux_git; git status;
All is fine
- []$ mount -t nfs4 -o minorversion=0 localhost:/ /mnt/nfs
(Where etc/exports will export /mnt/exofs via nfs4.1)
- []$ cd /mnt/nfs/some_linux_git; git status;
This will fail and will corrupt the .git/index file. Sometimes the file would be
too short, and sometimes the file will become a directory (Yes really)
Simple revert of this patch over 2.6.34-rc2 will not work so I've not checked it.
A single threaded read like: "diff -r /mnt/nfs/some_linux_git /mnt/nfs/some_linux_git.back"
or even "diff -r /mnt/exofs/some_linux_git /mnt/nfs/some_linux_git" works fine.
A simple dd write test works fine as well.
So it looks like a race that only something like git can exercise, with multi threaded
reads and writes, through nfs/nfsd.
I know Benny also had failures with files export ontop of 2.6.34-rc2
(I looked at the patch, it is totally out of my league, please help)
I will try to reproduce this with a more conventional FS, or any thing else
I should test?
Thanks
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
What happens if you export to box running older kernel *or* from box
running older kernel? IOW, is that nfsd or nfs client getting unhappy?
I'd suspect the latter, but...
Only via NFS. All local access is fine.
After the corruption above I can cd to the local mount cp a fresh copy
of .git/index file and play around just fine.
Once I return to the NFS mounted directory, a git status will do it.
It does not matter if caches are cold (Takes a long time) or hot it happens
every time.
Weird I know, I'm playing some more with it as we speak
Thanks
Bloody impressive... Does that happen to underlying fs or to what you
are seeing via NFS?
Good question, I'm just getting to that because currently it's all
over localhost (same kernel, BTW inside a UML)
I will try what you said. Please through any other tests on me, if needed.
Boaz
Very interesting... Just to see which path we are hitting: add
if (IS_ERR(nd->intent.open.file))
printk("foo: %s", pathname);
right after
error = do_lookup(nd, &nd->last, path);
if (error)
goto exit;
in fs/namei.c:do_last() and see whether we are hitting it or not on objects
that get corrupted.
As you suspected old-server+new-client fails. any-thing+old-client is
fine. (two separate machines this time)
> Very interesting... Just to see which path we are hitting: add
> if (IS_ERR(nd->intent.open.file))
> printk("foo: %s", pathname);
> right after
> error = do_lookup(nd, &nd->last, path);
> if (error)
> goto exit;
> in fs/namei.c:do_last() and see whether we are hitting it or not on objects
> that get corrupted.
Sorry was busy shifting setups, didn't see your mail, will do that next ...
Thanks
Boaz
Below is what I changed. (I hope its what you meant)
It does not get hit, just that git corruption as before but I don't see the prints.
I'll try running with nfs dbg-prints on see what it does around the time gits complains
Boaz
---
diff --git a/fs/namei.c b/fs/namei.c
index 1c0fca6..d1c96f0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1650,6 +1650,12 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
error = do_lookup(nd, &nd->last, path);
if (error)
goto exit;
+
+ if (IS_ERR(nd->intent.open.file)) {
+ printk(KERN_ERR "foo: %s", pathname);
+ WARN_ON(1);
+ }
+
error = -ENOENT;
if (!path->dentry->d_inode)
goto exit_dput;
Al hi
Would you like to attempt a revert of this patch (or group of patches)
Just to get rid of the thought that git bisect was just peeking the
wrong guy. Maybe it's just something else? Can you understand the
relevance of all this?
(I'll try other setups as well but tomorrow, it's getting late out here)
Thanks for your help
Boaz
Something weird is going on in your trace:
NFS: open file(5b/46ff70a61cf4e159a0339df0e02113bf35f805)
NFS: permission(0:12/323044), mask=0x24, res=0
NFS: revalidating (0:12/323044)
--> nfs4_setup_sequence clp 00000000791f3000 session (null) sr_slotid
128
<-- nfs4_setup_sequence status=0
encode_compound: tag=
decode_attr_type: type=00
decode_attr_change: change attribute=10077553255782547456
decode_attr_size: file size=921
decode_attr_fsid: fsid=(0x0/0x0)
decode_attr_fileid: fileid=0
decode_attr_fs_locations: fs_locations done, error = 0
decode_attr_mode: file mode=00
decode_attr_nlink: nlink=1
decode_attr_owner: uid=-2
decode_attr_group: gid=-2
decode_attr_rdev: rdev=(0x0:0x0)
decode_attr_space_used: space used=0
decode_attr_time_access: atime=0
decode_attr_time_metadata: ctime=1269422731
decode_attr_time_modify: mtime=1269422731
decode_attr_mounted_on_fileid: fileid=0
decode_getfattr: xdr returned 0
A file type of '0' in the above trace is just wrong, and probably
indicates that the server didn't even return that attribute.
I'd say you have a corruption issue either on the server side or on your
client.
Trond
If you see breakage at that commit and do not see it on its parent, we
do have the right guy...
As for reverting, try reverting 781b16775ba0bb55fac0e1757bf0bd87c8879632
first, then this commit.
How consistent are the effects you are seeing from test to test on the same
kernel? This one was very interesting, since it seemed to fail with
-EISDIR while opening .git/objects/pack. Which is a directory and which
should fail with -EISDIR if and only if we pass O_CREAT to open(). And
passing O_CREAT on that one is probably not an intended behaviour of git...
Does anybody else see NFS breakage starting at that commit, BTW? Other
testcases would be useful...
By the way. Does this issue happen with NFSv4 too, or is it only
NFSv4.1?
I also see it....didn't bisect it yet since I was in a rush and had a
simple work around. I was doing a git pull between two repositories on a
NFS 4 w/krb5 security. Got the error about .git/objects/pack (error:
unable to open object pack directory: .git/objects/pack: Is a
directory). If I then run 'ls .git/objects/pack', and then the 'git
pull' again it works.
Doug
Very interesting... The damn thing *is* a directory, which should have
made NFS skip all lookup_instantiate_filp() tricks completely. IOW, it's
not hitting anything intent-related at that case.
I really wonder where the hell does EISDIR come from; no matter how screwed
the cached attributes are, there's not a lot of places where we can return
that sucker. It either comes from something in NFS itself, or it's
may_open() getting MAY_WRITE on that object or it has to see O_CREAT in
open_flag. And acc_mode is not modified after it's set in do_filp_open().
If that calculation (acc_mode by open_flag) would be buggered for O_RDONLY,
we'd be seeing a lot more breakage, starting with ls(1) ;-)
One possibility is that shit hits the fan a bit earlier and git really passes
something odd to that open() as the result of bogus stat(), etc.
Folks, could you try the following: in do_last() move case LAST_DOT:
to immediately after follow_dotdot(nd); and see if that changes anything?
It shouldn't, but if nfs is playing odd tricks with ->d_revalidate() for
directories acting differently depending on LOOKUP_PARENT in flags...
IOW, replace
case LAST_DOTDOT:
follow_dotdot(nd);
dir = nd->path.dentry;
if (nd->path.mnt->mnt_sb->s_type->fs_flags & FS_REVAL_DOT) {
if (!dir->d_op->d_revalidate(dir, nd)) {
error = -ESTALE;
goto exit;
}
}
/* fallthrough */
case LAST_DOT:
case LAST_ROOT:
with
case LAST_DOTDOT:
follow_dotdot(nd);
/* fallthrough */
case LAST_DOT:
dir = nd->path.dentry;
if (nd->path.mnt->mnt_sb->s_type->fs_flags & FS_REVAL_DOT) {
if (!dir->d_op->d_revalidate(dir, nd)) {
error = -ESTALE;
goto exit;
}
}
/* fallthrough */
case LAST_ROOT:
and see what'll change.
It is surely a corruption on the client. I've tested with an old server
and it fails just the same.
> By the way. Does this issue happen with NFSv4 too, or is it only
> NFSv4.1?
>
> Trond
>
-o minorversion=0 or =1 does not matter.
nfsv3 does not have that problem. (Same client different server though)
Boaz
(Below is what I changed)
It makes no difference, fails just the same. Would an "strace" help?
I will now setup an ext2 export. exofs is an heavy ext2 copy paste in every
thing meta-data and nfs-export. I hope to strimline to the basics.
You have asked before, if the failure is reproducible. Yes it fails every
time in exactly the same way. "git status" on that tree has never succeeded
from the 2.6.34-rc2 client.
Boaz
---
diff --git a/fs/namei.c b/fs/namei.c
index 1c0fca6..e8a6a7b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1618,8 +1618,10 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
int error = -EISDIR;
switch (nd->last_type) {
- case LAST_DOTDOT:
+ case LAST_DOTDOT:
follow_dotdot(nd);
+ /* fallthrough */
+ case LAST_DOT:
dir = nd->path.dentry;
if (nd->path.mnt->mnt_sb->s_type->fs_flags & FS_REVAL_DOT) {
if (!dir->d_op->d_revalidate(dir, nd)) {
@@ -1628,7 +1630,6 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
}
}
/* fallthrough */
- case LAST_DOT:
case LAST_ROOT:
if (open_flag & O_CREAT)
goto exit;
@@ -1650,6 +1651,12 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
error = do_lookup(nd, &nd->last, path);
if (error)
goto exit;
+
+ if (IS_ERR(nd->intent.open.file)) {
+ printk(KERN_ERR "foo: %s", pathname);
+ WARN_ON(1);
+ }
+
error = -ENOENT;
if (!path->dentry->d_inode)
goto exit_dput;
> It makes no difference, fails just the same. Would an "strace" help?
It might, especially if you ran it for identical repositories on local
fs and on NFS; at least that way it would be possible to see where do
they diverge...
I reproduced this both over v4 and v4.1 with ext3 exported on the server.
Benny
>
> Trond
>
> _______________________________________________
> pNFS mailing list
> pN...@linux-nfs.org
> http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
ext3 fs exported on the server
client mount -t nfs4 localhost:/ /mnt/localhost
$ strace git status
...
open(".git/objects/pack", O_RDONLY|O_NONBLOCK|O_DIRECTORY|0x80000) = -1 EISDIR (Is a directory)
FWIW, This could be timing related as retrying with nfs debugging on
failed to reproduce the problem.
Benny
> _______________________________________________
> pNFS mailing list
> pN...@linux-nfs.org
> http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
Interesting. It is a directory, indeed, but why the hell does that call
fail with -EISDIR?
Does that happen with nfsv3 or is that v4-only? I'm going to set up v4
server and client and see what happens, but that information could be
useful...
We've seen this with v4 only so far.
BTW I added this WARN_ON:
@@ -1656,8 +1659,10 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
if (path->dentry->d_inode->i_op->follow_link)
return NULL;
error = -ENOTDIR;
- if (*want_dir && !path->dentry->d_inode->i_op->lookup)
+ if (*want_dir && !path->dentry->d_inode->i_op->lookup) {
+ WARN_ON(1);
goto exit_dput;
+ }
path_to_nameidata(path, nd);
audit_inode(pathname, nd->path.dentry);
goto ok;
but it is NOT tripping for this scenario.
(for some reason I saw it tripping when building the kernel over nfs
but it's benign)
Benny
I wonder... What happens if you add
if (error == -EISDIR)
printk("blah: %s", pathname);
right after do_lookup() call in do_last()? That would separate -EISDIR
coming from NFS from the same thing coming from fs/namei.c...
It's ENOTDIR, not EISDIR, anyway. Happens if you ask to open foo/ or
foo with O_DIRECTORY when foo is not a directory.
Duh, you were referring to the hunk I added a WARN_ON to.
Sure. you're right!
>
>>> open(".git/objects/pack", O_RDONLY|O_NONBLOCK|O_DIRECTORY|0x80000) = -1 EISDIR (Is a directory)
>
> The odd thing is the app incorrectly gets EISDIR despite
> the fact that it is opening a directory with O_DIRECTORY.
> _______________________________________________
> pNFS mailing list
> pN...@linux-nfs.org
> http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
Hmm, not according to my strace (or am I blind today? :)
> > open(".git/objects/pack", O_RDONLY|O_NONBLOCK|O_DIRECTORY|0x80000) = -1 EISDIR (Is a directory)
The odd thing is the app incorrectly gets EISDIR despite
the fact that it is opening a directory with O_DIRECTORY.
I'll add that check too.
Benny
> right after do_lookup() call in do_last()? That would separate -EISDIR
> coming from NFS from the same thing coming from fs/namei.c...
> _______________________________________________
> pNFS mailing list
> pN...@linux-nfs.org
> http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
> Does that happen with nfsv3 or is that v4-only? I'm going to set up v4
> server and client and see what happens, but that information could be
> useful...
*grumble*
I'd managed to forget that rpc.imapd in its infinite wisdom relies on
the FPOS I normally disable. Oh, well... rebuilding testbox kernel
with dnotify enabled ;-/
Bingo.
It is returned from do_lookup.
@@ -1648,6 +1654,8 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
/* just plain open? */
if (!(open_flag & O_CREAT)) {
error = do_lookup(nd, &nd->last, path);
+ if (error == -EISDIR)
+ printk("%s: do_lookup returned -EISDIR %s\n", __func__, pathname);
if (error)
goto exit;
error = -ENOENT;
> _______________________________________________
> pNFS mailing list
> pN...@linux-nfs.org
> http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
Indeed this error is coming from the server:
nfsd_dispatch: vers 4 proc 1
nfsv4 compound op #1/7: 22 (OP_PUTFH)
nfsd: fh_verify(16: 01010001 00000000 000e6592 345b9f25 00000000 00000000)
nfsv4 compound op ffff880076734078 opcnt 7 #1: 22: status 0
nfsv4 compound op #2/7: 32 (OP_SAVEFH)
nfsv4 compound op ffff880076734078 opcnt 7 #2: 32: status 0
nfsv4 compound op #3/7: 18 (OP_OPEN)
NFSD: nfsd4_open filename pack op_stateowner (null)
renewing client (clientid 4bab503e/00000002)
nfsd: nfsd_lookup(fh 16: 01010001 00000000 000e6592 345b9f25 00000000 00000000, pack)
nfsd: fh_verify(16: 01010001 00000000 000e6592 345b9f25 00000000 00000000)
nfsd: fh_compose(exp 08:05/106497 objects/pack, ino=943508)
nfsd: fh_verify(16: 01010001 00000000 000e6594 345b9f26 00000000 00000000)
nfsv4 compound op ffff880076734078 opcnt 7 #3: 18: status 21
nfsv4 compound returned 21
> Indeed this error is coming from the server:
>
> nfsd_dispatch: vers 4 proc 1
> nfsv4 compound op #1/7: 22 (OP_PUTFH)
> nfsd: fh_verify(16: 01010001 00000000 000e6592 345b9f25 00000000 00000000)
> nfsv4 compound op ffff880076734078 opcnt 7 #1: 22: status 0
> nfsv4 compound op #2/7: 32 (OP_SAVEFH)
> nfsv4 compound op ffff880076734078 opcnt 7 #2: 32: status 0
> nfsv4 compound op #3/7: 18 (OP_OPEN)
> NFSD: nfsd4_open filename pack op_stateowner (null)
> renewing client (clientid 4bab503e/00000002)
> nfsd: nfsd_lookup(fh 16: 01010001 00000000 000e6592 345b9f25 00000000 00000000, pack)
> nfsd: fh_verify(16: 01010001 00000000 000e6592 345b9f25 00000000 00000000)
> nfsd: fh_compose(exp 08:05/106497 objects/pack, ino=943508)
> nfsd: fh_verify(16: 01010001 00000000 000e6594 345b9f26 00000000 00000000)
> nfsv4 compound op ffff880076734078 opcnt 7 #3: 18: status 21
> nfsv4 compound returned 21
Ho-hum... So it hits the "let's try to open it atomically" path and
gets told to FOAD by server (as it should, of course).
And if we see different behaviour after ls -l, presumably that's a
difference between ->lookup() and ->d_revalidate() paths on client...
OK, I think I see what's going on in this case. However, it doesn't
explain everything; my current theory is that we used to get LOOKUP_DIRECTORY
on the last components in O_DIRECTORY opens and we don't do that now.
That used to derail the is_atomic_open(), now it's hit and there we go.
It's not hard to verify (and it might take care of this testcase), but
I still have questions about the way this code used to work *without*
O_DIRECTORY.
Let's try this: before do_lookup() call there add
if (*want_dir)
nd->flags |= LOOKUP_DIRECTORY;
and see how does it behave.
However, even if it does help, it doesn't explain everything. Normal
open() on a directory without O_DIRECTORY if flags shouldn't fail with
-EISDIR. How did that manage to avoid it all along?
I just wanted to make sure, on top of 2.6.34-rc2 i did:
git revert 781b16775ba0bb55fac0e1757bf0bd87c8879632
Revert "Fix a dumb typo - use of & instead of &&"
and:
git revert 1f36f774b22a0ceb7dd33eca626746c81a97b6a5
Revert "Switch !O_CREAT case to use of do_last()"
The revert is now clean and the problem does not show.
Are you able to reproduce this? Anything I can do to help?
Thanks
Boaz
I'm not sure I mentioned this. If you are able to reproduce the
error, then after the error in "git status" the tree becomes unusable.
Not even from the local_mount. I have not found a method to fix the tree.
So what I do is:
- Before I start playing I save "cp some_tree/.git/index index.back"
- then after the error and tree damage On local mount I:
"rm some_tree/.git/index"
"cp index.back cp some_tree/.git/index"
"cd some_tree; git status"
this will fix it so I can try a new iteration
Yes this fixes it!!
2.6.34-rc2 plus above, now works, horay. (diff attached)
> and see how does it behave.
>
> However, even if it does help, it doesn't explain everything. Normal
> open() on a directory without O_DIRECTORY if flags shouldn't fail with
> -EISDIR. How did that manage to avoid it all along?
---
diff --git a/fs/namei.c b/fs/namei.c
index 1c0fca6..434ad2a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1647,6 +1647,8 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
/* just plain open? */
if (!(open_flag & O_CREAT)) {
+ if (*want_dir)
+ nd->flags |= LOOKUP_DIRECTORY;
error = do_lookup(nd, &nd->last, path);
if (error)
goto exit;
Does open() of directory _without_ O_DIRECTORY work in e.g. vanilla 2.6.33?
It certainly does for local filesystems and it does for NFSv3; does it work
for NFSv4?
In my tests. Every thing is the same safe the client with the above change.
So I guess NFSv4 does something different when asked for directory lookup
as opposed to files lookup. I guess there is something added/removed to
the compound depending on that flag. But I wouldn't know, I am not familiar
with this code. NFSv4 someone?
Boaz
No, it doesn't.
# mount localhost:/usr0/nfs4export /mnt/localhost; strace cat /mnt/localhost/server 2>&1 | grep 'open.*server'; umount /mnt/localhost
open("/mnt/localhost/server", O_RDONLY) = 3
# mount -t nfs4 localhost:/ /mnt/localhost; strace cat /mnt/localhost/server 2>&1 | grep 'open.*server'; umount /mnt/localhost
open("/mnt/localhost/server", O_RDONLY) = -1 EISDIR (Is a directory)
OK, what happens if you do the following:
mount the same fs from two clients
on one client:
mkdir /mnt/weird_name_69
on another:
echo 'main() {open("/mnt/weird_name_69", 0);}' >/tmp/a.c
gcc /tmp/a.c
strace ./a.out
ls -l /mnt/weird_name_69
strace ./a.out
Will the first strace show EISDIR and the second succeed?
From my reading of that code (2.6.33, before all that stuff got merged),
we have different behaviour depending on which codepath do we hit.
If we go through ->d_revalidate(), it sees that it's not S_ISREG() and
doesn't try to play with atomic open. If we go through ->lookup(), we
tell the server to open it, and when it tells us to bugger off (it's a
directory, NFSv4 doesn't support atomic open for those), -EISDIR is
passed to caller. Which leads to open() failing.
It definitely looks like a bug. Masked by O_DIRECTORY in 2.6.33. Bug
in fs/namei.c patch has exposed that crap both for O_DIRECTORY and !O_DIRECTORY
cases.
So immediate fix will need to be along the lines of "add LOOKUP_DIRECTORY
even on the last step if we have *want_dir" (and I'd probably get rid of
want_dir then and just abuse nd->flags), but there's a real NFS bug as
well.
Gets better - if you do ls -l /mnt/localhost/server and then repeat that
open(), it'll succeed.
Correct :)
NFS bug, you mean by not being consistent when opening a directory from
cache or when having to fetch it for the first time? I guess...
But I still think it is a problem that the application, git in this case,
did an open(,O_DIRECTORY...) and because of code reuse some low-level did
not see that flag, relaying on the fact that in most systems they don't care.
The nd->flags things looks like the right thing to me.
And about the NFS thing, let misbehaving applications byte there own bullets.
But a good app like git, does put an O_DIRECTORY on the directory open, and
therefore will receive consistent results, right?
Please push a fix, meanwhile I'm running with this one, but I will need it
in linux-next later.
Thanks for everything
Boaz
So, previously (pre 1f36f77), do_last called do_path_lookup with
lookup_flags(open_flag)|LOOKUP_OPEN and lookup_flags did
- if (f & O_DIRECTORY)
- retval |= LOOKUP_DIRECTORY;
and this was dropped in 1f36f77, right?
Benny
> _______________________________________________
> pNFS mailing list
> pN...@linux-nfs.org
> http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
What the hell do you mean, "misbehaving"? O_DIRECTORY means one thing:
"refuse to open if it turns out to be a non-directory". If open() succeed
with it and fails without, it's a definite bug. Application has every
right to call the damn thing without that flag.
It *can* be used as a hint by fs code, and it's a useful hint, so yes,
we need to restore LOOKUP_DIRECTORY (until we get rid of the entire
d_revalidate/lookup+open mess and pass struct file + open flags directly
to fs method for the last step of pathname resolution; basically, most
of the do_last() guts would become that method, killing the intents
crap for good; new method would simply check for O_DIRECTORY presence
instead of looking at LOOKUP_DIRECTORY).
However, NFS behaviour in cases when O_DIRECTORY is absent is a separate
(and quite real) bug. If nfs_atomic_lookup() sees -EISDIR from attempt
to open on server, it should either issue plain lookup or try to use the
information it got in response to open attempt (if there's enough of it
there). Passing -EISDIR to ->lookup() caller is simply wrong.
I'm going to send a fix for O_DIRECTORY case (restoring the behaviour
we had in 2.6.33) today, but NFS side of things also needs to be dealt
with.
Yes, but we should not be seeing crap like that in the XDR decoding of a
revalidate request. I strongly suspect the server should be returning an
error here, probably NFS4ERR_BAD_HANDLE... After we fix the client side,
I'd like to take a look at the server and see why you are getting bogus
info for your GETATTR.
Trond
Ha, OK, sorry I thought exactly the opposite. I assumed it meant
"don't allow a directory to be opened, unless set". As an application
hint of saying I'm going to do file things to this handle, don't want
any dirs.
So yes definitely it looks like an NFS bug masked by the fact that everyone
is using opendir() form libc for this.
> It *can* be used as a hint by fs code, and it's a useful hint, so yes,
> we need to restore LOOKUP_DIRECTORY (until we get rid of the entire
> d_revalidate/lookup+open mess and pass struct file + open flags directly
> to fs method for the last step of pathname resolution; basically, most
> of the do_last() guts would become that method, killing the intents
> crap for good; new method would simply check for O_DIRECTORY presence
> instead of looking at LOOKUP_DIRECTORY).
>
> However, NFS behaviour in cases when O_DIRECTORY is absent is a separate
> (and quite real) bug. If nfs_atomic_lookup() sees -EISDIR from attempt
> to open on server, it should either issue plain lookup or try to use the
> information it got in response to open attempt (if there's enough of it
> there). Passing -EISDIR to ->lookup() caller is simply wrong.
>
> I'm going to send a fix for O_DIRECTORY case (restoring the behaviour
> we had in 2.6.33) today, but NFS side of things also needs to be dealt
> with.
Thank you Al for fixing this, I hope some capable NFS person will take
that issue to heart.
Boaz
> > I'm going to send a fix for O_DIRECTORY case (restoring the behaviour
> > we had in 2.6.33) today, but NFS side of things also needs to be dealt
> > with.
>
> Thank you Al for fixing this, I hope some capable NFS person will take
> that issue to heart.
The NFSv4 fix is a trivial 1 liner. Pushed to bugfixes.
Now to understand that screwed up xdr decode...
Trond
------------------------------------------------------------------------------------------
NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR
From: Trond Myklebust <Trond.M...@netapp.com>
Signed-off-by: Trond Myklebust <Trond.M...@netapp.com>
---
fs/nfs/dir.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index c6f2750..be46f26 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1025,12 +1025,12 @@ static struct dentry *nfs_atomic_lookup(struct inode *dir, struct dentry *dentry
res = NULL;
goto out;
/* This turned out not to be a regular file */
+ case -EISDIR:
case -ENOTDIR:
goto no_open;
case -ELOOP:
if (!(nd->intent.open.flags & O_NOFOLLOW))
goto no_open;
- /* case -EISDIR: */
/* case -EINVAL: */
default:
goto out;
Thanks
> Now to understand that screwed up xdr decode...
>
Tell me if you need that I run any tests. I still have that
failing setup if you need it.
> Trond
Boaz
Do you have a wireshark (binary) dump that includes that particular
getattr call and reply? That would help.
Trond
Wahoo sorry, it's already 20:31 out here It'll have to wait til Sunday
(You know that Hebrew weekend).
Tell me if you still need it sure, NP, I will make one first thing Sunday
morning.
> Trond
>
Boaz