zfs-fuse nfs sharing problems

Jorge Gea

Sep 7, 2012, 9:02:59 AM

to zfs-...@googlegroups.com

Hi everybody,

We are facing a tricky problem with the zfs-fuse library.

The whole story: we need a ZFS file system running over FUSE and
shared over NFS properly, without the known ESTALE problems.
To do so, we need to add the often-mentioned options
"use_ino,noforget" to the FUSE mount options.

We are using the latest version of the fuse library (2.9.1) and
zfs-fuse (0.7.0), and the fuse kernel module is loaded.
However, zfs-fuse crashes if we try to apply those options.
Even hard-coding them in the source code and recompiling
gives the same result.

Any ideas?

Thank you very much in advance.

sehe

Sep 7, 2012, 11:07:12 AM

to zfs-...@googlegroups.com

When does it crash?

Upon launch? Upon mounting a dataset? Upon sharing a dataset? Upon some specific operation?

Oh, and please post the contents of /etc/zfs/zfsrc.

Regards,

Seth

Emmanuel Anne

Sep 7, 2012, 12:46:38 PM

to zfs-...@googlegroups.com

2012/9/7 Jorge Gea <jorg...@gmail.com>

Hi everybody,

We are facing a tricky problem with the zfs-fuse library.

The whole story: we need a ZFS file system running over FUSE and
shared over NFS properly, without the known ESTALE problems.
To do so, we need to add the often-mentioned options
"use_ino,noforget" to the FUSE mount options.

I don't know where you read this, but probably not on a zfs-fuse list!

The only known problem with NFS arises when you export a zfs-fuse pool that contains sub-filesystems: each one must be exported separately with its own fsid option. This is handled automatically if you use the sharenfs property ("zfs set sharenfs=on <dataset>"). If you prefer to use the exports file instead, be sure to set the fsid parameter correctly where needed.

With this I have never had an ESTALE error. I must say that I use the NFS share only to read files remotely, not to write them, but this error is not normally specific to write operations.
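
For the manual route, the exports entries might look like what the following sketch prints. The function name, the client network and every export option other than fsid are assumptions made for illustration, not a tested configuration; the only point is that each exported filesystem gets its own distinct fsid.

```shell
#!/bin/bash
# Hypothetical helper: print one /etc/exports line per mountpoint,
# giving each exported filesystem its own fsid (1, 2, 3, ...).
# The client spec and the rw,no_subtree_check options are assumptions.
print_exports() {
  local clients="$1"; shift
  local fsid=1
  local mp
  for mp in "$@"; do
    printf '%s %s(rw,no_subtree_check,fsid=%d)\n' "$mp" "$clients" "$fsid"
    fsid=$((fsid + 1))
  done
}

# Example: a pool and one sub-filesystem, exported separately.
print_exports 192.168.1.0/24 /agg1 /agg1/vol1
```

The printed lines would still have to be reviewed, added to the exports file by hand, and activated with exportfs -ra.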

Moreover, the options you mention, "use_ino" and "noforget", are specific to the FUSE high-level API; they are refused by the low-level FUSE API that zfs-fuse uses. You can see this by editing your /etc/zfs/zfsrc and changing the fuse options like this:

fuse-mount-options = default_permissions,nonempty,use_ino,noforget

Then run the daemon manually: sudo zfs-fuse -n

and then, in another window: sudo zfs mount -a

You'll first get an error about "use_ino" in the daemon window, and then one about "noforget" if you remove "use_ino". Neither of these options can be used:

fuse: unknown option `use_ino'

fuse: unknown option `noforget'
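
To spot these two options before starting the daemon, a small check along these lines could help. It is only a sketch: the function name is made up, and it assumes the zfsrc uses a single "fuse-mount-options = a,b,c" line as in the example above.

```shell
#!/bin/bash
# Hypothetical helper: report use_ino / noforget in the fuse-mount-options
# line of a zfsrc-style file. Prints each offending option and returns 1
# if at least one was found, 0 otherwise.
check_fuse_options() {
  local file="$1" opts opt bad=0
  # Take the value after '=' on the fuse-mount-options line, drop whitespace.
  opts=$(sed -n 's/^[[:space:]]*fuse-mount-options[[:space:]]*=[[:space:]]*//p' "$file" \
         | tr -d '[:space:]')
  for opt in use_ino noforget; do
    case ",$opts," in
      *",$opt,"*) echo "refused by the low-level API: $opt"; bad=1 ;;
    esac
  done
  return "$bad"
}

# Example (path from this thread):
# check_fuse_options /etc/zfs/zfsrc
```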

I just tested this, and I can confirm that it should not cause a crash, so there is probably something very wrong on your system, maybe the running kernel (although fuse 2.9.x is supposed to be backwards compatible with everything down to at least 2.6.9!). I have no idea what it is; check syslog. Here it just writes an error to stdout. OK, I don't use the official 0.7.0 either, but I don't think there is any difference at this level; see the URL of my git repo in my signature if you want.

(Tested with Debian libfuse 2.9.0 and kernel 3.5.3.)

To sum up: you don't need any specific FUSE option for NFS. You only need either to use zfs sharenfs or to set the fsid parameter manually in the exports file, that's all.

Jorge Gea

Sep 8, 2012, 5:42:58 AM

to zfs-...@googlegroups.com

Thank you very much for your replies.

Sehe, the problem occurs at daemon init, as Emmanuel was also able to reproduce. Like Emmanuel, we could not apply the use_ino and noforget parameters in /etc/zfs/zfsrc, nor as daemon parameters. So we tried to do it in the source code, with the same results. Now, thanks to Emmanuel's explanation, I understand that those parameters are useless here because they belong to the high-level API, which zfs-fuse does not use, so we have to find another way to solve the problem.

Emmanuel, we already tried assigning the same fsid on server and clients, and also tried several parameter combinations in exports, but unfortunately got the same results. We only get this kind of problem with the ZFS file systems; in the same environment, we can share NFS volumes from other file systems without this problem. We are using 3.0.x series kernels. After reading a lot about the fuse library and the fuse kernel module, we thought the solution might be to apply those parameters to zfs-fuse.

On Monday I will be able to run new tests, and I'm going to try changing the share method to 'zfs sharenfs' as you recommended; we have not tried this yet. I will report the results here, hopefully good ;), along with any new useful data about the system if the problem persists, or about the possible solution.

Thank you again, I really appreciate your help.

Jorge

Jorge Gea

Sep 11, 2012, 6:11:42 AM

to zfs-...@googlegroups.com

Hi again,

I have done several tests. I have tried 'zfs set sharenfs=on agg1/vol1', also the exports file with the same fsid on server and client, and also mounting the volume on the client with the options 'defaults,timeo=30,retrans=5,rsize=8192,wsize=8192,intr' to try to avoid desynchronization. I am still having stale file handle problems... The problem can be reproduced by running the script at the end of this post against the mountpoint of the NFS-shared volume. For example:

bash nfs-crash /agg1/vol2/prueba

At the same time, on the client, I run:

ls -l

several times until the error occurs. If it does not occur the first time the script runs on the server, it does the second or third time. The system recovers, so it is only a temporary condition, but in backup processes and other critical situations this is a serious problem.

Here is the little script:

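#!/bin/bash
# Stress test: create 1000 directories of 100 small files each,
# then archive every file with cpio.
# Usage: bash nfs-crash [BASE_DIR]   (defaults to the current directory)
BASE_DIR="${1:-.}"

for D in $(seq 1 1000); do
  DIR="$BASE_DIR/dir$D"
  if [ ! -d "$DIR" ]; then
    echo -n "$DIR-> "
    mkdir "$DIR"
    for F in $(seq 1 100); do
      echo -n "."
      # Each file holds the numbers 1..100 on a single line.
      FILE=$(seq 1 100)
      echo $FILE > "$DIR/$F.txt"
    done
    echo " done"
  fi
done

# List every regular file except the archive itself, then archive the list.
find "$BASE_DIR" -type f \( ! -iname "backup.cpio" \) > "$BASE_DIR/lista_ficheros.txt"
cpio -o < "$BASE_DIR/lista_ficheros.txt" > "$BASE_DIR/backup.cpio"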
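
The client side of the test (repeating ls -l by hand until the error shows up) could also be automated. This is only a sketch of that idea: the function name, the default of 100 attempts and the POLL_DELAY variable are assumptions, not part of the test as it was actually run.

```shell
#!/bin/bash
# Hypothetical client-side helper: list DIR up to COUNT times and stop at
# the first failure, printing the error message from ls (for example a
# stale file handle). POLL_DELAY is the pause between attempts in seconds.
poll_ls() {
  local dir="$1" count="${2:-100}" i err
  for i in $(seq 1 "$count"); do
    # Keep only stderr; the listing itself is discarded.
    if ! err=$(ls -l "$dir" 2>&1 >/dev/null); then
      echo "ls failed on attempt $i: $err"
      return 1
    fi
    sleep "${POLL_DELAY:-1}"
  done
  echo "no error in $count attempts"
}

# Example, run on the client inside the NFS mount while the server script runs:
# poll_ls . 500
```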

Also, in case it helps, here is the relevant content of my /etc/zfs/zfsrc:

vdev-cache-size = 10
max-arc-size = 100
zfs-prefetch-disable
fuse-attr-timeout = 3600
fuse-entry-timeout = 3600
fuse-mount-options = default_permissions

Thanks in advance for any help.

Emmanuel Anne

Sep 11, 2012, 7:26:07 AM

to zfs-...@googlegroups.com

Hmm, did I mention that you should use NFSv3 and not NFSv4?

Maybe not in this post!
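
To check which version a client actually ended up with, something like the following could be run against /proc/mounts. It is a sketch with a made-up function name, and it assumes the kernel lists a vers= (or nfsvers=) option for NFS mounts in /proc/mounts, which may not hold on every kernel.

```shell
#!/bin/bash
# Hypothetical helper: read /proc/mounts-style lines on stdin and print
# "<mountpoint> <version option>" for every NFS mount found.
nfs_versions() {
  awk '$3 ~ /^nfs4?$/ {
    n = split($4, opts, ",")
    v = "vers=unknown"
    for (i = 1; i <= n; i++) if (opts[i] ~ /^(nfs)?vers=/) v = opts[i]
    print $2, v
  }'
}

# On a Linux client:
# nfs_versions < /proc/mounts
```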

Anyway, I ran your script on a simple setup: the script ran on the server in the base directory of the export while ls -R ran continually in the same directory on the client. No problem. I did that more than 10 times, and stopped when dir79 was created.

The directory was simply exported with zfs set sharenfs=on

and mounted using autofs on the other side.

No problem to report!

To be precise, there is absolutely no NFS-specific code in zfs-fuse; it is all transparent for us and handled completely by fuse.

This is really the simplest setup I could make, with no fancy options anywhere at all!

--
To post to this group, send email to zfs-...@googlegroups.com
To visit our Web site, click on http://zfs-fuse.net/

--
my zfs-fuse git repository : http://rainemu.swishparty.co.uk/cgi-bin/gitweb.cgi?p=zfs;a=summary

Jorge Gea

Sep 11, 2012, 11:40:08 AM

to zfs-...@googlegroups.com

OK then, could you give this a try? It is an improved version of the previous script, and with it I always get the error.
After running it, wait a moment after the directory creation process ends and you should get it too (ls -l on the client).

Anyway, you're right, it might be a fuse problem. The problem is probably related to the fuse kernel module; we are looking for a solution in that direction too, but have not been able to solve it so far...

Thanks again.

The script, if you could try it, is:

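#!/bin/bash
# Stress test: create 100 directories of 100 small files each,
# then archive every file with cpio, 20 times over.
# Usage: bash <script> [BASE_DIR]   (defaults to the current directory)
BASE_DIR="${1:-.}"

for D in $(seq 1 100); do
  DIR="$BASE_DIR/dir$D"
  if [ ! -d "$DIR" ]; then
    echo -n "$DIR-> "
    mkdir "$DIR"
    for F in $(seq 1 100); do
      echo -n "."
      # Each file holds the numbers 1..100 on a single line.
      FILE=$(seq 1 100)
      echo $FILE > "$DIR/$F.txt"
    done
    echo " done"
  fi
done

# Build the file list once, excluding the archive itself.
if [ ! -e "$BASE_DIR/lista_ficheros.txt" ]; then
  find "$BASE_DIR" -type f \( ! -iname "backup.cpio" \) > "$BASE_DIR/lista_ficheros.txt"
fi

# Rewrite the archive 20 times to keep the volume busy.
for D in $(seq 1 20); do
  cpio -o < "$BASE_DIR/lista_ficheros.txt" > "$BASE_DIR/backup.cpio"
done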

Emmanuel Anne

Sep 11, 2012, 12:18:24 PM

to zfs-...@googlegroups.com

Well, the first run went OK, except that I ran it in the root of my directory, so the backup took everything, and since it's a deduped volume it was taking a very long time. Apart from that everything ran fine: I ran ls -tr -R many times on the client while the backup was growing and everything was OK.

After a while I stopped it; I'll run it later in a separate directory, just to be sure.

But as I said, the cause is more likely to be the NFS software (kernel server here), or the NFS version (3 here; I don't know whether it works with 4, I never actually tested it), or fuse if you have a very old kernel (unlikely, but who knows?).

Yes, I can confirm that after running it in a separate directory (much faster! ;-)), everything ran smoothly here: the script on the server, ls -tr -R on the client while the script was running and after it had finished, no problem at all.

Jorge Gea

Sep 11, 2012, 12:43:04 PM

to zfs-...@googlegroups.com

Damn, if you couldn't reproduce the error with that script, I'm really lost here :S.

We are running a 3.0.8 kernel, also with nfs-kernel-server (1:1.2.2-4), and also NFSv3. I don't know what other difference there could be between your system and mine. This is really strange.
Well, I'll continue researching on the fuse side in the meantime; if I have news I'll post it here.

Thanks for your time and guidance, Emmanuel, and sorry for the script problem ;)

Durval Menezes

Sep 11, 2012, 12:50:23 PM

to zfs-...@googlegroups.com

Hi Jorge,

If I were in your place, I would try an older, "more stable" kernel; my "go to" kernel for debugging crazy stuff like this is currently the latest 2.6.32.x.

Just a suggestion...

Cheers,
--
Durval.

Emmanuel Anne

Sep 11, 2012, 1:09:29 PM

to zfs-...@googlegroups.com

lol, well, I don't know what to say. I use 3.5.3 here because of btrfs (which gets better with every new kernel version, so you are encouraged to upgrade often if you use it!). But NFS works OK as well.

The ESTALE error says that the inode number taken for a file is bad. Maybe there is a bad cache somewhere? I know a specific cache was added for NFS lately, but I haven't experimented much with it.

You could also try lowering fuse-attr-timeout and fuse-entry-timeout, which control exactly that kind of cache (the name-to-inode association for entry, and the attribute-to-inode association for attr). Try setting both to 0 to disable the cache completely (performance will suffer, but it's for testing); if that works, you can set them to 1 (which is already vastly better than 0) and try again.
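
Switching between those test values could be scripted roughly as follows. This is a sketch with a hypothetical function name; it only rewrites timeout lines that already exist in the file, writes the result to stdout rather than editing in place, and the daemon presumably has to be restarted with the new file for the values to take effect.

```shell
#!/bin/bash
# Hypothetical helper: print a copy of a zfsrc-style file with
# fuse-attr-timeout and fuse-entry-timeout both set to VALUE.
# All other lines pass through unchanged.
with_fuse_timeouts() {
  local file="$1" value="$2"
  sed -E "s/^([[:space:]]*fuse-(attr|entry)-timeout)[[:space:]]*=.*/\1 = ${value}/" "$file"
}

# Example: first test with the cache disabled, later with 1 second.
# with_fuse_timeouts /etc/zfs/zfsrc 0 > /tmp/zfsrc.test
```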

Good luck, and post again if you find the solution...

Emmanuel Anne

Sep 12, 2012, 9:19:12 AM

to zfs-...@googlegroups.com

Another wild idea out of nowhere: normally NFSv3 handles 64-bit inodes and sizes, so you shouldn't have any problem with ZFS...

But I never actually tested this on a very big volume; I always used pools < 1 TB for that.

It normally works, though. I remember seeing the inode handled in two parts for 64 bits in the code, so it should work anywhere, but just to be 100% sure, maybe you could test your script on a small volume and see whether it makes any difference.

For the record, I ran it with default settings everywhere, including fuse-attr-timeout = fuse-entry-timeout = 3600.

Jorge Gea

Sep 12, 2012, 11:58:22 AM

to zfs-...@googlegroups.com

Today I tried setting fuse-attr-timeout=0 and fuse-entry-timeout=0, first with a big volume and then with a small one, but it did not seem to make any difference. No luck yet...

I have now checked the latest stable kernel (3.5.3), and the source code of the fuse module is quite different from what I have in my kernel version (3.0.8).
Changing the entire kernel is pretty complicated in this environment right now. Maybe I'll try it for testing with an older version, but even if that worked, I couldn't adopt it as the solution
(many other things involved could stop working, and there would be many, many tests to do...).

But I'm going to try applying the latest fuse module source code to my kernel, if I can, and see what happens; maybe this magically solves it...

Thank you all. I'll post the results when I have them.

Jorge Gea

Sep 12, 2012, 12:18:58 PM

to zfs-...@googlegroups.com

In case it is of interest to any of you, someone on the fuse-devel list said this after I asked them about my problem:

"I found a potential issue in FUSE's NFS support that might be related to this (or maybe not; this needs confirmation from the zfs-fuse team :-).

In the kernel's VFS operations, FUSE uses fuse_ino to build an NFS file handle. This might cause a stale file handle if fuse_ino is not persistent, because the kernel nfsd needs file handles to be persistent (and this is because NFSv3 clients need this). But in most cases fuse_ino is not persistent (for example, with the fuse high-level API).

This requirement actually means that, to support NFS, the file system needs to use fuse_ll_ops and must also be able to work with the input fuse_ino even if it has been totally "forgotten". That might be too hard for file system implementations... I don't know about fuse zfs, but at least you can never do this in an NFS mount...

Would it be possible to move the export_ops from the kernel to the fuse library, so that the file system can implement this on its own? I think this might also help to support the open-by-handle system calls in new kernels.

http://thread.gmane.org/gmane.linux.file-systems/39647 "

Emmanuel Anne

Sep 12, 2012, 12:23:48 PM

to zfs-...@googlegroups.com

No need to worry about that: we map the inode directly to the NFS handle, so it is definitely persistent!
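
Anyone who wants to sanity-check inode stability on their own server could compare two snapshots of inode numbers, for example before and after restarting the daemon. This is a rough sketch only: the function name is made up, it assumes GNU find (for -printf), and identical snapshots would not by themselves prove that NFS file handles stay valid.

```shell
#!/bin/bash
# Hypothetical check: print "inode path" for everything under DIR, sorted
# by path, so that two snapshots can be compared with diff.
# Assumes GNU find, which provides -printf.
inode_snapshot() {
  find "$1" -printf '%i %p\n' | sort -k2
}

# Example (dataset path from this thread):
# inode_snapshot /agg1/vol1 > /tmp/before.txt
# ... restart zfs-fuse and remount ...
# inode_snapshot /agg1/vol1 > /tmp/after.txt
# diff /tmp/before.txt /tmp/after.txt
```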

As for your idea: it seems much more complicated to backport the fuse module to an older kernel than to upgrade the kernel!

For the record, I used 3.0.12 for quite some time with no NFS problems to report (even though I used it only to read, not to write).
