Hi everybody,
We are facing a tricky problem with the zfs-fuse library.
The whole story: we need to get a ZFS file system working over FUSE and share it over NFS properly, without the well-known ESTALE problems. To do so, we need to add the frequently mentioned parameters "use_ino,noforget" to the FUSE mount options.
We are using the latest version of the fuse library (2.9.1) and zfs-fuse (0.7.0), and the fuse kernel module is loaded. However, zfs-fuse crashes if we try to apply these options. Even when we hard-code the parameters in the source code and recompile, we get the same result. Any ideas?
Thank you very much in advance.
> We are facing a tricky problem with zfs-fuse library.
> The whole story is we need to get zfs file system working over
> fuse and share this system over nfs properly, without the ESTALE
> known problems. To do so, we need to apply the many times commented
> parameters "use_ino,noforget" to fuse mount options.
I don't know where you read this, but probably not on a zfs-fuse list!
The only known problem with NFS is when exporting a zfs-fuse pool that
contains sub-filesystems: they must be exported separately using the
fsid option, which is handled automatically if you use the "zfs sharenfs"
command. If you prefer to do it with the exports file instead, be sure to
set the fsid parameter correctly where needed.
With this I never got any ESTALE error. I must say that I use the NFS share
only to read files remotely, not to write them, but this error is not
normally specific to write operations.
Moreover, the options you mention, "use_ino" and "noforget", are specific to
the fuse high-level API; they are refused by the low-level fuse API used by
zfs-fuse. You can see that by editing your /etc/zfs/zfsrc and changing the
fuse options like this:
fuse-mount-options = default_permissions,nonempty,use_ino,noforget
Then run the daemon manually: sudo zfs-fuse -n
and then in another window: sudo zfs mount -a
You'll get the error about "use_ino" first in the daemon window, and then
the one about "noforget" if you remove "use_ino". Neither of these options
can be used:
fuse: unknown option `use_ino'
fuse: unknown option `noforget'
Now I just tested this, and I can confirm that it shouldn't cause any crash,
so there is probably something very wrong on your system, maybe the running
kernel (but fuse 2.9.x is supposed to be backwards compatible with
everything back to at least 2.6.9!). No idea what it is; check syslog, but
here it just writes an error on stdout. OK, I don't use the official 0.7.0
either, but I don't think there is any difference at this level; see the URL
of my git repo in my signature if you want.
(Tested with Debian libfuse 2.9.0 and kernel 3.5.3.)
Anyway, to sum up: you don't need any specific fuse option for NFS; you
only need either to use zfs sharenfs or to set the fsid parameter manually
in the exports file, that's all.
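For the manual route, a minimal sketch of generating exports lines with a distinct fsid per dataset might look like the following. The client range, the other export options (ro, no_subtree_check) and the function name are assumptions made for illustration, not settings taken from this thread; the only point is that every exported sub-filesystem gets its own fsid= value.

```shell
#!/bin/bash
# Sketch: print one /etc/exports line per dataset mountpoint, each with its own fsid.
# The client spec and the option list are placeholders (assumptions).

make_exports_lines() {
    local clients="$1"   # e.g. 192.168.1.0/24 (hypothetical)
    local fsid=1         # start at 1: fsid=0 marks the NFSv4 export root
    local mountpoint
    while IFS= read -r mountpoint; do
        # Skip anything that is not an absolute path (e.g. "none" or "legacy").
        case "$mountpoint" in
            /*) ;;
            *) continue ;;
        esac
        printf '%s %s(ro,no_subtree_check,fsid=%d)\n' "$mountpoint" "$clients" "$fsid"
        fsid=$((fsid + 1))
    done
}
```

It reads mountpoints on stdin, for instance from `zfs list -H -o mountpoint`, and the output would still have to be reviewed before being appended to /etc/exports and activated with `exportfs -ra`.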
Sehe, the problem occurs upon daemon init, as Emmanuel could also reproduce. Like Emmanuel, we could not apply the use_ino and noforget parameters in /etc/zfs/zfsrc, nor as daemon parameters. So we tried to do it in the source code, with the same results. Now, thanks to Emmanuel's explanation, I understand that those parameters are useless because they belong to the high-level API, which zfs-fuse does not use, so we have to find another way to solve the problem.
Emmanuel, we already tried assigning the same fsid on server and clients, and also tried several parameter combinations in exports, but unfortunately got the same results. We only get this kind of problem with the ZFS file systems; in the same environment, we can share NFS volumes over other file systems without it. We are using 3.0.x series kernels. So, after reading a lot about the fuse library and the fuse kernel module, we thought the solution might be applying those parameters to zfs-fuse.
On Monday I will be able to run new tests, and I'm going to try changing the share method to 'zfs sharenfs' as you recommended; we have not tried this. I will report the results here, hopefully good ;), along with any new useful data about the system if the problem persists, or about the possible solution.
Thank you again, I really appreciate your help.
Jorge
On Friday, 7 September 2012 15:02:59 UTC+2, Jorge Gea wrote:
> We are facing a tricky problem with zfs-fuse library.
> The whole story is we need to get zfs file system working over
> fuse and share this system over nfs properly, without the ESTALE
> known problems. To do so, we need to apply the many times commented
> parameters "use_ino,noforget" to fuse mount options.
> We are using latest version of fuse lib (2.9.1) and
> zfs-fuse (0.7.0), also have running the fuse kernel module.
> However, zfs-fuse crashes if we try to apply the mentioned
> options. Even applying directly those parameters
> on the source code and recompiling, we get same result.
> Any ideas?
I have done several tests. I have tried 'zfs set sharenfs=on agg1/vol1', an exports entry with the same fsid on server and client, and mounting the volume on the client with the options 'defaults,timeo=30,retrans=5,rsize=8192,wsize=8192,intr' to try to avoid desynchronization. I am still having stale problems... The problem can be reproduced by running the script at the end of this post at the mount point of the NFS-shared volume. For example:
bash nfs-crash /agg1/vol2/prueba
Simultaneously, on the client, I execute 'ls -l' several times until the error occurs. If it does not occur the first time the script runs on the server, it does the second or third time. The system recovers and it's only a temporary situation, but in backup processes and other critical situations this is a big deal.
Here is the little script:
#!/bin/bash
# Usage: bash nfs-crash [BASE_DIR]   (defaults to the current directory)
BASE_DIR="${1:-.}"

for D in $(seq 1 1000); do
    DIR="$BASE_DIR/dir$D"
    if [ ! -d "$DIR" ]; then
        echo -n "$DIR-> "
        mkdir "$DIR"
        for F in $(seq 1 100); do
            echo -n "."
            FILE=$(seq 1 100)
            # Unquoted on purpose: writes the numbers 1..100 on a single line.
            echo $FILE > "$DIR/$F.txt"
        done
        echo " done"
    fi
done

# List every regular file except the archive itself, then archive that list.
find "$BASE_DIR" -type f \( ! -iname "backup.cpio" \) > "$BASE_DIR/lista_ficheros.txt"
cpio -o < "$BASE_DIR/lista_ficheros.txt" > "$BASE_DIR/backup.cpio"
Also, if it helps, here is the relevant content of my /etc/zfs/zfsrc:
vdev-cache-size = 10
max-arc-size = 100
zfs-prefetch-disable
fuse-attr-timeout = 3600
fuse-entry-timeout = 3600
fuse-mount-options = default_permissions
Thanks in advance for any help.
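The client-side half of the reproduction described above (repeating ls -l until the error shows up) could be scripted roughly as follows. This is only a sketch: the function name, the default number of attempts and the pause are assumptions, and it simply looks for the word "stale" in whatever ls prints on stderr.

```shell
#!/bin/bash
# Sketch: list a directory repeatedly and stop at the first "stale" error.
# Function name, attempt count and pause are illustrative assumptions.

watch_for_stale() {
    local dir="$1" tries="${2:-100}" pause="${3:-1}"
    local i err
    for i in $(seq 1 "$tries"); do
        # Capture stderr only; the listing itself is discarded.
        err=$(ls -l "$dir" 2>&1 >/dev/null)
        if printf '%s\n' "$err" | grep -qi 'stale'; then
            echo "stale handle seen on attempt $i: $err"
            return 1
        fi
        sleep "$pause"
    done
    echo "no stale handle in $tries attempts"
    return 0
}
```

On the client this would be called as, for example, `watch_for_stale /agg1/vol2/prueba 500 1` while the script above runs on the server.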
Hum, did I mention that you should use NFSv3 and not NFSv4?
Maybe not in this post!
Anyway, I ran your script on a simple setup: the script running on the
server at the base directory of the export while ls -R ran continually in
the same directory on the client, with no problem. I did that more than 10
times, and stopped when dir79 was created.
The directory was simply exported with zfs set sharenfs=on
and mounted using autofs on the other side.
No problem to report!
Anyway, to be precise, there is absolutely no NFS-specific code in
zfs-fuse; it's all transparent for us and handled completely by fuse.
This is really the simplest setup I could make, no fancy option anywhere at
all!
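To check which NFS version a client actually negotiated, something along these lines could help. It is a rough sketch that assumes the mount shows up in /proc/mounts with a vers= (or nfsvers=) entry in its option list; the function name and the optional second argument are illustrative only.

```shell
#!/bin/bash
# Sketch: print the NFS protocol version recorded for a mount point.
# Reads a file in /proc/mounts format; the second argument only eases testing.

nfs_version_of() {
    local mountpoint="$1" mounts="${2:-/proc/mounts}"
    awk -v mp="$mountpoint" '
        $2 == mp && $3 ~ /^nfs/ {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                if (opts[i] ~ /^(nfs)?vers=/) {
                    sub(/^(nfs)?vers=/, "", opts[i])
                    print opts[i]
                    exit
                }
            }
        }' "$mounts"
}
```

Calling `nfs_version_of /mnt/vol1` on the client prints the version for that mount point, or nothing if the path is not an NFS mount; the version can be forced at mount time with the `vers=3` mount option.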
> To post to this group, send email to zfs-fuse@googlegroups.com
> To visit our Web site, click on http://zfs-fuse.net/
OK, then, could you give this one a try? It's an improved version of the previous script, and with it I always get the error. After running it, wait a moment after the directory creation process ends and you should get it too (ls -l on the client).
Anyway, you're right, it might be a fuse problem. The problem is probably related to the fuse kernel module; we are looking for a solution in that direction too, but haven't been able to solve it so far...
Thanks again,
The script, if you could try it, is:
#!/bin/bash
# Usage: bash nfs-crash [BASE_DIR]   (defaults to the current directory)
BASE_DIR="${1:-.}"

for D in $(seq 1 100); do
    DIR="$BASE_DIR/dir$D"
    if [ ! -d "$DIR" ]; then
        echo -n "$DIR-> "
        mkdir "$DIR"
        for F in $(seq 1 100); do
            echo -n "."
            FILE=$(seq 1 100)
            # Unquoted on purpose: writes the numbers 1..100 on a single line.
            echo $FILE > "$DIR/$F.txt"
        done
        echo " done"
    fi
done

# Build the file list only once, leaving the archive itself out.
if [ ! -e "$BASE_DIR/lista_ficheros.txt" ]; then
    find "$BASE_DIR" -type f \( ! -iname "backup.cpio" \) > "$BASE_DIR/lista_ficheros.txt"
fi

# Rewrite the archive 20 times in a row.
for D in $(seq 1 20); do
    cpio -o < "$BASE_DIR/lista_ficheros.txt" > "$BASE_DIR/backup.cpio"
done
Well, the first run went OK, except that I ran it in the root of my directory,
so the backup took everything, and since it's a deduped volume it was
taking a very looooooong time. Apart from that everything was running fine:
I ran a lot of ls -tr -R on the client while the backup was growing and
everything was OK.
After a while I stopped it, and I'll run it later in a separate dir, just
to be sure.
But as I said, the cause is more likely to be the NFS software (using
kernel-server here), or the NFS version (3 here; I don't know if it can run
with 4, never actually tested), or fuse if you have a very old kernel
(unlikely, but who knows?).
Yeah, I can confirm that after running it in a separate directory (much
faster! ;-)), everything ran smoothly here (the script on the server, ls
-tr -R on the client while the script is running and after it has run, no
problem at all).
Damn, if you couldn't reproduce the error with that script, I'm really lost here :S.
We are running a 3.0.8 kernel, also with nfs-kernel-server (1:1.2.2-4), and also NFSv3. I don't know what other difference there could be between your system and mine. This is really strange. Well, I'll continue researching on the fuse side meanwhile; if I have news I'll post it here.
Thanks for your time and guidance, Emmanuel, and sorry for the script problem ;)
If I was in your place, I would try it with an older, "more stable" kernel;
my "go to" kernel for debugging crazy stuff like this is currently the
latest 2.6.32.x
lol, well I don't know what to say. I use 3.5.3 here because of btrfs
(which gets better with every new kernel version, so you are encouraged to
upgrade often if you use it!). But NFS works OK as well.
The ESTALE error tells you that the inode number held for a file is bad.
Maybe there is a bad cache somewhere? I know some specific cache has been
added for NFS lately, but I haven't experimented a lot with it.
You could also try to lower fuse-attr-timeout and fuse-entry-timeout,
which control exactly that kind of cache (the name-to-inode association for
entry, and the attribute-to-inode association for attr). Try setting them
both to 0 to disable the cache completely (performance will suffer, but
it's for testing); if that works, you can try setting them to 1 (which is
already vastly better than 0) and try again.
Good luck, and post again if you find the solution...
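Switching both timeouts back and forth during such a test can be scripted. The sketch below assumes "key = value" lines like the ones in the zfsrc excerpt quoted earlier in this thread; the function name is made up, it only rewrites lines that already exist, and the value is expected to be a plain number.

```shell
#!/bin/bash
# Sketch: set both FUSE cache timeouts in a zfsrc-style file to the same value.
# Assumes existing "fuse-attr-timeout = N" / "fuse-entry-timeout = N" lines.

set_fuse_timeouts() {
    local file="$1" value="$2" tmp
    tmp=$(mktemp) || return 1
    # Rewrite only the two timeout lines; everything else is copied unchanged.
    sed -E "s/^[[:space:]]*(fuse-(attr|entry)-timeout)[[:space:]]*=.*/\1 = ${value}/" \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Trying it on a copy first, e.g. `cp /etc/zfs/zfsrc /tmp/zfsrc && set_fuse_timeouts /tmp/zfsrc 0`, seems prudent; the zfs-fuse daemon presumably has to be restarted before a changed zfsrc takes effect.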
Another wild idea for you out of nowhere: normally NFSv3 handles 64-bit
inodes and sizes, so you shouldn't have any problem with zfs...
But I never actually tested this on a very big volume; I always used pools
< 1 TB for that.
Normally it works though: I remember seeing the inode in 2 parts for 64
bits in the code, so it should work anywhere, but just to be 100% sure
maybe you could test your script on a small volume and see if it makes any
difference.
Anyway, for the record, I ran it with default settings everywhere,
including fuse-attr-timeout = fuse-entry-timeout = 3600.
I tried today setting fuse-attr-timeout=0 and fuse-entry-timeout=0, first with a big volume and then with a small one, but it didn't seem to make any difference. No luck yet...
I have now checked the latest stable kernel (3.5.3), and the source code of the fuse module is quite different from what I have in my kernel version (3.0.8). Changing the entire kernel is pretty complicated in this environment right now. Maybe I'll try it for testing with an older version, but even if it worked I couldn't adopt that as a solution (many other things involved could stop working, many, many tests to do...).
But I'm going to try applying the latest fuse module source code to my kernel, if I can, and see what happens; maybe this magically solves it...
In case it is of interest to any of you, someone from the fuse-devel list said this after I asked them about my problem:
"I found a potential issue with FUSE NFS support that might be related to this (or maybe not; this needs confirmation from the zfs-fuse team :-).
In the kernel's VFS operations, FUSE uses fuse_ino to build an NFS file handle. This might cause the stale-file-handle issue if fuse_ino is not persistent, because the kernel nfsd needs file handles to be persistent (and this is because the NFSv3 client needs this). But in most cases fuse_ino is not persistent (for example, with the fuse high-level API).
This requirement actually means that, to support NFS, the file system needs to use fuse_ll_ops and must also be able to work with the input fuse_ino even if it has been totally "forgotten". That might be too hard for file system implementations... I don't know about fuse zfs, but at least you can never do this in an NFS mount...
Would it be possible to move the export_ops from the kernel to the fuse library so that the file system can implement this on its own? I think this might also help to support the open-by-handle system calls in new kernels."
No worry about that: we map the inode directly to the NFS handle, so it's
definitely persistent!
As for your idea: err, it seems much more complicated to backport the fuse
module to an older kernel than to upgrade the kernel!
For the record, I used 3.0.12 for quite some time with no problem to report
on NFS (even if I used it only to read, not to write).
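One rough way to check on a given server that inode numbers really stay put (for example before and after restarting the zfs-fuse daemon) is to snapshot them and compare. This is only a sketch: it relies on GNU find's -printf, the function names are invented for illustration, and identical snapshots do not by themselves rule out every cause of ESTALE.

```shell
#!/bin/bash
# Sketch: record "inode path" pairs under a directory so two snapshots can be compared.
# Relies on GNU find's -printf; function names are illustrative.

snapshot_inodes() {
    # Sorted by path so that two snapshots line up for comparison.
    find "$1" -printf '%i %p\n' | sort -k2
}

inodes_unchanged() {
    # Succeeds when both snapshot files are identical.
    diff -q "$1" "$2" >/dev/null
}
```

For example: `snapshot_inodes /agg1/vol2/prueba > /tmp/before`, restart the daemon and remount, `snapshot_inodes /agg1/vol2/prueba > /tmp/after`, then `inodes_unchanged /tmp/before /tmp/after && echo same`.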