segfault at 70 in libpthread-2.12.1.so


BWN

Nov 11, 2010, 1:20:39 PM
to zfs-fuse
This happened once before. Any ideas?


Nov 11 13:02:24 hq2 kernel: [166895.295117] zfs-fuse[1959]: segfault at 70 ip 00000035bf409160 sp 00002b0021815bd8 error 4 in libpthread-2.12.1.so[35bf400000+17000]
Nov 11 13:02:43 hq2 abrt[17778]: saved core dump of pid 1913 (/usr/bin/zfs-fuse) to /var/spool/abrt/ccpp-1289498545-1913.new/coredump (782307328 bytes)
Nov 11 13:02:43 hq2 abrtd: Directory 'ccpp-1289498545-1913' creation detected
Nov 11 13:02:43 hq2 abrtd: Size of '/var/spool/abrt' >= 1000 MB, deleting 'ccpp-1289328410-1965'
Nov 11 13:02:43 hq2 abrt[17778]: size of '/var/spool/abrt' >= 1250 MB, deleting 'ccpp-1289328410-1965'
Nov 11 13:02:43 hq2 abrtd: Lock file '/var/spool/abrt/ccpp-1289328410-1965.lock' is locked by process 17778
Nov 11 13:02:46 hq2 abrtd: New crash /var/spool/abrt/ccpp-1289498545-1913, processing
Nov 11 13:02:46 hq2 abrtd: Registered Action plugin 'RunApp'
Nov 11 13:02:46 hq2 abrtd: RunApp('/var/spool/abrt/ccpp-1289498545-1913','test x"`cat component`" = x"xorg-x11-server-Xorg" && cp /var/log/Xorg.0.log .')


# rpm -qf /lib64/libpthread-2.12.1.so
glibc-2.12.1-2.x86_64

Emmanuel Anne

Nov 11, 2010, 2:50:23 PM
to zfs-...@googlegroups.com
Not even a backtrace from a debug build?
Nor the exact version being run, its uptime, or the context in which it happened?
It's going to be hard to say anything then!

All I can say is that I haven't seen any segfault for a very, very long time now (months). Also, is this a kernel with special security features?




--
zfs-fuse git repository : http://rainemu.swishparty.co.uk/cgi-bin/gitweb.cgi?p=zfs;a=summary

BWN

Nov 11, 2010, 3:36:40 PM
to zfs-fuse
You're going to have to help with the "backtrace of a debug build"
request.

zfs-fuse-0.6.9-6.20100709git.fc13.x86_64

Context is running qemu-img convert on a 73G VMware disk image to
generate a qcow2 qemu/kvm image. Both the source file and the output
file were on the 6-disk raid-z pool.
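
The exact command was along these lines (the paths here are placeholders, not the real ones):

qemu-img convert -O qcow2 /pool/images/vm.vmdk /pool/images/vm.qcow2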

zfs-fuse is taking up a rather large chunk of RAM:
1910 root 20 0 5381m 401m 1640 S 0.0 5.0 1:06.74 zfs-fuse

Emmanuel Anne

Nov 12, 2010, 3:28:47 AM
to zfs-...@googlegroups.com
2010/11/11 BWN <goo...@brianneu.com>

You're going to have to help with the "backtrace of a debug build"
request.

Download the source.
Install the dependencies (libfuse-dev, libaio1-dev, scons).
scons debug=2 (add -j4 if you have 4 cores)
scons install

Just make sure your system uses the newly installed binaries, which are in /usr/local/sbin, whereas the rpm installs in /sbin (replace the rpm binaries with symlinks pointing to their new versions in /usr/local/sbin).
Then run zfs-fuse with ulimit -c unlimited in its startup script to be sure it will generate a core file in case of a problem (there must be a way to tell it where to put the core file, but I forgot how; anyway it's usually in / in this case).
Once you have reproduced the crash and you have a core file, I'll tell you what to do with it!
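
For instance, the swap plus the core-file tweak could look roughly like this (just a sketch; the exact list of binaries depends on what your rpm shipped):

for f in zfs zpool zdb zfs-fuse; do
    mv /sbin/$f /sbin/$f.rpm             # keep the rpm copies around
    ln -s /usr/local/sbin/$f /sbin/$f
done
ulimit -c unlimited                      # in the startup script, before launching the daemon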

zfs-fuse-0.6.9-6.20100709git.fc13.x86_64

Context is running qemu-img convert on a 73G VMware disk image to
generate a qcow2 qemu/kvm image. Both the source file and the output
file were on the 6-disk raid-z pool.

A single operation probably didn't produce the crash by itself, but if it's this easy to reproduce, you'll have your core file easily! The more likely cause here is something like an error while creating a thread (though I'm not sure there is a single case where a thread-creation error produces a crash; usually the program just exits).
This kind of thing could happen after a very long uptime if something limits the number of threads, like a security-enhanced kernel.

zfs-fuse is taking up a rather large chunk of RAM:
1910 root      20   0 5381m 401m 1640 S  0.0  5.0   1:06.74 zfs-fuse

The 1st number is not RAM actually used, but virtual memory.
If you find the 2nd number (401m) too high, then set max-arc-size = 100 in your /etc/zfs/zfsrc (but maybe usage can be higher with a 6-disk raid, which is not what I use...).
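
For example, a minimal /etc/zfs/zfsrc entry (the value is in MB, if I remember the unit correctly):

max-arc-size = 100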

Anyway, I'd say the cause is very likely something on your system which prevents zfs-fuse from running normally, rather than zfs-fuse itself, because it's been a very, very long time since we have heard about a real crash. But the backtrace will be useful in any case.

BWN

Nov 12, 2010, 7:53:13 PM
to zfs-fuse
Thanks for your help with this Emmanuel.


>Then run zfs-fuse with ulimit -c unlimited in its startup script to be sure

This is the only part where I'm lost. Lines 84 & 85 of /etc/init.d/zfs-fuse are

echo -n $"Starting $prog: "
daemon $exec -p "$PIDFILE"

So what should line 85 be?
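
My best guess (completely untested) is just to add the ulimit call before the daemon line:

echo -n $"Starting $prog: "
ulimit -c unlimited
daemon $exec -p "$PIDFILE"

but I don't want to assume.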


I've already downloaded a SRPM and edited the .spec file per your
compile recommendations. I decided to just use an RPM because it
seemed easier to clean up.

BWN

Nov 12, 2010, 10:45:11 PM
to zfs-fuse
Ignore that last message, I figured out the ulimit line.

I'll let you know when I catch something.

BWN

Nov 14, 2010, 11:27:13 AM
to zfs-fuse
OK, now ABRT is deleting my core dumps because the RPMs I compiled
weren't "signed". I've disabled that setting now and increased the
core dump size limit to 50G.
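
For reference, the knobs were in /etc/abrt/abrt.conf; from memory they look something like this:

OpenGPGCheck = no            # stop dropping dumps from unsigned packages
MaxCrashReportsSize = 51200  # in MB, so roughly 50G of room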

I also want to note that while I think you're right that the problem
isn't in the zfs-fuse code (I just had libvirt dump core with the same
error message), others are seeing this too:
http://www.art.ubuntuforums.org/showthread.php?p=10042597

I just wish I knew the answer.

BWN

Nov 15, 2010, 7:34:31 AM
to zfs-fuse
After working with abrt a little more, I have a 192MB file generated
by abrt of the crash (tar.gz). Uncompressed it's 343MB.

I'd love to post the appropriate snippets to the list, but I don't
know what's relevant. I have a 2Mbps upstream, so Emmanuel, I've sent
you an email with a link to the archive. If that doesn't work I can
create a temporary shell for someone.

Unfortunately, the crash file resides on the problematic filesystem,
so let me know if it crashes and I'll reboot.

Thank you!

BWN

Nov 15, 2010, 12:13:39 PM
to zfs-fuse
I wanted to see if we could bring this discussion back into the public
eye.

Emmanuel wrote:
About the backtrace: the crash is in the libc while waiting for a
thread to wake up.
This is normally a very simple function, just a condition variable
which becomes true when the thread wakes up; it doesn't normally crash
programs.
Now the question is: why does it crash for you?

After looking a bit with Google, they say it could be a stack
overflow, that's all.
So: do you use the stack-size parameter in /etc/zfs/zfsrc?
I use stack-size = 32 and never get any crash with this, but maybe
it's worth commenting it out to get the huge default stacks of 8 MB
instead.


I wrote back:
:(

The only changes I made to zfsrc are max-arc-size = 256 and adding
'noatime' to the mount options.

/etc/zfs/zfsrc:
vdev-cache-size = 10
max-arc-size = 256
zfs-prefetch-disable
fuse-attr-timeout = 3600
fuse-entry-timeout = 3600
fuse-mount-options = default_permissions,noatime

BWN

Nov 15, 2010, 12:39:07 PM
to zfs-fuse
Is there any chance that this is actually a glibc problem? Keep in
mind, I'm babbling out of complete ignorance, but is there a chance
that anything in the changelog listed here addresses my problem?
http://koji.fedoraproject.org/koji/buildinfo?buildID=204705

Again, speaking out of ignorance though.

Emmanuel, I have no idea what part of that backtrace led you to the
libc conclusion because, again, I have no idea what I'm doing here.
Could you paste something that might help me?

I will make the ulimit = 8192 change for you and run again.

Emmanuel Anne

Nov 15, 2010, 12:51:50 PM
to zfs-...@googlegroups.com
I don't know; I guess it's a theoretical possibility. We still have glibc-2.11.2 in Debian testing, so I can't really say.
Although I can't remember such a big bug in glibc, ever.


Marcin Mirosław

Nov 15, 2010, 1:01:44 PM
to zfs-...@googlegroups.com
On 2010-11-15 18:51, Emmanuel Anne wrote:

> I don't know; I guess it's a theoretical possibility. We still have
> glibc-2.11.2 in Debian testing, so I can't really say.
> Although I can't remember such a big bug in glibc, ever.

I'm using glibc-2.11.2 and glibc-2.12.1 on Gentoo, and I haven't had
similar issues. But this is not Debian.
Regards

sgheeren

Nov 15, 2010, 1:15:33 PM
to zfs-...@googlegroups.com
On 11/15/2010 06:13 PM, BWN wrote:
I wanted to see if we could bring this discussion back into the public
eye.

Emmanuel wrote:
About the backtrace: the crash is in the libc while waiting for a
thread to wake up.
This is normally a very simple function, just a condition variable
which becomes true when the thread wakes up; it doesn't normally crash
programs.
Now the question is: why does it crash for you?

Who says it does?

A signal (SIGSEGV from a segfault, for example, but probably also the SIGABRT that results from calling abort()) will _always_ interrupt sleeps and thread waits.
E.g. man sleep(3):
RETURN VALUE
       Zero if the requested time has elapsed, or the number of seconds left to sleep, if the call was interrupted by a signal handler.
Still, my $0.02:
I'm guessing the stack location is from a 'random' thread really, and in that case it's a gigantic no-brainer that the odds are pretty high the thread will be somewhere in its idle phase, especially with a design as thread-heavy as zfs-fuse.

I'm not saying I can prove this to be the case, but I haven't seen proof to the contrary (that I recall), and it _does_ make sense.

Emmanuel Anne

Nov 15, 2010, 3:08:42 PM
to zfs-...@googlegroups.com
So what you are saying is that this is not the backtrace of the crash itself, and the crash is elsewhere?


Russ Gibson

Nov 15, 2010, 3:59:16 PM
to zfs-...@googlegroups.com
On 11/15/2010 09:39 AM, BWN wrote:
> Is there any chance that this is actually a glibc problem? Keep in
> mind, I'm babbling out of complete ignorance, but is there a
> chance that anything in the changelog listed here addresses my
> problem? http://koji.fedoraproject.org/koji/buildinfo?buildID=204705
>
In the latest and greatest glibc, they modified the memcpy function in
such a way that it will now fail if the source and destination overlap
and the CPU is Core 2 or later. While overlapping copies were never
officially supported (that's the whole point of memmove...), in
practice a lot of (non-security-conscious) people wrote code assuming
they'd work, and since in reality they normally did, all was fine.
Until now.

Fedora 14 uses a glibc with that change; dunno about other distros (I
use Arch Linux, and I've had no issues).


BWN

Nov 16, 2010, 12:51:53 AM
to zfs-fuse
Russ, I did read about that over the weekend, where Linus and others
filed bugs regarding choppy audio playback through Flash and such. It
is specific to a change in F14, and I'm still on F13.

I also reverted glibc, but judging by the install times and the crash
times below, I don't think it's going to do a bit of good unless I went
back to 2.12.1-1, and that doesn't feel like a winning move.
Oct 21 15:27:22 hq2 yum[3813]: Updated: glibc-2.12.1-2.x86_64
Nov 9 13:47:08 hq2 abrt[18825]: saved core dump of pid 1965 (/usr/bin/zfs-fuse) to /var/spool/abrt/ccpp-1289328410-1965.new/coredump (773505024 bytes)
Nov 11 13:02:43 hq2 abrt[17778]: saved core dump of pid 1913 (/usr/bin/zfs-fuse) to /var/spool/abrt/ccpp-1289498545-1913.new/coredump (782307328 bytes)
Nov 11 13:23:16 hq2 yum[5180]: Updated: glibc-2.12.1-4.x86_64
Nov 14 09:53:25 hq2 abrt[26883]: saved core dump of pid 1914 (/usr/bin/zfs-fuse) to /var/spool/abrt/ccpp-1289746387-1914.new/coredump (756883456 bytes)
Nov 14 21:50:39 hq2 abrt[6581]: saved core dump of pid 1938 (/usr/bin/zfs-fuse) to /var/spool/abrt/ccpp-1289789425-1938.new/coredump (666238976 bytes)

I'm going to take glibc to 2.12.90-x from koji and see what happens.
If that doesn't work, I think F14 is probably the way to go.

It may help to note that, from what I remember, the crash is brought on
by a very long write to a single file, like this hd cloning that I'm
trying to do from a physical drive to a virtual-drive file located on
a zfs filesystem.

The last thing I'll throw out for now is that a bug this system
unearthed in btrfs had something to do with O_DIRECT and the BIOs
(block I/O requests) being too big for btrfs to handle. I honestly
have no idea what Chris Mason was talking about, or when he's really
going to get around to fixing it, but here is a link to our exchange:
http://www.mail-archive.com/linux...@vger.kernel.org/msg06705.html
Is there any chance the BIO issue is affecting zfs too?

I'm still a little confused as to whether or not I've supplied
sufficient crash data to Emmanuel and sgheeren. Was the bzip file
comprehensive enough, or do you think there is more valuable crash data
on the system that would help?

I want to thank everyone again for stepping up and trying to help.

Emmanuel Anne

Nov 16, 2010, 2:28:28 AM
to zfs-...@googlegroups.com
Where did you use the O_DIRECT flag to do any copy on zfs? It's not used by default; maybe databases use it, but I don't see any reason a virtualization system would, so there is probably no link between the two.
How big are the files? More than 4 GB, I guess... And it can't be the only reason, I guess; with all the people who use it, someone else would have had the problem otherwise...


Emmanuel Anne

Nov 16, 2010, 5:32:18 AM
to zfs-...@googlegroups.com
I finally downloaded the 190 MB tar.gz, with the 650 MB core dump file inside!
Well, it's not very usable as it is: you end up in thread 1, and info threads displays nothing useful:
(gdb) info thread 
  127 Thread 2093  0x0000003fe5e0b3bc in ?? ()
  126 Thread 1980  0x0000003fe5e0dfb4 in ?? ()
  125 Thread 1950  0x0000003fe5e0d6a0 in ?? ()
  124 Thread 1953  0x0000003fe5e0d6a0 in ?? ()
  123 Thread 2137  0x0000003fe5e0b3bc in ?? ()
  122 Thread 2081  0x00000030e3800614 in ?? ()
  121 Thread 2154  0x0000003fe5e0b3bc in ?? ()
  120 Thread 2134  0x0000003fe5e0b3bc in ?? ()
  119 Thread 1954  0x0000003fe5e0d6a0 in ?? ()
...

But the stacks seem to be in good shape, since bt finally returns to known code.
There are 127 threads to check, without any real info about what to look for; it's quite crazy.

The ideal solution would be for you to run zfs-fuse directly under gdb:
stop the zfs-fuse daemon
cd to the source directory (zfs-fuse/src normally)
scons debug=2
gdb zfs-fuse/zfs-fuse
run -n

Then in another window:
zfs mount -a
to mount your usual volumes, and then try to recreate the crash once more.
This time gdb will pick the right thread with the right info.
Return to the gdb window, type
bt
to get the backtrace, and post it here.
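
To sum up, the whole session looks roughly like this (a sketch; adjust paths to your checkout):

/etc/init.d/zfs-fuse stop        # window 1: stop the running daemon
cd zfs-fuse/src && scons debug=2
gdb zfs-fuse/zfs-fuse
(gdb) run -n                     # -n presumably keeps it in the foreground for gdb

zfs mount -a                     # window 2: remount, then try to recreate the crash

(gdb) bt                         # window 1 again, once gdb traps the fault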

Also, I need to be sure about the exact version you are using; I guess it's 0.6.9, right?

Sorry for all the inconvenience, but it's not easy to handle, and all this is quite unusual.
By the way, I tried to copy a big file this way, just to be sure:
dd if=/dev/sda6 of=mac bs=1G count=30
It ended up copying only 21G, not 30? Maybe there is a limit on the size you can pass with bs? Anyway it worked, at 29 MB/s for the copy.
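
Maybe dd counted short reads against count; if so, something like this (an untested guess) would force full 1G blocks:

dd if=/dev/sda6 of=mac bs=1G count=30 iflag=fullblock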


BWN

Nov 17, 2010, 8:47:25 AM
to zfs-fuse
This may be helpful, or may just be redundant noise.

I had another crash @ 4am. If I run

# gdb /usr/bin/zfs-fuse /var/spool/abrt/ccpp-1289985171-29878/coredump

where "coredump" is the 669MB file, then the tail end of the output is
this:

Core was generated by `/usr/bin/zfs-fuse -p /var/run/zfs-fuse.pid'.
Program terminated with signal 6, Aborted.
#0  0x0000003be3e329a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);

I'm still running these versions (the zfs-fuse is the one I built per
instructions from Emmanuel):
zfs-fuse-0.6.9-8.20100709gitDebug2.fc13.x86_64
glibc-2.12.1-2.x86_64

Emmanuel Anne

Nov 17, 2010, 9:41:07 AM
to zfs-...@googlegroups.com
You installed a glibc with debug symbols inside?
Anyway, the info you have here is the very last call, to raise, in glibc.
If you still have the core file, try typing bt in gdb to see which calls led there...


BWN

Nov 17, 2010, 9:56:17 AM
to zfs-fuse
On Nov 17, 9:41 am, Emmanuel Anne <emmanuel.a...@gmail.com> wrote:
> You installed a glibc with debug symbols inside?
huh? :)
# rpm -qa | grep glibc
glibc-headers-2.12.1-2.x86_64
glibc-common-2.12.1-2.x86_64
glibc-debuginfo-2.12.1-2.x86_64
glibc-devel-2.12.1-2.x86_64
glibc-2.12.1-2.x86_64


> Anyway, the info you have here is the very last call, to raise, in glibc.
> If you still have the core file, try typing bt in gdb to see which calls
> led there...
(gdb) bt
#0  0x0000003be3e329a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003be3e34185 in abort () at abort.c:92
#2  0x0000003be3e2b935 in __assert_fail (assertion=0x521cad "zp->z_dbuf != ((void *)0)", file=<value optimized out>, line=606, function=<value optimized out>) at assert.c:81
#3  0x00000000004bfd38 in zfs_znode_dmu_fini (zp=0x2b65efd59660) at lib/libzpool/zfs_znode.c:606
#4  0x000000000040dd88 in zfs_rmnode (zp=0x2b65efd59660) at zfs-fuse/zfs_dir.c:643
#5  0x00000000004c1f9e in zfs_zinactive (zp=0x2b65efd59660) at lib/libzpool/zfs_znode.c:1103
#6  0x00000000004275c7 in zfs_inactive (vp=0x2b660cae21a0, cr=0x7c1650, ct=0x0) at zfs-fuse/zfs_vnops.c:4035
#7  0x00000000005007b9 in fop_inactive (vp=0x2b660cae21a0, cr=0x7c1650, ct=0x0) at lib/libsolkerncompat/vnode.c:911
#8  0x00000000004fffa7 in vn_rele (vp=0x2b660cae21a0) at lib/libsolkerncompat/vnode.c:653
#9  0x0000000000429c16 in zfsfuse_getattr (req=0x2b66b80008c0, ino=407647, fi=0x0) at zfs-fuse/zfs_operations.c:185
#10 0x0000000000429ca8 in zfsfuse_getattr_helper (req=0x2b66b80008c0, ino=407647, fi=0x0) at zfs-fuse/zfs_operations.c:198
#11 0x0000003be4e15222 in do_getattr (req=<value optimized out>, nodeid=<value optimized out>, inarg=<value optimized out>) at fuse_lowlevel.c:550
#12 0x0000000000428753 in zfsfuse_listener_loop (arg=0x0) at zfs-fuse/fuse_listener.c:278
#13 0x0000003be4207761 in start_thread (arg=0x2b65ee5e0710) at pthread_create.c:301
#14 0x0000003be3ee151d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

BWN

Nov 17, 2010, 12:40:06 PM
to zfs-fuse
Hey, now that I know what to do, here's another from Nov 11, which
involves postgresql:

(gdb) bt
#0  0x0000003fe5a329a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003fe5a34185 in abort () at abort.c:92
#2  0x000000000040deea in zfs_rmnode (zp=0x2b8a4dc3a8a0) at zfs-fuse/zfs_dir.c:658
#3  0x00000000004c1f9e in zfs_zinactive (zp=0x2b8a4dc3a8a0) at lib/libzpool/zfs_znode.c:1103
#4  0x00000000004275c7 in zfs_inactive (vp=0x2b8a4e413330, cr=0x7c1650, ct=0x0) at zfs-fuse/zfs_vnops.c:4035
#5  0x00000000005007b9 in fop_inactive (vp=0x2b8a4e413330, cr=0x7c1650, ct=0x0) at lib/libsolkerncompat/vnode.c:911
#6  0x00000000004fffa7 in vn_rele (vp=0x2b8a4e413330) at lib/libsolkerncompat/vnode.c:653
#7  0x0000000000426465 in zfs_rename (sdvp=0x2b8a4e4134c0, snm=0x2b8a03a9f040 "pgstat.tmp", tdvp=0x2b8a4e4134c0, tnm=0x2b8a03a9f04b "pgstat.stat", cr=0x2b89fe6ebd40, ct=0x0, flags=0) at zfs-fuse/zfs_vnops.c:3400
#8  0x0000000000500d41 in fop_rename (sdvp=0x2b8a4e4134c0, snm=0x2b8a03a9f040 "pgstat.tmp", tdvp=0x2b8a4e4134c0, tnm=0x2b8a03a9f04b "pgstat.stat", cr=0x2b89fe6ebd40, ct=0x0, flags=0) at lib/libsolkerncompat/vnode.c:1118
#9  0x000000000042df40 in zfsfuse_rename (req=0x2b8a34000a30, parent=368677, name=0x2b8a03a9f040 "pgstat.tmp", newparent=368677, newname=0x2b8a03a9f04b "pgstat.stat") at zfs-fuse/zfs_operations.c:1637
#10 0x000000000042dfea in zfsfuse_rename_helper (req=0x2b8a34000a30, parent=368677, name=0x2b8a03a9f040 "pgstat.tmp", newparent=368677, newname=0x2b8a03a9f04b "pgstat.stat") at zfs-fuse/zfs_operations.c:1652
#11 0x0000000000428753 in zfsfuse_listener_loop (arg=0x0) at zfs-fuse/fuse_listener.c:278
#12 0x0000003fe5e07761 in start_thread (arg=0x2b89fe6ec710) at pthread_create.c:301
#13 0x0000003fe5ae14fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:103


There was a setting that I had to change to get postgres to work under
zfs-fuse, called "wal_sync_method". I was told that fsync was the
"lowest common denominator". Until I set it to fsync, postgres wouldn't
even start up on zfs-fuse. It was previously commented out.

from /var/lib/pgsql/data/postgresql.conf :
wal_sync_method = fsync    # the default is the first option
                           # supported by the operating system:
                           #   open_datasync
                           #   fdatasync
                           #   fsync
                           #   fsync_writethrough
                           #   open_sync

Seth

Nov 17, 2010, 3:07:31 PM
to zfs-...@googlegroups.com
On 17-11-2010 18:40, BWN wrote:

> Hey, now that I know what to do, here's another from Nov 11, which
> involves postgresql:
>
> (gdb) bt
> #0  0x0000003fe5a329a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1  0x0000003fe5a34185 in abort () at abort.c:92
> #2  0x000000000040deea in zfs_rmnode (zp=0x2b8a4dc3a8a0) at zfs-fuse/zfs_dir.c:658
> #3  0x00000000004c1f9e in zfs_zinactive (zp=0x2b8a4dc3a8a0) at lib/libzpool/zfs_znode.c:1103
Have a look at issues #108 and #29.

There is definitely a bug lurking in the unlinked-file handling, but it
appears hard for some people to reproduce, and easier for others. If you
can find a way to reproduce it and supply stack traces/steps, I'd love to
get a handle on this one.

Seth

PS. Meanwhile consider reporting a bug/adding to #108?

Emmanuel Anne

Nov 17, 2010, 4:26:38 PM
to zfs-...@googlegroups.com
Ok, congratulations, you found a new bug, which is quite rare these days (yeah, I am sure you would have preferred not to find one!).

The good news is that this part of the code was updated in Solaris a long time ago, and the update has been in my branch for ages (you know, the one no one uses these days!). More precisely: that assertion is gone, replaced by another one, and the code path which triggered your crash is nowhere to be seen, which is a good sign.

So... could you try to use my branch instead of your version?
You can find it there :

If you don't know how to use git, just download the tar.gz from the 1st line of this page (this is the current master).
Then compile it as usual with debug enabled (scons debug=2),
sudo scons install,
and test again.

If it works, it would be quite ironic; I have been using this for months now...
By the way, avoid running zpool upgrade -a with this version, or you would not be able to use your old rpm afterwards.
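
That is, roughly (the tarball name below is a placeholder; yours will differ):

tar xzf zfs-master-snapshot.tar.gz
cd zfs*/src
scons debug=2
sudo scons install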


sgheeren

Nov 17, 2010, 5:58:55 PM
to zfs-...@googlegroups.com
On 11/17/2010 10:26 PM, Emmanuel Anne wrote:
Ok, congratulations, you found a new bug, which is quite rare these days (yeah, I am sure you would have preferred not to find one!).

That would make some four or five people who 'found' it, all together, over the last four months. It's in the tracker, you know, and #29 is about 80 issues back in history, just for the record.
The good news is that this part of the code was updated in Solaris a long time ago, and the update has been in my branch for ages (you know, the one no one uses these days!).

Huh. Please do not gloss over the fact that your repository is prominently mirrored on gitweb.zfs-fuse.net every 10 minutes, and the unstable branch is effectively your master branch. So unstable should be fine, even with your irony.
I don't actually know who exactly uses the unstable branch a.t.m., but I think you are overreacting when you state that your work is not being 'used': don't trust your apache2 logs on it. [1]



If it works, it would be quite ironic; I have been using this for months now...
I'm afraid the irony could well be that you have kind of lost track of what's happening on zfs-fuse.net. That's no real problem, but it doesn't really feel nice when you appear to dismiss all that with a (?) sarcastic undertone. I know, it's all about the usual communication, but as always, that is a two-way street, and there's no need to park your lorry sideways in it :)


By the way, avoid running zpool upgrade -a with this version, or you would not be able to use your old rpm afterwards.
Good advice :)

Seth

[1] If you want, I could gather up some stats from gitweb.zfs-fuse.nl; that'll be a bit of work since I don't know whether git-daemon keeps useful logs by default, and where

sgheeren

Nov 17, 2010, 6:23:42 PM
to zfs-...@googlegroups.com
On 11/17/2010 11:58 PM, sgheeren wrote:
On 11/17/2010 10:26 PM, Emmanuel Anne wrote:
Ok, congratulations, you found a new bug, which is quite rare these days (yeah, I am sure you would have preferred not to find one!).

That would make some four or five people who 'found' it, all together, over the last four months. It's in the tracker, you know, and #29 is about 80 issues back in history, just for the record.
Having said that, I might include #43 in the picture; I had another look at issue #43's fix, which is in all 0.6.9 branches including your very own sun and master branches. [1]

What that fix also changed was that instead of crashing, VN_RELE would be called on the (apparently contextually invalid) inode. For pending node removals, that would result in zfs_rmnode being called. Now, in retrospect, I'm not so sure that was completely safe. I say we try to 'err on the safe side' with the following approach:

error = VOP_CLOSE(info->vp, info->flags, 1, (offset_t) 0, &cred, NULL);
if (error)
{
    syslog(LOG_WARNING, "zfsfuse_release: stale inode (%s)?", strerror(error));
} else
{
    VN_RELE(info->vp);
    kmem_cache_free(file_info_cache, info);
}

I applied it to testing _and_ unstable, because I'd like people to stay as close to their running versions as possible, so we can have a proper A/B test for this patch.
Could you (BWN) test whether this removes the instability? I will cross-post at issue #108 for Jan to evaluate.

Cheers,
Seth

[1] observe
git annotate -L/zfsfuse_release/,+25 "$BRANCHNAME" src/zfs-fuse/zfs_operations.c

sgheeren

Nov 17, 2010, 6:32:45 PM
to zfs-...@googlegroups.com
On 11/18/2010 12:23 AM, sgheeren wrote:
> I will cross-post at issue #108 for Jan to evaluate.
Hi Dustin, I had you confused for Jan, sorry

Emmanuel Anne

Nov 17, 2010, 7:24:05 PM
to zfs-...@googlegroups.com
Yes yes, I included this #43 fix in a batch of patches I merged; maybe I should have looked twice, but I might have made the mistake too.

Anyway, please stop calling my branch "unstable".
Reminder: it's stopped at the onnv tag releases, plus some other fixes merged, so it's relatively stable (relatively, because the onnv releases are not 100% bug free, but almost nothing is anyway, plus there is always the possibility of a mistake while merging their patches, which has happened only once so far, so it's quite rare). The effect lately is that nobody used it, which is probably a mistake.

2010/11/18 sgheeren <sghe...@hotmail.com>
On 11/18/2010 12:23 AM, sgheeren wrote:
> I will cross-post at issue #108 for Jan to evaluate.
Hi Dustin, I had you confused for Jan, sorry

Brian Neu

Nov 17, 2010, 9:52:12 PM
to zfs-...@googlegroups.com
I'd like to take a moment to mention how much I appreciate the work that everyone has done to make zfs-fuse a functional reality.  I can't speak for all users, but I'd like to invoke the immortal words of the great Vice President Joe Biden when I say, "This is a big fucking deal."  Seriously, every developer should be very proud of this because it is so very much needed.

Also, thanks for sticking with me with what has probably been the most painfully frustrating thread ever.  I've learned a lot working with the zfs-fuse team.

I compiled RPMs for "testing" and "unstable".  I'm going to try out the testing branch first, starting with the qemu-kvm conversion now, which has consistently brought about the crash.  It takes an insane amount of time to complete, so please be patient.

Emmanuel, Seth, I thank you both, and if the opportunity presents itself, I recommend a 3-legged pub crawl until you both pass out snuggled up to each other (builds team spirit and makes for great photos).



sgheeren

Nov 18, 2010, 4:17:58 AM
to zfs-...@googlegroups.com
On 11/18/2010 01:24 AM, Emmanuel Anne wrote:
Yes yes, I included this #43 fix in a batch of patches I merged; maybe I should have looked twice, but I might have made the mistake too.

Anyway, please stop calling my branch "unstable".
Emmanuel

I'm very happy to see you helping out with a few of the issues. Thanks for that[0].

Still, I don't think the discussion on whether your branch is superior and should not have been forgotten is relevant: it isn't forgotten anyway[1]

Right now it might seem as though you are in denial, unpleasantly surprised to find that there are problems and trying to convince yourself that the problems don't exist with your version. I suggest you spend some time with the tracker to properly dispel that dream :)

I'd like to _address issues_ here. Let's avoid fuss/confusion.
So let me deflate two points:

1. this is (most likely) NOT a new bug (see issues #29, #43, #108 and a few possibly related ones)
2. people have been testing with your branch ('unstable', for lack of a better name) _especially_ the people reporting problems on the tracker

From the moment the stack trace showed zfs_rmnode, I think it is time to switch from 'search' mode to 'destroy' mode. Once we know where the bugs reside, let's smoke 'em out.

==== on branches and names

I think I made it clear what the branch naming was all about and how it was going to change[2].
Here is the basic logic: http://zfs-fuse.net/documentation/building/what-branch[4]
In fact, most of my modest contribution to zfs-fuse has been focused on this: trying to get a clearer picture for users and the public of zfs-fuse [3].

So if you want to help get your branch more testing, please continue to help me get 0.7.0 out the door!

I know you do the heavy lifting code-wise. I'm perfectly happy that you focus on that and don't have time to mess with releasing, bug tracking and all that *crap* (:))
But at the end of the day there is work to do to get a release properly done. We need to fight the bugs, not the idea that bugs even exist.

Cheers
Seth

[0] I rather liked listening in on how other people treat an issue like this. Some topics (like glibc) weren't on my radar - so it is good that you brought them to my attention
[1] <rant>
It is also not helping. It is not helping the users get their problems resolved. It is not helping me get 0.7.0 ready for release.
Once 0.7.0 is out the door, 'unstable' will be 'testing' anyway, so I don't really see what the problem is.
If anything, this is not a mistake, but rather priorities. (We gotta have priorities, if you ask me.)

After 0.7.0 I'm sure we can adopt a better name for the 'now-unstable' branch.
</rant>
[2] I can't find the mail but I remember discussing it with you and drawing a diagram of it. I hope you got it, because I never got round to posting it on the site anyways
[3] What is zfs-fuse, where is zfs-fuse, what version is best, what else is going on. I made the site for that reason. I (try to) maintain the tracker for that reason. I fix simple items that I understand for that reason.
[4] this has been created in response to _users actually reporting being confused_ in the first place.

Andrea Gelmini

Nov 18, 2010, 5:52:51 AM
to zfs-...@googlegroups.com
2010/11/18 Emmanuel Anne <emmanu...@gmail.com>:

> once so far, so it's quite rare). The effect lately is that nobody used it,
> which is probably a mistake.

Well, for what it's worth, I'm using your branch.
I upgraded to version 26 without problems.
That's because:
a) I'm interested in faster snapshot removal;
b) I would like to help the porting (doing the boring things: testing,
stressing, and so on).

Well, I'm using zfs-fuse for my /home.
I back up often, and compare the contents every day.

Also, if anybody has clues about fs stress/testing suites, you're welcome!

I can only scream my big, big, big *thank you* to all the people working on this project.

Thanks again,
Andrea

Andrea Gelmini

Nov 18, 2010, 5:56:10 AM
to zfs-...@googlegroups.com
2010/11/18 Brian Neu <brian...@gmail.com>:

> Emmanuel, Seth, I thank you both, and if the opportunity presents itself, I

Well, the dynamic duo is working hard.
I join in the appreciation.

Thanks to you all,
Andrea

Emmanuel Anne

Nov 18, 2010, 5:59:32 AM
to zfs-...@googlegroups.com
2010/11/18 sgheeren <sghe...@hotmail.com>
On 11/18/2010 01:24 AM, Emmanuel Anne wrote:
Yes yes, I included this #43 fix in a batch of patches I merged; maybe I should have looked twice, but I might have made the mistake too.

Anyway, please stop calling my branch "unstable".
Emmanuel

I'm very happy to see you helping out with a few of the issues. Thanks for that[0].

Still, I don't think the discussion on whether your branch is superior and should not have been forgotten is relevant: it isn't forgotten anyway[1]

You miss the point; it's not about being superior.
I know it's frustrating, the way zfs works is not usual: they don't stop new features and stabilize things for a long time as the majority of projects do; they always take patches, including fixes and new features. I guess sometimes they probably stop new features for a while, but it never lasts long. There are fixes which address bugs from a few pool versions back; it happens all the time. So the "stable" target is a moving target, and you can't really call it stable either, because all these new features always add problems too.
That's the way it is: if you stop the train, you take the risk of missing a few fixes; it has already happened before, and I guess it could happen again...
I guess we could work it out if the hg commit messages were clearer, but that's not the case, so it's very easy to miss some important fixes.

Right now it might seem as though you are in denial, unpleasantly surprised to find that there are problems and trying to convince yourself that the problems don't exist with your version. I suggest you spend some time with the tracker to properly dispel that dream :)

We've already talked about the tracker: too slow, too many unconfirmed reports, not dynamic enough (no automatic posting), some reports stay open for way too long, etc. For me it's unusable; my approach is that if someone really has something important, he will eventually find the group and post here. We already talked about it, let's not repeat ourselves.


Emmanuel Anne

Nov 18, 2010, 6:01:28 AM
to zfs-...@googlegroups.com
Yeah, that's what I do too: use it for /home, and back it up.
But I moved a few configuration directories out of home, .mozilla and .config; zfs-fuse is too bad with small I/O operations for now, and until we find a better solution, the easiest fix is to move these directories out of the way.
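
Something like this per directory (the target path is just a placeholder; any non-zfs filesystem will do):

mv ~/.mozilla /var/local/dot-mozilla
ln -s /var/local/dot-mozilla ~/.mozilla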


Andrea Gelmini

Nov 18, 2010, 6:20:55 AM
to zfs-...@googlegroups.com
2010/11/18 Emmanuel Anne <emmanu...@gmail.com>:

> But I moved a few configuration directories out of home, .mozilla and
> .config; zfs-fuse is too bad with small I/O operations for now, and until we
> find a better solution, the easiest fix is to move these directories out of
> the way.

Well, I like to keep everything in it, because I snapshot every 5 minutes.
That's important for me, because I do a lot of hibernate/resume, and sometimes
I have problems with resume that corrupt the config of my "live" apps.¹
Usually I have no problem with performance.
I only go really slow with dedup and/or gzip enabled.
Is there anyone else on the list using zfs-fuse in production?

Thanks a lot for your precious work,
Andrea

---------------------
¹ Things are a little bit more complicated, but I don't want to bother
you with details. That's not related to zfs-fuse.

Xavier M.

Nov 18, 2010, 6:28:11 AM
to zfs-...@googlegroups.com
Hi Andrea,
Yes, we use zfs-fuse in production right now; we built a NAS product for video security. We don't have the same problems as yours, because we don't use deduplication, and we store only videos, which are quite large; that fits well with zfs-fuse's capabilities.
Maybe if performance is not good enough we'll try to look into Nexenta Core or zfsonlinux, but for the moment it is good, even if the overall performance is not marvellous.
Regards
Xavier


Andrea Gelmini

Nov 18, 2010, 6:41:26 AM
to zfs-...@googlegroups.com
2010/11/18 Xavier M. <xavier...@gmail.com>:

Xavier,
thanks a lot for your quick reply.

> Maybe if performance is not good enough we'll try to look into Nexenta
> Core or zfsonlinux, but for the moment it is good, even if the overall
> performance is not marvellous

Well, of course a native kernel port of ZFS can be much faster
than zfs-fuse. The problem (or at least, my problem) is they usually
require a 64-bit processor and a lot of RAM. zfs-fuse instead can work
on any common hardware. Slower, but it works.
In my daily work at the keyboard I need snapshot capabilities
and complete data checksumming much more than anything else.

By the way, Emmanuel, that's the reason I still haven't played with or
benchmarked zfsonlinux... At the moment I don't have a spare 64-bit
machine full of RAM to play on.

So my urge is to expand the zfs-fuse community, because only a big enough
user base can trigger all the bugs and problems.

I'm also evaluating the idea of finding a sponsor/financier for the work
of the dynamic duo. But that's nothing more than a wish, right now.

Thanks again,
Andrea

Xavier M.

Nov 18, 2010, 7:24:43 AM
to zfs-...@googlegroups.com
Our current configuration is a Core i3 with 4 GB of memory, for instance.


Seth

Nov 18, 2010, 6:45:38 PM
to zfs-...@googlegroups.com
On 18-11-2010 11:59, Emmanuel Anne wrote:

> I know it's frustrating, the way zfs works is not usual,
You mean the way Sun works. AFAIK we run a different project.

> [...]


> That's the way it is: if you stop the train, you take the risk of missing
> a few fixes; it has already happened before, and I guess it could
> happen again...

Not a chance. Note how there aren't any changes coming out any more :)

I agree with your analysis of upstream development; I just don't know how
that means we absolutely need to copy it. If people want, they can run
your branch. Like I said, I'm all 'pro' a rename of that pre-testing
branch.
Note that unstable will be the new testing anyway, and barring other
sources of development, there is no need for a 'pre-test' branch
[whatever the name] for a while

Seth

Seth

Nov 18, 2010, 6:48:41 PM
to zfs-...@googlegroups.com
On 18-11-2010 11:59, Emmanuel Anne wrote:
>
> You miss the point; it's not about being superior.
By the way, did you choose not to address the issue at hand? I'm having
trouble figuring out whether you simply agree or haven't read it.

Emmanuel Anne

Nov 19, 2010, 3:02:42 AM
to zfs-...@googlegroups.com
Which issues? The ones in the tracker about this very same problem? Just wait for more tests from Brian for that.
For the rest, I think enough has been said already.

Andrea: OK about your resources. I am just surprised nobody has tried to benchmark this and report here; maybe I'll try that soon then.

2010/11/19 Seth <sghe...@hotmail.com>
On 18-11-2010 11:59, Emmanuel Anne wrote:

You miss the point; it's not about being superior.
By the way, did you choose not to address the issue at hand? I'm having trouble figuring out whether you simply agree or haven't read it.


Brian Neu

Nov 19, 2010, 12:21:04 PM
to zfs-...@googlegroups.com

I have been using the testing build for over 36 hours now, with the same activity that used to crash it.  So far, no crashes!

BTW this is an 8-core Opteron @ 2 GHz, with a 6-way, 500G-drive RAIDZ1.  It's no screamer, but I could certainly help test.


sgheeren

Nov 20, 2010, 10:57:40 AM
to zfs-...@googlegroups.com
On 11/19/2010 06:21 PM, Brian Neu wrote:
>
> I have been using the testing build for over 36 hours now, with the same
> activity that used to crash it. So far, no crashes!
>
I'd give it a week, fingers crossed.

I'm ready to label it 0.7.0 if Dustin Ward reports OK on #108 too.
