Hi golang-nuts,
I see that syscall.Dup2 uses RawSyscall instead of Syscall, which I understand
to mean that the runtime assumes it cannot block. (Please correct me if I'm
wrong.)
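If I understand correctly, this assumption holds for "real" file systems,
but it does not appear to be true for fuse. The following call chain causes
fuse to send a request to userspace and wait for the response:

    SyS_dup2 -> do_dup2 -> filp_close -> fuse_flush -> fuse_request_send
      -> __fuse_request_send -> wait_answer_interruptible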
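The first hop that matters is that dup2 implicitly closes newfd before reusing
its number; that close is the filp_close -> fuse_flush step in the dup2 kernel
stack further down. Here is a minimal sketch showing the implicit close without
FUSE, using two pipes as stand-ins. It assumes linux/amd64, where syscall.Dup2
is defined (some ports, such as linux/arm64, only have Dup3), and the helper
name dupClosesTarget is hypothetical.

```go
package main

import (
	"fmt"
	"syscall"
)

// dupClosesTarget (hypothetical helper) creates two pipes, a and b, then calls
// dup2(a[1], b[1]). It returns the byte count from reading b's read end
// afterwards (0 means EOF: the old b[1] was closed) and the data written
// through the new b[1] and read back from a.
func dupClosesTarget() (int, string, error) {
	var a, b [2]int // index 0 is the read end, index 1 the write end
	if err := syscall.Pipe(a[:]); err != nil {
		return 0, "", err
	}
	if err := syscall.Pipe(b[:]); err != nil {
		return 0, "", err
	}
	defer func() {
		for _, fd := range []int{a[0], a[1], b[0], b[1]} {
			syscall.Close(fd)
		}
	}()

	// dup2 closes b[1] before reusing its number. On a FUSE file, this
	// implicit close is what ends up in fuse_flush.
	if err := syscall.Dup2(a[1], b[1]); err != nil {
		return 0, "", err
	}

	// b's only write end is gone, so Read returns 0 (EOF) instead of blocking.
	buf := make([]byte, 8)
	eof, err := syscall.Read(b[0], buf)
	if err != nil {
		return 0, "", err
	}

	// The descriptor number b[1] now refers to a's write end.
	if _, err := syscall.Write(b[1], []byte("hi")); err != nil {
		return 0, "", err
	}
	n, err := syscall.Read(a[0], buf)
	if err != nil {
		return 0, "", err
	}
	return eof, string(buf[:n]), nil
}

func main() {
	eof, got, err := dupClosesTarget()
	fmt.Println(eof, got, err) // prints "0 hi <nil>"
}
```

With a pipe the close is instant; with a FUSE file the same close waits for the
server to answer the FLUSH request.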
I discovered this while writing tests for a fuse package. The test runs a fuse
server on one goroutine, and makes file system requests on another. When it
calls syscall.Dup2, it deadlocks very hard, with kernel stacks as pasted below.
I believe what's happening is that the goroutine calling syscall.Dup2 is
holding some lock that a goroutine calling syscall.Read on /dev/fuse is waiting
for. (The latter would otherwise serve the fuse flush request.) Indeed, if I
call via syscall.Syscall, the problem goes away.
(If you want to play, I can reliably reproduce this with "go test -v …" at
commit ad2317642f1ba7a61263718bde6d83b6a0843e1c in that repo.)
Does it make sense to make Dup2 call via Syscall rather than RawSyscall? I
don't think this problem is due to a buggy fuse server implementation; rather,
it may be present in some form for any fuse file system. So I guess I'm
asking what level of support for fuse's weirdness I should expect from Go.
Thanks,
Aaron
===================================================
flushfs.test(19419)─┬─{flushfs.test}(19420)
                    ├─{flushfs.test}(19421)
                    ├─{flushfs.test}(19423)
                    ├─{flushfs.test}(19426)
                    └─{flushfs.test}(19427)

[<0000000000000000>] futex_wait_queue_me+0xdd/0x140
[<0000000000000000>] futex_wait+0x182/0x290
[<0000000000000000>] do_futex+0xde/0x760
[<0000000000000000>] SyS_futex+0x71/0x150
[<0000000000000000>] system_call_fastpath+0x1a/0x1f
[<0000000000000000>] 0xffffffffffffffff

[<0000000000000000>] poll_schedule_timeout+0x49/0x70
[<0000000000000000>] do_select+0x5b6/0x780
[<0000000000000000>] core_sys_select+0x1cc/0x2e0
[<0000000000000000>] SyS_select+0xab/0x100
[<0000000000000000>] system_call_fastpath+0x1a/0x1f
[<0000000000000000>] 0xffffffffffffffff

[<0000000000000000>] wait_answer_interruptible+0x6a/0xa0
[<0000000000000000>] __fuse_request_send+0x1fb/0x280
[<0000000000000000>] fuse_request_send+0x12/0x20
[<0000000000000000>] fuse_flush+0xd7/0x120
[<0000000000000000>] filp_close+0x2f/0x70
[<0000000000000000>] do_dup2+0x68/0xb0
[<0000000000000000>] SyS_dup2+0xa6/0x140
[<0000000000000000>] system_call_fastpath+0x1a/0x1f
[<0000000000000000>] 0xffffffffffffffff

[<0000000000000000>] ep_poll+0x262/0x340
[<0000000000000000>] SyS_epoll_wait+0xd5/0x100
[<0000000000000000>] system_call_fastpath+0x1a/0x1f
[<0000000000000000>] 0xffffffffffffffff

[<0000000000000000>] futex_wait_queue_me+0xdd/0x140
[<0000000000000000>] futex_wait+0x182/0x290
[<0000000000000000>] do_futex+0xde/0x760
[<0000000000000000>] SyS_futex+0x71/0x150
[<0000000000000000>] system_call_fastpath+0x1a/0x1f
[<0000000000000000>] 0xffffffffffffffff

[<0000000000000000>] futex_wait_queue_me+0xdd/0x140
[<0000000000000000>] futex_wait+0x182/0x290
[<0000000000000000>] do_futex+0xde/0x760
[<0000000000000000>] SyS_futex+0x71/0x150
[<0000000000000000>] system_call_fastpath+0x1a/0x1f
[<0000000000000000>] 0xffffffffffffffff