rmdir deadlock?

191 views
Skip to first unread message

ivenhov

unread,
Dec 6, 2011, 8:41:40 AM12/6/11
to fus...@googlegroups.com
Hi everyone

First of all I would like to take this opportunity and say massive THANK YOU to all of you guys that work on Fuse4X and OSXFUSE
project and brought FUSE on Mac platform back to life. Anatol, Benjamin, Erik (and I'm sure others) you are doing great job which we all appreciate.

I just switched from MacFUSE to fuse4x implementation and so far everything worked great until
now when I hit some problem that may look like a deadlock or race condition on rmdir calls.
I haven't seen it before (using MacFUSE) so I suspect it's something new in fuse4x

I mount filesystem with options:
        "-f",
        "-odebug",
        "-odaemon_timeout=60",
        "-oiosize=65536",
        "-oallow_other",
        "-odefer_permissions",
        "-oping_diskarb",

Here's my test (Java)
It creates a directory and makes sure it cannot be deleted if there is a subdirectory or a file in it

String filename = "test_that_cannot_delete_non_empty";
File parent = getDstDir();
assertThat(parent).exists();
boolean res;

// create subdirectory
File subdir = new File(parent, "subdir");
subdir.mkdir();

// make sure we cannot delete if dir exists inside directory
res = parent.delete();
assertThat(res).isFalse();
assertThat(parent).exists();


// delete subdirectory, make sure parent still exist
res = subdir.delete();
assertThat(res).isTrue();
assertThat(subdir).doesNotExist();
assertThat(parent).exists();

// create file under a parent
File f = new File(parent, filename);
TestHelper.createFileWithZeroes(f.getAbsolutePath(), 0L);
assertThat(f).exists();

// make sure we cannot delete if file exists inside directory 
System.out.println("Try to delete dir " + parent + " with file in it"); 
// THIS IS WHERE IT CAN HANG INDEFINITELY
res = parent.delete();
assertThat(res).isFalse();
assertThat(parent).exists();

I need to run it several times and eventually filesystem hangs, always in the same place 
After hang in last delete() I cannot do anything. Even umount -f hangs with error:

umount: pthread_cond_timeout failed; continuing with unmount: Operation timed out


In my logs I can see that Java code comes returns to native code and hangs there.
Here's the output of fuse in debug mode, last couple of lines


getattr /test.fs.filesystem.DeleteTest/test_that_cannot_delete_non_empty
   unique: 1, success, outsize: 136
unique: 3, opcode: OPEN (14), nodeid: 5, insize: 48
open flags: 0x1 /test.fs.filesystem.DeleteTest/test_that_cannot_delete_non_empty 
   open[4479518656] flags: 0x1 /test.fs.filesystem.DeleteTest/test_that_cannot_delete_non_empty 
   unique: 3, success, outsize: 32
unique: 2, opcode: FLUSH (25), nodeid: 5, insize: 64
flush[4479518656]
lock[4479518656] F_SETLK F_UNLCK start: 0 len: 0 pid: 0
   unique: 2, success, outsize: 16
unique: 0, opcode: RELEASE (18), nodeid: 5, insize: 64
release[4479518656] flags: 0x1
   unique: 0, success, outsize: 16
unique: 1, opcode: LOOKUP (1), nodeid: 3, insize: 78
LOOKUP /test.fs.filesystem.DeleteTest/test_that_cannot_delete_non_empty 
getattr /test.fs.filesystem.DeleteTest/test_that_cannot_delete_non_empty 
   NODEID: 5
   unique: 1, success, outsize: 160
Try to delete dir /Volumes/UnitTest-Mac/test.fs.filesystem.DeleteTest with file in it
unique: 3, opcode: RMDIR (11), nodeid: 1, insize: 78
rmdir /test.fs.filesystem.DeleteTest
unique: 2, opcode: LOOKUP (1), nodeid: 1, insize: 78
LOOKUP /test.fs.filesystem.DeleteTest
getattr /test.fs.filesystem.DeleteTest
   NODEID: 3
   unique: 2, success, outsize: 160
unique: 0, opcode: GETATTR (3), nodeid: 1, insize: 56
getattr /
   unique: 3, error: -66 (Directory not empty), outsize: 16
unique: 1, opcode: OPENDIR (27), nodeid: 3, insize: 48
   unique: 0, success, outsize: 136
opendir flags: 0x0 /test.fs.filesystem.DeleteTest
   opendir[0] flags: 0x0 /test.fs.filesystem.DeleteTest
   unique: 1, success, outsize: 32
unique: 2, opcode: READDIR (28), nodeid: 3, insize: 80
readdir[0] from 0
   unique: 2, success, outsize: 144


Any help will be appreciated.

Daniel

Anatol Pomozov

unread,
Dec 6, 2011, 12:42:01 PM12/6/11
to fus...@googlegroups.com
Hi

On Tue, Dec 6, 2011 at 5:41 AM, ivenhov <iwan....@gmail.com> wrote:
Hi everyone

First of all I would like to take this opportunity and say massive THANK YOU to all of you guys that work on Fuse4X and OSXFUSE
project and brought FUSE on Mac platform back to life. Anatol, Benjamin, Erik (and I'm sure others) you are doing great job which we all appreciate.

Sure, you are very welcome.
A few quick questions:
1) Make sure that the deadlock hangs for more than 60 seconds (daemon_timeout option). If it does then it is fuse4x deadlock.
2) What fuse4x version do you use? Make sure that you have 0.8.13. You can find the version in logs (either /var/log/kernel.log or /var/log/system.log)
3) Do you see anything suspicious related to fuse4x in log files? (/var/log/(kernel|system).log)
4) What kernel version do you have? Please run uname -v
5) Mount fuse folder without flags. I am interested how it affects deadlock if you remove "allow_other" and/or "defer_permissions". Try to mount without any flags, is the deadlock still exist?

If nothing suspicious will be found we'll try to setup symbols for your kernel and get the deadlock stacktrace.

ivenhov

unread,
Dec 7, 2011, 5:21:53 AM12/7/11
to fus...@googlegroups.com
Hi Anatol

My replies below 
 
A few quick questions:
1) Make sure that the deadlock hangs for more than 60 seconds (daemon_timeout option). If it does then it is fuse4x deadlock.

It's definitely longer than 60 seconds. In fact the whole Finder hangs and cannot recover from it. I cannot even shutdown the machine.
Hard reboot with power button is needed
 
2) What fuse4x version do you use? Make sure that you have 0.8.13. You can find the version in logs (either /var/log/kernel.log or /var/log/system.log)

I'm using 0.8.13
Kernel.log states:

fuse4x: starting (version 0.8.13, Oct 17 2011, 13:17:58)

 
3) Do you see anything suspicious related to fuse4x in log files? (/var/log/(kernel|system).log)

There is nothing fuse4x related in system.log and only information about starting fuse4x in kernel.log

 
4) What kernel version do you have? Please run uname -v
Snow Leopard 10.6.8 in 64-bit mode
Darwin Kernel Version 10.8.0: Tue Jun  7 16:32:41 PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_6

it also happens on when used on machine with 10.6.7

Darwin Kernel Version 10.7.0: Sat Jan 29 15:16:10 PST 2011; root:xnu-1504.9.37~1/RELEASE_X86_64

Interestingly I could reproduce it only in 64-bit bt not when machine booted in 32 bit mode.

 
5) Mount fuse folder without flags. I am interested how it affects deadlock if you remove "allow_other" and/or "defer_permissions". Try to mount without any flags, is the deadlock still exist?


I tried without any flags and it looks like allow_other makes a difference. Without that flag I could not reproduce the problem.
Once turned back on (as the only option) I got the error third time I launched test.

To eliminate my implementation I took Loopback filesystem from MacFUSE and compiled/linked to fuse4x.
The same problems happens and it's actually easier to reproduce using the same test.
 
If nothing suspicious will be found we'll try to setup symbols for your kernel and get the deadlock stacktrace.

I downloaded the KernelDebugKit matching version of the kernel of the machine I'm using for testing
I also did a stackshot using

sudo /usr/libexec/stackshot -i -f stackshot.untxt && sudo /usr/sbin/symstacks.rb -f stackshot.untxt -w stackshot.txt && sudo chown USERNAME stackshot.*

but I don't know how to setup symbols. Can you walk me through this?

Thanks

D.



 

Anatol Pomozov

unread,
Dec 7, 2011, 3:42:13 PM12/7/11
to fus...@googlegroups.com
Hi

On Wed, Dec 7, 2011 at 2:21 AM, ivenhov <iwan....@gmail.com> wrote:
Hi Anatol

My replies below 
 
A few quick questions:
1) Make sure that the deadlock hangs for more than 60 seconds (daemon_timeout option). If it does then it is fuse4x deadlock.

It's definitely longer than 60 seconds. In fact the whole Finder hangs and cannot recover from it. I cannot even shutdown the machine.
Hard reboot with power button is needed
 
2) What fuse4x version do you use? Make sure that you have 0.8.13. You can find the version in logs (either /var/log/kernel.log or /var/log/system.log)

I'm using 0.8.13
Kernel.log states:

fuse4x: starting (version 0.8.13, Oct 17 2011, 13:17:58)

 
3) Do you see anything suspicious related to fuse4x in log files? (/var/log/(kernel|system).log)

There is nothing fuse4x related in system.log and only information about starting fuse4x in kernel.log

 
4) What kernel version do you have? Please run uname -v
Snow Leopard 10.6.8 in 64-bit mode
Darwin Kernel Version 10.8.0: Tue Jun  7 16:32:41 PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_6

it also happens on when used on machine with 10.6.7

Darwin Kernel Version 10.7.0: Sat Jan 29 15:16:10 PST 2011; root:xnu-1504.9.37~1/RELEASE_X86_64

Interestingly I could reproduce it only in 64-bit bt not when machine booted in 32 bit mode.


32 and 64bit code have different synchronization logic (and it should be reworked both on 32 and 64bit). The deadlock happens in 64bit codepath.

5) Mount fuse folder without flags. I am interested how it affects deadlock if you remove "allow_other" and/or "defer_permissions". Try to mount without any flags, is the deadlock still exist?


I tried without any flags and it looks like allow_other makes a difference. Without that flag I could not reproduce the problem.
Once turned back on (as the only option) I got the error third time I launched test.

To eliminate my implementation I took Loopback filesystem from MacFUSE and compiled/linked to fuse4x.
The same problems happens and it's actually easier to reproduce using the same test.

Thanks, this is valuable information. I'll try to write a simple test in C that uses loopback and reproduces the issue on 64bit kernel. I think I know where the deadlock happens, but I need some time for careful testing, give me a few days at least.
 
 If nothing suspicious will be found we'll try to setup symbols for your kernel and get the deadlock stacktrace.

I downloaded the KernelDebugKit matching version of the kernel of the machine I'm using for testing
I also did a stackshot using

sudo /usr/libexec/stackshot -i -f stackshot.untxt && sudo /usr/sbin/symstacks.rb -f stackshot.untxt -w stackshot.txt && sudo chown USERNAME stackshot.*

but I don't know how to setup symbols. Can you walk me through this?

Just mount *.dmg file, this will create a volume. If I remember correctly I had problems with kernel symbols on 64bit kernel, but on 32bit kernels the command above works fine for me.

ivenhov

unread,
Dec 7, 2011, 8:05:08 PM12/7/11
to fus...@googlegroups.com
I wrote a simple Ruby script that reproduces the problem on Loopback if that's any help

do chmod +x on it, go into mounted volume and launch it via ./crash.rb

Hang should happen after several tries.

D.

crash.rb

Anatol Pomozov

unread,
Dec 10, 2011, 7:55:34 PM12/10/11
to fus...@googlegroups.com
Hi,

Could you please try to install this fuse4x version and carefully test it?



I think I fixed the issue. At least your test case passes fine on my 10.6 64bit. All my tests (~200 testcases that generate a lot of fs requests for fuse filesystem) are also passed.

What happens here is a race condition between rmdir operation and other thread that accesses the same filesystem (fsdaemon in my case). I still try to analyze the deadlock to make sure that my fix does not introduce any side-effects. If everything is ok then the fix will be released in the next fuse4x version (it will be either 0.8.14 or 0.9.0).

ivenhov

unread,
Dec 12, 2011, 8:03:33 AM12/12/11
to fus...@googlegroups.com
Hi

I tested it with virtual machine and seems to be OK but I still get a deadlock on a real machine:

Darwin daniel-macbook 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:32:41 PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_64 x86_64

In kernel.log I have:
fuse4x: starting (version fuse4x_0_8_13-6-gb6c05bb, Dec 11 2011, 19:14:21)

So I guess that's a correct version.




Anatol Pomozov

unread,
Dec 12, 2011, 11:43:41 AM12/12/11
to fus...@googlegroups.com
Hi

On Mon, Dec 12, 2011 at 5:03 AM, ivenhov <iwan....@gmail.com> wrote:
Hi

I tested it with virtual machine and seems to be OK but I still get a deadlock on a real machine:

So you are saying that the same script works fine on one 10.6 machine but fails on another one? This sounds weird...

Could you please provide debug output for the issue ("-f -d")? I just want to make sure that it is the same deadlock.

Darwin daniel-macbook 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:32:41 PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_64 x86_64

In kernel.log I have:
fuse4x: starting (version fuse4x_0_8_13-6-gb6c05bb, Dec 11 2011, 19:14:21)

Yep this is the version that expect to fix the race condition. Ok, I'll test it more..

ivenhov

unread,
Dec 12, 2011, 12:54:31 PM12/12/11
to fus...@googlegroups.com

Yes, I can't reproduce it in VM but I got it on real machine. I guess it's purely timing issue
here since kernels are the same. Or other software is interfering.

So far I could not get it in -f -d mode since it's far slower than without debugging.
Again timing is critical here I think.

I'll keep testing with debug on to see if I can reproduce it.
BTW. I'm running the same script with 10 thousand iterations now

Daniel.

ivenhov

unread,
Dec 12, 2011, 1:03:26 PM12/12/11
to fus...@googlegroups.com
I just got it with debug mode:

mounted using:
./loopback /tmp/loop -d -f -omodules=threadid:subdir,subdir=/tmp/dir -oallow_other,native_xattr,volname=LoopbackFS

output:

unique: 2, opcode: GETATTR (3), nodeid: 1, insize: 56
getattr /
   unique: 2, success, outsize: 136
unique: 0, opcode: MKDIR (9), nodeid: 1, insize: 55
mkdir /parent 040755 umask=0177600
getattr /parent
   NODEID: 52502
   unique: 0, success, outsize: 160
unique: 1, opcode: LOOKUP (1), nodeid: 52502, insize: 47
LOOKUP /parent/subdir
getattr /parent/subdir
   unique: 1, error: -2 (No such file or directory), outsize: 16
unique: 2, opcode: GETATTR (3), nodeid: 1, insize: 56
getattr /
   unique: 2, success, outsize: 136
unique: 0, opcode: MKDIR (9), nodeid: 52502, insize: 55
mkdir /parent/subdir 040755 umask=0177600
getattr /parent/subdir
   NODEID: 52503
   unique: 0, success, outsize: 160
unique: 1, opcode: GETATTR (3), nodeid: 52502, insize: 56
getattr /parent
   unique: 1, success, outsize: 136
unique: 2, opcode: LOOKUP (1), nodeid: 52502, insize: 45
LOOKUP /parent/file
getattr /parent/file
   unique: 2, error: -2 (No such file or directory), outsize: 16
unique: 0, opcode: RMDIR (11), nodeid: 52502, insize: 47
rmdir /parent/subdir
   unique: 0, success, outsize: 16
unique: 1, opcode: GETATTR (3), nodeid: 52502, insize: 56
getattr /parent
   unique: 1, success, outsize: 136
unique: 2, opcode: LOOKUP (1), nodeid: 52502, insize: 45
LOOKUP /parent/file
getattr /parent/file
   unique: 2, error: -2 (No such file or directory), outsize: 16
unique: 0, opcode: CREATE (35), nodeid: 52502, insize: 61
create flags: 0x202 /parent/file 0100644 umask=0000
   create[5] flags: 0x202 /parent/file
fgetattr[5] /parent/file
   NODEID: 52504
   unique: 0, success, outsize: 176
unique: 1, opcode: GETATTR (3), nodeid: 52504, insize: 56
getattr /parent/file
   unique: 1, success, outsize: 136
unique: 2, opcode: WRITE (16), nodeid: 52504, insize: 96
write[5] 16 bytes to 0 flags: 0x0
   write[5] 16 bytes to 0
   unique: 2, success, outsize: 2
unique: 0, opcode: GETATTR (3), nodeid: 52504, insize: 56
getattr /parent/file
   unique: 0, success, outsize: 136
unique: 1, opcode: FLUSH (25), nodeid: 52504, insize: 64
flush[5]
lock[5] F_SETLK F_UNLCK start: 0 len: 0 pid: 0
   unique: 1, success, outsize: 16
unique: 2, opcode: RELEASE (18), nodeid: 52504, insize: 64
release[5] flags: 0x0
   unique: 2, success, outsize: 16
unique: 0, opcode: RMDIR (11), nodeid: 1, insize: 47
rmdir /parent
   unique: 0, error: -66 (Directory not empty), outsize: 16
unique: 1, opcode: OPENDIR (27), nodeid: 52502, insize: 48
opendir flags: 0x0 /parent
   opendir[4350049984] flags: 0x0 /parent
   unique: 1, success, outsize: 32
unique: 2, opcode: READDIR (28), nodeid: 52502, insize: 80
readdir[4350049984] from 0
   unique: 2, success, outsize: 112
unique: 0, opcode: RELEASEDIR (29), nodeid: 52502, insize: 64
releasedir[4350049984] flags: 0x0
   unique: 0, success, outsize: 16
unique: 1, opcode: GETATTR (3), nodeid: 1, insize: 56
getattr /
   unique: 1, success, outsize: 136
unique: 2, opcode: LOOKUP (1), nodeid: 1, insize: 47
LOOKUP /parent
getattr /parent
   NODEID: 52502
   unique: 2, success, outsize: 160
unique: 0, opcode: LOOKUP (1), nodeid: 52502, insize: 45
LOOKUP /parent/file
getattr /parent/file
   NODEID: 52504
   unique: 0, success, outsize: 160
unique: 1, opcode: GETATTR (3), nodeid: 52504, insize: 56
getattr /parent/file
   unique: 1, success, outsize: 136
unique: 2, opcode: UNLINK (10), nodeid: 52502, insize: 45
unlink /parent/file
   unique: 2, success, outsize: 16
unique: 0, opcode: GETATTR (3), nodeid: 52502, insize: 56
getattr /parent
   unique: 0, success, outsize: 136
unique: 1, opcode: RMDIR (11), nodeid: 1, insize: 47
rmdir /parent
   unique: 1, success, outsize: 16
unique: 2, opcode: LOOKUP (1), nodeid: 1, insize: 47
LOOKUP /parent
getattr /parent
   unique: 2, error: -2 (No such file or directory), outsize: 16
unique: 0, opcode: GETATTR (3), nodeid: 1, insize: 56
getattr /
   unique: 0, success, outsize: 136
unique: 1, opcode: MKDIR (9), nodeid: 1, insize: 55
mkdir /parent 040755 umask=0177600
getattr /parent
   NODEID: 52505
   unique: 1, success, outsize: 160
unique: 2, opcode: LOOKUP (1), nodeid: 52505, insize: 47
LOOKUP /parent/subdir
getattr /parent/subdir
   unique: 2, error: -2 (No such file or directory), outsize: 16
unique: 0, opcode: MKDIR (9), nodeid: 52505, insize: 55
mkdir /parent/subdir 040755 umask=0177600
getattr /parent/subdir
   NODEID: 52506
   unique: 0, success, outsize: 160
unique: 1, opcode: GETATTR (3), nodeid: 1, insize: 56
getattr /
   unique: 1, success, outsize: 136
unique: 2, opcode: GETATTR (3), nodeid: 52505, insize: 56
getattr /parent
   unique: 2, success, outsize: 136
unique: 1, opcode: RMDIR (11), nodeid: 52505, insize: 47
rmdir /parent/subdir
   unique: 1, success, outsize: 16
unique: 0, opcode: LOOKUP (1), nodeid: 52505, insize: 45
LOOKUP /parent/file
getattr /parent/file
   unique: 0, error: -2 (No such file or directory), outsize: 16
unique: 2, opcode: GETATTR (3), nodeid: 52505, insize: 56
getattr /parent
   unique: 2, success, outsize: 136
unique: 1, opcode: CREATE (35), nodeid: 52505, insize: 61
create flags: 0x202 /parent/file 0100644 umask=0000
   create[5] flags: 0x202 /parent/file
fgetattr[5] /parent/file
   NODEID: 52507
   unique: 1, success, outsize: 176
unique: 0, opcode: GETATTR (3), nodeid: 52507, insize: 56
getattr /parent/file
   unique: 0, success, outsize: 136
unique: 2, opcode: LOOKUP (1), nodeid: 52505, insize: 45
LOOKUP /parent/file
getattr /parent/file
   NODEID: 52507
   unique: 2, success, outsize: 160
unique: 1, opcode: WRITE (16), nodeid: 52507, insize: 96
write[5] 16 bytes to 0 flags: 0x0
   write[5] 16 bytes to 0
   unique: 1, success, outsize: 24
unique: 0, opcode: GETATTR (3), nodeid: 52507, insize: 56
getattr /parent/file
   unique: 0, success, outsize: 136
unique: 2, opcode: FLUSH (25), nodeid: 52507, insize: 64
flush[5]
lock[5] F_SETLK F_UNLCK start: 0 len: 0 pid: 0
   unique: 2, success, outsize: 16
unique: 1, opcode: RELEASE (18), nodeid: 52507, insize: 64
release[5] flags: 0x0
   unique: 1, success, outsize: 16
unique: 0, opcode: RMDIR (11), nodeid: 1, insize: 47
unique: 2, opcode: LOOKUP (1), nodeid: 1, insize: 47
rmdir /parent
LOOKUP /parent
   unique: 0, error: -66 (Directory not empty), outsize: 16
getattr /parent
   NODEID: 52505
   unique: 2, success, outsize: 160
unique: 1, opcode: GETATTR (3), nodeid: 52505, insize: 56
getattr /parent
   unique: 1, success, outsize: 136
unique: 2, opcode: OPENDIR (27), nodeid: 52505, insize: 48
opendir flags: 0x0 /parent
   opendir[4350049984] flags: 0x0 /parent
   unique: 2, success, outsize: 32
unique: 0, opcode: GETATTR (3), nodeid: 1, insize: 56
getattr /
   unique: 0, success, outsize: 136
unique: 1, opcode: READDIR (28), nodeid: 52505, insize: 80
readdir[4350049984] from 0
   unique: 1, success, outsize: 112
unique: 2, opcode: RELEASEDIR (29), nodeid: 52505, insize: 64
releasedir[4350049984] flags: 0x0
   unique: 2, success, outsize: 16


Deadlock, cannot umount but I managed to kill loopback process.
Volume stayed there, unresponsive.

Daniel

Anatol Pomozov

unread,
Dec 12, 2011, 9:30:14 PM12/12/11
to fus...@googlegroups.com
Hi,

Please download this version and test it as well http://dl.dropbox.com/u/3842605/Fuse4X-0.8.14.dmg

This is another deadlock but it has similar roots. It should be fixed as well. Let me know if you see any other issues.

Thanks for the report.

ivenhov

unread,
Dec 13, 2011, 5:32:24 AM12/13/11
to fus...@googlegroups.com
Looks good so far.

Occasionally I got the error though:

/Users/daniel/Desktop/crash.rb:13:in `doTest': Parent dir already exist (RuntimeError)
from /Users/daniel/Desktop/crash.rb:52

Which means a parent directory could not be deleted after previous iteration
This may be just a weakness of the script or loopback fs implementation

Log below

getattr /parent/file
   unique: 3, success, outsize: 136
unique: 0, opcode: FLUSH (25), nodeid: 31429, insize: 64
flush[5]
lock[5] F_SETLK F_UNLCK start: 0 len: 0 pid: 0
   unique: 0, success, outsize: 16
unique: 1, opcode: RELEASE (18), nodeid: 31429, insize: 64
release[5] flags: 0x0
   unique: 1, success, outsize: 16
unique: 5, opcode: RMDIR (11), nodeid: 1, insize: 47
rmdir /parent
   unique: 5, error: -66 (Directory not empty), outsize: 16
unique: 4, opcode: LOOKUP (1), nodeid: 1, insize: 47
unique: 2, opcode: OPENDIR (27), nodeid: 31427, insize: 48
LOOKUP /parent
opendir flags: 0x0 /parent
   opendir[4398973344] flags: 0x0 /parent
   unique: 2, success, outsize: 32
getattr /parent
   NODEID: 31427
   unique: 4, success, outsize: 160
unique: 3, opcode: READDIR (28), nodeid: 31427, insize: 80
readdir[4398973344] from 0
   unique: 3, success, outsize: 112
unique: 0, opcode: RELEASEDIR (29), nodeid: 31427, insize: 64
releasedir[4398973344] flags: 0x0
   unique: 0, success, outsize: 16
unique: 1, opcode: LOOKUP (1), nodeid: 31427, insize: 45
LOOKUP /parent/file
getattr /parent/file
   NODEID: 31429
   unique: 1, success, outsize: 160
unique: 5, opcode: GETATTR (3), nodeid: 31429, insize: 56
getattr /parent/file
   unique: 5, success, outsize: 136
unique: 2, opcode: UNLINK (10), nodeid: 31427, insize: 45
unlink /parent/file
unique: 3, opcode: LOOKUP (1), nodeid: 31427, insize: 45
LOOKUP /parent/file
getattr /parent/file
   unique: 3, error: -2 (No such file or directory), outsize: 16
   unique: 2, success, outsize: 16
unique: 0, opcode: GETATTR (3), nodeid: 31427, insize: 56
getattr /parent
   unique: 0, success, outsize: 136
unique: 4, opcode: RMDIR (11), nodeid: 1, insize: 47
rmdir /parent
unique: 1, opcode: LOOKUP (1), nodeid: 1, insize: 47
LOOKUP /parent
getattr /parent
   NODEID: 31427
   unique: 1, success, outsize: 160
   unique: 4, success, outsize: 16
unique: 5, opcode: GETATTR (3), nodeid: 1, insize: 56
getattr /
   unique: 5, success, outsize: 136
unique: 3, opcode: LOOKUP (1), nodeid: 31427, insize: 45
   unique: 3, error: -2 (No such file or directory), outsize: 16
unique: 2, opcode: GETATTR (3), nodeid: 1, insize: 56
getattr /
   unique: 2, success, outsize: 136
unique: 0, opcode: GETATTR (3), nodeid: 1, insize: 56
getattr /
   unique: 0, success, outsize: 136
unique: 4, opcode: GETXATTR (22), nodeid: 1, insize: 77
getxattr / com.apple.FinderInfo 32 0
   unique: 4, error: -93 (Attribute not found), outsize: 16


Thanks
Daniel

Anatol Pomozov

unread,
Dec 13, 2011, 7:15:05 PM12/13/11
to fus...@googlegroups.com
Hi

On Tue, Dec 13, 2011 at 2:32 AM, ivenhov <iwan....@gmail.com> wrote:
Looks good so far.

Occasionally I got the error though:

/Users/daniel/Desktop/crash.rb:13:in `doTest': Parent dir already exist (RuntimeError)
from /Users/daniel/Desktop/crash.rb:52

Which means a parent directory could not be deleted after previous iteration
This may be just a weakness of the script or loopback fs implementation

It sounds like undesired side-effect of the workarounds added to fuse4x.

Here is a technical explanation of what happens here: 32bit kernel code locks a mutex assigned to vnode before using it. This means that access to a vnode is serialized and if someone performs a fs operation on the same vnode then other requests are blocked. These file operations are executed atomically.

64bit code uses mutex assigned to filesystem (aka biglock), the idea was to serialize access to the whole filesystem. Unfortunately this causes deadlocks when some operation (e.g. name cache lookup) triggers some other thread to access the filesystem. The workaround is to release the biglock right before such operations (like name cache lookup) and reacquire it later. But it where the problem is - imagine that you UNLINK a file, you start this vfs operation then release the biglock, at this moment ACCESS vfs operation on the same vnode starts, this marks this vnode as used (it increases its usage count), then ACCESS releases the biglock and then UNLINK acquires it again and tries to remove vnode. But UNLINK cannot remove vnode as it is still used by ACCESS vfs operation. vfs operations on vnode are not atomic anymore.

Moral of the story - synchronization in 64bit code need to be reimplemented to use vnode-specific mutexes (the similar way as 32bit version).

ivenhov

unread,
Dec 14, 2011, 5:36:43 AM12/14/11
to fus...@googlegroups.com
Thanks for the explanation Anatol.

Just o let you know, so far I haven't seen any issue with the latest build (appart from occasional problem with deleting parent dir)
Any idea when it's going to be an official build?

Daniel 

Anatol Pomozov

unread,
Dec 14, 2011, 11:57:11 AM12/14/11
to fus...@googlegroups.com
Hi

A new build will be released in a few days. Give me some time to run it on my server. Most likely the new version will be the same as the one I put to the Dropbox.
Reply all
Reply to author
Forward
0 new messages