OSX - kill FUSE and bring back DYLD?

Mike Shal

Nov 12, 2013, 6:00:05 PM

to tup-...@googlegroups.com

Hi all,

I'm considering removing the use of FUSE on OSX and bringing back the DYLD_INSERT_LIBRARIES approach. When Anatol suggested it a few months ago I gave it a whirl, but was stymied by a few test cases that indicated performance would be worse than with FUSE. The test consisted of compiling a simple empty .c file:

$ time gcc -c ok.c
real    0m0.026s
user    0m0.012s
sys    0m0.011s

$ time DYLD_FORCE_FLAT_NAMESPACE=1 gcc -c ok.c
real    0m0.049s
user    0m0.036s
sys    0m0.010s

However, for some more real-world use cases, it seems this slowdown is additive rather than multiplicative. E.g., I have one link job that takes 5.436s, and with DYLD_FORCE_FLAT_NAMESPACE it goes up to 5.472s. In comparison, running in a chrooted FUSE environment takes 9.669s, so the shared library approach clearly wins.
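
To make the additive-versus-multiplicative point concrete, the timings can be plugged into a couple of small helpers. This is only an illustration of the arithmetic; the function names are made up for the example and the inputs are the millisecond figures quoted above.

```c
/* Illustrative helpers only; the names are made up for this example. */

/* Absolute slowdown in milliseconds. */
long overhead_ms(long base_ms, long wrapped_ms)
{
    return wrapped_ms - base_ms;
}

/* Slowdown as a percentage of the baseline run time. */
double overhead_pct(long base_ms, long wrapped_ms)
{
    return 100.0 * (double)(wrapped_ms - base_ms) / (double)base_ms;
}
```

With the numbers above, overhead_ms(26, 49) is 23 and overhead_ms(5436, 5472) is 36, so the cost stays in the tens of milliseconds while the percentage drops from about 88% to under 1%. The chrooted FUSE run, overhead_ms(5436, 9669), adds 4233 ms, or roughly 78%.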

I started to implement the old ldpreload approach (for OSX only), and the branch is on github. At the moment I'm looking for some feedback to know if it is worth finishing the branch and merging it in, or if we should still stay the course with fuse. If you use tup on OSX, please copy your project into a separate area to try it out since not all of the test cases are passing yet. I'm curious to know:

1) How does performance compare vs the latest tup master using fuse?

2) Is there anything broken (aside from the caveats below)?

Caveats:

1) No variants yet

2) No run-scripts yet

3) Some symlink tests and wrong target tests are still broken, so it is not suited for actual development yet.

Fixing the above will take a while, so before I put in the effort I want to know if it will be worth it :)

The advantages should be the performance, and the fact that we don't need a chroot environment to avoid the annoying paths issue with fuse. The disadvantage is figuring out how to support variants and such with it. Based on what Anatol said, I believe we don't have as big of a concern with binaries on OSX statically linking libc, so we should be safe to use the approach.

Let me know your thoughts/concerns/results/etc.

Thanks!
-Mike

Anatol Pomozov

Nov 12, 2013, 6:58:36 PM

to tup-...@googlegroups.com

Hi, Mike.

This sounds great!

On Tue, Nov 12, 2013 at 3:00 PM, Mike Shal <mar...@gmail.com> wrote:
> Hi all,
>
> I'm considering removing the use of FUSE on OSX and bringing back the
> DYLD_INSERT_LIBRARIES approach. When Anatol suggested it a few months ago I
> gave it a whirl but was stymied by a few test cases that indicated
> performance would be worse than fuse. This consisted of a simple empty .c
> file compilation:
>
> $ time gcc -c ok.c
> real 0m0.026s
> user 0m0.012s
> sys 0m0.011s
>
> $ time DYLD_FORCE_FLAT_NAMESPACE=1 gcc -c ok.c
> real 0m0.049s
> user 0m0.036s
> sys 0m0.010s
>
> However, for some more real-world use-cases, it seems this slow-down is
> additive rather than multiplicative. Eg I have one link job that takes
> 5.436s, and with DYLD_FORCE_FLAT_NAMESPACE it goes up to 5.472s. In
> comparison, running in a chrooted fuse environment is 9.669s, so the shared
> library approach clearly wins.

I am a bit surprised. I would expect the DYLD approach to be faster and
definitely more scalable. FUSE has to copy data into the kernel and then
back to userspace, while DYLD does not. Another issue with FUSE on OSX
is its limited scalability: Fuse4X has a one-request-per-filesystem
limitation, and OSXFUSE is even worse, with one request for all
filesystems. So running a build with many threads should show better
performance with DYLD.

> I started to implement the old ldpreload approach (for OSX only), and the
> branch is on github. At the moment I'm looking for some feedback to know if
> it is worth finishing the branch and merging it in, or if we should still
> stay the course with fuse. If you use tup on OSX, please copy your project
> into a separate area to try it out since not all of the test cases are
> passing yet. I'm curious to know:
>
> 1) How does performance compare vs the latest tup master using fuse?
> 2) Is there anything broken (aside from the caveats below)?
>
> Caveats:
>
> 1) No variants yet
> 2) No run-scripts yet
> 3) Some symlink tests and wrong target tests are still broken, so it is not
> suited for actual development yet.
>
> Fixing the above will take a while, so before I put in the effort I want to
> know if it will be worth it :)
>
> The advantages should be the performance, and the fact that we don't need a
> chroot environment to avoid the annoying paths issue with fuse.

Another difference is that the FUSE implementations on OSX are seriously
lagging and have many "issues". Here are a few examples:

- The OSX VFS kernel layer is derived from FreeBSD. Its kernel API does
not allow operations on different file descriptors to be distinguished,
so when several threads open the same file and read it, the kernel does
not know exactly which fd is being used. The FUSE userspace API, on the
other hand, follows the Linux kernel API, which passes a valid file
descriptor to the VFS functions. See the fuse_file_info->fh field in the
libfuse API.
- FUSE on OSX has no ioctl() implementation. The FUSE implementation on
Linux gets it (almost) for free, while the OSX kernel requires a lot of
plumbing code.

> The
> disadvantage is figuring out how to support variants and such with it. Based
> on what Anatol said, I believe we don't have as big of a concern with
> binaries on OSX statically linking libc, so we should be safe to use the
> approach.

I can confirm this. Here is more info:
http://stackoverflow.com/questions/5259249/creating-static-mac-os-x-c-build

Basically, Apple's developers decided to provide libSystem as the API and
hide the kernel details. This frees them from worrying about syscall
compatibility issues. It is probably possible to write an assembly
program that uses syscalls directly to manipulate files, but I doubt
anyone does that in real life.

comex

Nov 12, 2013, 7:30:24 PM

to tup-...@googlegroups.com

On Wed, Nov 13, 2013 at 8:00 AM, Mike Shal <mar...@gmail.com> wrote:
> $ time DYLD_FORCE_FLAT_NAMESPACE=1 gcc -c ok.c
> real 0m0.049s
> user 0m0.036s
> sys 0m0.010s

Hi --

I suggest using dyld interposing
(http://www.opensource.apple.com/source/dyld/dyld-97.1/include/mach-o/dyld-interposing.h)
rather than DYLD_FORCE_FLAT_NAMESPACE, as it avoids any issues caused
by changing the lookup rules and should have better performance.
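
For reference, a minimal sketch of what an interposing library could look like is below. The macro mirrors the DYLD_INTERPOSE definition in the header linked above; the traced_open wrapper and its logging are hypothetical and are not taken from tup. On non-Mach-O platforms the macro is stubbed out so the file still compiles.

```c
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

/* Same shape as the DYLD_INTERPOSE macro in Apple's dyld-interposing.h:
 * each use emits a {replacement, replacee} pair into __DATA,__interpose. */
#ifdef __APPLE__
#define DYLD_INTERPOSE(_replacement, _replacee) \
    __attribute__((used)) static struct { \
        const void *replacement; \
        const void *replacee; \
    } _interpose_##_replacee \
    __attribute__((section("__DATA,__interpose"))) = { \
        (const void *)(unsigned long)&_replacement, \
        (const void *)(unsigned long)&_replacee };
#else
#define DYLD_INTERPOSE(_replacement, _replacee) /* Mach-O only */
#endif

/* Hypothetical wrapper: log the path, then forward to the real open(). */
int traced_open(const char *path, int flags, ...)
{
    mode_t mode = 0;

    /* open() only takes a third argument when O_CREAT is set. */
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }
    fprintf(stderr, "open(%s)\n", path);
    return open(path, flags, mode);
}

DYLD_INTERPOSE(traced_open, open)
```

On OSX this would be built as a dylib (e.g. with clang -dynamiclib) and loaded via DYLD_INSERT_LIBRARIES, with no need for DYLD_FORCE_FLAT_NAMESPACE. Calls to open() made from inside the interposing library itself are not redirected, which is why the wrapper can forward to the real function by name.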

Mike Shal

Nov 13, 2013, 8:02:47 PM

to tup-...@googlegroups.com

On Tue, Nov 12, 2013 at 6:58 PM, Anatol Pomozov <anatol....@gmail.com> wrote:

> Another difference is that the FUSE implementations on OSX are seriously
> lagging and have many "issues". Here are a few examples:
>
> - The OSX VFS kernel layer is derived from FreeBSD. Its kernel API does
> not allow operations on different file descriptors to be distinguished,
> so when several threads open the same file and read it, the kernel does
> not know exactly which fd is being used. The FUSE userspace API, on the
> other hand, follows the Linux kernel API, which passes a valid file
> descriptor to the VFS functions. See the fuse_file_info->fh field in the
> libfuse API.
> - FUSE on OSX has no ioctl() implementation. The FUSE implementation on
> Linux gets it (almost) for free, while the OSX kernel requires a lot of
> plumbing code.

Another issue I forgot is where we were running out of file descriptors with FUSE. This was much more prevalent in OSX as compared to Linux since the default fd limit is so low.
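
As an aside, one common mitigation for a low default limit is to raise the soft RLIMIT_NOFILE toward the hard limit at startup. The sketch below only illustrates that idea with getrlimit/setrlimit; raise_fd_limit is a made-up name, and this is not a claim about what tup itself does.

```c
#include <limits.h>
#include <sys/resource.h>

/* Raise the soft RLIMIT_NOFILE toward the hard limit.
 * Returns the resulting soft limit, or -1 on error. */
long raise_fd_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    rlim_t target = rl.rlim_max;
#ifdef OPEN_MAX
    /* The OSX setrlimit(2) man page recommends min(OPEN_MAX, rlim_max)
     * rather than RLIM_INFINITY for RLIMIT_NOFILE. */
    if (target > OPEN_MAX)
        target = OPEN_MAX;
#endif

    if (rl.rlim_cur < target) {
        rl.rlim_cur = target;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;
    }
    return (long)rl.rlim_cur;
}
```

This only raises the ceiling for the current process and its children; it does not reduce how many descriptors the FUSE approach needs in the first place.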

> > The
> > disadvantage is figuring out how to support variants and such with it. Based
> > on what Anatol said, I believe we don't have as big of a concern with
> > binaries on OSX statically linking libc, so we should be safe to use the
> > approach.
>
> I can confirm this. Here is more info:
> http://stackoverflow.com/questions/5259249/creating-static-mac-os-x-c-build
>
> Basically, Apple's developers decided to provide libSystem as the API and
> hide the kernel details. This frees them from worrying about syscall
> compatibility issues. It is probably possible to write an assembly
> program that uses syscalls directly to manipulate files, but I doubt
> anyone does that in real life.

Good to know - thanks for the info!

-Mike

Mike Shal

Nov 13, 2013, 8:05:15 PM

to tup-...@googlegroups.com

Cool! I didn't know that existed. I switched out the DYLD_FORCE_FLAT_NAMESPACE for the __interpose section - it definitely helps for those small gcc tests:

before:
1) [0.075s] gcc -c foo.c -o foo.o
2) [0.086s] gcc foo.o -o prog.exe

after:
1) [0.032s] gcc -c foo.c -o foo.o
2) [0.037s] gcc foo.o -o prog.exe

Thanks for the tip! This is now incorporated into the ldpreload branch.

I'd love to hear some real-world performance benchmarks of the current ldpreload branch vs FUSE.

-Mike

encodr

Nov 14, 2013, 3:14:26 PM

to tup-...@googlegroups.com

Hi Mike

non-fuse tup? Hmm, yes please.

1) I'm not sure how far you got with BSD - will the same solution apply?

2) As I indicated in a post last month, we don't absolutely need your "variants code" in order to compile variants ...

Brian

--
--
tup-users mailing list
email: tup-...@googlegroups.com
unsubscribe: tup-users+...@googlegroups.com
options: http://groups.google.com/group/tup-users?hl=en
---
You received this message because you are subscribed to the Google Groups "tup-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tup-users+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Anatol Pomozov

Nov 17, 2013, 11:17:50 AM

to tup-...@googlegroups.com

Hi, Mike.

Thanks for the changes - they look promising.

I tried to run ./bootstrap.sh on macosx and it fails. The ./build.sh
file should be updated accordingly.


Mike Shal

Nov 19, 2013, 1:32:23 PM

to tup-...@googlegroups.com

Hi Brian,

On Thu, Nov 14, 2013 at 3:14 PM, encodr <enc...@googlemail.com> wrote:

> Hi Mike
>
> non-fuse tup? Hmm, yes please.
>
> 1) I'm not sure how far you got with BSD - will the same solution apply?

I got close once using FUSE on FreeBSD, but it is difficult for me to find the time to fix issues on all platforms. Unfortunately I don't believe this change for OSX will help BSD at all - last I checked, most of the core binaries in FreeBSD were statically linked, so using an LD_PRELOAD technique won't get any file accesses at all. I'm not sure how much work there is left to get it working on BSD (with either FUSE or something else).

> 2) As I indicated in a post last month, we don't absolutely need your "variants code" in order to compile variants ...

Yeah, thanks for sharing your post! Can anyone who is currently using the built-in variants code try out this approach and compare?

-Mike

Mike Shal

Nov 19, 2013, 1:34:12 PM

to tup-...@googlegroups.com

On Sun, Nov 17, 2013 at 11:17 AM, Anatol Pomozov <anatol....@gmail.com> wrote:

> Hi, Mike.
>
> Thanks for the changes - they look promising.
>
> I tried to run ./bootstrap.sh on macosx and it fails. The ./build.sh
> file should be updated accordingly.

Thanks for the heads up - the ldpreload branch is currently very alpha. For the master branch I make sure bootstrap.sh and all the test cases run successfully for each commit, but I don't always do that for test branches like this. As it nears completion I'll go back and fix up the commits to conform to these standards, but for now you should be able to build the ldpreload branch by using an existing tup binary from the master branch.

-Mike

Anatol Pomozov

Dec 6, 2013, 10:44:15 AM

to tup-...@googlegroups.com

Hi,

I tested it a little and tried compiling several different projects on
macosx, and it looks fine.

I do see issues when running the tests, e.g.

--- Run t3027-chain2.sh ---
.tup repository initialized.
[ tup ] [0.001s] Scanning filesystem...
[ tup ] [0.001s] Reading in new environment variables...
[ tup ] [0.002s] Parsing Tupfiles...
1) [0.001s] .
[ ] 100%
[ tup ] [0.003s] No files to delete.
[ tup ] [0.003s] Generating .gitignore files...
[ tup ] [0.003s] Executing Commands...

1) [0.032s] gcc -c foo.c -o foo.o

2) [0.034s] gcc -c bar.c -o bar.o
* 3) nm foo.o > foo.nm
*** tup errors ***
tup error: File
'/var/folders/00/0hl3r000h01000cxqpysvccm0022cg/T/xcrun_db' was
written to, but is not in .tup/db. You probably should specify it as
an output
*** Command ID=13 ran successfully, but tup failed to save the dependencies.
* 4) nm bar.o > bar.nm
*** tup errors ***
tup error: File
'/var/folders/00/0hl3r000h01000cxqpysvccm0022cg/T/xcrun_db' was
written to, but is not in .tup/db. You probably should specify it as
an output
*** Command ID=15 ran successfully, but tup failed to save the dependencies.
[ ] 100%
*** tup: 2 jobs failed.
TODO: Server quit
*** Failed to update!
*** t3027-chain2.sh failed

It seems 'nm' tries to write under /var, and tup does not like it. I use
the standard nm from Xcode.

Everything else seems fine.

Mike Shal

Dec 6, 2013, 4:59:12 PM

to tup-...@googlegroups.com

Is it reproducible for you? I hit the same error the first time I run the test after booting, but after that the test seems to run fine. Maybe tup should just ignore writes to '/var/*/xcrun_db'?

Thanks for trying it out!

-Mike

comex

Dec 6, 2013, 8:30:58 PM

to tup-...@googlegroups.com

On Fri, Dec 6, 2013 at 4:59 PM, Mike Shal <mar...@gmail.com> wrote:
> Is it reproducible for you? I hit the same error the first time I run the
> test after booting, but after that the test seems to run fine. Maybe it
> should just ignore writes to '/var/*/xcrun_db'?

FYI, xcrun_db is what the wrapper binary at /usr/bin/nm is using to
cache the path of the real nm; this applies to everything in the
toolchain, including clang. Without a cache, it takes an impressive
0.18s each time just to decide what to invoke, so I'd avoid dropping
such writes.

Mike Shal

Dec 7, 2013, 11:15:07 AM

to tup-...@googlegroups.com

Ahh, thanks for the info. By "ignore" I meant it would detect & allow the write to go through, but not count it as an actual output when it goes to check the actual vs. expected outputs. So xcrun_db should still be written, just that tup wouldn't complain about it the first time it runs something in the toolchain. Would that work?
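
Roughly, the idea could be sketched like this: the write itself is never blocked, and the check only decides whether a path that was written should be counted as an output. The function name and the pattern table are hypothetical and only illustrate the approach; this is not existing tup code.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical table of writes to allow without counting them as outputs. */
static const char *const ignored_writes[] = {
    "/var/*/xcrun_db",
};

/* Returns 1 if a write to `path` should be left out of the
 * actual-vs-expected output comparison, 0 otherwise. */
int is_ignored_write(const char *path)
{
    size_t n = sizeof(ignored_writes) / sizeof(ignored_writes[0]);

    for (size_t i = 0; i < n; i++) {
        /* flags = 0 (no FNM_PATHNAME), so '*' also matches '/' and the
         * pattern covers the nested /var/folders/.../T/ directories. */
        if (fnmatch(ignored_writes[i], path, 0) == 0)
            return 1;
    }
    return 0;
}
```

With this, the path from the failing test, '/var/folders/00/0hl3r000h01000cxqpysvccm0022cg/T/xcrun_db', would be skipped, while a stray write to something like '/tmp/foo.o' would still be reported.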

-Mike

comex

Dec 7, 2013, 2:54:16 PM

to tup-...@googlegroups.com

On Sat, Dec 7, 2013 at 11:15 AM, Mike Shal <mar...@gmail.com> wrote:
> Ahh, thanks for the info. By "ignore" I meant it would detect & allow the
> write to go through, but not count it as an actual output when it goes to
> check the actual vs. expected outputs. So xcrun_db should still be written,
> just that tup wouldn't complain about it the first time it runs something in
> the toolchain. Would that work?
>
> -Mike

Yeah, that sounds fine.

Tzu-Mao Li

Dec 28, 2015, 11:46:19 AM

to tup-users

Hi!

I recently started using tup and ran into the xcrun_db issue mentioned in this thread. Every time I boot the machine, I need to manually invoke clang so that the paths of the clang toolchain are cached; otherwise tup upd fails. Has this issue been fixed? Or is it possible to tell tup to ignore certain outputs?

On Saturday, December 7, 2013 at 11:15:07 AM UTC-5, mar...@gmail.com wrote:

Mike Shal

Jan 4, 2016, 3:39:06 PM

to tup-...@googlegroups.com

On Sun, Dec 27, 2015 at 5:45 PM, Tzu-Mao Li <bach...@gmail.com> wrote:

> Hi!
>
> I recently started using tup and ran into the xcrun_db issue mentioned in this thread. Every time I boot the machine, I need to manually invoke clang so that the paths of the clang toolchain are cached; otherwise tup upd fails. Has this issue been fixed? Or is it possible to tell tup to ignore certain outputs?

Unfortunately not. I believe there is an issue open about ignoring certain outputs, though for something like this that is universal to a platform, we should probably just build it into tup to ignore it globally rather than expect everyone to add it to some ignore list in their Tupfiles.

-Mike
