Is your disk by chance encrypted? I wonder if the cpu load to
encrypt/decrypt blocks is getting attributed to the tup process. The
two things going on while running sub-processes are:
1) The fuse thread processing fs requests. Ultimately all reads/writes
have to go through here (which is what accesses the underlying disk),
so maybe tup is taking the blame for that. I don't think there is much
else here that would be CPU-bound.
2) For finished jobs, tup does the post-processing on the files that
were read/written. Some of this is currently inefficient (like
O(#files^2)... I have a patch in the works, but it still needs to be
ported to Windows :-/), so if a single sub-process accesses a lot of
files that could be an additional bottleneck. The file access data
must then be compared to the existing db contents to see what new
links need to be written / removed, etc.
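The post-processing step in 2) boils down to a set difference between the files a job touched and the links already stored in the database. The following is a minimal sketch in C, not tup's code. It assumes files are identified by integer ids and that the stored links contain no duplicates. `diff_links` and `cmp_id` are hypothetical names. The sketch shows one standard way to avoid an O(#files^2) nested-loop comparison: sort both lists, then merge them in a single pass.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: "seen" holds ids of files a finished job read or
 * wrote, "old" holds ids the database already links to the command
 * (assumed duplicate-free). Sorting and merging replaces an O(n*m)
 * nested-loop comparison with O(n log n + m log m). */

static int cmp_id(const void *a, const void *b)
{
	long x = *(const long *)a;
	long y = *(const long *)b;

	return (x > y) - (x < y);
}

/* Fills add[] with ids in seen but not in old (new links) and del[] with
 * ids in old but not in seen (stale links). Both inputs are sorted in
 * place; repeated ids in seen are skipped. add must hold nseen entries
 * and del must hold nold entries. */
static void diff_links(long *seen, size_t nseen, long *old, size_t nold,
		       long *add, size_t *nadd, long *del, size_t *ndel)
{
	size_t i = 0, j = 0;

	assert(nadd != NULL && ndel != NULL);
	qsort(seen, nseen, sizeof(*seen), cmp_id);
	qsort(old, nold, sizeof(*old), cmp_id);
	*nadd = *ndel = 0;

	while (i < nseen || j < nold) {
		/* Skip repeated accesses to the same file. */
		if (i > 0 && i < nseen && seen[i] == seen[i - 1]) {
			i++;
			continue;
		}
		if (j == nold || (i < nseen && seen[i] < old[j])) {
			add[(*nadd)++] = seen[i++];     /* new link */
		} else if (i == nseen || old[j] < seen[i]) {
			del[(*ndel)++] = old[j++];      /* stale link */
		} else {
			i++;                            /* link already present */
			j++;
		}
	}
}
```

With seen = {5, 3, 3, 9} and old = {3, 4}, this reports links to add {5, 9} and a stale link {4}, and the repeated access to file 3 is ignored. Sorting dominates the cost. The merge itself is linear.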
Ultimately we will probably need to profile it to find out where the
real slowdown is. If it's in 1), then perhaps going back to a
multi-fuse (as mentioned with the ulimit issue), or at least to a
multi-threaded fuse loop will help. If it's in 2), we would need to
replace any dumb existing algorithms with less-dumb ones.
-Mike
Ok, I'll have to try profiling and see if there's anything obvious
that can be improved. Might be some time though, I'm a bit swamped at
the moment...
-Mike
On Wed, Apr 25, 2012 at 8:11 AM, Slawomir Czarko
<slawomi...@gmail.com> wrote:
> Still seeing this. I switched from ext4 to ext3 and it reduced load from tup
> from 100% of one core to about 70%. tup becomes a bottleneck in the build -
> on a 6-core machine I'm running 9 parallel builds but it looks like only
> 50% of cores are being utilized.
> If I increase the number of parallel jobs the load from tup goes up as
> well. Compile commands execute at a rate of about 1 per second so it
> shouldn't cause such a massive load in tup.
>
> Any idea how to troubleshoot/debug/profile this?
As a quick test, can you try to replace "fuse_loop(fs.fuse)" with
"fuse_loop_mt(fs.fuse)" in src/tup/server/fuse_server.c? This will
run fuse in multi-threaded mode. It runs single-threaded by default to
avoid reports from helgrind against libfuse, but I could work around
that with a suppression file if necessary. If that doesn't help at all,
I guess we'd need to benchmark further - maybe compare against a fuse
example file-system to see how much of an effect just using fuse has.
Thanks,
-Mike
--
tup-users mailing list
email: tup-...@googlegroups.com
unsubscribe: tup-users+...@googlegroups.com
options: http://groups.google.com/group/tup-users?hl=en
On Wed, Apr 25, 2012 at 8:11 AM, Slawomir Czarko
This makes some sense (I think?) - if tup is busy doing some work,
maybe it doesn't have a chance to run more processes. I think tup
should definitely scale better wrt. job parallelization than what
you're seeing, so we'll have to benchmark to find out what is holding
it up. (I just have a dual-core machine, so I haven't seen it).
I do have a bit of work to do as far as getting some branches merged
before I'll have a chance to profile - if you want to try to build tup
with profiling options and see if it points to a specific function,
that would help move things along. Hopefully it's something simple
like some dumb O(n^2) algorithm lurking around...
On Fri, Apr 27, 2012 at 2:18 PM, Slawomir Czarko
On Fri, May 4, 2012 at 8:06 AM, Slawomir Czarko
I don't know exactly why this is, but it looks like when the cache is cold
it accesses the header files more times than when the cache is warm. I
temporarily added some debug to tup to print out each file access, and
in one case I tried with a warm cache there were 1171 accesses, and
with a cold cache the same compilation had 3807 accesses. Tup has to
process each one, which somewhat explains the extra CPU time (I have a
branch in progress that tries to reduce this by avoiding duplicate
work if the same file is accessed multiple times, but it isn't merged
yet). Though in my tests, using the same CPU usage command that you
were, I only saw about a 2x increase in CPU for the cold case vs. the
warm case. I have no idea why you're seeing 20x more time...
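To make the "avoid duplicate work if the same file is accessed multiple times" idea concrete, here is a hypothetical sketch in C. It is not the code on the branch mentioned above, which the thread doesn't show. It assumes each access arrives as a path string and filters repeats through a small open-addressing hash set, so the expensive per-file processing runs once per file rather than once per access. The names `path_set`, `path_set_add` and `dedup_accesses`, the FNV-1a hash, and the fixed `SET_CAP` capacity are all illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Capacity of the hypothetical path set. Must be a power of two and
 * larger than the number of accesses passed to dedup_accesses(). */
#define SET_CAP 1024

struct path_set {
	const char *slot[SET_CAP];  /* NULL marks an empty slot */
};

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t hash_str(const char *s)
{
	uint32_t h = 2166136261u;

	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h;
}

/* Returns 1 if path was newly added, 0 if it was already in the set. */
static int path_set_add(struct path_set *set, const char *path)
{
	size_t i = hash_str(path) & (SET_CAP - 1);

	while (set->slot[i] != NULL) {
		if (strcmp(set->slot[i], path) == 0)
			return 0;                   /* repeated access */
		i = (i + 1) & (SET_CAP - 1);        /* linear probing */
	}
	set->slot[i] = path;
	return 1;
}

/* Compacts paths[] in place, keeping only the first access to each
 * file, and returns the number of unique paths left. */
static size_t dedup_accesses(const char **paths, size_t n)
{
	struct path_set *set;
	size_t out = 0;

	assert(n < SET_CAP);                /* guarantees a free slot exists */
	set = calloc(1, sizeof(*set));
	if (set == NULL)
		return n;                   /* out of memory: skip dedup */

	for (size_t k = 0; k < n; k++) {
		if (path_set_add(set, paths[k]))
			paths[out++] = paths[k];
	}
	free(set);
	return out;
}
```

For example, calling `dedup_accesses` on {"a.h", "b.h", "a.h", "c.h", "b.h"} leaves {"a.h", "b.h", "c.h"} at the front of the array and returns 3. A real implementation would need a growable table, since a single compile can touch thousands of paths.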
-Mike
On Thu, May 24, 2012 at 4:35 PM, Slawomir Czarko