Why doesn't tup simply remember dependencies from outside the build
tree? The only reason cleaning is necessary in this case is because
tup deliberately discards some of the dependency information.
No, I mean: tup already gets complete filesystem dependency
information, it just discards all paths outside the project tree.
Sorry, but I'm still going to say "no" purely for ideological reasons.
This violates my rule #3 for build systems, which is that there must
only be one command to update a system. In tup, this is 'tup upd'.
This means we always have this separation between the developer and
the build system:
developer:
  1) Change/add/remove some files in the system.
  2) Run 'tup upd'
build system:
  1) Analyze file states (timestamps, contents, etc)
  2) Determine the minimum amount of work to do
  3) Do that work
As soon as you add a 'tup clean' to the mix, now you are moving some
of the logic that belongs to the build system into the developer's
space. You might have something like:
developer:
  1) Change/add/remove some files in the system.
  2) Which files did I change? Just stuff in the project that can be
     handled with 'tup upd'? Or other things that may need a full build?
     2a) Maybe run 'tup upd'
     2b) Maybe run 'tup clean; tup upd'
  3) If I picked 2a) when I should've picked 2b), then try again with
     'tup clean; tup upd'.
Notice how this sounds an awful lot like the responsibilities of the
build system. It is not your job to make those decisions; it is the
build system's. Therefore, tup will not have a 'clean'.
> As expected it should remove all objects added by the build process.
> If I remember correctly, Mike said that 'tup clean' is not needed because
> the clean operation is used only to avoid the 'incorrect dependencies' problem.
> But in fact there are other cases when I would like to see 'tup clean':
> - I test 'clean build' speed; before running a new iteration I need to
> revert the project to a pristine state.
What are you trying to accomplish here? If you want to know the time
it takes a new developer to get the software installed, I would think
you'd have to add the time it takes to clone/checkout from version
control, or unpack the tarball. Once the new developer is set up,
however, the full build time is not a useful metric, since the only
builds should be incremental builds at that point.
> - I upgraded the compiler (e.g. gcc from 4.4 to 4.5) or a system library and
> want to recompile my project and see if everything works fine.
> Currently I do 'git clean -xdf' or something similar to remove all garbage
> from the project. But I would prefer that the build system be able to clean
> up after itself.
This is definitely a case that tup should handle properly. As tup was
initially designed, however, all of the dependencies are rooted at the
top of the project. The database will need to be re-worked to handle
dependencies on things like /lib/libc.so or whatever. In other words,
the fix here is not to add 'clean' and make you try to figure out when
you need to run 'clean' and when you don't; the fix is to have tup
properly analyze and track the dependencies.
-Mike
Part of it is the way the nodes are stored in the database. This will
need to be updated to provide a full view, rather than a local view of
just the project tree.
Another part as you mention is the inotify/scanning logic. This will
need to be updated to watch/scan files in directories that are outside
of the project tree.
Finally, there is the method of watching the file accesses of the
sub-process. Right now with the fuse setup, we can detect file
accesses anywhere in the file-system, but only if they use relative
paths. To detect full-path accesses (e.g.
open("/usr/include/stdio.h")), the sub-process needs to execute in a
chroot environment (or maybe there's some other way to handle this?).
I have been playing around with getting chroot to work, but it is
tricky because chroot needs additional permissions, but other things
(like the sub-processes) need to run as the user.
So it is not a trivial thing to add, but I would like to do so at some
point to make tup much more robust when updating system libraries.
> But the influence of environment variables means that there is always a use
> case (not involving timing) where you want to do a clean rebuild even though
> the build system is functioning perfectly.
Tup controls the environment of the sub-processes, so we know whether
or not the environment changes between invocations. Although it
wouldn't have the granularity to know that a particular sub-process
looked at a particular environment variable, it can know that
something in the environment changed and therefore do a full re-build.
Or maybe you could explicitly list which environment variables are
passed to the sub-processes in Tuprules.tup, and then those are the
ones that can be compared against their previous values. That might be
too tedious, though.
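
A minimal sketch of the first idea above (compare the whole environment
against the previous run), written as plain shell rather than as
anything tup actually does. The function names, the state file, and the
list of ignored variables are all assumptions made for illustration;
values containing newlines are not handled.

```shell
#!/bin/sh
# Sketch only: tup's real implementation is not shown in this thread.

# Print a checksum of the current environment, skipping variables that
# change between otherwise identical invocations.
env_fingerprint() {
    env | grep -v -E '^(PWD|OLDPWD|SHLVL|_)=' | LC_ALL=C sort | cksum
}

# Compare against the fingerprint saved by the previous run, then save
# the current one. Exit status 0 means "changed" (or no previous
# record), 1 means "unchanged".
env_changed() {
    state_file=$1
    current=$(env_fingerprint)
    previous=$(cat "$state_file" 2>/dev/null || true)
    printf '%s\n' "$current" > "$state_file"
    [ "$current" != "$previous" ]
}
```

A wrapper might run something like `env_changed .env-state && echo
"environment changed"` before deciding how much to rebuild. Hashing
everything is coarse: any unrelated variable that changes would count
as a change.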
-Mike
On Fri, Nov 11, 2011 at 9:47 PM, Jed Brown <j...@59a2.org> wrote:
> On Fri, Nov 11, 2011 at 20:39, Elliott Hird
> <penguino...@googlemail.com> wrote:
>>
>> No, I mean: tup already gets complete filesystem dependency
>> information, it just discards all paths outside the project tree.
>
> I assume it's for performance/inotify reasons. It definitely seems like an
> issue if the compiler suite was on an NFS mount.
There's fakeroot [1], which Debian packaging uses, but it's a return
to LD_PRELOAD.
A more promising avenue is UMLBox [2], which I've used, and which can
run Linux programs (under Linux only, unfortunately) with arbitrary
filesystem mount configurations without much of a speed reduction. It
takes about half a second to start up, though, which might be
considered too slow -- on the other hand, it only needs to be started
up once per build, and /only/ if there's any work to be done, so it's
not that bad. tup needing to be provided with a specially-configured
Linux kernel (not the one run on the host, just a compiled one) might
not be desirable, though.
[1] http://fakeroot.alioth.debian.org/
[2] https://bitbucket.org/GregorR/umlbox/wiki/Home
What's your practical suggestion for someone with a half-built tree who
just, say, changed the version of gcc? Or hacked a system .h file to
determine whether there's a bug there? Or scp'ed a half-built file
system from a friend's FreeBSD machine to your Ubuntu one (he went on
vacation, and you really need to finish the bugfix he just started,
because a client has called)?
Tracking dependencies outside of the project root would handle all of these.
The only likely kind of dependencies that can't be feasibly tracked
are dependencies on the system clock, on random number sources, and on
environment variables. All but the last of these should be harmless in
practice.
Thanks for the links! It sounds like fakeroot won't work for us
because of the static binary issue that keeps popping up. UMLBox is
interesting. I guess we'll have to compare its performance vs. a
chrooted fuse and see which is best.
-Mike
I don't have a practical suggestion at the moment. But adding a
'clean' command to tup is not going to help, since 'clean' is broken
by design. I'd rather fix the actual issues than introduce more
brokenness. There should be some branches to try out in a few days...
-Mike
Unfortunately I think that puts us in the same boat as 'tup clean' -
essentially it is up to the developer to try to figure out whether it
is ok to run just 'tup upd' or if you have to do 'tup upd -f'. That
kind of decision making belongs in the build system -- putting that
responsibility onto the user means that the build system is broken.
Therefore, tup is currently broken. I think it can be fixed.
-Mike
I note that these kinds of deep dependency tracking issues are also
experienced by the Nix package manager [1]. In Nix's case, it's to
eliminate undeclared ones; in tup's, it's to track them.
Unfortunately, I'm not sure a solution to one helps with the other.
There are currently 3 environment test branches out. They are as follows:
'environ':
- Tup saves the whole environment in its config table, and compares
the previous config with the current environment. If the environment
is different, all sub-processes are re-executed. This should work with
all existing Tupfiles, but can be annoying if your environment changes
for random reasons (for example, ssh'ing into a machine might give you a
different environment than logging in locally, so switching between the
two will cause tup to update lots of stuff). Some ever-changing variables,
like PWD and OLDPWD, are ignored to maintain some sanity :)
'environ-clear':
- Similar to 'environ', except the only environment variable that is
passed down is PATH. Everything else is purged from the environment,
so if for example a compiler needs a special environment variable set,
you would have to add it to your compiler rule. If the PATH changes
from one update to the next, everything is re-built. Probably more
usable than 'environ', but I'm curious to know what it would break.
'environ-export':
- Similar to 'environ-clear' in that only PATH is exported by default.
Anything else can be exported by using the export keyword. Eg:
Tupfile:
export FOO
: |> gcc -c bar.c ... |>
This will read FOO from the environment, and turn the command into
'FOO=value gcc -c bar.c ...'. If FOO changes in a future update, all
Tupfiles that have 'export FOO' are re-parsed. Currently there's no
way to unexport in a Tupfile, but it could probably be added if this
branch makes people happiest. This may help in not updating too much
during environment changes, but it does add some complexity to tup.
If you have any time to try out any or all of the branches, or just
have any general thoughts, please let me know what you think. I
recommend trying them out in a separate .tup directory (so re-check
out your source code, and run 'tup init' with the new tup) because if
you switch back it may confuse the older version of tup.
Thanks!
-Mike
++
This seems like the obvious best solution in retrospect. environ-clear
would probably be OK too, but this sugar seems handy.
I'm not sure what use unexport would be; surely you could just omit
the export line?
> There are currently 3 environment test branches out. They are as follows:
> If you have any time to try out any or all of the branches, or just
> have any general thoughts, please let me know what you think. I
On the subject of general thoughts, I think the environ-export is likely
the best approach. The first approach's invalidating of the build would be
really irritating every time you change something unrelated. Builds should
not be affected by environment variables. I've seen setups where people
had to source a pile of C-shell common ".profile" files before the build
would work. But sometimes it can be useful to allow one or two through.
I have some GNU make include files that are used by internal projects
and it loops through all environment variables (using $(.VARIABLES)
and $(origin)) and unexports them. I have a few exceptions which I pass
through: PATH, TERM, DISPLAY, HOME, LOGNAME, MAKE%, LC_% and LANG. HOME
is needed by some commands to find their rc files. TERM and DISPLAY are
mainly there for make run/make test though some things like fop might
need a DISPLAY. For the locale stuff I explicitly export LC_COLLATE=C
and LC_NUMERIC=C. Setting LC_CTYPE can also fix some things but break
others. Thinking about it now, I also wonder whether TMPDIR should
perhaps also be allowed through.
Oliver
++
I wouldn't pass HOME; it's the kind of thing I could see affecting the
outcome of a build undesirably. Such things should be done as tup
configuration variables instead.
Right now once you do "export FOO" in a Tupfile, then FOO is exported
to the commands in all following :-rules. Since you have to write
commands in order, you might have something like this:
export FOO
: |> I need FOO here |> output.txt
: output.txt |> I don't want FOO here, but need output.txt |>
That's the only reason I think you would need an "unexport". Not sure
if there is a real-world practical example though.
-Mike
So it sounds like there are some cases here where you'd want things
other than PATH? I was hoping if it worked we could stick with
environ-clear since it is simpler, but if you need to pass things
through from the actual environment then we'd probably have to use
environ-export. (Keep in mind if you just need to explicitly set an
environment variable like LC_COLLATE=C, tup doesn't actually need a
dependency on the environment - that could just be written directly in
the :-rule for commands that need it).
-Mike
There is now also an 'environ-export2' branch, which is similar to
environ-export but tracks things a little differently. I had some
issues getting it to work in Windows with the environ-export approach.
Feel free to try it out (again, a separate workspace is probably best)
and let me know if there are any issues. I'll probably merge
environ-export2 soon if there are no major problems.
-Mike
#!/bin/bash
# Add any other RPMs which are used in the build.
RPMS="gcc glibc-devel"
BUILD_SYSTEM_HASH=`rpm -q $RPMS | md5sum | awk '{print $1}'` tup upd
It depends on your project. I don't know if there's an automated way
to figure it out - maybe running the whole build process with strace
would help. Or you could scan the code for system header includes and
then maybe use "rpm -qf ..." to find corresponding RPMs. Similar with
system libraries. Otherwise you need to figure out the dependencies
manually, much as you would when writing an RPM spec file. If you use an
automated method, working out the list of packages is not something to
do on every build, since it will most likely take too long.
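
One way the strace idea could look, as a rough sketch: `-f` follows
child processes, `-e trace=open,openat` limits the log to those calls,
and `-o` writes it to a file. `opened_paths` and `build.log` are
hypothetical names, and the parsing assumes strace's usual
one-call-per-line output (calls split into "unfinished"/"resumed"
halves are not matched up with their results).

```shell
#!/bin/sh
# Read strace output on stdin and print, once each, every absolute path
# passed to an open()/openat() call that did not fail.
opened_paths() {
    grep -E 'open(at)?\(' \
        | grep -v ' = -1 ' \
        | sed -n 's|^[^"]*"\(/[^"]*\)".*$|\1|p' \
        | LC_ALL=C sort -u
}

# Usage (not run here; needs strace and rpm installed):
#   strace -f -e trace=open,openat -o build.log ./build.sh
#   opened_paths < build.log | xargs rpm -qf | sort -u
```

Relative paths are skipped on purpose, since those are the accesses
inside the project tree.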
In such a case you can use this to be less sensitive to minor updates:
BUILD_SYSTEM_HASH=`rpm -q --qf "%{VERSION}\n" $RPMS | md5sum |
    awk '{print $1}'` tup upd
I'm not sure what you mean by major version and subminor version.
"rpm -q glibc-devel" produces
glibc-devel-2.14-5.i686
That sounds like a clever solution to me, at least until tup supports
tracking file accesses from outside the dev tree. I would think you'd
want to generate the hash as part of your OS update process (like a
user-generated post-install script if that is available). Or were you
suggesting tup do this hash automatically somehow? I don't think there
is an easy way to do this in a cross-platform manner.
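
A sketch of what generating the hash at OS-update time might look like,
under several assumptions: the cache file location and the function name
are invented here, and whether your package manager offers a
post-install hook at all depends on the distribution. Nothing in tup
reads this file by itself.

```shell
#!/bin/sh
# Hypothetical cache location for the hash.
HASH_FILE=${HASH_FILE:-/var/tmp/build-system-hash}

# Read a package/version listing on stdin and store its md5 in file $1.
record_build_system_hash() {
    md5sum | awk '{print $1}' > "$1"
}

# Example hook body (assumed, run after packages are updated):
#   rpm -q gcc glibc-devel | record_build_system_hash "$HASH_FILE"
# Example build step (reuses the stored value instead of calling rpm):
#   BUILD_SYSTEM_HASH=$(cat "$HASH_FILE") tup upd
```

This keeps the rpm query out of every build, at the cost of trusting
that the hook really runs on each update.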
-Mike
One test I tried is to use the example fuse file-system to mirror the
real fs (fusexmp.c). I compared the build.sh script in tup using a
chroot environment in the mirror fs against the real fs. (The build.sh
script just compiles a bunch of C files - it doesn't actually run
tup). In the real fs, the script ran in 10.598s, whereas in the chroot
it ran in 14.183s. So there is some overhead just from having all file
accesses go through fuse. There would of course be additional overhead
in tracking these dependencies in tup, but I don't know how
significant that would be yet.
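
For scale, the two timings above work out to roughly a third more
wall-clock time under the fuse mirror. The helper name below is made up
for the example.

```shell
#!/bin/sh
# Print the extra time of "measured" over "base" as a percentage.
overhead_percent() {
    awk -v base="$1" -v measured="$2" \
        'BEGIN { printf "%.1f\n", (measured - base) / base * 100 }'
}

overhead_percent 10.598 14.183   # prints 33.8
```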
Though I don't understand - what is the purpose of tracking mtimes for
external build dependencies, but not checking them during 'tup upd'?
-Mike