I'm writing a build system with different goals than tup, but love the
LD_PRELOAD way of grabbing dependencies. However, my project is BSD
licensed, and so I can't simply re-use the tup code, which is GPL.
I understand there is code copied from linux, and which is GPL v2. I
believe it would be possible to license the ld-preload.c file
separately from the rest of the project, assuming it doesn't link to
the copied code, which I believe is the case. Are my assumptions
correct? Does that seem like something you'd be willing to do?
Thanks in advance,
Paul
--
Paul Biggar
paul....@gmail.com
Hi Paul,
Sure, I have no problem licensing ldpreload under BSD. I don't think
it makes use of any of the linux code, so that shouldn't be a problem.
You would just need a socket to read the file access data in your
program. I can put an official BSD license in the source for ldpreload
later today if I remember to get to it.
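For the socket part, the receiving side can be quite small. A minimal Python sketch, assuming a made-up wire format of one newline-terminated record per access ("r <path>" for a read, "w <path>" for a write). The real ldpreload protocol is not described in this thread and will differ.

```python
import socket


def collect_accesses(sock: socket.socket) -> tuple[set[str], set[str]]:
    """Read hypothetical 'r <path>' / 'w <path>' records until EOF.

    The record format is an assumption for this sketch, not tup's protocol.
    """
    reads: set[str] = set()
    writes: set[str] = set()
    buf = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # peer closed its end: the traced command is done
            break
        buf += chunk
    for line in buf.decode().splitlines():
        op, _, path = line.partition(" ")
        if op == "r":
            reads.add(path)
        elif op == "w":
            writes.add(path)
    return reads, writes
```

In a real tool the other end of the socket would be handed to the preloaded library in the child process; a socket.socketpair() is enough to try it out in one process.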
So what are the goals for your build system? Do you have anything
online yet? I'm always interested in looking at what other build
systems are doing :)
Also just a heads up in case you are trying to run your build system
on FreeBSD, last time I did a standard install of it, the gcc
executable was statically linked. So, ldpreload won't get you any
dependencies that way. If you come up with a tracing mechanism there
that can handle statically linked executables, I'd be interested to
see how it's done! (I think ptrace() was limited on FreeBSD, last I
remember).
-Mike
On Wed, Mar 16, 2011 at 11:32 AM, Mike Shal <mar...@gmail.com> wrote:
> Sure, I have no problem licensing ldpreload under BSD. I don't think
> it makes use of any of the linux code, so that shouldn't be a problem.
> You would just need a socket to read the file access data in your
> program. I can put an official BSD license in the source for ldpreload
> later today if I remember to get to it.
Fantastic, thanks. I'll make sure I contribute back anything useful.
> So what are the goals for your build system? Do you have anything
> online yet? I'm always interested in looking at what other build
> systems are doing :)
It's called gin, and I'm developing it at
https://github.com/pbiggar/gin. I should add a disclaimer that it is
very early stage, and I spend more time rewriting what I already have
to be more "extensible" than actually filling in the functionality :)
As for the goals, well, I've half an essay written on them. But to
summarize it, I want to replace autotools.
The goals are to be something simple, maintainable, portable,
extensible, and fast. As such, I intend the specification file to be
largely declarative (in a similar fashion to a Makefile.am), to
include configuration (like autoconf), and to work on Windows as well as
Unixes.
An important goal is for it to be trivial to extend, and trivial to
contribute those extensions back to the larger project. The aim is to
make the myriad tools available for C/C++ development easy to use, so
simple things like profile-guided optimization, code coverage,
link-time optimization, ctags, etc.
should be built in. Furthermore, I want to include support for
fanciful tools like include-what-you-use, sending reports of
bad-builds or test failures to the web, and package manager
integration (both for dependencies and building packages).
A motivation is to replace the aging Mozilla build system (I work for
mozilla), though I've not raised this with the maintainers, and won't
until I have something to evaluate.
> Also just a heads up in case you are trying to run your build system
> on FreeBSD, last time I did a standard install of it, the gcc
> executable was statically linked. So, ldpreload won't get you any
> dependencies that way. If you come up with a tracing mechanism there
> that can handle statically linked executables, I'd be interested to
> see how it's done! (I think ptrace() was limited on FreeBSD, last I
> remember).
Interesting. I suppose it isn't too hard to fall back to the gcc -M
flags, but it does kinda suck to have to do it. Fabricate uses access
times, so that might work, depending on the file system. Other than
that, I suppose the fallback is to build everything unconditionally.
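For the -M fallback, the compiler's Makefile-style output (for example the .d file written by gcc -MD or -MMD) has to be read back. A simplified Python sketch: it handles backslash line continuations, escaped spaces and the empty rules that -MP adds, but not every corner of Make syntax (Windows drive letters, "$$", and so on).

```python
import re


def parse_depfile(text: str) -> dict[str, list[str]]:
    """Parse a Makefile-style dependency file into target -> prerequisites.

    Simplified: handles "\\" line continuations and escaped spaces only.
    """
    deps: dict[str, list[str]] = {}
    # Join continuation lines (backslash at end of line) into one logical line.
    joined = text.replace("\\\n", " ")
    for line in joined.splitlines():
        if ":" not in line:
            continue
        target, _, rest = line.partition(":")
        # Split on whitespace that is not escaped, then unescape "\ ".
        prereqs = [p.replace("\\ ", " ")
                   for p in re.split(r"(?<!\\)\s+", rest.strip()) if p]
        deps.setdefault(target.strip(), []).extend(prereqs)
    return deps
```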
Thanks again,
> Interesting. I suppose it isn't too hard to fall back to the gcc -M
> flags, but it does kinda suck to have to do it. Fabricate uses access
> times, so that might work,
After all, you'll get the "correct" result
Aside from the (important) point that Jed mentions in a previous
reply, the ldpreload "hack" works for dependencies which are not gcc,
such as parser generators, codegen scripts, etc.
If access times are a fallback for systems which don't support an
ldpreload-like mechanism, for example FreeBSD, then there likely won't
be virus scanners involved. At the end of the day, there seems to be
no perfect solution, so it's reasonable to pick the optimal solution
for the largest audience, and add fallbacks as necessary.
On Wed, Mar 16, 2011 at 20:38, Elazar Leibovich <ela...@gmail.com> wrote:
> After all, you'll get the "correct" result
For some value of "correct"?
You end up needing "make clean" or similar to handle ghost dependencies.
On just one project I work on, I use gengetopt, gperf, bison, yacc,
and a homegrown project called maketea, which forms the platform on
which we work. See
https://code.google.com/p/phc/source/browse/#svn%2Ftrunk%2Fsrc%2Fgenerated_src.
It is very very common to do this in C projects, and I do not recall
any large C/C++ project I've looked at that doesn't have this sort of
build-time code generation.
> What about the daily updatedb which will be cron'd exactly when hudson makes
> the daily build... The AV is just an example.
Sure, it's not a nice solution. A better fallback would be preferred.
>> At the end of the day, there seems to be
>> no perfect solution, so it's reasonable to pick the optimal solution
>> for the largest audience, and add fallbacks as necessary.
>
> I agree, but I'm not sure LD hijacking is the right compromise. I'm still
> not convinced -MM is that bad, even considering the ghost dependencies
> (which BTW the atime solution doesn't give), or the strace.
What exactly is wrong with LD hijacking, apart from that it doesn't
work on some platforms? Compared to -MM it works properly, and
supports almost any program. This suggests that -MM should be the
fallback.
You asked in a different email whether ghost dependencies were a
problem. Here is one use case: configuration using ./configure.
Suppose I have an optional dependency, which requires a certain
header. Configure commonly tests for that by compiling a simple
program which includes that header. Now suppose that the dependency is
not installed, and the user installs it later. Now we wish to
recompile with the dependency, but -MM does not support that, and so
we need to |make clean| and start again.
Other examples are updating your compiler (or whatever tools you use)
or changing your INCLUDE_PATH.
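The configure case can be modelled by how a compiler walks its -I directories. The Python sketch below is simplified (it ignores the separate search rules for "" versus <> includes) and returns both the header that was found and the earlier candidates that were looked up but missing. Those misses are the ghost dependencies: a -M/-MM listing only records the file that was found, so if one of the misses appears later, nothing triggers a rebuild.

```python
import os
from typing import Callable, Optional


def resolve_include(header: str, search_path: list[str],
                    exists: Callable[[str], bool] = os.path.exists
                    ) -> tuple[Optional[str], list[str]]:
    """Walk the include search path in order, like a compiler would.

    Returns (resolved path or None, candidates tried before it that were
    missing). The missing candidates are the "ghost" dependencies.
    """
    ghosts: list[str] = []
    for directory in search_path:
        candidate = os.path.join(directory, header)
        if exists(candidate):
            return candidate, ghosts
        ghosts.append(candidate)
    return None, ghosts
```

With a file-access tracer these misses show up as failed (ENOENT) lookups, so they can be recorded without any help from the compiler.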
From your state.py:process() function, it looks like you're iterating
through the build tree and checking each dependency in order to decide
if you should rebuild - is that correct? If so you might want to check
out the PDF linked from the tup website for a better way of building
the DAG. The method you're using won't scale to a large project (or
maybe I'm misunderstanding your python :).
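The idea in the tup paper, roughly, is to start from the files that changed and walk up to whatever depends on them, instead of visiting every node and checking its inputs. A Python sketch of that idea (not tup's code), assuming the reverse edges are already stored and the set of changed files is known, for example from a file monitor or a scan:

```python
from collections import deque


def affected_nodes(changed: set[str],
                   dependents: dict[str, set[str]]) -> set[str]:
    """Return every node reachable from `changed` via reverse edges.

    `dependents` maps a file to the nodes that consume it. Only the part
    of the graph above the changed files is visited.
    """
    seen: set[str] = set()
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

The result still has to be executed in dependency order (a topological sort of the affected subgraph), but the work is proportional to the change rather than to the size of the project.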
Or, have you considered using tup as the dependency manager, and
wrapping it with something to easily configure your optimization/code
coverage tools?
-Mike
It is probably pretty rare, but it is definitely frustrating and
confusing when it happens. By using make it might be even more rare
since you will be hesitant to move files around, given how often it
breaks when you do. I'm not sure what you mean by a "code smell"
though. It seems cleaner to me to have the dependency listing logic in
one place (at the file access or kernel level), rather than in every
program that you might use in a build system. While gcc may have its
-M switches, I don't think /bin/sh, or perl, or python (etc) do.
-Mike
You're absolutely right, and I'm aware of it. I've read your paper and
using partial dependency trees seems a good way to manage this. As I
mentioned, the code is still in the very early stages, and all of this
will change (I'm currently in the middle of a big rewrite of what's
there, so it hasn't been pushed in a while).
> Or, have you considered using tup as the dependency manager, and
> wrapping it with something to easily configure your optimization/code
> coverage tools?
I did consider this, and in general I do like to reuse as much code as
possible. The problems I see with including tup, so far:
- GPL, though perhaps it wouldn't strictly be "linking"
- avoiding the mistakes of autotools
- autotools comprises makefiles and shell files built from m4 and
processed by C and Perl. I'm aiming for a pure python program, as much
as possible. I'm certainly not interested in generating tupfiles,
though perhaps we could expose an API to Python, via ctypes for example.
- extensibility: I want to allow arbitrary extensions to wrap nodes
in the dependency graph, add nodes of their own, and allow
dependencies which are not simply files. I'm unsure how feasible
that is in tup, and I didn't want to read the source due to the GPL
issue above.
So perhaps you can advise me on the last one. I assumed that managing
my own dependency tree would be essential. What do you think?
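One way to allow dependencies that are not simply files is to let every dependency produce a fingerprint, and treat a node as stale when any fingerprint differs from the recorded one. A Python sketch; Dependency, FileDep, EnvDep and is_stale are hypothetical names, not part of gin or tup:

```python
import hashlib
import os
from abc import ABC, abstractmethod


class Dependency(ABC):
    """Anything whose change should trigger a rebuild (hypothetical API)."""

    @abstractmethod
    def key(self) -> str: ...

    @abstractmethod
    def fingerprint(self) -> str: ...


class FileDep(Dependency):
    def __init__(self, path: str):
        self.path = path

    def key(self) -> str:
        return "file:" + self.path

    def fingerprint(self) -> str:
        try:
            with open(self.path, "rb") as f:
                return hashlib.sha256(f.read()).hexdigest()
        except FileNotFoundError:
            return "<missing>"  # absence is recorded as a state too


class EnvDep(Dependency):
    def __init__(self, name: str):
        self.name = name

    def key(self) -> str:
        return "env:" + self.name

    def fingerprint(self) -> str:
        return os.environ.get(self.name, "<unset>")


def is_stale(deps: list[Dependency], recorded: dict[str, str]) -> bool:
    """True if any dependency's fingerprint differs from the stored one."""
    return any(recorded.get(d.key()) != d.fingerprint() for d in deps)
```

Recording "<missing>" for an absent file means that creating the file later also counts as a change.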
> I try never to have two .h files with the same name, and if I do, make sure
> no single .c file has two directories with bla.h in them in its include
> path.
Of course I want the order of the -I switches to matter. I'm likely to
hit multiple config.h files, so I put -I. first. I have the system
version of libxml.h in /usr/include, but I added -I~/libxml-latest to
my CFLAGS to make sure I got the right version. And so on.
> What if some C compiler has a
> /etc/cc.conf?
That is exactly why I want this. What if we change cc.conf? Of course
we want to pick up that change, or else we'll be wondering why our
package is miscompiled.
> You don't want it on your dependency list, do you? You can't
> know with LD hijacking which file is used for the implementation and which
> is really needed for compilation.
If the implementation changes, the compilation changes. Now there are
extremes here of course, but I trust you'd exclude /var/ccache and
/tmp and such.
--
Paul Biggar
paul....@gmail.com
Ahh, I thought you meant that having tup check for ENOENT file
accesses was a code smell. I agree that having same-named .h files all
over the place and controlling it through -I would be messy. But it is
not uncommon for a developer to move a .h file to a better location,
which may affect the build in unintended ways. The build system needs
to re-build all necessary files in this case, rather than silently
succeed on an incremental build only to fail when a new developer does
a fresh checkout.
>>
>> It seems cleaner to me to have the dependency listing logic in
>> one place (at the file access or kernel level), rather than in every
>> program that you might use in a build system.
>
> As I said, my problem with this solution is that it's shaky, different
> implementations of the compiler would give you "incorrect" results, and you
> wouldn't be able to know about it. What if some C compiler has a
> /etc/cc.conf? You don't want it on your dependency list, do you? You can't
> know with LD hijacking which file is used for the implementation and which
> is really needed for compilation.
Well tup only tracks files in the directory where you run 'tup init'
and lower, so unless you ran 'tup init' in /etc/ or /, that file won't
be tracked. But if I get your meaning, say one person's gcc looks for
different files than another person's within the tup hierarchy. Maybe
they have a modified compiler that always looks for and includes "local.h"
in the current directory, or something. In this case, the DAGs will be
different - one person's will have nodes for "local.h" while the other
won't. In either case you don't have to list "local.h" in the Tupfile
- you only have to list dependencies on generated files. Since
generated files are only created by tup, they are always under your
control, and not dependent on the details of a specific C compiler.
The "local.h" node will be added and removed from the DAG
automatically as necessary.
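The scoping rule Mike describes (only files at or below the 'tup init' directory are tracked) amounts to a path filter over the accesses the tracer reports. A Python sketch; the excluded parameter is a hypothetical extra for subtrees inside the project (a local ccache directory, say) and is not a tup option:

```python
import os


def tracked_accesses(paths: list[str], root: str,
                     excluded: tuple[str, ...] = ()) -> set[str]:
    """Keep accessed paths under `root`, minus excluded subtrees.

    Returned paths are relative to `root`. Relative inputs are taken
    to be relative to `root`.
    """
    root = os.path.abspath(root)
    skip = [os.path.abspath(e) for e in excluded]

    def under(path: str, top: str) -> bool:
        return os.path.commonpath([path, top]) == top

    kept: set[str] = set()
    for p in paths:
        full = os.path.abspath(os.path.join(root, p))
        if under(full, root) and not any(under(full, s) for s in skip):
            kept.add(os.path.relpath(full, root))
    return kept
```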
>>
>> While gcc may have it's -M switches, I don't think /bin/sh, or perl, or
>> python (etc) do.
>
> But python or perl or sh don't need any other file for compiling a .py file
> to a .pyc file. They only need this single .py file and nothing more, and I
> think that most modern programming languages have similar features, it's
> only C's broken import system which causes all these problems in the first
> place, (and as others have said, also code generation tools which
> depends/create output for gcc).
>
Sorry, I wasn't clear - I'm not talking about building .py into .pyc.
What I mean is if I use a perl or python script as part of the build
process (like generating a C file) -
: |> foo.pl |> generated.c
: generated.c |> gcc ... |> generated.o
We need to track the file accesses of the foo.pl script itself in
order to make sure it actually writes to "generated.c". If someone
comes along and edits foo.pl so it now writes to "foo-generated.c" but
forgets to update the Tupfile, they should get an error message that
it's doing something wrong. In make if you do something like that,
things will silently succeed if you still have "generated.c" lying
around, and you'll be none the wiser until you do a clean build or
fresh checkout. Similarly, if foo.pl ends up reading from another file
(like "generated.c.in" or something), we need to add that file as an
input dependency. So what I mean when I say that perl has no -M
switch, is that anytime it does an open(FOO, "generated.c.in"), it
won't automatically store the "generated.c.in" string in a foo.pl.d
file. It's possible someone will edit foo.pl to read from another file
as well (generated.base or some such). If you are expecting all
developers who ever edit foo.pl to remember to re-list every file
access as a dependency in the build description file, you'll find no
end of trouble.
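The check described above can be sketched as a comparison between what a rule declared and what the tracer observed. A Python sketch with hypothetical names (not tup's implementation): undeclared writes and declared-but-unwritten outputs are errors, while extra reads are returned so they can be recorded as new inputs. In tup, reading a generated file still has to be declared in the Tupfile; this sketch does not distinguish generated files from ordinary ones.

```python
class BuildRuleError(Exception):
    """Raised when a command's file accesses contradict its rule."""


def check_command(declared_inputs: set[str], declared_outputs: set[str],
                  observed_reads: set[str], observed_writes: set[str]
                  ) -> set[str]:
    """Compare a rule's declarations with what the command actually did.

    Returns observed reads that were not declared, to be recorded as
    new input dependencies.
    """
    unexpected = observed_writes - declared_outputs
    if unexpected:
        raise BuildRuleError(f"undeclared outputs: {sorted(unexpected)}")
    missing = declared_outputs - observed_writes
    if missing:
        raise BuildRuleError(f"declared outputs not written: {sorted(missing)}")
    return observed_reads - declared_inputs
```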
-Mike
What is your thinking here? I'm assuming you want to replace the
parser stage, and keep tup as-is after that? So you could do something
like this:
tup.start_directory("foo")
tup.add_rule(inputs, "gcc -c blah", outputs)
tup.end_directory()
...
tup.build()
The start/end_directory functions would tell tup that all rules in
between will be run in that directory (I think it would need to behave
this way, so tup knows when a command and its output files need to be
deleted). Then the build() function would go through and delete the
unused outputs and execute the main updater logic.
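To make the directory grouping concrete, here is a hypothetical Python stand-in for the proposed start_directory/add_rule/end_directory calls (this is not an existing tup API). Re-running a directory replaces its rules, and end_directory reports the outputs that are no longer produced, which is what build() would delete:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Rule:
    inputs: tuple[str, ...]
    command: str
    outputs: tuple[str, ...]


@dataclass
class RuleRecorder:
    """Hypothetical recorder grouping rules by directory."""
    rules: dict[str, list[Rule]] = field(default_factory=dict)
    _current: Optional[str] = None
    _previous: set[str] = field(default_factory=set)

    def outputs(self, path: str) -> set[str]:
        return {o for r in self.rules.get(path, []) for o in r.outputs}

    def start_directory(self, path: str) -> None:
        self._current = path
        self._previous = self.outputs(path)  # remember last run's outputs
        self.rules[path] = []  # this run's rules replace the old ones

    def add_rule(self, inputs, command: str, outputs) -> None:
        if self._current is None:
            raise RuntimeError("add_rule called outside start/end_directory")
        self.rules[self._current].append(
            Rule(tuple(inputs), command, tuple(outputs)))

    def end_directory(self) -> set[str]:
        """Finish the directory; return outputs that are now stale."""
        if self._current is None:
            raise RuntimeError("end_directory without start_directory")
        stale = self._previous - self.outputs(self._current)
        self._current, self._previous = None, set()
        return stale
```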
How would your main python script know when it needs to re-generate
the rules for a certain directory? For example, if you create a new .c
file in the "foo" directory, it would presumably want to create a new
gcc command for that file. Tup does this now by re-parsing any
directory that has new or deleted files (or is dependent on
directories that do). Would you be planning to re-implement that logic
in python? Or would you always generate all the rules? (That would be
really inefficient). Or use tup to decide when to re-parse a directory
and tell your python script which directories it needs to re-generate?
If it's this case, is it really that different from the runscr branch?
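The reparse rule (directories with new or deleted files, plus directories that depend on them) can be sketched in Python as follows. The snapshot dictionaries and the depends_on map are assumptions about what would be stored between runs, not tup's data model:

```python
def dirs_to_reparse(old: dict[str, set[str]], new: dict[str, set[str]],
                    depends_on: dict[str, set[str]]) -> set[str]:
    """Directories whose file listing changed, plus their dependents.

    `old`/`new` map a directory to its file names at the previous and
    current scan; `depends_on` maps a directory to the directories its
    rules read from. Dependence is followed transitively.
    """
    changed = {d for d in old.keys() | new.keys()
               if old.get(d, set()) != new.get(d, set())}
    result = set(changed)
    frontier = list(changed)
    while frontier:
        d = frontier.pop()
        for dependent, sources in depends_on.items():
            if d in sources and dependent not in result:
                result.add(dependent)
                frontier.append(dependent)
    return result
```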
-Mike
The plan (at least on Linux) is to look at the ELF header for
PT_DYNAMIC segments - if it has them, use ldpreload, and if not, use
ptrace(). The current ptrace branch is out of date with the changes
that have been made to support Windows, so it needs to be
re-implemented anyway. Also, ptrace is highly platform, and even
machine, specific. As I recall, ptrace() could also be used on
OpenBSD, with just some different flags. But I think FreeBSD lacks the
ability to automatically trace a grand-child when the child forks. OSX
lacks that as well as the ability to automatically stop on the next
syscall. There are probably other tracing mechanisms that can be used
on those platforms. I think someone mentioned a tool for OSX when this
came up last, but I can't seem to find the thread at the moment :(
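The ELF check itself is small. A Python sketch for illustration (in tup it would live in C): read the program header table and look for an entry of type PT_DYNAMIC (2). It parses only the fields it needs and assumes a well-formed file.

```python
import struct

PT_DYNAMIC = 2  # program header type of the dynamic linking segment


def has_dynamic_segment(data: bytes) -> bool:
    """True if the ELF image has a PT_DYNAMIC segment.

    Per the plan above: True -> use ldpreload, False -> fall back to ptrace().
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    endian = "<" if ei_data == 1 else ">"  # 1 = little-endian, 2 = big-endian
    if ei_class == 2:  # ELFCLASS64
        (phoff,) = struct.unpack_from(endian + "Q", data, 32)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 54)
    elif ei_class == 1:  # ELFCLASS32
        (phoff,) = struct.unpack_from(endian + "I", data, 28)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 42)
    else:
        raise ValueError("unknown ELF class")
    for i in range(phnum):
        # p_type is the first 32-bit field of a program header in both classes
        (p_type,) = struct.unpack_from(endian + "I", data, phoff + i * phentsize)
        if p_type == PT_DYNAMIC:
            return True
    return False
```

Reading just the ELF header and the program header table, rather than the whole binary, would be enough in practice.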
-Mike
# Clone Mike's repository (skip this if you already have a clone)
git clone git://gittup.org/tup.git
cd tup
# Then checkout the branch
git checkout runscr
Some background on using git:
http://stackoverflow.com/questions/315911/git-for-beginners-the-definitive-practical-guide
Even better would be if the builders themselves could be written in
Python; e.g.,
tup.add_rule(inputs, GccBuilder(opts='-g'), outputs)
where the middle argument should be a Python object that implements some
specified interface (call it "Builder"). This way the rule could be
executed without (necessarily) forking a new process, and the main tup
script could interact with the builder via Python function calls,
parameter passing, etc. rather than having to squeeze all interactions
through command-line arguments.
There would presumably be a requirement that the "builder" argument is
serializable (which Python objects are by default using the pickle
module). This way (1) the serialized form could be stored in the
database, so that only rules that need to be executed need to be
unpickled (rather than having to re-run the tupscript every time to
regenerate the builder instances) and (2) the string form of the
serialized builders could be compared and if they are not identical tup
would know that the builder should be re-run. I believe that it would
be possible to do some Python import magic to determine the Python files
that the build procedure depends on.
Of course if the builder is a Python callable that runs in the same
process as tup, then the usual LD_PRELOAD trick cannot be used to
police builder inputs and outputs, so the builder API should
provide a way for the builder itself to specify these. One of the
Builder classes would do the current tup thing (i.e., fork a separate
process and monitor its inputs and outputs using LD_PRELOAD), but other
builders could do other things. This flexibility to choose a different
policy would also be an improvement IMHO.
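A Python sketch of the builder idea, under stated assumptions: the Builder interface, its declared_io and command methods, and needs_rerun are hypothetical (GccBuilder is the example name from above). The builder declares its own inputs and outputs for the in-process case, and its pickled form is compared with the copy saved last time.

```python
import pickle
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


class Builder(ABC):
    """Hypothetical builder interface."""

    @abstractmethod
    def declared_io(self, inputs: list[str],
                    outputs: list[str]) -> tuple[set[str], set[str]]:
        """Files the builder promises to read/write (LD_PRELOAD can't see
        an in-process builder, so it must say this itself)."""

    @abstractmethod
    def command(self, inputs: list[str], outputs: list[str]) -> list[str]: ...


@dataclass
class GccBuilder(Builder):
    opts: str = ""

    def declared_io(self, inputs, outputs):
        return set(inputs), set(outputs)

    def command(self, inputs, outputs):
        (obj,) = outputs  # this sketch compiles to exactly one object file
        return ["gcc", *self.opts.split(), "-c", *inputs, "-o", obj]


def needs_rerun(builder: Builder, stored: Optional[bytes]) -> bool:
    """Rerun if the pickled builder differs from the stored copy."""
    return stored is None or pickle.dumps(builder) != stored
```

One caveat: pickle output is not guaranteed to be canonical (a set of strings, for instance, can serialize in a different order in another process), so this comparison can report a change that did not happen. That errs on the side of an extra rebuild, which is safe but not minimal.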
I don't know whether you want to take tup in this direction, away from
specifying strict policy about how everything has to be done and being
able to make some near-guarantees that incremental builds work by
construction and towards giving the user more flexibility but more
chances to screw things up. But it seems to me that a "libtup" would
allow people to use the very cool tup dependency management engine even
if they (for example) don't want to use the parser. For example,
perhaps it would be possible to write a parser that reads Makefiles or
Ant build files and generates tup rules on the fly; such a thing would
be a great tool for transitioning from make/ant to tup incrementally.
Over time, make/ant rules could be rewritten as tup rules (in whatever
syntax).
Michael
--
Michael Haggerty
mha...@alum.mit.edu
http://softwareswirl.blogspot.com/
I really don't know. I don't think that having tup be the backend that
does everything is going to be the right option, since I think there
is going to be an impedance mismatch between the two. But I'm not
really certain, since I don't know a great deal about the internals of
tup, nor about what we're going to need for gin.
For example, I don't believe our notions of dependency graphs are
going to be compatible. There may be some mapping from one to the
other though, I'm unclear.
> How would your main python script know when it needs to re-generate
> the rules for a certain directory? For example, if you create a new .c
> file in the "foo" directory, it would presumably want to create a new
> gcc command for that file. Tup does this now by re-parsing any
> directory that has new or deleted files (or is dependent on
> directories that do). Would you be planning to re-implement that logic
> in python? Or would you always generate all the rules? (That would be
> really inefficient). Or use tup to decide when to re-parse a directory
> and tell your python script which directories it needs to re-generate?
I think we're speaking a different language here. It seems that you're
assuming a Tup-like dependency-graph at least, but I'm not sure that
that's what I have in mind.
> If it's this case, is it really that different from the runscr branch?
What does the runscr branch do? I checked it out and I don't see any
description of it.
Maybe we can look at it a different way: what services that tup
provides can be easily separated from the core and exposed as an API?
Ideally, tup would use the APIs too.
Can you explain what you mean by a "builder" here? It seems you're
mixing up a python object with what tup calls a "command" (where a
"command" is something that is ultimately run verbatim by /bin/sh). I
don't plan to have tup store python objects in the database, since tup
is not python. If there was a libtup interface that allowed a python
parser to interact with tup, then my concern was to determine how that
interaction would take place. I think tup would still need to be the
one who keeps track of which commands and outputs are still generated,
since the python part wouldn't have direct access to that. Then the
python part would just need to export rules, which would consist of a
list of inputs, a command, and a list of outputs.
The concern that the python script itself wouldn't be examined for
dependencies is a valid one, and would need to be considered.
>
> I don't know whether you want to take tup in this direction, away from
> specifying strict policy about how everything has to be done and being
> able to make some near-guarantees that incremental builds work by
> construction and towards giving the user more flexibility but more
> chances to screw things up. But it seems to me that a "libtup" would
> allow people to use the very cool tup dependency management engine even
> if they (for example) don't want to use the parser. For example,
> perhaps it would be possible to write a parser that reads Makefiles or
> Ant build files and generates tup rules on the fly; such a thing would
> be a great tool for transitioning from make/ant to tup incrementally.
> Over time, make/ant rules could be rewritten as tup rules (in whatever
> syntax).
Such a thing might be possible with the runscr branch, when that is
finished - I don't know for sure though. In general, I am opposed to
anything that takes away from the incremental build guarantees. If the
user has a way to screw things up without getting an error message, I
consider that a bug. (And there are ways to do that in tup - for
example, by running static executables. There are plans to fix that,
though).
-Mike
Ahh, ok. Tup might not be the right approach for your case then :)
>
>
>
>> If it's this case, is it really that different from the runscr branch?
>
> What does the runscr branch do? I checked it out and I don't see any
> description of it.
It is just supposed to add a 'run' command that lets you run an
external script to generate rules. You still create a Tupfile, but the
Tupfile might just have:
run ./generate_rules.py
The python script would then output a list of :-rules to stdout. There
are still issues wrt the external script calling opendir()/readdir(),
since tup doesn't account for the generated files in this case.
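A generate_rules.py for the runscr branch could look something like the Python sketch below. It assumes only what is described here (the script prints :-rules to stdout); the one-object-per-.c-file rule is just an example. Its use of os.listdir() is exactly the opendir()/readdir() case mentioned above: tup would not know that the output depends on the directory's contents.

```python
import os


def format_rule(inputs: list[str], command: str, outputs: list[str]) -> str:
    """Render one tup :-rule as ': inputs |> command |> outputs'."""
    return f": {' '.join(inputs)} |> {command} |> {' '.join(outputs)}"


def generate_rules(filenames: list[str]) -> list[str]:
    """Example policy: compile every .c file to a .o of the same name."""
    rules = []
    for name in sorted(filenames):
        base, ext = os.path.splitext(name)
        if ext == ".c":
            obj = base + ".o"
            rules.append(format_rule([name], f"gcc -c {name} -o {obj}", [obj]))
    return rules


if __name__ == "__main__":
    # Listing the directory is the untracked opendir()/readdir() case.
    for rule in generate_rules(os.listdir(".")):
        print(rule)
```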
>
>
> Maybe we can look at it a different way: what services that tup
> provides can be easily separated from the core and exposed as an API?
> Ideally, tup would use the APIs too.
I think the only one that makes sense at this point is the ldpreload
part. It's already a separate library anyway. Even though there is a
libtup.a generated as part of the build process, I'm not ready to nail
down a solid external interface at this point.
-Mike
That seems reasonable. Certainly, we'll work with the ldpreload part
for now. I'm largely concerned with extensibility and having a nice
front-end, so I can just knock up a crude back end for now. When it
inevitably becomes a bottleneck, we can revisit this and we'll
probably both have more of an idea of what that means.
Thanks again,