Jason,
What happens if credo creates a file which the user subsequently
modifies? Will credo clobber the user's changes or recognize that they
are there? And what if it is called to create a file that has an
associated do file but has already been created first by the user?
--
________________________
Warm Regards
Prakhar Goel
Yeah, as Prakhar says, it's a little convoluted and doesn't really
work right in odd situations. This is a bug in the implementation,
not really a bug in the fundamental concepts of redo.
The reason we don't want to have a redo "project" exactly is that
sometimes, projects are embedded into other projects as subprojects.
Imagine someone didn't want to depend on the recipient of a project
having a copy of redo installed; they might take the redo source repo
and make it a subdir of their project. In that case, the two projects
need to combine into a single project, and all the .do files should
cooperate with each other.
With the system as it is now, that *almost* works right. It works
fine as long as .redo gets created at the top level of your combined
project. And that's exactly what redo will do, as long as your first
run of 'redo' is at the top level. But that's obviously a little
error prone.
As Prakhar says, the most likely solution to this (which is much
better conceptually than what we have now) is to have one .redo
directory in each target dir. That way, you could actually depend on
files outside your current project (../src/gtk/libgtk.a), and have
them get built correctly. It also avoids problems with symlinks and
absolute paths. (redo is *pretty* good with those already, but I know
there are a few cases it gets wrong.)
As for default.do files in the parent directory of the redo project...
good point. I think that in general we need a way for a default*.do
to say "I don't know how to build this, pretend I don't exist" instead
of "I don't know how to do this, fail." Otherwise creating a
default.do can cause rather odd results. This hasn't been properly
sorted out yet.
Have fun,
Avery
What about files like default.do? Just the fact that default.do
exists shouldn't mean that every file in $PWD must be generated.
Avery
I believe the name default.do means different things to credo and
redo. In credo, it's how to build a target (possibly a file) named
default. It has no role in building targets generally, nor in
generated-file checking. In redo, if I understand it correctly, it's
a do script that is applied to all files, or at least to those
without their own do script.
Let's take as an example default.o.do, which I assume is interpreted
similarly in both redo and credo as the do file to call when c?redo
needs to make a *.o file.[1] The presence of this file does tell
credo that every *.o file in the directory should be generated from
it, if the *.o file is out-of-date, regardless of whether the file is
present or not, unless its *.sum file is out-of-date. The same applies
to default.*.do for any other file extension.
[1] Presumably from *.c files, and headers, and the values of shell
variables like $cc and $ccflags; but this depends on the content of
the default.o.do file.
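For concreteness, a classic redo-style default.o.do follows the pattern
below. This is only a sketch: it assumes apenwarr-redo's usual argument
convention ($1 = target, $2 = target without the matched extension,
$3 = temporary output file) and gcc's -MD flag for listing header
dependencies; a credo rule may look different.

  # sketch of a default.o.do: build $3 from $2.c and track header deps
  redo-ifchange "$2.c"
  gcc -MD -MF "$2.d" -c -o "$3" "$2.c"
  read DEPS <"$2.d"
  redo-ifchange ${DEPS#*:}    # declare the headers gcc reported as deps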
This discussion started by asking what a ``redo project'' was and I want
to get back to that for a bit.
Maybe there is a really obvious reason I'm missing, but it seems that a
lot of these problems would be fixed by having an explicit ``redo init''
that creates the ``.redo'' directory the first time in (presumably) the
right place, and if not it's easy to figure out what happened. This
makes it clear to the user that the project is everything under the
directory they just created. This means that redo should fail if it
walks up the tree and doesn't find it. It also seems that having
sub-projects would work, though because redo would have to call itself
in the sub-project (with a different CWD?) it won't be as efficient as
possible in dependency checking across sub-projects, but the gain of
knowing what's what seems worth it.
Maybe I'm just desperately trying to find a way to not have a ``.redo''
in each directory because I don't like that it's non-obvious where the
info for system files (``/usr/include/something.h'') goes and it implies
that each directory (at every level) can be treated independently.
Thoughts?
-- San
I'm just not really convinced that a "redo project" should mean
anything. I've never heard of a "make project" either. Redo is just
a tool you run on files in your project. Just the fact that you run
gcc doesn't make it a "gcc project." :)
> Maybe there is a really obvious reason I'm missing, but it seems that a
> lot of these problems would be fixed by having an explicit ``redo init''
> that creates the ``.redo'' directory the first time in (presumably) the
> right place, and if not it's easy to figure out what happened. This
> makes it clear to the user that the project is everything under the
> directory they just created. This means that redo should fail if it
> walks up the tree and doesn't find it.
Yeah, this is not a bad idea. If I hadn't been talked into the
one-.redo-per-target-directory idea, I might be able to be talked into
this one :)
The down side is that you have to actually run redo-init, though.
That's an annoying extra step compared to make. On the other hand,
the increase in predictability would be nice.
> It also seems that having
> sub-projects would work, though because redo would have to call itself
> in the sub-project (with a different CWD?) it won't be as efficient as
> possible in dependency checking across sub-projects, but the gain of
> knowing what's what seems worth it.
That's not actually a problem at all. redo knows how to check all the
dependencies in the entire database without forking, even with nested
"projects" in subdirs.
> Maybe I'm just desperately trying to find a way to not have a ``.redo''
> in each directory because I don't like that it's non-obvious where the
> info for system files (``/usr/include/something.h'') goes and it implies
> that each directory (at every level) can be treated independently.
I was worried about that too when I first heard of this idea, but the
answer is rather elegant. You don't need a .redo in the directories
containing *source* files - only in the directories containing
*target* files. The .redo database contains, for each target in that
directory, a list of paths to dependencies (source files). When
checking those dependencies, redo can look to see if there's a .redo
in the dependency's directory; if there is, follow the chain
recursively. If there isn't (and there are no matching .do files),
then obviously the files in that directory are not generated, so they
don't have any dependencies, so there is no need for a .redo file in
there. Thus, it doesn't matter that /usr/include isn't writable; we
don't need to write anything.
If you think about it, this also completely resolves any questions
with symlinks, relative vs. absolute paths, and so on. The directory
containing a given file also contains the .redo with its dependency
information, and you don't need to normalize paths for that to work.
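Here is a rough sketch of that check, purely illustrative and not
redo's real code (the real default*.do search also walks up the tree,
which this ignores):

  # Decide whether a dependency is itself a target whose own .redo/.do
  # chain must be followed, or a plain source file whose recorded stamp
  # (mtime etc.) is simply compared.
  is_generated() {
      dir=$(dirname "$1")
      base=$(basename "$1")
      ext=${base##*.}
      [ -d "$dir/.redo" ] && return 0            # has its own dependency database
      [ -e "$dir/$base.do" ] && return 0         # an explicit .do rule exists
      [ -e "$dir/default.$ext.do" ] && return 0  # a matching default*.do exists
      return 1    # plain source: nothing to recurse into, nothing to store there
  }

So /usr/include/something.h fails all three tests and is treated as
plain source, which is exactly what we want.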
Have fun,
Avery
> The reason we don't want to have a redo "project" exactly is that
> sometimes, projects are embedded into other projects as subprojects.
> With the system as it is now, that *almost* works right. It works
> fine as long as .redo gets created at the top level of your combined
> project. And that's exactly what redo will do, as long as your first
> run of 'redo' is at the top level. But that's obviously a little
> error prone.
> As Prakhar says, the most likely solution to this (which is much
> better conceptually than what we have now) is to have one .redo
> directory in each target dir.
> As for default.do files in the parent directory of the redo project...
> good point.
Yeah, that would work, pretty much. The problem is then if you decide
redo has done something wrong (it *is* just version 0.06, after all)
you have to wipe out the entire .redo for your entire account, rather
than just for a particular project. Other than that, having a single
centralized db ought to be fine.
> Back to the following case (from my first mail):
> $ cd p && redo a && cd .. && redo p/a
> ->fails
> The .redo dir is first created in p, but then in its parent p/..
> so that I can redo something in p only if I am in p.
Yes. If you instead do:
redo p/a && cd p && redo a && cd .. && redo p/a
It will work fine, because it'll create a toplevel .redo.
In fact, you can just do 'mkdir .redo' at the toplevel and that'll
also work fine.
> Isn't it possible, as a first fix, to literally 'cd /path/to && redo
> target'
> (maybe 'cd -P' ?) each time we have 'redo /path/to/target', and take
> the innermost .redo dir as the correct one?
The current code just doesn't work like that; it assumes all the
dependencies are in a single database. Recursively calling another
copy of redo just to check dependencies would bring us back all the
problems of recursive make, which would be awful. Plus, just doing a
"mkdir .redo" at the top level is much easier and doesn't have any of
these downsides.
> instead of one .redo per dir, move that dir inside a cloned hierarchy
> rooted at e.g. ~/.redoroot (having ~/.redoroot/a/b/.redo instead of
> /a/b/.redo, or more likely named directly ~/.redoroot/a/b).
That wouldn't give the very nice advantages of built-in symlink
resolution. Let's say /a/b is a symlink to /a/x. Then when I redo
/a/b/q, should the .redo dir be .redoroot/a/b/file or
.redoroot/a/x/file? What if I *then* do this?
rm a/b
mv a/x a/b
redo a/b/q
With one .redo per dir, that will magically work perfectly. With a
shadow tree (or the current redo implementation), it won't.
> At least it
> allows to "wipe out the .redo for exactly one project" (cf.
> "The .redo directory problem" discussion).
That's really not that hard anyway:
cd ~/project
find . -name '.redo' | xargs rm -rf
Or just 'git clean -fdx' if you're using git.
> Then the user could choose (via e.g. ~/.redorc) between the two
> options.
Currently redo has no global configuration, which is nice (make
doesn't either). I wouldn't want to add it unless we had a *very*
good reason.
Have fun,
Avery
However, I just found a case where it makes a difference:
[example removed]
A related problem: I tried to run the tests from the redo
sources, and this failed because the redo source tree's parent
directory contained a default.do file (assume this file
only contains 'false'). So currently, the build of a
project depends on what is outside the root of the project,
even though the implementation tries hard not to....
Thinking a bit outside the box here -- is having redo recurse up through
parent directories to find .do files really that desirable, or is it
what's causing a lot of these problems? Admittedly, if you have a big
project and build .o files in the same way, it's convenient to have
the master default.o.do at the top; on the other hand, it's not that
hard to add a per-directory default.o.do that says something like
O_DO=../../../redo/default.o.do   # path up to the shared rule at the top
redo-ifchange "$O_DO"
. "$O_DO"
Then the whole question of whether there are files outside the
project goes away.
Thoughts?
Bill
Well, with a redo --init, I assume subprojects will just be subsumed
under the parent project. I don't see the value of including a .redo
directory in SCM. Actually, efficiency is the biggest problem with the
per-dir solution: linux FSs (and most FSs in general) work a lot
better with a few large files than a lot of tiny files.
> Maybe I'm just desperately trying to find a way to not have a ``.redo''
> in each directory because I don't like that it's non-obvious where the
> info for system files (``/usr/include/something.h'') goes
There is no single per-system-file record, because those files are not
managed by redo. Information about their mtime, etc. is still kept, but
a copy is stored with every target that depends on them. I don't think
this is a major concern.
I.e. if ~/proj/foo.o depends on /usr/include/something.h then mtime
information/hashes/etc... is kept somewhere in ~/proj/.redo. If
~/proj/bar/foo2.o also depends on something.h, then another copy of
the hashes and mtime information is kept in ~/proj/bar/.redo. This may
initially seem redundant but consider the case where something.h is
changed and foo2.o is rebuilt without rebuilding foo.o. In that case,
the multiple copies are essential.
Currently, the system records the runid of when a file was last
rebuilt (right, Avery?) but this is imo much more complicated than the
info-copy solution.
> and it implies
> that each directory (at every level) can be treated independently.
Yes, but why is this a problem? I thought this was the entire point of
the per-dir system.
I can't tell if you're saying it's non-obvious because you don't know
or because you don't think it's intuitive. For the former, my
understanding of the proposal has been to put the .redo files in
the target's directory, so all the information for the headers for a
particular object file would be stored in the same folder as that
object file. Although that makes me curious -- if we went the
per-directory .redo way of doing it, would different folders
containing object files within a project all end up independently
tracking the same system headers? That would seem to be an argument in
favor of "redo init", though that breaks compatibility with djb's redo
in a limited way.
It is not as redundant as you think to have multiple copies of the
system file information. What if the system headers
changed and one set of object files were recompiled but the others
weren't? Then having multiple copies becomes critical. Even the present
system stores multiple copies, with an indirect link between them:
the runid information. Personally, I don't think the multiple copies
are a problem.
Ha, that's what redo used to do, and after some discussion, I was
convinced to implement the search-up-the-tree mode :)
So far, I'm still convinced. Certain things are just much cleaner and
more elegant with search-up-the-tree. Most importantly, when running a
toplevel default.o.do file in a source tree, we can automatically set
$PWD to the directory containing it; the target filename will then be
path/to/subdir/filename.o, which you can turn into the source file
path/to/subdir/filename.c, and all your -Ipath/to/include options can
be identical no matter where in the tree your source file is located
(i.e. you don't have to use absolute pathnames, and you don't have to
adapt the relative pathnames depending on your location in the tree).
I used redo before and after this change, and I definitely prefer it
the new way - a bunch of annoying stuff suddenly just got easier.
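To make that concrete, here is the kind of toplevel default.o.do this
enables (a sketch with made-up paths; it assumes the usual $1/$2/$3
arguments):

  # redo runs this with $PWD at the top of the tree, so for
  # src/gtk/window.o the arguments keep the subdir prefix
  # ($2 = src/gtk/window) and one relative -Iinclude works for every
  # target, however deep in the tree it is.
  redo-ifchange "$2.c"
  gcc -Iinclude -c -o "$3" "$2.c"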
Incidentally, the actual problem being reported here - where a
toplevel default.do caused confusion in child directories - should be
a very rare case and is a bit of a red herring. It only causes
trouble if:
- you take an existing project that depends on a source file that
doesn't have a .do
- you insert that project as a subdir of your project
- you have a matching default*.do file in your containing project
(In this thread's example, redo's self-tests have a test to confirm
that we don't have a default.do, and it gets defeated if you embed
redo as a subproject of something with a toplevel default.do.)
Sucks, right? Except it's completely avoidable, and should never
catch you by surprise. The people who made the subproject can just do
their own thing in their own project. The person embedding that
subproject has control over the containing project they're pulling the
subproject into, so if something goes wrong, they can just... stop
putting a toplevel default.do into their project.
So this really doesn't come up very much, I should think. Still,
giving a default*.do file the ability to say "oops, I don't know how
to build that, pretend I don't exist" *might* be nice, though it would
add some extra complexity.
Another, simpler option would be to have a '.redo-top' file or
something that tells redo never to look for default*.do files above
that directory. But then that depends on the *subproject* having such
a file, even though the subproject, by itself, doesn't have any
problems even without that file. So it's kind of inelegant.
Have fun,
Avery
Right, for source files we just track a few things like mtime.
Together this is called the "stamp" in state.py.
> I.e. if ~/proj/foo.o depends on /usr/include/something.h then mtime
> information/hashes/etc... is kept somewhere in ~/proj/.redo. If
> ~/proj/bar/foo2.o also depends on something.h, then another copy of
> the hashes and mtime information is kept in ~/proj/bar/.redo. This may
> initially seem redundant but consider the case where something.h is
> changed and foo2.o is rebuilt without rebuilding foo.o. In that case,
> the multiple copies are essential.
>
> Currently, the system records the runid of when a file was last
> rebuilt (right, Avery?) but this is imo much more complicated than the
> info-copy solution.
Well, I think you've slightly misinterpreted the reason for the mild
insanity that is the runid :)
The reason current redo does it in such an odd way is to avoid the
need to stat() and re-stat each of the source files, in redo
subprocesses, after the first time.
Let's imagine that:
a depends on b depends on c depends on d,
and also,
x depends on y depends on z depends on d.
So to check dependencies for a, you have to stat b, c, and d. If any
of them are dirty, run a.do. Then we have to do the same for b (check
its dependencies c and d, and run b.do if either is dirty), then for c,
and so on. Similarly for the chain to build x.
There's a lot of redundant statting in there. If I "redo-ifchange a
x", I end up statting d a *lot* of times, c and z somewhat less, b and
y a bit less still, and a and x only once each. But really, there's
no reason to stat() any of the 7 files any more than once each.
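In pseudocode-ish shell, the naive approach looks something like this
(deps_of is a made-up helper standing in for "read the recorded
dependency list of a target"):

  check_deps() {
      for dep in $(deps_of "$1"); do
          stat "$dep" >/dev/null 2>&1   # one stat() per edge in the graph...
          check_deps "$dep"             # ...so a shared dep like d gets statted
      done                              # once for every path that reaches it
  }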
On some operating systems, stat() on a file is really fast after the
first time, because the result stays in cache. But on other operating
systems (Windows, which is so very slow at filesystem operations) or on
network filesystems, stat() can be *really* slow, and doing it
repeatedly makes the difference between a horrendously slow "null
build" (where all dependencies are already up to date) and a fast one.
Thus, redo tries to reduce statting to a minimum, and just stores the
stamps in its .redo database. Once it determines that d, c, and b are
up to date, then you can ask about the up-to-dateness of 'b' as many
times as you want, and it never has to re-stat any of the three. But
this is only applicable inside a single run of redo, because once redo
finishes, it knows that 'd' might change at any time, so it has to
check again. The runid is what lets redo tell the checks made during
the current run apart from those left over from earlier runs.
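Here is a toy sketch of the same idea (not redo's actual code; the
helper names and the temp-file cache are made up, and the real thing
records the runid in its database instead):

  # Within one run, remember which files were already checked so each
  # one is stat()ed at most once, however many targets depend on it.
  RUN_CACHE=$(mktemp)    # stands in for "this run", like the runid
  checked()      { grep -qxF "$1" "$RUN_CACHE"; }
  mark_checked() { echo "$1" >>"$RUN_CACHE"; }

  check_source() {
      checked "$1" && return 0     # already verified during this run
      stat "$1" >/dev/null 2>&1    # the one real stat() per file per run
      mark_checked "$1"
  }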
So anyway, all that is to say that if we have one .redo database per
target directory, sharing stat() information ends up being a little
more complicated; even though storing stamp information for sources in
each target directory looks pretty clean and would allow
*correctness*, we still need to implement the "runid" stuff *somehow*
for speed... it's just that doing it in a temporary in-memory database
or shared memory or something might be the smarter and cleaner way to
go, and then the actual file storage format can be clean, and it's
only the implementation that isn't as pretty :)
By the way, if you're wondering why looking stuff up in a database is
faster than doing stat(), it's for a few reasons. First of all, once
you mmap() the database file, it takes *zero* syscalls to look stuff
up, and (especially on Windows) syscalls are disgustingly expensive.
Secondly, you get locality of reference: if I'm looking up multiple
related dependencies, they'll be close together on the disk, because
they're all in a single database file. Thirdly, NFS tends to cache
metadata (like stat()) for less time than file contents.
Have fun,
Avery