meaning of a redo project

Rafaël Fourquet

May 11, 2011, 11:34:46 AM
to redo
I've been using redo only since yesterday (and am already addicted,
thanks!), so I may be missing the obvious, and my comments below may
only point out aspects of redo that are non-trivial for a new user.

I don't get how a redo project is separated from the outside, and how
strong this notion is. In git, for example, you have to 'git init' (or
clone...) a new repo, but I don't see an equivalent for redo. What
happens (IIUC) is that a .redo directory is created if none exists in
the current directory or in one of its parents. But if I redo
something at my file-system root (forget about permissions for the
example), which creates /.redo, then there won't be any new "redo
project" on my system, because any subsequent redo will fall back on
the root one. So from the user's perspective, it seems that there is
no "project" concept (which is fine with me, given the recursive
nature of redo). Is this the case?

Also, the redo man-page says (DISCUSSION part) that for a target named
../a/b/xtarget.y, redo looks for a .do file up to $PWD/../default.do.
So if the above is true, this is not the last one checked:
$PWD/../../default.do and so on will also be checked, up to
/default.do, right?

However, I just found a case where it makes a difference:
$ mkdir p && cd p
$ echo 'echo' > a.do
$ redo a
$ ls -A
a a.do .redo
$ cd ..
$ ls -A
p
$ redo p/a
redo p/a: exists and not marked as generated; not redoing.
$ ls -A
p .redo

So this seems to contradict the man page, which says:
"... the following two commands always have exactly identical
behaviour:
redo path/to/target
cd path/to && redo target
"
Except that, if I then do the following, the output is now the same
(failure):

$ cd p
$ redo a
redo a: exists and not marked as generated; not redoing.

So, is the rule only true when a single .redo dir is involved? In that
case the 'redo project' concept is visible to the user.


A related problem: I tried to run the tests from the redo sources, and
this failed because redo's parent directory [1] contained a default.do
file (assume this file only contains 'false'). So currently, the build
of a project depends on what is outside the root of the project, even
though the implementation tries hard not to (cf. comments in the
'_possible_do_files' function). In this case, a clearly defined notion
of project would help, with a rule like: "don't search for .do files
above the project's root dir" (I'm not particularly suggesting
anything, as I don't know the whole picture).

A last question: is it allowed to give absolute paths to redo, like
"redo /absolute/path/to/my/target"? (I ask because I got different
behaviours depending on whether the absolute path goes through a
symlink or not.)

Thanks for any light on these questions!


[1] my redo repo is at '/usr/local/src/redo'. A surprising thing is
that if I move the default.do to the root dir '/', all tests succeed,
but they fail if the file is in /usr/, /usr/local/, or /usr/local/src/.

Jason Catena

May 11, 2011, 3:35:21 PM
to redo
On May 11, 11:34 am, Rafaël Fourquet <fourque...@gmail.com> wrote:
> $ mkdir p && cd p
> $ echo 'echo' > a.do

The file p/a does not exist, so

> $ redo a

succeeds.

> $ ls -A
> a  a.do  .redo
> $ cd ..
> $ ls -A
> p

p/a exists from the previous run, so

> $ redo p/a
> redo  p/a: exists and not marked as generated; not redoing.

fails.

> $ ls -A
> p  .redo
>
> So this seems to contradict the man page, which says:
> "... the following two commands always have exactly identical
> behaviour:
> redo path/to/target
> cd path/to && redo target
> "
> Except if we consider that if I then do the following, the output is
> now the same (failure):
>

p/a still exists from the previous run, so

> $ cd p
> $ redo a
> redo  a: exists and not marked as generated; not redoing.

still fails.

> So, is the rule only true if only one .redo dir is implied? Then the
> 'redo project' concept is visible to the user.

The rule is only true if the state of the system is the same. By
running redo a, you changed the state of the system, and therefore the
starting conditions for redo. The second and third 'redo a' start from
a different state (a exists) than the first (a does not exist).

You also have to do something else to say that a is a file that redo
manages, so it doesn't bail out when it finds that a exists. I'm not
sure what that is, since my variation (credo) doesn't track meta-
information about what it's supposed to manage. If there's a *.do
file, credo checks its separate *.dep file (the md5sums, as of the
last time * was built, of the files on which * depends) for
out-of-date dependencies, and goes ahead and runs the *.do file just
to create *.
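
For illustration, a stripped-down sketch of that kind of check (the
file layout here is a guess for the example, not credo's actual code):

t=$1
rebuild=0
[ -e "$t" ] || rebuild=1              # no target yet: must build
if [ -f "$t.dep" ]; then
    # $t.dep holds "md5sum  path" lines recorded at the last build
    md5sum -c --quiet "$t.dep" >/dev/null 2>&1 || rebuild=1
else
    rebuild=1                         # no record of a previous build
fi
[ "$rebuild" = 1 ] && sh "$t.do" > "$t"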

Jason Catena

Prakhar Goel

May 11, 2011, 3:44:34 PM
to Jason Catena, redo
Nope, Rafael is correct in that the current semantics are broken. The
"exists and not marked as generated; not redoing." message should only
happen if the user modifies a file. That redo throws this warning in
his example is evidence of a bug. In fact, it is a bug that we know
about: the location of the .redo directory is not well specified.

When the first error is thrown, redo creates a new .redo directory in
his current dir, and this new .redo db has no information about p/a.
Therefore redo thinks that p/a is a file that the _user_ created, and
of course we don't want to clobber a _user_-created file, so it quits.
The reality, of course, is that p/a was created by redo according to
a.do and therefore should be managed by redo. When we cd into p, the
old .redo dir is still floating around, which is why the behavior does
not change: essentially, the old .redo dir in p has been made
irrelevant by the new one.

The solution (one possible one, of course) is the per-directory .redo
design. We still don't have any concept of a redo "project", but then
it doesn't matter, because information is kept not by project but by
file.
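
Annotating his transcript with where each database lands:

$ cd p && redo a      # no .redo above: creates p/.redo, which records
                      #   that p/a is generated
$ cd .. && redo p/a   # no .redo here or above: creates ./.redo, which
                      #   knows nothing about p/a -> "not redoing"
$ cd p && redo a      # the new outer .redo now wins over p/.redo, so
                      #   the failure persists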

Jason,

What happens if credo creates a file which the user subsequently
modifies? Will credo clobber the user's changes or recognize that they
are there? What if it is called to create a file that does have an
associated .do file but was first created by the user?
--
________________________
Warm Regards
Prakhar Goel
e-mail: newt...@gmail.com, newt...@caltech.edu
LinkedIn Profile: http://www.linkedin.com/in/newt0311
Cell: (972) 992-8078

"The real universe is always one step beyond logic." --- Frank Herbert, Dune

Avery Pennarun

May 11, 2011, 4:37:15 PM
to Rafaël Fourquet, redo
2011/5/11 Rafaël Fourquet <fourq...@gmail.com>:

> I don't get how a redo project is separated from the outside, and how
> strong this notion is. In git, for example, you have to 'git init' (or
> clone...) a new repo, but I don't see an equivalent for redo. What
> happens (IIUC) is that a .redo directory is created if none exists in
> the current directory or in one of its parents. But if I redo
> something at my file-system root (forget about permissions for the
> example), which creates /.redo, then there won't be any new "redo
> project" on my system, because any subsequent redo will fall back on
> the root one. So from the user's perspective, it seems that there is
> no "project" concept (which is fine with me, given the recursive
> nature of redo). Is this the case?

Yeah, as Prakhar says, it's a little convoluted and doesn't really
work right in odd situations. This is a bug in the implementation,
not really a bug in the fundamental concepts of redo.

The reason we don't want to have a redo "project" exactly is that
sometimes, projects are embedded into other projects as subprojects.
Imagine someone didn't want to depend on the recipient of a project
having a copy of redo installed; they might take the redo source repo
and make it a subdir of their project. In that case, the two projects
need to combine into a single project, and all the .do files should
cooperate with each other.

With the system as it is now, that *almost* works right. It works
fine as long as .redo gets created at the top level of your combined
project. And that's exactly what redo will do, as long as your first
run of 'redo' is at the top level. But that's obviously a little
error prone.

As Prakhar says, the most likely solution to this (which is much
better conceptually than what we have now) is to have one .redo
directory in each target dir. That way, you could actually depend on
files outside your current project (../src/gtk/libgtk.a), and have
them get built correctly. It also avoids problems with symlinks and
absolute paths. (redo is *pretty* good with those already, but I know
there are a few cases it gets wrong.)

As for default.do files in the parent directory of the redo project...
good point. I think that in general we need a way for a default*.do
to say "I don't know how to build this, pretend I don't exist" instead
of "I don't know how to do this, fail." Otherwise creating a
default.do can cause rather odd results. This hasn't been properly
sorted out yet.
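
Purely hypothetically (nothing like this exists today), a default.do
could signal it with a reserved exit status, something like:

# hypothetical default.do: claim only *.o targets, and use an
# invented "not mine" exit status for everything else
case "$1" in
  *.o) redo-ifchange "${1%.o}.c"
       cc -c -o "$3" "${1%.o}.c" ;;
  *)   exit 222 ;;   # made-up code for "pretend I don't exist"
esac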

Have fun,

Avery

Jason Catena

May 11, 2011, 4:40:32 PM
to redo
On May 11, 3:44 pm, Prakhar Goel <newt0...@gmail.com> wrote:
> Nope, Rafael is correct in that the current semantics are broken.

Sorry for confusing the issue.

> Jason,
> What happens if credo creates a file which the user subsequently
> modifies. Will credo clobber the user changes or recognize they are
> there?

credo creates a *.sum file for the target, which stores the md5sum of
the target the last time credo built it. If the current md5sum of the
file is different from this stored value, then credo assumes that
something other than credo changed it, and (like redo) says it won't
regenerate it.

> What if it is called to create a file that does have an
> associated do file but has already been first created by the user?

I found three different ways to say "credo will clobber it".

If there is a target file and its *.do file, but no *.sum file, then
either the target was created by something else, or (more commonly)
something deleted the *.sum file. If the user deletes the *.sum file
after a build (eg, out of frustration at all these annoying files
cluttering up the directory), and credo takes the absence of a *.sum
file as evidence that the user created the file, then I've locked up
the system (credo won't rebuild the file) until the user recreates
the *.sum file with a matching md5sum (admittedly as simple as
"md5sum a > a.sum").

Stated another way: I could check whether the target has a *.sum file,
and refuse to rebuild it if there isn't one and the target exists.
However, I expect users may delete all the files that comprise the
dependency-checking state (eg, from rage at credo for not recompiling
when the user thinks it should), so I don't want that possibly common
user action to lock up the system. I would rather credo start over
from scratch, and again build up the state of the system.

Restated yet again: if the user wants credo to ignore a file with a
*.do script, the user can either build it once and modify it, or
create a *.sum file with a mismatching (eg, 0) md5sum (echo '0 a' with
a tab separator). Just creating the target file is not enough, since
credo's imperative is to rebuild files with *.do files, and without a
*.sum file it assumes the user has messed with the files in which it
stores its state. It may mean I build one more time than I would if I
had perfect information, but I'd rather build once more than lock up
the system.
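
In shell terms, the two escape hatches above look something like this:

# tell credo to leave 'a' alone even though a.do exists
# (deliberately mismatching checksum, tab-separated):
printf '0\ta\n' > a.sum

# recover after deleting the state files: record the current
# checksum so credo treats 'a' as its own latest output again:
md5sum a > a.sum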


You might ask yourself at this point whether it's really a good idea
for credo to store its state out in the open, divided up into small
files, cluttering up the directory. I agree that it's less pleasant
to have all the little files cluttering one's view of the directory.
However, I believe the greater good is that the state of the rebuild
system, and all its little scripts and dependency lists, /is/ so
accessible and modifiable. This directly supports adding other tools
on top of credo that use its state and files, without adding
additional features to credo. This means credo can stick to its core
function of checking dependency lists and calling itself, and leave
the rest to other tools. (Credo is currently only 110 lines of actual
code spread over 5 shell scripts that call each other.)

Jason Catena

Avery Pennarun

May 11, 2011, 4:57:01 PM
to Jason Catena, redo
On Wed, May 11, 2011 at 4:40 PM, Jason Catena <jason....@gmail.com> wrote:
> Restated yet again: if the user wants credo to ignore a file with a
> *.do script, [...]

What about files like default.do? Just the fact that default.do
exists shouldn't mean that every file in $PWD must be generated.

Avery

Jason Catena

May 11, 2011, 5:25:42 PM
to Avery Pennarun, redo

I believe the name default.do means different things to credo and
redo. In credo, it's how to build a target (possibly a file) named
'default'. It has no role in building targets generally, nor in
generated-file checking. In redo, if I understand it correctly, it's
applied as a do script for all files, or only for those without their
own do script.

Let's take as an example default.o.do, which I assume is interpreted
similarly in both redo and credo, as the do file to call when c?redo
needs to make a *.o file.[1] The presence of this file does tell
credo that every *.o file in the directory should be generated from
it if the *.o file is out of date, regardless of whether the file is
present or not, unless its *.sum file is out of date. Iterate this
over default.*.do, with * standing in for any file extension.

[1] Presumably from *.c files, and headers, and the values of shell
variables like $cc and $ccflags; but this depends on the content of
the default.o.do file.
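
For instance, such a default.o.do might look roughly like this (a
sketch only; per [1], the compiler and flags come from shell
variables, and real rules would typically also track headers):

# sketch: build any *.o from the matching *.c
# redo passes $1 = target, $2 = target minus extension, $3 = temp output
: "${cc:=cc}"                      # compiler, overridable as in [1]
redo-ifchange "$2.c"               # depend on the matching source
$cc $ccflags -c -o "$3" "$2.c"     # write to $3; redo moves it into place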

Saneesh

May 11, 2011, 7:37:25 PM
to redo

This discussion started by asking what a ``redo project'' was, and I
want to get back to that for a bit.

Maybe there is a really obvious reason I'm missing, but it seems that
a lot of these problems would be fixed by having an explicit ``redo
init'' that creates the ``.redo'' directory the first time in
(presumably) the right place; and if not, it's easy to figure out what
happened. This makes it clear to the user that the project is
everything under the directory they just created. It also means that
redo should fail if it walks up the tree and doesn't find one. Having
sub-projects would still work, too, though because redo would have to
call itself in the sub-project (with a different CWD?) it won't be as
efficient as possible in dependency checking across sub-projects; but
the gain of knowing what's what seems worth it.

Maybe I'm just desperately trying to find a way not to have a
``.redo'' in each directory, because I don't like that it's
non-obvious where the info for system files
(``/usr/include/something.h'') goes, and it implies that each
directory (at every level) can be treated independently.

Thoughts?

-- San

--

Those who are willing to trade freedom for security deserve
neither freedom nor security.
-- Benjamin Franklin

Avery Pennarun

May 11, 2011, 11:17:44 PM
to Saneesh, redo
On Wed, May 11, 2011 at 7:37 PM, Saneesh <san...@clockwatching.net> wrote:
> This discussion started by asking what a ``redo project'' was and I want
> to get back to that for a bit.

I'm just not really convinced that a "redo project" should mean
anything. I've never heard of a "make project" either. Redo is just
a tool you run on files in your project. Just the fact that you run
gcc doesn't make it a "gcc project." :)

> Maybe there is a really obvious reason I'm missing, but it seems that a
> lot of these problems would be fixed by having an explicit ``redo init''
> that creates the ``.redo'' directory the first time in (presumably) the
> right place, and if not it's easy to figure out what happened.  This
> solidifies to the user that the project is everything under the
> directory they just created.  This means that redo should fail if it
> walks up the tree and doesn't find it.

Yeah, this is not a bad idea. If I hadn't been talked into the
one-.redo-per-target-directory idea, I might be able to be talked into
this one :)

The down side is that you have to actually run redo-init, though.
That's an annoying extra step compared to make. On the other hand,
the increase in predictability would be nice.

> It also seems that having
> sub-projects would work, though because redo would have to call itself
> in the sub-project (with a different CWD?) it won't be as efficient as
> possible in dependency checking across sub-projects, but the gain of
> knowing what's what seems worth it.

That's not actually a problem at all. redo knows how to check all the
dependencies in the entire database without forking, even with nested
"projects" in subdirs.

> Maybe I'm just desperately trying to find a way to not have a ``.redo''
> in each directory because I don't like that it's non-obvious where the
> info for system files (``/usr/include/something.h'') goes and it implies
> that each directory (at every level) can be treated independently.

I was worried about that too when I first heard of this idea, but the
answer is rather elegant. You don't need a .redo in the directories
containing *source* files - only in the directories containing
*target* files. The .redo database contains, for each target in that
directory, a list of paths to dependencies (source files). When
checking those dependencies, redo can look to see if there's a .redo
in the dependency's directory; if there is, follow the chain
recursively. If there isn't (and there are no matching .do files),
then obviously the files in that directory are not generated, so they
don't have any dependencies, so there is no need for a .redo in
there. Thus, it doesn't matter that /usr/include isn't writable; we
don't need to write anything.
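
In pseudo-shell, the rule for checking one recorded dependency would
read something like this (the helper names are invented for the
sketch):

# for each recorded dependency $dep of a target:
depdir=$(dirname "$dep")
if [ -d "$depdir/.redo" ]; then
    follow_deps "$dep"         # hypothetical: recurse into its records
elif has_do_file "$dep"; then  # hypothetical: a .do/default*.do matches
    build "$dep"               # generated, but not built from here yet
fi
# otherwise: a plain source file; nothing to record, nothing to write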

If you think about it, this also completely resolves any questions
with symlinks, relative vs. absolute paths, and so on. The directory
containing a given file also contains the .redo with its dependency
information, and you don't need to normalize paths for that to work.

Have fun,

Avery

Rafaël Fourquet

May 12, 2011, 11:33:47 AM
to redo, Avery Pennarun
> The reason we don't want to have a redo "project" exactly is that
> sometimes, projects are embedded into other projects as subprojects.

Yes, I would think that we don't want to have a redo project because a
redo project does not exist, or rather because every directory is a
"redo project", one that may happen to be somewhat incomplete by
itself (if one of its possible targets depends on one of its parents
having a something.do file). I see redo as an extremely pervasive
tool, at each level of the system, i.e. with arbitrarily many levels
of sub-project nesting (the redo repo in my /usr/local/src/ is a
"sub-project" of /usr/local/src, maybe itself a sub-project of
/usr/local or /, or both). The concept of a "particular" directory
which would delimit the project (and the recursion?) seems superfluous
and limiting to me.


> With the system as it is now, that *almost* works right. It works
> fine as long as .redo gets created at the top level of your combined
> project. And that's exactly what redo will do, as long as your first
> run of 'redo' is at the top level. But that's obviously a little
> error prone.

So with the current implementation, the solution would be to call redo
from the topmost level (say the root / if possible, or ~/) before any
other playing with redo? (This comes close to having by default a
single ~/.redo dir for the system.)

Back to the following case (from my first mail):
$ cd p && redo a && cd .. && redo p/a
-> fails
The .redo dir is first created in p, but then in its parent p/..,
so that I can redo something in p only if I am in p.
Isn't it possible, as a first fix, to literally 'cd /path/to && redo
target' (maybe 'cd -P'?) each time we have 'redo /path/to/target', and
take the innermost .redo dir as the correct one?

 
> As Prakhar says, the most likely solution to this (which is much
> better conceptually than what we have now) is to have one .redo
> directory in each target dir.

I would think that... does the following sound possible: instead of
one .redo per dir, move that dir inside a cloned hierarchy rooted at
e.g. ~/.redoroot (having ~/.redoroot/a/b/.redo instead of /a/b/.redo,
or more likely named directly ~/.redoroot/a/b)? At least it would
allow wiping out the .redo for exactly one project (cf. "The .redo
directory problem" discussion). Then the user could choose (via e.g.
~/.redorc) between the two options.
 
> As for default.do files in the parent directory of the redo project...
> good point.

At least, if someone wants to be sure her "project" is totally
separate from the outside, a "redo project" with a particular .redo
dir isn't necessary: just create a default.do at the project's root
dir.
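
e.g. a root default.do that claims every otherwise-unmatched target
and fails loudly, so the upward .do search never leaves the project:

# default.do at the project root: stop the upward search here
echo "no rule to build '$1'" >&2
exit 1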

Rafaël

Avery Pennarun

May 12, 2011, 11:59:22 AM
to Rafaël Fourquet, redo
2011/5/12 Rafaël Fourquet <fourq...@gmail.com>:

> So with the current implementation, the solution would be to call redo
> from the topmost level (say the root / if possible, or ~/) before any
> other playing with redo? (This comes close to having by default a
> single ~/.redo dir for the system.)

Yeah, that would work, pretty much. The problem is then if you decide
redo has done something wrong (it *is* just version 0.06, after all)
you have to wipe out the entire .redo for your entire account, rather
than just for a particular project. Other than that, having a single
centralized db ought to be fine.

> Back to the following case (from my first mail):
> $ cd p && redo a && cd .. && redo p/a
> -> fails
> The .redo dir is first created in p, but then in its parent p/..,
> so that I can redo something in p only if I am in p.

Yes. If you instead do:

redo p/a && cd p && redo a && cd .. && redo p/a

It will work fine, because it'll create a toplevel .redo.

In fact, you can just do 'mkdir .redo' at the toplevel and that'll
also work fine.

> Isn't it possible, as a first fix, to literally 'cd /path/to && redo
> target' (maybe 'cd -P'?) each time we have 'redo /path/to/target',
> and take the innermost .redo dir as the correct one?

The current code just doesn't work like that; it assumes all the
dependencies are in a single database. Recursively calling another
copy of redo just to check dependencies would bring back all the
problems of recursive make, which would be awful. Plus, just doing a
"mkdir .redo" at the top level is much easier and doesn't have any of
these downsides.

> instead of one .redo per dir, move that dir inside a cloned hierarchy
> rooted at e.g. ~/.redoroot (having ~/.redoroot/a/b/.redo instead of
> /a/b/.redo, or more likely named directly ~/.redoroot/a/b).

That wouldn't give the very nice advantages of built-in symlink
resolution. Let's say /a/b is a symlink to /a/x. Then when I redo
/a/b/q, should the .redo info for it live under .redoroot/a/b or
.redoroot/a/x? What if I *then* do this?

rm a/b
mv a/x a/b
redo a/b/q

With one .redo per dir, that will magically work perfectly. With a
shadow tree (or the current redo implementation), it won't.

> At least it
> allows to "wipe out the .redo for exactly one project" (cf.
> "The .redo directory problem" discussion).

That's really not that hard anyway:

cd ~/project
find -name '.redo' | xargs rm -rf

Or just 'git clean -fdx' if you're using git.

> Then the user could choose (via e.g. ~/.redorc) between the two
> options.

Currently redo has no global configuration, which is nice (make
doesn't either). I wouldn't want to add it unless we had a *very*
good reason.

Have fun,

Avery

Bill Trost

May 13, 2011, 10:35:00 AM
to redo...@googlegroups.com
<fourq...@gmail.com> wrote:
    Also, the redo man-page says (DISCUSSION part) that for a target
    named ../a/b/xtarget.y, redo looks for a .do file up to
    $PWD/../default.do. So if the above is true, this is not the last
    one checked: $PWD/../../default.do and so on will also be checked,
    up to /default.do, right?

    However, I just found a case where it makes a difference:

    [example removed]

    A related problem: I tried to run the tests from the redo sources,
    and this failed because redo's parent directory contained a
    default.do file (assume this file only contains 'false'). So
    currently, the build of a project depends on what is outside the
    root of the project, even though the implementation tries hard
    not to....

Thinking a bit outside the box here -- is having redo recurse up through
parent directories to find .do files really that desirable, or is it
what's causing a lot of these problems? Admittedly, if you have a big
project and build .o files in the same way, it's convenient to have
the master default.o.do at the top; on the other hand, it's not that
hard to add a per-directory default.o.do that says something like

O_DO=../../../redo/default.o.do
redo-ifchange $O_DO
. $O_DO

Then the whole question of whether there are files outside the
project goes away.

Thoughts?
Bill

Prakhar Goel

May 11, 2011, 7:45:59 PM
to Saneesh, redo
On Wed, May 11, 2011 at 4:37 PM, Saneesh <san...@clockwatching.net> wrote:
>
>
> This discussion started by asking what a ``redo project'' was and I want
> to get back to that for a bit.
>
> Maybe there is a really obvious reason I'm missing, but it seems that a
> lot of these problems would be fixed by having an explicit ``redo init''
> that creates the ``.redo'' directory the first time in (presumably) the
> right place, and if not it's easy to figure out what happened.  This
> solidifies to the user that the project is everything under the
> directory they just created.  This means that redo should fail if it
> walks up the tree and doesn't find it.  It also seems that having
> sub-projects would work, though because redo would have to call itself
> in the sub-project (with a different CWD?) it won't be as efficient as
> possible in dependency checking across sub-projects, but the gain of
> knowing what's what seems worth it.

Well, with a redo --init, I assume subprojects would just be subsumed
under the parent project. I don't see the value of including a .redo
directory in SCM. Actually, efficiency is the biggest problem with the
per-dir solution: Linux FSs (and most FSs in general) work a lot
better with a few large files than with a lot of tiny files.

> Maybe I'm just desperately trying to find a way to not have a ``.redo''
> in each directory because I don't like that it's non-obvious where the
> info for system files (``/usr/include/something.h'') goes

There is no per-system-file info because these files are not managed
by redo. Information is still kept on their mtime, etc., but one copy
is stored with every file that depends on them. I don't think this is
a major concern.

I.e. if ~/proj/foo.o depends on /usr/include/something.h, then mtime
information/hashes/etc. are kept somewhere in ~/proj/.redo. If
~/proj/bar/foo2.o also depends on something.h, then another copy of
the hashes and mtime information is kept in ~/proj/bar/.redo. This may
initially seem redundant, but consider the case where something.h is
changed and foo2.o is rebuilt without rebuilding foo.o. In that case,
the multiple copies are essential.

Currently, the system records the runid of when a file was last
rebuilt (right, Avery?), but this is imo much more complicated than
the info-copy solution.

> and it implies
> that each directory (at every level) can be treated independently.

Yes, but why is this a problem? I thought this was the entire point of
the per-dir system.

Joseph Garvin

May 11, 2011, 10:18:31 PM
to Saneesh, redo
On Wed, May 11, 2011 at 6:37 PM, Saneesh <san...@clockwatching.net> wrote:
> Maybe I'm just desperately trying to find a way to not have a ``.redo''
> in each directory because I don't like that it's non-obvious where the
> info for system files (``/usr/include/something.h'') goes and it implies
> that each directory (at every level) can be treated independently.

I can't tell if you're saying it's non-obvious because you don't know,
or because you don't think it's intuitive. For the former, my
understanding of the proposal has been to put the .redo files in the
target's directory, so all the information for the headers for a
particular object file would be stored in the same folder as that
object file. Although that makes me curious -- if we went the
per-directory .redo way of doing it, would different folders
containing object files within a project all end up independently
tracking the same system headers? That would seem to be an argument in
favor of "redo init", though that breaks compatibility with djb's redo
in a limited way.

Prakhar Goel

May 13, 2011, 3:16:37 PM
to Joseph Garvin, Saneesh, redo
On Wed, May 11, 2011 at 7:18 PM, Joseph Garvin
<joseph....@gmail.com> wrote:
[snip]

> Although that makes me curious -- if we went the
> per-directory .redo way of doing it, would different folders
> containing object files within a project all end up independently
> tracking the same system headers? That would seem to be an argument in
> favor of "redo init", though that breaks compatibility with djb's redo
> in a limited way.
>

It is not as redundant as you might think to have multiple copies of
the system file information. What if the system headers changed and
one set of object files was recompiled but the others weren't? Then
having multiple copies becomes critical. Even the present system
stores multiple copies, with an indirect link between them: the runid
information. Personally, I don't think the multiple copies are a
problem.

Avery Pennarun

May 15, 2011, 7:08:44 PM
to Bill Trost, redo...@googlegroups.com
On Fri, May 13, 2011 at 10:35 AM, Bill Trost <tr...@cloud.rain.com> wrote:
> <fourq...@gmail.com> wrote:
>   Also, the redo man-page says (DISCUSSION part) that for a target named
>   ../a/b/xtarget.y, redo looks for a .do file up to $PWD/../default.do.
>   So if the above is true, this is not the last one checked:
>   $PWD/../../default.do and so on will also be checked, up to /default.do,
>   right?
>
>   However, I just found a case where it makes a difference:
>    [example removed]
>
>   A related problem: I tried to run the tests from the redo
>   sources, and this failed because redo's parent
>   directory contained a default.do file (assume this file
>   only contains 'false'). So currently, the build of a
>   project depends on what is outside the root of the project,
>   even though the implementation tries hard not to....
>
> Thinking a bit outside the box here -- is having redo recurse up through
> parent directories to find .do files really that desirable, or is it
> what's causing a lot of these problems? Admittedly, if you have a big
> project and build .o files in the same way, it's convenient to have
> the master default.o.do at the top; on the other hand, it's not that
> hard to add a per-directory default.o.do that says something like

Ha, that's what redo used to do, and after some discussion, I was
convinced to implement the search-up-the-tree mode :)

So far, I'm still convinced. Certain things are just much cleaner and
more elegant with search-up-the-tree. Most importantly, we can
automatically set $PWD to the directory containing the toplevel
default.o.do file in a source tree when running it, and the target
filename will be path/to/subdir/filename.o, which you can turn into
the source file named path/to/subdir/filename.c, and all your
-Ipath/to/include options can be identical no matter where in the tree
your source file is located. (ie. you don't have to use absolute
pathnames, and you don't have to adapt the relative pathnames
depending on your location in the tree).
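
Concretely, the toplevel rule can be as simple as this sketch (the
paths are placeholders):

# toplevel default.o.do, always run with $PWD = the tree root,
# so $1 arrives as e.g. path/to/subdir/filename.o
src=${1%.o}.c                           # -> path/to/subdir/filename.c
redo-ifchange "$src"
cc -Ipath/to/include -c -o "$3" "$src"  # same -I flags at any depth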

I used redo before and after this change, and I definitely prefer it
the new way - a bunch of annoying stuff suddenly just got easier.

Incidentally, the actual problem being reported here - where a
toplevel default.do caused confusion in child directories - should be
a very rare case and is a bit of a red herring. It only causes
trouble if:

- you take an existing project that depends on a source file that
doesn't have a .do

- you insert that project as a subdir of your project

- you have a matching default*.do file in your containing project

(In this thread's example, redo's self-tests have a test to confirm
that we don't have a default.do, and it gets defeated if you embed
redo as a subproject of something with a toplevel default.do.)

Sucks, right? Except it's completely avoidable, and should never
catch you by surprise. The people who made the subproject can just do
their own thing in their own project. The person embedding that
subproject has control over the containing project they're pulling the
subproject into, so if something goes wrong, they can just... stop
putting a toplevel default.do into their project.

So this really doesn't come up very much, I should think. Still,
giving a default*.do file the ability to say "oops, I don't know how
to build that, pretend I don't exist" *might* be nice, aside from the
extra complexity it would add.

Another, simpler option would be to have a '.redo-top' file or
something that tells redo never to look for default*.do files above
that directory. But then that depends on the *subproject* having such
a file, even though the subproject, by itself, doesn't have any
problems even without that file. So it's kind of inelegant.

Have fun,

Avery

Avery Pennarun

May 15, 2011, 7:23:58 PM
to Prakhar Goel, Saneesh, redo
On Wed, May 11, 2011 at 7:45 PM, Prakhar Goel <newt...@gmail.com> wrote:
> On Wed, May 11, 2011 at 4:37 PM, Saneesh <san...@clockwatching.net> wrote:
>> Maybe I'm just desperately trying to find a way to not have a ``.redo''
>> in each directory because I don't like that it's non-obvious where the
>> info for system files (``/usr/include/something.h'') goes
>
> There is no per-system-file info because these are not managed by
> redo. There is information kept on their mtime, etc.... In this case,
> one copy is stored with every file that depends on them. I don't think
> this is a major concern.

Right, for source files we just track a few things like mtime.
Together this is called the "stamp" in state.py.

> I.e. if ~/proj/foo.o depends on /usr/include/something.h then mtime
> information/hashes/etc... is kept somewhere in ~/proj/.redo. If
> ~/proj/bar/foo2.o also depends on something.h, then another copy of
> the hashes and mtime information is kept in ~/proj/bar/.redo. This may
> initially seem redundant but consider the case where seomthing.h is
> changed and foo2.o is rebuilt without rebuilding foo.o. In that case,
> the multiple copies are essential.
>
> Currently, the system records the runid of when a file was last
> rebuilt (right, Avery?) but this is imo much more complicated than the
> info-copy solution.

Well, I think you've slightly misinterpreted the reason for the mild
insanity that is the runid :)

The reason current redo does it in such an odd way is to avoid the
need to stat() and re-stat each of the source files, in redo
subprocesses, after the first time.

Let's imagine that:
a depends on b depends on c depends on d,
and also,
x depends on y depends on z depends on d.

So to check dependencies for a, you have to stat b, c, and d. If any
of them are dirty, run a.do. Then we have to check the dependencies
of b (that is, c and d), and if any are dirty, run b.do, and so on.
Similarly for the chain to build x.

There's a lot of redundant statting in there. If I "redo-ifchange a
x", I end up statting d a *lot* of times, c and z somewhat less, b and
y a bit less still, and a and x only once each. But really, there's
no reason to stat() any of the 7 files any more than once each.

On some operating systems, stat() on a file is really fast after the
first time, because it stays in cache. But on other operating systems
(Windows, so very slow at filesystem operations) or on network
filesystems, stat() can be *really* slow, and doing it repeatedly
makes the difference between a horrendously slow "null build" (where
all dependencies are up to date) vs. a fast one.

Thus, redo tries to reduce statting to a minimum, and just stores the
stamps in its .redo database. Once it determines that d, c, and b are
up to date, then you can ask about the up-to-dateness of 'b' as many
times as you want, and it never has to re-stat any of the three. But
this is only applicable inside a single run of redo, because once redo
finishes, it knows that 'd' might change at any time, so it has to
check again. This difference between the current and subsequent runs
is the runid.

So anyway, all that is to say that if we have one .redo database per
target directory, sharing stat() information ends up being a little
more complicated; even though storing stamp information for sources in
each target directory looks pretty clean and would allow
*correctness*, we still need to implement the "runid" stuff *somehow*
for speed... it's just that doing it in a temporary in-memory database
or shared memory or something might be the smarter and cleaner way to
go, and then the actual file storage format can be clean, and it's
only the implementation that isn't as pretty :)

By the way, if you're wondering why looking stuff up in a database is
faster than doing stat(), it's for a few reasons. First of all, once
you just mmap() the database file, it's *zero* syscalls to look stuff
up, and (especially on Windows) syscalls are disgustingly expensive.
Secondly, you get locality of reference: if I'm looking up multiple
related dependencies, they'll be close together on the disk, because
they're all in a single database file. Thirdly, NFS tends to cache
metadata (like stat()) for less time than file contents.

Have fun,

Avery
