Re: new redo implementation in go

Sergey Matveev

Jan 12, 2021, 10:55:49 AM
to redo...@googlegroups.com
Greetings, everyone!

*** José I. Rojas Echenique [2015-02-28 21:07]:
>I realize that yet another redo is the last thing we need, but...
>Annoyed by all the current implementations I wrote one that feels perfect.

Same here! That is why, several months ago, I wrote yet another
implementation of redo, in Go: http://www.goredo.cypherpunks.ru/
And it is certainly the perfect one :-)

At first, I want to thank (again) the apenwarr/redo project and
Avery Pennarun! Without them I probably would never have dived into "redo"
at all and would have kept suffering with that Makefile-driven world all my
life. Now I am a huge fan of redo. However, it took me several months even
to realize clearly (without any doubt) that moving to redo was worth it.
I now have literally no projects (that I manage or author) with Makefiles,
having left them behind like the dark ages.

As far as I remember, I decided to write my own implementation only
because of Python's speed (even starting its interpreter takes time),
although of course that was never a real bottleneck. Thanks to much better
dependency management and good job parallelization, redo already saved me
a lot of time compared to make-based solutions.

My goredo implementation closely resembles most of apenwarr/redo's
behaviour and features: sanity/safety checks (whether both $3 and stdout
were written simultaneously; whether $1 was touched during target
execution; whether files were modified externally (very convenient!)),
parallel builds (a jobserver with a possibly unlimited number of jobs),
coloured messages, targets' execution times, various debug messages,
stderr output prefixed with the PID, optional stderr capturing to a
logfile on disk (with the ability to view it later), a statusline with
running/done/waiting jobs, the ability to trace scripts (sh -x),
redo-ifchange/ifcreate/always/stamp/whichdo and redo-dot (DOT dependency
graph generation) commands, and the ability for .do files to be any kind
of executable (not just shell). All my projects work/build under
apenwarr/redo, apenwarr/do, redo-c (without redo-stamp features, which I
"disable" with "command -v redo-stamp > /dev/null || exit 0" checks) and
goredo.
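
For illustration, a minimal default.o.do written in that portable style
could look like this (the C compilation is made up just for the example;
the "command -v" guard at the end is the point):

redo-ifchange "$2.c"
cc -c -o "$3" "$2.c"
# redo-stamp is optional: skip it on implementations that lack it
command -v redo-stamp > /dev/null || exit 0
redo-stamp < "$3"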

Of course it has technical differences, as it is not a clone at all: a
.redo directory with recfiles (https://www.gnu.org/software/recutils/)
stores the state in each directory, instead of a single .redo with an
SQLite3 database. recfiles are conveniently human-readable and can be
scripted/processed easily (if someone wants to). You can "limit" the top
directory search either with the REDO_TOP_DIR=PATH environment variable
or by touch-ing /path/to/top/dir/.redo/top. stderr log messages are
stored with a TAI64N timestamp, as daemontools' multilog does with its
logs (if you do not need it, it is trivially stripped).
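
Purely as an illustration of the recfile idea (the file name and field
names below are hypothetical, not goredo's actual schema), such state can
be inspected and queried with the standard recutils tools:

$ cat .redo/deps.rec
Target: foo.o
Type: ifchange
Hash: blake3:0f5a...

Target: default.o.do
Type: ifcreate

$ recsel -e 'Type = "ifchange"' -P Target .redo/deps.rec
foo.o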

The single major difference is that goredo (like redo-c) always checksums
targets/dependencies (with BLAKE3, if the ctime differs). Unfortunately I
cannot agree with most arguments given in apenwarr/redo's FAQ. Even
modern filesystems tend to checksum (even with cryptographic hashes!)
all the data passed between the OS and disks -- it is very cheap even
from a latency point of view, let alone throughput. And checksumming
greatly simplifies many things and makes redo-stamp completely useless.
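
As a rough sketch of that idea only (this is not goredo's code: the
.redo/* file names are invented, and the b3sum CLI stands in for the
built-in BLAKE3 hashing):

target=foo
recorded_ctime=$(cat .redo/"$target".ctime 2>/dev/null)
current_ctime=$(stat -c %Z "$target")
if [ "$recorded_ctime" = "$current_ctime" ]; then
    echo "$target: ctime unchanged, hash not even recomputed"
else
    recorded_hash=$(cat .redo/"$target".hash 2>/dev/null)
    current_hash=$(b3sum "$target" | cut -d' ' -f1)
    if [ "$recorded_hash" = "$current_hash" ]; then
        echo "$target: contents identical, still up to date"
    else
        echo "$target: out-of-date"
    fi
fi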

If you want always-out-of-date targets -- just do not echo to stdout and
do not touch $3, to skip file creation. If you really wish to produce an
empty target (and make it participate in checksumming and in determining
out-of-date-ness) -- touch $3. redo utilities must help people.
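
Two minimal .do sketches of those two cases (the VERSION dependency and
the target names are made up just for the example):

# notify.do: always out of date -- nothing on stdout, $3 untouched,
# so no target file is created and it runs on every redo
redo-ifchange VERSION
echo "VERSION is now $(cat VERSION)" >&2

# marker.do: an intentionally empty target that still participates
# in checksumming and out-of-date tracking
redo-ifchange VERSION
touch "$3"
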
Automatic recording of "ifcreate default.do" dependency, "ifchange
$CURRENT.do" and similar things -- all of them are done implicitly
(unlike the FAQ's note about Python Zen's "explicit is better") and it
is very very convenient. Checksumming is just a permanent
redo-stamp-ing, that removes all complications with it (at least the
fact that not all redo implementations implement it and you have to
check its ability as I do with "command redo-stamp || exit"). Possibly
it won't help you. But possibly it will just behave as automatic
convenient redo-stamp. But it won't harm you anyhow. Initial goredo
implementation did not permanently checksum and tried to honestly do
redo-stamp-ing, but soon I have completely replaced it with permanent
checksuming -- it just *heavily* simplifies everything inside redo
implementation itself and is too convenient to the user. redo-stamp
works too, records the stamp, but it plays no role later.

I fully agree with Avery that the worst thing that can happen is that
some target won't be rebuilt when it has to be. But with permanent
checksumming I see no problem at all, if, and *only* if, you can work
with *both* stdout and $3, *and* no file is created if nothing was
written to stdout. That is the default behaviour in apenwarr/redo
(perfect behaviour!), was (two commits ago) the default behaviour in
redo-c, and is the only behaviour in goredo. If an empty file is always
created (even if $3 was not touched and nothing was written to stdout) --
you lose the ability to control the behaviour of dependencies (because
that empty file's hash won't differ, so the target won't be considered
out-of-date, unlike a missing target file, which is interpreted as
out-of-date). I am really convinced there is no other option than to
capture stdout (and $3), treat a missing target as out-of-date, and not
create the file if stdout was empty. Lacking any of that will ruin
everything. DJB is a genius who just did not explain how important both
stdout and $3 are. And of course all of that is fully and perfectly
friendly with permanent checksumming.

goredo also uses fsync on files/state/directories by default (it can be
disabled) to give guarantees about built targets.

It passes all of apenwarr/redo's implementation-neutral tests (except for two):
http://www.git.cypherpunks.ru/?p=goredo.git;a=blob;f=t/apenwarr/README;h=ae5e7e0a23cfae6835d5ac5b49695d04e588a6e7;hb=HEAD
It also passes all the redo.sh-tests:
http://www.git.cypherpunks.ru/?p=goredo.git;a=blob;f=t/redo-sh.tests/README;h=08b4ae86fc80442cb90110f8481fd197561623b4;hb=HEAD

And it consists of only a single binary (which can simply be copied to
the necessary systems, without requiring Python or any of its libraries),
to which the redo* symlinks are created. I had doubts that continuous
parsing and writing of recfiles would be fast enough (and I won't be
surprised if SQLite3 outperforms goredo's storage), but goredo, even with
fsyncs, is much faster (visibly!) than the Python-based software.
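
For example (an illustrative installation sketch, not goredo's documented
procedure; the paths and the exact command list are up to you):

$ cp goredo /usr/local/bin/goredo
$ cd /usr/local/bin
$ for cmd in redo redo-ifchange redo-ifcreate redo-always redo-stamp redo-whichdo redo-dot; do ln -s goredo "$cmd"; done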

PS: I had doubts about posting an advertisement of my creation to this
mailing list, but I note that redo-list@ is "A group for discussing
implementations of djb's "redo" software build system.", so it seems
ethical to post it here :-)

--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263 6422 AE1A 8109 E498 57EF

Karolis K

Jan 12, 2021, 12:21:30 PM
to Sergey Matveev, redo...@googlegroups.com
Hello Dear Sergey,

I love seeing this here. I am a heavy user of apenwarr's redo, but have found a few small issues with it. apenwarr seems to put this project into hibernation for long periods of time, so it's unlikely it will be updated anytime soon. Given this, any alternative implementation is welcome.

I checked your version just briefly, and here is my feedback (might have more later)

Things I like:

1. separate .redo/ directories being placed in the directories of targets (seems like this allows simply removing/renaming directories within projects without losing any state)
2. dependencies being written in separate files, rather than a database (allows quickly inspecting dependencies and manually managing them when needed, i.e. renaming without losing state)
3. stamped by default (always wanted this behaviour, instead of always relying on timestamps: it allows making cosmetic changes to .do files without restarting the whole pipeline)

Things I don’t like:

1. I miss redo-ood command. My default all.do file always was redo-ood | xargs redo-ifchange
2. I also miss redo-targets and redo-sources. I know there is redo-dot, but its output is designed for visualization, not programming.
3. I don’t like that files have to be executable in order to run custom shebang paths. To me an executable file “smth.do” that cannot be executed directly “./smth.do” makes little sense.
4. I don’t like that `redo what/what`, when “what” dir doesn’t exist, will create an empty “what” directory.

Just some feedback based on quick testing. I think I will start using your version for some projects and see if I run into further issues.

So thanks a lot, and I would love to hear your comments about this feedback, if any.

Kind regards,
Karolis Koncevičius.

Sergey Matveev

Jan 12, 2021, 12:41:14 PM
to Karolis K, redo...@googlegroups.com
*** Karolis K [2021-01-12 19:21]:
>1. I miss redo-ood command. My default all.do file always was redo-ood | xargs redo-ifchange

Currently I do not understand what this is for. Am I right that, for
example, you have a "lib" target (that builds a .so) and an independent
"doc" target (that builds some textfile), redo-ood prints that both lib
and doc are OOD, and you immediately rebuild both of them? So this
-ood | -ifchange is literally "rebuild everything that seems to be OOD"?

>2. I also miss redo-targets and redo-sources. I know there is redo-dot, but its output is designed for visualization, not programming.

Seems easy to implement, but honestly, from a quick look I cannot
understand what they are useful for (except for debugging)?

>3. I don’t like that files have to be executable in order to run custom shebang paths. To me an executable file “smth.do” that cannot be executed directly “./smth.do” makes little sense

Why can a custom executable smth.do not be run directly with ./? That is
exactly how goredo executes it. As an example, I used to write Python
scripts that way:

$ cat > foo.do <<EOF
#!/usr/bin/env python
print("hello")
EOF
$ chmod +x foo.do
$ ./foo.do
hello
$ redo foo
redo foo (0.016084sec)
$ cat foo
hello

apenwarr/redo parses files to determine the shebang and (if it is not the
shell) prepares the command line for execution. Isn't that exactly the
same task the OS/kernel does by reading/parsing the shebang and executing
it? "/bin/sh" is just a special use-case, for convenience and for the
ability to add "-x" to it.

Karolis K

Jan 12, 2021, 12:50:53 PM
to Sergey Matveev, redo...@googlegroups.com
Thanks for such a quick reply!

I think I should have specified that I am using redo for managing dependencies within data-analysis projects. So my use-case might be a bit unconventional.

> On Jan 12, 2021, at 7:40 PM, Sergey Matveev <star...@stargrave.org> wrote:
>
> *** Karolis K [2021-01-12 19:21]:
>> 1. I miss redo-ood command. My default all.do file always was redo-ood | xargs redo-ifchange
>
> Currently I do not understand what this is for. Am I right that, for
> example, you have a "lib" target (that builds a .so) and an independent
> "doc" target (that builds some textfile), redo-ood prints that both lib
> and doc are OOD, and you immediately rebuild both of them? So this
> -ood | -ifchange is literally "rebuild everything that seems to be OOD"?

Yes, I am using this for redoing every target that is out of date, for whatever reason. This wouldn’t be necessary if the whole pipeline created one output at the end; then simply `redo final-target` would do the job. However, in my case there might be many “leaf” targets. So when I change a file that is a dependency for 5 unrelated targets, redo-ood | xargs redo-ifchange will update all of them. I am of course open to learning about better ways of doing things.

>> 2. I also miss redo-targets and redo-sources. I know there is redo-dot, but its output is designed for visualization, not programming.
>
> Seems easy to implement, but honestly, from a quick look I cannot
> understand what they are useful for (except for debugging)?

Within the same context: if I am considering a change to some file, before doing that I often want to get a list of the files that will be impacted by this change (i.e. that depend on the file I am changing).

>> 3. I don’t like that files have to be executable in order to run custom shebang paths. To me an executable file “smth.do” that cannot be executed directly “./smth.do” makes little sense
>
> Why can a custom executable smth.do not be run directly with ./? That is
> exactly how goredo executes it. As an example, I used to write Python
> scripts that way:

In my case, because of the special arguments passed by redo: $1, $2, $3. They are not set if the script is run directly, and an error is produced.
This is a bit of a nit-pick; I can easily live with it, I think. It just seemed a bit odd, since I never thought about .do files being “executable”.

Kind regards,
Karolis K.

Sergey Matveev

Jan 12, 2021, 1:18:42 PM
to redo...@googlegroups.com
*** Karolis K [2021-01-12 19:50]:
>I am of course open to learning about better ways of doing things.

Personally I have never thought about this task, and it seems that I
would just manually create an "everything.do" with "redo-ifchange lib
doc" and so on. redo-ood seems like a useful helper here.
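
Something like this (a trivial sketch; lib and doc are just the example
targets from above):

# everything.do: explicitly list the leaf targets to rebuild
redo-ifchange lib doc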

>Within the same context, if I am considering a change to some file - before doing that I often want to get a list of files that will be impacted by this change (depend on the file I am changing).

Well, I still cannot imagine the usefulness, but I believe it can be so.
Hopefully I will add those commands, as that seems a pretty easy task.

>In my case, because of the special arguments passed by redo: $1, $2, $3. They are not set if the script is run directly, and an error is produced.

But in my opinion, when you run any kind of program you know what it
requires as input, you know the program's "API". The .do extension tells
you that this is a "redo target", a program with some specific
requirements for its arguments.

Moreover, if you want to run an object file, which does not contain any
kind of shebang, even with apenwarr/redo you have to make it executable,
as the kernel expects too. Most implementations I have (quickly) seen
actually expect all .do files to be executable. But because in most cases
they are ordinary shell scripts, it is convenient to make an exception
for them: no forced +x permission, no required #!sh shebang, and the
ability to "set -x".

I see only two kinds of .do targets: shell scripts (shebang-less,
+x-less, with the ability to "set -x" externally), which are the
"popular" use-case, and everything else. The "shell scripts" are not
executables, because they do not contain a shebang (for convenience) and
really cannot be ./run. "Everything else" can be run anyway. The
executable flag just differentiates them for redo. A shebang-ed Python
script and an executable C object file are in no way different for the
kernel to run.

http://web.archive.org/web/20201231033027/http://jdebp.eu/FGA/introduction-to-redo.html
talks about the unnecessary shebang parsing at the end of the page too.

>since I never thought about .do files being “executable”.

Well, it is an ordinary shell script, a shebang-whatever script, just an
executable program... anyway, it is just some program that hints at its
"interface" with the ".do" extension. Just like ".t" programs
(http://testanything.org/) hint at their expected output with the ".t"
extension. From my personal point of view, of course.

Karolis K

Jan 12, 2021, 1:56:50 PM
to Sergey Matveev, redo...@googlegroups.com

> But in my opinion, when you run any kind of program you know what it
> requires as input, you know the program's "API". The .do extension tells
> you that this is a "redo target", a program with some specific
> requirements for its arguments.
>
> Moreover, if you want to run an object file, which does not contain any
> kind of shebang, even with apenwarr/redo you have to make it executable,
> as the kernel expects too. Most implementations I have (quickly) seen
> actually expect all .do files to be executable. But because in most cases
> they are ordinary shell scripts, it is convenient to make an exception
> for them: no forced +x permission, no required #!sh shebang, and the
> ability to "set -x".
>
> I see only two kinds of .do targets: shell scripts (shebang-less,
> +x-less, with the ability to "set -x" externally), which are the
> "popular" use-case, and everything else. The "shell scripts" are not
> executables, because they do not contain a shebang (for convenience) and
> really cannot be ./run. "Everything else" can be run anyway. The
> executable flag just differentiates them for redo. A shebang-ed Python
> script and an executable C object file are in no way different for the
> kernel to run.

I see, your explanation makes sense too. But it has some potential downsides. Nothing major, I think, but as one example: I typically use dircolors for controlling the colors of `ls` output. With executable .do files I can no longer change the color of all .do files for `ls`, because the executable flag takes priority over the file extension. Personally, I would probably prefer all .do files being treated the same way, instead of having two variants (shell scripts vs everything else). But it’s your call of course.

I think I will try your version further and see how it goes. Each target saving its dependencies within .redo/ in the same directory is a big plus for me. I think even apenwarr has this feature as “planned” in his docs.


Sergey Matveev

Jan 12, 2021, 2:05:53 PM
to Karolis K, redo...@googlegroups.com
*** Karolis K [2021-01-12 20:56]:
> I think even apenwarr has this feature as “planned” in his docs.

It has. I also added the ability to specify a "top" directory (.redo/top
or REDO_TOP_DIR), so as not to force searching up to the top level of the
filesystem, and to be able to "isolate" the visibility of subdirectories
(with various default.do-s).