redo cannot find .do file for target in symlinked subdirectory, but redo-whichdo can

Wolfgang Faust

Jan 6, 2022, 12:51:47 AM
to redo...@googlegroups.com
I'm trying to use redo on a project which produces some very large
output files which I'd prefer to store on a different filesystem. So,
I've created a symlink to a separate data directory, and put my .do
files one level up from it (so I can .gitignore the data directory and
still store the .do files in the repository):

$ ls -l .
total 140
lrwxrwxrwx 1 wolf wolf 59 Jan 5 19:39 data -> /media/fe2o3/prdata_202112/
-rw-rw-r-- 1 wolf wolf 86 Jan 5 20:28 default.links.dat.do

Using redo-whichdo, I confirm that this setup works and it can find
the .do file:

$ redo-whichdo data/epzi.links.dat; echo $?
data/epzi.links.dat.do
data/default.links.dat.do
data/default.dat.do
data/default.do
default.links.dat.do
0

But when I try to actually build the target, it fails:

$ redo data/epzi.links.dat
redo: no rule to redo 'data/epzi.links.dat'

At a guess, it's probably getting confused by the way that '..' in
data/ ends up in a completely different place; I did a bit of
rummaging and it seems like redo-whichdo uses a different algorithm
from redo.paths.find_do_file(), though I couldn't immediately work out
where File.name was coming from.

I get the vague impression that this may not be as trivial to solve as
it looks at first glance—for one, the semantics here are probably
complicated by the fact that there could theoretically be a symlink
somewhere else, with a completely different set of .do files above it.
But maybe redo doesn't consider that any different than what you'd get
if you kept changing the .do files back and forth? I haven't wrapped
my brain around redo enough yet to have a good notion of how this
would fit together.
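
The guess about `..` can be checked outside redo. The following is a minimal sketch in a scratch directory; the directory names are made up to mirror the layout above, and it says nothing about what redo itself does internally. It only shows that the logical parent (the one the shell tracks) and the physical parent (the one the kernel resolves) of a symlinked directory are different places.

```sh
# Scratch layout; all names here are illustrative stand-ins for the real setup.
tmp="$(mktemp -d)"
tmp="$(cd -P "$tmp" && pwd -P)"   # canonicalize, in case the temp dir itself is behind a symlink
mkdir -p "$tmp/repo" "$tmp/bigdisk/prdata"
ln -s "$tmp/bigdisk/prdata" "$tmp/repo/data"

cd "$tmp/repo/data"

# Logical view: the shell remembers the path it took, so ".." is repo/.
logical_parent="$(cd -L .. && pwd -L)"

# Physical view: ".." is resolved from the real directory, so it is bigdisk/.
physical_parent="$(cd -P .. && pwd -P)"

echo "logical:  $logical_parent"
echo "physical: $physical_parent"
```

A tool that walks upward with the logical path would pass through repo/ and see default.links.dat.do; one that walks the physical path never does.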

spacefro...@meterriblecrew.net

Jan 6, 2022, 4:31:14 AM
to redo...@googlegroups.com
I have also discovered these kinds of inconsistencies. As far as I am aware, there is no universal agreement on whether paths should be treated as logical paths or physical paths.

This is also not so easy to solve, if it can be solved at all: even though making redo itself behave consistently is easy enough to achieve, that doesn't mean that compilers etc. running inside .do files won't behave inconsistently again once physical and logical paths start to deviate from each other.

That said: don't cross softlinks upwards in build environments. You may safely use softlinked files and directories that you want to write into or read from, but you must not rely on a stable relation for directory traversal.

To solve your issue:
Solution 1: Put the softlink one level higher, so that your .do files are also in the remote location. Make sure you have a default.do file there, such that traversal ends in the remote location in all cases.

Solution 2: Put a softlink named default.do into data/, linking back to a sources/datarules.do, and run all build rules for your data/ files from that single .do file.
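
One detail matters when creating the softlink for Solution 2: a relative link target is resolved from the link's physical location, not from the path used to reach it. The sketch below uses a scratch directory with made-up names (`relative.do` exists only to show the failure) and assumes nothing about how redo treats a symlinked .do file.

```sh
# Scratch layout; names are illustrative only.
tmp="$(mktemp -d)"
tmp="$(cd -P "$tmp" && pwd -P)"
mkdir -p "$tmp/repo/sources" "$tmp/bigdisk/prdata"
: > "$tmp/repo/sources/datarules.do"
ln -s "$tmp/bigdisk/prdata" "$tmp/repo/data"
cd "$tmp/repo"

# Relative target: resolved from bigdisk/prdata/, where no sources/ exists, so the link dangles.
ln -s ../sources/datarules.do data/relative.do

# Absolute target: independent of where data/ physically lives.
ln -s "$(pwd -P)/sources/datarules.do" data/default.do

ls -l data/
```

The absolute link has to be recreated if the repository moves, which is the usual trade-off for not depending on the physical layout.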

Wolfgang Faust

Jan 7, 2022, 12:42:00 AM
to redo
I've just noticed that my original post neglected to mention that I'm using apenwarr redo, whoops.

With that out of the way, some further thoughts now that I've had more time to play around with this:

On Thursday, January 6, 2022 at 1:31:14 AM UTC-8 spacefro...@meterriblecrew.net wrote:
> I have also discovered these kinds of inconsistencies. As far as I am aware, there is no universal agreement on whether paths should be treated as logical paths or physical paths.
>
> This is also not so easy to solve, if it can be solved at all: even though making redo itself behave consistently is easy enough to achieve, that doesn't mean that compilers etc. running inside .do files won't behave inconsistently again once physical and logical paths start to deviate from each other.

I hadn't even thought of that—in my case I'm using custom tooling which was written to expect this situation (and really only refers to these files "from above", so mostly doesn't encounter anything unusual), but I can definitely see how it might cause complications.
 
> That said: don't cross softlinks upwards in build environments. You may safely use softlinked files and directories that you want to write into or read from, but you must not rely on a stable relation for directory traversal.

I'm *trying* not to do this, which is why I've put my .do files in the upper directory, where they can use relative paths to refer to other files and tools and only need to refer down the symlink. Unfortunately it seems I've been stymied here by the way (apenwarr) redo searches for .do files.
 
> To solve your issue:
> Solution 1: Put the softlink one level higher, so that your .do files are also in the remote location. Make sure you have a default.do file there, such that traversal ends in the remote location in all cases.

Unfortunately git doesn't look past symlinks, so this one is a non-starter—moving the symlink up a level means the entire directory disappears from git's view; and stopping the traversal at the boundary means I won't be able to "redo-ifchange" build dependencies above the boundary. I can't move the entire repository into the other filesystem, either, since then I just have a *different* set of files that would need to be symlinked onto the first filesystem. (Yes, this is a weird and awkward situation.)

> Solution 2: Put a softlink named default.do into data/, linking back to a sources/datarules.do, and run all build rules for your data/ files from that single .do file.

This approach also has a few problems:

1) The .do script ends up running with its working dir set to data/ (since that's where redo found it), meaning it needs to somehow find its way back to the repository so it can use all of the tooling—as you said, crossing symlinks upward causes complications. Even if you can find the right directory, `cd`-ing to it is not trivial either, since doing so breaks "$3" (which is passed as a relative path).
2) Because all of the rules are in a single .do file, redo wants to rebuild *everything* whenever you edit any part of the script.
3) You also lose all of the dispatching features, and have to implement things like $2 yourself.
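
To make point 3 concrete: with one catch-all .do file, the script has to pick the rule and strip the extension from the target name itself. A minimal sketch of that hand-rolled dispatch follows; the `dispatch` function and the rule names `links` and `dat` are made up for illustration and are not part of redo.

```sh
# Hypothetical hand-written dispatch for a single catch-all .do file.
# Sets $rule to the rule to run and $base to the target name minus its extension.
dispatch() {
    target="$1"
    case "$target" in
        *.links.dat) rule="links"; base="${target%.links.dat}" ;;
        *.dat)       rule="dat";   base="${target%.dat}" ;;
        *)           rule="none";  base="$target" ;;
    esac
}

dispatch epzi.links.dat
echo "rule=$rule base=$base"   # prints: rule=links base=epzi
```

The order of the `case` arms matters: the more specific pattern has to come first, which is the same longest-match preference that the default.*.do lookup provides for free.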

Still, it seems like the best approach I have; I ended up with a variant of this idea using separate .do files, plus prepare-data-dir.do:

```
# .up is needed because, since data is a symlink, .. will take us somewhere entirely different
ln -fs "$(pwd)" "data/.up"

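# Write one stub per real .do file; each stub declares a dependency on the
# real file and then sources it.
for f in data.do/*.do; do
        REDIRECTFILE="data/$(basename "$f")"
        REALDO="$(realpath "$f")"
        CONTENT='redo-ifchange "'"$REALDO"'"; . "'"$REALDO"'"'
        # Only rewrite the stub when its content would change.
        if [ ! -e "$REDIRECTFILE" ] || [ "$(cat "$REDIRECTFILE")" != "$CONTENT" ]; then
                # -f: the stub may not exist yet, and a failing rm would abort the script under sh -e
                rm -f "$REDIRECTFILE"
                # printf instead of echo -n, which not every sh implements the same way
                printf '%s' "$CONTENT" > "$REDIRECTFILE"
        fi
done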
```

`redo prepare-data-dir` should be run whenever the list of files in data.do/ doesn't match the .do files in data/. Instead of real symlinks it sets up "virtual" links: each file it writes into data/ is a one-line stub that declares a dependency on the real .do file in data.do/ and then sources it (mainly because I wasn't sure whether redo would notice .do changes through a symlink, so this seemed like a safer approach). The files in data.do/ can mostly be written as though they were .do files in the data/ directory, but there are some complications:

* This simple approach does not handle .do files with a shebang: the stub sources the real file with `.`, so the shebang line is treated as an ordinary comment.
* Because they are run inside the symlinked directory rather than above it, these scripts need to know how to haul themselves out. The `data/.up` symlink is provided for this purpose, but scripts must take care to actually follow it rather than merely treat it as a directory: `cd .up/..`, for example, will end up in the wrong place, so you need `cd -P .up/..` to use the physical structure.
* (I think this one is a general limitation of redo, but it's made more complicated by the symlink.) If you `cd` in the .do file, you need to be careful to store "$(realpath "$3")" somewhere first. "$3" is a relative path, so using it after changing directories will put the file somewhere redo doesn't know about, and it will assume that it's a virtual target instead.
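
The last two points can be tried in a scratch directory. This is a sketch under stated assumptions: the layout and the values given to `$1 $2 $3` are made up (redo would normally supply them), and `"$(pwd -P)/$3"` is swapped in for `realpath "$3"` so that the example does not depend on how a given `realpath` treats a file that does not exist yet.

```sh
# Scratch layout mirroring the thread; every name here is illustrative.
tmp="$(mktemp -d)"
tmp="$(cd -P "$tmp" && pwd -P)"
mkdir -p "$tmp/repo" "$tmp/bigdisk/prdata"
ln -s "$tmp/bigdisk/prdata" "$tmp/repo/data"
ln -fsn "$tmp/repo" "$tmp/repo/data/.up"

cd "$tmp/repo/data"                  # a .do script found in data/ starts here
set -- epzi.links.dat epzi out.tmp   # made-up stand-ins for redo's $1 $2 $3

# Pin down the output path before any cd; "$3" is relative to this directory.
out="$(pwd -P)/$3"

# Logical cd cancels ".up/.." textually and stays in data/.
logical="$(cd -L .up/.. && pwd -L)"
# Physical cd follows .up into the repository first, then goes up one level.
physical="$(cd -P .up/.. && pwd -P)"

cd -P .up                            # now inside the repository itself
printf 'payload\n' > "$out"          # still lands in the data directory
```

Writing to a bare `"$3"` after the `cd` would instead create `out.tmp` inside the repository, where redo is not looking for it.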

As you can see, this is not an ideal way of doing things, but it seems to be working well enough to serve my immediate needs. I'd appreciate any suggestions for improvements, or even an entirely different approach.

Wolfgang Faust

Jan 14, 2022, 2:46:09 PM
to redo
Minor correction to prepare-data-dir.do from the previous post: the first command should be:

ln -fsn "$(pwd)" "data/.up"

(-n being "don't dereference the LINK_NAME"; otherwise subsequent runs will end up creating a symlink to the upper directory inside itself, rather than replacing "data/.up".)
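
The difference is easy to reproduce in a scratch directory. In this small sketch the directory and link names are made up; each `ln` is run twice to imitate running prepare-data-dir a second time.

```sh
# Scratch directories; names are illustrative only.
tmp="$(mktemp -d)"
mkdir "$tmp/a" "$tmp/b"

# Without -n: the second run follows the existing link into a/ and
# creates a new symlink a/a there instead of replacing link_a.
ln -fs "$tmp/a" "$tmp/link_a"
ln -fs "$tmp/a" "$tmp/link_a"

# With -n: the second run replaces link_b itself, and b/ stays empty.
ln -fsn "$tmp/b" "$tmp/link_b"
ln -fsn "$tmp/b" "$tmp/link_b"

ls -A "$tmp/a" "$tmp/b"
```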