variable number of outputs + javac like languages


ILYA

Jun 24, 2012, 7:42:25 PM
to tup-users
Hi,

I want to resurrect the variable-number-of-outputs discussion. Tup is a
great tool and I'm glad it exists. I'm currently trying to adapt it to
my project, and given the existing set of tup features and limitations
it seems like it would be very hard to do. Given the problems that
javac/ocaml/vala/erlang users are experiencing, I think their use cases
should be considered, in order to find a solution which is easy to
implement and satisfies most users.

The project that I'm working on uses the Elixir language, which runs
on top of the Erlang VM. Elixir additionally supports specifying
multiple modules in one single source file.
Moreover, the directory structure is quite strict and you cannot
really adjust it. The structure of a typical Elixir project looks like
this:
ebin/__MAIN__-M-A.beam
ebin/__MAIN__-M-B.beam
ebin/application.app  # generated file containing
                      # modules = ['__MAIN__-M-A.beam', '__MAIN__-M-B.beam']
src/M.ex              # it defines modules M.A and M.B

All of the above assumes the following set of features supported by
the build system.
1. Writing to directories other than the directory containing the
source files.
2. A variable number of outputs.
3. In-out files (when multiple modules are specified in a single source
file, elixirc checks whether the target beam file has already been
compiled).
4. Target file names are more complex and cannot be represented
using %-flags.

For in-out files there is a workaround as follows:
elixirc -o .build %f && mv .build/* . && rm -rf .build
However, in order to use it I have to specify:
: $(APP_TOP)/src/M.ex |> elixirc -o .build %f && mv .build/* . && rm -rf .build |> __MAIN__-M-A.beam __MAIN__-M-B.beam
I.e. we have to explicitly specify, for every input, which outputs it
produces.

I thought I could write a file which could be used to map each src
file to multiple *.beam files, so that later I could generate rules
like this:
: src/application_module.ex |> elixirc -o ebin %f |> __MAIN__-M-B.beam __MAIN__-M-A.beam
But due to 1.2 and 1.3 (see below) it failed.

At one point there was a proposal to let tup write files into any
directory below the current one. What is the status of that?

The things I've tried.
1. Running script invoked from ebin/Tupfile
The reasons why this approach fails are:
1.1. Run script can read files only in current directory
1.2. Run script cannot write to files.
1.3. Run scripts cannot use generated files.
2. Copying files into an intermediate directory, compiling them there,
and then moving them into ebin
The reasons why this approach fails are:
2.1. You can write only into current directory
2.2. You cannot read from /tmp or .build (hidden) directories
(from different rules)

Possible solutions are (this is sort of brainstorm so some of them
might be stupid or very hard to implement):
1. Allow specifying *.beam (i.e. wildcards) in the output part of a
rule (automatically adding outputs detected by fuse into the DAG),
which is similar, implementation-wise, to the recent 'groups' feature.
2. Provide a way to read outputs from file.
3. Allow specifying shell commands in output part of a rule.
4. Allow specifying shell commands in input part of a rule.
5. Eval shell commands in place. When we specify 'FILES=`find src/
*.ex`' current implementation replaces every $(FILES) reference with
'`find src/*.ex`'. If we would have an option to evaluate them (or
call external scripts) on parse stage we could use variables in input
or output part of a rule.
6. Fix 1.1 or (1.2 and 1.3) (see above).
7. Provide additional hooks for run files of different kind.
7.1. When a foreach rule detects new files (which match a given
regexp) in a previously scanned directory, call a user-provided script.
7.2. When fuse detects a write to an unspecified output, call a
user-provided script.

Best regards,
ILYA

Mike Shal

Jun 29, 2012, 5:49:12 PM
to tup-...@googlegroups.com
Hi ILYA,

On Sun, Jun 24, 2012 at 7:42 PM, ILYA <ilya.kh...@gmail.com> wrote:
> Hi,
>
> I want to resurrect variable number of outputs discussion. Tup is a
> great tool and I'm glad it exists. I'm currently trying to adapt it to
> my project and given set of existent tup features and limitations it
> is seems like it would be very hard to do. Given problems that javac/
> ocaml/vala/erlang users are experiencing I think that their use cases
> should be considered. In order to find a solution which would be easy
> to implement and which satisfy most of the users.

By variable outputs, what exactly do you mean? Tup does support more
than a single output, but it has the requirement that all outputs must
be specified. If you mean that you'd like to be able to run a command
and not explicitly say what files are created, I'm not sure that would
help in all of these cases.

I'm not explicitly ignoring these languages, and I don't think the
problems that arise are specific to the languages themselves, but
rather that the compilers try to take on some of the responsibilities
of the build system instead of just being a simple compiler. This
conflicts with tup (and other build systems) that are supposed to be
the ones doing dependency management and the like. For example,
consider the java case - if I have A.java and B.java, where A uses
stuff from B. As a build system, by looking at the directory I see:

A.java
B.java

From this I have no idea which should be built first. Suppose I build
A.java first - javac sees that it depends on B.java and so it builds
A.class *and* B.class. This presents a problem if I am also trying to
build B.java into B.class independently (or compiling another java
file into a class file), because it will randomly fail if B.class is
only generated partially when another parallel job tries to read from
it.

Another problem is that if I build B.java first and get B.class, now
when I build A.java it reads from the class file instead of the java
file. As I recall, another language (maybe Ocaml or Haskell?) actually
*requires* that you build the base file before building the dependent
file. This is problematic as well because we have to read the source
files (not just the directory entries) before we can know what order
things have to be built in. As a side effect, it also means you have
to re-parse the build description every time a source file changes,
which shouldn't be necessary (but for build systems that always parse
the entire build description, maybe that doesn't matter).

What the compiler should be doing is when I tell it "compile A.java
and create A.class", it should read from A.java, and then when it sees
that it uses B, it should read from B.java. The only class file it
should create is A.class, since that's all that was requested. I
should note that the OpenJDK compiler at least has options to do just
this:

-implicit:none disables generating B.class in this case
-Xprefer:source reads from B.java instead of B.class when we compile A.java

Unfortunately OpenJDK also tries to stat every file in a directory
(rather than just the ones it uses), which causes tup to think it has
a dependency on all of those files. I was able to hack javac to not do
this and can build java files like so:

: foreach *.java |> javac %f -implicit:none -Xprefer:source -sourcepath . |> %B.class

Someone with real java experience could probably do a proper patch and
get it into OpenJDK. The only thing that remains (as far as I know) is
that compiling a java file with inner classes results in multiple
class files. This can be partly addressed with some changes I just
pushed to allow variable outputs based on the input filename. Eg you
could do:

outputs_A.java += A$inner.class (or however these are named)
: foreach *.java |> javac ... |> %B.class $(outputs_%f)

So I don't believe there is anything in Java (the language) that
prevents tup from building it. Getting javac (the compiler) to do only
what it's told is a bit tricky, but can be done.


As far as I can tell, the problem with vala isn't that the outputs are
variable, but rather with specifying the --fast-vapi flags. I don't
believe there is the equivalent of a "#include" statement, so instead
of specifying what vapi files a vala file depends on in the vala file,
you have to specify it in the build description. Consider this analogy
to C:

bar.c:
#include "foo.h"
void bar(void) { foo(); }

build file:
foreach .c file
gcc -c FILENAME

So you tell the compiler that we need foo.h in bar.c. You could also
do this in the build description:

bar.c:
void bar(void) { foo(); }

build file:
foreach .c file
gcc -c FILENAME -include foo.h

Managing the 'include X.h' flags for each .c file is more difficult to
do here, but it could be done. It's much easier to just put the
#include's in the .c file. However, putting the --use-fast-vapi flags
in the build description is essentially the only way to do it with
vala. The best way I know of to handle this in tup is to manually list
what vapi files each input needs:

: foreach *.vala |> valac --fast-vapi=%B.vapi %f |> %B.vapi $(MYPROJ_ROOT)/<vapi>

VAPISFLAGS_main.vala += --use-fast-vapi=foo.vapi
VAPISFLAGS_foo.vala += --use-fast-vapi=bar.vapi
: foreach *.vala | $(MYPROJ_ROOT)/<vapi> |> valac $(VAPISFLAGS_%f) %f -C |> %B.c

I don't recall ocaml well enough to know exactly what the issue is. I
think the compiler behaved similarly to javac, except that it also
required you to order the ocamlc compilations correctly (so tup has to
know that it must do 'ocamlc B' before 'ocamlc A'), otherwise it
fails. If it had an option similar to javac to only read from source
files, I think it would be much easier to work with.

Keep in mind I have no real experience with any of these languages. If
I am wrong in any of my assumptions please yell at me.

>
> The project that I'm working on uses elixir language which is working
> on top of Erlang VM. Elixir project in addition supports specifying
> multiple modules in one single source file.
> More over the directory structure is quite strict and you cannot
> really adjust it. The structure of typical elixir project is like
> this:
> ebin/__MAIN__-M-A.beam
> ebin/__MAIN__-M-B.beam
> ebin/application.app # generated file containing
>                               # modules = ['__MAIN__-M-A.beam',
> '__MAIN__-M-B.beam']
> src/M.ex # it defines modules M.A and M.B
>
> All of the above assumes following set of features supported by build
> system.
> 1. Writing to directories other than directory containing source
> files.

Is there a reason you can't put the Tupfile in ebin? There isn't
really much difference between doing:

src/Tupfile:
compile *.ex -> ../ebin

...compared to...

ebin/Tupfile:
compile ../src/*.ex -> .

> 2. Variable number of outputs.

Can you try out the latest master and see if it helps? You still have
to specify the outputs, but since those only change when you
add/remove modules (correct?) I don't think it is too much of a
burden. I was able to install elixir and build some test files like
so:

Tupfile:
export HOME

files_math.ex += __MAIN__-Math.beam
files_math.ex += __MAIN__-NewMath.beam

files_macro_test.exs += __MAIN__-MacroTest.beam
files_macro_test.exs += __MAIN__-Macro-ExternalTest.beam

: foreach math.ex macro_test.exs |> ../bin/elixirc %f |> $(files_%f)

math.ex:
defmodule Math do
def sum(a, b) do
a + b
end
end

defmodule NewMath do
def sum(a, b) do
a + b
end
end

And macro_test.exs is from the elixir/test/elixir directory in the git
tree. Whenever you add/remove a module you have to update the Tupfile,
but tup will give an error if it's wrong. You could try to generate
the files_* variables from a script, but the only way to do that would
be to read the .ex/.exs files while parsing the Tupfile, which means
anytime you modify the source file at all it will re-parse the
Tupfile.

> 3. In-out files (When multiple modules are specified in single source
> file elixirc checks that target beam file might be compiled already).

Can you explain this further? When I tried running elixirc twice in a
row on math.ex in the command-line, it just gave an error. Since tup
removes the outputs before running the command again, it essentially
prevents the use of in-out files. Is there a case where two commands
need to update the same beam file?

> 4. Target file name is more complex and cannot be represented through
> usage of %-flags

Again please let me know if the latest master helps at all here. You
can see tup/test/t2126 as an example.

>
> For in-out files there is a workaround as follows:
>   elixirc -o .build %f && mv .build/* . && rm -rf .build
> However in order to use it I have to specify:
> : $(APP_TOP)/src/M.ex |> elixirc -o .build %f && mv .build/* . && rm -
> rf .build |> __MAIN__-M-A.beam
> __MAIN__-M-B.beam
> I.e. we have to explicitly specify for every input which outputs it
> produces.
>
> I thought I could write a file which can be used to map src file to
> multiple *.beam files. So later I could generate rules like this:
>  : src/application_module.ex |> elixirc -o ebin %f |> __MAIN__-M-
> B.beam __MAIN__-M-A.beam
> But due to 1.2 and 1.3 (see below) it failed.
>
> At one point it was a proposal to let tup to write to any files into
> any directory below the current one. What is the status of it?

I haven't had time to look into it at all, sorry.

>
> The things I've tried.
> 1. Running script invoked from ebin/Tupfile
>    The reasons why this approach fails are:
>      1.1. Run script can read files only in current directory
>      1.2. Run script cannot write to files.
>      1.3. Run scripts cannot use generated files.

1.1: I think this could be fixed by specifying what directories are
readable in the 'run' line, though I'm curious if you still find it
necessary to use a run-script.
1.2: What files are you trying to write to while parsing the Tupfile, and why?
1.3: No generated files exist when you first parse a Tupfile, so this
doesn't really make sense.

> 2. Copying files into intermediate directory compile it there and then
> move them into ebin
>    The reasons why this approach fails are:
>    2.1. You can write only into current directory
>    2.2. You cannot read from /tmp or .build (hidden) directories
> (from different rules)
>
> Possible solutions are (this is sort of brainstorm so some of them
> might be stupid or very hard to implement):
>  1. Allow specifying *.beam (i.e. wildcards) in output part of a rule
> (automatically add outputs (detected by fuse) into DAG). Which is
> similar implementation wise to the latest 'groups' feature.

I tried to do something like this a while ago, where you would specify
a 'dynamic bin' as the output instead of files. Then it would attach
all generated files to this bin. The problems that I couldn't figure
out how to solve were:

1) What should tup do when a generated file would be processed by the
Tupfile? Eg: my command creates a .c file, and I have a "foreach *.c"
rule.
2) What should tup do when a generated file overwrites a ghost node?
Ghosts are created in the database whenever a command attempts to
access a file but fails. These files would affect the build when they
are created, so when this happens tup would somehow need to restart
the build and load a new DAG knowing that the file now exists.

In short, I don't think it is going to be an easy fix to allow
commands to run with unspecified outputs.

>  2. Provide a way to read outputs from file.

I'm not sure what you mean here - can you clarify?

>  3. Allow specifying shell commands in output part of a rule.
>  4. Allow specifying shell commands in input part of a rule.
>  5. Eval shell commands in place. When we specify 'FILES=`find src/
> *.ex`' current implementation replaces every $(FILES) reference with
> '`find src/*.ex`'. If we would have an option to evaluate them (or
> call external scripts) on parse stage we could use variables in input
> or output part of a rule.
>  6. Fix 1.1 or (1.2 and 1.3) (see above).

I think 3,4,5 are better handled by the run-scripts. If 1.1 is
addressed would that be sufficient?

>  7. Provide additional hooks for run files of different kind.
>      7.1. When foreach rule detects new files (which match with given
> regexp) in previously scanned directory we call user provided script.
>      7.2. When fuse detects write to unspecified output we call user
> provided script.

Right now tup runs in very well-defined stages. The parser stage reads
the build description, and generates command strings in the DAG. The
updater stage, where commands actually run and create their outputs,
loads a fixed part of the DAG and then walks through it. I think
adding the ability for the updater stage to jump back to the parser
stage and apply rules to generate new commands in the currently
running DAG would add a lot of complexity and be very difficult to get
right.

Please try the latest master and let me know if that helps at all in
explicitly specifying the outputs for elixirc. It seemed to work well
in the simple example I tried, but this is just another language I
have no experience with so I may be missing the bigger picture :)

Thanks,
-Mike

Simon Werbeck

Jun 29, 2012, 7:42:04 PM
to tup-...@googlegroups.com
I use this approach for one project. However, I have some concerns
about it, and I'm not sure whether it's the compiler or the build
system that's supposed to change. To illustrate the problem, consider
the following dependency graph:

a.vala -> b.vala -> c.vala

Where c.vala depends on the fast-vapi of b.vala etc. So that the
VAPISFLAGS_ will look like this:

VAPISFLAGS_b.vala += --use-fast-vapi=a.vapi
VAPISFLAGS_c.vala += --use-fast-vapi=a.vapi
VAPISFLAGS_c.vala += --use-fast-vapi=b.vapi

As you can see, a.vapi has to be listed twice even though only b.vala
directly depends on it, which makes this approach rather impractical
for larger projects (think of renaming/removing files, or moving
symbols around between files). An improvement would then be to write
out include hierarchies using $-references:

VAPISFLAGS_b.vala += --use-fast-vapi=a.vapi
VAPISFLAGS_c.vala += $(VAPISFLAGS_b.vapi)
VAPISFLAGS_c.vala += --use-fast-vapi=b.vapi

But this only makes sense for acyclic dependencies. Once you have a
situation like this:

other deps            other deps
     ...  \            /  ...
           a.vala <=> b.vala
     ...  /            \  ...

you end up with:

VAPISFLAGS_a.vala += other deps
VAPISFLAGS_a.vala += --use-fast-vapi=b.vapi
VAPISFLAGS_a.vala += $(VAPISFLAGS_b.vala) #which is undeclared at this point

But even if that weren't an issue, there is still the problem that
valac doesn't support over-specifying fast-vapis, i.e. by running

valac --use-fast-vapi=foo.vapi --use-fast-vapi=foo.vapi bar.vala

the compiler will complain about multiple definitions (which amounts
to having no #include-guard support like in C).

From this point on, I guess the sanest solution would be to write
dependency lists in foo.inc files and use scripts to generate the flags
from that.

Mike Shal

Jun 30, 2012, 9:38:45 PM
to tup-...@googlegroups.com
Hi Simon,
Yes, that definitely is annoying. Good point.

>
> VAPISFLAGS_b.vala += --use-fast-vapi=a.vapi
> VAPISFLAGS_c.vala += $(VAPISFLAGS_b.vapi)
> VAPISFLAGS_c.vala += --use-fast-vapi=b.vapi
>
> But this only makes sense for acyclic dependencies. Once you have a
> situation like this:
>
> other deps other deps
> . \ / .
> . - a.vala <=> b.vala - .
> . / \ .
>
> you end up with:
>
> VAPISFLAGS_a.vala += other deps
> VAPISFLAGS_a.vala += --use-fast-vapi=b.vapi
> VAPISFLAGS_a.vala += $(VAPISFLAGS_b.vala) #which is undeclared at this point

Ok, I wasn't aware you could have dependencies like this too.

>
> But even if that weren't an issue there is still the problem, that valac
> doesn't support over-specifying fast-vapis, i.e. by running
>
> valac --use-fast-vapi=foo.vapi --use-fast-vapi=foo.vapi bar.vala
>
> the compiler will complain about multiple definitions (which translates to
> no #include-guard support like in c).
>
> From this point on, I guess the sanest solution would be to write dependency
> lists in foo.inc files and use scripts to generate the flags from that.

This definitely sounds like a good idea to me. Have you tried to write
such a script? If not, can you try the attached one to see if it
works? You should be able to run it like:

: foreach *.vala |> valac `vapiflags.pl %f` %f -C |> %B.c

Though I don't have a working valac on this machine so I can't try it
out for sure. You would need to write a .inc file for each .vapi file
that just has simple "include" statements. Eg if a.vapi depends on
b.vapi:

a.inc:
include b.inc

b.inc:
(empty)

It should remove duplicates, and not specify a.vapi when building
a.vala. I think it works correctly with circular dependent files, but
I just tried it with a few stub .inc files.
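Since the attached vapiflags.pl isn't reproduced in this archive, here is a minimal sketch of the same idea in Python: walk a source file's .inc chain transitively, guard against duplicates and cycles with a seen-set, and emit the --use-fast-vapi flags, never listing the file's own vapi. The .inc format (one dependency basename per line) and the function name are assumptions, not Mike's actual script.

```python
import os

def vapi_flags(vala_file, read_inc=None):
    """Collect --use-fast-vapi flags for a .vala file by walking its
    .inc file transitively; 'seen' removes duplicates and breaks cycles."""
    if read_inc is None:
        # Default: read "<base>.inc" from disk, one dependency per line.
        def read_inc(base):
            path = base + ".inc"
            if not os.path.exists(path):
                return []
            with open(path) as f:
                return [ln.strip() for ln in f if ln.strip()]

    base = vala_file[:-len(".vala")] if vala_file.endswith(".vala") else vala_file
    seen = set()
    flags = []

    def walk(name):
        if name in seen:
            return  # duplicate or circular dependency: skip
        seen.add(name)
        for dep in read_inc(name):
            walk(dep)
        if name != base:  # never pass a file its own fast-vapi
            flags.append("--use-fast-vapi=%s.vapi" % name)

    walk(base)
    return flags
```

For a.inc containing "b" and "c", and b.inc containing "c", compiling a.vala yields flags for c.vapi and b.vapi exactly once each.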

Feel free to use & modify as you see fit. If you already have a better
solution please write back - this kind of thing should probably go on
the tips & tricks page to help others.

Thanks!
-Mike
vapiflags.pl

Simon Werbeck

Jul 1, 2012, 5:01:00 AM
to tup-...@googlegroups.com
Sorry for answering this late, but in the meantime I have also written
a script (python, I didn't code in perl before :P ). One note on your
script: I think this should read,

my $inc = $ARGV[0];
$inc =~ s/vala$/inc/;

Since it will be called with %f, which is the .vala source file.
Without this it will try to read includes from the .vala file.
Actually, my .inc files list the dependencies without extension (which
is OK I think; valac allows only the extension .vapi anyway).
Also, my script doesn't require the .inc files to exist; this way the
directory is a bit cleaner, and I have to write the .inc's by hand
anyway.
Additionally, if a line starts with [private], all following
dependencies belong to the corresponding .vala file only and won't be
included by other .inc's. This would be equivalent to the #includes that
go in a .c but not the header file. (in Vala you would have symbols
declared as private, or as parameters to private methods. Those don't
show up in the .vapi)
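The [private] convention described above could be parsed with a small helper along these lines. This is a sketch only; the function name and line format are illustrative, not Simon's actual script:

```python
def parse_inc(lines):
    """Split .inc lines into (public, private) dependency lists.
    Lines after a '[private]' marker apply only to the matching .vala
    file and are not propagated when other .inc files include this one."""
    public, private = [], []
    bucket = public
    for ln in lines:
        ln = ln.strip()
        if not ln:
            continue  # skip blank lines
        if ln == "[private]":
            bucket = private  # everything after this is private
            continue
        bucket.append(ln)
    return public, private
```

A transitive resolver would then recurse only into the public list, while the private list is appended solely for the file being compiled.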

Anyway, I would really like to hear your suggestions on this. I've
also attached the script (in fact, I think I've gotten a bit rusty
coding in python lately).

Many thanks for taking the time to look into this matter.

Simon
vapiflags.py

ILYA Khlopotov

Jul 2, 2012, 5:32:30 PM
to tup-...@googlegroups.com
Hi Mike,

> By variable outputs, what exactly do you mean? Tup does support more
> than a single output, but it has the requirement that all outputs must
> be specified. If you mean that you'd like to be able to run a command
> and not explicitly say what files are created, I'm not sure that would
> help in all of these cases.
I meant gathering output dependencies automatically, since "tup monitor -a" is able to detect new source files in the src/ directory if ebin/Tupfile contains 'foreach ../src/*.ex'.

>Is there a reason you can't put the Tupfile in ebin? There isn't
> really much difference between doing:
> src/Tupfile:
> compile *.ex -> ../ebin
> ...compared to...
> ebin/Tupfile:
> compile ../src/*.ex -> .
My Tupfile is in the ebin directory. I'm using something like:
cat ebin/Tupfile
include_rules
: foreach $(APP_TOP)/src/*.ex |> elixirc %f |> <outputs>

> Can you try out the latest master and see if it helps? 
This is great and could be used to build elixir-based projects with
explicitly specified outputs. However it doesn't work with my setup.
Here is a failing test:
. ./tup.sh
mkdir src
cat > src/Tupfile << HERE
files_foo.in += foo1.out
files_foo.in += foo2.out
files_bar.in += bar1.out
: foreach *.in |> echo %f > %o |> \$(files_%f)
HERE
tup touch src/foo.in src/bar.in
tup parse

tup_dep_exist . 'echo src/foo.in > src/foo1.out src/foo2.out' . foo1.out
tup_dep_exist . 'echo src/foo.in > src/foo1.out src/foo2.out' . foo2.out
tup_dep_exist . 'echo src/bar.in > src/bar1.out' . bar1.out

eotup

And another one:
. ./tup.sh
mkdir src
mkdir dest
cat > dest/Tupfile << HERE
files_foo.in += foo1.out
files_foo.in += foo2.out
files_bar.in += bar1.out
: foreach ../src/*.in |> echo %f > %o |> \$(files_%b)
HERE
tup touch src/foo.in src/bar.in
tup parse

tup_dep_exist . 'echo src/foo.in > dest/foo1.out dest/foo2.out' . foo1.out
tup_dep_exist . 'echo src/foo.in > dest/foo1.out dest/foo2.out' . foo2.out
tup_dep_exist . 'echo src/bar.in > dest/bar1.out' . bar1.out

eotup

I have also tried the following:

$tup upd
[ tup ] [0.000s] Scanning filesystem...
[ tup ] [0.150s] Reading in new environment variables...
[ tup ] [0.150s] Parsing Tupfiles...
 1) [0.002s] .
 [ ] 100%
[ tup ] [0.164s] No files to delete.                                            
[ tup ] [0.164s] Deleting 1 command...
[ tup ] [0.325s] Executing Commands...                                          
* 1) ebin: elixirc ../src/foo.ex                                                
Compiled ../src/foo.ex
 *** tup errors ***
tup error: Unspecified output files - A command is writing to files that you didn't specify in the Tupfile. You should add them so tup knows what to expect.
 -- Unspecified output: ebin/__MAIN__-M-A.beam
 -- Unspecified output: ebin/__MAIN__-M-B.beam
tup error: Expected to write to file '__MAIN__-M.A.beam' from cmd 758 but didn't
tup error: Expected to write to file '__MAIN__-M.B.beam' from cmd 758 but didn't
 *** Command ID=758 ran successfully, but tup failed to save the dependencies.
 [ ] 100%
 *** tup: 1 job failed.

My setup is as follows:
$ cat Tuprules.tup 
APP_TOP = $(TUP_CWD)
HOME = $(HOME)
export HOME

$ cat ebin/Tupfile
include_rules
: foreach $(APP_TOP)/src/foo.ex |> elixirc %f |> __MAIN__-M.A.beam __MAIN__-M.B.beam
$ cat src/foo.ex
defmodule M.A do
end
defmodule M.B do
end

> 3. In-out files (When multiple modules are specified in single source
> file elixirc checks that target beam file might be compiled already).
>> Can you explain this further? When I tried running elixirc twice in a
>> row on math.ex in the command-line, it just gave an error. Since tup
>> removes the outputs before running the command again, it essentially
>> prevents the use of in-out files. Is there a case where two commands
>> need to update the same beam file?
This is the explanation from the compiler's documentation:
'... files are compiled in parallel and can automatically detect dependencies between them. Once a dependency is found, the current file stops being compiled until the dependency is resolved.'
I'm not sure whether it starts writing to the target file before compilation is finished or not. However, it should be easy to verify using strace or tup monitor.
I didn't run into problems with dependencies yet, since I haven't been able to compile a basic project so far.

> 1.1: I think this could be fixed by specifying what directories are
> readable in the 'run' line, though I'm curious if you still find it
> necessary to use a run-script.
> 1.2: What files are you trying to write to while parsing the Tupfile, and why?
> 1.3: No generated files exist when you first parse a Tupfile, so this
> doesn't really make sense.
Since in the case of elixir the only way to get the list of outputs
automatically is to ask the compiler to compile the file, the idea is
to do the following:
1. let elixirc compile the foo.ex file
2. by looking at the *.beam files created, get the list of outputs and save it into a file
3. somehow use this file with the mappings (foo.ex: __MAIN__-M.A.beam __MAIN__-M.B.beam) to produce rules from a run script
4. if at any point foo.ex produces outputs different from those specified in the mappings file, regenerate it.
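The mappings-to-rules step could be sketched roughly like this, assuming a hypothetical mappings-cache format of 'source.ex: out1.beam out2.beam' per line; the elixirc command in the generated rule is likewise an assumption:

```python
def rules_from_mappings(text):
    """Turn a mappings-cache file ('src.ex: out1.beam out2.beam' per
    line) into tup :-rules, one rule per source file. Lines that are
    blank or start with '#' are ignored."""
    rules = []
    for ln in text.splitlines():
        ln = ln.strip()
        if not ln or ln.startswith("#"):
            continue
        src, outs = ln.split(":", 1)
        # %%f escapes the literal %f placeholder in the emitted rule
        rules.append(": %s |> elixirc %%f |> %s"
                     % (src.strip(), " ".join(outs.split())))
    return rules
```

A run-script would print these rules on stdout, so tup's parser picks them up; the hard part (keeping the cache consistent when outputs change) is exactly what the rest of this message discusses.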

It may look like a silly idea, but if we can use some extra features
from tup it can be implemented. Let's imagine that tup behaves like
this:
1. execute the rule when the output is defined as a wildcard in the ebin directory:
    : foreach ../src/*.ex |> elixirc %f |> *.beam
2. tup monitor gets the list of output files
3. tup checks whether those outputs are in the DAG
3.1. if they are, and they are the same as in the previous run, just keep them in the DAG and keep going
3.2. if there are fewer outputs than currently in the DAG and the return code from the compiler is 0:
    3.2.1. remove the missing outputs from the DAG and from the directory
3.3. if there are more outputs than currently in the DAG and the return code from the compiler is 0:
    3.3.1. add the new outputs to the DAG
In this case there is no need for explicit dependencies. There is one
corner case though: if one of the modules (let's say foo) defines a
macro, then every other module using that macro requires the beam file
of foo. In this case we could specify the rule as follows:
: foreach ../src/*.ex | foo.beam |> elixirc %f |> *.beam
However I would prefer to specify it as:
MACROS += foo.beam
MACROS += bar.beam
: foreach ../src/*.ex | $(MACROS) |> elixirc %f |> *.beam

>>  2. Provide a way to read outputs from file.
> I'm not sure what you mean here - can you clarify?
See above:
'3. somehow use this file with the mappings (foo.ex: __MAIN__-M.A.beam __MAIN__-M.B.beam) to produce rules from a run script'
$(files_%f) in the output specification is very close to what I'm looking for. However, the file with the mappings needs to be generated. Through some external logic in a run script it could be possible to keep this MAPPINGS CACHE file consistent. The MAPPINGS CACHE file is actually a list of the products of a given input. It might be possible to get it from tup monitor and manage it with sqlite, or store it in the DAG itself; see the algorithm proposed earlier.

>>  3. Allow specifying shell commands in output part of a rule.
>>  4. Allow specifying shell commands in input part of a rule.
>>  5. Eval shell commands in place. When we specify 'FILES=`find src/
>> *.ex`' current implementation replaces every $(FILES) reference with
>> '`find src/*.ex`'. If we would have an option to evaluate them (or
>> call external scripts) on parse stage we could use variables in input
>> or output part of a rule.
>>  6. Fix 1.1 or (1.2 and 1.3) (see above).
> I think 3,4,5 are better handled by the run-scripts. If 1.1 is
> addressed would that be sufficient?
Yes, if reading from a generated file (the MAPPINGS CACHE) were also possible. However, implementing this without introducing weaknesses that break the sanity checks could be tricky.

Best regards,
ILYA

Mike Shal

Jul 2, 2012, 6:50:53 PM
to tup-...@googlegroups.com
Hi Simon,

On Sun, Jul 1, 2012 at 5:01 AM, Simon Werbeck
<simon....@googlemail.com> wrote:
> Sorry for answering this late, but in the meantime I have also written a
> script (python, I didn't code in perl before :P ). One note on your script,
> I think this should read,
>
> my $inc = $ARGV[0];
> $inc =~ s/vala$/inc/;
>
> Since it will be called with %f which is the .vala source file. Without this
> it will trie to read includes from the .vala file.

Whoops, you are correct of course.

> Actually my .inc files list the dependencies without extension (which is OK
> I think, valac allows only the extension .vapi anyway).
> Also it won't require the .inc files to exists, this way the directory is a
> bit cleaner and I have to write the .inc's by hand anyway.
> Additionally, if a line starts with [private], all following dependencies
> belong to the corresponding .vala file only and won't be included by other
> .inc's. This would be equivalent to the #includes that go in a .c but not
> the header file. (in Vala you would have symbols declared as private, or as
> parameters to private methods. Those don't show up in the .vapi)

Those all sound like good features to me.

>
> Anyway, I would really like to here your suggestions on this. I've also
> attached the script (in fact, I think I've gone a bit rusty coding in python
> lately).

Well I am obviously not the vala expert here, so if the script works
for you and addresses your concerns with maintaining the
--use-fast-vapi flags in a simple way, then by all means use it. I
think the only thing the perl script I posted did a little differently
was allow this sort of setup:

a.inc:
sub/d
sub2/e

sub/d.inc:
../sub2/e

(sub2/e.inc doesn't exist or is empty).

In the script you posted, it doesn't keep track of the fact that
'../sub2/e' is relative to 'sub/', so it should really try to open the
path 'sub/../sub2/e'. I had the perl version do some path
manipulations so it would canonicalize down to 'sub2/e' and thus
remove the duplicate, since a.inc already specifies sub2/e. I have no
idea if this would ever arise in practice, so if not don't worry about
it :).

Also you may want to allow the script to take the .vala name as an
argument, rather than the basename without the extension (or just
strip .vala if present). This way you can pass in the %f flag from
tup, so it will work properly even if the .vala file is a
subdirectory, such as by doing : foreach sub/*.vala |> ... |>

If you like, I can post your python script on the Tips & Tricks page
for others to use. If so please add your copyright info in a comment
at the top of the script & whatever license you want.

>
> Many thanks for taking the time to look into this matter.

Thanks for your feedback & patience!

-Mike

Mike Shal

Jul 7, 2012, 11:36:40 AM
to tup-...@googlegroups.com
Hi ILYA,

On Mon, Jul 2, 2012 at 5:32 PM, ILYA Khlopotov <ilya.kh...@gmail.com> wrote:
> Hi Mike,
>
>> By variable outputs, what exactly do you mean? Tup does support more
>> than a single output, but it has the requirement that all outputs must
>> be specified. If you mean that you'd like to be able to run a command
>> and not explicitly say what files are created, I'm not sure that would
>> help in all of these cases.
> I meant gathering output dependencies automatically. Since "tup monitor -a"
> is able to detect new source files in src/ directory if ebin/Tupfile
> contains 'foreach ../src/*.ex'.

Just to clarify, the dependency checker (fuse / DLL injection) would
be the one to detect this, not the monitor. The monitor is just used
to detect files modified outside of tup, so that at startup it can
skip the file-system scan. But just because tup can detect which files
were written does not mean it is easy to correctly add them into the
DAG. It would still have to take into account any rules that may use
them as inputs (which have already been parsed), and any previously
issued commands that may have used them as inputs (eg: ghost nodes). I
don't know of an easy way to handle this.
The test just has a few issues. The first thing failing is the 'tup
touch' command - it is a very basic test command that just tries to
create the node specified. However, it doesn't create directory nodes
automatically, so it is failing because the node for 'src' doesn't
exist yet. The way this is handled is by aliasing mkdir as 'tmkdir',
which will create a directory and the corresponding node.

The second thing failing is that the tup_dep_exist lines are not accurate.
The four parameters it takes are: [directory1] [file or command 1]
[directory2] [file or command 2]. In this case, the Tupfile is in
src/, so that's where the commands live. The first parameter should be
'src', not '.'. Also the command string isn't accurate - it should be
'echo foo.in > foo1.out foo2.out' since the command is running in the
src directory. And similar to the command, the output files also live
in src, so the third parameter needs to change as well. The full
working test case is as follows:

. ./tup.sh
tmkdir src
cat > src/Tupfile << HERE
files_foo.in += foo1.out
files_foo.in += foo2.out
files_bar.in += bar1.out
: foreach *.in |> echo %f > %o |> \$(files_%f)
HERE
tup touch src/foo.in src/bar.in
tup parse

tup_dep_exist src 'echo foo.in > foo1.out foo2.out' src foo1.out
tup_dep_exist src 'echo foo.in > foo1.out foo2.out' src foo2.out
tup_dep_exist src 'echo bar.in > bar1.out' src bar1.out

eotup

Note I didn't change your Tupfile here, so I don't think this exposes
an actual issue.

>
> And another one:
> . ./tup.sh
> mkdir src
> mkdir dest
> cat > dest/Tupfile << HERE
> files_foo.in += foo1.out
> files_foo.in += foo2.out
> files_bar.in += bar1.out
> : foreach ../src/*.in |> echo %f > %o |> \$(files_%b)
> HERE
> tup touch src/foo.in src/bar.in
> tup parse
>
> tup_dep_exist . 'echo src/foo.in > dest/foo1.out dest/foo2.out' . foo1.out
> tup_dep_exist . 'echo src/foo.in > dest/foo1.out dest/foo2.out' . foo2.out
> tup_dep_exist . 'echo src/bar.in > dest/bar1.out' . bar1.out
>
> eotup

Similar changes are needed here, and I think this is also not an issue:

. ./tup.sh
tmkdir src
tmkdir dest
cat > dest/Tupfile << HERE
files_foo.in += foo1.out
files_foo.in += foo2.out
files_bar.in += bar1.out
: foreach ../src/*.in |> echo %f > %o |> \$(files_%b)
HERE
tup touch src/foo.in src/bar.in
tup parse

tup_dep_exist dest 'echo ../src/foo.in > foo1.out foo2.out' dest foo1.out
tup_dep_exist dest 'echo ../src/foo.in > foo1.out foo2.out' dest foo2.out
tup_dep_exist dest 'echo ../src/bar.in > bar1.out' dest bar1.out

eotup

Please correct me if I'm misunderstanding the intent of these test cases.
In the Tupfile you have "__MAIN__-M.A.beam" but it is writing to
"__MAIN__-M-A.beam". Note the "M.A" vs "M-A". I tried it by listing
the correct output files and it updated successfully.

>
>> 3. In-out files (When multiple modules are specified in single source
>> file elixirc checks that target beam file might be compiled already).
>>> Can you explain this further? When I tried running elixirc twice in a
>>> row on math.ex in the command-line, it just gave an error. Since tup
>>> removes the outputs before running the command again, it essentially
>>> prevents the use of in-out files. Is there a case where two commands
>>> need to update the same beam file?
> This is the explanation from documentation of a compiler.
> '... files are compiled in parallel and can automatically detect
> dependencies between them. Once a dependency is found, the current file
> stops being compiled until the dependency is resolved.'
> I'm not sure if it starts writing to the target file before compilation is
> finished or not. However it should be easy to verify using strace or tup
> monitor.
> I didn't run into problems with dependencies just yet since I was unable to
> compile a basic project yet.

I don't know what that means exactly, but it sounds like it is trying
to do too much work here. Likely it will conflict with the work that
tup is supposed to be doing, but I don't know for sure.
It is in 3.2.1 and 3.3.1 where I think the issues will be. For
example, removing a file from the DAG means that tup has to rebuild
everything it points to. However, by the time you are running
commands, tup has already built the complete partial DAG that it is
supposed to be using - it doesn't have provisions to update it on the
fly.

> In this case there is no need for explicit dependencies. There is one corner
> case though: if one of the modules (let's say foo) defines a macro, then
> every other module using that macro requires the beam file of foo. In this
> case we could specify the rule as follows: : foreach ../src/*.ex | foo.beam |>
> elixirc %f |> *.beam. However I would prefer to specify it as:
> MACROS += foo.beam
> MACROS += bar.beam
> : foreach ../src/*.ex | $(MACROS) |> elixirc %f |> *.beam

If I understand you correctly, you have to convert foo.ex -> foo.beam
before doing baz.ex -> baz.beam, which uses macros from foo?
If so, wild-carding *.ex -> *.beam is going to be more difficult. Is
there a way that when converting baz.ex -> baz.beam it can read from
foo.ex instead of foo.beam? Ie: Just re-use the source rather than the
generated output. This is much easier to handle since you don't have
to manually specify the correct ordering, or parse all of the .ex
files once to generate the ordering and then again a second time to
actually compile them.

>
>>> 2. Provide a way to read outputs from file.
>> I'm not sure what you mean here - can you clarify?
> See above
> ..'3. Use somehow this file with mappings foo.ex: __MAIN__-M.A.beam
> __MAIN__-M.B.beam to produce rules from run script'..
> $(files_%f) in output specification is very close to what I'm looking for.
> However the file with mappings needs to be generated. Through some external
> logic in the run script it could be possible to keep this MAPPINGS CACHE file
> consistent. This MAPPINGS CACHE file is actually a list of the products of a
> given input. It might be possible to get it from tup monitor and manage it
> using sqlite, or store it in the DAG itself; see the algorithm proposed earlier.

I think you could use a run script to parse the .ex files (it doesn't
even have to be elixirc if you're just looking for 'defmodule' lines -
it could be a python script or whatever you're familiar with). You
don't need to generate a separate mappings file - this could be done
on the fly to generate tup :-rules in the right order.
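A rough sketch of that idea, assuming the __MAIN__-M-A.beam naming scheme described earlier in the thread (the elixirc invocation and the exact rule format here are illustrative, not tested against a real elixir project):

```python
import re

# A 'defmodule M.A do' line declares module M.A
DEFMODULE_RE = re.compile(r'^\s*defmodule\s+([A-Za-z0-9_.]+)')

def beam_name(module):
    # M.A -> __MAIN__-M-A.beam, following the naming scheme above
    return '__MAIN__-%s.beam' % module.replace('.', '-')

def rule_for(src, text):
    # Emit one :-rule per .ex file, listing every beam it produces
    mods = [m.group(1)
            for m in (DEFMODULE_RE.match(line) for line in text.splitlines())
            if m]
    if not mods:
        return None
    outputs = ' '.join(beam_name(m) for m in mods)
    return ': %s |> elixirc -o . %%f |> %s' % (src, outputs)
```

A run-script would then glob ../src/*.ex, call rule_for() on each file's contents, and print the resulting rules for tup to parse.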

>
>>> 3. Allow specifying shell commands in output part of a rule.
>>> 4. Allow specifying shell commands in input part of a rule.
>>> 5. Eval shell commands in place. When we specify 'FILES=`find src/
>>> *.ex`' current implementation replaces every $(FILES) reference with
>>> '`find src/*.ex`'. If we would have an option to evaluate them (or
>>> call external scripts) on parse stage we could use variables in input
>>> or output part of a rule.
>>> 6. Fix 1.1 or (1.2 and 1.3) (see above).
>> I think 3,4,5 are better handled by the run-scripts. If 1.1 is
>> addressed would that be sufficient?
> Yes, if reading from a generated file (MAPPINGS CACHE) would also be possible.
> However, implementing this without introducing weaknesses that break the
> sanity checks could be tricky.

Yes, maintaining sanity is why tup doesn't write to files during the
parsing stage.

In summary, as far as I can tell there are two issues to deal with:

1) Since the output files depend on the 'defmodule' lines in the file,
you need to have some way to tell tup what the outputs are. You can
either:
a) Write a run-script that parses the .ex files and generates
:-rules with the correct file output listings, or
b) Manually specify the outputs in the Tupfile using $(files_%b) and such

Solution a) has the down-side that anytime you change a .ex file, tup
will need to re-parse the Tupfile (which means your script will run,
and re-read every .ex file). Solution b) is faster, but has the
down-side that whenever you change the module definitions in your .ex
files, you need to update the Tupfile. A good thing about tup here is
that it will tell you when you've missed an output (or specified too
many), so you will just get an error that you can then immediately
clean up. I think this would really only be a minor annoyance to keep
updated, unless you are changing the module definitions very often.

2) If baz.ex depends on foo.ex, then you have to compile foo.ex before
baz.ex (if I understood you correctly). You can either:
c) Write a run-script (ie: additional functionality in a single
run-script if you went with part a) above). This will parse the .ex
files and write out the :-rules in the correct order with the right
dependencies.
d) Manually list the .ex files in the right order in the Tupfile.
This could be something like:
exfiles_stage1 += foo.ex
exfiles_stage1 += bar.ex
: foreach $(exfiles_stage1) |> !elixirc |> ... <group1>

exfiles_stage2 += baz.ex
: foreach $(exfiles_stage2) | <group1> |> !elixirc |> ...
So anything in stage2 can use .beam files from stage1. This can be
annoying to maintain if you don't have a solid architecture up-front
and are moving functionality & dependencies around.
e) Get elixirc to only read from .ex files (similar to javac's
-Xprefer:source). This way when compiling baz.ex, it reads from foo.ex
and not foo.beam. There are no tricky dependencies to maintain, and if
you change foo.ex tup is responsible for re-compiling the correct
beams (foo's and baz's).

I think e) is the best approach here, but if that is not possible you
may just want to use a run-script (so select a) and c)) and take the
performance hit.

Let me know if I missed anything here.
-Mike