Hi ILYA,
On Sun, Jun 24, 2012 at 7:42 PM, ILYA <
ilya.kh...@gmail.com> wrote:
> Hi,
>
> I want to resurrect the variable-number-of-outputs discussion. Tup is a
> great tool and I'm glad it exists. I'm currently trying to adapt it to
> my project, and given the set of existing tup features and limitations
> it seems like it would be very hard to do. Given the problems that
> javac/ocaml/vala/erlang users are experiencing, I think their use cases
> should be considered, in order to find a solution which would be easy
> to implement and which satisfies most users.
By variable outputs, what exactly do you mean? Tup does support more
than a single output, but it has the requirement that all outputs must
be specified. If you mean that you'd like to be able to run a command
and not explicitly say what files are created, I'm not sure that would
help in all of these cases.
I'm not explicitly ignoring these languages, and I don't think the
problems that arise are specific to the languages themselves, but
rather that the compilers try to take on some of the responsibilities
of the build system instead of just being a simple compiler. This
conflicts with tup (and other build systems) that are supposed to be
the ones doing dependency management and the like. For example,
consider the Java case: I have A.java and B.java, where A uses
stuff from B. As a build system, by looking at the directory I see:
A.java
B.java
From this I have no idea which should be built first. Suppose I build
A.java first - javac sees that it depends on B.java and so it builds
A.class *and* B.class. This presents a problem if I am also trying to
build B.java into B.class independently (or to compile another java
file into a class file), because it will randomly fail if B.class is
only generated partially when another parallel job tries to read from
it.
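To make that concrete, here is roughly what the default behavior looks
like from the command line (assuming A.java references something from B):
$ javac A.java
$ ls *.class
A.class  B.class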
Another problem is that if I build B.java first and get B.class, now
when I build A.java it reads from the class file instead of the java
file. As I recall, another language (maybe Ocaml or Haskell?) actually
*requires* that you build the base file before building the dependent
file. This is problematic as well because we have to read the source
files (not just the directory entries) before we can know what order
things have to be built in. As a side effect, it also means you have
to re-parse the build description every time a source file changes,
which shouldn't be necessary (but for build systems that always parse
the entire build description, maybe that doesn't matter).
What the compiler should be doing is when I tell it "compile A.java
and create A.class", it should read from A.java, and then when it sees
that it uses B, it should read from B.java. The only class file it
should create is A.class, since that's all that was requested. I
should note that the OpenJDK compiler at least has options to do just
this:
-implicit:none disables generating B.class in this case
-Xprefer:source reads from B.java instead of B.class when we compile A.java
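So a by-hand invocation with those flags would look something like this,
and should leave only A.class behind:
$ javac -implicit:none -Xprefer:source A.java
$ ls *.class
A.class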
Unfortunately OpenJDK also tries to stat every file in a directory
(rather than just the ones it uses), which causes tup to think it has
a dependency on all of those files. I was able to hack javac to not do
this and can build java files like so:
: foreach *.java |> javac %f -implicit:none -Xprefer:source -sourcepath . |> %B.class
Someone with real java experience could probably do a proper patch and
get it into OpenJDK. The only thing that remains (as far as I know) is
that compiling a java file with inner classes results in multiple
class files. This can be partly addressed with some changes I just
pushed to allow variable outputs based on the input filename. Eg you
could do:
outputs_A.java += A$inner.class (or however these are named)
: foreach *.java |> javac ... |> %B.class $(outputs_%f)
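For example, with a hypothetical Outer.java that defines an inner class
Inner (javac writes the extra file as Outer$Inner.class), the full rule
might look like:
outputs_Outer.java += Outer$Inner.class
: foreach *.java |> javac %f -implicit:none -Xprefer:source -sourcepath . |> %B.class $(outputs_%f)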
So I don't believe there is anything in Java (the language) that
prevents tup from building it. Getting javac (the compiler) to do only
what it's told is a bit tricky, but can be done.
As far as I can tell, the problem with vala isn't that the outputs are
variable, but rather with specifying the --fast-vapi flags. I don't
believe there is the equivalent of a "#include" statement, so instead
of specifying what vapi files a vala file depends on in the vala file,
you have to specify it in the build description. Consider this analogy
to C:
bar.c:
#include "foo.h"
void bar(void) { foo(); }
build file:
foreach .c file
gcc -c FILENAME
So here bar.c itself tells the compiler that it needs foo.h. You could
also do this in the build description instead:
bar.c:
void bar(void) { foo(); }
build file:
foreach .c file
gcc -c FILENAME -include foo.h
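In real Tupfile syntax that second variant would be something like (just
a sketch of the analogy):
: foreach *.c |> gcc -c %f -o %o -include foo.h |> %B.o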
Managing the 'include X.h' flags for each .c file is more difficult to
do here, but it could be done. It's much easier to just put the
#include's in the .c file. However, putting the --use-fast-vapi flags
in the build description is essentially the only way to do it with
vala. The best way I know of to handle this in tup is to manually list
what vapi files each input needs:
: foreach *.vala |> valac --fast-vapi=%B.vapi %f |> %B.vapi $(MYPROJ_ROOT)/<vapi>
VAPISFLAGS_main.vala += --use-fast-vapi=foo.vapi
VAPISFLAGS_foo.vala += --use-fast-vapi=bar.vapi
: foreach *.vala | $(MYPROJ_ROOT)/<vapi> |> valac $(VAPISFLAGS_%f) %f -C |> %B.c
I don't recall ocaml well enough to know exactly what the issue is. I
think the compiler behaved similarly to javac, except that it also
required you to order the ocamlc compilations correctly (so tup has to
know that it must do 'ocamlc B' before 'ocamlc A'), otherwise it
fails. If it had an option similar to javac to only read from source
files, I think it would be much easier to work with.
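If the ordering really does have to be spelled out, I'd guess the tup
side would look something like this (a sketch only; it assumes A.ml uses
module B, and that 'ocamlc -c' writes both the .cmo and the .cmi):
: B.ml |> ocamlc -c %f |> %B.cmo %B.cmi
: A.ml | B.cmi |> ocamlc -c %f |> %B.cmo %B.cmi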
Keep in mind I have no real experience with any of these languages. If
I am wrong in any of my assumptions please yell at me.
>
> The project that I'm working on uses the elixir language, which runs
> on top of the Erlang VM. An elixir project in addition supports
> specifying multiple modules in a single source file.
> Moreover the directory structure is quite strict and you cannot
> really adjust it. The structure of a typical elixir project is like
> this:
> ebin/__MAIN__-M-A.beam
> ebin/__MAIN__-M-B.beam
> ebin/application.app   # generated file containing
>                        #   modules = ['__MAIN__-M-A.beam', '__MAIN__-M-B.beam']
> src/M.ex               # it defines modules M.A and M.B
>
> All of the above assumes the following set of features is supported by
> the build system.
> 1. Writing to directories other than the directory containing the
> source files.
Is there a reason you can't put the Tupfile in ebin? There isn't
really much difference between doing:
src/Tupfile:
compile *.ex -> ../ebin
...compared to...
ebin/Tupfile:
compile ../src/*.ex -> .
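Using the M.ex layout from your example above, the ebin/Tupfile version
would be something like (a sketch, with the two outputs written out
explicitly):
: ../src/M.ex |> elixirc -o . %f |> __MAIN__-M-A.beam __MAIN__-M-B.beam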
> 2. Variable number of outputs.
Can you try out the latest master and see if it helps? You still have
to specify the outputs, but since those only change when you
add/remove modules (correct?) I don't think it is too much of a
burden. I was able to install elixir and build some test files like
so:
Tupfile:
export HOME
files_math.ex += __MAIN__-Math.beam
files_math.ex += __MAIN__-NewMath.beam
files_macro_test.exs += __MAIN__-MacroTest.beam
files_macro_test.exs += __MAIN__-Macro-ExternalTest.beam
: foreach math.ex macro_test.exs |> ../bin/elixirc %f |> $(files_%f)
math.ex:
defmodule Math do
  def sum(a, b) do
    a + b
  end
end

defmodule NewMath do
  def sum(a, b) do
    a + b
  end
end
And macro_test.exs is from the elixir/test/elixir directory in the git
tree. Whenever you add/remove a module you have to update the Tupfile,
but tup will give an error if it's wrong. You could try to generate
the files_* variables from a script, but the only way to do that would
be to read the .ex/.exs files while parsing the Tupfile, which means
anytime you modify the source file at all it will re-parse the
Tupfile.
> 3. In-out files (When multiple modules are specified in single source
> file elixirc checks that target beam file might be compiled already).
Can you explain this further? When I tried running elixirc twice in a
row on math.ex at the command line, it just gave an error. Since tup
removes the outputs before running the command again, it essentially
prevents the use of in-out files. Is there a case where two commands
need to update the same beam file?
> 4. Target file name is more complex and cannot be represented through
> usage of %-flags
Again please let me know if the latest master helps at all here. You
can see tup/test/t2126 as an example.
>
> For in-out files there is a workaround as follows:
> elixirc -o .build %f && mv .build/* . && rm -rf .build
> However in order to use it I have to specify:
> : $(APP_TOP)/src/M.ex |> elixirc -o .build %f && mv .build/* . && rm -rf .build |> __MAIN__-M-A.beam __MAIN__-M-B.beam
> I.e. we have to explicitly specify for every input which outputs it
> produces.
>
> I thought I could write a file which can be used to map each src file
> to multiple *.beam files. Then I could generate rules like this:
> : src/application_module.ex |> elixirc -o ebin %f |> __MAIN__-M-B.beam __MAIN__-M-A.beam
> But due to 1.2 and 1.3 (see below) it failed.
>
> At one point there was a proposal to let tup write to any files in
> any directory below the current one. What is the status of it?
I haven't had time to look into it at all, sorry.
>
> The things I've tried.
> 1. Running a script invoked from ebin/Tupfile
> The reasons why this approach fails are:
> 1.1. A run script can read files only in the current directory.
> 1.2. A run script cannot write to files.
> 1.3. A run script cannot use generated files.
1.1: I think this could be fixed by specifying what directories are
readable in the 'run' line, though I'm curious if you still find it
necessary to use a run-script.
1.2: What files are you trying to write to while parsing the Tupfile, and why?
1.3: No generated files exist when you first parse a Tupfile, so this
doesn't really make sense.
> 2. Copying files into an intermediate directory, compiling them there
> and then moving them into ebin
> The reasons why this approach fails are:
> 2.1. You can write only into the current directory
> 2.2. You cannot read from /tmp or .build (hidden) directories
> (from different rules)
>
> Possible solutions are (this is sort of a brainstorm, so some of them
> might be stupid or very hard to implement):
> 1. Allow specifying *.beam (i.e. wildcards) in the output part of a rule
> (automatically adding outputs (detected by fuse) into the DAG). This is
> similar implementation-wise to the latest 'groups' feature.
I tried to do something like this a while ago, where you would specify
a 'dynamic bin' as the output instead of files. Then it would attach
all generated files to this bin. The problems that I couldn't figure
out how to solve were:
1) What should tup do when a generated file would be processed by the
Tupfile? Eg: my command creates a .c file, and I have a "foreach *.c"
rule.
2) What should tup do when a generated file overwrites a ghost node?
Ghosts are created in the database whenever a command attempts to
access a file but fails. These files would affect the build when they
are created, so when this happens tup would somehow need to restart
the build and load a new DAG knowing that the file now exists.
In short, I don't think it is going to be an easy fix to allow
commands to run with unspecified outputs.
> 2. Provide a way to read outputs from file.
I'm not sure what you mean here - can you clarify?
> 3. Allow specifying shell commands in the output part of a rule.
> 4. Allow specifying shell commands in the input part of a rule.
> 5. Eval shell commands in place. When we specify 'FILES=`find src/*.ex`',
> the current implementation replaces every $(FILES) reference with
> '`find src/*.ex`'. If we had an option to evaluate them (or call
> external scripts) at the parse stage, we could use variables in the
> input or output part of a rule.
> 6. Fix 1.1 or (1.2 and 1.3) (see above).
> 6. Fix 1.1 or (1.2 and 1.3) (see above).
I think 3, 4, and 5 are better handled by the run-scripts. If 1.1 is
addressed would that be sufficient?
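Just to illustrate what I mean: a run-script is only a script that
prints ':'-rules on stdout, so a hypothetical gen-rules.sh (assuming 1.1
is addressed so it can read ../src) could scan for defmodule lines and
emit the outputs itself:
#!/bin/sh
# Hypothetical gen-rules.sh: print one rule per source file, deriving the
# .beam names from the defmodule lines (dots become dashes, __MAIN__- prefix).
for f in ../src/*.ex; do
    beams=$(grep -o 'defmodule [A-Za-z.]*' "$f" |
            sed 's/defmodule /__MAIN__-/; s/\./-/g; s/$/.beam/' | tr '\n' ' ')
    echo ": $f |> elixirc -o . $f |> $beams"
done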
> 7. Provide additional hooks for run files of different kinds.
> 7.1. When a foreach rule detects new files (which match a given
> regexp) in a previously scanned directory, we call a user-provided script.
> 7.2. When fuse detects a write to an unspecified output, we call a
> user-provided script.
Right now tup runs in very well-defined stages. The parser stage reads
the build description, and generates command strings in the DAG. The
updater stage, where commands actually run and create their outputs,
loads a fixed part of the DAG and then walks through it. I think
adding the ability for the updater stage to jump back to the parser
stage and apply rules to generate new commands in the currently
running DAG would add a lot of complexity and be very difficult to get
right.
Please try the latest master and let me know if that helps at all in
explicitly specifying the outputs for elixirc. It seemed to work well
in the simple example I tried, but this is just another language I
have no experience with so I may be missing the bigger picture :)
Thanks,
-Mike