short-circuiting build


Slawomir Czarko

May 24, 2012, 1:26:46 PM
to tup-...@googlegroups.com
Are there any plans to implement short-circuiting builds, which have been mentioned before in various posts?

- Slawomir

Mike Shal

May 24, 2012, 2:49:27 PM
to tup-...@googlegroups.com
I don't have immediate plans to do so. I would like to implement
short-circuiting for the parsing stage - I think this would be fairly
easy to do, and would allow tup to skip parsing dependent Tupfiles if
a Tupfile doesn't create new or remove old outputs. For
short-circuiting the command execution stage, it is a little more
difficult because tup would need to generate a checksum on the output
files as they are created, which adds extra processing. This would
slow down the normal use case of compiling a file with changes in it,
and I don't know if it is worth the tradeoff to try to speed up the
case of just touching a file. Is this something you run into
regularly? What exactly did you have in mind?

-Mike

Slawomir Czarko

May 24, 2012, 4:41:34 PM
to tup-...@googlegroups.com

Wouldn't it be enough to compare the timestamps of the output files against what's stored in the tup DB?

I'd like to use a code generator to generate both cpp and hpp files. Most of the time the hpp file wouldn't change, only the cpp. Because the hpp is included by other cpp files, overwriting it would trigger unnecessary recompilation of those files. So I'd prefer a way to update the hpp only if the new version is actually different.

- Slawomir

Mike Shal

May 24, 2012, 8:00:27 PM
to tup-...@googlegroups.com
On Thu, May 24, 2012 at 4:41 PM, Slawomir Czarko
<slawomi...@gmail.com> wrote:
> Wouldn't it be enough to compare the timestamps of the output files against
> what's stored in the tup DB?

After re-reading some of the older threads, I see that I had forgotten
the plan: move the old outputs out of the way and then just diff the
new ones against them (rather than checksumming). Currently tup deletes
the outputs before running the new command so that they don't influence
the build. It shouldn't be too hard to move them to a temporary
directory instead and compare them after the command executes (i.e.,
use rename() followed by a diff() step instead of the current
unlink()). Because of the extra overhead of the comparison, this should
probably be explicitly enabled in the command with a ^-flag.
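
The move-aside-and-compare idea above can be sketched roughly as follows. This is only an illustration in Python, not tup's implementation: `run_with_output_compare` is a hypothetical helper, the byte-for-byte comparison stands in for the "diff()" step, and restoring the old file when nothing changed is an assumption about how the old timestamp might be preserved.

```python
import filecmp
import os
import shutil
import subprocess
import tempfile


def run_with_output_compare(cmd, outputs):
    """Run cmd and return the outputs whose contents differ from the last run.

    Hypothetical helper for illustration only; it is not part of tup.
    """
    stash = tempfile.mkdtemp(prefix="old-outputs-")
    saved = {}
    try:
        # Move the old outputs aside (instead of unlinking them) so the
        # command still cannot see them while it runs.
        for index, out in enumerate(outputs):
            if os.path.exists(out):
                saved[out] = os.path.join(stash, str(index))
                shutil.move(out, saved[out])

        subprocess.run(cmd, check=True)

        changed = []
        for out in outputs:
            old = saved.get(out)
            if (old is not None and os.path.exists(out)
                    and filecmp.cmp(old, out, shallow=False)):
                # Same bytes as before: put the old file back so that its
                # timestamp stays as it was.
                shutil.move(old, out)
            else:
                changed.append(out)
        return changed
    finally:
        # Discard whatever is left of the stashed copies.
        shutil.rmtree(stash, ignore_errors=True)
```

An empty return value would be the signal that commands depending on these outputs are candidates for being skipped. Note that if the command fails, the stashed outputs are discarded, which matches the current unlink() behaviour.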

>
> I'd like to use a code generator to generate both cpp and hpp files. Most of
> the time the hpp file wouldn't change, only the cpp. Because the hpp is
> included by other cpp files, overwriting it would trigger unnecessary
> recompilation of those files. So I'd prefer a way to update the hpp only if
> the new version is actually different.

That makes sense. The only thing I'm concerned about now is why I made
this comment:

In order to make sure the inherited dependencies are consistent,
it is imperative that the command inheriting dependencies is
re-executed in case the command it is inheriting from no longer uses
that input. This is why the test uses a "strongly connected" graph,
which doesn't allow sticky-only links to be used. As a result, this
may mean that transitive dependencies preclude the use of
short-circuiting builds if outputs are unchanged.

(This is in commit b7ee9654b955 - I'll have to try to figure out what
exactly I meant...)

-Mike

Slawomir Czarko

May 25, 2012, 4:41:18 AM
to tup-...@googlegroups.com
On Friday, May 25, 2012 2:00:27 AM UTC+2, mar...@gmail.com wrote:
> After re-reading some of the older threads, I see that I had forgotten
> the plan: move the old outputs out of the way and then just diff the
> new ones against them (rather than checksumming). Currently tup deletes
> the outputs before running the new command so that they don't influence
> the build. It shouldn't be too hard to move them to a temporary
> directory instead and compare them after the command executes (i.e.,
> use rename() followed by a diff() step instead of the current
> unlink()). Because of the extra overhead of the comparison, this should
> probably be explicitly enabled in the command with a ^-flag.

That sounds good.

> That makes sense. The only thing I'm concerned about now is why I made
> this comment:
>
>      In order to make sure the inherited dependencies are consistent,
> it is imperative that the command inheriting dependencies is
> re-executed in case the command it is inheriting from no longer uses
> that input. This is why the test uses a "strongly connected" graph,
> which doesn't allow sticky-only links to be used. As a result, this
> may mean that transitive dependencies preclude the use of
> short-circuiting builds if outputs are unchanged.
>
> (This is in commit b7ee9654b955 - I'll have to try to figure out what
> exactly I meant...)


This sounds more like it's describing the removal of an input from a command. In my case the input (the generated header file) would still be there, but because it wasn't modified, the commands that take it as input wouldn't have to be re-executed, unless of course some other inputs have changed.

- Slawomir

Mike Shal

Jun 1, 2012, 1:33:26 PM
to tup-...@googlegroups.com
I think my concern is with the dependency inheritance, where we have a
Tupfile like so:

a |> cmd1 |> b
b |> cmd2 |> c
c |> cmd3 |> d

And cmd3 also reads from b, but we don't list that as an input even
though it is a generated file. Normally this wouldn't be allowed, but
when it is detected tup tries to build a graph between b and the
inputs to the command (c in this case). If cmd2 no longer uses b, then
we need to get an error in cmd3 since the input isn't specified.
However, if we are doing short-circuiting, then it is possible to
remove the input to cmd2, leave c unchanged, and then we short-circuit
to skip cmd3. Now cmd3 has a dependency on a generated file but no
longer specifies it as an input (an error condition), but tup wouldn't
detect it.

So instead I believe the short-circuiting needs to check:
1) All outputs are the same as the previous run
2) No input links are removed

That should account for this case, whereas just doing part 1) would
not. I'll have to give it a try when I have some time...
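
As a rough sketch of that two-part check (in Python, purely for illustration; `can_short_circuit` and its arguments are hypothetical names, and how tup would actually record output contents and input links is not specified here):

```python
def can_short_circuit(old_outputs, new_outputs, old_input_links, new_input_links):
    """Decide whether commands downstream of a re-run command may be skipped.

    old_outputs / new_outputs: dicts mapping output path -> file contents
        (or a digest of them) from the previous and the current run.
    old_input_links / new_input_links: iterables of the files the command
        read in the previous and the current run.
    All names are assumptions made for this sketch.
    """
    # 1) All outputs are the same as in the previous run.
    outputs_unchanged = old_outputs == new_outputs
    # 2) No input links were removed (newly added links are fine).
    no_links_removed = set(old_input_links) <= set(new_input_links)
    return outputs_unchanged and no_links_removed


# The case from the Tupfile above: cmd2 stops reading b but still produces
# an identical c. Check 1) alone would allow skipping cmd3; check 2) does not.
assert not can_short_circuit({"c": b"same"}, {"c": b"same"}, ["b"], [])
# Nothing changed at all: skipping cmd3 is safe.
assert can_short_circuit({"c": b"same"}, {"c": b"same"}, ["b"], ["b"])
```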

-Mike

Slawomir Czarko

Jun 4, 2012, 7:41:48 AM
to tup-...@googlegroups.com
On Friday, June 1, 2012 7:33:26 PM UTC+2, mar...@gmail.com wrote:
> I think my concern is with the dependency inheritance, where we have a
> Tupfile like so:
>
> a |> cmd1 |> b
> b |> cmd2 |> c
> c |> cmd3 |> d
>
> And cmd3 also reads from b, but we don't list that as an input even
> though it is a generated file. Normally this wouldn't be allowed, but
> when it is detected tup tries to build a graph between b and the
> inputs to the command (c in this case). If cmd2 no longer uses b, then
> we need to get an error in cmd3 since the input isn't specified.
> However, if we are doing short-circuiting, then it is possible to
> remove the input to cmd2, leave c unchanged, and then we short-circuit
> to skip cmd3. Now cmd3 has a dependency on a generated file but no
> longer specifies it as an input (an error condition), but tup wouldn't
> detect it.
>
> So instead I believe the short-circuiting needs to check:
> 1) All outputs are the same as the previous run
> 2) No input links are removed
>
> That should account for this case, whereas just doing part 1) would
> not. I'll have to give it a try when I have some time...


What if you leave cmd2 as a step to be executed so it enforces ordering between cmd1 and cmd3 but mark cmd2 as "NOOP" so tup doesn't actually run it?

- Slawomir

Mike Shal

Jun 4, 2012, 9:39:50 PM
to tup-...@googlegroups.com
Hm, that is worth a shot too. I've added it to my notes to try when I
go to implement it.

Thanks,
-Mike