for some static program analyses it is crucial to recognize valid
paths of the control flow graph taken while executing the code.
if (x < 1)
x = 1;
if (x > 10)
x = 10;
For this example code, it's obvious that a path through both
then-parts of the conditions is not feasible. Such invalid combination
often come up from dependencies between variable assignments (x=1) and
values of branch conditions (if(x>10)). Please also note that this is
not "dead code" since each of the then-parts can be executed depending
on the value of x before the first condition.
I was wondering which compiler analyses might recognized such invalid
combinations of paths. Any ideas?
Cheers,
Tim
That is more complicated than you might think. Analyzing anything more
than the most trivial cases will require a sophisticated logic solver
(which can not guarantee an answer in all cases). But here's the gist
of it (I'm assuming a C syntax):
assert( true );
bool firstScope = false;
assert( !firstScope );
if (x < 1) {
assert( !firstScope && x < 1 );
firstScope = true;
assert( firstScope && x < 1 );
x = 1;
assert( firstScope && x== 1 );
}
assert( (firstScope && x == 1) || (x >= 1 && !firstScope) );
if (x > 10)
assert( x > 10 && ((firstScope && x == 1) || (x >= 1 && !
firstScope)) );
// |
// V
assert( x > 10 && !firstScope);
x = 10;
}
I've changed several things in your example. For convenience, I've
introduced a boolean called firstScope that indicates if you've
visited the (x<1) scope. The goal here is to prove that (!firstScope)
within the (x>10) scope. I've used assert statements to show the proof
outline. Since you haven't given any precondition, we'll start with
the precondition (true) (the weakest possible condition). Then you
work your way down as you calculate the strongest postcondition of
each statement, given its precondition.
Note that calculating the strongest postcondition of a statement
usually requires you to keep track of many temporary logical
variables, but I've left them out for clarity, since we won't need
them in this simple example. I hope the simplified rules for finding
the strongest preconditions are clear from the code. I'm not sure how
I would explain them formally. But if they're unclear, I can try to
give an informal explanation.
As you can see in the proof outline, (!firstScope) is true upon
entering the (x>10) scope.
If there's a simpler way, I don't know it.
> for some static program analyses it is crucial to recognize valid
> paths of the control flow graph taken while executing the code.
>
> if (x < 1)
> x = 1;
> if (x > 10)
> x = 10;
[...]
> I was wondering which compiler analyses might recognized such invalid
> combinations of paths. Any ideas?
Intuitively I'd partition the values of x into <1, 1..10 and >10, based
on the tested conditions. Then subdivide these ranges, when required by
modifications of the value in some branch (this will not happen in your
example). Finally I'd check whether for every partition a single path is
taken, what again is true in your example. If not, the analysis may
deserve more table space for the possible pathes. Another inspection of
the ever visited BB's, in all pathes, will reveal dead code - if this is
what you want to know. Otherwise I don't understand what an "invalid"
path (never taken) should be useful to know. Branch and bound (wikipedia)?
A more formal approach has already been posted.
DoDi
> if (x > 10)
> assert( x > 10 && ((firstScope && x == 1) || (x >= 1 && !
> firstScope)) );
> // |
> // V
> assert( x > 10 && !firstScope);
> x = 10;
> }
A question on this simplification after "->". What type of rules to
do apply to infer that "
((firstScope && x == 1) || (x >= 1 && !> firstScope)) )"
becomes
"!firstScope" ?
> But if they're unclear, I can try to give an informal explanation.
That would be fine.
In general, how do you call this type of equations and their
solving? Are there any known approaches for that?
Will this be not complex for real-world applications that might have
a huge number of paths?
> Another inspection of
> the ever visited BB's, in all pathes, will reveal dead code - if this is
> what you want to know.
This is an information about the code that is independent of the calling
context since these BB's are never reached during program execution.
> Otherwise I don't understand what an "invalid"
> path (never taken) should be useful to know. Branch and bound
> (wikipedia)?
The problem to know which paths are taking in which context are
for example crucial for time estimations based on a static program
execution.
Regards,
Tim
> > if (x > 10)
> > assert( x > 10 && ((firstScope && x == 1) || (x >= 1 && !
> > firstScope)) );
> > // |
> > // V
> > assert( x > 10 && !firstScope);
> > x = 10;
> > }
>
> A question on this simplification after "->". What type of rules to
> do apply to infer that
> "((firstScope && x == 1) || (x >= 1 && ! firstScope)) )"
> becomes "!firstScope" ?
Ah, but you forget that we also know that "x > 10". I will give the
intermediate steps of the simplification:
x > 10 && ((firstScope && x == 1) || (x >= 1 && !firstScope))
[&& distributivity]
(x > 10 && firstScope && x == 1) || (x > 10 && x >= 1 && !firstScope)
["x > 10 && x == 1" = "false"]
(false && firstScope) || (x > 10 && x >= 1 && !firstScope)
["x > 10" implies "x >= 1"]
(false && firstScope) || (x > 10 && !firstScope)
["false && B" = "false"]
false || (x > 10 && !firstScope)
["false || B" = "B"]
x > 10 && !firstScope
It's simplifications like this, and conclusions you can draw from it, that
require a sophisticated logic solver if you want to do this analysis
automatically. And such solvers cannot guarantee an answer and might run
indefinitely.
> > But if they're unclear, I can try to give an informal explanation.
>
> That would be fine.
I'm not an expert (yet), but I'll try to explain. You can only use the
simple version of the rules if no variable has any relation to any other
variable. Like in this example, x only has a relation with literal integers
and firstScope also stands on its own in the logical conjunctions. This is
a rule of thumb I came up with and I wouldn't count on it in the general
case. But the simple rules I used work like this:
spc(statement, pre) returns the strongest postcondition of statement, given
precondition pre.
wo(cond, a) returns "cond", but with all constraints on "a" removed.
For assignment:
spc("a = b;", pre) = ( wo(pre, a) && a == b )
For if-statements:
spc("if (c) S1; else S2;", pre) =
( spc(S1, pre && c) || spc(S2, pre && !c) )
For no-op (the empty operation):
spc(";", pre) = pre
For if-statements without else (using "else no-op;"):
spc("if (c) S1;", pre) = ( spc(S1, pre && c) || (pre && !c) )
There are similar rules for while-loops and such, but they're more
complicated.
> In general, how do you call this type of equations and their
> solving?
Not sure. Possibly "Strongest Postcondition Logic".
> Are there any known approaches for that?
Yes. But in general, you can't use the simple rules I just showed you. See
this example:
assert(1 <= a && a <= 10);
b = a + 1;
assert(1 <= a && a <= 10 && b == a + 1);
a = 20;
assert( ??? );
What's the strongest postcondition here? Following the simple rules, it
would either be (a == 20) or maybe (a == 20 && b == a + 1). Both are wrong.
The first gives no guarantees about b at all and the second gives the wrong
guarantee, since it uses the new value of a where it should use the old
one. The correct postcondition would be:
assert(a = 20 && 1 <= a_old && a_old <= 10 && b == a_old + 1);
You see? a_old is a logical variable I introduced to complete the assertion
correctly. A logical variable does not appear in the implementation and
only exists for the proof.
Have fun!
--
Michiel
Note that you can calculate edge dominators in the same way you do block
dominators, and you can do both calculations at the same time. Note
that, in general, edge dominators form a forest and not a tree as the
block dominators do.
jeff
Tim Frink wrote:
> Hi,
>
> for some static program analyses it is crucial to recognize valid
> paths of the control flow graph taken while executing the code.
>
> if (x < 1)
> x = 1;
> if (x > 10)
> x = 10;
>
> For this example code, it's obvious that a path through both
> then-parts of the conditions is not feasible.
--
---------------------------------------------------------------------
= Jeff Kenton http://home.comcast.net/~jeffrey.kenton =
---------------------------------------------------------------------