PASM code analysis: detecting unreachable code.

0 views
Skip to first unread message

Clinton A. Pierce

unread,
May 20, 2003, 12:46:56 PM5/20/03
to perl6-i...@perl.org
I've just finished re-writing BASIC's expression evaluator (for speed, and
for fun -- I need to get out more) and now am integrating the code back
into the runtime PASM. My problem is, I moved a lot of the complexity from
the runtime (no longer a stack machine to evaluate) into the compiler but I
kept many of the old routines for other reasons. The runtime could get a
lot smaller, except now I've got junk code left behind.

So the question is, does anyone know a clever hack to see if a particular
branch is now isolated never taken? Things like:

MAIN: do this
and that
and this
bsr FOO
ret

UNUSED:
this code
never reached
ret

FOO: put code here
and here
FALSEPOS:
ret

And tell me that UNUSED is unreachable? Or at least it's never branched
to? I'd even be willing to sort through those
labeled-but-never-branched-to instructions (like FALSEPOS)...

Leopold Toetsch

unread,
May 21, 2003, 3:11:08 AM5/21/03
to Clinton A. Pierce, perl6-i...@perl.org
In perl.perl6.internals, you wrote:

> So the question is, does anyone know a clever hack to see if a particular
> branch is now isolated never taken? Things like:

[ code rewritten ]

MAIN: print "main\n"
bsr FOO
end

UNUSED:
print "never reached\n"
ret

FOO: print "in foo\n"
FALSEPOS:
ret

$ imcc -Op1 -d60 unu.pasm
block 2 label UNUSED deleted
block 4 label FALSEPOS deleted
found dead block 2
ins deleted (dead block) print "never reached\n"
ins deleted (dead block) ret
main
in foo

leo

Clinton A. Pierce

unread,
May 21, 2003, 12:29:40 PM5/21/03
to l...@toetsch.at, perl6-i...@perl.org
At 09:11 AM 5/21/2003 +0200, Leopold Toetsch wrote:
>In perl.perl6.internals, you wrote:
>
> > So the question is, does anyone know a clever hack to see if a particular
> > branch is now isolated never taken? Things like:
>
>[ code rewritten ]
>
>$ imcc -Op1 -d60 unu.pasm
>block 2 label UNUSED deleted
>block 4 label FALSEPOS deleted
>found dead block 2
> ins deleted (dead block) print "never reached\n"
> ins deleted (dead block) ret
>main
>in foo

Hmm.. There seems to be an issue with this in a non-trivial case. Using a
cvs update from today, along with
some changes I made with BASIC, here's a demo of what looks like a bug:

>C:\projects\perl\parrot\languages\BASIC\compiler>cat > test.bas
>print "Hello World"
>C:\projects\perl\parrot\languages\BASIC\compiler>perl compile.pl test.bas
>C:\projects\perl\parrot\languages\BASIC\compiler>..\..\imcc\imcc.exe -Op1
>-d60 TARG_test.pasm
>opt1 add I14, I14, I0 => add I14, I0
>opt1 sub I2, I2, I4 => sub I2, I4
>opt1 mul I12, I12, 5 => mul I12, 5
>opt1 add I12, I12, 1 => add I12, 1
>opt1 div N0, N0, 65536.0 => div N0, 65536.0
>opt1 div N0, N0, 65536.0 => div N0, 65536.0
>opt1 add N2, N2, N1 => add N2, N1
>opt1 mul N2, N2, 0.5 => mul N2, 0.5
>opt1 mul N3, N3, -1.0 => mul N3, -1.0
>opt1 add I3, I3, I2 => add I3, I2
>opt1 add I5, I5, I1 => add I5, I1
>opt1 add I5, I5, I0 => add I5, I0
>opt1 sub I0, I0, 8 => sub I0, 8
>opt1 add I0, I0, I1 => add I0, I1
>opt1 add I0, I0, I1 => add I0, I1
>if_branch bsr ... NEWFRAME
>if_branch ne ... USERFUNC
>if_branch ne ... ERRNOSUCHFUNC
>if_branch eq ... ANOTHERFRAME_USER
>if_branch eq ... SIGILSTRING
>if_branch ge ... XCASE_OK1
>if_branch le ... XCASE_SHIFT
>if_branch eq ... UF_LOADBAREVAR
>if_branch ne ... AL_GOTONE
>if_branch ne ... MEM_NEED_OK
>if_branch gt ... CASEN1
>if_branch eq ... CHOMPIT
>if_branch le ... CHOMPOK
>if_branch eq ... DEBUGGER_SET
>error:imcc:sort_loops: loop 1 contains first but notlast of outer loop 0
>C:\projects\perl\parrot\languages\BASIC\compiler>

imcc seems happy to assemble the PASM otherwise, and it even runs fine.

I know I've got *lots* of unreached code in the various RT_*.pasm files
(BASIC runtime, gets included in every program), could this be throwing it?


Leopold Toetsch

unread,
May 21, 2003, 1:35:58 PM5/21/03
to Clinton A. Pierce, perl6-i...@perl.org
Clinton A. Pierce wrote:

> At 09:11 AM 5/21/2003 +0200, Leopold Toetsch wrote:
> Using a cvs update from today, along with
> some changes I made with BASIC, here's a demo of what looks like a bug:

>> error:imcc:sort_loops: loop 1 contains first but notlast of outer loop 0
>> C:\projects\perl\parrot\languages\BASIC\compiler>

> imcc seems happy to assemble the PASM otherwise, and it even runs fine.

The optimizer has to work a lot on your code because its mainly one big
bunc of code (compilation unit). The optimizer of course has some errors
too, but finding them in 3290 lines of PASM code is out of my skills.

So please first have a look at some imcc rules, if you want to have imcc
optimize that code:
- local labels don't have an underscore in front
- global labels have
- use some form of calling conventions[1], i.e. stack calling conventions:

bsr _PLATFORM_SETUP
...

_PLATFORM_SETUP:
saveall
...
restoreall
ret

[1] the current only one imcc supports ;-)

You might have a look at your one-liner in PASM and what the CFG
(control flow graph) looks like:
$ ../../imcc/imcc -d10 -Op1 TARG_test.pasm >1 2>&1
$ less 1
/Dumping the CFG

Of course imcc should be able to optimize such a piece of PASM but
currently has some flaws WRT subroutines and ret statements.

leo

Leopold Toetsch

unread,
May 22, 2003, 3:07:34 AM5/22/03
to Leopold Toetsch, perl6-i...@perl.org
In perl.perl6.internals, Clinton A. Pierce wrote:

>>> error:imcc:sort_loops: loop 1 contains first but notlast of outer loop 0

I changed this to a non fatal debug message. The hello world now runs as
well as mandel.bas.

One additional note two my last reply: changing suroutine labels to
global labels is of course not enough, you should really target PIR
(IMCC) code not PASM to get all benefits of optimization. This would
probably allow for a much faster expression evaluator by using virtuell
registers a la "$I100 = $I101 + $I102".

leo

Clinton A. Pierce

unread,
May 24, 2003, 8:41:08 PM5/24/03
to Leopold Toetsch, perl6-i...@perl.org
At 07:35 PM 5/21/2003 +0200, Leopold Toetsch wrote:
>>imcc seems happy to assemble the PASM otherwise, and it even runs fine.
>
>The optimizer has to work a lot on your code because its mainly one big
>bunc of code (compilation unit). The optimizer of course has some errors
>too, but finding them in 3290 lines of PASM code is out of my skills.
>
>So please first have a look at some imcc rules, if you want to have imcc
>optimize that code:
>- local labels don't have an underscore in front
>- global labels have
>- use some form of calling conventions[1], i.e. stack calling conventions:
>
> bsr _PLATFORM_SETUP
> ...
>
> _PLATFORM_SETUP:
> saveall
> ...
> restoreall
> ret
>
>[1] the current only one imcc supports ;-)

Okay, I've looked at everything I could gather on IMCC from the source tree
and I've got a list of questions before I can IMCC-ize BASIC
completely. (It *is* marching in that direction, slowly.) Consider these
the questions of a raving clueless lunatic, and calmly point me to
documentation if I've missed it somewhere.

[I realize that I horribly over-complicated some aspects of BASIC and it's
slower than it needs to be. I'm doing *way* too much at runtime and am
trying to solve that. Speed is my motivation for IMCC-izing and
simplifying everything. I'd like a 3-ply chess game in BASIC to be
playable, dammit.]

* When dealing with an expression that I've determined needs to have
intermediate values stored like this (a simplistic example):

1+(2*3+4*5)

I'd naively interpret this as:

2 * 3 -> store result on stack
4 * 5 -> becomes operand 1
restore operand 2 from stack
multiply (result becomes operand 1)
set operand 2 to "1"
add (result becomes operand 1)
return operand 1 as the expression result

What's the approved way of implementing this "stack"? Which stack should I
use? Right now, the expression parser allocates a PerlArray and just
pushes the result. This doesn't get weird, of course, until in the middle
of an expression you need to call a function with another scope, in which
means I have to get another PerlArray; or when there's functions with
multiple arguments. Am I doing this right?

* BASIC sometimes doesn't lend itself to "compilation units" very
well. With modern-day Q-BASIC function calls:

function myfunc$(a, b)
myfunc$="results!"
end function
myfunc$(3.14, 9999)

Sure, but with:

100 gosub 999
101 end
999 rem
gosub 1000
gosub 1100
gosub 1200
1000 print "Hello"
1100 print "World"
1200 print "!"
return

Is this a compilation unit? There's no scope or anything. Nothing to
save, restore, or anything. No "one entry" point. Is this going to fly
well with IMCC?

* BASIC also has some odd scoping rules. In most programs, everything's
global. Within subroutines, everything's private unless it's been declared
as shared in the outer scope, or explicitly passed in.

sub mysub
print i # My own i, =0
print foo # foo from outer scope
end sub
function dothis(a)
dothis=100*a # a is local, and has z's value
end function

shared foo
i=50
mysub()
z=10
print dothis(z)

So would a BASIC program essentially want to declare *everything* local
(given how IMCC works) and just those things that are explicitly declared
SHARED would be declared as global in IMCC? Is IMCC going to be happy with
that many globals (or that many locals)? Is there anything I can do to
make it happier?

* Are BASIC arrays my own problem? (I can't see how it'd be possible for
IMCC to handle those in any way.) Right now I'm using a PerlHash/PerlArray
structure to manage those, and lookups are handled much in the same way
that function calls are performed.

* Is there a nice way to handle mixed-precision operations? Will there
be? Or is this just something I need to watch?

.local int a
.local float b
.local int c
a=1
b=3.14
c=a*b
end

* What's your advice on when to go through the whole calling convention stuff:

.local int r
.arg y
.arg x
call _foo # (int)foo(x,y)
.result r

.sub _foo
saveall
.local int r
.param int a
.param int b
r=a*b
.return r
restoreall
ret
.end

and when to just slug some things onto a stack or into some registers and
bsr there. I've got a few examples, your thoughts?

utility functions that the runtime will need (array lookups, etc...)
builtin functions that the runtime calls ( int, abs, str$, mid$,
etc...)
user-defined functions
user-defined subroutines

That's it for now. But its enough to get me started. Thank you!


Luke Palmer

unread,
May 24, 2003, 9:30:31 PM5/24/03
to cli...@geeksalad.org, l...@toetsch.at, perl6-i...@perl.org
> At 07:35 PM 5/21/2003 +0200, Leopold Toetsch wrote:
> >>imcc seems happy to assemble the PASM otherwise, and it even runs fine.
> >
> >The optimizer has to work a lot on your code because its mainly one big
> >bunc of code (compilation unit). The optimizer of course has some errors
> >too, but finding them in 3290 lines of PASM code is out of my skills.
> >
> >So please first have a look at some imcc rules, if you want to have imcc
> >optimize that code:
> >- local labels don't have an underscore in front
> >- global labels have
> >- use some form of calling conventions[1], i.e. stack calling conventions:
> >
> > bsr _PLATFORM_SETUP
> > ...
> >
> > _PLATFORM_SETUP:
> > saveall
> > ...
> > restoreall
> > ret
> >
> >[1] the current only one imcc supports ;-)
>
> Okay, I've looked at everything I could gather on IMCC from the source tree
> and I've got a list of questions before I can IMCC-ize BASIC
> completely. (It *is* marching in that direction, slowly.) Consider these
> the questions of a raving clueless lunatic, and calmly point me to
> documentation if I've missed it somewhere.

I'll answer what I can... don't trust me too much though :-)

> [I realize that I horribly over-complicated some aspects of BASIC and it's
> slower than it needs to be. I'm doing *way* too much at runtime and am
> trying to solve that. Speed is my motivation for IMCC-izing and
> simplifying everything. I'd like a 3-ply chess game in BASIC to be
> playable, dammit.]
>
> * When dealing with an expression that I've determined needs to have
> intermediate values stored like this (a simplistic example):
>
> 1+(2*3+4*5)
>
> I'd naively interpret this as:
>
> 2 * 3 -> store result on stack
> 4 * 5 -> becomes operand 1
> restore operand 2 from stack
> multiply (result becomes operand 1)
> set operand 2 to "1"
> add (result becomes operand 1)
> return operand 1 as the expression result

You're thinking of this as a stack machine -- it's not. It's a
register machine.

2 * 3 -> store into $I0
4 * 5 -> store into $I1
$I0 + $I1 -> store into $I2
1 + $I2 -> store into result

Imcc will handle the spilling for you.

> What's the approved way of implementing this "stack"? Which stack should I
> use? Right now, the expression parser allocates a PerlArray and just
> pushes the result. This doesn't get weird, of course, until in the middle
> of an expression you need to call a function with another scope, in which
> means I have to get another PerlArray; or when there's functions with
> multiple arguments. Am I doing this right?

No. Use Imcc's .arg and .result. They simplify things.

> * BASIC sometimes doesn't lend itself to "compilation units" very
> well. With modern-day Q-BASIC function calls:
>
> function myfunc$(a, b)
> myfunc$="results!"
> end function
> myfunc$(3.14, 9999)
>
> Sure, but with:
>
> 100 gosub 999
> 101 end
> 999 rem
> gosub 1000
> gosub 1100
> gosub 1200
> 1000 print "Hello"
> 1100 print "World"
> 1200 print "!"
> return

Ouch. I used that language?

Anyway, I don't think IMCC has a problem with jumping into the middle
of a compilation unit. But, you might just have that program as one
big compilation unit, since, as you say, there's no scope or
anything.

> Is this a compilation unit? There's no scope or anything. Nothing to
> save, restore, or anything. No "one entry" point. Is this going to fly
> well with IMCC?
>
> * BASIC also has some odd scoping rules. In most programs, everything's
> global. Within subroutines, everything's private unless it's been declared
> as shared in the outer scope, or explicitly passed in.
>
> sub mysub
> print i # My own i, =0
> print foo # foo from outer scope
> end sub
> function dothis(a)
> dothis=100*a # a is local, and has z's value
> end function
>
> shared foo
> i=50
> mysub()
> z=10
> print dothis(z)
>
> So would a BASIC program essentially want to declare *everything* local
> (given how IMCC works) and just those things that are explicitly declared
> SHARED would be declared as global in IMCC? Is IMCC going to be happy with
> that many globals (or that many locals)? Is there anything I can do to
> make it happier?

I'm not sure. Maybe you could use the C<global> ops for globals, and
use IMCC .locals for locals.

> * Are BASIC arrays my own problem? (I can't see how it'd be possible for
> IMCC to handle those in any way.) Right now I'm using a PerlHash/PerlArray
> structure to manage those, and lookups are handled much in the same way
> that function calls are performed.

Think so.

> * Is there a nice way to handle mixed-precision operations? Will there
> be? Or is this just something I need to watch?
>
> .local int a
> .local float b
> .local int c
> a=1
> b=3.14
> c=a*b
> end

Well, BASIC is a little bit typed, right? In that, you know whether a
particular variable is a string or a number at compile-time. So just
add the extra ops for conversion if the operands don't match.

> * What's your advice on when to go through the whole calling convention stuff:
>
> .local int r
> .arg y
> .arg x
> call _foo # (int)foo(x,y)
> .result r
>
> .sub _foo
> saveall
> .local int r
> .param int a
> .param int b
> r=a*b
> .return r
> restoreall
> ret
> .end
>
> and when to just slug some things onto a stack or into some registers and
> bsr there. I've got a few examples, your thoughts?
>
> utility functions that the runtime will need (array lookups, etc...)
> builtin functions that the runtime calls ( int, abs, str$, mid$,
> etc...)
> user-defined functions
> user-defined subroutines

Hmm, personally, I'd use the imcc .param/.return/etc. for everything.
It's not the *real* calling conventions for now, but when imcc gets
around to supporting those, you'll have already complied.

Luke

Reply all
Reply to author
Forward
0 new messages