An optimization pass is now included in Nemerle (optionally)

5 views
Skip to first unread message

Paweł Różański

unread,
Jan 2, 2009, 3:37:41 PM1/2/09
to nemer...@googlegroups.com
Hello,

As I'm messing around http://nemerle.org/Open_projects#Optimizations,
it's finally time to share.

I have committed a ncc/optimization dir, and tuned Makefile for it.
You can use optimizer by calling ncc.exe -O.
It mutates(tries to optimize) TypedTree between Typer3 & Typer4 phase.

I'll present an simple, biased & infantile example:

#pragma indent

public class A
static public IsEqual(_ : int, _ : int) : bool
| (a, b) when (a == b) =>
true
| _ =>
false

static public GetList() : list[int*int]
[(1,1), (4,2), (5,10),(3,3)]

foreach (i in A.GetList())
System.Console.WriteLine(A.IsEqual(i))


Nemerle produces tuple for matching more than one expression:

.method public static hidebysig
default bool IsEqual (int32 _N_u1697, int32 _N_u1698) cil managed
{
// Method begins at RVA 0x20f4
// Code size 37 (0x25)
.maxstack 3
.locals init (
int32 V_0,
int32 V_1,
valuetype [Nemerle]Nemerle.Builtins.Tuple`2<int32,
int32> V_2)
IL_0000: ldarg.0
IL_0001: ldarg.1
IL_0002: newobj instance void valuetype
[Nemerle]Nemerle.Builtins.Tuple`2<int32, int32>::.ctor(!0, !1)
IL_0007: stloc.2
IL_0008: ldloc.2
IL_0009: ldfld !0 valuetype
[Nemerle]Nemerle.Builtins.Tuple`2<int32,int32>::Field0
IL_000e: stloc.1
IL_000f: ldloc.2
IL_0010: ldfld !1 valuetype
[Nemerle]Nemerle.Builtins.Tuple`2<int32,int32>::Field1
IL_0015: stloc.0
IL_0016: ldloc.1
IL_0017: ldloc.0
IL_0018: bne.un IL_0023

IL_001d: ldc.i4.1
IL_001e: br IL_0024

IL_0023: ldc.i4.0
IL_0024: ret
} // end of method A::IsEqual

However we know that tuples are immutable, pretty functional
constructs. Optimizer does copy propagation[1]. Knowing that tuple is
immutable,
optimized version of that function looks like:

// method line 2
.method public static hidebysig
default bool IsEqual (int32 _N_u1697, int32 _N_u1698) cil managed
{
// Method begins at RVA 0x20f4
// Code size 15 (0xf)
.maxstack 8
IL_0000: ldarg.0
IL_0001: ldarg.1
IL_0002: bne.un IL_000d

IL_0007: ldc.i4.1
IL_0008: br IL_000e

IL_000d: ldc.i4.0
IL_000e: ret
} // end of method A::IsEqual

Note, that after Typer3 stage there are no matching constructs
(already translated into ifs), no macros (already expanded). Which
brings us to another part, main code of the example.

One foreach macro translates to these local values:

.method public static hidebysig
default void Main () cil managed
{
// Method begins at RVA 0x2174
.entrypoint
// Code size 78 (0x4e)
.maxstack 4
.locals init (
class [Nemerle]Nemerle.Core.list`1<valuetype
[Nemerle]Nemerle.Builtins.Tuple`2<int32, int32>> V_0,
class [Nemerle]Nemerle.Core.list`1<valuetype
[Nemerle]Nemerle.Builtins.Tuple`2<int32, int32>> V_1,
valuetype [Nemerle]Nemerle.Builtins.Tuple`2<int32,
int32> V_2,
valuetype [Nemerle]Nemerle.Builtins.Tuple`2<int32,
int32> V_3,
valuetype [Nemerle]Nemerle.Builtins.Tuple`2<int32,
int32> V_4)

I will not go into details, but there are local values used as
iterators, for splitting tail from head and so on. Optimizer knows
that list are immutable too, after copy propagation and register
allocation [2] locals looks:

.method public static hidebysig
default void Main () cil managed
{
// Code size 69 (0x45)
.maxstack 5
.locals init (
class [Nemerle]Nemerle.Core.list`1<valuetype
[Nemerle]Nemerle.Builtins.Tuple`2<int32, int32>> V_0,
valuetype [Nemerle]Nemerle.Builtins.Tuple`2<int32,
int32> V_1)

I think this is pretty neat, we can have generic macros and their
writer doesn't have to think about resulting code. Consider macro's
hygiene:

foreach (i in A.GetList())
System.Console.WriteLine(A.IsEqual(i))
foreach (i in A.GetList())
System.Console.WriteLine(A.IsEqual(i))

From the point of macro writer - local values are independent, macro
expansion generates:

.locals init (
class [Nemerle]Nemerle.Core.list`1<valuetype
[Nemerle]Nemerle.Builtins.Tuple`2<int32, int32>> V_0,
class [Nemerle]Nemerle.Core.list`1<valuetype
[Nemerle]Nemerle.Builtins.Tuple`2<int32, int32>> V_1,
valuetype [Nemerle]Nemerle.Builtins.Tuple`2<int32,
int32> V_2,
valuetype [Nemerle]Nemerle.Builtins.Tuple`2<int32,
int32> V_3,
valuetype [Nemerle]Nemerle.Builtins.Tuple`2<int32,
int32> V_4,
class [Nemerle]Nemerle.Core.list`1<valuetype
[Nemerle]Nemerle.Builtins.Tuple`2<int32, int32>> V_5,
class [Nemerle]Nemerle.Core.list`1<valuetype
[Nemerle]Nemerle.Builtins.Tuple`2<int32, int32>> V_6,
valuetype [Nemerle]Nemerle.Builtins.Tuple`2<int32,
int32> V_7,
valuetype [Nemerle]Nemerle.Builtins.Tuple`2<int32,
int32> V_8,
valuetype [Nemerle]Nemerle.Builtins.Tuple`2<int32,
int32> V_9)

In resulting code however, we know that these variables could be
reused, so again ... after optimization it looks like:
.locals init (
class [Nemerle]Nemerle.Core.list`1<valuetype
[Nemerle]Nemerle.Builtins.Tuple`2<int32, int32>> V_0,
valuetype [Nemerle]Nemerle.Builtins.Tuple`2<int32,
int32> V_1)

Ahh runtime can't optimize everything for us. It doesn't know about
tuples & list. Even what it can do - it costs aplication start up
time, so cleaner (shorter) code counts. Code without tuples is
considerably faster.

Of course, in real world application (compiler) savings are very small
but if you are witting a code which is dominated by matching & lists -
optimizer should help.

I know that my post is rather spaghetti-like. If someone wants more
information - just ask on the mailing list. I'll gladly answer.

[1] http://en.wikipedia.org/wiki/Copy_propagation
[2] http://en.wikipedia.org/wiki/Register_allocation (there are no
'registers' in CIL, but we can assume that local values mimic them)

Take care,
--
Paweł Różański

Dmitry Ivankov

unread,
Jan 2, 2009, 5:33:46 PM1/2/09
to nemer...@googlegroups.com


2009/1/3 Paweł Różański <pawel.r...@gmail.com>

Hello,

As I'm messing around http://nemerle.org/Open_projects#Optimizations,
it's finally time to share.

I have committed a ncc/optimization dir, and tuned Makefile for it.
You can use optimizer by calling ncc.exe -O.
It mutates(tries to optimize) TypedTree between Typer3 & Typer4 phase.
Unfortunately it is not stable. 
Ncc won't pass testsuite if tests are compiled with -O
Not too big deal though, they won't with -g flag too :)), but that is a first place to catch errors in your implementation.
Also there are not too much comments, no benchmarks or optimization stats, no unittests :(
So big thanks for the experiment, but now you'll have to work on what's beyound your MSc - fixing (at least known) bugs))

Paweł Różański

unread,
Jan 3, 2009, 6:13:47 AM1/3/09
to nemer...@googlegroups.com
On 02/01/2009, Dmitry Ivankov <divan...@gmail.com> wrote:
> Unfortunately it is not stable.
> Ncc won't pass testsuite if tests are compiled with -O
> Not too big deal though, they won't with -g flag too :)), but that is a
> first place to catch errors in your implementation.
> Also there are not too much comments, no benchmarks or optimization stats,
> no unittests :(
> So big thanks for the experiment, but now you'll have to work on what's
> beyound your MSc - fixing (at least known) bugs))
Some of these tests were created after my code - so, I'm actually glad
that they catch something more.

I will look into tests later. Some of these tests will fall anyway
(after optimization, TypedTree could be quite different - so e.g. some
of warnings can be ruled out)

As optimization turned out not so easy for me - my msc is in polish:).
I'll translate the 'benchmark' part, but now is the time to prepare
for some exams and so..

VladD2

unread,
Jan 12, 2009, 7:26:47 AM1/12/09
to nemer...@googlegroups.com
2009/1/2 Paweł Różański <pawel.r...@gmail.com>:
>...

Very nice! Thanks!

> Of course, in real world application (compiler) savings are very small
> but if you are witting a code which is dominated by matching & lists -
> optimizer should help.

The main performance penalty in Functional Language are creating using
of closure and using of lists.

I think we should implement something like supercompilation?
http://www.sm.luth.se/~pj/papers/scp/pscp.pdf

You can implement this algorithm?

Reply all
Reply to author
Forward
0 new messages