More MS-compatible stack depth calculation

Oleksiy Gapotchenko

unread,

Apr 2, 2009, 9:49:57 PM4/2/09

to mono-cecil

Hi

I found that Cecil makes wrong stack depth calculation for the
following IL code:

.assembly Hello {}
.method public static void Main() cil managed
{
.entrypoint
ldstr "Hello, world!"
br L1
nop
L1:
ldstr "Hello, Cecil!"
br L2
nop
L2:
ldstr "More text"
call void [mscorlib]System.Console::WriteLine(string)
call void [mscorlib]System.Console::WriteLine(string)
call void [mscorlib]System.Console::WriteLine(string)
ret
}

Cecil calculates this as 2. However it's obvious that stack size of
the above code is 3 because it has 3 pushes and 3 pops.

Code above is not verifiable CIL code, and ECMA CIL specification is
very vague regarding such a code. However, the sample above works
successfully with Microsoft .NET framework. Also such code is produced
by some obfuscators and such transformation is called 'control flow
obfuscation'.

It's clear that Microsoft .NET implementation do not clear the stack
on branching instructions. But Mono.Cecil assumes that stack is always
cleared on branching. That's why it may calculate lower max stack size
than it is in reality. It often leads to an exception which says
"Common Language Runtime detected an invalid program" when output
assembly is run.

I've made a change to Cecil that turns off the clearing of the stack
on branching instructions. Then I've tested max size calculations on
verifiable CIL code which is the most common one and produced by
all .NET compilers. Large open-source assemblies have been used for
the test. No changes to max stack values were detected comparing to
original Cecil algorithm.

Then I've created several assemblies with unverifiable CIL code and
tried to apply round-trip procedure with Cecil. Some max stack values
were greater comparing to original Cecil. And the main result - the
assemblies with unverifiable CIL code produced by Cecil have started
to work normally under Microsoft .NET.

Here is the patch:

Index: CodeWriter.cs
===================================================================
--- CodeWriter.cs (revision 130916)
+++ CodeWriter.cs (working copy)
@@ -478,7 +478,6 @@
}

switch (instr.OpCode.FlowControl) {
- case FlowControl.Branch:
case FlowControl.Throw:
case FlowControl.Return:
// next statement is not reachable from this statement, so reset
the stack depth to 0

Regards
Oleksiy

Lotfi Gheribi

unread,

Apr 3, 2009, 5:37:14 AM4/3/09

to mono-...@googlegroups.com

Have a look at this C# code:

bool var0;
int var1 = 3 + (var0?0:1);

This would be translated to :

L0: ldc.i4.3
L1: ldloc.0
L2: brfalse.s L5
L3: ldc.i4.0
L4: br.s L6
L5: ldc.i4.1
L6 : add
L7: stloc.1
L8 :

According to your correction, I assume that the stack level at the
beginning of L6 should be 0, because it's just after a br.s ! But in
fact it's 1 ! (The code provided above is 100% valid, and does work on
mono of course)

I believe the right algorithm should process the code as follows :

Before L0 : 0
==> After L0 : 1

Before L1 : 1
==> After L1 : 2

Before L2 : 2
==> After L2 : 1 ==> Before L5 = 1

Before L3 : 1
==> After L3 : 2

Before L4 : 2
==> ==> Before L6 = 2
But we can't assume anything here about After L4

Before L5 = 1
==> After L5 = 2

Before L6 = 2
==> After L6 = 1

Before L7 = 1
==> After L7 = 0

Correct me if I'm wrong

Oleksiy Gapotchenko

unread,

Apr 3, 2009, 1:52:13 PM4/3/09

to mono-cecil

Hi

Thanks for your nice sample. I'm completely agree with you. I've
checked the sample with two versions of Mono.Cecil.

For the first test I took original Mono.Cecil without a patch. Here
are the funny results:

Before L4 : 2
==> ==> Before L6 = 0

But we can't assume anything here about After L4

For the second test I applied the patch. Here are results:

Before L4 : 2
==> ==> Before L6 = 2
But we can't assume anything here about After L4

As you can see, the original Mono.Cecil algorithm is broken. The only
reason why it is not so critical in the most of applications is that
because this bug is almost always masked by neighbor code flow branch.
In your sample neighbor branch is L2, L5, L6 and it gives stack size
of 2 for L6. But when such neighbor branch is absent then Mono.Cecil
algorithm is going insane because it assumes stack size 0 before L6
and after L4 jump.

Your sample gave me a good understanding that there can be a lot of
neighbor branches and I've found another weak place in the existing
algorithm. It does not respect the possibility of existence of several
parallel neighbor flow branches that can have different stack depths.
It is very uncommon case which is never produced by known .NET
compilers. However such a case is possible in raw CIL and maybe some
future compilers. So I improved algorithm in this area. The
corresponding patch is below.

Index: CodeWriter.cs
===================================================================
--- CodeWriter.cs (revision 130916)
+++ CodeWriter.cs (working copy)

@@ -469,16 +469,15 @@
switch (instr.OpCode.OperandType) {
case OperandType.InlineBrTarget:
case OperandType.ShortInlineBrTarget:
- m_stackSizes [instr.Operand] = current;
- break;
+ UpdateForwardStackSize(instr.Operand, current);
+ break;
case OperandType.InlineSwitch:
foreach (Instruction target in (Instruction []) instr.Operand)
- m_stackSizes [target] = current;
+ UpdateForwardStackSize(target, current);
break;

}

switch (instr.OpCode.FlowControl) {
- case FlowControl.Branch:
case FlowControl.Throw:
case FlowControl.Return:
// next statement is not reachable from this statement, so reset
the stack depth to 0

@@ -490,6 +489,14 @@
instructions.Container.MaxStack = max + 1; // you never know
}

+ void UpdateForwardStackSize(object instr, int newStackSize)
+ {
+ object existingStackSize = m_stackSizes[instr];
+ if (existingStackSize != null)
+ newStackSize = Math.Max((int)existingStackSize, newStackSize);
+ m_stackSizes[instr] = newStackSize;
+ }
+
static int GetPushDelta (Instruction instruction)
{
OpCode code = instruction.OpCode;

Lotfi Gheribi

unread,

Apr 7, 2009, 3:52:00 PM4/7/09

to mono-...@googlegroups.com

Sorry, I didn't have much time to answer last days :-(

I'll try in this post to give more details about how both Mono & .Net
compute stack depth at each instruction.

Mono only supports 100% valid IL code. Mono assumes that the stack
depth at any point can be computed using a single top-down pass. This
is not the case for .Net. Microsoft's implementation needs more than a
single pass for be able to compute the stack depth.

To clarify things, let me give you a simple example

L0 : ldc.i4.3
L1 : br.s L4
L2 : ldc.i4.5
L3 : br.s L5
L4: br.s L2
L5: add
L6 : stloc.0

(This code is not 100% valid, it would not run on Mono, but it works
fine on .Net)

This is how Mono would try to compute stack depth

Before L0: 0
==> Before L1: 1

Before L1:1
==> Before L4 : 1

Before L2 unkown ==> We assume it is 0
==> Before L3 : 1

Before L3 : 1
==> Before L5 : 1

Before L4 : 1
==> Before L2 : 1 ===> ERROR, because we already assumed that Before
L2 is 0 !!!!

Now, this is how .Net would compute stack depth :

Before L0: 0
==> Before L1: 1

Before L1:1
==> Before L4 : 1

Before L4 : 1
==> Before L2 : 1

Before L2 : 1
==> Before L3 : 2

Before L3 : 2
==> Before L5 : 2

Before L5 : 2
==> Before L6 : 1

Before L6 : 1
==> AFTER L6 : 0

I hope this can help you see how it is actually done.

I've tried to make my own corrections to provide a .Net compatible
implementation. It seams to work well :)
I've included a patch file : CodeWriter.patch

The patch includes some other optimizations to avoid the use of
hashtables and int casts : knowing that instruction offsets are not
yet used, I used them as follows:

for (int i = 0; i < instructionCount; i++)
instructions[i].Offset = i;

As you can see, now the Offset property contains the sequential index
of the instruction.
According to my benchmarks, this optimization divides the processing
time by a factor of 2.

PS. Sorry for my potentially bad English

Keith

unread,

Apr 7, 2009, 6:09:54 PM4/7/09

to mono-cecil

Lofti Gheribi wrote:
> This is not the case for .Net. Microsoft's implementation needs more
> than a single pass for be able to compute the stack depth.

Multiple passes are not actually required. As you've shown in your
example, locations after an unconditional branch (or throw, etc) which
have a forward branch to them (like L4) can get their stack contents
from the branch in a single forward pass. They are not a concern.

The complication comes with locations after an unconditional branch
which have only backward references to them (like L2). For clarity/
brevity, let me give such kind of location a name. I'll call it a
"mystery point." The standard says to assume that the stack is empty
at a mystery point, which is what Cecil probably does. But the
standard is WRONG, or more clearly, it is overly restrictive. Why do I
say that?

(Not counting obfuscators) Mystery points occur only with one kind of
high-level language construct: the loop. A loop like this...
Before()
while (Condition())
Body()
After()

...turns into this IL
call Before
br COND
LOOP:
call Body
COND:
call Condition
brtrue LOOP
call After

The mystery point here is of course at label LOOP. What should the
stack be? With languages like C# and VB, a loop can only occur as a
statement, and a statement always begins with a clear stack. So
assuming the stack is clear works fine for those languages -- BUT NOT
FOR ALL.

There are (at least) two cases where a loop can occur with a non-empty
stack in the general case.
1) A loop construct which can return a value
2) The language supports inlining a method

With Case #1, such a loop may be used as a second method argument:
SomeMethod (15, SumOfRange(5,10))
Here, by SumOfRange, I'm referring to a language construct which loops
to add all numbers 5 to 10 without calling any method. This loop
occurs while the first argument (15) is sitting on the stack.

Case #2 means that not every statement begins with an empty stack.
Assume now SumOfRange is an ordinary method which loops between the
numbers given as arguments and returns their sum. But our language
allows demanding that the method's code is emitted inline.
SomeMethod (15, inline SumOfRange(5,10))
The net result is the same -- a loop with a nonempty stack -- just for
a different reason.

As it turns out, for these cases it is possible to determine the stack
with a single forward pass. The rule is simple: assume that the stack
contents at a mystery point is -- drum roll please -- exactly the same
as at the unconditional branch before it. In other words, the loop
only adds to the stack contents temporarily; by the time the loop
ends, the stack is as it was at the beginning. Certainly, it makes
sense that a loop should behave that way. This is what the standard
should say, and it appears to be what .NET is doing.

For C#/VB this rule reduces to "assume the stack is empty" since, as I
showed, their loops and (therefore) mystery points can only appear at
a place where the stack is empty. But not all languages work that way,
and this rule works in the general case.

BTW, this rule works for your example, although it's not a loop, which
I believe is why your example runs correctly on .NET.

My 2 cents,
-- Keith

Lotfi Gheribi

unread,

Apr 7, 2009, 6:35:12 PM4/7/09

to mono-...@googlegroups.com

hy keith,

let's have a look to your 'misterious point' example:

L0: call Before
L1: br COND

LOOP: call Body
COND: call Condition

L2: brtrue LOOP
L3: call After

Here how my algorithm works (& I suppose the .Net does the same) :

Before L0 : 0
==> before L1 : C1 (where C1 is a constant depending on L0 instructions)

Before L1 : C1
==> Before COND : C1

before COND : C1
==> before L2 : C2 (C2 is another know constant)

before L2 : C2
==> before LOOP : C3 = C2 - 1 (Here, we have the actual stack depth
of your mysterious point !!!!!)
==> before L3 : C3 = C2 - 1

...etc.

So when we start, we suppose all the points are "mysterious points"
We mark the first instruction as non mysterious with stack depth = 0;
We mark the first instruction of every exception handling block as non
mysterious with stack depth = 0 or 1 (depending of the type of the
block);

then we should start by traversing NON-mysterious points first, and
from them we can know about stack depth of other points.

In case at a certain point, all remaining points are mysterious, than
we take the first one, and assume it has a stack depth of 0.

Lotfi Gheribi

unread,

Apr 7, 2009, 7:40:44 PM4/7/09

to mono-...@googlegroups.com

Hey Keith, I think I got an example in which we cannot guess the stack depth.

br.s next

loop : ldc.i4.0
L0: br.s loop

next:

The loop is not referenced from outside. It's a kind of a "dead code".

The loop itself is invalid :
Let's suppose that loop has a stack depth, let's say Unkown1

Before loop : Unkown1
==> after loop : Unkown2 = Unkown1 + 1

before L0 : Unkown2
==> before loop = Unkown2 !!! Which is absurd because Unkown2 = Unkown1 + 1 !!!

In fact, Microsoft's implementation simply ignores unreachable or dead
code (The code actually runs perfectly on .Net)

thus, I believe that you're right about the fact that a single pass
should be enough, but we would have to IGNORE dead instructions.

Keith

unread,

Apr 7, 2009, 9:46:44 PM4/7/09

to mono-cecil

Hey Lofti,

Yeah, your algorithm will work, AND it should ignore dead code. But
it's not simple; it will either need to do multiple passes, or it will
have to know how to follow multiple branches. For example, when the
algorithm hits L2, it can finally start filling in the stack
information at LOOP. There may be lots of opcodes there; "call Body"
is just an abstraction. The algorithm has to remember that it needs to
resume processing at L3. This is recursive since "Body" may contain
similar loops. And what if that Body contains a branch to a label
AFTER L3?

I was keeping the goal specified in the standard, that the code must
be verifiable with a single forward pass through the code.

I'm really surprised that your dead code example will run on .NET,
because it's definitely invalid, as you said. IMO, the right thing to
do with that is to detect the conflicting stack information and throw
an exception. It's not something Cecil should worry about handling in
a valid way, because there IS no valid way to handle it. No self-
respecting compiler would produce such code.

-- Keith

Daniel Grunwald

unread,

Apr 8, 2009, 6:07:38 AM4/8/09

to mono-...@googlegroups.com

Keith wrote:
> It's not something Cecil should worry about handling in
> a valid way, because there IS no valid way to handle it. No self-
> respecting compiler would produce such code.
>

I agree with this. Cecil cannot be able to round-trip all kinds of
invalid assemblies that obfuscators could produce.
Obfuscator writers are just too clever coming up with invalid assemblies
that MS happens to accept.

I think it's sufficient if Cecil is able to read those assemblies - then
one could write a clean-up step (e.g. dead code removal) to fix the
invalid assembly before saving it.

Daniel Grunwald

signature.asc

Lotfi Gheribi

unread,

Apr 8, 2009, 6:30:27 AM4/8/09

to mono-...@googlegroups.com

Keith,

In fact, the algorithm is not supposed to check all possible paths:

1. As you said, any "No self-respecting compiler would produce such code."
2. We are not developing a PEVerify-like tool.
3. We need to keep the ComputeMaxStack method as simple as possible.
Handling all these cases would add a lot of complexity to the code.
4. We need to keep performance issues in mind...

I agree with you about the fact that if there is a conflict detected,
we probably should throw an exception. Or we can just use a very large
stack size (like methodBody.MaxStack = 100)

Daniel,

For dead code, we cannot just remove it ! But we still have to visit
"dead code" in a top-down fashion (and we should assume that it starts
with a 0 stack depth, because this is how Mono would validate the
stack depth). Have a look at the following code :

L0: ldc.i4.1
L1: br.s L6
L2: ldc.i4.2
L3: ldc.i4.3
L4: add
L5: br.s L7
L6: br.s L3
L7: stloc.0
L8: ...

This code is 100% valid, it has 1 single dead instruction at L2.
If you remove this instruction, your code would be no longer valid !!!!

In fact this is a very good feature that I'm willing to use in my
obfuscator to produce 100% valid code than would have been decompiled
:

int i = 2 + 3;

instead of

int i = 1 + 3;

All decompilers would crash or produce bad code.

Jb Evain

unread,

Apr 18, 2009, 9:13:29 AM4/18/09

to mono-...@googlegroups.com

On 4/8/09, Daniel Grunwald <dan...@danielgrunwald.de> wrote:
> I agree with this. Cecil cannot be able to round-trip all kinds of
> invalid assemblies that obfuscators could produce.
> Obfuscator writers are just too clever coming up with invalid assemblies
> that MS happens to accept.
>
> I think it's sufficient if Cecil is able to read those assemblies - then
> one could write a clean-up step (e.g. dead code removal) to fix the
> invalid assembly before saving it.

Definitely.

--
Jb Evain <j...@nurv.fr>

Keith

unread,

Apr 23, 2009, 5:20:52 PM4/23/09

to mono-cecil

On Apr 8, 6:30 am, Lotfi Gheribi <gheri...@gmail.com> wrote:
> For dead code, we cannot just remove it ! ...
> Have a look at the following code : ...

> This code is 100% valid, it has 1 single dead instruction at L2.
> If you remove this instruction, your code would be no longer valid !!!!

Perhaps it's valid according to the standard, but as I argued
previously, many real compilers need the platform to work differently.
I showed that the stack should be considered the same at L2 as at L1,
so there's a stack size conflict at L3. Branched from L6, the stack
size before L3 is 1; considered top-down, the stack size before L3 is
2. If it were my decision, I would throw an exception that the code is
invalid, when computing the max stack size.

Lotfi Gheribi

unread,

Apr 23, 2009, 5:45:48 PM4/23/09

to mono-...@googlegroups.com

> Perhaps it's valid according to the standard, but as I argued
> previously, many real compilers need the platform to work differently.
> I showed that the stack should be considered the same at L2 as at L1,
> so there's a stack size conflict at L3. Branched from L6, the stack
> size before L3 is 1; considered top-down, the stack size before L3 is
> 2.

When browsing the code is a top-down fashion, we can't know any thing
about the stack depth before L3 ! Why are you supposing it has a stack
depth of 1 ?!!!
In fact, since it's not reachable from any previous exception, then It
just has an "unknown stack depth".

=============== ECMA-335, Partition III
1.7.5 Backward branch constraints
It shall be possible, with a single forward-pass through the CIL
instruction stream for any method, to infer the exact state of the
evaluation stack at every instruction (where by “state” we mean the
number and type of each item on the evaluation stack).

In particular, if that single-pass analysis arrives at an instruction,
call it location X, that immediately follows an unconditional branch,
and where X is not the target of an earlier branch instruction, then
the state of the evaluation stack at X, clearly, cannot be derived
from existing information. In this case, the CLI demands that the
evaluation stack at X be empty.

Following on from this rule, it would clearly be invalid CIL if a
later branch instruction to X were to have a
non-empty evaluation stack
===============

> if it were my decision, I would throw an exception that the code is

> invalid, when computing the max stack size.

Actually, I've been using of modified version of the patch I submitted
for a little while :

private void MarkNextInstruction(IntegerQueue visitedInstructions,
int[] stackSizes, int instructionIndex, int stackDepth)
{
if (instructionIndex >= stackSizes.Length)
return;

int curStackDepth = stackSizes[instructionIndex];

if (curStackDepth == -1)
{
visitedInstructions.Add(instructionIndex);
stackSizes[instructionIndex] = stackDepth;
}
else if (curStackDepth != stackDepth)
throw new Exception("Invalid IL code.");
}

By throwing exceptions when detecting a wrong instruction depth, it
helped me a lot emitting correct IL code. No need to use PEVerify
every time I introduces a small change in my code.

Reply all

Reply to author

Forward