A bug in the JIT code generator

Maxim Shemanarev

Dec 9, 2002, 6:08:57 PM

Here's a simple example in C#. Create a new C# Console Application and
copy/paste the following code:

using System;

namespace cs_bug1
{
    class Class1
    {
        static void CalcTripleBond1(double w, double[] coord)
        {
            double dd = System.Math.Sqrt((coord[2] - coord[0]) *
                                         (coord[2] - coord[0]) +
                                         (coord[3] - coord[1]) *
                                         (coord[3] - coord[1])) * 0.5;
            double dx = (coord[2] - coord[0]) / dd;
            double dy = (coord[3] - coord[1]) / dd;
            coord[0] -= w * dy;
            coord[1] += w * dx;
            coord[2] -= w * dy;
            coord[3] += w * dx;
            coord[8] += w * dy;
            coord[9] -= w * dx;
            coord[10] += w * dy;
            coord[11] -= w * dx;
        }

        static void CalcTripleBond2(double w, double[] coord)
        {
            double dx = coord[2] - coord[0];
            double dy = coord[3] - coord[1];
            double dd = System.Math.Sqrt(dx*dx + dy*dy) * 0.5;
            dx /= dd;
            dy /= dd;
            coord[0] -= w * dy;
            coord[1] += w * dx;
            coord[2] -= w * dy;
            coord[3] += w * dx;
            coord[8] += w * dy;
            coord[9] -= w * dx;
            coord[10] += w * dy;
            coord[11] -= w * dx;
            //System.Console.WriteLine(dy); // Uncomment this line -
                                            // the function will work
                                            // correctly
        }

        [STAThread]
        static void Main(string[] args)
        {
            double[] coord1 = new double[12];
            double[] coord2 = new double[12];

            // The initial values:
            //  0   1   2   3   4   5   6   7   8   9   10  11
            //  X1, Y1, X2, Y2, X1, Y1, X2, Y2, X1, Y1, X2, Y2

            coord2[0] = coord2[4] = coord2[8] =
            coord1[0] = coord1[4] = coord1[8] = 2.8768; // X1

            coord2[1] = coord2[5] = coord2[9] =
            coord1[1] = coord1[5] = coord1[9] = 7.7655; // Y1

            coord2[2] = coord2[6] = coord2[10] =
            coord1[2] = coord1[6] = coord1[10] = 1.7654; // X2

            coord2[3] = coord2[7] = coord2[11] =
            coord1[3] = coord1[7] = coord1[11] = 2.0976; // Y2

            // The following functions calculate two parallel line
            // segments and modify elements:
            // 0, 1, 2, 3 and 8, 9, 10, 11
            // The functions are logically identical, so the result must
            // be the same, but in the Release mode it's different.

            CalcTripleBond1(0.1, coord1);
            CalcTripleBond2(0.1, coord2);

            // Compare the results and print out the difference
            int j;
            for(j = 0; j < 12; j++)
            {
                if(System.Math.Abs(coord1[j] - coord2[j]) > 0.000001)
                {
                    System.Console.WriteLine("j=" + j + " err=" +
                                             (coord1[j] - coord2[j]));
                }
            }
        }
    }
}

The two functions, CalcTripleBond1 and CalcTripleBond2, are logically
identical: each calculates two line segments parallel to the given
one. The check at the end of Main() should print nothing in the normal
case. In the Debug configuration everything works fine, while the
Release version prints the following:

j=1 err=0.221360672458666
j=3 err=0.221360672458667
j=9 err=-0.221360672458667
j=11 err=-0.221360672458667

The difference between the results is considerable!
The bug appears in CalcTripleBond2(). If we uncomment
"//System.Console.WriteLine(dy);" the function starts working
correctly. It seems as if the optimizer removes the "dy" variable and
uses "dx" instead.
Now, the compiler produces identical IL code in both cases (with and
without WriteLine(dy)), except, of course, for the call to WriteLine
itself at the end of the function. Here is the result of the
comparison:

Comparing files cs_bug_release and CS_NOBUG_RELEASE
***** cs_bug_release
{
// Code size 150 (0x96)
.maxstack 5
***** CS_NOBUG_RELEASE
{
// Code size 156 (0x9c)
.maxstack 5
*****

***** cs_bug_release
IL_0094: stelem.r8
IL_0095: ret
} // end of method Class1::CalcTripleBond2
***** CS_NOBUG_RELEASE
IL_0094: stelem.r8
IL_0095: ldloc.1
IL_0096: call void [mscorlib]System.Console::WriteLine(float64)
IL_009b: ret
} // end of method Class1::CalcTripleBond2
*****

This means the bug is actually in the JIT code generator, which is
very, very bad. :-( What am I doing? Beta-testing MS products for
free? :-[

I use Framework v. 1.0.3705. Can someone check this in other versions,
service packs (if available), or in .NET 2003?

McSeem
http://www.antigrain.com

Reid Wilkes

Dec 9, 2002, 8:06:04 PM

Maxim,
I confirmed your results on sp1 of the V1 .NET frameworks, but do not get
the problem with V1.1 or later. I did not get the chance to try with sp2 of
V1, but it is publicly available for download. Thanks for reporting the
issue.

Thanks,
Reid Wilkes
Microsoft CLR QA Team

Maxim Shemanarev

Dec 10, 2002, 9:43:48 AM

Thank you Reid!

> Maxim,
> I confirmed your results on sp1 of the V1 .NET frameworks, but do not get
> the problem with V1.1 or later.

What does "or later" mean? Are you testing something else?

McSeem
http://www.antigrain.com

Maxim Shemanarev

Dec 10, 2002, 11:08:39 AM

> Maxim,
> I confirmed your results on sp1 of the V1 .NET frameworks, but do not get
> the problem with V1.1 or later. I did not get the chance to try with sp2 of
> V1, but it is publicly available for download. Thanks for reporting the
> issue.

SP2 of the V1 .NET Framework has the same bug. Is there any chance of
getting it fixed (we have an official support program, of course), or
is it better to switch to V1.1 right now? What would you recommend?

McSeem
http://www.antigrain.com

Jim Hogg [MS]

Dec 10, 2002, 7:18:27 PM

Maxim,

Even tho' the two methods are "mathematically" equivalent, that does not
necessarily carry over into their fp implementation.

I would guess the difference in results is due to the JIT'ed code causing
register spilling from the 80-bit x86 fp registers back to the 64-bit 'home'
locations in memory. So an extra, seemingly unrelated call (eg to
Console.WriteLine) can give strange results due to 'losing' the low-order
bits of the fp mantissa. (The alternative, to force normalization to 64-bit
after each calculation, gives repeatable results, but lousy performance)

You might take a look at the generated x86 code (eg, using CorDbg). I'll
bet it looks (and is) a good translation of the algorithm.

Jim Hogg [MS]

Maxim Shemanarev

Dec 11, 2002, 11:39:36 AM

Hi Jim,

> Even tho' the two methods are "mathematically" equivalent, that does not
> necessarily carry over into their fp implementation.

Correct.

> I would guess the difference in results is due to the JIT'ed code causing
> register spilling from the 80-bit x86 fp registers back to the 64-bit 'home'
> locations in memory. So an extra, seemingly unrelated call (eg to
> Console.WriteLine) can give strange results due to 'losing' the low-order
> bits of the fp mantissa. (The alternative, to force normalization to 64-bit
> after each calculation, gives repeatable results, but lousy performance)

No!
1. A difference this large cannot be explained by loss of accuracy.
2. The difference depends on the value of the "w" argument; it is
simply proportional to it. After all, the numbers are in the range of
about 1...10 while the difference is 0.2! Loss of accuracy is
definitely not the cause.
3. The call to Console.WriteLine *fixes* the problem.
4. When using floats instead of doubles we have the very same problem.
5. In Debug mode the function works fine, as it does in .NET FW
1.1.
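
To put points 1 and 2 in perspective, here is a rough sketch of how big a pure precision effect could be. It runs the same arithmetic once in double and once with everything declared as float, which is a much coarser loss of precision than spilling 80-bit registers to 64-bit memory. The class and method names (RoundingScale, OffsetDouble, OffsetFloat, MaxDifference) are made up for this illustration; only the arithmetic and the input values come from the original post.

```csharp
using System;

static class RoundingScale
{
    // Same arithmetic as CalcTripleBond2, restricted to elements 0..3, in double.
    public static double[] OffsetDouble(double w, double[] c)
    {
        double dx = c[2] - c[0];
        double dy = c[3] - c[1];
        double dd = Math.Sqrt(dx * dx + dy * dy) * 0.5;
        dx /= dd;
        dy /= dd;
        return new double[] { c[0] - w * dy, c[1] + w * dx,
                              c[2] - w * dy, c[3] + w * dx };
    }

    // Same arithmetic with every variable declared as float. The runtime may
    // still carry intermediates at higher precision, which only shrinks the gap.
    public static float[] OffsetFloat(float w, float[] c)
    {
        float dx = c[2] - c[0];
        float dy = c[3] - c[1];
        float dd = (float)Math.Sqrt(dx * dx + dy * dy) * 0.5f;
        dx /= dd;
        dy /= dd;
        return new float[] { c[0] - w * dy, c[1] + w * dx,
                             c[2] - w * dy, c[3] + w * dx };
    }

    // Largest absolute gap between the double and the float results.
    public static double MaxDifference()
    {
        double[] d = OffsetDouble(0.1, new double[] { 2.8768, 7.7655, 1.7654, 2.0976 });
        float[] f = OffsetFloat(0.1f, new float[] { 2.8768f, 7.7655f, 1.7654f, 2.0976f });
        double max = 0.0;
        for (int i = 0; i < 4; i++)
        {
            max = Math.Max(max, Math.Abs(d[i] - f[i]));
        }
        return max;
    }

    static void Main()
    {
        Console.WriteLine("max |double - float| = " + MaxDifference());
    }
}
```

For values of this size, single precision carries roughly seven significant digits, so the gap printed here should be several orders of magnitude below the 0.22 reported by the Release build.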

> You might take a look at the generated x86 code (eg, using CorDbg). I'll
> bet it looks (and is) a good translation of the algorithm.

I wouldn't like to waste my time analyzing the disassembled
instructions; it's MS's job, after all. But I'll bet the translation of
CalcTripleBond2 (without the call to Console.WriteLine) is not as good
as you claim.

I would suggest you take a more careful look at the issue.
This is ridiculous. I post a very serious issue and hear
baby talk from a Microsoft representative.

Thanks for the response, however! :-)

McSeem
http://www.antigrain.com

Maxim Shemanarev

Dec 11, 2002, 3:27:10 PM

Here's a graphical representation of the bug.

I have modified the source coordinates and added printing:

coord2[0] = coord2[4] = coord2[8] =
coord1[0] = coord1[4] = coord1[8] = 2.8768*50; // X1

coord2[1] = coord2[5] = coord2[9] =
coord1[1] = coord1[5] = coord1[9] = 7.7655*50; // Y1

coord2[2] = coord2[6] = coord2[10] =
coord1[2] = coord1[6] = coord1[10] = 1.7654*50; // X2

coord2[3] = coord2[7] = coord2[11] =
coord1[3] = coord1[7] = coord1[11] = 2.0976*50; // Y2

// The following functions calculate
// two parallel line segments and modify elements:
// 0, 1, 2, 3 and 8, 9, 10, 11
// The functions are logically identical, so the result must be the
// same, but in the Release mode it's different.

CalcTripleBond1(20, coord1);
CalcTripleBond2(20, coord2);

for(j = 0; j < 12; j += 2)
{
    System.Console.WriteLine("x1[" + j/2 + "] = " + coord1[j] +
                             "; y1[" + j/2 + "] = " + coord1[j+1] + ";");
}

System.Console.WriteLine("");

for(j = 0; j < 12; j += 2)
{
    System.Console.WriteLine("x2[" + j/2 + "] = " + coord2[j] +
                             "; y2[" + j/2 + "] = " + coord2[j+1] + ";");
}

The output (formatted for your convenience) is:

x1[0] = 183.09248844949; y1[0] = 380.578107736064;
x1[1] = 127.52248844949; y1[1] = 97.1831077360639;
x1[2] = 143.84; y1[2] = 388.275;
x1[3] = 88.27; y1[3] = 104.88;
x1[4] = 104.58751155051; y1[4] = 395.971892263936;
x1[5] = 49.0175115505098; y1[5] = 112.576892263936;

x2[0] = 183.09248844949; y2[0] = 336.305973244331;
x2[1] = 127.52248844949; y2[1] = 52.9109732443305;
x2[2] = 143.84; y2[2] = 388.275;
x2[3] = 88.27; y2[3] = 104.88;
x2[4] = 104.58751155051; y2[4] = 440.244026755669;
x2[5] = 49.0175115505098; y2[5] = 156.849026755669;

Note that X-coordinates are calculated correctly.

The picture: http://www.antigrain.com/cs_bug.gif

The brown line segment is the central one. The green lines are what I
expect and what CalcTripleBond1() produces. The red lines are the
result of CalcTripleBond2().
You can verify this by calling any MoveTo/LineTo API with these
coordinates, like this:

int i;
for(i = 0; i < 6; i += 2)
{
    MoveTo(x1[i], y1[i]);
    LineTo(x1[i+1], y1[i+1]);

    MoveTo(x2[i], y2[i]);
    LineTo(x2[i+1], y2[i+1]);
}

This result cannot be explained by loss of accuracy when converting
float64 to float80 and back. There's something wrong with how the
variables are used. Period.
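
One further check on the numbers above, offered as an inference from the printed output and not as anything confirmed in this thread. The X coordinates (which use dy) are right and only the Y coordinates (which use dx) are wrong, and the wrong Y values look consistent with dd / dx having been computed where dx / dd was intended. The sketch below reproduces both variants so the results can be compared with the posted y1[0] and y2[0]. The class ReciprocalCheck, the method ShiftedY1 and the swapOperands flag are hypothetical names used only for this illustration; they say nothing about what the JIT actually emitted.

```csharp
using System;

static class ReciprocalCheck
{
    // Returns the shifted Y1 of the first parallel segment (coord[1] += w * dx).
    // swapOperands = true models the HYPOTHETICAL fault: dd / dx instead of dx / dd.
    public static double ShiftedY1(double w, double x1, double y1,
                                   double x2, double y2, bool swapOperands)
    {
        double dx = x2 - x1;
        double dy = y2 - y1;
        double dd = Math.Sqrt(dx * dx + dy * dy) * 0.5;
        double scaled = swapOperands ? dd / dx : dx / dd;
        return y1 + w * scaled;
    }

    static void Main()
    {
        // Input values and w = 20 are taken from the post above.
        double x1 = 2.8768 * 50, y1 = 7.7655 * 50;
        double x2 = 1.7654 * 50, y2 = 2.0976 * 50;

        // Compare with the posted y1[0] = 380.578107736064 (CalcTripleBond1).
        Console.WriteLine(ShiftedY1(20, x1, y1, x2, y2, false));

        // Compare with the posted y2[0] = 336.305973244331 (CalcTripleBond2, Release).
        Console.WriteLine(ShiftedY1(20, x1, y1, x2, y2, true));
    }
}
```

If the second value lines up with the posted y2[0], that points at a division with reversed operands and not at a variable mix-up, but only the generated x86 code could settle it.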

McSeem
http://www.antigrain.com

Jim Hogg

Dec 12, 2002, 11:35:57 AM

You're right. Diffs this large had to be a JIT bug -- probably in one of
the optimization passes over the AST, prior to code emission. And such bugs
are indeed evil. (First time I hit one, long, long ago, was in a mainframe
FORTRAN compiler. Shook my faith in human nature. Intensely frustrating --
especially the kind that evaporate and reappear as you make minor tweaks to
the code. Yet building a correct batch compiler is child's play compared
with an IL-JIT: multi-threaded, optimizing, resource-constrained, GC-aware,
debugger-friendly, and injecting security stack walks. Not an excuse, just
a digression -- feel free to ignore)

JIT bugs are ultra-serious -- but rare. We fix each one reported or
discovered. As well as hand-written regression suites, we run program
transform tools that generate bizarre IL, as never output by any HLL
compiler, looking for holes. And, as Reid already said, this one does not
repro on V1.1. (I double-checked on my laptop, running a VS .NET 2003 build
from a month or two ago, and it's good).

Anyhow, bottom line, I should have analyzed your code and results more
carefully before jumping in. Apologies for the offense my reply seems to
have caused.

Jim [MS]

Maxim Shemanarev

Dec 13, 2002, 11:56:55 AM

Thanks Jim,

And please forgive my aggressive tone in the previous messages. I
am used to trusting compilers, and suspecting a compiler bug was my
last resort. But when it actually happens it's extremely
nasty. I spent two days trying to figure out what was wrong in my
code.

> You're right. Diffs this large had to be a JIT bug -- probably in one of
> the optimization passes over the AST, prior to code emission. And such bugs
> are indeed evil. (First time I hit one, long, long ago, was in a mainframe
> FORTRAN compiler. Shook my faith in human nature. Intensely frustrating --
> especially the kind that evaporate and reappear as you make minor tweaks to
> the code. Yet building a correct batch compiler is child's play compared
> with an IL-JIT: multi-threaded, optimizing, resource-constrained, GC-aware,
> debugger-friendly, and injecting security stack walks. Not an excuse, just
> a digression -- feel free to ignore)

Not at all. But on the other hand Microsoft took the responsibility to
implement all of it. The .Net technology is great, but what is the
good of it if I cannot rely on the generated code? One can tolerate
bugs in the DevStudio, even bugs when a compiler reports an internal
error, but never bugs in the generated code.

> Anyhow, bottom line, I should have analyzed your code and results more
> carefully before jumping in. Apologies for the offense my reply seems to
> have caused.

No problem, sorry again for my tone.
Still, how do you think I should deal with the problem?
The options are:

1. Use the function that works correctly (it makes me feel very
uncomfortable, though).
2. Report the bug to MS officially and wait for a possible
patch/service pack (I always want to say "six-pack" :-).
3. Use FW 1.1 now (AFAIK it's only a beta).

McSeem
http://www.antigrain.com