[Haskell-cafe] Possible floating point bug in GHC?

Peter Verswyvelen

Apr 3, 2009, 1:58:51 PM

to Haskell Cafe

For days I've been fighting a weird bug.
My Haskell code calls into a C function residing in a DLL (I'm on Windows,
the DLL is generated using Visual Studio). This C function computes a
floating point expression. However, the floating point result is incorrect.

I think I found the source of the problem: the C code expects all of the
x86 floating point register tag bits to be set to 1 (every register marked
empty), but it seems the Haskell code does not preserve that.

The x86 has all kinds of floating point
weirdness <http://www.informit.com/articles/article.aspx?p=770362>
(it is both a stack based and a register based system), so it is crucially
important that generated code plays nicely. For example, when using MMX one
must always emit an EMMS
instruction <http://msdn.microsoft.com/en-us/library/590b9ks9(VS.80).aspx>
to reset these tag bits, marking the registers empty again.

If I manually reset these tag bits, my code works fine.

Is this something other people encountered as well? I'm trying to make a
very simple test case to reproduce the behavior...

I'm not sure if this is a Visual C compiler bug, a GHC bug, or something I'm
doing wrong...

Is there some way to annotate a foreign imported C function to tell the
Haskell code generator that the function uses floating point registers?

Malcolm Wallace

Apr 3, 2009, 3:03:24 PM

to Haskell Cafe, cvs...@haskell.org

Interesting. This could be the cause of a weird floating point bug
that has been showing up in the ghc testsuite recently, specifically
affecting MacOS/Intel (but not MacOS/ppc).
http://darcs.haskell.org/testsuite/tests/ghc-regress/lib/Numeric/num009.hs

That test compares the results of the builtin floating point ops with
those of the same ops imported via the FFI. They should not differ, but on
Intel they sometimes do.
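
A minimal sketch of that kind of comparison follows. It is not the actual
num009 source; the names c_sin, viaBuiltin, viaFFI and agree, the sample
inputs and the tolerance are all assumptions made for illustration.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.Types (CDouble (..))

-- Import sin from the C math library through the FFI.
-- (c_sin is a name chosen for this sketch.)
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- The operation as provided by the Haskell Prelude.
viaBuiltin :: Double -> Double
viaBuiltin = sin

-- The same operation routed through the foreign call.
viaFFI :: Double -> Double
viaFFI = realToFrac . c_sin . realToFrac

-- Treat the two results as agreeing when they differ by at most a small
-- relative tolerance (the tolerance is an arbitrary choice for this sketch).
agree :: Double -> Bool
agree x = abs (a - b) <= 1e-12 * max 1 (abs a)
  where
    a = viaBuiltin x
    b = viaFFI x

main :: IO ()
main = mapM_ report [0, 0.5, 1, 2, 100]
  where
    report x =
      putStrLn $
        show x ++ ": builtin=" ++ show (viaBuiltin x)
          ++ " ffi=" ++ show (viaFFI x)
          ++ (if agree x then " ok" else " MISMATCH")
```

On a correctly working system every line should end in "ok"; a stale x87
register stack around the foreign call is the kind of thing that could
make a line report a mismatch.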

Regards,
Malcolm

Peter Verswyvelen

Apr 3, 2009, 3:32:10 PM

to Malcolm Wallace, cvs...@haskell.org, Haskell Cafe

Well, this situation indeed cannot occur on PowerPCs, since these CPUs just
have floating point registers, not some weird architecture that is sometimes
a stack and sometimes a set of registers.
But in my case the bug is consistent, not intermittent.

So I'll try to reduce this to a small reproducible test case, maybe
including the assembly generated by the VC++ compiler.

Zachary Turner

Apr 3, 2009, 3:55:14 PM

to Peter Verswyvelen, cvs...@haskell.org, Haskell Cafe, Malcolm Wallace

What floating point model is your DLL compiled with? There are a variety of
options here with regard to optimizations. I don't know the specific
assembly that each option produces, but I know there are options like
Strict, Fast, and Precise, and maybe each of them makes different
assumptions about the caller. That doesn't say anything about whose "fault"
it is, but it might at least be helpful to know whether changing the
floating point model makes the bug go away.

Peter Verswyvelen

Apr 3, 2009, 4:10:42 PM

to Zachary Turner, cvs...@haskell.org, Haskell Cafe, Malcolm Wallace

I tried both precise and fast, but that did not help. Compiling to SSE2
fixed it, since that does not use a floating point stack I guess.
I'm preparing a repro test case, but it is tricky since removing code tends
to change the optimizations and then the bug does not occur.

Does anybody know what the calling convention for floating point values is
for cdecl on x86? The documentation says that the result is returned in
st(0), but it says nothing about the floating point tags. I assume that
every function expects the FP stack to be empty on entry, apart from any
argument values. But GHC calls the C function with some FP registers still
marked as in use on the stack...

Ian Lynagh

Apr 3, 2009, 4:35:29 PM

to Peter Verswyvelen, cvs...@haskell.org, Malcolm Wallace, Haskell Cafe

On Fri, Apr 03, 2009 at 10:10:17PM +0200, Peter Verswyvelen wrote:
> I tried both precise and fast, but that did not help. Compiling to SSE2
> fixed it, since that does not use a floating point stack I guess.

You didn't say what version of GHC you are using, but it sounds like
this might already be fixed in 6.10.2 by:

Tue Nov 11 12:56:19 GMT 2008 Simon Marlow <marl...@gmail.com>
* Fix to i386_insert_ffrees (#2724, #1944)
The i386 native code generator has to arrange that the FPU stack is
clear on exit from any function that uses the FPU. Unfortunately it
was getting this wrong (and has been ever since this code was written,
I think): it was looking for basic blocks that used the FPU and adding
the code to clear the FPU stack on any non-local exit from the block.
In fact it should be doing this on a whole-function basis, rather than
individual basic blocks.
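
To illustrate the difference the patch describes, here is a toy model in
Haskell. It is only a sketch: the Block type and the two functions are
hypothetical and are not GHC's actual code generator data structures.

```haskell
module Main where

-- Toy model of one basic block (hypothetical; not GHC's real types).
data Block = Block
  { blockName     :: String
  , usesFPU       :: Bool  -- the block contains x87 instructions
  , exitsFunction :: Bool  -- the block ends in a non-local exit
  }

-- Per-block rule, as the patch describes the old behaviour:
-- clear the FPU stack only on exits from blocks that use the FPU themselves.
clearPerBlock :: [Block] -> [String]
clearPerBlock bs = [blockName b | b <- bs, usesFPU b, exitsFunction b]

-- Whole-function rule: if any block uses the FPU,
-- clear the FPU stack on every non-local exit of the function.
clearWholeFunction :: [Block] -> [String]
clearWholeFunction bs
  | any usesFPU bs = [blockName b | b <- bs, exitsFunction b]
  | otherwise      = []

-- A function whose FPU work and whose exit live in different blocks.
example :: [Block]
example =
  [ Block "compute" True  False  -- uses the FPU, then jumps locally
  , Block "exit"    False True   -- no FPU use, but leaves the function
  ]

main :: IO ()
main = do
  putStrLn ("per-block:      " ++ show (clearPerBlock example))
  putStrLn ("whole-function: " ++ show (clearWholeFunction example))
```

In this model the per-block rule selects no exit at all for the example, so
the FPU stack would be left dirty when control leaves through "exit", while
the whole-function rule selects "exit".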

Thanks
Ian

Peter Verswyvelen

Apr 3, 2009, 4:47:25 PM

to Peter Verswyvelen, Zachary Turner, cvs...@haskell.org, Haskell Cafe, Malcolm Wallace

Ouch, what a waste of time on my side :-(
This bugfix is not mentioned in the "notable bug fixes"
here<http://haskell.org/ghc/docs/6.10.2/html/users_guide/release-6-10-2.html>

Since this is such a severe bug, I would recommend listing it :)

Anyway, I have a very small repro test case now. Will certainly test this
with GHC 6.10.2.

Peter Verswyvelen

Apr 3, 2009, 5:04:31 PM

to Peter Verswyvelen, Zachary Turner, cvs...@haskell.org, Haskell Cafe, Malcolm Wallace

Okay, I can confirm the bug is fixed in GHC 6.10.2.
It's insane that this bug did not cause more problems. Every call into any
C function that uses floating point could have been affected (OpenGL, BLAS,
etc.).