Last week Budgemano posted an issue involving numerical inaccuracies
in Mathematica 7. Due to the Christmas season it hasn't received much
attention, or so it seems.
Meanwhile, I've been able to narrow the problem down to the following
code:
myMod := Module[{p},
  Dot[{0.5018839897852984567457648, 0.544293499215831476457645674576,
     0.67220312964800834576456745674567},
    {-0.5359693986690249, -0.41257352636613975, -0.736559494563862}]*
   {-0.5359693986690249, -0.41257352636613975, -0.736559494563862} -
  {0.5579131132707782, 0.2715072333003815, 0.7842300557273234}]
For[ii = 1, ii <= 250, ii++,
If[myMod =!= myMod,
Print["Iteration " <> ToString[ii] <> " fails comparison "]];]
As you can see, the module myMod is called twice every iteration. Its
calculations involve only constants, so the results should always be
the same. However, on my system (Mathematica 7 on Windows XP) the
results differ from one call to the other at seemingly random times.
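To see what is actually varying (rather than just that it varies), one
can tally the exact InputForm of repeated evaluations, e.g. (a quick
sketch):

(* count the distinct bit-for-bit results over many evaluations; note
   that how the values are collected may itself influence the outcome *)
Tally[Table[InputForm[myMod], {1000}]]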
This phenomenon crucially depends on the presence of the local variable
p (which isn't even used!). With the variable present, the problem
occurs; without it, it doesn't. In Mathematica 6 everything is OK, so
this is a bug specific to Mathematica 7.
Could you check on the various platforms whether you can reproduce
this?
Cheers -- Sjoerd
<snip>
> Could you check on the various platforms whether you can reproduce
> this?
Hi Sjoerd,
I have tested the code on two platforms. The results are as follows:
1. Mathematica 6.0.3 64-bit, Intel Mac OS X Leopard 10.5.6: neither the
first nor any subsequent evaluation of the loop yields any numerical
difference.
2. However, Mathematica 7.0 32-bit on Windows XP Pro SP3 generates
numerous *random* differences, run after run.
Best regards,
-- Jean-Marc
In[1]:= {$Version, $ReleaseNumber}

Out[1]= {"6.0 for Mac OS X x86 (64-bit) (May 21, 2008)", 3}

In[2]:= myMod := Module[{p},
  Dot[{0.5018839897852984567457648, 0.544293499215831476457645674576,
     0.67220312964800834576456745674567},
    {-0.5359693986690249, -0.41257352636613975, -0.736559494563862}]*
   {-0.5359693986690249, -0.41257352636613975, -0.736559494563862} -
  {0.5579131132707782, 0.2715072333003815, 0.7842300557273234}]

In[3]:= For[ii = 1, ii <= 250, ii++,
  If[myMod =!= myMod,
   Print["Iteration " <> ToString[ii] <> " fails comparison "]];]

(no failures are printed)
In[4]:= For[ii = 1, ii <= 250, ii++,
If[myMod =!= myMod,
Print["Iteration " <> ToString[ii] <> " fails comparison "]];]
{$Version, $ReleaseNumber}

{"7.0 for Microsoft Windows (32-bit) (November 10, 2008)", 0}

myMod := Module[{p},
  Dot[{0.5018839897852984567457648, 0.544293499215831476457645674576,
     0.67220312964800834576456745674567},
    {-0.5359693986690249, -0.41257352636613975, -0.736559494563862}]*
   {-0.5359693986690249, -0.41257352636613975, -0.736559494563862} -
  {0.5579131132707782, 0.2715072333003815, 0.7842300557273234}]

For[ii = 1, ii <= 250, ii++,
 If[myMod =!= myMod,
  Print["Iteration " <> ToString[ii] <> " fails comparison "]];]
Iteration 2 fails comparison
Iteration 8 fails comparison
Iteration 19 fails comparison
Iteration 21 fails comparison
Iteration 61 fails comparison
Iteration 150 fails comparison
Iteration 154 fails comparison
Iteration 162 fails comparison
Iteration 189 fails comparison
Iteration 201 fails comparison
Iteration 220 fails comparison
Iteration 224 fails comparison
Iteration 229 fails comparison
Iteration 230 fails comparison
Iteration 237 fails comparison
Iteration 249 fails comparison
For[ii = 1, ii <= 250, ii++,
If[myMod =!= myMod,
Print["Iteration " <> ToString[ii] <> " fails comparison "]];]
Iteration 7 fails comparison
Iteration 20 fails comparison
Iteration 36 fails comparison
Iteration 41 fails comparison
Iteration 52 fails comparison
Iteration 60 fails comparison
Iteration 64 fails comparison
Iteration 111 fails comparison
Iteration 114 fails comparison
Iteration 119 fails comparison
Iteration 143 fails comparison
Iteration 146 fails comparison
Iteration 162 fails comparison
Iteration 163 fails comparison
Iteration 164 fails comparison
Iteration 185 fails comparison
Iteration 236 fails comparison
Iteration 238 fails comparison
Iteration 239 fails comparison
Iteration 244 fails comparison
Iteration 245 fails comparison
Iteration 249 fails comparison
I also noticed that Block and With never give rise to the same problem,
so the error appears to be specific to Module, and probably to the part
of the internal code that creates local variables.
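For reference, here is a sketch of the variants compared (the constant
expression is factored out as expr for brevity; note that even this
refactoring could change the memory layout and hence the outcome):

expr := Dot[{0.5018839897852984567457648, 0.544293499215831476457645674576,
     0.67220312964800834576456745674567},
    {-0.5359693986690249, -0.41257352636613975, -0.736559494563862}]*
   {-0.5359693986690249, -0.41257352636613975, -0.736559494563862} -
  {0.5579131132707782, 0.2715072333003815, 0.7842300557273234}

myModModule := Module[{p}, expr]        (* shows the random failures *)
myModBlock  := Block[{p}, expr]         (* never fails here *)
myModWith   := With[{p = Null}, expr]   (* never fails here *)

For[ii = 1, ii <= 250, ii++,
 If[myModBlock =!= myModBlock,
  Print["Iteration " <> ToString[ii] <> " fails comparison "]];]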
Cheers.
> ....
Works fine on Windows Vista 64-bit.
Regards
Jens
I even tried bumping the number of iterations up to over 1 million, and
saw no difference on the Mac at all.
-Bob
> We shall discuss the following myths, among others: [...] Arithmetic
> operations are deterministic; that is, if I do z=x+y in two places in
> the same program and my program never touches x and y in the
> meantime, then the results should be the same.
from "The pitfalls of verifying floating-point computations"
http://hal.archives-ouvertes.fr/docs/00/28/14/29/PDF/floating-point-article.pdf
http://hal.archives-ouvertes.fr/hal-00128124
> Incidentally, some people think that the solution to such anomalies
> is never to compare floating-point numbers for equality, but instead
> to consider them equal if they are within some error bound E. This is
> hardly a cure-all because it raises as many questions as it answers.
> What should the value of E be? If x < 0 and y > 0 are within E,
> should they really be considered to be equal, even though they have
> different signs? Furthermore, the relation defined by this rule,
> a ~ b iff |a - b| < E, is not an equivalence relation, because a ~ b
> and b ~ c do not imply that a ~ c.
from "What Every Computer Scientist Should Know About Floating-Point
Arithmetic"
http://docs.sun.com/source/806-3568/ncg_goldberg.html
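Note that Mathematica itself draws this distinction: Equal (==) applies
a small tolerance when comparing machine reals, whereas SameQ/UnsameQ
(===, =!=), as used in the test above, compare exactly. A minimal
sketch:

r1 = 1.;
r2 = 1. + 2.^-52;       (* r1 plus one unit in the last place *)
{r1 == r2, r1 === r2}   (* -> {True, False} *)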
Luis.
First, the bug manifests itself as a variation at the same place in the
code - the loop goes round and the results vary - so I don't think this
caveat applies.
Furthermore, I think the reason z=x+y could generate slightly different
results at two different points in a C or Fortran program would be some
sort of reordering of the code by an optimiser, which would not apply
here.
I certainly think this is a bug, and I can even guess what may have
happened - see my reply to a later post on this issue!
David Bailey
http://www.dbaileyconsultancy.co.uk
I believe I can tell you exactly why this is happening. It is not
caused by Mathematica. A Google search shows that Mathematica uses the
Intel Math Kernel Library (M.K.L.). We use the M.K.L. in our programs
too, and we see the same behaviour.
The bad news is that it is a consequence of the M.K.L. and can't be
avoided. The good news is that you should not care about it: if the
difference seems important, then you are doing something wrong.
What I finally discovered is that Intel wrote two implementations of
many functions in the Basic Linear Algebra Subprograms (B.L.A.S.) part
of the M.K.L. One version is used when the memory blocks that store the
data begin at a memory address that is a multiple of 16; this version
is faster. The other version is used when the data does not begin at a
multiple of 16. (All memory blocks begin at an address that is a
multiple of 8.)
The algorithm is the same in the multiple-of-16 case as in the
non-multiple-of-16 case, but the code is different: the fast code uses
the SSE registers, while the slow code uses the FPU registers.
Programs do not have any choice about where memory blocks begin. When
you use new to get an array in C++, the computer picks some memory
block somewhere. Sometimes it begins at a memory address that is
divisible by 16; when such an array is used by the B.L.A.S. in the
M.K.L., the fast code is used. Other times the slow code is used,
because the memory address is not divisible by 16. In effect, memory
addresses are chosen at random by the computer's memory allocator.
The B.L.A.S. algorithm is always the same; the only difference is in
the lowest-order digits, because the FPU registers do not work exactly
the same way as the SSE registers. If you see big, important
differences between the two results, it means your results are garbage
either way, because of cancellation, an ill-conditioned algorithm, or a
sensitive matrix.
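For example, cancellation can turn a harmless last-bit difference into
a completely different answer. A minimal sketch:

a = 1.;
b = 1. + $MachineEpsilon;  (* the next machine number after 1. *)
(b - a)/a                  (* relative difference ~ 2.2*^-16: harmless *)
1/(b - a)                  (* ~ 4.5*^15: code that divides by such a
                              difference produces garbage *)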
As long as you use a program that uses the Math Kernel Library, you
will see these non-deterministic numerical inaccuracies whenever you
use linear algebra with floating-point numbers. This is not caused by
Mathematica; they are victims of Intel too!
On 64-bit Intel CPUs, floating-point arithmetic goes through the SSE
registers anyway, so there are not two different implementations of
these functions in the M.K.L., and the discrepancy does not appear on
64-bit systems. There is a separate M.K.L. library from Intel for
64-bit CPUs, and it is only produced for Linux and Windows, not for
the Mac.
But why does this happen only with Mathematica 7, and not with
Mathematica 5.2 or Mathematica 6?
Cheers -- Sjoerd
On Jan 2, 1:56 pm, fiamina...@gmail.com wrote:
> On Dec 30 2008, 4:53 am, "Sjoerd C. de Vries" <sjoerd.c.devr...@gmail.com> wrote:
<snip>
The test fails only on Mathematica 7.0, not on 6.0.3.
The test also fails on AMD 64 X2 dual-core 5200+ machines (just
tested).
The test fails only if the localized variable p is present.
So it is hard to believe this is exclusively Intel's fault.
Mathematica 7.0 must be doing something differently from 6.0.3 with
respect to the localized variables in Module.
It is good that WRI is investigating this.
A newer compiler was used for 7.
I work on a Linux platform that happens to show the two results. In a
debug kernel I was able to ascertain that this alignment issue (whether
the addresses are even or odd multiples of 8) is in fact the cause.
Specifically, we get two possible results for the following input.
myMod5 := Dot[{0.50188398978529847, 0.54429349921583148,
0.67220312964800832},
{-0.53596939866902493, -0.41257352636613975, -0.73655949456386205}]
As one can see below, myMod5 will give one of two results.
In[86]:= InputForm[myMod5]
Out[86]//InputForm= -0.988673145974262
In[87]:= InputForm[myMod5]
Out[87]//InputForm= -0.9886731459742619
Use of Module/Block, Table/Do, etc. is something of a red herring. They
do in fact affect the outcome, but I believe that is because they
affect the memory layout in some subtle way. For example, if I do
Do[Print[InputForm[myMod5]],{20}]
I see the less precise result (the one ending in ...262). If instead I
do
InputForm[Table[myMod5,{20}]]
I only see the more precise result (ending in ...2619).
But when I just enter
InputForm[myMod5]
(as above, In[86] and In[87]), I can get either result. Tracking in a
debugger revealed that the more precise result arises when one or both
addresses of the vectors are equal to 8 mod 16 (that is, aligned to 8
but not 16 bytes). The less precise result appears to happen only when
both vectors are aligned to 16 bytes. The code that produces these
results is ddot in the MKL.
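One way to check which of the two values is the better rounded is to
redo the dot product in higher precision and round only at the end (a
sketch):

u = {0.50188398978529847, 0.54429349921583148, 0.67220312964800832};
v = {-0.53596939866902493, -0.41257352636613975, -0.73655949456386205};
(* a 30-digit reference computation, rounded back to machine precision *)
InputForm[N[SetPrecision[u, 30].SetPrecision[v, 30]]]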
My own theory, apparently incorrect, was that the registers were all
the same but that stores to memory happened in ways that depended on
vagaries of register availability. That theory might, however, account
for the following similar item, which showed up in some private email I
had recently regarding platform-dependent differences in the logistic
map.
x = 0.052903130124606165;
On a 64 bit Linux OS:
In[24]:= InputForm[x = 4*x*(1 - x)]
Out[24]//InputForm= 0.2004175557905006
In[25]:= InputForm[x = 4*x*(1 - x)]
Out[25]//InputForm= 0.6410014364858487
On a 32-bit Linux OS:
In[23]:= InputForm[x = 4*x*(1 - x)]
Out[23]//InputForm= 0.20041755579050063
In[24]:= InputForm[x = 4*x*(1 - x)]
Out[24]//InputForm= 0.6410014364858488
This platform-dependent behavior is by no means a bug; it just shows
that IEEE arithmetic can give rise to different results depending on
the usage of extended-precision registers and intermediate stores. I do
not know whether the different results on a single platform, due to
alignment-dependent MKL behavior, are regarded as a bug by the
developers of that library.
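(In each pair above, the two printed values are adjacent machine
numbers; a quick check, as a sketch:)

a = 0.2004175557905006;   (* the 64-bit result *)
b = 0.20041755579050063;  (* the 32-bit result *)
InputForm[b - a]          (* 2.7755575615628914*^-17, i.e. 2^-55, one
                             ulp for numbers in [1/8, 1/4) *)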
Anyway, my thanks to several people in this thread for helpful
analysis of the issue. Even the things that turned out to be red
herrings seem interesting (and were by no means obviously off the
mark).
Daniel Lichtblau
Wolfram Research
I think we should care about it! Lots of programs have different
behaviours depending on a real number threshold, and this behaviour
means that there is the possibility of a test flipping different ways on
different runs of a program!
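In terms of the myMod5 example from earlier in the thread, the failure
mode looks like this (a sketch; the threshold is set to one of the two
observed values, so the strict comparison separates them):

threshold = -0.9886731459742619;   (* one of the two observed results *)
If[myMod5 < threshold, "branch A", "branch B"]  (* can flip run to run *)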
This could be a debugging nightmare, and will be in the back of people's
minds whenever anything 'strange' happens.
Would it not be possible to revert to the previous, slightly less
optimal, library until (presumably) the library is fixed?
David Bailey
http://www.dbaileyconsultancy.co.uk
First I should point out that the above remark was not mine. It was in a
note I had at least in part quoted. That said, I neither fully agree nor
disagree with it. If small machine arithmetic fuzz is bedeviling your
code, you almost certainly are doing something a bit unsafe.
> I think we should care about it! Lots of programs have different
> behaviours depending on a real number threshold, and this behaviour
> means that there is the possibility of a test flipping different ways on
> different runs of a program!
One has to be careful about choosing the threshold, etc. A few bits for
BLAS machine arithmetic tolerance seems not unreasonable to me.
> This could be a debugging nightmare, and will be in the back of people's
> minds whenever anything 'strange' happens.
>
> Would it not be possible to revert to the previous, slightly less
> optimal, library until (presumably) the library is fixed?
>
> David Bailey
> http://www.dbaileyconsultancy.co.uk
My understanding is that the library vendors do not regard it as broken,
so I am not optimistic that the library itself will change. We have made
some effort to align machine floats to 16 bytes, so this particular
behavior should stabilize in a future release.
I am not really competent to discuss the issues involved in going to a
different library. Maybe others can address that. I can only voice my
suspicion that it would be difficult, and offer dubious gain.
Daniel Lichtblau
Wolfram Research
>
> My understanding is that the library vendors do not regard it as broken,
> so I am not optimistic that the library itself will change. We have made
> some effort to align machine floats to 16 bytes, so this particular
> behavior should stabilize in a future release.
>
> I am not really competent to discuss the issues involved in going to a
> different library. Maybe others can address that. I can only voice my
> suspicion that it would be difficult, and offer dubious gain.
>
>
Surely aligning floats on 16 bytes would be only a partial fix, because
if an array A were packed and 16-byte aligned, A[[2 ;;]] would be only
8-byte aligned!
It seems to me that it was a wretched decision of the C compiler writers
to plant code that delivers different results - however insignificant -
depending on data alignment! I wonder if anyone has reported to them
that this difference exists!
I was not suggesting using a different library - only a previous version
- such as that used inside 6.0.3, which was presumably compiled with an
earlier compiler.
The mere fact that several people have discovered this problem suggests
to me that it will be an irritant.
David Bailey
http://www.dbaileyconsultancy.co.uk
That's true, or at least I should hope it is. But it will not matter in
terms of getting reproducible results, because it will always go through
the code that handles the non-16-byte-aligned case. The bad situation is
when we create packed tensors on-the-fly inside Dot (as we repeat the
same operation in a loop, say), and sometimes they are 16-byte-aligned
and other times not. We made a change to have them always 16-byte
aligned, so as to preclude this from happening in a future release.
> It seems to me that it was a wretched decision of the C compiler writers
> to plant code that delivers different results - however insignificant -
> depending on data alignment! I wonder if anyone has reported to them
> that this difference exists!
Not that it matters too much, but I don't think this is the compiler
writers' doing. It's in the BLAS library. I'm fairly certain they are
aware of the discrepancy.
> I was not suggesting using a different library - only a previous version
> - such as that used inside 6.0.3, which was presumably compiled with an
> earlier compiler.
I would surmise there might be speed losses associated with this. But
again, I'm just guessing. As for the software engineering aspect, it's
probably not so hard since it's the same library. This assumes the API
does not change in small ways between releases. I am not sure that this
is a safe assumption.
> The mere fact that several people have discovered this problem, suggests
> to me that it will be an irritant.
>
> David Bailey
> http://www.dbaileyconsultancy.co.uk
I imagine it is. But more at the level of a puzzlement than an
application killer.
Daniel Lichtblau
Wolfram Research