Significant differences between x86 and arm64


Paul Scholz

Oct 25, 2022, 1:16:00 PM

I'm a complete beginner with SUAVE and wanted to start with the tutorials.
I have a MacBook Air with an M2 processor, which is an arm64 architecture. I have two Anaconda environments: one with a native arm64 Python 3.9 installation and one with an emulated x86 Python 3.9 installation. For example, OpenVSP only works with the emulated Python.

My problem is that the Regional Jet Optimization Tutorial says: From the default inputs, the terminal (or IDE output) should display an optimum of [0.57213139 0.75471681] which corresponds to a wing area of 58 m^2, and 7.5 km.

When I run the file without changing anything (SUAVE version 2.5.2), the outputs are:

using arm64 python
Optimization terminated successfully    (Exit mode 0)
            Current function value: 0.7187722797774287
            Iterations: 7
            Function evaluations: 53
            Gradient evaluations: 7
[0.91895522 0.95296716]
fuel burn =  [7187.72279777]
fuel margin =  [1.38439654]

using x86 python
Optimization terminated successfully    (Exit mode 0)
            Current function value: [0.71952612]
            Iterations: 3
            Function evaluations: 32
            Gradient evaluations: 3
[0.92523852 0.90766878]
fuel burn =  [7195.26117757]
fuel margin =  [1.47274825]

Not only are both outputs far from the solution given on the tutorial page, they also differ significantly from each other.

I hope you can help me with that.



Oct 28, 2022, 9:18:14 AM
I tested it again on a Windows 10 machine (Intel x86, 64-bit) and again got different results.
Optimization terminated successfully    (Exit mode 0)
            Current function value: 0.7163576531747451
            Iterations: 6
            Function evaluations: 48
            Gradient evaluations: 6
[0.96458058 1.00058291]
fuel burn =  [7163.57653175]
fuel margin =  [1.22688188]


Oct 28, 2022, 9:00:03 PM
Thanks Paul!

It's super interesting that you get very different results on all three. We've been seeing this more and more recently. We have automatic regressions that run on AppVeyor and we've been struggling to get them to match. We've narrowed this down to architectural differences and installation differences with NumPy/SciPy. I did check, though, and the Regional Jet Tutorial webpage is a little out of date, so ignore the webpage (the scripts are fine).

I don't have an answer or a solution. The basic numerical solvers, which we don't control, are the reason for the differences. We've tried different tolerances and precisions (the VLM is single precision by default) but the results still don't totally match between systems.

All that being said, at the conceptual level, does it matter that the fuel burn is 7195.26117757 vs 7163.57653175? That represents a 0.4% difference, or about 10 gallons of Jet A. If anyone believes that the results of any design tool (even extremely high-fidelity, supercomputer-level CFD) are accurate to within that margin, they are fooling themselves.
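As a quick check of that arithmetic, here is a minimal sketch in Python. The two fuel burn values are copied from the outputs in this thread. The unit (kg) and the Jet A density of 0.804 kg/L are assumptions made for illustration; neither is confirmed by SUAVE or by the posts above.

```python
# Fuel burn values reported in this thread (ASSUMED to be in kg).
fuel_x86_mac = 7195.26117757
fuel_windows = 7163.57653175

# ASSUMED typical Jet A density; the real value varies with temperature.
JET_A_DENSITY_KG_PER_L = 0.804
LITERS_PER_US_GALLON = 3.785411784

diff_kg = fuel_x86_mac - fuel_windows
rel_diff = diff_kg / fuel_x86_mac
diff_gallons = diff_kg / JET_A_DENSITY_KG_PER_L / LITERS_PER_US_GALLON

print(f"difference: {diff_kg:.2f} kg ({rel_diff:.2%})")
print(f"roughly {diff_gallons:.1f} US gallons of Jet A")
```

Under those assumptions the relative difference comes out a little above 0.4% and the volume close to 10 US gallons, which is consistent with the estimate in the post.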

So I would also be interested in a solution, just for consistency's sake. However, I wouldn't disregard the results just because one computer gives a different value. I especially wouldn't favor a Windows result vs a Linux result or vice versa.


Oct 28, 2022, 10:13:36 PM
Generically speaking, these kinds of differences can have several origins that make them frustrating if not impossible to track down and eliminate.

CPU manufacturer, compiler, and compiler settings (debug vs. release as well as others) can all make a difference.

For example, when doing floating point calculations with 64-bit doubles, Intel's legacy x87 floating-point unit actually uses an 80-bit representation internally, so intermediate results can carry extra precision before they are rounded back to 64 bits.  Other processor architectures do not do this, and neither does 64-bit x86 code that uses SSE2 instructions.

I would certainly expect differences between x86 and arm64.

Some compilers have flags to force formal IEEE 754 compliance (some 'robust' geometry algorithms will fail without it).  Microsoft provides the options /fp:fast, /fp:precise, and /fp:strict.  An optimizing compiler can introduce a fused multiply-add instruction that combines the two operations without any intermediate rounding.  That instruction will not be used in debug mode -- or perhaps with a different compiler that misses that particular optimization opportunity.
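The reason such compiler choices show up in the results is that floating point arithmetic is sensitive to the order and grouping of operations. The sketch below is not a fused multiply-add; it uses the simpler fact that addition is not associative as a stand-in for the same class of effect, in plain Python doubles.

```python
# Floating-point addition is not associative: regrouping the same
# three operands changes the last bit of the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right == right_to_left)
print(abs(left_to_right - right_to_left))
```

A difference in the last bit is harmless on its own, but an iterative solver or optimizer can amplify it into a different iteration count or a slightly different optimum.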

I believe Java includes strict IEEE 754 compliance as a part of the language spec -- but there is a performance penalty in raw number crunching because of it.  (I am not saying Java is slow, or other languages are fast.)  It is nice in that it guarantees consistency.

In my experience, these kinds of problems are much more prevalent with float instead of double.  Code-Eli (the curve-surface library behind OpenVSP's Bezier math) is all templated C++ code.  It can be easily run with any data type.  The unit tests are set up for float, double, and long double.  A long time ago, we even tested a quad-double library (it had lots of other problems).  The unit tests there are a nightmare of numeric comparisons -- trying to work around all the different results we get based on data type, CPU, compiler, etc.

Single precision floats are a nightmare.  Yes they take up less memory and are theoretically faster -- but their error is _huge_ compared to a double (and compared to relevant quantities).  I can't tell you how often I've seen algorithms fail with float -- not just subtle differences, but failure to converge, or failure to calculate something to anything resembling the correct result.
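To put a number on how much coarser float is than double, here is a small sketch using NumPy (an assumption; the post itself is about C++ code). It compares the machine epsilon of the two types and shows an addition that float32 silently drops.

```python
import numpy as np

# Relative spacing between adjacent representable numbers.
eps32 = float(np.finfo(np.float32).eps)   # 2**-23
eps64 = float(np.finfo(np.float64).eps)   # 2**-52

# Above 2**24, float32 can no longer represent every integer,
# so adding 1 is silently lost. float64 still resolves it.
big = 2.0 ** 24
lost_in_float32 = np.float32(big) + np.float32(1.0) == np.float32(big)
lost_in_float64 = np.float64(big) + np.float64(1.0) == np.float64(big)

print(eps32 / eps64)
print(bool(lost_in_float32), bool(lost_in_float64))
```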

One of the hardest things in OpenVSP (in terms of numerical precision) is the fact that we're dimensionless -- the user chooses the scale of the model.  This means that some people choose to model their aircraft in mm -- the span of a 747 is a very big number in mm.  Other people model their aircraft in m -- the span of a small drone is a very small number in m.  This leads to needing to be able to work with numbers for the 'same quantity' that vary by six orders of magnitude in practice.  Now you need to write (for example) a Newton's method solver that will project a 3D point onto a Bezier surface.  You're going to be calculating derivatives dXYZ/dUV that will vary by six orders of magnitude.  Doing this with floats would be an absolute disaster.

Scaling / shifting the problem before serious computation is an important step (that OpenVSP does not do enough).  We also don't have a good enough unit testing framework -- you should test the 'same' problems at a huge range of scales.  Instead, the developers usually model medium sized aircraft and we usually use feet -- so a certain range of scales gets tested much more frequently than what others would choose.

Shifting can be as important as scaling -- working at a point near the origin will provide small numbers by definition -- while working near the wingtip will provide much larger numbers.
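A minimal sketch of why shifting matters, again using NumPy float32 as a stand-in. The wingtip coordinate of 30000 model units and the offset of 0.0005 are hypothetical values chosen for illustration, not taken from OpenVSP.

```python
import numpy as np

offset = 0.0005  # hypothetical small geometric offset, in model units

# Near a hypothetical wingtip at 30000 units, the offset is below half
# the float32 spacing, so it is rounded away when added.
wingtip = np.float32(30000.0)
moved = wingtip + np.float32(offset)
recovered = float(moved - wingtip)

# Gap between adjacent float32 values at the wingtip coordinate.
gap_at_wingtip = float(np.spacing(wingtip))

# Shifting the origin to the wingtip first keeps the offset intact.
shifted_origin = np.float32(30000.0 - 30000.0)
shifted = float(shifted_origin + np.float32(offset))

print(gap_at_wingtip, recovered, shifted)
```

Working in coordinates relative to a nearby reference point keeps the numbers small, so the available digits are spent on the quantity of interest instead of on the distance from the origin.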

Watch out for quantities that vary greatly in scale. Integrating a trajectory might work great for many projects -- until you decide to work on a solar powered aircraft that flies for weeks or months (the rounding error of a 32-bit float holding one month in seconds is about 0.15 seconds; for an hour, it is about 0.00021 seconds).  What is your timestep?  What is your tolerance?
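The figures in that parenthesis can be reproduced with a short sketch. It assumes a 30-day month and uses the float32 unit roundoff (2^-24, half of machine epsilon) scaled by the elapsed time; the 0.1 s timestep is a hypothetical value for illustration.

```python
import numpy as np

SECONDS_PER_HOUR = 3600.0
SECONDS_PER_MONTH = 30 * 24 * SECONDS_PER_HOUR  # ASSUMES a 30-day month

# Relative rounding error of float32 (half of machine epsilon), scaled
# by the elapsed time in seconds.
unit_roundoff = 2.0 ** -24
err_month = SECONDS_PER_MONTH * unit_roundoff
err_hour = SECONDS_PER_HOUR * unit_roundoff

# A hypothetical 0.1 s timestep is smaller than half the float32
# spacing at one month, so adding it leaves the clock unchanged.
t = np.float32(SECONDS_PER_MONTH)
dt = np.float32(0.1)
stalled = bool((t + dt) == t)

print(f"{err_month:.2f} s, {err_hour:.5f} s, stalled={stalled}")
```

In other words, a float32 clock that has reached one month cannot advance by a tenth of a second at all, which is the practical meaning of the question about timestep and tolerance.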

If someone is going to write a serious numerical algorithm with single precision -- they should make sure it is trivial to swap in double precision through a preprocessor directive #define, templates / generic programming, or whatever it takes in their language.  This provides two advantages:

1) They can unit test and regression test float vs. double.  This will help convince them that floats are OK for their application.

2) They can measure the memory and performance 'gains' of using floats -- is all this hassle worth it?
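In Python, one way to get the same effect is to pass the precision in as a parameter. The sketch below is only an illustration of that idea; `accumulate_time` is a hypothetical helper, not part of SUAVE or OpenVSP, and the step count and step size are arbitrary.

```python
import numpy as np

def accumulate_time(n_steps, dt, dtype=np.float64):
    """Sum n_steps increments of dt, doing all arithmetic in dtype."""
    t = dtype(0.0)
    step = dtype(dt)
    for _ in range(n_steps):
        t = t + step
    return float(t)

n_steps, dt = 100_000, 0.1
exact = n_steps * dt

# The same routine run at both precisions, compared against the
# directly computed product.
err32 = abs(accumulate_time(n_steps, dt, np.float32) - exact)
err64 = abs(accumulate_time(n_steps, dt, np.float64) - exact)
print(f"float32 error: {err32:.3g}, float64 error: {err64:.3g}")
```

Because the precision is a single argument, the same regression test can run at both precisions and report how far apart they drift, and the same switch makes it easy to time the two variants.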

Developing a reliable numeric algorithm using floats is going to take a lot more diligence and attention to esoteric details than doing the same thing with doubles.  Those skills aren't practiced and taught the way they were 30 years ago.  I suppose a code that has had all of these problems chased out of it with float will be an even better program with double -- by that logic, perhaps we should write and debug our programs with half-precision numbers so these issues become easier to find and life becomes more frustrating...



Oct 29, 2022, 5:34:39 AM
Thank you both for the detailed explanations!