TIA
You can, sort of, sometimes, do this with compiled code!
What you need to verify is that all your important variables stay on the
fp stack at all times. If they do, then you can use a small inline asm
snippet to switch the fpu into 80-bit mode, then run the same compiled
code on it.
I.e. all the normal operations (FADD/FSUB/FMUL/FDIV/FSQRT etc) work in
whatever precision you've set the chip to use. 80-bit is the default
after a reset, but most C compilers will switch to 64-bit mode for
better IEEE compatibility.
However, if you ever need to load/store any 80-bit floats, then you'll
need to use at least inline asm for this, with 10-byte (or 16-byte for
alignment) buffers to hold the values.
Terje
--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
Change the PC field in the FPU Control Word to 11(binary)
See http://www.website.masmforum.com/tutorials/fptute/fpuchap1.htm#cword
Here is a software emulation of both 80- and 128-bit floating point,
written in C:
On April 5, 2006, I posted a write-up on the function of
the precision bits in this newsgroup.
It seems that no one has read that post since the same
misinformation is being repeated in two earlier posts in
this thread.
I am therefore submitting a longer version of my write-up
in which I explain the three floating point formats and the
(limited) role of the precision bits. There also were some
minor errors in my earlier posting, which I have corrected.
I have included a summary of programming tips for extended
double precision programming for the O/P.
INTEL FPU FLOATING POINT FORMATS
--------------------------------
The Intel FPU supports operations in single precision, double
precision and extended double precision. The formats are
<= most significant                 least significant =>
 s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm     (single precision)

DATA TYPE            SIGN  EXPONENT  MANTISSA   BIAS
Single Precision       1       8        23       127
Double Precision       1      11        52      1023
Extended Precision     1      15        64     16383
In the single and double precision formats, the leading "one"
bit of the mantissa is implied (i.e. omitted), but in the ten
byte format it is stored explicitly, so the 64 mantissa bits
consist of that leading bit plus 63 fraction bits.
The exponent is biased by always making it positive (or zero)
by adding the bias, presumably because this saves a sign bit.
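The layout above can be checked by decoding a ten byte value by hand.
The following is only a minimal sketch in portable C, not anything from
the FPU itself: the function name ext80_to_double is made up for this
example, it assumes the bytes are stored little-endian as in the NASM
listings further down, and it handles only zero and normal numbers
(no Inf, NaN or denormals). The result is rounded to double.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a 10-byte little-endian extended
   precision value into a double (zero and normal numbers only). */
double ext80_to_double(const unsigned char b[10])
{
    assert(b != NULL);

    /* Bytes 0..7 hold the 64-bit mantissa, explicit leading bit included. */
    uint64_t mant = 0;
    for (int i = 7; i >= 0; i--)
        mant = (mant << 8) | b[i];

    /* Bytes 8..9 hold the 15-bit biased exponent and the sign bit. */
    int sign = b[9] >> 7;
    int biased = ((b[9] & 0x7F) << 8) | b[8];

    if (mant == 0)
        return sign ? -0.0 : 0.0;

    /* value = mant * 2^(biased - 16383 - 63); scale in exact steps. */
    double v = (double)mant;          /* rounds 64 bits to 53 bits */
    int e = biased - 16383 - 63;
    while (e > 0) { v *= 2.0; e--; }
    while (e < 0) { v *= 0.5; e++; }
    return sign ? -v : v;
}
```

Feeding it the bytes 0xAB,0xAA,...,0xAA,0xFD,0x3F should give a value
close to 1/3: the exponent field 0x3FFD is 16381, i.e. 2^(-2) after
removing the bias, and the mantissa is about 1.3333.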
FPU CONTROL WORD
----------------
The FPU control word contains the precision control bits and
the exception control bits.
The lower bits in the FPU CONTROL WORD are
0 invalid operation
1 denormalized operand
2 zerodivide
3 overflow
4 underflow
5 precision
6 RESERVED
7 interrupt enable, 8087 only
8-9 precision control
10-11 rounding control
12 infinity control
13-15 unused
The lowest six bits are the exception mask bits, which
control the action of the FPU if an exception occurs.
The exception control bits are said to be "unmasked" when the
bit is set to 0 and "masked" when it is set to 1.
If an exception is masked, then the processor "takes a default
internal action" which means that execution merrily continues...
almost certainly infecting your data with Infs and NaNs;
if an exception is unmasked, then a software exception handler
is called. I don't know exactly how this works, and the handler
may be called later, possibly MUCH later. I would stay away from
OS generated interrupts.
The meaning of the Precision Control Bits is 00 for "single
precision", 10 for "double precision" and 11 for "extended
double precision"; the coding 01 is reserved.
The codings of the Rounding Control Bits are 00 for "round
to nearest or even", 01 for "rounding towards minus infinity",
10 for "rounding towards plus infinity" and 11 for "rounding
towards zero" (truncation).
The FINIT instruction initializes the FPU and sets the
Control Word to 0x037F:

    0000 0011 0111 1111
    uuui rrpp iree eeee

(u = unused, i = infinity control / interrupt enable,
r = rounding control / reserved, p = precision control,
e = exception mask), which means that all exceptions are
masked, precision control (pp) is set to extended double
precision and rounding is set to nearest or even.
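To make the field positions concrete, here is a small sketch in
portable C that splits a control word value into the fields listed
above. It only does the bit arithmetic on an integer; it does not read
the real control word (that would need FSTCW). The struct and function
names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the control word fields described above. */
typedef struct {
    unsigned exception_masks;    /* bits 0-5, 1 = masked            */
    unsigned precision_control;  /* bits 8-9, 3 = extended double   */
    unsigned rounding_control;   /* bits 10-11, 0 = nearest or even */
    unsigned infinity_control;   /* bit 12                          */
} x87_control_fields;

/* Split a 16-bit control word value into its fields. */
void decode_control_word(uint16_t cw, x87_control_fields *out)
{
    assert(out != NULL);
    out->exception_masks   = cw & 0x3Fu;
    out->precision_control = (cw >> 8) & 0x3u;
    out->rounding_control  = (cw >> 10) & 0x3u;
    out->infinity_control  = (cw >> 12) & 0x1u;
}
```

For the FINIT value 0x037F this gives exception_masks = 0x3F (all six
masked), precision_control = 3 and rounding_control = 0.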
FPU STATUS WORD
---------------
The bits in the FPU Status Word reflect the results of FPU
arithmetic operations, much like the flags on the CPU; the
fields are:
0 invalid operation
1 denormalized operand
2 zerodivide
3 overflow
4 underflow
5 precision
6 STACK FAULT
7 interrupt request 8087, otherwise error summary status
8-10,14 condition code bits (a long story)
We are not concerned here with the higher bits which relate
to arithmetic condition codes, much like the flags on the CPU.
Note that bit 6 in the CONTROL WORD is reserved, obviously
because stack faults should never be masked, since they are
programmer errors, not the result of "unlucky data".
Also note that the INEXACT bit in the FPU status word is set
when rounding takes place as with 1./3. but not with 1./2.
One more thing: The FPU exception status codes are "sticky"
and will remain set until someone clears FPU exceptions using
FCLEX or resets the FPU with FINIT.
The condition code bits reflect the outcome of arithmetic
comparison instructions but that is a long story.
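The status word can be picked apart the same way. The sketch below is
portable C operating on a plain integer (a real program would obtain
the value with FNSTSW). The function names are made up for this
example; the names C0..C3 for bits 8, 9, 10 and 14 follow Intel's
usual naming.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if status bit `bit` (0-7 in the table above) is set. */
int x87_status_flag(uint16_t sw, unsigned bit)
{
    assert(bit <= 7);
    return (sw >> bit) & 1;
}

/* Packs the condition code bits into one value, ordered C3 C2 C1 C0. */
unsigned x87_condition_code(uint16_t sw)
{
    unsigned c0_to_c2 = (sw >> 8) & 0x7u;  /* bits 8-10: C0, C1, C2 */
    unsigned c3 = (sw >> 14) & 0x1u;       /* bit 14: C3            */
    return (c3 << 3) | c0_to_c2;
}
```

A status word of 0x0020 has only the precision (inexact) flag set,
so x87_status_flag(0x0020, 5) returns 1.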
PRECISION BITS
--------------
There is a lot of misinformation about the precision bits in
the FPU Control Word.
Unlike persistent claims in some earlier posts, the precision
bits do not affect the standard algebraic operations addition
or subtraction (FADD[P], FSUB[R][P]), and multiplication
(FMUL[P]), which are always carried out in extended precision,
but only floating point division (FDIV[R][P]).
Of the transcendental functions the precision control affects
only the square root FSQRT but none of the other transcendentals
FCOS, FSIN, FSINCOS, FPTAN (tan(x)), FPATAN (atan(y/x)),
FYL2X (y*log2(x)), FYL2XP1 (y*log2(x+1)) and F2XM1 (2^x-1).
The reason for this is obviously that Intel's internal algorithms
for division and square root use iterative methods (presumably
preceded by some preprocessing in which case selection, i.e.
interval selection, is carried out), whereas the algorithms for
the others do not.
Normally the FPU is initialized with the FINIT instruction,
which as discussed sets the precision to 64 bits, among other things.
Note that different versions of Windows initialize the FPU to
different states, presumably on bootup or maybe even when starting
a program.
To prevent any interference from the O/S, one should set the
precision to the required value, which for most applications would
be extended double precision.
Note that the rounding mode is set to round to nearest by default.
The only exception I know of is Jonathan Bayley's DD/QD package,
which supports double-double and quad-double precision in software
using "ordinary multiplication". Here the rounding is controlled
by the user and the FPU is always set to round down.
In summary, the precision bits are largely a relic: Expect no
great savings from setting the precision bits to lower precision
unless you must compute millions of divisions and/or square
roots and are really sure that you want the reduced precision.
Intel IEEE Incompatibility
--------------------------
It is not possible to force the FPU to compute in single or double
precision with FADD, FSUB and FMUL. Intel has been attacked over
this issue by manufacturers of other (usually less accurate) micro-
processors since calculations on Intel FPU's will be more accurate
in cases where the compiler maintains intermediate results or
"stack" variables on the FPU stack.
Therefore, some compilers have added an option (solely for this
purpose) whereby every intermediate result is stored to memory in
the requested format and reloaded using the following aberration:
fmul ; Calculate something
fstp qword [eax] ; Force rounding to double precision
fld qword [eax] ; ...by storing and re-loading
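Seen from C, the store and reload pair amounts to passing each value
through a double. The sketch below illustrates this under some
assumptions: round_via_double is a made-up name, and whether
long double really is the ten byte format depends on the compiler and
target (with some compilers it is simply a double, in which case the
function changes nothing).

```c
#include <assert.h>
#include <stddef.h>

/* Round every element to double precision by storing it to a double
   and loading it back, mirroring the fstp qword / fld qword pair. */
void round_via_double(long double *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* volatile keeps the compiler from optimizing the store away */
        volatile double stored = (double)v[i];  /* store: rounds */
        v[i] = stored;                          /* reload: exact */
    }
}
```

Values such as 0.5 pass through unchanged, whereas 1/3 held in
extended precision loses its low mantissa bits.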
SOME NOTES ON EXTENDED PRECISION PROGRAMMING
============================================
Data Lay-Out and Alignment
--------------------------
Small integers (consult the respective formats) and fractions whose
numerator is a small integer and whose denominator is a power of two
are entirely representable in single precision and should be stored
as such:
section .data align=16 ; initialized data R/W
; Align extended double precision (10 byte) data at 16 bytes:
align 16
third: db 0xAB,0xAA,0xAA,0xAA,0xAA,0xAA,0xAA,0xAA,0xFD,0x3F
align 16
tenth: db 0xCD,0xCC,0xCC,0xCC,0xCC,0xCC,0xCC,0xCC,0xFB,0x3F
or alternatively:
align 16
third: db 0xAB,0xAA,0xAA,0xAA,0xAA,0xAA,0xAA,0xAA,0xFD,0x3F, 0,0,0,0,0,0
tenth: db 0xCD,0xCC,0xCC,0xCC,0xCC,0xCC,0xCC,0xCC,0xFB,0x3F, 0,0,0,0,0,0
...
align 4
half: db 0x00,0x00,0x00,0x3F ; 1/2 (float = single precision)
ten: db 0x00,0x00,0x20,0x41 ; 10 (float = single precision)
; the same data can also be entered as 32 bit words, e.g. half: dd 0x3F000000
; (use one form or the other, since a label may be defined only once)
epsilon: dd 0x25800000 ; 2^(-52) typical tolerance for double
delta: dd 0x1F800000 ; 2^(-64) typical tolerance for extended double
Note that whereas
half: dd 0x3F000000 ; 1/2 (single precision)
half: db 0x00,0x00,0x00,0x3F ; 1/2 (float = single precision)
are both acceptable, NASM does not (yet) allow entering ten byte
data in hexadecimal format, so that
third: db 0x3FFDAAAAAAAAAAAAAAAB
is invalid, and must be coded as
third: db 0xAB,0xAA,0xAA,0xAA,0xAA,0xAA,0xAA,0xAA,0xFD,0x3F
NASM also allows limited computation (assuming its rounding to
be correct)
third: dt 1./3.
Usage (NASM requires square brackets around memory operands):
fld tword [third]
fld dword [half]
Alignment using stack data
myfunc_asm:
; The following assumes 16 byte CPU stack alignment
%define n [ebp-0x04] ; long integer
%define u [ebp-0x08] ; address of some (array) variable
%define x [ebp-0x10] ; double
%define y [ebp-0x18] ; double
; leave an 8 byte hole here for alignment
%define xe [ebp-0x30] ; extended double aligned 16
%define ye [ebp-0x40] ; extended double aligned 16
%assign stackframe 0x40
push ebp ; save original base pointer
mov ebp, esp ; set up the new base pointer
sub esp, stackframe ; Make room for local variables
I am not sure whether it is necessary to align 10 byte data on
16 byte boundaries since I have been able to run code with
extended precision data aligned at 8 bytes, but it is probably
advisable. Maybe somebody else in this newsgroup has something
to say about this.
Loading and Storing Ten Byte Data
---------------------------------
The FPU can load and store floating point data in all three
formats using the FLD and FSTP instructions.
However, there are some discrepancies between operations on
single and double precision numbers on the one hand and ten byte
numbers on the other.
Ten byte floating point numbers can only be stored using FSTP,
which also pops the stack, whether desired or not, and there is
no FST instruction for ten byte data:
fld dword [eax]
fld qword [eax]
fld tword [eax]
fst dword [eax] ; 4 byte single precision
fst qword [eax] ; 8 byte double precision
fstp tword [eax] ; 10 byte extended double precision
Obviously, if a copy of the number is to remain on the stack,
AND a stack register is available, the stack can be duplicated
first:
fld st0 ; make a copy
fstp tword [eax] ; store and pop the stack
It is likewise not possible to operate on the stack top using a
floating point number stored at a memory location if that number
is in extended precision format.
So, whereas it is legal (and saves a stack register) to program
fmul dword [half]
instead of
fld dword [half]
fmul
it is not legal to code
fmul tword [third]
and one must use
fld tword [third]
fmul
Finally note that loading and storing ten byte numbers probably
requires two clock cycles instead of one.
Passing Ten-Byte Data to a function
-----------------------------------
Unless you have a compiler which supports the extended precision
format, you cannot pass ten byte numbers, except by hiding their
representation, by rendering them into 10 byte structures,
presumably padded to 16 bytes.
Compilers which support extended precision are Intel and Gnu
C/C++, and naturally they use the C long double declaration.
Watcom does not support ten byte numbers and Microsoft dropped
the format long ago with VC5. For compliance with the C and C++
standards, compilers must recognize the long double declaration,
which here is treated as double. This is a sorry state of affairs.
Allocating Memory
-----------------
You can call malloc from inside your ASM code but beware since
some implementations such as my VC6 return only an 8 byte aligned
pointer, therefore
extern _malloc
extern _free
...
mov eax, n ; number of array elements
shl eax, 4 ; multiply by 16 (10 byte padded to 16 bytes)
add eax, 8 ; (i) request 8 bytes more in case malloc returns align 8
push eax
call _malloc
mov ub, eax ; save original pointer for use with free
add eax, 8 ; (ii) in case address ends in 8, add 8
and eax, -16 ; (ii) now we align at 16
mov u, eax ; aligned pointer
add esp, 4 ; pop stack
...
mov eax, ub ; Original unaligned u returned by malloc call
push eax
call _free ; Free extended precision u
add esp, 4
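For comparison, the same over-allocate and round-up idea can be
sketched in C. This is an illustration only: aligned_block and
alloc_aligned16 are names invented here, and unlike the listing above
it requests 15 spare bytes rather than 8, so that it does not depend
on malloc returning an 8 byte aligned pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pair: the pointer to give to free() and the aligned one. */
typedef struct {
    void *raw;      /* as returned by malloc, pass this to free() */
    void *aligned;  /* rounded up to a 16 byte boundary           */
} aligned_block;

/* Allocate room for n_elems ten byte values padded to 16 bytes each. */
aligned_block alloc_aligned16(size_t n_elems)
{
    aligned_block blk = { NULL, NULL };
    assert(n_elems <= (SIZE_MAX - 15) / 16);   /* no size overflow */

    blk.raw = malloc(n_elems * 16 + 15);       /* 15 spare bytes */
    if (blk.raw != NULL) {
        uintptr_t p = (uintptr_t)blk.raw;
        blk.aligned = (void *)((p + 15) & ~(uintptr_t)15);
    }
    return blk;
}
```

As in the assembly version, the original pointer must be kept, since
only blk.raw may be handed to free().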
Now to the question of the original poster:
-------------------------------------------
To program in extended precision you must load and store data
in ten byte format using FLD TWORD and FSTP TWORD.
You must precompute all hard constants which cannot be
represented in single precision floating point format in
ten byte IEEE format, either in a small stand-alone ASM
program using the Intel FPU or by some other means
(for example some debuggers allow calculations),
and store the constants in your program as outlined above.
It is nice if your debugger allows display of the FPU stack
in hexadecimal.
You should probably align all ten byte data on 16 byte
boundaries by preceding blocks of ten byte data with an
align 16 directive and either pad each number to 16 bytes
using db 0,0,0,0,0,0 or by interleaving the numbers with
align 16 directives.
It is nice when you can keep all data in the eight registers
of the FPU stack but, in more complex cases where this is not
possible and you must use the CPU stack, it is probably best
to align the CPU stack pointer passed to the called function.
Unless the compiler supports stack alignment by the caller,
this is somewhat problematic since in this case any arguments
passed to the function will be at unpredictable offsets
following the alignment, depending on whether the original
address was aligned or not, but there are ways around this.
If you are passing ten byte data arguments to functions using
a compiler which recognizes the extended precision format, you
should probably request 16 byte stack alignment. I know that
GNU C++ allows this and probably Intel does as well.
You should probably always set the FPU precision to extended
by setting both precision control bits since there is almost
nothing to be gained by setting them to lower precision. This
is essential to get extended precision accuracy for FDIV and
FSQRT and irrelevant otherwise.
Use FINIT if you want extended precision and the default
exception setting. If you only want to load the control word,
use FLDCW; if you want to modify the existing control word,
use FSTCW followed by masking in the precision bits, followed
by FLDCW.
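The masking step between FSTCW and FLDCW is plain integer arithmetic
and can be sketched in C as follows. The function with_precision and
the PC_* constants are names made up for this example; the function
only computes the new control word value and leaves the actual
FSTCW/FLDCW to the surrounding assembly code.

```c
#include <assert.h>
#include <stdint.h>

/* Precision control codings (bits 8-9); 1 is reserved. */
enum { PC_SINGLE = 0, PC_DOUBLE = 2, PC_EXTENDED = 3 };

/* Return cw with only the precision control bits replaced by pc. */
uint16_t with_precision(uint16_t cw, unsigned pc)
{
    assert(pc == PC_SINGLE || pc == PC_DOUBLE || pc == PC_EXTENDED);
    return (uint16_t)((cw & ~0x0300u) | (pc << 8));
}
```

For example, a control word of 0x027F (double precision, everything
else as after FINIT) becomes 0x037F when PC_EXTENDED is requested; the
rounding control and exception masks are left untouched.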
In some cases you need the extended precision only within a
function so at the beginning of your function you copy the data
from 8 byte double to 16 byte aligned 10 byte extended double
and vice versa at the end of the function.
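Widening a double to the ten byte format is exact, which is what
fld qword followed by fstp tword does. The sketch below shows the bit
manipulation in portable C. It is an illustration under assumptions:
double_to_ext80 is a made-up name, double is assumed to be the 8 byte
IEEE format, and the output is little-endian as in the db listings
above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a double into 10 little-endian bytes. */
void double_to_ext80(double d, unsigned char out[10])
{
    assert(out != NULL);

    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);             /* raw IEEE double bits */
    unsigned sign  = (unsigned)(bits >> 63);
    unsigned exp11 = (unsigned)((bits >> 52) & 0x7FF);
    uint64_t frac  = bits & 0xFFFFFFFFFFFFFULL; /* 52 fraction bits */

    uint64_t mant;
    unsigned e;
    if (exp11 == 0 && frac == 0) {              /* zero */
        mant = 0; e = 0;
    } else if (exp11 == 0) {                    /* denormal: normalize */
        mant = frac << 11; e = 1 - 1023 + 16383;
        while (!(mant >> 63)) { mant <<= 1; e--; }
    } else if (exp11 == 0x7FF) {                /* Inf or NaN */
        mant = (1ULL << 63) | (frac << 11); e = 0x7FFF;
    } else {                                    /* normal: rebias */
        mant = (1ULL << 63) | (frac << 11);     /* explicit leading 1 */
        e = exp11 - 1023 + 16383;
    }

    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(mant >> (8 * i));
    out[8] = (unsigned char)(e & 0xFF);
    out[9] = (unsigned char)((sign << 7) | (e >> 8));
}
```

For 0.5 this yields seven zero bytes followed by 0x80,0xFE,0x3F, i.e.
a mantissa of 0x8000000000000000 and a biased exponent of 0x3FFE.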
You may have to review some system calls from your ASM code.
This is about all there is to extended precision programming.
Jen