Fcomp X86

0 views

Skip to first unread message

Lynne Pruskowski

unread,

Aug 3, 2024, 4:59:12 PM8/3/24

to sunmascrutmi

Note that the 16bit status word is stored to ax, but the test instruction is only looking at the upper 8 bits (ah). So 41h matches the C0 and C3 bits of the status word. Using an 8-bit test on ah avoids a slowdown on Intel CPUs from using 16-bit test with an imm16. (operand-size prefix changes the length of the rest of the instruction, the so called length-changing prefix decoder stall).

fld / fcomp / fld / fnstsw: That looks weird to me, too. I wondered if the goal was to have things like the denormal bit in the status word set based on the memory location, but C0 and so on set based on the fcomp. (This isn't what you get, because fld leaves the C0-3 undefined or set.)

Intel's insn reference manual says fld leaves C0 and C3 undefined, so the compiler that generated this code was depending on some specific behaviour. Maybe without an fwait, the status word wouldn't be updated from the fcomp yet? I haven't grokked the whole fwait thing. I haven't found (or looked hard for) an explanation of when you needed fwait on old CPUs, and when you didn't. As I understand it, you never do on P5 or later.

The 80x87 provides several instructions for comparing real values. The fcom, fcomp, fcompp, fucom, fucomp, and fucompp instructions compare the two values on the top of stack and set the condition codes appropriately. The ftst instruction compares the value on the top of stack with zero. The fxam instrution checks the value on tos and reports sign, normalization, and tag information.

Generally, most programs test the condition code bits immediately after a comparison. Unfortunately, there are no conditional jump instructions that branch based on the FPU condition codes. Instead, you can use the fstsw instruction to copy the floating point status register (see "The FPU Status Register" on page 785) into the ax register; then you can use the sahf instruction to copy the ah register into the 80x86's condition code bits. After doing this, you can can use the conditional jump instructions to test some condition. This technique copies C0 into the carry flag, C2 into the parity flag, and C3 into the zero flag. The sahf instruction does not copy C1 into any of the 80x86's flag bits.

Since the sahf instruction does not copy any 80x87 processor status bits into the sign or overflow flags, you cannot use the jg, jl, jge, or jle instructions. Instead, use the ja, jae, jb, jbe, je, and jz instructions when testing the results of a floating point comparison. Yes, these conditional jumps normally test unsigned values and floating point numbers are signed values. However, use the unsigned conditional branches anyway; the fstsw and sahf instructions set the 80x86 flags register to use the unsigned jumps.

The fcom, fcomp, and fcompp instructions compare st(0) to the specified operand and set the corresponding 80x87 condition code bits based on the result of the comparison. The legal forms for these instructions are

With no operands, fcom, fcomp, and fcompp compare st(0) against st(1) and set the processor flags accordingly. In addition, fcomp pops st(0) off the stack and fcompp pops both st(0) and st(1) off the stack.

With a single register operand, fcom and fcomp compare st(0) against the specified register. Fcomp also pops st(0) after the comparison.

With a 32 or 64 bit memory operand, the fcom and fcomp instructions convert the memory variable to an 80 bit extended precision value and then compare st(0) against this value, setting the condition code bits accordingly. Fcomp also pops st(0) after the comparison.

These instructions set C2 (which winds up in the parity flag) if the two operands are not comparable (e.g., NaN). If it is possible for an illegal floating point value to wind up in a comparison, you should check the parity flag for an error before checking the desired condition.

These instructions set the stack fault bit if there aren't two items on the top of the register stack. They set the denormalized exception bit if either or both operands are denormalized. They set the invalid operation flag if either or both operands are quite NaNs. These instructions always clear the C1 condition code.

The difference between fcom/fcomp/fcompp and fucom/fucomp/fucompp is relatively minor. The fcom/fcomp/fcompp instructions set the invalid operation exception bit if you compare two NaNs. The fucom/fucomp/fucompp instructions do not. In all other cases, these two sets of instructions behave identically.

The ftst instruction compares the value in st(0) against 0.0. It behaves just like the fcom instruction would if st(1) contained 0.0. Note that this instruction does not differentiate -0.0 from +0.0. If the value in st(0) is either of these values, ftst will set C3 to denote equality. If you need to differentiate -0.0 from +0.0, use the fxam instruction. Note that this instruction does not pop st(0) off the stack.

The fxam instruction examines the value in st(0) and reports the results in the condition code bits (see "The FPU Status Register" on page 785 for details on how fxam sets these bits). This instruction does not pop st(0) off the stack.

The 80x87 FPU provides several instructions that let you load commonly used constants onto the FPU's register stack. These instructions set the stack fault, invalid operation, and C1 flags if a stack overflow occurs; they do not otherwise affect the FPU flags. The specific instructions in this category include:

Fptan computes the tangent of st(0) and pushes this value and then it pushes 1.0 onto the stack. Like the fsin and fcos instructions, the value of st(0) is assumed to be in radians and must be in the range -2**63

This instruction expects two values on the top of stack. It pops them and computes the following:

st(0) = tan-1( st(1) / st(0) )

The resulting value is the arctangent of the ratio on the stack expressed in radians. If you have a value you wish to compute the tangent of, use fld1 to create the appropriate ratio and then execute the fpatan instruction.

This instruction affects the stack fault/C1, precision, underflow, denormal, and invalid operation bits if an problem occurs during the computation. It sets the C1 condition code bit if it has to round the result.

Fyl2x is useful for computing logs to bases other than two; fyl2xp1 is useful for computing compound interest, maintaining the maximum precision during computation.

Fyl2x can affect all the exception flags. C1 denotes rounding if there is not other error, stack overflow/underflow if the stack fault bit is set.

The fyl2xp1 instruction does not affect the overflow or zero divide exception flags. These exceptions occur when st(0) is very small or zero. Since fyl2xp1 adds one to st(0) before computing the function, this condition never holds. Fyl2xp1 affects the other flags in a manner identical to fyl2x.

The 80x87 FPU includes several additional instructions which control the FPU, synchronize operations, and let you test or set various status bits. These instructions include finit/fninit, fdisi/fndisi, feni/fneni, fldcw, fstcw/fnstcw, fclex/fnclex, fsave/fnsave, frstor, frstpm, fstsw/fnstsw, fstenv/fnstenv, fldenv, fincstp, fdecstp, fwait, fnop, and ffree. The fdisi/fndisi, feni/fneni, and frstpm are active only on FPUs earlier than the 80387, so we will not consider them here.

Many of these instructions have two forms. The first form is Fxxxx and the second form is FNxxxx. The version without the "N" emits an fwait instruction prior to opcode (which is standard for most coprocessor instructions). The version with the "N" does not emit the fwait opcode ("N" stands for no wait).

The finit instruction intializes the FPU for proper operation. Your applications should execute this instruction before executing any other FPU instructions. This instruction initializes the control register to 37Fh (see "The FPU Control Register" on page 782), the status register to zero (see "The FPU Status Register" on page 785) and the tag word to 0FFFFh. The other registers are unaffected.

The fwait instruction pauses the system until any currently executing FPU instruction completes. This is required because the FPU on the 80486sx and earlier CPU/FPU combinations can execute instructions in parallel with the CPU. Therefore, any FPU instruction which reads or writes memory could suffer from a data hazard if the main CPU accesses that same memory location before the FPU reads or writes that location. The fwait instruction lets you synchronize the operation of the FPU by waiting until the completion of the current FPU instruction. This resolves the data hazard by, effectively, inserting an explict "stall" into the execution stream.

These two instructions load the control register (see "The FPU Control Register" on page 782) from a memory location (fldcw) or store the control word to a 16 bit memory location (fstcw).

When using the fldcw instruction to turn on one of the exceptions, if the corresponding exception flag is set when you enable that exception, the FPU will generate an immediate interrupt before the CPU executes the next instruction. Therefore, you should use the fclex instruction to clear any pending interrupts before changing the FPU exception enable bits.

The fstenv/fnstenv instructions store a 14-byte FPU environment record to the memory operand specified. When operating in real mode (the only mode this text considers), the environment record takes the form:

You must execute the fstenv and fnstenv instructions with the CPU interrupts disabled. Furthermore, you should always ensure that the FPU is not busy before executing this instruction. This is easily accomplished by using the following code:

The fldenv instruction loads the FPU environment from the specified memory operand. Note that this instruction lets you load the the status word. There is no explicit instruction like fldcw to accomplish this.