FPU (x87) code debugging.

R.Wieser

Aug 6, 2021, 1:19:29 PM
Moderator, Frank : I'm not sure if questions about the x87 FPU are permitted
here. If not then please just discard. If they are then please remove this
line. :-)

Hello all,

I've just been writing some basic code to parse a simple float, and realized
that I had no idea how to check if the x87 FPU was empty after I was done -
as a simple measure to check if my code cleaned up correctly.

I've been looking at using the ST bits in the FPU status word, but had to
find that they (unexpectedly) didn't end at zero after I'd done my thing :

minimal example:

fld1 ;Load
fld1

fstp st(2) ;Swap ST(0) and ST(1) <-- this is the culprit

fstp st(0) ;Discard
fstp st(0)

At this point all the ST bits are set, indicating a minus one, not zero.
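The effect of that sequence can be traced with a small Python model of the register stack (a simulation, not real FPU access; the class and method names are made up for illustration). It assumes FNINIT ran first, so TOP starts at 0, and follows the documented rules: FLD decrements TOP and then writes ST(0), FSTP ST(i) copies ST(0) into ST(i) and then pops, and ST(i) names physical register (TOP + i) mod 8.

```python
# Toy model of the x87 register stack: TOP plus one tag per physical register.
# Illustration only; it does not touch a real FPU.
EMPTY, VALID = "empty", "valid"


class X87Model:
    def __init__(self):
        # State after FNINIT: TOP = 0, all eight tags empty.
        self.top = 0
        self.tags = [EMPTY] * 8
        self.regs = [None] * 8

    def phys(self, i):
        # ST(i) names physical register (TOP + i) mod 8.
        return (self.top + i) & 7

    def fld(self, value):
        # Push: decrement TOP first, then store into the new ST(0).
        self.top = (self.top - 1) & 7
        self.regs[self.top] = value
        self.tags[self.top] = VALID

    def fstp(self, i):
        # Copy ST(0) into ST(i), then pop (tag old ST(0) empty, TOP + 1).
        src, dst = self.phys(0), self.phys(i)
        self.regs[dst] = self.regs[src]
        self.tags[dst] = VALID
        self.tags[src] = EMPTY          # for FSTP ST(0), src == dst ends empty
        self.top = (self.top + 1) & 7

    def is_empty(self):
        return all(t == EMPTY for t in self.tags)


fpu = X87Model()
fpu.fld(1.0)    # fld1
fpu.fld(1.0)    # fld1
fpu.fstp(2)     # fstp st(2)
fpu.fstp(0)     # fstp st(0)
fpu.fstp(0)     # fstp st(0)
print(fpu.top, fpu.is_empty())   # 1 True
```

In this model every tag ends up empty while TOP reads 1: FSTP ST(2) both pops and fills a third register, so there are three pops for two loads. TOP on its own therefore does not say whether anything was left behind; the tag word does.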

My questions at this point are:

1) Have I done anything wrong in the above ? I don't think so, but "you
never know" ....

2) How do I, for debugging purposes, check the FPU stack ?

Regards,
Rudy Wieser


Frank Kotler

Aug 6, 2021, 2:10:36 PM
On 08/06/2021 01:11 PM, R.Wieser wrote:
> Moderator, Frank : I'm not sure if questions about the x87 FPU are permitted
> here. If not than please just discard. If they are than please remove this
> line. :-)

Hi Rudy,
Consider the line removed.
I think x87 is on topic. If necessary, I so rule it. :)
I don't know the answer, though...
Best.
Frank

R.Wieser

Aug 6, 2021, 2:55:45 PM
Frank,

>> Moderator, Frank : I'm not sure if questions about the x87 FPU are
>> permitted
>> here. If not than please just discard. If they are than please remove
>> this
>> line. :-)
>
> Hi Rudy,
> Consider the line removed.

To be honest, I had forgotten all about you whitelisting (pardon me if that
isn't PC) people and assumed you would see the message before it would go
into the newsgroup. But hey, now I know I'm on your whitelist too. :-)

> I think x87 is on topic. If necessary, I so rule it. :)

I wasn't quite sure, as almost everything here is 16-bit assembly, and that
is from a time when x87 FPUs were add-on chips. But thanks.

> I don't know the answer, though...

No problem. Hopefully someone else here has an idea.

Regards,
Rudy Wieser


DJ Delorie

Aug 6, 2021, 9:26:11 PM
"R.Wieser" <add...@nospicedham.not.available> writes:
> 2) How do I, for debugging purposes, check the FPU stack ?

If your debugger doesn't support it, you can at least use FSAVE/FRSTOR
to fill in a chunk of data which you can then inspect.
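FSAVE (like FNSAVE) reinitializes the FPU after storing, but an FRSTOR from the same buffer right afterwards reloads everything, so the pair can be used in the middle of a calculation. Once the image has been copied somewhere it can be printed, it can be decoded along the lines below. This is a sketch that assumes the 108-byte 32-bit protected-mode layout (status word at offset 4, tag word at 8, ST(0) to ST(7) from offset 28, 10 bytes each); the function names are hypothetical.

```python
import math
import struct

# Layout of the 108-byte FSAVE/FNSAVE image in 32-bit protected mode.
FSW_OFFSET = 4
FTW_OFFSET = 8
REGS_OFFSET = 28       # ST(0)..ST(7) in stack order, 10 bytes each
TAG_NAMES = {0: "valid", 1: "zero", 2: "special", 3: "empty"}


def decode_ext80(raw):
    """Convert a 10-byte little-endian 80-bit extended value to a float."""
    mantissa, sign_exp = struct.unpack("<QH", raw)
    sign = -1.0 if sign_exp & 0x8000 else 1.0
    exp = sign_exp & 0x7FFF
    if exp == 0x7FFF:
        # Infinity when only the explicit integer bit is set, else NaN.
        return sign * math.inf if mantissa == 1 << 63 else math.nan
    try:
        # The integer bit is explicit: value = mantissa * 2**(exp - 16383 - 63).
        # Denormals (exp == 0) use an effective exponent of 1.
        return sign * math.ldexp(mantissa, max(exp, 1) - 16383 - 63)
    except OverflowError:
        return sign * math.inf      # beyond the range of a Python float


def parse_fsave(image):
    """Return (TOP, all_empty, [(ST name, physical reg, tag, value), ...])."""
    fsw, = struct.unpack_from("<H", image, FSW_OFFSET)
    ftw, = struct.unpack_from("<H", image, FTW_OFFSET)
    top = (fsw >> 11) & 7
    stack = []
    for i in range(8):
        phys = (top + i) & 7                       # ST(i) -> R(phys)
        tag = TAG_NAMES[(ftw >> (2 * phys)) & 3]   # tag word is by physical reg
        raw = image[REGS_OFFSET + 10 * i:REGS_OFFSET + 10 * (i + 1)]
        value = None if tag == "empty" else decode_ext80(raw)
        stack.append((f"ST({i})", f"R{phys}", tag, value))
    return top, ftw == 0xFFFF, stack


# Synthetic image: one FLD1 after FNINIT (TOP = 7, only R7 in use).
image = bytearray(108)
struct.pack_into("<H", image, FSW_OFFSET, 7 << 11)
struct.pack_into("<H", image, FTW_OFFSET, 0x3FFF)             # R7 tag = 00 (valid)
struct.pack_into("<QH", image, REGS_OFFSET, 1 << 63, 0x3FFF)  # ST(0) = 1.0
top, empty, stack = parse_fsave(bytes(image))
print(top, empty, stack[0])
```

The full tag word holds two bits per physical register, with 11b meaning empty, so a value of FFFFh means the whole stack is empty regardless of TOP.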

R.Wieser

Aug 7, 2021, 3:56:42 AM
DJ,

> If your debugger doesn't support it

No debugger here (never liked them).

> you can at least use FSAVE/FRESTOR to fill in a chunk
> of data which you can then inspect.

Thanks. That one does give quite a bit of information.

It does have a drawback though: it re-initializes the FPU stack, meaning it
cannot be used while in the middle of a calculation. Any ideas for some
non-destructive probing ?

Regards,
Rudy Wieser


wolfgang kern

Aug 7, 2021, 4:41:47 AM
On 06.08.2021 19:11, R.Wieser wrote:

> 2) How do I, for debugging purposes, check the FPU stack ?

not every debug tool supports FPU. I had to write my own debugger
anyway and it uses FXSAVE to show registers and all status bits.

but how did you check 1) FSTCW ? FXAM/r ? FSTENV ? FSTSW AX ?
too many consecutive fstp will cause stack errors.

The FNSTSW instruction does not check for possible floating-point
exceptions before storing the image of the x87 status word.

FCLEX or FXSAVE followed by FINIT work fine for me to clean up.
and FFREE/r is my way to empty a specific register.

I actually hate this stupid stack-up/down design; plain ST(n) registers
would work just fine with far fewer questionable quirks.
meanwhile we got SSE/AVX, and AMD may remove the FPU from the chip soon.
__
wolfgang

wolfgang kern

Aug 7, 2021, 4:41:48 AM
FXSAVE
__
wolfgang

R.Wieser

Aug 7, 2021, 6:26:55 AM
Wolfgang,

> but how did you check 1)

I read the Status Word, using FNSTSW. From there I isolated the ST bits.

Thanks for mentioning FXAM. Something I already thought of being handy to
have, but didn't know the name of. :-)
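A sketch of decoding FXAM's result, assuming the status word has been fetched with FNSTSW AX straight after FXAM and handed to this (hypothetical) Python helper. FXAM reports the class of ST(0) in C3, C2 and C0, and its sign in C1. Note that it only examines ST(0), so an "empty" answer says nothing about the other seven registers.

```python
# Map (C3, C2, C0) to the FXAM class of ST(0), per the x87 condition-code table.
FXAM_CLASSES = {
    (0, 0, 0): "unsupported",
    (0, 0, 1): "NaN",
    (0, 1, 0): "normal finite",
    (0, 1, 1): "infinity",
    (1, 0, 0): "zero",
    (1, 0, 1): "empty",
    (1, 1, 0): "denormal",
}


def fxam_class(status_word):
    """Return (class of ST(0), sign bit C1) from a status word stored after FXAM."""
    c0 = (status_word >> 8) & 1
    c1 = (status_word >> 9) & 1     # sign of ST(0)
    c2 = (status_word >> 10) & 1
    c3 = (status_word >> 14) & 1
    return FXAM_CLASSES.get((c3, c2, c0), "reserved"), c1


print(fxam_class(0x4100))   # C3 and C0 set: ST(0) is empty
```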

> FCLEX or FXSAVE followed by FINIT work fine for me to clean up.
> and FFREE/r is my way to empty a specific register.

I already found (and used, just before the code I posted) FNINIT. But that
just drops all "left over" variables and error flags. Not something I want
to finish a calculation with ...

As for FFREE ? I'm not sure I understand its worth - other than to perhaps
delete the bottom-of-stack variable (and even then), as in all other cases
it would create a "hole" on the stack, which I then would still have to
reckon with. :-|

> I actually hate this stupid stack-up/dn design, an overall ST(n)
> would work just fine with much lesser doubtful quirks.

:-) Agreed. But as I have to work with what the 'puter offers me I have
no other choice than to deal with it.

[in regard to FSAVE]

>> It does have a drawback though: it re-initializes the FPU stack, meaning
>> it cannot be used while in the middle of a calculation. Any idea to some
>> non-destructive probing ?
>
> FXSAVE

Thanks again.


Blimey! I just realized (did some "that's quaint, what happens if I do
{this}" probing) that the "ST(x)" argument is relative to the "Stack top"
(status word, ST bits). In hindsight that makes sense, but wasn't expected.
It does make the "Stack Top" value useless for a quick "is it empty" test
though.

Regards,
Rudy Wieser


Robert

Aug 7, 2021, 10:27:10 AM
R.Wieser <add...@nospicedham.not.available> wrote in part:
> Moderator, Frank : I'm not sure if questions about the x87
> FPU are permitted here. If not than please just discard.
> If they are than please remove this line. :-)
>
> Hello all,
>
> I've just been writing some basic code to parse a simple
> float, and realized that I had no idea how to check if the
> x87 FPU was empty after I was done - as a simple measure
> to check if my code cleaned up correctly.

You will need FSAVE/FRSTOR (and variants) if you use
the x87. Your first FLD will clobber the stack top,
which might be OK only if it is empty.

>
> I've been looking at using the ST bits in the FPU status word, but had to
> find that they (unexpectedly) didn't end at zero after I done my thing :
>
> minimal example:
>
> fld1 ;Load
> fld1
>
> fstp st(2) ;Swap ST(0) and ST(1) <-- this is the culprit
>
> fstp st(0) ;Discard
> fstp st(0)
>
> At this point all the ST bits are set, indicating a minus one, not zero.

As another poster has said, I don't think the x87 automagically
sets value flags (as x86 does) and needs FXAM. FSTSW=FF sounds
like an empty x87.

>
> My questions at this point are:
>
> 1) Have I done anything wrong in the above ? I don't think
> so, but "you never know" ....
>
> 2) How do I, for debugging purposes, check the FPU stack ?

Dump and examine in main memory. Like the Hewlett-Packard
Reverse Polish Notation calculators it was modelled on,
the x87 is meant for crunching together, not picking apart.

-- Robert

R.Wieser

Aug 7, 2021, 11:57:17 AM
Robert,

> You will need FSAVE/FRSTOR (and varients)

Wolfgang gave that suggestion too. Alas, the F(N)SAVE resets the FPU stack,
and for some reason I can't get the FXSAVE to work (my assembler shows its
age by not knowing the opcode, and trying to use a "db 0Fh,0AEH, ....."
sequence crashes the program).

> Your first FLD will clobber the stack top,

I don't get that - why only the first one, and why would it clobber (the
value at) the stack top ?

> As another poster has said, I don't think the x87 automagically
> sets value flags

I don't quite get this either. Value flags ? I'm reading the "Status
Word" and in it look at the ST bits (at 11-13).
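For reference, the status word fields can be split out like this (a Python sketch; the dictionary keys follow the usual Intel field names). TOP sits in bits 13 to 11 and is a 3-bit physical register index that wraps modulo 8, so on its own it does not count how many registers are in use.

```python
def decode_fsw(sw):
    """Split an x87 status word (as stored by FNSTSW) into its fields."""
    return {
        "B":   (sw >> 15) & 1,    # FPU busy
        "C3":  (sw >> 14) & 1,
        "TOP": (sw >> 11) & 7,    # bits 13..11: physical index of ST(0)
        "C2":  (sw >> 10) & 1,
        "C1":  (sw >> 9) & 1,
        "C0":  (sw >> 8) & 1,
        "ES":  (sw >> 7) & 1,     # error summary
        "SF":  (sw >> 6) & 1,     # stack fault
        "PE":  (sw >> 5) & 1,     # precision
        "UE":  (sw >> 4) & 1,     # underflow
        "OE":  (sw >> 3) & 1,     # overflow
        "ZE":  (sw >> 2) & 1,     # divide by zero
        "DE":  (sw >> 1) & 1,     # denormal operand
        "IE":  sw & 1,            # invalid operation
    }


print(decode_fsw(0x3800)["TOP"])   # all three TOP bits set: ST(0) is R7
```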

Remark : I later found out/realized that the "Stack Top" is just the
starting offset for the ST(x) arguments. IOW : what's in it isn't really
relevant.

> Dump and examine in main memory.

:-) The problem was that I had no idea that I could, or how to do that.

Of course it didn't help that I got confused by (and thereby focused on) the
"Stack Top" value. :-\

Regards,
Rudy Wieser


Robert

Aug 7, 2021, 11:28:19 PM
R.Wieser <add...@nospicedham.not.available> wrote in part:
> Robert,
>> You will need FSAVE/FRSTOR (and varients)
> Wolfgang gave that suggestion too. Alas, the F(N)SAVE resets
> the FPU stack, and for some reason I can't get the FXSAVE to work
> (my assembler shows its age by not knowing the opcode, and trying
> to use a "db 0Fh,0AEH, ....." sequence crashes the program).

It might have some safeguards against executing data :)

>> Your first FLD will clobber the stack top,
>
> I don't get that - why only the first one, and why would
> it clobber (the value at) the stack top ?

The stack is eight FP registers, any load pushes the one on
the top into the bit bucket. Actually, I believe the registers
are a circular file, and the load overwrites and decrements TOS.

>> As another poster has said, I don't think the x87 automagically
>> sets value flags
>
> I don't quite get this either. Value flags ? I'm reading the
> "Status Word" and in it look at the ST bits (at 11-13).

Aren't those three bits (0-7) the Top-of-Stack pointer?
People sometimes compare the FPSW with the x86 flags register.
It is not.

> Remark : I later found out/realized that the "Stack Top"
> is just the starting offset for the ST(x) arguments. IOW :
> whats in it isn't really relevant.

Exactly.

>> Dump and examine in main memory.
>
> :-) The problem was that I had no idea that I could or how to do that .

Well, debugging always requires more space. x86 assumes
sufficient stack space (or switches to privileged memory).

34 years ago I wrote an extension to MS-DOS DEBUG.COM to
examine the x87. Converting binary FP to decimal FP was hard.

> Ofcourse it didn't help that I got confused by (and by it focussed on)
> the "Stack Top" value. :-\

Well, quite forgivable. The x87 is focussed on the stack.

-- Robert

R.Wieser

Aug 8, 2021, 4:13:45 AM
Robert,

> It might have some safeguards against executing data :)

I've used the "trick" before, so I don't think so. Currently I'm torn
between the possibilities that the processor I'm using might not support
that instruction, that I'm simply bungling it, or that there is some kind of
memory alignment involved (the latter one would not be the first time I've
run into it).

Is there any possibility you could take a look at and post what code gets
generated for an "FXSAVE {register pointer}" ?

>> I don't get that - why only the first one, and why would
>> it clobber (the value at) the stack top ?
>
> The stack is eight FP registers, any load pushes the one
> on the top into the bit bucket.

True. But such a push would only clobber anything if the (circular) stack
is completely full.

> Actually, I believe the registers are a circular file,

It has to be, as my example code works : after the second FLD1 the TOS is 6.
But I can still execute a FSTP ST(2) ,which seemingly points at 6+2 = 8.

> and the load overwrites and decrements TOS.

The info on, for instance, FLD mentions decrementing first, then storing
(which is why I didn't understand your "clobbering" remark).
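A toy Python model of the FLD behaviour described here (decrement TOP, then store), extended with what the documentation says happens when the target register is still in use: a stack overflow, with IE, SF and C1 set and, if the invalid-operation exception is masked, the real indefinite (a quiet NaN) loaded instead of the operand. The class is illustrative only.

```python
import math


class Stack:
    """Toy model of x87 loads: TOP plus eight registers (None = tagged empty)."""

    def __init__(self):
        self.top = 0                  # state after FNINIT
        self.regs = [None] * 8
        self.ie = self.sf = self.c1 = 0

    def fld(self, value):
        new_top = (self.top - 1) & 7          # decrement TOP first...
        if self.regs[new_top] is not None:
            # ...and if the new ST(0) is still in use, that is a stack overflow.
            # Masked response: flag it and load the real indefinite (a QNaN).
            self.ie = self.sf = self.c1 = 1
            value = math.nan
        self.top = new_top
        self.regs[new_top] = value            # ...then store into the new ST(0)


st = Stack()
for k in range(9):                            # one load more than the stack holds
    st.fld(float(k))
print(st.top, st.sf, st.regs[st.top])         # the 0.0 pushed first (in R7) is gone
```

So in this model a load only loses data when all eight registers are already in use, and even then the FPU flags it.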

> Aren't those three bits (0-7) the Top-of-Stack pointer?

Yep. I was assuming that that value would (implicitly) tell me how many
values were placed on the stack. Turns out it doesn't. :-\

> People sometimes compare the FPSW with the x86 flags register.
> It is not.

Similar perhaps (both contain status flags), but (of course) not the same.

> 34 years ago I wrote an extention to MS-DOS DEBUG.COM
> to examine the x87.

I'm not sure what you mean with an 'extension' (wasn't aware that Debug
supported such a thing), but years ago I wrote something for it (using
memory patching) so it could deal with a few more opcodes.

> Converting binaryFP to decimal FP was hard.

That's something I still have to take a look at. Just not at this moment.
:-)

Regards,
Rudy Wieser


Robert Prins

Aug 8, 2021, 5:28:56 AM
You still haven't told us what OS (DOS, Windoze, Linux) or CPU (32/64 bit)
you're running this code on....

David Lindauer's GRDB (DOS) can show the contents of FPU registers, and as you
are/were a Pascal user: so, I think, can Delphi. Virtual Pascal can definitely do
it; I use the (sadly) wrapping code below:

{************** Copyright (C) Robert AH Prins 2018-2018 ****************
* *
* This program is free software; you can redistribute it and/or modify *
* it under the terms of the GNU General Public License as published by *
* the Free Software Foundation; either version 3, or (at your option) *
* any later version. *
* *
* This program is distributed in the hope that it will be useful, *
* but WITHOUT ANY WARRANTY; without even the implied warranty of *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
* GNU General Public License for more details. *
* *
* You should have received a copy of the GNU General Public License *
* along with this program; if not, write to the Free Software *
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110, USA *
************************************************************************
+------------+---------------------------------------------------------+
| Date | Major changes |
+------------+---------------------------------------------------------+
| | |
+------------+---------------------------------------------------------+
| 2018-09-30 | Add x_int3 to selectively enable debug code |
+------------+---------------------------------------------------------+
| 2018-08-31 | Initial version |
+------------+---------------------------------------------------------+
************************************************************************
* DEBUG.PAS *
* *
* This unit contains some code that enables viewing of extended (XMM & *
* YMM) registers in various formats. *
***********************************************************************}
unit debug;

{============================} interface {=============================}
const x_int3: boolean = false;

type
r_fpu = record { 16}
st : extended;
zz : array [0..5] of byte;
end;

r_mmx = record { 16}
case integer of
1: (_by: array [0..7] of byte;
z1 : array [0..7] of byte);
2: (_in: array [0..3] of shortint;
z2 : array [0..7] of byte);
3: (_lo: array [0..1] of longint;
z3 : array [0..7] of byte);
4: (_si: array [0..1] of single;
z4 : array [0..7] of byte);
5: (_do: array [0..0] of double;
z5 : array [0..7] of byte);
6: (_ch: array [0..7] of char;
z0 : array [0..7] of byte);
end;

r_xmm = record { 16}
case integer of
1: (_by: array [0..15] of byte);
2: (_in: array [0.. 7] of shortint);
3: (_lo: array [0.. 3] of longint);
4: (_si: array [0.. 3] of single);
5: (_do: array [0.. 1] of double);
6: (_ch: array [0..15] of char);
end;

xsave_hdr = array [0..63] of byte; { 64}

fpu = array [0..7] of r_fpu; { 128}
mmx = array [0..7] of r_mmx; { 128}
xmm = array [0..7] of r_xmm; { 128}

xsptr = ^a_xs;
a_xs = record
case integer of
1: (legacy : array [0..159] of char; { 160} // raw legacy data
xmm_32 : xmm; { 128} // XMM0-7 (low part of YMM0-7)
xmm_64 : xmm; { 128} // XMM8-15 (low part of YMM8-15) (AMD64)
res_3 : array [0..95] of byte; { 96} // reserved: legacy area is 512 bytes
xsave_hdr: xsave_hdr; { 64} // Storage bitmap for additional data
ymm_32 : xmm; { 128} // YMM0-7 (high part, low in XMM0-XMM7)
ymm_64 : xmm); { 128} // YMM8-15 (high part, low in XMM8-XMM15) (AMD64)

2: (fcw : smallword; { 2} // x87 FPU control word
fsw : smallword; { 2} // x87 status word
ftw : byte; { 1} // x87 tag word (abridged)
res_1 : byte; { 1}
fop : smallword; { 2} // x87 last opcode
fip : longint; { 4} // x87 EIP
fcs : smallword; { 2} // x87 CS:
res_1_x64: smallword; { 2} // + previous: RIP (AMD64)
fdp : longint; { 4} // x87 data pointer
fds : smallword; { 2} // x87 DS:
res_2_x64: smallword; { 2} // + previous: DIP (AMD64)
mxcsr : longint; { 4} // SSE control word
mxcsr_msk: longint; { 4}

case integer of
3: (fpu: fpu); { 128} // x87 FPU registers
4: (mmx: mmx)); { 128} // x86 MMX registers

3: (raw : array [0..1023] of byte); { 1024} // just raw data
end;

procedure xsave;

{==========================} implementation {==========================}

{***********************************************************************
* XSAVE: *
* *
* Save the entire processor state for debugging purposes *
***********************************************************************}
procedure xsave; assembler; {&uses none} {&frame+}
var xs: array [0..2047] of char;
var xp: xsptr;

asm
//a-in xsave
cmp x_int3, true
jne @99

pushad

//------------------------------------------------------------------
// clear out save area
//------------------------------------------------------------------
lea edi, xs
xor eax, eax
mov ecx, type xs / 4
rep stosd

//------------------------------------------------------------------
// save area must be aligned on 64-byte boundary
//------------------------------------------------------------------
lea edi, xs
add edi, 63
and edi, -64
mov xp, edi

//------------------------------------------------------------------
// save everything that can be saved
//------------------------------------------------------------------
or eax, -1
or edx, -1
{ xsave [edi] } db $0f,$ae,$27

//------------------------------------------------------------------
// display data in "Watches" window
// - xp^ : all
// - xp^.fpu : all FPU registers as extended
// - xmm_32[0]._lo: contents of XMM0 as 4 longints
// - etc...
//------------------------------------------------------------------
int 3

popad

@99:
//a-out
end; {xsave}

end.

Robert
--
Robert AH Prins
robert(a)prino(d)org
The hitchhiking grandfather - https://prino.neocities.org/indez.html
Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html

wolfgang kern

Aug 8, 2021, 6:14:02 AM
On 07.08.2021 17:51, R.Wieser wrote:
...
> and for some reason I can't get the FXSAVE to work (my assembler shows its
> age by not knowing the opcode, and trying to use a "db 0Fh,0AEH, ....."
> sequence crashes the program).

on older CPUs 0F AE xx will raise exception 6 [illegal opcode] if:

1) bit 5 of xx is 1 (xx 20..3F, 60..7F, A0..BF)
newer CPU may show a few valid instructions (see sandpile.org)

2) mod=3 aka register operand (C0..FF) [memory only!]

3) may raise EXC_6 if not supported
0F AE 90..97 and 98..9F mean LDMXCSR and STMXCSR [support specific]

so I'd recommend either
0F AE 06 00 xx FXSAVE [xx00h] (needs 512 byte DS: buffer !)
or shorter
0F AE 00 FXSAVE [bx+si] (ditto)
or HLL styled :)
0F AE 46 00 FXSAVE [bp+0] (needs 512 byte on SS: stack)
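Wolfgang's rules can be checked mechanically by splitting the ModR/M byte into its mod, reg and rm fields. The Python sketch below uses the 16-bit addressing table; the helper names are made up, and `legacy_0fae_ok` only encodes the two rules listed above (memory operand, reg field /0 to /3), not every CPU's actual behaviour.

```python
def split_modrm(b):
    """Return the (mod, reg, rm) fields of a ModR/M byte."""
    return b >> 6, (b >> 3) & 7, b & 7


# 16-bit addressing: base/index registers selected by rm (mod 00, 01, 10).
RM16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]


def operand16(b):
    """Memory operand named by ModR/M byte b under 16-bit addressing."""
    mod, _, rm = split_modrm(b)
    if mod == 3:
        return None                       # register operand, not memory
    if mod == 0 and rm == 6:
        return "[disp16]"                 # the special direct-address case
    return "[" + RM16[rm] + {0: "", 1: "+disp8", 2: "+disp16"}[mod] + "]"


def legacy_0fae_ok(b):
    """The two rules listed above: memory operand, reg field /0 to /3."""
    mod, reg, _ = split_modrm(b)
    return mod != 3 and reg < 4


for b in (0x06, 0x00, 0x46, 0x27):
    print(f"0F AE {b:02X}: {operand16(b)} ok={legacy_0fae_ok(b)}")
```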
__
wolfgang

R.Wieser

Aug 8, 2021, 6:59:08 AM
Robert,

> You still haven't told us what OS (DOS, Windoze, Linux) or CPU (32/64 bit)
> you're running this code on....

My apologies, I did not think that it would matter (still don't, but ...).

The OS is Windows XP Pro SP3, 32 bit. The environment used is Borland's
Tasm v5.2 (assembler).

> David Lindauer's GRDB (DOS) can show the contents of FPU registers

The idea is that I would be able to write such FPU debugging code myself.
Somehow I like it that way. :-)

> // save area must be aligned on 64-byte boundary
...
> { xsave [edi] } db $0f,$ae,$27

Both were what I was looking for. Thanks.

Alas, I still can't get it to work :

lea edi,[@@Foo] ;size is 2000h. Plenty of space.
add edi,003Fh ;[1]
and edi,not 003Fh
or eax,-1 ;Not mentioned in my docs, but ...
or edx,-1
db 0Fh,0AEh,27h ;xsave [edi]

It still "crashes" ("{program.exe}has encountered a problem and needs to
close. We are sorry for the inconvenience.")

[1] My "The IA-32 Intel Architecture Software Developer's Manual, Volume 2"
mentions an alignment of 16.
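The ADD/AND pair in that snippet is the usual round-up-to-alignment trick. FXSAVE needs a 16-byte aligned operand and XSAVE a 64-byte aligned one (misalignment normally raises #GP), so the buffer has to be at least the save size plus the alignment minus one. A Python version of the arithmetic, for checking addresses by hand (the function name is illustrative):

```python
def align_up(addr, alignment):
    """Round addr up to a multiple of alignment (which must be a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (addr + alignment - 1) & ~(alignment - 1)


# Same steps as 'add edi,003Fh' / 'and edi,not 003Fh' for a 64-byte boundary.
print(hex(align_up(0x401001, 64)), hex(align_up(0x401000, 64)))
```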

Any ideas ?

Regards,
Rudy Wieser


R.Wieser

Aug 8, 2021, 7:44:13 AM
Wolfgang,

> so I'd recommend either
...
> or shorter
> 0F AE 00 FXSAVE [bx+si] (ditto)

For testing purposes I tend to go with the most basic one first, so I took
that one.
Remark: I'm on XP 32 bit, so the registers are EBX and ESI respectively.

Alas, same problem : crash.

Aligning [EBX+ESI] on a 64 byte boundary (as suggested by Robert Prins) did
not make a difference.

I'm starting to lean towards the possibility that the command is refused
(does not exist). Is there any way to check it ?

Regards,
Rudy Wieser


wolfgang kern

Aug 8, 2021, 7:44:15 AM
you seem to work with 32 bit:

0F AE 07 FXSAVE [edi]

you used 27, so I was confused and had a look at my AMD docs,
it says: FXSAVE mem512env 0F AE /0, where the /0 refers to bits 3..5
(the reg field), and I also checked on sandpile.org.
0F AE /4 means XSAVE (it saves the processor state components selected
in EDX:EAX, not only the FPU)
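The reg field (bits 5 to 3 of the ModR/M byte) selects the instruction within the 0F AE group. A small Python lookup for the memory forms without a prefix, written for this discussion rather than taken from any tool:

```python
# Memory forms of 0F AE (no prefix), indexed by the ModR/M reg field (/0../7).
GROUP_0FAE_MEM = ["FXSAVE", "FXRSTOR", "LDMXCSR", "STMXCSR",
                  "XSAVE", "XRSTOR", "XSAVEOPT", "CLFLUSH"]


def name_0fae(modrm_byte):
    mod = modrm_byte >> 6
    reg = (modrm_byte >> 3) & 7            # bits 5..3
    if mod == 3:
        return None                        # register forms (e.g. the fences) differ
    return GROUP_0FAE_MEM[reg]


for b in (0x07, 0x27):
    print(f"0F AE {b:02X}: {name_0fae(b)}")
```

With 27h the reg field is 4, so 0F AE 27 is XSAVE [edi] rather than FXSAVE [edi]. XSAVE additionally takes its component mask in EDX:EAX, needs a 64-byte aligned buffer, and raises #UD unless the operating system has enabled it (CR4.OSXSAVE); an OS of Windows XP's vintage would not be expected to do so.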
__
wolfgang

wolfgang kern

Aug 8, 2021, 8:29:20 AM
look up CPUID; one of the returned bits tells whether it is present or not.
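The relevant bits live in the values CPUID returns for leaf 1 (EAX = 1): FXSR is EDX bit 24, XSAVE is ECX bit 26 and OSXSAVE (the OS has enabled XSAVE) is ECX bit 27. Running CPUID itself has to happen in assembly; the Python below only decodes ECX/EDX values once they have been printed, and its names and example values are hypothetical.

```python
# CPUID leaf 1 (EAX = 1) feature bits relevant to this thread.
EDX_BITS = {"FPU": 0, "MMX": 23, "FXSR": 24, "SSE": 25}
ECX_BITS = {"XSAVE": 26, "OSXSAVE": 27}


def features(ecx, edx):
    """Names of the listed features whose bits are set in ECX/EDX."""
    found = {name for name, bit in EDX_BITS.items() if (edx >> bit) & 1}
    found |= {name for name, bit in ECX_BITS.items() if (ecx >> bit) & 1}
    return found


def can_use(ecx, edx):
    f = features(ecx, edx)
    return {
        "FXSAVE": "FXSR" in f,
        # XSAVE also needs the OS to have enabled it (CR4.OSXSAVE), else #UD.
        "XSAVE": {"XSAVE", "OSXSAVE"} <= f,
    }


# Made-up example: FXSR present, XSAVE supported but not OS-enabled.
print(can_use(ecx=1 << 26, edx=(1 << 0) | (1 << 24)))
```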
__
wolfgang

wolfgang kern

Aug 8, 2021, 8:59:25 AM
On 08.08.2021 13:35, R.Wieser wrote:

>> 0F AE 00 FXSAVE [bx+si] (ditto)

> For testing purposes I tend to go with the most basic one first, so I took
> that one.
> Remark: I'm on XP 32 bit, so the registers are EBX and ESI respectivily.
>
> Alas, same problem : crash.
>
> Aligning [EBX+ESI] on a 64 byte boundary (as suggested by robert prins) did
> not make a difference.

within 32 bit:
0F AE 00 is FXSAVE [eax] (uses DS:)
__
wolfgang

R.Wieser

Aug 8, 2021, 10:14:32 AM
Wolfgang,

> you seem to work with 32 bit:

I am. Didn't think it would matter much.

> 0F AE 07 FXSAVE [edi]

I just tried that one, and it worked ! (got 288 bytes of data though, not
512) As a result I'm now thoroughly confused in regard to the mod, reg, r/m
encoding. I tried different ones, but only got crashes.

> you used 27, so I were confused and had you look at my AMD docs,

That value was suggested by Robert (in his code). And as I didn't get
anywhere ...


Oh blimey - I don't know how I did it, but I just noticed that I somehow
mixed up the 16 and 32-bit mod/reg/rm encodings. With MOD and REG both
zero, the registers targeted by R/M are rather different between them.
:-|
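The mix-up is easy to see side by side. With mod = 00 the same rm value names quite different operands under 16-bit and 32-bit addressing; the Python tables below cover only that mod = 00 row and are written for illustration. In 32-bit code, [ebx+esi] cannot be expressed through rm alone: it needs rm = 100 plus a SIB byte (0F AE 04 33 for FXSAVE [ebx+esi]).

```python
# r/m meaning when mod = 00, for 16-bit vs 32-bit address size.
RM16 = ["[bx+si]", "[bx+di]", "[bp+si]", "[bp+di]",
        "[si]", "[di]", "[disp16]", "[bx]"]
RM32 = ["[eax]", "[ecx]", "[edx]", "[ebx]",
        "(SIB byte follows)", "[disp32]", "[esi]", "[edi]"]


def mod00_operand(modrm_byte, address_size):
    if modrm_byte >> 6 != 0:
        raise ValueError("this table only covers mod = 00")
    table = RM16 if address_size == 16 else RM32
    return table[modrm_byte & 7]


for b in (0x00, 0x06, 0x07):
    print(f"{b:02X}: 16-bit {mod00_operand(b, 16):9} 32-bit {mod00_operand(b, 32)}")
```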

Bottom line: I made a stupid mistake, created non-working code and got
myself confused as a result. And as I presumptuously forgot to mention the
basics of what I was busy with (32-bit coding) I did really help you guys
find the cause of it. My apologies for that.

Regards,
Rudy Wieser


R.Wieser

Aug 8, 2021, 10:44:42 AM
> I did really help you guys find the cause of it. My apologies for that.

Ehrms ... "I did *not* really help" ofcourse.

Regards,
Rudy Wieser


wolfgang kern

Aug 8, 2021, 10:44:44 AM
On 08.08.2021 16:00, R.Wieser wrote:

>> you seem to work with 32 bit:

> I am. Didn't think it would matter much.

>> 0F AE 07 FXSAVE [edi]

> I just tried that one, and it worked ! (got 288 bytes of data though, not
> 512) As a result I'm now thoroughly confused in regard to the mod, reg, r/m
> encoding. I tried different ones, but only got crashes.

IIRC we got 288 bytes with FSAVE long, 512 bytes may be just the
required buffer size.

>> you used 27, so I were confused and had you look at my AMD docs,
> That value was suggested by Robert (in his code). And as I didn't get
> anywhere ...

> Oh blimy - I don't know how I did it, but I just noticed that I somehow
> mixed up the 16 and 32-bit mod/reg/rm encodings. With the MOD and REG both
> being zero the by R/M targetted registers are rather different between them.
> :-|

> Bottom line: I made a stupid mistake, created non-working code and got
> myself confused as a result. And as I presumptiously forgot to mention the
> basics of what I was busy with (32-bit coding) I did really help you guys
> find the cause of it. My apologies for that.

I was once there as well :) experience can't be bought!
just fine that we could help, no need for apology.
__
wolfgang

R.Wieser

Aug 8, 2021, 12:59:58 PM
Wolfgang,

> IIRC we got 288 bytes with FSAVE long, 512 bytes may be just the required
> buffer size.

Those 512 bytes do (currently) not seem to be /required/. I initialized the
buffer with a specific byte value, and that showed that nothing from 288 and
up was touched (the "reserved" areas below it, however, were).

Perhaps that 288-and-up "reserved" area is meant for future generations of
the x87 FPU.
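The 288 figure follows from the legacy FXSAVE layout: in 32-bit mode the instruction writes the x87/MMX registers at offsets 32 to 159 and XMM0 to XMM7 at 160 to 287, while bytes 288 to 415 only receive XMM8 to XMM15 in 64-bit mode. Bytes 416 to 463 are reserved and 464 to 511 are left for software. The Python table below restates those offsets; `fxsave_tos_and_empty` is a hypothetical helper showing that the abridged tag byte at offset 4 (one bit per physical register, 1 = in use) gives an "all empty" test that does not depend on TOP. Unlike FSAVE, FXSAVE does not reinitialize the FPU.

```python
# Legacy FXSAVE area (512 bytes, 16-byte aligned): (field, offset, size).
FXSAVE_LAYOUT = [
    ("FCW", 0, 2), ("FSW", 2, 2), ("FTW (abridged)", 4, 1), ("FOP", 6, 2),
    ("FPU IP / CS", 8, 8), ("FPU DP / DS", 16, 8),
    ("MXCSR", 24, 4), ("MXCSR_MASK", 28, 4),
    ("ST0-ST7 / MM0-MM7", 32, 8 * 16),      # 10 bytes used per 16-byte slot
    ("XMM0-XMM7", 160, 8 * 16),
    ("XMM8-XMM15 (64-bit mode only)", 288, 8 * 16),
    ("reserved", 416, 48),
    ("available to software", 464, 48),
]


def end_of(name):
    """First byte offset after the named field."""
    for field, offset, size in FXSAVE_LAYOUT:
        if field == name:
            return offset + size
    raise KeyError(name)


def fxsave_tos_and_empty(area):
    """Hypothetical helper: TOP and an 'all registers empty' flag from an image."""
    fsw = area[2] | (area[3] << 8)
    ftw = area[4]          # abridged tag: bit i set = physical register Ri in use
    return (fsw >> 11) & 7, ftw == 0


print(end_of("XMM0-XMM7"), end_of("available to software"))   # 288 512
```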

> I was once there as well :) experience can't be bought!

I can only hope that I remember it for quite a while.

> just fine that we could help,

And thanks for that.

> no need for apology.

:-) In that case you may regard it as an explanation of what the problem
actually was. I know that when I try to help someone I often get curious
about it.

Regards,
Rudy Wieser


Robert

Aug 8, 2021, 7:02:45 PM
R.Wieser <add...@nospicedham.not.available> wrote in part:
> Robert,
>> It might have some safeguards against executing data :)
>
> I've used the "trick" before, so I don't think so. Currently I'm
> torn between the posibilities that the processor I'm using might
> not be having that command, that I'm simply bungling up or that
> there is some kind of memory alignment involved (the latter one
> would not be the first time I've run into it).

Well, please make sure the pointer is correct (trash easily
gets caught in the upper bits in mixed-mode) and your pgm owns
the memory it points at. Otherwise, segfault.

>> Actually, I believe the registers are a circular file,
> It has to be, as my example code works : after the second FLD1 the TOS is 6.
> But I can still execute a FSTP ST(2) ,which seemingly points at 6+2 = 8.

Ah, but circularity is achieved by masking, 8=0 when masked at 3bits.
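In code, the mapping from ST(i) to a physical register is just an addition followed by a 3-bit mask (Python, illustrative function name):

```python
def st_to_phys(top, i):
    """Physical register R0-R7 named by ST(i), given the TOP field."""
    return (top + i) & 0b111      # wrap-around by masking to 3 bits


# After two FLD1s from TOP = 0, TOP = 6.
print([st_to_phys(6, i) for i in range(8)])   # [6, 7, 0, 1, 2, 3, 4, 5]
```

With TOP = 6, ST(2) is therefore R0, which is why the FSTP ST(2) in the original example works.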

>> 34 years ago I wrote an extention to MS-DOS DEBUG.COM
>> to examine the x87.
>
> I'm not sure what you mean with an 'extension' (wasn't aware that
> Debug supported such a thing), but years ago I wrote something for it
> (using memory patching) so it could deal with a few more opcodes.

Very similar. I added code and patched the command jump table
to enter it when commanded.

-- Robert

R.Wieser

Aug 9, 2021, 3:33:55 AM
Robert,

> Well, please make sure the pointer is correct

:-) And how do you propose that should be done ? It sounds like a great
idea, but ...

> (trash easily gets caught in the upper bits in mixed-mode)

Somewhere along the line I forgot to mention that I was programming in
32-bit mode (under Win XP). So, no mixed mode and no trash in the upper
bits.

> Ah, but circularity is achieved by masking, 8=0 when masked at 3bits.

Well ... It /can/ be achieved that way, but only under certain conditions
(related to origin and size). :-)

The problem has been located though : I simply used the wrong R/M value
while hand-encoding the FXSAVE command (likely mixing up the 16 bit table
with the 32 bit one). IOW, I was providing the target address in a certain
register while the command expected it in another register/form.

Regards,
Rudy Wieser


Robert

Aug 9, 2021, 9:19:47 AM
R.Wieser <add...@nospicedham.not.available> wrote in part:
> Robert,
>> Well, please make sure the pointer is correct
>
> :-) And how do you propose that should be done ?
> It sounds like a great idea, but ...

Walk before you run, when in trouble, drop back. Before trying
a potentially troublesome instruction like FXSAVE, use MOV.
Even hand-assemble from hex if those facilities are in doubt:

MOV EAX, "pointer" ; to see if you can read loc
MOV "pointer", EAX ; to see if you can write
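One way to read that suggestion: hand-assemble a MOV through the same register first. The Python sketch below packs ModR/M bytes for `mov eax,[edi]` (8B /r), `mov [edi],eax` (89 /r) and `fxsave [edi]` (0F AE /0); all three share the ModR/M byte 07, so a working MOV would have pinned down the [edi] encoding. The `modrm` and `hexs` helpers are hypothetical.

```python
def modrm(mod, reg, rm):
    """Pack the three ModR/M fields into one byte."""
    return (mod << 6) | (reg << 3) | rm


def hexs(bs):
    return " ".join(f"{b:02X}" for b in bs)


EAX, EDI = 0, 7          # 32-bit register numbers used in the reg and r/m fields

mov_load = bytes([0x8B, modrm(0b00, EAX, EDI)])        # mov eax, [edi]
mov_store = bytes([0x89, modrm(0b00, EAX, EDI)])       # mov [edi], eax
fxsave = bytes([0x0F, 0xAE, modrm(0b00, 0, EDI)])      # fxsave [edi] (0F AE /0)
print(hexs(mov_load), "|", hexs(mov_store), "|", hexs(fxsave))
```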


>> (trash easily gets caught in the upper bits in mixed-mode)
>
> Somewhere along the line I forgot to mention that I was programming in 32-bit
> mode (under Win XP). So, no mixed mode and no trash in the upper bits.

I don't think XP does 64, but the CPU might. The upper-upper could
get trash. ISTR needing to set something to get IN/OUT to work.


>> Ah, but circularity is achieved by masking, 8=0 when masked at 3bits.
>
> Well ... It /can/ be achieved that way, but only under
> certain conditions (related to origin and size). :-)

Zero origin, power-of-two size. Check on both.
Ever wonder why there are so many buffers this way?
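The reason is that `i & (N - 1)` equals `i mod N` only when N is a power of two, and a mask is cheaper than a division. A minimal Python ring buffer built on that property (zero-based indices, mask wrapping; the class is illustrative):

```python
class RingBuffer:
    """Fixed-size ring buffer whose index wraps by masking, not by modulo."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two for mask wrapping")
        self.mask = size - 1
        self.data = [None] * size
        self.head = 0                        # zero origin: indices start at 0

    def push(self, item):
        self.data[self.head & self.mask] = item
        self.head += 1

    def last(self, n):
        """Return the n most recent items, newest first."""
        return [self.data[(self.head - 1 - k) & self.mask] for k in range(n)]


rb = RingBuffer(8)
for i in range(10):
    rb.push(i)
print(rb.last(3))     # [9, 8, 7]
```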

> The problem has been located though : I simply used the wrong R/M value
> while hand-encoding the FXSAVE command (likely mixing up the 16 bit table
> with the 32 bit one). IOW, I was providing the target addres in a certain
> register while the command expected it in another register/form.

Debugging with a (hand-assembled) MOV test could have caught it.

-- Robert


R.Wieser

Aug 9, 2021, 10:04:56 AM
Robert

>>> Well, please make sure the pointer is correct
>>
>> :-) And how do you propose that should be done ?
>> It sounds like a great idea, but ...
...
> use MOV.

How would that change anything ? If the target for an FXSAVE is wrong
enough that it causes an exception, how /wouldn't/ that be in the same way
wrong for a MOV ? (let's forget about alignment for a moment)

It would even be making the problem larger, as you would then need to pick a
REG value too - and wonder if it perhaps is having a negative influence on
the result.

FWIW, I tried several R/M values, none of which wanted to work. Bad luck I
guess.

In retrospect I should perhaps have tried loading all the common registers
with the same value and tried all R/M values until something worked. On
success it would be a case of determining which register is the source, and
then look back at the instruction set to find a match - and from it figure
out what the/my mistake was.

> Zero origin, power-of-two size. Check on both.
> Ever wonder why there are so many buffers this way?

No, never. Really ... <whistle>

> Debugging with MOV test (hand-assembled) could have caught.

I doubt it. See above.

Regards,
Rudy Wieser


Robert Redelmeier

Aug 9, 2021, 11:05:18 AM
R.Wieser <add...@nospicedham.not.available> wrote in part:
> Robert
>>>> Well, please make sure the pointer is correct
>>> :-) And how do you propose that should be done ?
>>> It sounds like a great idea, but ...
>> use MOV.
>
> How would that change anything ? If the target for
> an FXSAVE is wrong enough that it causes an exception,
> how /wouldn't/ that be in the same way wrong for a MOV ?
> (lets forget about alignment for a moment)

It is a purer memory test. I thought there was question
of whether FXSAVE was available or supported on your CPU.
This checks opcode encoding too.

> It would even be making the problem larger, as you would
> than need to pick a REG value too - and wonder if it perhaps
> is having a negative influence on the result.

All GP registers should be available at all times.

> FWI, I tried several R/M values, none of which wanted
> to work. Bad luck I guess.

Encoding should not be a guessing game.
The odds are bad, <1% .

> In retrospect I should perhaps have tried loading all the
> common registers with the same value and tried all R/M
> values until something worked. On success it would be
> a case of determining which register is the source, and
> than look back at the instruction set to find a match -
> and from it figure out what the/my mistake was.

x86 has quirky indirect addressing modes that
are unlikely to yield to trial-and-error.

-- Robert

R.Wieser

Aug 9, 2021, 12:18:43 PM
Robert,

> It is a purer memory test.

In what way ? And mind you, I already addressed that.

> I thought there was question of whether FXSAVE was available
> or supported on your CPU.

As I could not get a working FXSAVE encoding I started to doubt.

> This checks opcode encoding too.

No need for that, as those two bytes came from an opcode list. The only
unknown part was the addressing of the target memory.

> All GP registers should be available at all times.

Agreed. But it is an extra factor, and as such interference.

> Encoding should not be a guessing game.

What makes you think I was ? I tried a few different R/M encodings
(while providing different registers), and none of them wanted to work.
Hence my (above described) doubt as to whether the command was available on my
'puter/processor. (read: I was quite certain I did it "by the book")

But when you /know/ something ought to work and you cannot make it so, then a
pragmatic approach is called for. That includes throwing everything
and the kitchen sink at it to see if /something/ will work, and from that
trying to reason back to why it does and where you went wrong with the first
attempts.

> x86 has quirky indirect addressing modes that
> are unlikely to yield to trial-and-error.

True. But I would not be looking for those. Just a simple one that
/does/ function. From that foot-in-the-door the rest often follows.

And that is effectively what happened when Wolfgang supplied me with a
working encoding for FXSAVE [EDI] : while trying to match the 0x07 to the
mod,reg,r/m tables I had used I realized I had been using the wrong one. It
was as simple as that.

Regards,
Rudy Wieser
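
The 0x07 mentioned above is the ModRM byte of FXSAVE [EDI]. FXSAVE is 0F AE /0, so the reg field holds the opcode extension 000, and mod=00 with rm=111 selects a plain [edi] operand. The post doesn't say which table was the wrong one. Mixing up the 16-bit and 32-bit ModRM tables is only one plausible candidate, and the sketch below shows how differently the same byte reads in each. The helper names are hypothetical, and only the mod=00 row is handled.

```python
# mod=00 memory operands, indexed by the 3-bit ModRM.rm field.
ADDR16_MOD00 = ["[bx+si]", "[bx+di]", "[bp+si]", "[bp+di]",
                "[si]", "[di]", "[disp16]", "[bx]"]
ADDR32_MOD00 = ["[eax]", "[ecx]", "[edx]", "[ebx]",
                "(SIB byte follows)", "[disp32]", "[esi]", "[edi]"]


def fxsave_modrm(rm: int) -> int:
    """ModRM byte for FXSAVE (0F AE /0) with mod=00: reg holds the /0 extension."""
    return (0b00 << 6) | (0b000 << 3) | rm


def decode_fxsave(modrm: int, address_bits: int) -> str:
    """Render an FXSAVE operand using the 16- or 32-bit mod=00 table."""
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b00 or reg != 0b000:
        raise ValueError("this sketch only handles FXSAVE with mod=00")
    table = ADDR32_MOD00 if address_bits == 32 else ADDR16_MOD00
    return "fxsave " + table[rm]


code = bytes([0x0F, 0xAE, fxsave_modrm(0b111)])   # rm=111
print(code.hex(" ").upper())         # 0F AE 07
print(decode_fxsave(code[2], 32))    # fxsave [edi]
print(decode_fxsave(code[2], 16))    # fxsave [bx]
```

In 16-bit code, getting [edi] would take the 67h address-size prefix (67 0F AE 07). The save area itself is 512 bytes and must be 16-byte aligned, which is the alignment caveat raised earlier in the thread. Robert's MOV test against the same address would reuse the same ModRM byte: mov [edi], eax encodes as 89 07.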


Robert Prins

Aug 10, 2021, 10:05:43 AM
On 2021-08-09 16:11, R.Wieser wrote:

> And that is effectively what happened when Wolfgang supplied me with a
> working encoding for FXSAVE [EDI] : while trying to match the 0x07 to the
> mod,reg,r/m tables I had used I realized I had been using the wrong one. It
> was as simple as that.
Use

<https://defuse.ca/online-x86-assembler.htm>

for all your "db" needs. I use it "all the time" to get P5+ opcodes for Virtual
Pascal's in-line assembler. I've become a huge fan of using AVX instructions, and
miraculously, most of the data structures I was using in 1985 (TP3), then
16-bit, now 32-bit, are almost perfectly suited for XMM and YMM code. Go figure!

R.Wieser

Aug 10, 2021, 11:05:48 AM
Robert,

> Use
>
> <https://defuse.ca/online-x86-assembler.htm>
>
> for all your "db" needs.

Thank you very much. It will certainly come in handy. :-)

... and it doesn't even need JS to "do its thing". <thumbs up>

Regards,
Rudy Wieser


Kerr-Mudd, John

Aug 10, 2021, 4:06:19 PM
I tried mov ax,bx and got
6689D8

I guess x86 means 32bit nowadays!

--
Bah, and indeed Humbug.
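
The leading 66 is the operand-size prefix. The assembler was evidently emitting 32-bit code, where the default operand size is 32 bits, so an operation on 16-bit registers needs the prefix. In 16-bit code it does not. The sketch below illustrates that rule for register-to-register MOV. The function name is made up for illustration, and only the 89 /r form is produced.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def encode_mov_r16(dst: str, src: str, code_bits: int) -> bytes:
    """Encode 'mov dst, src' for two 16-bit registers using MOV r/m16, r16 (89 /r).

    The destination goes in ModRM.rm, the source in ModRM.reg, and mod=11
    selects a register (not memory) operand.
    """
    if code_bits not in (16, 32, 64):
        raise ValueError("code_bits must be 16, 32 or 64")
    modrm = (0b11 << 6) | (REG16.index(src) << 3) | REG16.index(dst)
    # 16-bit code already defaults to 16-bit operands; 32- and 64-bit code
    # default to 32-bit operands, so 66h is needed to get 16-bit ones.
    prefix = b"" if code_bits == 16 else b"\x66"
    return prefix + bytes([0x89, modrm])


for bits in (16, 32):
    print(bits, encode_mov_r16("ax", "bx", bits).hex(" ").upper())
# 16 89 D8
# 32 66 89 D8
```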

R.Wieser

Aug 11, 2021, 4:08:03 AM
John,

> I tried mov ax,bx and got
> 6689D8
>
> I guess x86 means 32bit nowadays!

Not for the people in this newsgroup perhaps, but for the majority of users
out there ? Certainly.

But yes, I noticed that too. A 16-bit option would have been nice to have.

Regards,
Rudy Wieser


Anton Ertl

Aug 11, 2021, 4:08:10 AM
"Kerr-Mudd, John" <ad...@nospicedham.127.0.0.1> writes:
>I guess x86 means 32bit nowadays!

That's the problem with "x86": People use it to mean any of several
different ISAs. So it's better to avoid that term and use:

8086 (rarely called IA-16) when you mean that instruction set.
IA-32 when you mean that instruction set (first implementation: 80386)
AMD64 when you mean that instruction set (first implementation: AMD K8
(Opteron, Athlon 64))

And then there are extensions, like the additional 80186 and 80286
instructions (plus the 80286 offers protected mode), or SSE, SSE2,
AVX, ...

Now what does that mean for the name of this newsgroup.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
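
The ambiguity shows up directly when decoding: the same bytes mean different things depending on which instruction set and mode the decoder assumes. Below is a minimal sketch of how the default operand size, the 66h prefix and REX.W interact. It is limited to the register-to-register form of 89 /r and ignores REX.R/REX.B, and the helper names are hypothetical. "16-bit code" here assumes a 386 or later, since the original 8086 has no 66h prefix.

```python
REGS = {
    16: ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"],
    32: ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"],
    64: ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"],
}


def operand_size(code_bits: int, prefix_66: bool = False, rex_w: bool = False) -> int:
    """Effective operand size for a general-purpose instruction such as 89 /r."""
    if rex_w and code_bits != 64:
        raise ValueError("REX prefixes only exist in 64-bit mode")
    if code_bits == 64:
        if rex_w:
            return 64                      # REX.W overrides 66h
        return 16 if prefix_66 else 32     # 64-bit mode still defaults to 32-bit operands
    if code_bits == 32:
        return 16 if prefix_66 else 32
    if code_bits == 16:
        return 32 if prefix_66 else 16     # 66h in 16-bit code needs a 386 or later
    raise ValueError("code_bits must be 16, 32 or 64")


def decode_mov_89_rr(modrm: int, code_bits: int,
                     prefix_66: bool = False, rex_w: bool = False) -> str:
    """Decode 89 /r with mod=11: MOV r/m, r (destination in rm, source in reg)."""
    if modrm >> 6 != 0b11:
        raise ValueError("only register-to-register forms (mod=11) are handled")
    regs = REGS[operand_size(code_bits, prefix_66, rex_w)]
    return f"mov {regs[modrm & 0b111]}, {regs[(modrm >> 3) & 0b111]}"


# The same two bytes, 89 D8, under three different assumptions:
for bits in (16, 32, 64):
    print(bits, decode_mov_89_rr(0xD8, bits))
# 16 mov ax, bx
# 32 mov eax, ebx
# 64 mov eax, ebx
```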

wolfgang kern

Aug 11, 2021, 4:38:19 AM
On 10.08.2021 21:53, Kerr-Mudd, John wrote:

>>> <https://defuse.ca/online-x86-assembler.htm>

> I tried mov ax,bx and got
> 6689D8
> I guess x86 means 32bit nowadays!

:) of course!
16-bit code will soon belong to history.

I'm happy to have my own 16/32/64-bit disassembler. Although it already
needs many updates, it saves me from needing internet access.

89 D8 is the STORE variant, which should only be used for memory writes.

8B C3 would be the correct LOAD opcode. I have been fighting for this for
decades, but no one ever listened, so Intel and AMD will never get rid of these
duplicates and can't make space for 64 other instructions with 89 mod 3,
like the newly valid opcodes for the formerly illegal 8F08 and 8F10.
__
wolfgang
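
The duplication described above comes from the two MOV directions. 89 /r puts the destination in ModRM.rm, while 8B /r puts it in ModRM.reg. With mod=11 both operands are registers, so each of the 8 x 8 = 64 register pairs has two encodings. The short sketch below builds both forms and checks that they decode to the same operation. The function names are hypothetical.

```python
def mov_store_form(dst: int, src: int) -> bytes:
    """89 /r (MOV r/m, r): destination in ModRM.rm, source in ModRM.reg."""
    return bytes([0x89, 0xC0 | (src << 3) | dst])


def mov_load_form(dst: int, src: int) -> bytes:
    """8B /r (MOV r, r/m): destination in ModRM.reg, source in ModRM.rm."""
    return bytes([0x8B, 0xC0 | (dst << 3) | src])


def decode_mov_rr(code: bytes):
    """Return (dst, src) register numbers for a mod=11 89 or 8B MOV."""
    opcode, modrm = code
    if modrm >> 6 != 0b11 or opcode not in (0x89, 0x8B):
        raise ValueError("not a register-to-register 89/8B MOV")
    reg, rm = (modrm >> 3) & 0b111, modrm & 0b111
    return (rm, reg) if opcode == 0x89 else (reg, rm)


AX, BX = 0, 3
print(mov_store_form(AX, BX).hex(" ").upper())   # 89 D8
print(mov_load_form(AX, BX).hex(" ").upper())    # 8B C3

# Every register pair has one encoding of each kind, and both decode the same way.
pairs = [(d, s) for d in range(8) for s in range(8)]
assert all(decode_mov_rr(mov_store_form(d, s)) == decode_mov_rr(mov_load_form(d, s)) == (d, s)
           for d, s in pairs)
print(len(pairs))   # 64
```

Which of the two forms an assembler emits for mov ax,bx is the assembler's own choice. Judging from John's result, the online assembler picked the 89 (store) form.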

wolfgang kern

Aug 11, 2021, 4:53:35 AM
On 11.08.2021 09:59, Anton Ertl wrote:
> "Kerr-Mudd, John" <ad...@nospicedham.127.0.0.1> writes:
>> I guess x86 means 32bit nowadays!
>
> That's the problem with "x86": People use it to mean any of several
> different ISAs. So it's better to avoid that term and use:
>
> 8086 (rarely called IA-16) when you mean that instruction set.
> IA-32 when you mean that instruction set (first implementation: 80386)
> AMD64 when you mean that instruction set (first implementation: AMD K8
> (Opteron, Athlon 64))
>
> And then there are extensions, like the additional 80186 and 80286
> instructions (plus the 80286 offers protected mode), or SSE, SSE2,
> AVX, ...
>
> Now what does that mean for the name of this newsgroup.

I wouldn't recommend splitting our CLAX into several CPU-related groups.
All Intel/AMD 16-bit instruction sets differ between CPU families,
and almost all regular readers of this group are aware of this anyway.
__
wolfgang
