
indexing sp in 8086


luserdroog

May 11, 2018, 7:53:44 PM
Hi everyone.

I've been casually reading jonesforth and trying to translate some
of the 386 code down to 8086. I've run into some difficulty with
this instruction:

mov 4(%esp),%eax

There doesn't appear to me to be any 8086 instruction that does this.
Unless I'm missing something it needs an SIB byte which was introduced
later.

But that strikes me as a very surprising conclusion. Is there a way to
get (SP)+immed8 from some obscure encoding that I'm just not seeing?

Melzzzzz

May 11, 2018, 8:08:48 PM
Hm, it has been a really long time since I used pure 8086.
https://www.tutorialspoint.com/microprocessor/microprocessor_8086_addressing_modes.htm
thank google ;)


--
press any key to continue or any other to quit...

luserdroog

May 11, 2018, 8:23:51 PM
FWIW here's what I /do/ have translated. MRM bytes are in octal then
the equivalent hex, cuz the octal's easier to write.

macro next
lodsl ad
jmp *(eax) ff 0350 e8


drop
pop eax 58
next

swap
pop eax 58
pop ebx 5b
push eax 50
push ebx 53
next

dup
mov (esp),eax 8b 0340 e0
push eax 50
next

over
mov 4(esp),eax
push eax 50
next

The goal with all this is to have a forth that runs on my partially-complete
emulator. https://github.com/luser-dr00g/8086

luserdroog

May 11, 2018, 10:08:58 PM
On Friday, May 11, 2018 at 7:23:51 PM UTC-5, luserdroog wrote:
> On Friday, May 11, 2018 at 6:53:44 PM UTC-5, luserdroog wrote:
> > Hi everyone.
> >
> > I've been casually reading jonesforth and trying to translate some
> > of the 386 code down to 8086. I've run into some difficulty with
> > this instruction:
> >
> > mov 4(%esp),%eax
> >
> > There doesn't appear to me to be any 8086 instruction that does this.
> > Unless I'm missing something it needs an SIB byte which was introduced
> > later.
> >
> > But that strikes me as a very surprising conclusion. Is there a way to
> > get (SP)+immed8 from some obscure encoding that I'm just not seeing?
>
> FWIW here's what I /do/ have translated. MRM bytes are in octal then
> the equivalent hex, cuz the octal's easier to write.

After further reading in https://edge.edx.org/c4x/BITSPilani/EEE231/asset/8086_family_Users_Manual_1_.pdf
it seems to be true. No way to index sp. Your choices are

bx+si bx+di bp+si bp+di si di bp bx

And my translation of the other (sp) instruction was wrong.
MOD=3 is reg to reg. So you can mov to/from any of the
regular registers but you're limited to the above combinations
for the reg/mem field.
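
For reference, the eight r/m choices above can be written as a lookup table. This is a minimal sketch in C, assuming the standard 8086 ModRM layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0). The name rm16_operand is made up for illustration and is not taken from the emulator.

```c
#include <assert.h>
#include <stdio.h>

/* Memory operands selectable by the r/m field when mod is 0, 1 or 2.
   SP does not appear anywhere in this table. */
static const char *const rm16_mem[8] = {
    "bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"
};

/* Register operands selected by the r/m field when mod is 3. */
static const char *const reg16[8] = {
    "ax", "cx", "dx", "bx", "sp", "bp", "si", "di"
};

/* Hypothetical helper: describe the operand named by a ModRM byte.
   buf must hold at least 16 characters. */
static const char *rm16_operand(unsigned char modrm, char *buf, size_t n)
{
    unsigned mod = modrm >> 6, rm = modrm & 7;

    assert(buf != NULL && n >= 16);
    if (mod == 3)
        snprintf(buf, n, "%s", reg16[rm]);      /* register direct */
    else if (mod == 0 && rm == 6)
        snprintf(buf, n, "[disp16]");           /* direct address, no base */
    else if (mod == 0)
        snprintf(buf, n, "[%s]", rm16_mem[rm]);
    else
        snprintf(buf, n, "[%s+disp%d]", rm16_mem[rm], mod == 1 ? 8 : 16);
    return buf;
}
```

Calling rm16_operand(0344, buf, sizeof buf) yields the register sp, which is the only way SP can show up in an r/m field: as a register operand with mod=3, never as a base.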

So, corrections inline, new syntax hex\oct\oct:

> macro next
> lodsl ad
> jmp *(eax) ff 0350 e8
>
>
> drop
> pop eax 58
> next
>
> swap
> pop eax 58
> pop ebx 5b
> push eax 50
> push ebx 53
> next
>
> dup
> mov (esp),eax 8b 0340 e0 [sb. 8b\364 8b\106\0 ]
> push eax 50
> next
>
> over
> mov 4(esp),eax [ 8b\364 8b\106\2 ]

Alexei A. Frounze

May 11, 2018, 11:09:04 PM
If you must preserve bp, then you can do something like this:
mov ax, bp
mov bp, sp
mov bp, [bp+2]
xchg ax, bp

If you don’t:
mov bp, sp
mov ax, [bp+2]

Alex

luserdroog

May 11, 2018, 11:39:07 PM
I've settled on using si for now. bp seems tempting for this, but
you always have to have a displacement even if it's zero because
mod=0 r/m=6 does something else. So using si I have

mov si, sp
mov ax, [si] ; smaller encoding

and

mov si, sp
mov ax, [si+2]
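
The "bp always needs a displacement" rule can be folded into the encoder. A rough sketch in C, assuming opcode 8B /r (mov r16, r/m16) and the usual mod values (0 = no displacement, 1 = disp8, 2 = disp16). emit_mov_load and the RM_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { RM_SI = 4, RM_DI = 5, RM_BP = 6, RM_BX = 7 };  /* r/m codes for single-register bases */

/* Hypothetical helper: emit "mov reg, [base+disp]" into out and return
   the number of bytes written (2, 3 or 4). */
static size_t emit_mov_load(unsigned char *out, unsigned reg, unsigned rm, int disp)
{
    unsigned mod;
    unsigned u = (unsigned)disp & 0xffffu;   /* two's complement, 16 bits */
    size_t n = 0;

    assert(out != NULL && reg < 8 && rm < 8);
    assert(disp >= -32768 && disp <= 32767);

    if (disp == 0 && rm != RM_BP)
        mod = 0;                 /* no displacement byte at all */
    else if (disp >= -128 && disp <= 127)
        mod = 1;                 /* sign-extended 8-bit displacement */
    else
        mod = 2;                 /* 16-bit displacement */

    out[n++] = 0x8b;
    out[n++] = (unsigned char)(mod << 6 | reg << 3 | rm);
    if (mod == 1) {
        out[n++] = (unsigned char)(u & 0xff);
    } else if (mod == 2) {
        out[n++] = (unsigned char)(u & 0xff);         /* low byte first */
        out[n++] = (unsigned char)(u >> 8);
    }
    return n;
}
```

With this, [si] costs two bytes while [bp] silently becomes [bp+0] and costs three, which is the size difference mentioned above.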

For giggles, here's my draft so far. In a form which should
be convenient to plug into my emulator.


$ cat forth.h
#define NEXT HEX3(ad,ff,e8)
#define CHAR(x) *#x
#define CHAR2(x,y) CHAR(x),CHAR(y)
#define CHAR3(x,y,z) CHAR2(x,y),CHAR(z)
#define CHAR4(w,x,y,z) CHAR2(w,x),CHAR2(y,z)
#define CHAR5(w,x,y,z,a) CHAR4(w,x,y,z),CHAR(a)
#define HEX(y) 0x##y
#define HEX2(x,y) HEX(x),HEX(y)
#define HEX3(x,y,z) HEX2(x,y),HEX(z)
#define HEX4(w,x,y,z) HEX2(w,x),HEX2(y,z)
#define HEX5(w,x,y,z,a) HEX4(w,x,y,z),HEX(a)
#define HEX6(w,x,y,z,a,b) HEX4(w,x,y,z),HEX2(a,b)

static inline int
forth(char *mem){
char image[] = {
0, 4, CHAR4(d,r,o,p), HEX(58), NEXT,
0, 4, CHAR4(s,w,a,p), HEX4(58,5b,50,53), NEXT,
0, 3, CHAR3(d,u,p),0, HEX5(8b,ec,8b,5,50), NEXT,
0, 4, CHAR4(o,v,e,r), HEX6(8b,ec,8b,45,2,50), NEXT,
0, 3, CHAR3(r,o,t),0, HEX6(58,5b,59,53,50,51), NEXT,
0, 4, CHAR4(-,r,o,t), HEX6(58,5b,59,50,51,53), NEXT,
0, 5, CHAR5(2,d,r,o,p),0, HEX2(58,59), NEXT,
0, 4, CHAR4(2,d,u,p), HEX4(8b,ec,8b,5),HEX5(8b,5d,2,53,50), NEXT,
0, 5, CHAR5(2,s,w,a,p),0, HEX4(58,5b,59,5a),HEX4(53,50,52,51), NEXT,
0, 4, CHAR4(?,d,u,p), HEX4(8b,ec,8b,5),HEX2(85,c0),HEX3(74,1,50), NEXT,
0, 2, CHAR2(1,+), HEX4(8b,ec,ff,5), NEXT,
0, 2, CHAR2(1,-), HEX4(8b,ec,ff,d), NEXT,
0, 1, CHAR(+),0, HEX5(58,8b,ec,1,5), NEXT,
0, 1, CHAR(-),0, HEX5(58,8b,ec,29,5), NEXT,
0, 1, CHAR(*),0, HEX5(58,5b,f7,eb,50), NEXT,
};
memcpy(mem, image, sizeof image);
}


Need to stitch up the link fields somehow.

Terje Mathisen

May 12, 2018, 7:54:33 AM
luserdroog wrote:
> On Friday, May 11, 2018 at 10:09:04 PM UTC-5, Alexei A. Frounze wrote:
>> If you must preserve bp, then you can do something like this:
>> mov ax, bp
>> mov bp, sp
>> mov bp, [bp+2]
>> xchg ax, bp
>>
>> If you don’t:
>> mov bp, sp
>> mov ax, [bp+2]
>>
>> Alex
>
> I've settled on using si for now. bp seems tempting for this, but
> you always have to have a displacement even if it's zero because
> mod=0 r/m=6 does something else. So using si I have
>
> mov si, sp
> mov ax, [si] ; smaller encoding
>
> and
>
> mov si, sp
> mov ax, [si+2]

This fails every time SS is different from DS!

You _must_ use BP (which defaults to SS just like SP) or you need an SS:
override on the addressing.

mov si,sp
mov ax,[ss:si+2]

But if you have SI as a spare register you can also use stack operations:

POP SI
POP AX
PUSH AX
PUSH SI

which is just 4 bytes of code.
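
The default-segment rule can be stated as a small function. A sketch in C, assuming the usual 8086 rule that memory operands built on BP default to SS and everything else defaults to DS (string-instruction destinations aside). default_segment and enum seg are hypothetical names; the enum values follow the 8086 segment register numbering.

```c
#include <assert.h>

enum seg { SEG_ES = 0, SEG_CS = 1, SEG_SS = 2, SEG_DS = 3 };

/* Default segment for a 16-bit memory operand, given the mod and r/m
   fields of its ModRM byte. Only meaningful for mod 0..2. */
static enum seg default_segment(unsigned mod, unsigned rm)
{
    assert(mod < 3 && rm < 8);
    if (mod == 0 && rm == 6)
        return SEG_DS;           /* direct address [disp16] */
    if (rm == 2 || rm == 3 || rm == 6)
        return SEG_SS;           /* bp+si, bp+di, bp */
    return SEG_DS;               /* bx+si, bx+di, si, di, bx */
}
```

An emulator that implements segments could call something like this once per memory operand and then let a segment-override prefix (0x36 for SS:) replace the result.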

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Anton Ertl

May 12, 2018, 8:24:37 AM
luserdroog <luser...@nospicedham.gmail.com> writes:
>So using si I have
>
>mov si, sp
>mov ax, [si] ; smaller encoding

From your definition of NEXT in the other posting I see that you
already use SI as Forth IP (you use LODS in your NEXT), so that's not
a good choice.

One thing you could do is to use SP as return stack pointer, and
something else (BX or BP?) as data stack pointer. Indexed return
stack accesses are less frequent (mostly for DO...LOOPs).

One other thing I would do is to keep the TOS in a register across
word boundaries. This does not increase register pressure in most
cases (probably in none), because the words with the highest register
pressure have the TOS in a register at the place of the highest
register pressure even if they keep it in memory across NEXT.
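
To make the TOS-in-register idea concrete, here is a small C model, not 8086 code. The variable tos stands in for the register and mem for the in-memory part of the stack. All names are hypothetical and the 16-bit cell type is an assumption.

```c
#include <assert.h>

#define STACK_CELLS 64

/* Hypothetical model of a data stack whose top item lives in a variable
   (standing in for a CPU register) instead of in memory. */
struct dstack {
    short tos;                 /* cached top of stack */
    short mem[STACK_CELLS];    /* the other items; mem[depth-2] is the second item */
    int depth;                 /* total number of items, including tos */
};

static void ds_push(struct dstack *s, short v)
{
    assert(s->depth < STACK_CELLS);
    if (s->depth > 0)
        s->mem[s->depth - 1] = s->tos;   /* spill the old top to memory */
    s->tos = v;
    s->depth++;
}

static short ds_pop(struct dstack *s)
{
    short v = s->tos;

    assert(s->depth > 0);
    s->depth--;
    if (s->depth > 0)
        s->tos = s->mem[s->depth - 1];   /* refill the cache from memory */
    return v;
}

/* "+" reads memory once (the second item); "1+" does not touch memory. */
static void ds_add(struct dstack *s)
{
    short a;

    assert(s->depth >= 2);
    a = ds_pop(s);
    s->tos = (short)(s->tos + a);
}

static void ds_inc(struct dstack *s) { assert(s->depth >= 1); s->tos = (short)(s->tos + 1); }
static void ds_dup(struct dstack *s) { assert(s->depth >= 1); ds_push(s, s->tos); }
```

In this model 1+ becomes a single register increment, where the memory-TOS version needs an indexed read-modify-write through the stack pointer.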

- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

luserdroog

May 12, 2018, 10:24:45 AM
Thanks. I totally missed that since my emulator doesn't do segments yet.
Looking at Table 2.2 Logical Address Sources confirms your conclusion.
BP is the only other register which defaults to SS.

My primary reference is actually my emulator source because it also
shows me which instructions I can actually use. But that can lead me
to this kind of oversight.

luserdroog

May 12, 2018, 11:24:49 AM
On Saturday, May 12, 2018 at 7:24:37 AM UTC-5, Anton Ertl wrote:
> luserdroog <luser...@nospicedham.gmail.com> writes:
> >So using si I have
> >
> >mov si, sp
> >mov ax, [si] ; smaller encoding
>
> From your definition of NEXT in the other posting I see that you
> already use SI as Forth IP (you use LODS in your NEXT), so that's not
> a good choice.
>
> One thing you could do is to use SP as return stack pointer, and
> something else (BX or BP?) as data stack pointer. Indexed return
> stack accesses are less frequent (mostly for DO...LOOPs).
>
> One other thing I would do is to keep the TOS in a register across
> word boundaries. This does not increase register pressure in most
> cases (probably in none), because the words with the highest register
> pressure have the TOS in a register at the place of the highest
> register pressure even if they keep it in memory across NEXT.
>

Thanks, I'll consider those options. I think I'll wait on the TOS
in register until I get something running at all.

Current line-up:
SI is forth IP
SP is forth data stack ptr
BP used for indexing stack
DI is forth return stack ptr
AX,CX,DX,BX free for functions to use
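
What NEXT does with the Forth IP can be modelled in C. This is a sketch of indirect threading as jonesforth describes it (fetch the cell at IP, advance IP, jump through the fetched word's code field), with a loop standing in for the jump because C has no computed tail jump. struct vm, struct word and the do_ functions are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

struct vm;
typedef void (*codeword)(struct vm *);

/* Hypothetical model: a word starts with a code field. */
struct word {
    codeword code;
};

struct vm {
    const struct word *const *ip;   /* Forth IP: next cell of the thread */
    int stack[16];
    int sp;                         /* number of items on the data stack */
    int running;
};

/* NEXT: load the cell at IP and advance IP (the lodsw), then transfer
   control through that word's code field (the indirect jmp). */
static void next(struct vm *vm)
{
    const struct word *w = *vm->ip++;
    w->code(vm);
}

static void do_dup(struct vm *vm)
{
    assert(vm->sp >= 1 && vm->sp < 16);
    vm->stack[vm->sp] = vm->stack[vm->sp - 1];
    vm->sp++;
}

static void do_plus(struct vm *vm)
{
    assert(vm->sp >= 2);
    vm->stack[vm->sp - 2] += vm->stack[vm->sp - 1];
    vm->sp--;
}

static void do_bye(struct vm *vm) { vm->running = 0; }

static const struct word w_dup = { do_dup }, w_plus = { do_plus }, w_bye = { do_bye };

/* Run a thread until "bye"; returns the top of stack (0 if empty). */
static int run(struct vm *vm, const struct word *const *thread)
{
    vm->ip = thread;
    vm->running = 1;
    while (vm->running)
        next(vm);
    return vm->sp > 0 ? vm->stack[vm->sp - 1] : 0;
}
```

A thread of { &w_dup, &w_plus, &w_bye } run with 21 on the stack doubles it, which is the same shape as a "double" word built from dup and plus.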


macro next
lodsl ad
jmp *(eax) ff\0350


drop
pop eax 58
next

swap
pop eax 58
pop ebx 5b
push eax 50
push ebx 53
next

dup
mov (esp),eax 8b\364 8b\106\0
push eax 50
next

over
mov 4(esp),eax 8b\364 8b\106\2
push eax 50
next

rot
pop eax 58
pop ebx 5b
pop ecx 59
push ebx 53
push eax 50
push ecx 51
next

-rot
pop eax 58
pop ebx 5b
pop ecx 59
push eax 50
push ecx 51
push ebx 53
next

2drop
pop eax 58
pop eax 58
next

2dup
mov (esp),eax 8b\364 8b\106\0
mov 4(esp),ebx 8b\136\2
push ebx 53
push eax 50
next

2swap
pop eax 58
pop ebx 5b
pop ecx 59
pop edx 5a
push ebx 53
push eax 50
push edx 52
push ecx 51
next

?dup
movl (esp),eax 8b\364 8b\106\0
test eax,eax 85\300
jz 1f 74\1
push eax 50
1
next


1+
incl (esp) 8b\364 ff\106\0
next

1-
decl (esp) 8b\364 ff\116\0
next

+
pop eax 58
add eax,(esp) 8b\364 01\106\0
next

-
pop eax 58
sub eax,(esp) 8b\364 29\106\0
next

*
pop eax 58
pop ebx 5b
imull ebx,eax f7\353
push eax 50
next



#define NEXT 0xAD,0xFF,0350

static inline int
forth(char *mem){
char image[] = {
0, 4, 'd','r','o','p', 0x58, NEXT,
0, 4, 's','w','a','p', 0x58, 0x5b, 0x50, 0x53, NEXT,
0, 3, 'd','u','p', 0 , 0x8b,0364, 0x8b,0106,0, 0x50, NEXT,
0, 4, 'o','v','e','r', 0x8b,0364, 0x8b,0106,2, 0x50, NEXT,
0, 3, 'r','o','t', 0 , 0x58, 0x5b, 0x59, 0x53, 0x50, 0x51, NEXT,
0, 4, '-','r','o','t', 0x58, 0x5b, 0x59, 0x50, 0x51, 0x53, NEXT,
0, 5, '2','d','r','o','p', 0 , 0x58, 0x59, NEXT,
0, 4, '2','d','u','p', 0x8b,0364, 0x8b,0106,0, 0x8b,0136,2, 0x53, 0x50, NEXT,
0, 5, '2','s','w','a','p', 0 , 0x58, 0x5b, 0x59, 0x5a, 0x53, 0x50, 0x52, 0x51, NEXT,
0, 4, '?','d','u','p', 0x8b,0364, 0x8b,0106,0, 0x85,0300, 0x74,1, 0x50, NEXT,
0, 2, '1','+', 0x8b,0364, 0xff,0106,0, NEXT,
0, 2, '1','-', 0x8b,0364, 0xff,0116,0, NEXT,
0, 1, '+',0, 0x58, 0x8b,0364, 0x01,0106,0, NEXT,
0, 1, '-',0, 0x58, 0x8b,0364, 0x29,0106,0, NEXT,
0, 1, '*',0, 0x58, 0x5b, 0xf7,0353, 0x50, NEXT,
};
return memcpy(mem, image, sizeof image);
}

luserdroog

May 13, 2018, 12:10:26 AM
On Saturday, May 12, 2018 at 10:24:49 AM UTC-5, luserdroog wrote:

> Current line-up:
> SI is forth IP
> SP is forth data stack ptr
> BP used for indexing stack
> DI is forth return stack ptr
> AX,CX,DX,BX free for functions to use
>

Custom cpp macro assembler.

$ cat forth.h
#define AX 0
#define CX 1
#define DX 2
#define BX 3
#define SP 4
#define BP 5
#define SI 6
#define DI 7
#define BX_SI 0
#define BX_DI 1
#define BP_SI 2
#define BP_DI 3
#define SI_ 4
#define DI_ 5
#define BP_ 6
#define BX_ 7
#define LODS 0xAD
#define NEXT LODS, JMP_(R,AX)
#define JMP_(m,r) 0xff,MRM(m,5,r)
#define PUSHRSP(r) LEA(,B,DI,DI_),0xfc, MOV(,Z,r,DI_)
#define POPRSP(r) MOV(F,Z,r,DI_), LEA(,B,DI,DI_),4
#define POP(r) 0x58+r
#define PUSH(r) 0x50+r
#define F -2
#define R 3
#define Z 0
#define B 1
#define W 2
#define MRM(m,r,r_m) 0##m##r##r_m
#define LEA(to,m,r,r_m) to+0x8d,MRM(m,r,r_m)
#define MOV(to,m,r,r_m) to+0x8b,MRM(m,r,r_m)
#define ADDAX 0x05
#define TEST(m,r,r_m) 0x85,MRM(m,r,r_m)
#define JZ 0x74
#define INC_(m,r_m) 0xff,MRM(m,0,r_m)
#define DEC_(m,r_m) 0xff,MRM(m,1,r_m)
#define ADD(to,m,r,r_m) to+0x03,MRM(m,r,r_m)
#define SUB(to,m,r,r_m) to+0x2b,MRM(m,r,r_m)
#define IMUL(m,r_m) 0xf7,MRM(m,5,r_m)


static inline int
forth(char *mem){
char image[] = {
0, 5, 'd','o','c','o','l',0, PUSHRSP(SI), ADDAX,4,0, MOV(,R,DI,AX),NEXT,
20, 4, 'd','r','o','p', POP(AX),NEXT,
9, 4, 's','w','a','p', POP(AX),POP(BX),PUSH(AX),PUSH(BX),NEXT,
12, 3, 'd','u','p',0, MOV(,R,BP,SP), MOV(,B,AX,BP_),0, PUSH(AX),NEXT,
14, 4, 'o','v','e','r', MOV(,R,BP,SP), MOV(,B,AX,BP_),2, PUSH(AX),NEXT,
14, 3, 'r','o','t',0, POP(AX),POP(BX),POP(CX),PUSH(BX),PUSH(AX),PUSH(CX),NEXT,
14, 4, '-','r','o','t', POP(AX),POP(BX),POP(CX),PUSH(AX),PUSH(CX),PUSH(BX),NEXT,
14, 5, '2','d','r','o','p',0, POP(AX),POP(AX),NEXT,
12, 4, '2','d','u','p', MOV(,R,BP,SP), MOV(,B,AX,BP_),0, MOV(,B,BX,BP_),2, PUSH(BX),PUSH(AX),NEXT,
18, 5, '2','s','w','a','p',0, POP(AX),POP(BX),POP(CX),POP(DX),PUSH(BX),PUSH(AX),PUSH(DX),PUSH(CX),NEXT,
18, 4, '?','d','u','p', MOV(,R,BP,SP), MOV(,B,AX,BP_),0, TEST(R,AX,AX), JZ,1, PUSH(AX),NEXT,
18, 2, '1','+', MOV(,R,BP,SP), INC_(B,BP_),0, NEXT,
11, 2, '1','-', MOV(,R,BP,SP), DEC_(B,BP_),0, NEXT,
11, 1, '+',0, POP(AX), MOV(,R,BP,SP), ADD(F,B,AX,BP_),0, NEXT,
12, 1, '-',0, POP(AX), MOV(,R,BP,SP), SUB(F,B,AX,BP_),0, NEXT,
12, 1, '*',0, POP(AX),POP(BX), IMUL(R,BX), PUSH(AX), NEXT,
11, 4, 'e','x','i','t', POPRSP(SI),NEXT,
13, 3, 'l','i','t',0, LODS,PUSH(AX),NEXT,
10, 1, '!',0, POP(BX), POP(AX), MOV(F,Z,AX,BX), NEXT,
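
The MRM macro works because the ModRM fields are 2+3+3 bits wide, so pasting three digits after a leading 0 produces an octal constant with one digit per field. The same byte can be computed with shifts, which is handy for cross-checking the macro output. A minimal sketch; modrm is a hypothetical helper name.

```c
#include <assert.h>

/* Build a ModRM byte from its three fields: mod (2 bits), reg (3 bits),
   r/m (3 bits). Printed in octal, the result reads as the digits
   mod, reg, r/m. */
static unsigned char modrm(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (unsigned char)(mod << 6 | reg << 3 | rm);
}
```

For example modrm(3, 5, 4) is 0354 (mov bp,sp after opcode 8B) and modrm(1, 0, 6) is 0106 ([bp+disp8] with AX). One thing worth checking against the manual: the opcode tables list FF /4 as the near indirect jump and FF /5 as the far one (memory operand only), so jmp ax would come out as FF 0340 rather than FF 0350.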

wolfgang kern

May 13, 2018, 5:10:45 AM

"luserdroog" said:
> Hi everyone.

hello,
if your 16-bit 8086 code runs on a 386 (Intel syntax, dunno AT&T):

66 0F B7 E4 movzx esp,sp ;needed only once, just in case
67 8B 44 24 04 mov ax,[esp+4] ;uses SS:
__
wolfgang
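
The second encoding above can be taken apart field by field. A sketch in C, assuming 386 32-bit addressing (selected here by the 67 prefix), where r/m=4 with mod other than 3 means a SIB byte follows. struct ea32 and split_ea32 are made-up names.

```c
#include <assert.h>

struct ea32 {
    unsigned mod, reg, rm;        /* ModRM fields */
    int has_sib;                  /* nonzero if a SIB byte follows */
    unsigned scale, index, base;  /* SIB fields, valid only if has_sib */
};

/* Split a ModRM byte and the byte after it into fields. The second
   byte is only interpreted when the ModRM byte calls for a SIB. */
static struct ea32 split_ea32(unsigned char modrm, unsigned char sib)
{
    struct ea32 e = {0};

    e.mod = (modrm >> 6) & 3;
    e.reg = (modrm >> 3) & 7;
    e.rm  = modrm & 7;
    e.has_sib = (e.mod != 3 && e.rm == 4);
    if (e.has_sib) {
        e.scale = (sib >> 6) & 3;
        e.index = (sib >> 3) & 7;   /* 4 means "no index register" */
        e.base  = sib & 7;          /* 4 means ESP */
    }
    assert(e.mod < 4 && e.reg < 8 && e.rm < 8);
    return e;
}
```

For 8B 44 24 04 this gives mod=1 (disp8 follows), reg=0 (AX), r/m=4 (SIB), then index=4 (none) and base=4 (ESP), and the trailing 04 is the displacement.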

Rod Pemberton

May 14, 2018, 11:27:59 PM
On Fri, 11 May 2018 20:29:46 -0700 (PDT)
luserdroog <luser...@nospicedham.gmail.com> wrote:

> I've settled on using si for now. bp seems tempting for this, but
> you always have to have a displacement even if it's zero because
> mod=0 r/m=6 does something else. So using si I have
>
> mov si, sp
> mov ax, [si] ; smaller encoding
>
> and
>
> mov si, sp
> mov ax, [si+2]
>
> For giggles, here's my draft so far. In a form which should
> be convenient to plug into my emulator.
>

If you're loading SI, you can use LODSW. It will index SI for you.
Use CLD or STD to set the direction, increment or decrement, for
indexing.
It works, but ouch.

Word alignment?

Is there some reason why putc() or putchar() i.e., send characters
directly to an open tmpfile() file, wouldn't work in your macros?

> Need to stitch up the link fields somehow.

Of course, &image[XX], provides the address: image+XX. However, you
only have a byte to encode a length to the next primitive. You could
subtract two addresses. The problem is that any time you insert or
delete bytes into any Forth word within image[], /all/ of the XX
indexes from there onwards to the end of the dictionary, become
incorrect and must be manually adjusted, since their values have been
hard coded. BTDT.

Now, if you called putc() or putchar() in your macros to write the
bytes out to a file, then you could create a wrapper routine for putc()
or putchar(), say i_putc(), which also increments a counter variable
every time you output a character to a file. This variable would give
you the current index position within image[]. Any time the char was
'\0', you'd save the counter so that it can later be subtracted from the
current counter to obtain the LFA offset. Of course, that would mean
that you'd need to use something other than '\0' for word padding the
name field, e.g., perhaps a space.
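
A rough sketch of that counting wrapper in C. To keep it short it writes into a memory buffer instead of a tmpfile(), and it marks the start of each word with an explicit begin_word() call rather than by watching for '\0'; both are simplifications. The header layout (one link byte, one length byte, then the name) is taken from the drafts in this thread, and every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct emitter {
    unsigned char *buf;   /* destination image */
    size_t cap;           /* capacity of buf */
    size_t here;          /* bytes emitted so far (the running counter) */
    size_t last;          /* offset of the previous word header */
    int have_last;        /* 0 until the first header has been emitted */
};

/* The wrapper: store one byte and bump the counter. */
static void emit(struct emitter *e, unsigned c)
{
    assert(e->here < e->cap);
    e->buf[e->here++] = (unsigned char)c;
}

/* Start a dictionary entry. The link byte is the distance back to the
   previous header, or 0 for the first word. */
static void begin_word(struct emitter *e, const char *name)
{
    size_t start = e->here;
    size_t link = e->have_last ? start - e->last : 0;
    size_t len = strlen(name), i;

    assert(link <= 255 && len <= 255);
    emit(e, (unsigned)link);
    emit(e, (unsigned)len);
    for (i = 0; i < len; i++)
        emit(e, (unsigned char)name[i]);
    e->last = start;
    e->have_last = 1;
}
```

Because the link is computed from the counter at emit time, inserting or deleting bytes in one word no longer invalidates the links of the words after it. Swapping the buffer store in emit() for a putc() call would give the file-based variant.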


Rod Pemberton
--
I believe in the right to life. That's why I oppose gun control.

luserdroog

May 15, 2018, 5:44:01 PM
Agreed. The file did not remain in that form for long.
I hope you'll also look at my later posts from May 12
which read more nicely.

> Word alignment?

My plan was to ignore this at first. The target is my
own 8086 emulator which is missing roughly 1/3 of the
full instruction set.

> Is there some reason why putc() or putchar() i.e., send characters
> directly to an open tmpfile() file, wouldn't work in your macros?
>

No, I just hadn't considered that! That would simplify quite a bit
from the later versions.

> > Need to stitch up the link fields somehow.
>
> Of course, &image[XX], provides the address: image+XX. However, you
> only have a byte to encode a length to the next primitive. You could
> subtract two addresses. The problem is that any time you insert or
> delete bytes into any Forth word within image[], /all/ of the XX
> indexes from there onwards to the end of the dictionary, become
> incorrect and must be manually adjusted, since their values have been
> hard coded. BTDT.
>
> Now, if you called putc() or putchar() in your macros to write the
> bytes out to a file, then you could create a wrapper routine for putc()
> or putchar(), say i_putc(), which also increments a counter variable
> every time you output a character to a file. This variable would give
> you the current index position within image[]. Any time the char was
> '\0', you'd save the counter so that it can later be subtracted from the
> current counter to obtain the LFA offset. Of course, that would mean
> that you'd need to use something other than '\0' for word padding the
> name field, e.g., perhaps a space.
>

Yes, that does seem like a good approach. I've gone in a different
direction with more complicated macros. But I am tempted to backtrack
and build the image with smaller functional pieces which are then
combined.

Still a work in progress, but here's the latest draft. It's now split
into several files.

$ for i in ppnarg.h asm8086.h applyx.h forth.h ; do echo ---- $i: ---- ; cat $i ; done && echo -- && cpp -P forth.h
---- ppnarg.h: ----
/*
* The PP_NARG macro evaluates to the number of arguments that have been
* passed to it.
*
* Laurent Deniau, "__VA_NARG__," 17 January 2006, <comp.std.c> (29 November 2007).
*/
#define PP_NARG(...) PP_NARG_(__VA_ARGS__,PP_RSEQ_N())
#define PP_NARG_(...) PP_ARG_N(__VA_ARGS__)

#define PP_ARG_N( \
_1, _2, _3, _4, _5, _6, _7, _8, _9,_10, \
_11,_12,_13,_14,_15,_16,_17,_18,_19,_20, \
_21,_22,_23,_24,_25,_26,_27,_28,_29,_30, \
_31,_32,_33,_34,_35,_36,_37,_38,_39,_40, \
_41,_42,_43,_44,_45,_46,_47,_48,_49,_50, \
_51,_52,_53,_54,_55,_56,_57,_58,_59,_60, \
_61,_62,_63,N,...) N

#define PP_RSEQ_N() \
63,62,61,60, \
59,58,57,56,55,54,53,52,51,50, \
49,48,47,46,45,44,43,42,41,40, \
39,38,37,36,35,34,33,32,31,30, \
29,28,27,26,25,24,23,22,21,20, \
19,18,17,16,15,14,13,12,11,10, \
9,8,7,6,5,4,3,2,1,0
---- asm8086.h: ----

#define LEA(to,m,r,r_m) to+0x8d,MRM(m,r,r_m)
#define MOV(to,m,r,r_m) to+0x8b,MRM(m,r,r_m)
#define ADD(to,m,r,r_m) to+0x03,MRM(m,r,r_m)
#define SUB(to,m,r,r_m) to+0x2b,MRM(m,r,r_m)
#define F -2
#define MRM(m,r,r_m) 0##m##r##r_m
#define Z 0
#define B 1
#define W 2
#define R 3
#define AX 0
#define CX 1
#define DX 2
#define BX 3
#define SP 4
#define BP 5
#define SI 6
#define DI 7
#define BX_SI 0
#define BX_DI 1
#define BP_SI 2
#define BP_DI 3
#define SI_ 4
#define DI_ 5
#define BP_ 6
#define BX_ 7
#define TEST(m,r,r_m) 0x85,MRM(m,r,r_m)
#define IMUL(m, r_m) 0xf7,MRM(m,5,r_m)
#define INC_(m, r_m) 0xff,MRM(m,0,r_m)
#define DEC_(m, r_m) 0xff,MRM(m,1,r_m)
#define JMP_(m, r_m) 0xff,MRM(m,5,r_m)
#define POP(r) 0x58+r
#define PUSH(r) 0x50+r
#define ADDAX 0x05
#define LODS 0xAD
#define JZ 0x74
---- applyx.h: ----
#include "ppnarg.h"

/* need extra level to force extra eval */
#define Paste(a,b) a ## b
#define XPASTE(a,b) Paste(a,b)


/* APPLYXn variadic X-Macro by M Joshua Ryan */
/* Free for all uses. Don't be a jerk. */
/* I got bored after typing 15 of these. */
/* You could keep going upto 64 (PPNARG's limit). */
#define APPLYX1(X, a) X(a)
#define APPLYX2(X, a,b) X(a) X(b)
#define APPLYX3(X, a,b,c) X(a) X(b) X(c)
#define APPLYX4(X, a,b,c,d) X(a) X(b) X(c) X(d)
#define APPLYX5(X, a,b,c,d,e) X(a) X(b) X(c) X(d) X(e)
#define APPLYX6(X, a,b,c,d,e,f) X(a) X(b) X(c) X(d) X(e) X(f)
#define APPLYX7(X, a,b,c,d,e,f,g) \
X(a) X(b) X(c) X(d) X(e) X(f) X(g)
#define APPLYX8(X, a,b,c,d,e,f,g,h) \
X(a) X(b) X(c) X(d) X(e) X(f) X(g) X(h)
#define APPLYX9(X, a,b,c,d,e,f,g,h,i) \
X(a) X(b) X(c) X(d) X(e) X(f) X(g) X(h) X(i)
#define APPLYX10(X, a,b,c,d,e,f,g,h,i,j) \
X(a) X(b) X(c) X(d) X(e) X(f) X(g) X(h) X(i) X(j)
#define APPLYX11(X, a,b,c,d,e,f,g,h,i,j,k) \
X(a) X(b) X(c) X(d) X(e) X(f) X(g) X(h) X(i) X(j) X(k)
#define APPLYX12(X, a,b,c,d,e,f,g,h,i,j,k,l) \
X(a) X(b) X(c) X(d) X(e) X(f) X(g) X(h) X(i) X(j) X(k) X(l)
#define APPLYX13(X, a,b,c,d,e,f,g,h,i,j,k,l,m) \
X(a) X(b) X(c) X(d) X(e) X(f) X(g) X(h) X(i) X(j) X(k) X(l) X(m)
#define APPLYX14(X, a,b,c,d,e,f,g,h,i,j,k,l,m,n) \
X(a) X(b) X(c) X(d) X(e) X(f) X(g) X(h) X(i) X(j) X(k) X(l) X(m) X(n)
#define APPLYX15(X, a,b,c,d,e,f,g,h,i,j,k,l,m,n,o) \
X(a) X(b) X(c) X(d) X(e) X(f) X(g) X(h) X(i) X(j) X(k) X(l) X(m) X(n) X(o)
#define APPLYX_(M, ...) M(__VA_ARGS__)
#define APPLYXn(X, ...) APPLYX_(XPASTE(APPLYX, PP_NARG(__VA_ARGS__)), X, __VA_ARGS__)
---- forth.h: ----
#include "ppnarg.h"
#include "applyx.h"
#include "asm8086.h"

#define NEXT LODS, JMP_(R,AX)
#define PUSHRSP(r) LEA(,B,DI,DI_),0xfc, MOV(,Z,r,DI_)
#define POPRSP(r) MOV(F,Z,r,DI_), LEA(,B,DI,DI_),4

#define CODE(letters,...) _COUNT(LETTERS letters __VA_ARGS__)
#define WORD(letters,...) _COUNT(LETTERS letters JMP(DOCOL), __VA_ARGS__)
#define LETTERS(...) PP_NARG(__VA_ARGS__),APPLYXn(CHARIFY, __VA_ARGS__)
#define _FINIS(...) __VA_ARGS__
#define _COUNT(...) __VA_ARGS__,PP_NARG(__VA_ARGS__)
#define STRINGIFY(x) #x
#define CHARIFY(x) *STRINGIFY(x),


static inline int
forth(char *mem){
unsigned char image[] = {
0,
CODE((d,o,c,o,l), PUSHRSP(SI), ADDAX,4,0, MOV(,R,DI,AX), NEXT),
CODE((e,x,i,t), POPRSP(SI), NEXT),
CODE((l,i,t), LODS, PUSH(AX), NEXT),
CODE((d,r,o,p), POP(AX), NEXT),
CODE((s,w,a,p), POP(AX), POP(BX), PUSH(AX), PUSH(BX), NEXT),
CODE((d,u,p), MOV(,R,BP,SP), MOV(,B,AX,BP_),0, PUSH(AX), NEXT),
CODE((o,v,e,r), MOV(,R,BP,SP), MOV(,B,AX,BP_),2, PUSH(AX), NEXT),
CODE((r,o,t), POP(AX), POP(BX), POP(CX), PUSH(BX), PUSH(AX), PUSH(CX), NEXT),
CODE((-,r,o,t), POP(AX), POP(BX), POP(CX), PUSH(AX), PUSH(CX), PUSH(BX), NEXT),
CODE((2,d,r,o,p), POP(AX), POP(AX), NEXT),
CODE((2,d,u,p), MOV(,R,BP,SP), MOV(,B,AX,BP_),0, MOV(,B,BX,BP_),2, PUSH(BX), PUSH(AX), NEXT),
CODE((2,s,w,a,p), POP(AX), POP(BX), POP(CX), POP(DX),PUSH(BX),PUSH(AX),PUSH(DX),PUSH(CX),NEXT),
CODE((?,d,u,p), MOV(,R,BP,SP), MOV(,B,AX,BP_),0, TEST(R,AX,AX), JZ,1, PUSH(AX), NEXT),
CODE((1,+), MOV(,R,BP,SP), INC_(B,BP_),0, NEXT),
CODE((1,-), MOV(,R,BP,SP), DEC_(B,BP_),0, NEXT),
CODE((+), POP(AX), MOV(,R,BP,SP), ADD(F,B,AX,BP_),0, NEXT),
CODE((-), POP(AX), MOV(,R,BP,SP), SUB(F,B,AX,BP_),0, NEXT),
CODE((*), POP(AX),POP(BX), IMUL(R,BX), PUSH(AX), NEXT),
CODE((!), POP(BX), POP(AX), MOV(F,Z,AX,BX), NEXT),
CODE((@), POP(BX), MOV(,R,AX,BX), PUSH(AX), NEXT),
CODE((+,!), POP(BX), POP(AX), ADD(F,Z,AX,BX), NEXT),
CODE((-,!), POP(BX), POP(AX), SUB(F,Z,AX,BX), NEXT),
WORD((d,o,u,b,l,e), DUP, PLUS, EXIT),
};
return memcpy(mem, image, sizeof image);
}
--
static inline int
forth(char *mem){
unsigned char image[] = {
0,
5,*"d", *"o", *"c", *"o", *"l", +0x8d,0175,0xfc, +0x8b,0065, 0x05,4,0, +0x8b,0370, 0xAD, 0xff,0350,19,
4,*"e", *"x", *"i", *"t", -2 +0x8b,0065, +0x8d,0175,4, 0xAD, 0xff,0350,13,
3,*"l", *"i", *"t", 0xAD, 0x50+0, 0xAD, 0xff,0350,9,
4,*"d", *"r", *"o", *"p", 0x58+0, 0xAD, 0xff,0350,9,
4,*"s", *"w", *"a", *"p", 0x58+0, 0x58+3, 0x50+0, 0x50+3, 0xAD, 0xff,0350,12,
3,*"d", *"u", *"p", +0x8b,0354, +0x8b,0106,0, 0x50+0, 0xAD, 0xff,0350,13,
4,*"o", *"v", *"e", *"r", +0x8b,0354, +0x8b,0106,2, 0x50+0, 0xAD, 0xff,0350,14,
3,*"r", *"o", *"t", 0x58+0, 0x58+3, 0x58+1, 0x50+3, 0x50+0, 0x50+1, 0xAD, 0xff,0350,13,
4,*"-", *"r", *"o", *"t", 0x58+0, 0x58+3, 0x58+1, 0x50+0, 0x50+1, 0x50+3, 0xAD, 0xff,0350,14,
5,*"2", *"d", *"r", *"o", *"p", 0x58+0, 0x58+0, 0xAD, 0xff,0350,11,
4,*"2", *"d", *"u", *"p", +0x8b,0354, +0x8b,0106,0, +0x8b,0136,2, 0x50+3, 0x50+0, 0xAD, 0xff,0350,18,
5,*"2", *"s", *"w", *"a", *"p", 0x58+0, 0x58+3, 0x58+1, 0x58+2,0x50+3,0x50+0,0x50+2,0x50+1,0xAD, 0xff,0350,17,
4,*"?", *"d", *"u", *"p", +0x8b,0354, +0x8b,0106,0, 0x85,0300, 0x74,1, 0x50+0, 0xAD, 0xff,0350,18,
2,*"1", *"+", +0x8b,0354, 0xff,0106,0, 0xAD, 0xff,0350,11,
2,*"1", *"-", +0x8b,0354, 0xff,0116,0, 0xAD, 0xff,0350,11,
1,*"+", 0x58+0, +0x8b,0354, -2 +0x03,0106,0, 0xAD, 0xff,0350,11,
1,*"-", 0x58+0, +0x8b,0354, -2 +0x2b,0106,0, 0xAD, 0xff,0350,11,
1,*"*", 0x58+0,0x58+3, 0xf7,0353, 0x50+0, 0xAD, 0xff,0350,10,
1,*"!", 0x58+3, 0x58+0, -2 +0x8b,0003, 0xAD, 0xff,0350,9,
1,*"@", 0x58+3, +0x8b,0303, 0x50+0, 0xAD, 0xff,0350,9,
2,*"+", *"!", 0x58+3, 0x58+0, -2 +0x03,0003, 0xAD, 0xff,0350,10,
2,*"-", *"!", 0x58+3, 0x58+0, -2 +0x2b,0003, 0xAD, 0xff,0350,10,
6,*"d", *"o", *"u", *"b", *"l", *"e", JMP(DOCOL), DUP, PLUS, EXIT,11,

luserdroog

May 20, 2018, 6:50:13 PM
> $ for i in [snip] forth.h ; do echo ---- $i: ---- ; cat $i ; done && echo -- && cpp -P forth.h
[snip]
I thought about going forward and the alternative seemed more appealing.
But I got as far as this:

static int code();
static int word();
static int grow();

static int
forth( char *load_address ){
unsigned char *image = NULL;
unsigned int capacity;
unsigned int offset = 0;
offset += code( image, offset, capacity, "docol", PUSHRSP(SI), ADDAX,4,0, MOV(,R,DI,AX), NEXT );
offset += word( image, offset, capacity, "double", "dup", "plus", "exit" );
}

static int
code( unsigned char *image, unsigned int offset, unsigned int capacity, char *name, ... ){
}

static int
word( unsigned char *image, unsigned int offset, unsigned int capacity, char *name, ... ){
}


And now I don't want to do it that way anymore. Already those functions
feel more terrible than the macros.

But to gather symbols with the macro approach, I think I need an X-macro
wrapper around the whole image so I can do a double pass. And that seems
pretty gruesome, too.

I wonder if there's a way to go side-ways...
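
For what it's worth, the double pass need not be too gruesome if the word list is written once as an X-macro and expanded twice: once as struct members, so that offsetof() yields the symbols, and once as the initializer. A minimal sketch, assuming C99 compound literals and a compiler that adds no padding between unsigned char arrays. WORDS, AS_MEMBER, AS_INIT and OFFSET_OF_WORD are made-up names, and the byte sequences are only samples.

```c
#include <assert.h>
#include <stddef.h>

/* The word list, written once. Each entry is a name followed by its bytes. */
#define WORDS(X) \
    X(drop, 0x58) \
    X(swap, 0x58, 0x5b, 0x50, 0x53) \
    X(dup,  0x8b, 0xec, 0x8b, 0x46, 0x00, 0x50)

/* Pass 1: one array member per word, sized by its own byte list. */
#define AS_MEMBER(name, ...) \
    unsigned char name[sizeof((unsigned char[]){ __VA_ARGS__ })];
struct image { WORDS(AS_MEMBER) };

/* Pass 2: the bytes themselves, in the same order. */
#define AS_INIT(name, ...) { __VA_ARGS__ },
static const struct image image = { WORDS(AS_INIT) };

/* Symbols: the offset of each word within the image. */
#define OFFSET_OF_WORD(name) offsetof(struct image, name)

static_assert(sizeof(struct image) == 11, "no padding expected between byte arrays");
```

Here OFFSET_OF_WORD(swap) is 1 and OFFSET_OF_WORD(dup) is 5, and both follow automatically if bytes are added to an earlier word.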

luserdroog

May 26, 2018, 12:27:00 AM
<snip>
> I thought about going forward and the alternative seemed more appealing.
> But I got as far as this:
>
> static int code();
> static int word();
> static int grow();
>
> static int
> forth( char *load_address ){
> unsigned char *image = NULL;
> unsigned int capacity;
> unsigned int offset = 0;
> offset += code( image, offset, capacity, "docol", PUSHRSP(SI), ADDAX,4,0, MOV(,R,DI,AX), NEXT );
> offset += word( image, offset, capacity, "double", "dup", "plus", "exit" );
> }
>
> static int
> code( unsigned char *image, unsigned int offset, unsigned int capacity, char *name, ... ){
> }
>
> static int
> word( unsigned char *image, unsigned int offset, unsigned int capacity, char *name, ... ){
> }
>
>
> And now I don't want to do it that way anymore. Already those functions
> feel more terrible than the macros.
>
> But to gather symbols with the macro approach, I think I need an X-macro
> wrapper around the whole image so I can do a double pass. And that seems
> pretty gruesome, too.
>
> I wonder if there's a way to go side-ways...


Made some sideways progress. Just two files. Using macro for the useful
stringify effect. And using sequential compound statements lets me
use a regular structure and named assignments making this fairly readable
IMO. Still not enough machinery to actually run anything, and the emulator
has bugger-all for i/o atm.

$ for i in asm8086.h forth3.h ; do echo ----- $i: ------ ; cat $i ; done ; echo --------------- ; cpp -P forth3.h
----- asm8086.h: ------
#define HALT 0xF4
----- forth3.h: ------
#include "asm8086.h"
/* W = BX
IP = SI
PSP = SP
RSP = BP
X = AX
TOS_in_memory */
#define NEXT LODS, JMP_(R,AX)
#define PUSHRSP(r) LEA(,B,BP,BP_),1+(0xff^4), MOV(,Z,r,BP_)
#define POPRSP(r) MOV(F,Z,r,BP_), LEA(,B,BP,BP_),4
typedef unsigned char UC;
typedef unsigned short US;

struct dict_entry {
US link;
UC name_len;
UC name[8];
US code;
UC param[20];
};

#define CODE(n, ...) { \
dict_entry x = { \
.link = -sizeof x, \
.name_len = -1+sizeof(#n), \
.name = #n, \
.code = (p - start) + offset_of(dict_entry, param), \
.param = __VA_ARGS__ }; \
memcpy(p, x, sizeof x); \
p += sizeof x; \
}

static inline int
forth(char *start){
char *p = start;
CODE(enter, { PUSHRSP(SI), ADDAX,4,0, MOV(,R,DI,AX), NEXT })
int _enter = p - start;
CODE(exit, { POPRSP(SI), NEXT })
CODE(lit, { LODS, PUSH(AX), NEXT })
int _lit = p - start;
CODE(drop, { POP(AX), NEXT })
CODE(swap, { POP(AX), POP(BX), PUSH(AX), PUSH(BX), NEXT })
CODE(dup, { MOV(,R,BX,SP), MOV(,B,AX,BX_),0, PUSH(AX), NEXT })
CODE(over, { MOV(,R,BX,SP), MOV(,B,AX,BX_),2, PUSH(AX), NEXT })
CODE(rot, { POP(AX), POP(BX), POP(CX), PUSH(BX), PUSH(AX), PUSH(CX), NEXT })
CODE(nrot, { POP(AX), POP(BX), POP(CX), PUSH(AX), PUSH(CX), PUSH(BX), NEXT })
CODE(2drop, { POP(AX), POP(AX), NEXT })
CODE(2dup, { MOV(,R,BX,SP), MOV(,B,AX,BX_),0, MOV(,B,CX,BX_),2, PUSH(AX), PUSH(CX), NEXT })
CODE(1+, { MOV(,R,BX,SP), INC_(B,BX_),0, NEXT })
CODE(1-, { MOV(,R,BX,SP), DEC_(B,BX_),0, NEXT })
CODE(+, { POP(AX), MOV(,R,BX,SP), ADD(F,B,AX,BX_),0, NEXT })
CODE(-, { POP(AX), MOV(,R,BX,SP), SUB(F,B,AX,BX_),0, NEXT })
CODE(*, { POP(AX), POP(BX), IMUL(R,BX), PUSH(AX), NEXT })
CODE(!, { POP(BX), POP(AX), MOV(F,Z,AX,BX_), NEXT })
CODE(@, { POP(BX), MOV(,Z,AX,BX_), PUSH(AX), NEXT })
CODE(+!, { POP(BX), POP(AX), ADD(F,Z,AX,BX_), NEXT })
CODE(-!, { POP(BX), POP(AX), SUB(F,Z,AX,BX_), NEXT })
CODE(bye, { HALT })
}
---------------
typedef unsigned char UC;
typedef unsigned short US;
struct dict_entry {
US link;
UC name_len;
UC name[8];
US code;
UC param[20];
};
static inline int
forth(char *start){
char *p = start;
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("enter"), .name = "enter", .code = (p - start) + offset_of(dict_entry, param), .param = { +0x8d,0156,1+(0xff^4), +0x8b,0066, 0x05,4,0, +0x8b,0370, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
int _enter = p - start;
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("exit"), .name = "exit", .code = (p - start) + offset_of(dict_entry, param), .param = { -2 +0x8b,0066, +0x8d,0156,4, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("lit"), .name = "lit", .code = (p - start) + offset_of(dict_entry, param), .param = { 0xAD, 0x50+0, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
int _lit = p - start;
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("drop"), .name = "drop", .code = (p - start) + offset_of(dict_entry, param), .param = { 0x58+0, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("swap"), .name = "swap", .code = (p - start) + offset_of(dict_entry, param), .param = { 0x58+0, 0x58+3, 0x50+0, 0x50+3, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("dup"), .name = "dup", .code = (p - start) + offset_of(dict_entry, param), .param = { +0x8b,0334, +0x8b,0107,0, 0x50+0, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("over"), .name = "over", .code = (p - start) + offset_of(dict_entry, param), .param = { +0x8b,0334, +0x8b,0107,2, 0x50+0, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("rot"), .name = "rot", .code = (p - start) + offset_of(dict_entry, param), .param = { 0x58+0, 0x58+3, 0x58+1, 0x50+3, 0x50+0, 0x50+1, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("nrot"), .name = "nrot", .code = (p - start) + offset_of(dict_entry, param), .param = { 0x58+0, 0x58+3, 0x58+1, 0x50+0, 0x50+1, 0x50+3, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("2drop"), .name = "2drop", .code = (p - start) + offset_of(dict_entry, param), .param = { 0x58+0, 0x58+0, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("2dup"), .name = "2dup", .code = (p - start) + offset_of(dict_entry, param), .param = { +0x8b,0334, +0x8b,0107,0, +0x8b,0117,2, 0x50+0, 0x50+1, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("1+"), .name = "1+", .code = (p - start) + offset_of(dict_entry, param), .param = { +0x8b,0334, 0xff,0107,0, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("1-"), .name = "1-", .code = (p - start) + offset_of(dict_entry, param), .param = { +0x8b,0334, 0xff,0117,0, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("+"), .name = "+", .code = (p - start) + offset_of(dict_entry, param), .param = { 0x58+0, +0x8b,0334, -2 +0x03,0107,0, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("-"), .name = "-", .code = (p - start) + offset_of(dict_entry, param), .param = { 0x58+0, +0x8b,0334, -2 +0x2b,0107,0, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("*"), .name = "*", .code = (p - start) + offset_of(dict_entry, param), .param = { 0x58+0, 0x58+3, 0xf7,0353, 0x50+0, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("!"), .name = "!", .code = (p - start) + offset_of(dict_entry, param), .param = { 0x58+3, 0x58+0, -2 +0x8b,0007, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("@"), .name = "@", .code = (p - start) + offset_of(dict_entry, param), .param = { 0x58+3, +0x8b,0007, 0x50+0, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("+!"), .name = "+!", .code = (p - start) + offset_of(dict_entry, param), .param = { 0x58+3, 0x58+0, -2 +0x03,0007, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("-!"), .name = "-!", .code = (p - start) + offset_of(dict_entry, param), .param = { 0x58+3, 0x58+0, -2 +0x2b,0007, 0xAD, 0xff,0350 } }; memcpy(p, x, sizeof x); p += sizeof x; }
{ dict_entry x = { .link = -sizeof x, .name_len = -1+sizeof("bye"), .name = "bye", .code = (p - start) + offset_of(dict_entry, param), .param = { 0xF4 } }; memcpy(p, x, sizeof x); p += sizeof x; }
}
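
The draft above does not compile as posted: dict_entry needs the struct keyword, offset_of should be offsetof from <stddef.h>, and memcpy needs the address of x. A cut-down version that does build is sketched below. It is only an approximation of the idea: the entry is copied in host struct layout (padding and byte order are whatever the host compiler uses), the macro takes start and a pointer to p as explicit arguments, and the first entry's link is left as the same negative size as the others.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char UC;
typedef unsigned short US;

struct dict_entry {
    US link;         /* minus the entry size, as 16-bit two's complement */
    UC name_len;
    UC name[8];
    US code;         /* offset of param from the start of the image */
    UC param[20];
};

/* Append one entry at *pp. n is stringified to form the name and the
   remaining arguments are the machine code bytes. */
#define CODE(start, pp, n, ...) do {                                        \
        struct dict_entry x = {                                              \
            .link = (US)(0u - sizeof x),                                     \
            .name_len = (UC)(sizeof(#n) - 1),                                \
            .name = #n,                                                      \
            .code = (US)((size_t)(*(pp) - (start))                           \
                         + offsetof(struct dict_entry, param)),              \
            .param = { __VA_ARGS__ } };                                      \
        memcpy(*(pp), &x, sizeof x);                                         \
        *(pp) += sizeof x;                                                   \
    } while (0)

/* Build a two-word image at start and return its size in bytes. */
static size_t build(char *start)
{
    char *p = start;

    assert(start != NULL);
    CODE(start, &p, drop, 0x58);   /* pop ax (NEXT omitted in this sketch) */
    CODE(start, &p, bye, 0xf4);    /* hlt */
    return (size_t)(p - start);
}
```

Fixed-size entries waste space but make the link trivial, since every entry is exactly sizeof(struct dict_entry) bytes from its neighbour.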

wolfgang kern

May 26, 2018, 7:42:21 AM

luserdroog wrote:

<q>
...
...
</q>

With mode 00, r/m = 6 is a memory pointer of the form [MEM16] (a direct
16-bit address). [BP+xx] works only in modes 01 and 10, while r/m = 6
means register SI in mode 11.

Not sure I understood why you're doing it in such a complicated way :)
Wouldn't any old cross-assembler save you a lot of work?
__
wolfgang

luserdroog

May 26, 2018, 12:42:36 PM
Yes. Perhaps I should arrange for my assembler to enforce
that somehow. I could concatenate the mode and the rm and
have Z_BP_ undefined or defined as some construct which
produces a compiler error. Hmmm.
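
One way that could look, as a sketch: paste the mode and the base register into a single macro name, and define that name only for the legal combinations. PACK, MEM and the MEM_ names are invented here, and only a handful of combinations are spelled out. The static_assert needs C11.

```c
#include <assert.h>

#define PACK(mod, reg, rm) ((unsigned char)((mod) << 6 | (reg) << 3 | (rm)))

/* One macro per legal (mode, base) pair: Z = no displacement,
   B = 8-bit displacement, W = 16-bit displacement.
   MEM_Z_BP is deliberately missing, because mod=0 r/m=6 means [disp16]. */
#define MEM_Z_SI(reg) PACK(0, reg, 4)
#define MEM_Z_DI(reg) PACK(0, reg, 5)
#define MEM_Z_BX(reg) PACK(0, reg, 7)
#define MEM_B_SI(reg) PACK(1, reg, 4)
#define MEM_B_DI(reg) PACK(1, reg, 5)
#define MEM_B_BP(reg) PACK(1, reg, 6)
#define MEM_B_BX(reg) PACK(1, reg, 7)
#define MEM_W_BP(reg) PACK(2, reg, 6)

/* Paste mode and base into one of the names above. */
#define MEM(mode, base, reg) MEM_##mode##_##base(reg)

enum { AX, CX, DX, BX };

/* mov ax,[bp+0]: opcode 8B, ModRM, disp8 */
static const unsigned char load_tos[] = { 0x8b, MEM(B, BP, AX), 0 };

static_assert(MEM(B, BP, AX) == 0106, "mod=1 reg=0 r/m=6");

/* MEM(Z, BP, AX) would expand to the undefined MEM_Z_BP(AX), so the
   build fails instead of silently encoding a direct address. */
```

The cost is one line per legal combination, but the table doubles as documentation of which addressing forms exist.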

> Not sure if I understood why you do it that complicated :)
> wouldn't any old cross-assembler save you a lot of work ?

Probably I'm trying to do too many things at once. It seems
I'm trying to do all these things, and possibly more:

1. Learn more about asm and machine code
2. Have some application for my 8086 emulator (cf.
https://github.com/luser-dr00g/8086/blob/master/a8086.c )
3. Write a Forth
4. Write a thin assembler with macros (cf.
https://stackoverflow.com/questions/13459300/c-preprocessor-pseudo-assembly-with-embedded-byte-code-interpreter-how-to-find )
5. This other old idea to make a "stack of languages" (cf. "flips"
https://github.com/luser-dr00g/xpost/blob/wiki/EvolutionOfXpost.md )


Oh, probably the most important:

6. Not work on my actual work
