assembly: mov a memory word into register

Mateusz Viste

Nov 12, 2021, 8:35:36 AM

I somehow got stuck on a simple quest: copying a word from memory into a
register. Here's what I do:

void myfunc(char *buff) {
  _asm {
    mov ax, [buff]
  }
}

This is inline assembly within OpenWatcom. My understanding so far was
that:

  mov ax, buff     ; copies buff (pointer) into AX
  mov ax, [buff]   ; copies *buff (first word at memory location) into AX

But that's not what happens now.

Whether I use "mov ax, buff" or "mov ax, [buff]", the result is the
same: AX gets the address of buff and never the value under it.

What am I missing?

I must add that it works when I do this:

void myfunc(char *buff) {
  _asm {
    mov bx, buff
    mov ax, [bx]
  }
}

I'd like to understand why my first version isn't producing what I
expect, though... Any ideas?


Mateusz

Kerr-Mudd, John

Nov 12, 2021, 11:52:38 AM

On Fri, 12 Nov 2021 14:35:35 +0100
Mateusz Viste <mat...@xyz.invalid> wrote:

> I somehow got stuck on a simple quest: copying a word from memory
> into a register. Here's what I do:
>
> void myfunc(char *buff) {
> _asm {
> mov ax, [buff]
> }
> }
>
> This is inline assembly within OpenWatcom. My understanding so far was
> that:
>
> mov ax, buff ; copies buff (pointer) into AX
> mov ax, [buff] ; copies *buff (first word at memory location) into AX
>
> But that's not what happens now.
>
> Whether I use "mov ax, buff" or "mov ax, [buff]", the result is the
> same: AX gets the address of buff and never the value under it.
>
> What am I missing?
>

I can only agree with you; have you "upgraded" OpenWatcom?


I might try
mov ax, word [buff]
but it shouldn't be necessary.

then again, it might depend on how 'buff' is declared (Sorry, I'm
not a C programmer).

> I must add that it works when I do this:
>
> void myfunc(char *buff) {
> _asm {
> mov bx, buff
> mov ax, [bx]
> }
> }
>
> I'd like to understand why my first version isn't producing what I
> expect, though... Any ideas?
>
>
> Mateusz
>


--
Bah, and indeed Humbug.

Herbert Kleebauer

Nov 12, 2021, 11:53:02 AM

On 12.11.2021 14:35, Mateusz Viste wrote:

> I somehow got stuck on a simple quest: copying a word from memory into a
> register. Here's what I do:
>
> void myfunc(char *buff) {
> _asm {
> mov ax, [buff]
> }
> }
>
> This is inline assembly within OpenWatcom. My understanding so far was
> that:
>
> mov ax, buff ; copies buff (pointer) into AX

> mov ax, [buff] ; copies *buff (first word at memory location) into AX

There is no such x86 instruction. For indirect addressing you have to
use a register.

Mateusz Viste

Nov 12, 2021, 12:40:28 PM

That is correct indeed, thanks. I have been confused by what "buff"
truly is. It is still weird that buff and [buff] are the same, but
that's certainly due to the magic introduced by the C compiler to
reference C variables within the inline assembly block.

To illustrate/understand the "issue", I wrote a little test program.


; print first char of cmdline tail

        cpu 8086
        org 0x100

        ; does not work (prints the character 0x82)
        mov ah, 2
        mov dl, [buff]
        int 0x21

        ; works (prints the character at location 0x82)
        mov ah, 2
        mov dl, [0x82]
        int 0x21

        ; game over
        mov ax, 0x4c00
        int 0x21

buff    dw 0x82




Mateusz

Herbert Kleebauer

Nov 12, 2021, 1:19:32 PM

On 12.11.2021 18:40, Mateusz Viste wrote:

I don't know Watcom inline assembly, but here is how I understand it from the guide:

https://open-watcom.github.io/open-watcom-v2-wikidocs/cguide.pdf


> To illustrate/understand the "issue", I wrote a little test program.
>
>
> ; print first char of cmdline tail
>
> cpu 8086
> org 0x100
>
> ; does not work (prints the character 0x82)
> mov ah, 2
> mov dl, [buff]

This instruction doesn't exist, the compiler uses instead:

mov dl, buff

which stores the value 0x82 (the content of the variable buff) in dl.

> int 0x21
>
> ; works (prints the character at location 0x82)
> mov ah, 2
> mov dl, [0x82]

This instruction exists, it loads the value stored at
location 0x82 into dl.

> int 0x21
>
> ; game over
> mov ax, 0x4c00
> int 0x21
>
> buff dw 0x82

Just take a look at the generated machine code to see what
happens.


Mateusz Viste

Nov 12, 2021, 2:08:02 PM

2021-11-12 at 19:19 +0100, Herbert Kleebauer wrote:
> > mov ah, 2
> > mov dl, [buff]
>
> This instruction doesn't exist, the compiler uses instead:
>
> mov dl, buff

Well, no - in this specific example "mov dl, [buff]" does exist. It
loads whatever is at the location pointed out by buff (here: the word
0x82). But note that my previous program was an actual assembly
listing, not an assembly-inside-C abomination like in the first post.

"mov dl, [buff]" is, in fact, the same instruction as in "mov dl,
[0x82]", it is just that the assembler substitutes "buff" by its offset
at compile time:

00000000 B402 mov ah,0x2
00000002 8A161501 mov dl,[0x115]
00000006 CD21 int 0x21
00000008 B402 mov ah,0x2
0000000A 8A168200 mov dl,[0x82]
0000000E CD21 int 0x21
00000010 B8004C mov ax,0x4c00
00000013 CD21 int 0x21
00000015 82 db 0x82
00000016 00 db 0x00

My foolish mistake was to expect a similar behavior within an inline
assembly block when referencing a C pointer (which is different from an
assembly offset and implies - as you correctly pointed out in your
first reply - an extra layer of indirection).

Mateusz

Herbert Kleebauer

Nov 12, 2021, 4:18:09 PM

On 12.11.2021 20:07, Mateusz Viste wrote:
> 2021-11-12 at 19:19 +0100, Herbert Kleebauer wrote:
>> > mov ah, 2
>> > mov dl, [buff]
>>
>> This instruction doesn't exist, the compiler uses instead:
>>
>> mov dl, buff
>
> Well, no - in this specific example "mov dl, [buff]" does exist.

Different assemblers use different syntax.

mov ax, buff

In Watcom inline assembly this moves the content of the variable
buff into ax. But in NASM this instruction moves the address of
the variable buff into ax. Therefore in NASM a "mov dl, [buff]"
does exist (but not in Watcom) and means the same as "mov dl, buff"
in Watcom.

> It
> loads whatever is at the location pointed out by buff (here: the word
> 0x82). But note that my previous program was an actual assembly
> listing, not an assembly-inside-C abomination like in the first post.
>
> "mov dl, [buff]" is, in fact, the same instruction as in "mov dl,
> [0x82]", it is just that the assembler substitutes "buff" by its offset
> at compile time:
>
> 00000000 B402 mov ah,0x2
> 00000002 8A161501 mov dl,[0x115]
> 00000006 CD21 int 0x21
> 00000008 B402 mov ah,0x2
> 0000000A 8A168200 mov dl,[0x82]
> 0000000E CD21 int 0x21
> 00000010 B8004C mov ax,0x4c00
> 00000013 CD21 int 0x21
> 00000015 82 db 0x82
> 00000016 00 db 0x00
>
> My foolish mistake was to expect a similar behavior within an inline
> assembly block when referencing a C pointer (which is different from an
> assembly offset and implies - as you correctly pointed out in your
> first reply - an extra layer of indirection).

Just compare it with the opcode generated by the Watcom inline assembler.

Rod Pemberton

Nov 14, 2021, 8:52:17 PM

On Fri, 12 Nov 2021 14:35:35 +0100
Mateusz Viste <mat...@xyz.invalid> wrote:

> I somehow got stuck on a simple quest:
> copying a word from memory into a register.

You might need to declare "buff" as a C variable, with both a
"volatile" keyword and as a pointer. E.g., perhaps (untested):

volatile unsigned char *buff=0xb800;

Then in the _asm{} section, the compiler should recognize "buff" as a C
variable, use the assigned address, and not optimize away the code (due
to the volatile):

mov ax,[buff]

Of course, AX won't be preserved into the C code as the C compiler
likely uses the register, i.e., AX's value will be destroyed by C code.

If you want to pass the AX value into the C code, you may need to use
the "#pragma aux" format for OW to pass the value back to C, which
should look something like (untested):

volatile unsigned char *buff=0xb800;

extern unsigned short myfunc(void);
#pragma aux myfunc = \
    "mov ax, [buff]" \
    value [ax];

int main(void)
{
    ...
    printf("%04x\n", myfunc());
    ...
}

--
Is Biden intentionally recreating Carter's legacy?

Mateusz Viste

Nov 15, 2021, 3:31:20 AM

2021-11-14 at 20:53 -0500, Rod Pemberton wrote:
> You might need to declare "buff" as a C variable, with both a
> "volatile" keyword and as a pointer. E.g., perhaps (untested):
>
> volatile unsigned char *buff=0xb800;
>
> Then in the _asm{} section, the compiler should recognize "buff" as a
> C variable, use the assigned address, and not optimize away the code
> (due to the volatile):
>
> mov ax,[buff]

I'm sorry but I fail to see what problem you are trying to solve here.
The compiler already knows that buff is a C variable and this variable
can be used from within an inline asm block even without volatile.

Herbert's reply made me realize that buff is neither a register nor a
constant address, hence it cannot be used for indirect addressing. I was
too eager to apply the assembly concept of a "variable" (which is, in
fact, just a pre-calculated address offset) to a C pointer.

The fact that "mov r, [buff]" and "mov r, buff" are equivalent for wasm
didn't help to clear my initial confusion.

Consider this program:

#include <i86.h>
#include <stdio.h>

int main(void) {
  unsigned char *buff = "AB";
  unsigned short mybx = 0, mycx = 0, mydx = 0;

  _asm {
    mov cx, buff
    mov dx, [buff]

    mov bx, buff
    mov bx, [bx]

    mov mybx, bx
    mov mycx, cx
    mov mydx, dx
  }

  printf("mybx=%04X mycx=%04X mydx=%04X (buff=%p)\r\n",
         mybx, mycx, mydx, buff);
  return(0);
}

and its output:

mybx=4241 mycx=0022 mydx=0022 (buff=022)


Your volatile suggestion could be a solution if mybx, mycx, mydx were
optimized away as zeroes, but that's not a problem that exists in this
context. Here the problem was that I naively wanted to obtain the
0x4241 ("AB") result without using an intermediary register for indirect
addressing.

Both C and assembly can be tricky, but both mixed together lead to
entirely new classes of troubles.

Mateusz

Herbert Kleebauer

Nov 15, 2021, 6:10:47 AM

On 15.11.2021 09:31, Mateusz Viste wrote:

Does the OpenWatcom assembler allow [] to be used as ordinary
parentheses, as in [3+4]*5? Otherwise

mov dx, [buff]

should give an error "illegal addressing mode" instead of
just ignoring the [].


Mateusz Viste

Nov 15, 2021, 7:02:27 AM

2021-11-15 at 12:10 +0100, Herbert Kleebauer wrote:
> Does the OpenWatcom assembler allow [] to be used as ordinary
> parentheses, as in [3+4]*5?

Good point, apparently it does. I wasn't expecting such a possibility.

Therefore, the [] notation can mean either "parentheses" or
"dereference", depending on the context.

This:

  _asm {
    mov dx, [3+4]*5
    mov cx, buff
    mov dx, [buff]
  }

Translates to that:

  BA 23 00       mov dx,0x0023
  8B 8E F8 FF    mov cx,word ptr -0x8[bp]
  8B 96 F8 FF    mov dx,word ptr -0x8[bp]


Makes coding in this asm dialect even more "challenging"...


Mateusz

Ross Ridge

Nov 15, 2021, 11:53:58 AM

Mateusz Viste <mat...@xyz.invalid> wrote:
>That is correct indeed, thanks. I have been confused by what "buff"
>truly is. It is still weird that buff and [buff] are the same, but
>that's certainly due to the magic introduced by the C compiler to
>reference C variables within the inline assembly block.

Watcom is apparently using MASM syntax where for the most part square
brackets don't have much meaning. The exception is when a register is
used inside the brackets, and so "bx" and "[bx]" mean two different
things, while "foo" and "[foo]" mean the same thing. What "foo" and
"[foo]" mean depends on how "foo" was defined.

I wrote an answer on Stack Overflow that gives some examples on how MASM
treats square brackets:

https://stackoverflow.com/questions/25129743/confusing-brackets-in-masm32

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rri...@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //