assume cs:code,ds:data,es:ave
data segment
db '1975','1976','1977','1978','1979','1980','1981','1982','1983'
db '1984','1985','1986','1987','1988','1989','1990','1991','1992'
data ends
code segment
start:
call dispalY
dispalY: mov ax, data
mov ds, ax
mov si, 0
mov di, 0
mov ax, 0B800h
mov es, ax
mov bx, 0
mov ax, stack
mov ss, ax
mov sp, 6
mov dx, 0
YL:
mov al, 0A0h
mul bx
push bx
mov cx, 4
YLL:mov dl, [si]
cmp dx, 0
jz ok
mov bx, ax
mov byte ptr es:[bx + di], dl
add di, 2
inc si
loop YLL
pop bx
inc bx
jmp YL
ok:ret
code ends
end start
title testing
.286
.model small
.stack 100h
.data
db '1975','1976','1977','1978','1979','1980','1981','1982','1983'
;I ended with 0 since your original program requires 0 to end.
db '1984','1985','1986','1987','1988','1989','1990','1991','1992',0
msg db 'hello world',0dh,0ah,'$'
.code
main proc
call dispalY
mov ah,9h
mov dx,offset msg
int 21h
;your original program did not terminate.
;4c00 terminates the program.
mov ax,04C00h
int 21h
main endp
;moved the subroutines into procedures.
dispalY proc
;move the segment into ds.
mov ax,@data
mov ds, ax
mov si, 0
mov di, 0
mov ax, 0B800h
mov es, ax
;removed the stack init since the stack is already initialized.
;removed mov sp,6. Why? There is already a value
;on the stack itself: the previous call already pushed
;the return address, in this case the address of mov ah,9h
mov bx, 0
mov dx, 0
YL:
mov al, 0A0h
;mul bx multiplies ax by bx -> dx:ax; mul bl multiplies al by bl -> ax
mul bl
push bx
mov cx, 4
YLL:mov dl, [si]
cmp dl, 0
jz ok
mov bx, ax
mov byte ptr es:[bx + di], dl
add di, 2
inc si
loop YLL
pop bx
inc bx
jmp YL
ok:
;place the loop to allow the console to print out the video
;buffer on console.
mov cx,0FFFFh
to_me:
loop to_me
mov cx,0FFFFh
to_me1:
loop to_me1
mov cx,0FFFFh
to_me2:
loop to_me2
mov cx,0FFFFh
to_me3:
loop to_me3
mov cx,0FFFFh
to_me4:
loop to_me4
mov cx,0FFFFh
to_me5:
loop to_me5
mov cx,0FFFFh
to_me6:
loop to_me6
mov ah,9h
mov dx,offset msg
int 21h
;notice the extra pop: bx is still on the stack from the
;push bx above, so it has to come off before ret
pop bx
ret
dispalY endp
end
On Tue, 6 Jun 2006 00:11:22 +0800, "Liys" <leon8...@yahoo.com.cn>
wrote:
Thanks John
I blindly copied this code snippet from the program; here's the
original one. Actually, when I debug it, there is always an "int 3"
interrupt that comes after "mov cx, 3", and this blocks me from debugging
further.
assume cs:code,ds:data
data segment
db '1975','1976','1977','1978','1979','1980','1981','1982','1983'
db '1984','1985','1986','1987','1988','1989','1990','1991','1992', 0
data ends
stack segment
db '0'
stack ends
code segment
start:
call dispalY
mov ax,4c00h
int 21h
mov cx, 4?
I think that after the first call, with a stack that is too small,
the program overwrites itself.
> assume cs:code,ds:data
>
>
> data segment
> db '1975','1976','1977','1978','1979','1980','1981','1982','1983'
> db '1984','1985','1986','1987','1988','1989','1990','1991','1992', 0
> data ends
>
> stack segment
> db '0'
> stack ends
Only one byte for the stack?
> code segment
> start:
> call dispalY
Every call instruction pushes the return address (ip) onto the stack.
(The ret instruction pops the address from the stack.)
A "far" call additionally pushes the cs register onto the stack.
> mov ax,4c00h
> int 21h
>
>
> dispalY: mov ax, data
> mov ds, ax
The assume "ds:data" lets the
instruction "mov ax, data" get the segment of "data".
> mov si, 0
> mov di, 0
> mov ax, 0B800h
> mov es, ax
> mov bx, 0
> mov ax, stack
In MASM the instruction "mov ax, stack"
gets the value in "stack" into ax (the same as mov ax,[stack]).
(NASM interprets this instruction as a pointer (offset) to the "stack".)
But I think you want the segment of the "stack".
assume ss:stack
or
mov ax, seg stack
> mov ss, ax
...
> mov sp, 6
..bad code...
> mov dx, 0
...?
> YL:
> mov al, 0A0h
> mul bx
> push bx
...needs an additional 2 bytes of stack.
(mov bp,bx ... mov bx,bp does the same without the stack)
>
> mov cx, 4
> YLL:mov dl, [si]
> cmp dx, 0
Reading a register right after writing it makes the CPU wait.
Not an error, but not good for instruction pairing.
Insert some other instruction between "mov dl" and "cmp dx",
maybe "inc si" and "mov bx,ax".
YLL: mov dl, [si]
inc si ; here is no stall with si
mov bx, ax
cmp dx, 0
> jz ok
> ; mov bx, ax
> mov byte ptr es:[bx + di], dl
> add di, 2
> ; inc si
> loop YLL
> pop bx
> inc bx
> jmp YL
> ok:ret
>
> code ends
> end start
Dirk
The code that you copied is very bad. Try not to copy from this
kind of code anymore. To learn more about assembly, please visit Randall Hyde's
web site. Btw, this looks like homework from school :) Blindly copying code is
not a good practice. Understand each line of code first. Have a mental
picture of the intended output and think like a machine or, in your
case, a debugger. You need not act exactly like a debugger, but only
identify the changes that you need to know.
John
Hi Leon,
That's pretty weird. You aren't using a debugger that "retains"
breakpoints between sessions, are you? Almost sounds like it. (if so,
clear the damn breakpoint. if you don't know how, find out.)
Your original post indicated that the code was failing at that point -
"mov dl, [si]"... This seems very unlikely to me! In a protected OS, it
would segfault if si (it would be esi) were outside of strictly limited
memory that you "own". But in real mode - even fake real mode - any
value of si should be okay - might not get what you intend in dl, but it
shouldn't crash.
The really bad problem is that your stack is *way* too small. In real
mode, any interrupts that occur (timer interrupt, keyboard interrupt,
etc.) use your stack, in addition to any use you make of it - which is
already enough to "blow the stack".
Another bad problem is that you exit from your loop with bx on the
stack, and then "ret". You understand (?) that when you "call" a
subroutine, the return address (the next instruction after the call) is
stored on the stack, and when you "ret" the address to return to is
fetched off the stack. If the first thing on the stack is some random
value of bx, instead of the return address, it'll definitely crash!
Altering ss:sp after the call and before the ret is going to cause a
similar problem! If you're going to do that at all (which I would advise
against), do it first.
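To make the push/ret problem concrete, here is a minimal sketch (not the original program, and the labels show_row, next_char and done are made up for illustration). The point is only that every path to "ret" must first take off whatever the routine itself pushed:

```asm
; Sketch only: balanced push/pop on every exit path.
; Assumes ds:si already points at zero-terminated text.
show_row:
        push bx             ; stack now: [saved bx][return address]
        mov  cx, 4
next_char:
        mov  dl, [si]
        cmp  dl, 0
        jz   done           ; early exit: saved bx is still on the stack
        inc  si
        loop next_char
done:
        pop  bx             ; stack now: [return address]
        ret                 ; pops the real return address, not bx
```

Both the early exit and the normal fall-through go through the same "pop bx", so "ret" always finds the return address on top.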
Your original post lacked a clean exit, but that was just a "posto", I
guess.
> assume cs:code,ds:data
>
>
> data segment
You might be better off to give this data a name - "years_table" or
something, and referring to it by that (mov si, offset years_table),
rather than just putting it first in the segment and assuming offset 0.
It almost certainly *is* at offset zero, but your code would be more
robust and more flexible if you *didn't* make that assumption.
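A rough sketch of what that could look like, assuming MASM-style syntax and a shortened table (the name years_table is only the suggestion from above, nothing in the original program defines it):

```asm
data segment
years_table db '1975','1976','1977',0   ; shortened list, zero-terminated
data ends

; ... later, in the code segment, with ds already set to "data":
        mov  si, offset years_table     ; works wherever the table ends up
```

If something else is later declared ahead of the table, the code still finds it, which a hard-coded "mov si, 0" would not.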
> db '1975','1976','1977','1978','1979','1980','1981','1982','1983'
> db '1984','1985','1986','1987','1988','1989','1990','1991','1992', 0
> data ends
>
> stack segment
> db '0'
Insanely too small! You probably only need a couple dozen bytes, but it
doesn't hurt to allow some extra - and make 'em words, that's what the
stack works with.
dup 100h (dw ?)
(I'm guessing at the syntax)
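For the record, in MASM-style syntax the count goes in front of "dup". A minimal sketch, assuming MASM/TASM; the segment name stk is made up here to stay clear of the STACK keyword, and 100h words is just the size suggested above:

```asm
stk segment stack 'STACK'       ; "stack" combine type lets the linker record ss:sp
        dw 100h dup (?)         ; 256 uninitialized words = 512 bytes
stk ends
```

With the segment declared this way there should be no need to load ss or sp by hand; the loader sets them up from the executable header.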
> stack ends
>
> code segment
> start:
Some of the stuff in your subroutine should logically go here, IMO.
> call dispalY
>
> mov ax,4c00h
> int 21h
>
>
> dispalY: mov ax, data
> mov ds, ax
This would ordinarily be done first thing in your program. I've seen
programs that do it in an "initialization" subroutine (pointless, IMHO -
you're only calling it once). Doesn't seem to me that it logically
belongs in your "dispalY" (God help us!) routine.
> mov si, 0
If you set si outside of the subroutine, you could use it to display
*any* data, not just this "year list". (there are other things you'd
need to change). I think you'd be better off to use a named variable
instead of 0, in any case.
> mov di, 0
> mov ax, 0B800h
> mov es, ax
You might want to do the setup of es "first thing", as part of your
initialization - and leave es as "screen segment" for the entire
program. Or, you might want to leave it here as part of the subroutine.
If you do that, you might want to preserve the caller's es.
> mov bx, 0
This is the "line number", I guess. Would fit in bl...
> mov ax, stack
> mov ss, ax
> mov sp, 6
You *definitely* don't want to do this in a subroutine! I don't think
you want to do it at all!!! The example in the Nasm manual shows
initialization of ss:sp (but "first thing"). But why??? If your stack
segment is declared properly, the assembler will tell the linker about
it, the linker will put it in the executable header, and dos will set up
ss:sp when it loads. Step through this with a debugger, and see what
happens with, and without it. I say leave it out. Fastest and shortest
code you can imagine - and easy to write, too! :)
In any case, "don't do it here".
> mov dx, 0
Might not need this... You "cmp dx, 0" to find out if dl is zero, so you
*do* want it (maybe), but why not "cmp dl, 0"?
> YL:
> mov al, 0A0h
> mul bx
"mul bx" multiplies bx times ax - what's in ah? "mul bl" will do the job
here.
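A small worked case, assuming row 3 as an example value (the thread itself only uses bx as a running row counter):

```asm
        mov  bl, 3          ; row number (example value)
        mov  al, 0A0h       ; 160 bytes per 80-column text row
        mul  bl             ; ax = al * bl = 160 * 3 = 480 = 01E0h
                            ; ah needs no setup and dx is left alone
```

With "mul bx" the product would be ax * bx in dx:ax, so a stale ah changes the result and dx gets overwritten as a side effect.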
> push bx
>
>
> mov cx, 4
Okay if your subroutine is only going to display *this* data. What if
you had a "month_table db "Jan", "Feb",... too? Whole new subroutine,
that differs only in this number?
> YLL:mov dl, [si]
> cmp dx, 0
"cmp dl, 0" might be more appropriate here, since that's what you
actually care about.
> jz ok
> mov bx, ax
> mov byte ptr es:[bx + di], dl
> add di, 2
> inc si
> loop YLL
> pop bx
> inc bx
> jmp YL
> ok:
pop bx
Won't return without it!
> ret
>
>
> code ends
> end start
This might be a "better" subroutine if it accepted, as parameters, row,
column, color, and text to display, instead of having it all "hard
coded" in. Would require several changes in the way you're doing things.
Get it working this way first, then if you decide to improve it, work in
small steps, and keep a backup of each one that actually works!
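As a rough idea of where that could end up, here is a sketch of a parameterized version. The register interface is invented for illustration (ds:si = zero-terminated text, es = 0B800h, di = byte offset in video memory, ah = colour attribute) and so are the labels:

```asm
; Sketch only. The caller computes di = (row*80 + column)*2.
put_text:
        push ax
        push si
        push di
pt_next:
        mov  al, [si]           ; next character
        cmp  al, 0
        jz   pt_done
        mov  es:[di], ax        ; character in the low byte, attribute in the high byte
        add  di, 2              ; each screen cell is two bytes
        inc  si
        jmp  pt_next
pt_done:
        pop  di                 ; restore in reverse order of the pushes
        pop  si
        pop  ax
        ret
```

The same routine would then serve a year list, a month list, or anything else that is zero-terminated.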
Best,
Frank
> In real mode, any interrupts that occur (timer interrupt, keyboard interrupt,
> etc.) use your stack,
I think every interrupt uses its own stack?
> Another bad problem is that you exit from your loop with bx on the
> stack, and then "ret". You understand (?) that when you "call" a
> subroutine, the return address (the next instruction after the call) is
> stored on the stack, and when you "ret" the address to return to is
> fetched off the stack. If the first thing on the stack is some random
> value of bx, instead of the return address, it'll definitely crash!
>
> Altering ss:sp after the call and before the ret is going to cause a
> similar problem! If you're going to do that at all (which I would advise
> against), do it first.
;-----------------------
; masm blub
;-----------------------
assume cs:CODE,ds:DATEN,ss:STAPEL
CODE SEGMENT use16 'CODE'
START: mov ax, DATEN
mov ds, ax
call FOO ; push the return address onto the stack using ss:sp
EXIT: mov ax,4c00h
int 21h
;------------------------
FOO: mov [RETSP], sp ; store sp and ss to data using ds:mem
mov [RETSS], ss
mov sp, OFFSET VALUE ; no problem to use ss:sp
mov ax, DATEN
mov ss, ax
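mov bp, sp ; [sp] is not a valid 16-bit addressing mode, so go through bp
mov word ptr [bp], 1234h ; bp addresses ss by default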
mov sp, [RETSP] ; load sp and ss from data using ds:mem
mov ss, [RETSS]
ret ; pop the return address from the stack using ss:sp
CODE ends
;------------------------
DATEN SEGMENT use32 'DATA'
RETSP DW 0
RETSS DW 0
VALUE DW 0
DATEN ends
;-----------------------
STAPEL SEGMENT use16 STACK 'STACK'
DB 10h dup (?)
STAPEL ends
;-----------------------
> It almost certainly *is* at offset zero, but your code would be more
> robust and more flexible if you *didn't* make that assumption.
...otherwise such hidden errors are not easy to find.
>> db '1975','1976','1977','1978','1979','1980','1981','1982','1983'
>> db '1984','1985','1986','1987','1988','1989','1990','1991','1992', 0
>> data ends
>>
>> stack segment
>> db '0'
>
>
> Insanely too small! You probably only need a couple dozen bytes, but it
> doesn't hurt to allow some extra - and make 'em words, that's what the
> stack works with.
>
> dup 100h (dw ?)
>
> (I'm guessing at the syntax)
...look above.
Dirk
In real mode??? News to me... Could be.
(incidentally, we have a difference of opinion on what "assume" does...)
If an interrupt were to occur in this area (real mode code), I think
you'd be scribbling on "who knows where". Well... we do know: where we
stored the old ss:sp values. Potential problem here. :)
> mov sp, [RETSP] ; load sp and ss from data using ds:mem
> mov ss, [RETSS]
Or you might consider "lss sp, [retsp]". That would be guaranteed
"atomic". At least, do ss first. That will disable interrupts until the
next instruction executes... except on buggy processors. The "old
school" way was to surround ss:sp changes with cli/sti.
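Roughly, the "old school" version of the restore would look like this. It is only a sketch, reusing the RETSS/RETSP names from the quoted code and assuming ds still points at DATEN:

```asm
        cli                     ; no interrupts while ss:sp is half-changed
        mov  ss, [RETSS]        ; load ss first
        mov  sp, [RETSP]        ; then sp, so the pair is consistent again
        sti                     ; interrupts back on
```

Note that the "sti" turns interrupts on unconditionally; code that might be entered with interrupts already off would want to save and restore the flags instead.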
> ret ; pop the the adress from stack using ss:sp
> CODE ends
> ;------------------------
> DATEN SEGMENT use32 'DATA'
> RETSP DW 0
> RETSS DW 0
> VALUE DW 0
> DATEN ends
> ;-----------------------
Suit yourself... You really want "use32"???
> STAPEL SEGMENT use16 STACK 'STACK'
> DB 10h dup (?)
> STAPEL ends
> ;-----------------------
...
>>dup 100h (dw ?)
>>
>>(I'm guessing at the syntax)
>
> ...look above.
Do I get partial credit for "close"? :) Why so small? Are we in a
limited memory environment?
Best,
Frank
"DATEN" stores the data segment?
An interrupt must first store SS and SP, then initialize its own stack
and push all other registers, especially CS and IP.
Is this correct for real mode?
>> mov sp, [RETSP] ; load sp and ss from data using ds:mem
>> mov ss, [RETSS]
>
>
> Or you might consider "lss sp, [retsp]". That would be guaranteed
> "atomic". At least, do ss first. That will disable interrupts until the
> next instruction executes... except on buggy processors. The "old
> school" way was to surround ss:sp changes with cli/sti.
Ok, this is otherwise an error.
>
>> ret ; pop the the adress from stack using ss:sp
>> CODE ends
>> ;------------------------
>> DATEN SEGMENT use32 'DATA'
>> RETSP DW 0
>> RETSS DW 0
>> VALUE DW 0
>> DATEN ends
>> ;-----------------------
>
> Suit yourself... You really want "use32"???
What's wrong with an alignment of 32 bits?
OK, when I load 64 bits into an MMX register,
an alignment of 64 bits is better.
Maybe "org" can help, but I have never tested it in the data segment.
DATEN SEGMENT use32 'DATA'
DATA1 DD 0
org DATA1 + ((($-DATA1)/64)*64)+64
DATA2 DD 0, 0
DATEN ends
>> STAPEL SEGMENT use16 STACK 'STACK'
>> DB 10h dup (?)
>> STAPEL ends
>> ;-----------------------
>
>>> dup 100h (dw ?)
>>>
>>> (I'm guessing at the syntax)
>>
>> ...look above.
>
> Do I get partial credit for "close"? :) Why so small? Are we in a
> limited memory environment?
I rarely use push/pop; I use the data segment to store values with "mov".
The routine runs faster and the stack can be smaller.
Dirk
...
>>(incidentally, we have a difference of opinion on what "assume" does...)
>
> "DATEN" store the data-segment?
Ummm, I guess so... I understood you to say that we can use *either*
"assume ds:DATEN" or "mov ax, DATEN"/"mov ds, ax". I don't think
"assume" *does* do that for us. I'm not a Masm/Tasm user, so I'm not
really certain what "assume" does. I don't think it does anything, in
this case. My understanding is that if we had *two* (or more) data
segments - "DATEN1" and "DATEN2", which overlapped, and a variable - "x"
- declared in "DATEN1", but reachable from either segment, and if we did:
assume ds:DATEN1
mov ax, DATEN2
mov ds, ax
mov al, x
The "assume" would fix up the offset to "x". Much like Nasm's "wrt".
mov al, [x wrt DATEN2]
Never had occasion to use two data segments, so I'm not at all sure, but
I think that's how it works. I think we (Masm users, that is) can also
do "assume bx: byte ptr" and then do "inc [bx]" without having to
specify a size. Some experimentation would confirm/deny this, or maybe a
Masm expert can straighten me out... If anyone cares...
...
> An interupt must store first SS and SP, than initialize his own stack
I could be wrong, but I don't think that happens in real mode. Assuming
that an interrupt uses its own stack, where is it?
> and push all other register specialy CS and IP.
I think CS and IP... and flags, are already "pushed" (and "popped" by
"iret"). Other registers that we modify in our ISR, definitely!
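As I understand real mode, a handler skeleton looks something like the sketch below (the label my_isr is made up, and the body is left empty on purpose). Nothing in it switches stacks; it simply runs on whatever ss:sp the interrupted program had:

```asm
my_isr:                         ; CPU has already pushed flags, cs, ip
        push ax                 ; save only what the handler will modify
        push ds
        ; ... handler work goes here ...
        pop  ds                 ; restore in reverse order
        pop  ax
        iret                    ; pops ip, cs and flags
```

That is why the interrupted program's stack has to have room for at least those three words plus whatever the handler pushes.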
> Is this correct for real mode?
I'm not certain, but I think not. We could test it with your code -
fiddle and diddle around long enough so an interrupt *will* occur. Maybe
use John's delay loop (What a thing to show a newbie!!!). :)
In *emulated* real mode, interrupts are "faked" and may get a new stack
as you describe...
>>Or you might consider "lss sp, [retsp]". That would be guaranteed
>>"atomic". At least, do ss first. That will disable interrupts until the
>>next instruction executes... except on buggy processors. The "old
>>school" way was to surround ss:sp changes with cli/sti.
>
>
> Ok, this is otherwise an error.
Not exactly an "error" but something that *might* go boom if an
interrupt occurs at a bad time. As processors get faster, interrupts
occur relatively less often, so the chance of this actually breaking is
probably one in several millions...
>>> DATEN SEGMENT use32 'DATA'
...
>>Suit yourself... You really want "use32"???
>
> Whats wrong with an aligment of 32 bit?
Okay, sorry. Unfamiliarity with Masm. In Nasm, "use 32" would give us a
32-bit segment, which we wouldn't want (linker would complain - I don't
think Nasm cares). 32-bit alignment is no problem.
> Ok, when i load 64 bit to a mmx-register,
> it is better an aligment of 64bit.
... or 64-bit...
> Maybe "org" can help, but i never test it in the data-segment.
>
> DATEN SEGMENT use32 'DATA'
>
> DATA1 DD 0
>
> org DATA1 + ((($-DATA1)/64)*64)+64
>
> DATA2 DD 0, 0
>
> DATEN ends
"org" works differently in Masm than Nasm. I think that'd work. I think
you would potentially add a full 64 bits of padding in some cases.
org DATA1 + ((($-DATA1 + 63)/64)*64)
Should "round up" without any "excess", if I understand what it's doing...
Does Masm not have an "align" directive???
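To check the two formulas by hand: right after "DATA1 DD 0" the value of $-DATA1 is 4, and both give 64 (0*64+64, and (67/64)*64 with integer division). If $-DATA1 were already exactly 64, the first form gives 128 while the rounded form stays at 64, which is the "excess" in question. As far as I know, later MASM versions do have an ALIGN directive, which would make the arithmetic unnecessary. A sketch, assuming a paragraph-aligned 16-bit segment and 8-byte (64-bit) alignment, since ALIGN cannot exceed the segment's own alignment:

```asm
DATEN segment para 'DATA'
DATA1 dd 0
        align 8             ; pad to the next 8-byte (64-bit) boundary
DATA2 dd 0, 0               ; DATA2 now starts at offset 8
DATEN ends
```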
...
> I rarely use push/pop, i use the data-segment to store values with "mov",
> the routine run faster and the stack can be smaller.
Smaller stack, yeah. Why do you say it would be faster? Seems to me,
stack is probably in cache. Data may or may not be - probably is, but...
Code is larger for mov than push/pop. I'm not sure this *is* a "win".
Have you tested it?
Sorry to be so disagreeable. Must be that time of month. :)
Best,
Frank
Ah, I'm new to Nasm.
> Never had occasion to use two data segments, so I'm not at all sure, but
> I think that's how it works. I think we (Masm users, that is) can also
> do "assume bx: byte ptr" and then do "inc [bx]" without having to
> specify a size. Some experimentation would confirm/deny this, or mabe a
> Masm expert can straighten me out... If anyone cares...
>
> ...
>
>> An interupt must store first SS and SP, than initialize his own stack
>
> I could be wrong, but I don't think that happens in real mode. Assuming
> that an interrupt uses its own stack, where is it?
In that case I took a look at some samples of a new timer interrupt.
There is no stack included, but there are many push/pop instructions.
Now I'm wondering how it works when my program has not started.
I guess the stack is declared in command.com?
>> and push all other register specialy CS and IP.
>
> I think CS and IP... and flags, are already "pushed" (and "popped" by
> "iret"). Other registers that we modify in our ISR, definitely!
>
>> Is this correct for real mode?
>
> I'm not certain, but I think not. We could test it with your code -
> fiddle and diddle around long enough so an interrupt *will* occur. Maybe
> use John's delay loop (What a thing to show a newbie!!!). :)
>
> In *emulated* real mode, interrupts are "faked" and may get a new stack
> as you describe...
Ah....
>>> Or you might consider "lss sp, [retsp]". That would be guaranteed
>>> "atomic". At least, do ss first. That will disable interrupts until the
>>> next instruction executes... except on buggy processors. The "old
>>> school" way was to surround ss:sp changes with cli/sti.
>>
>> Ok, this is otherwise an error.
>
> Not exactly an "error" but something that *might* go boom if an
> interrupt occurs at a bad time. As processors get faster, interrupts
> occur relatively less often, so the chance of this actuaaly breaking is
> probably one in several millions...
>
>>>> DATEN SEGMENT use32 'DATA'
>
>
>>> Suit yourself... You really want "use32"???
>>
>> Whats wrong with an aligment of 32 bit?
>
> Okay, sorry. Unfamiarity with Masm. In Nasm, "use 32" would give us a
> 32-bit segment, which we wouldn't want (linker would complain - I don't
> think Nasm cares).
Aha.
> 32-bit alignment is no problem.
>
>> Ok, when i load 64 bit to a mmx-register,
>> it is better an aligment of 64bit.
>
>
> ... or 64-bit...
>
>> Maybe "org" can help, but i never test it in the data-segment.
>>
>> DATEN SEGMENT use32 'DATA'
>>
>> DATA1 DD 0
>>
>> org DATA1 + ((($-DATA1)/64)*64)+64
>>
>> DATA2 DD 0, 0
>>
>> DATEN ends
>
>
> "org" works differently in Masm than Nasm. I think that'd work. I think
> you would potentially add a full 64 bits of padding in some cases.
>
> org DATA1 + ((($-DATA1 + 63)/64)*64)
>
> Should "round up" without any "excess", if I understand what it's doing...
>
> Does Masm not have an "align" directive???
I think the older MASM 5 does not.
>> I rarely use push/pop, i use the data-segment to store values with "mov",
>> the routine run faster and the stack can be smaller.
>
> Smaller stack, yeah. Why do you say it would be faster?
"mov" is faster than push/pop, especially on older 386 CPUs.
> Seems to me,
> stack is probably in cache. Data may or may not be - probably is, but...
> Code is larger for mov than push/pop. I'm not sure this *is* a "win".
> Have you tested it?
No...
> Sorry to be so disagreeable. Must be that time of month. :)
...no problem.
It is a sunny morning, but there are many chemtrails in the sky over Hamburg.
(We need more cloudbusters.)
Dirk
...
>>>I rarely use push/pop, i use the data-segment to store values with "mov",
>>>the routine run faster and the stack can be smaller.
>>
>>Smaller stack, yeah. Why do you say it would be faster?
>
>
> "mov" is faster than push/pop specialy on older 386-CPUs.
>
>
>>Seems to me,
>>stack is probably in cache. Data may or may not be - probably is, but...
>>Code is larger for mov than push/pop. I'm not sure this *is* a "win".
>>Have you tested it?
>
>
> No...
Well, neither have I... This can be cured, however. Here's a first
draft. I make no representation that it's correct, or measures anything
interesting. It doesn't segfault... for me.
On my K6-300, the _MOV version is 135 or 90. The _PUSH version is 121 or
76... with occasional outliers... Looks like push/pop might be faster,
but not really definitive...
I blindly assume that there's nothing in edx - safe enough, I think, but
we can come up with a 64-bit version... We can come up with a Windows
version, if anyone wants to fool with that...
Comments and corrections?
Best,
Frank
; nasm -f elf pushvsmov.asm -d_MOV (or "-d_PUSH")
; ld -o pushvsmov pushvsmov.o
global _start
section .bss
eax_sav resd 1
ebx_sav resd 1
ecx_sav resd 1
edx_sav resd 1
esi_sav resd 1
edi_sav resd 1
section .text
_start:
nop
xor eax, eax
cpuid
rdtsc
push edx
push eax
%ifdef _MOV
mov [eax_sav], eax
mov [ebx_sav], ebx
mov [ecx_sav], ecx
mov [edx_sav], edx
mov [esi_sav], esi
mov [edi_sav], edi
mov edi, [edi_sav]
mov esi, [esi_sav]
mov edx, [edx_sav]
mov ecx, [ecx_sav]
mov ebx, [ebx_sav]
mov eax, [eax_sav]
%elifdef _PUSH
push eax
push ebx
push ecx
push edx
push esi
push edi
pop edi
pop esi
pop edx
pop ecx
pop ebx
pop eax
%else
%error 'must define _MOV or _PUSH'
%endif
xor eax, eax
cpuid
rdtsc
pop ebx
pop ecx
sub eax, ebx
sbb edx, ecx
call showeaxd
xor ebx, ebx
mov eax, 1
int 80h
;---------------------------------
showeaxd:
push eax
push ebx
push ecx
push edx
push esi
sub esp, 10h
lea ecx, [esp + 12]
mov ebx, 10
xor esi, esi
mov byte [ecx], 0
.top:
dec ecx
xor edx, edx
div ebx
add dl, '0'
mov [ecx], dl
inc esi
or eax, eax
jnz .top
mov edx, esi
mov ebx, 1
mov eax, 4
int 80h
add esp, 10h
pop esi
pop edx
pop ecx
pop ebx
pop eax
ret
;---------------------------------
OK, I'll test it on my K6-2 @ 550 MHz.
Dirk
> Frank Kotler wrote:
>
>>Dirk Wolfgang Glomp wrote:
>>
>>>>>I rarely use push/pop, i use the data-segment to store values with
>>>>>"mov",
>>>>>the routine run faster and the stack can be smaller.
>>>>
>>>>Smaller stack, yeah. Why do you say it would be faster?
>>>
>>>"mov" is faster than push/pop specialy on older 386-CPUs.
>>>
>>>>Seems to me,
>>>>stack is probably in cache. Data may or may not be - probably is, but...
>>>>Code is larger for mov than push/pop. I'm not sure this *is* a "win".
>>>>Have you tested it?
>>>
>>>No...
>>
>>Well, neither have I... This can be cured, however. Here's a first
>>draft. I make no representation that it's correct, or measures anything
>>interesting. It doesn't segfault... for me.
>>
>>On my K6-300, the _MOV version is 135 or 90. The _PUSH version is 121 or
>>76... with occasional outliers... Looks like push/pop might be faster,
>>but not really definitive...
>>
>>I blindly assume that there's nothing in edx - safe enough, I think, but
>>we can come up with a 64-bit version... We can come up with a Windows
>>version, if anyone wants to fool with that...
>
> Ok i test it on my K6-2@550mhz.
With Debian sarge (2.6) on my Asus board (ALi):
219 - 302 push/pop
115 - 116 mov/mov
>>Comments and corrections?
mov [eax_sav], eax
mov [ebx_sav], ebx
mov [ecx_sav], ecx
mov [edx_sav], edx
mov [esi_sav], esi
mov [edi_sav], edi
In another arrangement:
mov eax, [eax_sav]
mov ebx, [ebx_sav]
mov ecx, [ecx_sav]
mov edx, [edx_sav]
mov esi, [esi_sav]
mov edi, [edi_sav]
19 - 77 mov/mov
Dirk
...
>>>On my K6-300, the _MOV version is 135 or 90. The _PUSH version is 121 or
>>>76... with occasional outliers... Looks like push/pop might be faster,
>>>but not really definitive...
...
>>Ok i test it on my K6-2@550mhz.
>
> Whith debian sarge(2.6) on my Asus-board(Ali):
>
> 219 - 302 push/pop
> 115 - 116 mov/mov
Very strange!
...
> mov [eax_sav], eax
> mov [ebx_sav], ebx
> mov [ecx_sav], ecx
> mov [edx_sav], edx
> mov [esi_sav], esi
> mov [edi_sav], edi
>
> In an other arrangement:
>
> mov eax, [eax_sav]
> mov ebx, [ebx_sav]
> mov ecx, [ecx_sav]
> mov edx, [edx_sav]
> mov esi, [esi_sav]
> mov edi, [edi_sav]
>
> 19 - 77 mov/mov
Stranger still! If I reverse the order of the "restore"s like this - so
they're in the same order as the "save"s, instead of "stack-style" like
I had 'em, I get *longer* times! 113 - 194
What can this mean??? Are K6 and K6-2 *that* different? Looks like they
are... or there's something seriously wrong with my code...
Best,
Frank
Tomorrow I will test it all with Knoppix on my other CPUs,
a Tbred 2700+ @ 2166 MHz and a Palomino 1800+ @ 1533 MHz.
Today I have no time; my sister is paying me a surprise visit.
Dirk
> Stranger still! If I reverse the order of the "restore"s like this - so
> they're in the same order as the "save"s, instead of "stack-style" like
> I had 'em, I get *longer* times! 113 - 194
Perhaps you have offended the CPU by sandwiching ebx between eax and
ecx??? ;-)
>
> What can this mean??? Are K6 and K6-2 *that* different? Looks like they
> are... or there's something seriously wrong with my code...
Well, I didn't like the numbers that my machines were spitting at me
while they chewed on your code, so I fed them something HLA-flavoured
and minus the "push-vs-mov" test. I kept adding zeros to the end of my
'count' constant but they seemed to be spitting out the same range of
seemingly random numbers. Wasn't until I got to about 'count :=
400000' that the range of numbers changed significantly and settled
down to what might be meaningful instead of just jumping about wildly.
Perhaps my dates would prefer that I take them out for dinner instead
of cooking up this cruft?
program st;
#include("stdlib.hhf")
const
count :dword := 40;
static
first :str.strvar(16);
second :str.strvar(16);
space :char;
begin st;
again:
mov( ' ', space );
xor( eax, eax );
cpuid();
rdtsc();
conv.dToStr(edx, 8, space, first);
stdout.puts(first);
stdout.puts(" ");
conv.dToStr(eax, 8, space, second);
stdout.puts(second);
stdout.puts(" ");
push( edx );
push( eax );
// -- DO STUFF --
mov( count, eax );
lp:
dec( eax );
jnz lp;
// -- DO STUFF --
xor( eax, eax );
cpuid();
rdtsc();
pop( ebx );
pop( ecx );
sub( ebx, eax );
sbb( ecx, edx );
conv.dToStr(edx, 8, space, first);
stdout.puts(first);
stdout.puts(" ");
conv.dToStr(eax, 8, space, second);
stdout.puts(second);
stdout.newln();
jmp again;
end st;
Nathan.
Ah... just a slight change in the order of adding ingredients makes
this recipe more digestable:
program st;
#include("stdlib.hhf")
const
count :dword := 40;
static
first :str.strvar(16);
second :str.strvar(16);
space :char;
begin st;
again:
mov( ' ', space );
xor( eax, eax );
cpuid();
rdtsc();
push( edx );
push( eax );
// -- DO STUFF --
mov( count, eax );
lp:
dec( eax );
jnz lp;
// -- DO STUFF --
xor( eax, eax );
cpuid();
rdtsc();
pop( ebx );
pop( ecx );
sub( ebx, eax );
sbb( ecx, edx );
conv.dToStr(ecx, 8, space, first);
stdout.puts(first);
stdout.puts(" ");
conv.dToStr(ebx, 8, space, second);
stdout.puts(second);
stdout.puts(" ");
You may have something there. Suggests another variant to try, anyway.
> Well, I didn't like the numbers that my machines were spitting at me
> while they chewed on your code,
For example?
> so I fed them something HLA-flavoured
> and minus the "push-vs-mov" test.
Ummm, okay... Now we're timing HLA's number-to-string routine, I guess?
And print it. As you observe, you've gotta crank up the count pretty
good before the loop even makes a difference.
> I kept adding zeros to the end of my
> 'count' constant but they seemed to be spitting out the same range of
> seemingly random numbers. Wasn't until I got to about 'count :=
> 400000' that the range of numbers changed significantly and settled
> down to what might be meaningful instead of just jumping about wildly.
>
> Perhaps my dates would prefer that I take them out for dinner instead
> of cooking up this cruft?
Maybe if the best thing your dates make for dinner is "reservations". A
home cooked meal is better. "It's very good... What is it?"
Best,
Frank
Well, the lowest number I got on a AMD Duron 900 with MOV was 77; with
PUSH was 95.
On a Pentium MMX 233 the MOV was 250 and the PUSH was 252.
I added a count-to-40 loop to your NASM code and got these numbers:
AMD
MOV 356
PUSH 483
PENTIUM
MOV 546
PUSH 455
The AMD is on a K7S5A MOBO (which enjoyed some fame among the
overclockers during that time, but I haven't tinkered with it) and
running Puppy.
The Pentium is running DSL. Will have to try the AMD on Ubuntu later
just to see what happens.
> Maybe if the best thing your dates make for dinner is "reservations".
Well if she's gonna run off ta play with the injuns, I'll just cuddle
up with another critter. As long as it aint a skunk or a bar, I'll be
okay. :)
> A home cooked meal is better. "It's very good... What is it?"
Oops! Gotta find a fancy French-sounding name for "road-kill." ;)
Nathan.
Yeah, I like that modification better!
Best,
Frank
Okay, looks like Dirk was right - mov is faster.
> On a Pentium MMX 233 the MOV was 250 and the PUSH was 252.
Or as fast...
> I added a count-to-40 loop to your NASM code and got these numbers:
As we've determined, *when* an ingredient is added to a recipe is important.
> AMD
> MOV 356
> PUSH 483
>
> PENTIUM
> MOV 546
> PUSH 455
Well... I found some "lint" in my code. In the "print eax" routine there
was a:
mov byte [ecx], 0
left over from the "number to string" code (ah, the joys of
cut-and-paste). We don't need this string zero-terminated - we're
counting it (in esi). So I "saved face" by changing it to:
mov byte [ecx], 10
inc esi
so it prints a newline after each number. Now I can jump back to the top
like you do, and as expected, the values settle down. 43 for push/pop,
68/69 for mov/mov - fairly steady. I get similar results from a slight
modification of your program - 2Ah for push/pop and 45h/46h for mov/mov.
I have no explanation for the "off by one" difference between the Nasm
and HLA versions (in opposite directions for push and mov, no less). I'd
blame it on cosmic rays, but it seems quite repeatable...
> The AMD is on a K7S5A MOBO (which enjoyed some fame among the
> overclockers during that time, but I haven't tinkered with it) and
> running Puppy.
> The Pentium is running DSL. Will have to try the AMD on Ubuntu later
> just to see what happens.
Since we're counting cycles (I hope), the OS shouldn't matter too much -
unless we've got a *very* short time-slice. I suspect it's a "microcode"
issue, which means we can't do much but be aware of it.
> Oops! Gotta find a fancy French-sounding name for "road-kill." ;)
"Carne de la Rue au Push"
program st;
#include("stdlib.hhf")

const
    count :dword := 4;

static
    first  :str.strvar(16);
    second :str.strvar(16);
    space  :char;

begin st;

again:
    mov( ' ', space );

    // Serialize with cpuid, then read the starting timestamp (edx:eax).
    xor( eax, eax );
    cpuid();
    rdtsc();
    push( edx );
    push( eax );

    // -- DO STUFF --
    push (eax);
    push (ebx);
    push (ecx);
    push (edx);
    push (esi);
    push (edi);
    pop (edi);
    pop (esi);
    pop (edx);
    pop (ecx);
    pop (ebx);
    pop (eax);
    // -- DO STUFF --

    // Serialize again and read the ending timestamp.
    xor( eax, eax );
    cpuid();
    rdtsc();

    // HLA operands are (source, dest): edx:eax := end - start.
    pop( ebx );
    pop( ecx );
    sub( ebx, eax );
    sbb( ecx, edx );

    // Print the start value (ecx, ebx), then the elapsed count (edx, eax).
    conv.dToStr(ecx, 8, space, first);
    stdout.puts(first);
    stdout.puts(" ");
    conv.dToStr(ebx, 8, space, second);
    stdout.puts(second);
    stdout.puts(" ");
    conv.dToStr(edx, 8, space, first);
    stdout.puts(first);
    stdout.puts(" ");
    conv.dToStr(eax, 8, space, second);
    stdout.puts(second);
    stdout.newln();
    jmp again;

end st;
And "Carne de la Rue au Mov"
program st;
#include("stdlib.hhf")

const
    count :dword := 4;

static
    first   :str.strvar(16);
    second  :str.strvar(16);
    space   :char;
    eax_sav :dword;
    ebx_sav :dword;
    ecx_sav :dword;
    edx_sav :dword;
    esi_sav :dword;
    edi_sav :dword;

begin st;

again:
    mov( ' ', space );

    // Serialize with cpuid, then read the starting timestamp (edx:eax).
    xor( eax, eax );
    cpuid();
    rdtsc();
    push( edx );
    push( eax );

    // -- DO STUFF --
    mov (eax, eax_sav);
    mov (ebx, ebx_sav);
    mov (ecx, ecx_sav);
    mov (edx, edx_sav);
    mov (esi, esi_sav);
    mov (edi, edi_sav);
    mov (eax_sav, eax);
    mov (ebx_sav, ebx);
    mov (ecx_sav, ecx);
    mov (edx_sav, edx);
    mov (esi_sav, esi);
    mov (edi_sav, edi);
    // -- DO STUFF --

    // Serialize again and read the ending timestamp.
    xor( eax, eax );
    cpuid();
    rdtsc();

    // HLA operands are (source, dest): edx:eax := end - start.
    pop( ebx );
    pop( ecx );
    sub( ebx, eax );
    sbb( ecx, edx );

    // Print the start value (ecx, ebx), then the elapsed count (edx, eax).
    conv.dToStr(ecx, 8, space, first);
    stdout.puts(first);
    stdout.puts(" ");
    conv.dToStr(ebx, 8, space, second);
    stdout.puts(second);
    stdout.puts(" ");
    conv.dToStr(edx, 8, space, first);
    stdout.puts(first);
    stdout.puts(" ");
    conv.dToStr(eax, 8, space, second);
    stdout.puts(second);
    stdout.newln();
    jmp again;

end st;
Best,
Frank
Did you try register indirect addressing for the move:
mov [edi],eax ; or whatever the correct NASM syntax is
mov 4[edi],ebx
:
:
This will halve the code length and maybe has an effect on the
execution time. Which speed do you get for stosd/lodsd?
stosd
mov eax,ebx
stosd
:
:
No. I was merely trying to prove Dirk wrong, so I did what I "assume" he
was talking about. So far, I seem to have proved that he's "wrong" for
my valuable antique hardware, but "right" for everything else! I should
have known that the correct answer is "it depends".
> mov [edi],eax ; or whatever the correct NASM syntax is
> mov 4[edi],ebx
"mov [edi + 4], ebx", actually. Unlike "Intel syntax", where '[' and ']'
are equivalent to '+', Nasm just uses '[]' to indicate an address - and
the entire EA - including segment override*, if any - goes in the '[]'.
Without them, Nasm assumes an "immediate", without requiring "offset",
'$' or '#' to indicate so. (Yes, this means that register names are a
"special case"...)
* "es mov [...], ..." is also acceptable, and more closely reflects the
machine code, perhaps.
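A minimal sketch of the bracket rule described above, assuming 32-bit Nasm on Linux. The label `buf` and the exit sequence are illustrative and not taken from the thread:

```nasm
; Assumed build: nasm -f elf32 ea.asm && ld -m elf_i386 ea.o -o ea
section .bss
buf:    resd 2                      ; hypothetical two-dword scratch area

section .text
global _start
_start:
        mov     edi, buf            ; no brackets: immediate, the address of buf
        mov     dword [edi], 1      ; brackets: the memory that edi points at
        mov     ebx, 2
        mov     [edi + 4], ebx      ; the displacement goes inside the brackets
        mov     eax, [es:edi + 4]   ; so does a segment override

        mov     eax, 1              ; sys_exit
        xor     ebx, ebx            ; status 0
        int     0x80
```

The `es:` override is only there to show where it is written; in a flat Linux process it should address the same memory as the default segment.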
But, back to business... The purpose of the push/pop or mov/mov was to
"preserve" registers across the function (we have an empty function, but
the assumption is that registers will be trashed). May have wider
implications than that, but that's what I had in mind. Introducing edi -
which will itself have to be saved/restored some way - doesn't seem
likely to help. However, indirect register addressing off of esp is
probably a "win". Definitely worth a try!
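One way the "indirect off of esp" idea might look, as a sketch only (32-bit Nasm on Linux assumed; the 24-byte frame and the exit sequence are illustrative). It reserves six dword slots once and then uses plain mov, so no extra pointer register has to be preserved:

```nasm
; Assumed build: nasm -f elf32 save.asm && ld -m elf_i386 save.o -o save
section .text
global _start
_start:
        sub     esp, 24             ; reserve six dword slots on the stack
        mov     [esp],      eax     ; save
        mov     [esp + 4],  ebx
        mov     [esp + 8],  ecx
        mov     [esp + 12], edx
        mov     [esp + 16], esi
        mov     [esp + 20], edi

        ; ... the function body that trashes registers would go here ...

        mov     eax, [esp]          ; restore
        mov     ebx, [esp + 4]
        mov     ecx, [esp + 8]
        mov     edx, [esp + 12]
        mov     esi, [esp + 16]
        mov     edi, [esp + 20]
        add     esp, 24             ; release the slots

        mov     eax, 1              ; sys_exit
        xor     ebx, ebx            ; status 0
        int     0x80
```

Whether this beats push/pop will depend on the CPU, as the numbers in this thread suggest; the sketch only shows the shape of the code to time.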
> This will halve the code length and maybe has an effect on the
> execution time. Which speed do you get for stosd/lodsd?
>
> stosd
> mov eax,ebx
> stosd
> :
> :
And then "std" and "lodsd" to restore 'em? Now we've trashed esi *and*
edi... Intuitively, this is looking "worser and worser", but... I've
proved what my "intuition" is worth...
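For anyone who wants to time it, here is a rough sketch of what the stosd/lodsd variant might look like (32-bit Nasm on Linux assumed; the `save` buffer and the choice of four registers are illustrative). It makes the cost visible: esi and edi are both clobbered, and every register other than eax needs an extra mov in each direction:

```nasm
; Assumed build: nasm -f elf32 stos.asm && ld -m elf_i386 stos.o -o stos
section .bss
save:   resd 4                      ; hypothetical save area for eax..edx

section .text
global _start
_start:
        cld                         ; string ops count upward
        mov     edi, save           ; edi itself is lost here
        stosd                       ; [save]      = eax
        mov     eax, ebx
        stosd                       ; [save + 4]  = ebx
        mov     eax, ecx
        stosd                       ; [save + 8]  = ecx
        mov     eax, edx
        stosd                       ; [save + 12] = edx

        ; ... the function body that trashes registers would go here ...

        lea     esi, [edi - 4]      ; point at the last saved dword
        std                         ; string ops count downward
        lodsd
        mov     edx, eax            ; restore edx
        lodsd
        mov     ecx, eax            ; restore ecx
        lodsd
        mov     ebx, eax            ; restore ebx
        lodsd                       ; restore eax
        cld                         ; leave the direction flag clear again

        mov     eax, 1              ; sys_exit
        xor     ebx, ebx            ; status 0
        int     0x80
```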
Thanks for the interesting ideas to try!
Best,
Frank
> But, back to business... The purpose of the push/pop or mov/mov was to
> "preserve" registers across the function (we have an empty function, but
> the assumption is that registers will be trashed).
Isn't a pusha/popa the proper way to go?
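A sketch of the pushad/popad version, assuming 32-bit Nasm on Linux; the function name `answer` and the value 42 are made up for illustration. One wrinkle worth knowing: popad also restores eax, so a function that returns a value has to write it into the saved eax slot, which sits at [esp + 28] after pushad:

```nasm
; Assumed build: nasm -f elf32 pusha.asm && ld -m elf_i386 pusha.o -o pusha
section .text
global _start

; Hypothetical function: returns 42 in eax, preserves every other register.
answer:
        pushad                      ; pushes eax, ecx, edx, ebx, esp, ebp, esi, edi
        mov     ecx, 1234           ; stand-in for a body that trashes registers
        mov     esi, 5678
        mov     eax, 42             ; the result we want to hand back
        mov     [esp + 28], eax     ; overwrite the saved eax so popad returns it
        popad                       ; restores all general registers except esp
        ret

_start:
        call    answer
        mov     ebx, eax            ; use the result as the exit status
        mov     eax, 1              ; sys_exit
        int     0x80
```

Run this way, the process should exit with status 42 (check with `echo $?`).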
SsssshhhhhhhH!!!! Hush up! That is cheating! And, is no fun! We
have been informed that there are many valuable and precious "casual
readers" watching the show -- so pay no attention to the man behind the
curtain. ;-)
Nathan.
For this one, if you place the *_sav variables on the stack, you'll get
different results. On my system (AMD64 3400), push/pop (and
pushad/popad) are faster than using the mov with the static memory.
Using mov with stack storage is faster than all of them.
pushad/popad had equivalent results to pushing/popping individually.
-sevag.k
www.geocities.com/kahlinor
The tests with the K6+2@550 ran on Debian sarge without a GUI.
...
Now a new test with Knoppix 4.02(Boot-CD) with a gui
----------------------------------------------------
[Tbred 2700+]
push/pop 76 - 132
mov/mov 76 - 111
mov/mov(2) 70 - 111
[Palomino 1800+]
push/pop 76 - 134
mov/mov 76 - 112
mov/mov(2) 70 - 112
I think push and pop need more cycles on an older 80386 than on
modern CPUs.
Maybe on Intel platforms (P4) push/pop wins the match.
Dirk
> Frank Kotler wrote:
>
>>Herbert Kleebauer wrote:
Hello again.
>>But, back to business... The purpose of the push/pop or mov/mov was to
>>"preserve" registers across the function (we have an empty function, but
>>the assumption is that registers will be trashed).
>
> Isn't a pusha/popa the proper way to go?
A value at a labeled memory address can be read often.....
Dirk
Is there anybody who can digit the micro-ops?
Dirk
Well I was talking about Frank's pretending not to be aware of the
differences between the CPU generations (we did similar speed tests
just a few months ago), but you've got an interesting point there. I
imagine it would probably be a grand job to be able to design your own
CPU and write the micro-ops for it. Fact is -- we *can* do this in a
virtual sense: build an emulator or virtual-machine. Thanks! You've
given me an idea... design an interpreter that emulates a simple CPU.
Nathan.
:You've
:given me an idea... design an interpreter that emulates a simple CPU.
Which is what I did a couple of years ago, when I wanted to resurrect some
programs which I had written for the Intel 8008. It's an interesting
exercise even if one is quite familiar with the cpu architecture -- and
very educational if one is not.
You can see a screenshot of the emulator on my webpage:
http://www.pacificsites.com/~ccrayne/charles.html
-- Chuck