New life for MOVEM!

smcg...@vax1.tcd.ie

Feb 11, 1991, 11:02:12 AM
Hi 68000 users!

Here's a little trick that someone might find useful:
(maybe it's common knowledge?)

Right, picture the problem: you want to move, say, 1200 bytes from A to B
QUICKLY, but you can't be bothered getting the Blitter to do it, or the Blitter
is busy, or you just don't know how to get the Blitter to do it.

So you do it like this:

        LEA     Source,A0
        LEA     Dest,A1
        MOVE.W  #300-1,D0       ; 1200 bytes = 300 longwords (DBRA loops count+1 times)
Loop:   MOVE.L  (A0)+,(A1)+
        DBRA    D0,Loop
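
A note on the loop counter, sketched in C since the routines in this thread are also meant to be callable from C. DBRA tests for -1 rather than 0, so a counter loaded with N runs the loop body N+1 times. The name `dbra_iterations` is made up for illustration; the function only simulates the 16-bit countdown and is not part of any library.

```c
#include <stdint.h>

/* Model of a 68000 "body; DBRA Dn,loop" loop (hypothetical helper, not
 * from the original post). DBRA decrements the low word of Dn and branches
 * back unless the result is -1, so the body runs (initial count + 1) times. */
unsigned long dbra_iterations(uint16_t initial_count)
{
    uint16_t d0 = initial_count;
    unsigned long runs = 0;

    do {
        runs++;                 /* loop body executes */
        d0--;                   /* DBRA: decrement the 16-bit counter */
    } while (d0 != 0xFFFF);     /* fall through once it wraps to -1 */

    return runs;
}
```

So moving 300 longwords calls for a count of 299, and a single pass calls for a count of 0.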

How about this, which takes about 2/3 of the time of the above:

        LEA     Source,A0
        LEA     Dest,A1
        MOVE.W  #25-1,D0                ; 25*48 = 1200 bytes (DBRA loops count+1 times)
Loop:   MOVEM.L (A0)+,D1-D7/A2-A6       ; 12 longwords = 48 bytes
        MOVEM.L D1-D7/A2-A6,(A1)
        ADDA.L  #48,A1                  ; since MOVEM can't have (A1)+ as dest. operand
        DBRA    D0,Loop

OK, so it's a little register-intensive, but you can always save all the regs
before using the routine, and restore them later.
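
The "about 2/3" figure can be checked with a little arithmetic. The sketch below assumes the usual published 68000 timings with no bus contention (MOVE.L (An)+,(An)+ = 20 cycles, DBRA taken = 10, MOVEM.L from memory = 12 + 8 per register, MOVEM.L to memory = 8 + 8 per register, ADDA.L #imm,An = 16). Those numbers and the function names are assumptions for illustration, not measurements, and the slightly slower final DBRA is ignored.

```c
/* Per-instruction cycle counts for a plain 68000 with no bus contention.
 * These are assumptions taken from the standard timing tables, not
 * measurements; the final (expiring) DBRA of each loop is ignored. */
enum {
    CYC_MOVE_L_POSTINC = 20,  /* MOVE.L (A0)+,(A1)+          */
    CYC_DBRA_TAKEN     = 10,  /* DBRA Dn,label, branch taken */
    CYC_ADDA_L_IMM     = 16   /* ADDA.L #imm,An              */
};

/* MOVEM.L (An)+,<list> costs 12 cycles plus 8 per register. */
static unsigned long movem_load_cycles(unsigned regs)
{
    return 12 + 8UL * regs;
}

/* MOVEM.L <list>,(An) costs 8 cycles plus 8 per register. */
static unsigned long movem_store_cycles(unsigned regs)
{
    return 8 + 8UL * regs;
}

/* Cycles for the MOVE.L/DBRA loop; bytes must be a multiple of 4. */
unsigned long simple_loop_cycles(unsigned long bytes)
{
    return (bytes / 4) * (CYC_MOVE_L_POSTINC + CYC_DBRA_TAKEN);
}

/* Cycles for the MOVEM loop moving `regs` longwords per pass;
 * bytes must be a multiple of 4 * regs. */
unsigned long movem_loop_cycles(unsigned long bytes, unsigned regs)
{
    unsigned long per_pass = movem_load_cycles(regs) + movem_store_cycles(regs)
                           + CYC_ADDA_L_IMM + CYC_DBRA_TAKEN;
    return (bytes / (4UL * regs)) * per_pass;
}
```

For 1200 bytes this gives 9000 cycles for the MOVE.L loop against 5950 for the 12-register MOVEM loop, a ratio of about 0.66, which agrees with the estimate under these assumptions.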

Just to get a bit more speed, you could have a bigger loop which holds, say,
five iterations of the original loop in one pass, which saves you 4 DBRA
instructions for every 5 iterations. (I think that's almost 40 clock cycles!)
You may think that's trivial, but it all mounts up!
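
The saving from unrolling can be put into numbers the same way. This is only a sketch: the cycle counts are the assumed standard 68000 timings (a taken DBRA at 10 cycles, a 12-register MOVEM.L pair plus ADDA.L #48,A1 at 228 cycles per 48-byte block), and `unrolled_cycles` is a made-up name.

```c
/* Assumed 68000 cycle counts (standard timing tables, no bus contention). */
enum {
    /* MOVEM.L load + MOVEM.L store of 12 registers + ADDA.L #48,A1 */
    CYC_BLOCK = (12 + 8 * 12) + (8 + 8 * 12) + 16,
    /* DBRA with the branch taken */
    CYC_DBRA  = 10
};

/* Cycles to copy `blocks` 48-byte blocks when each loop pass contains
 * `unroll` copies of the block followed by a single DBRA.
 * `blocks` must be a multiple of `unroll`. */
unsigned long unrolled_cycles(unsigned long blocks, unsigned long unroll)
{
    unsigned long passes = blocks / unroll;
    return passes * (unroll * CYC_BLOCK + CYC_DBRA);
}
```

With these figures, five blocks cost 1190 cycles as five separate passes and 1150 cycles as one unrolled pass, so the saving is 40 cycles per five blocks. Over the whole 1200 bytes (25 blocks) that is 200 cycles out of 5950, a little over 3%.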

Anyone got any other tricks?

----------------------------------------------------------------------------
| / T | / Stephen John McGerty | Amiga // |
| / | |/ smcg...@vax1.tcd.ie (C.Sci.) | "Hmm.. No, nothing." \\// |
|__________________________________________|_______________________________|

Randell Jesup

Feb 19, 1991, 1:32:45 AM
In article <1991Feb11....@vax1.tcd.ie> smcg...@vax1.tcd.ie writes:
>Here's a little trick that someone might find useful:
>(maybe it's common knowledge?)

Yes.

[example of movem-loop follows..]

Or you could use CopyMem() (or CopyMemQuick() when you know the source
and destination are aligned). They use movem-loops when possible. (In
fact, under 2.0 CopyMem is adaptive to the processor in use).

Surprising what you can do when you use the OS....
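
For readers choosing between the two calls, here is a small sketch of the alignment test. It assumes, as I understand the exec autodocs, that CopyMemQuick() wants both addresses longword-aligned and the size to be a whole number of longwords. `can_copy_quick` is a hypothetical helper, not an exec.library function, and it takes plain integer addresses so it can be tried on any machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when a copy meets the conditions assumed here for
 * CopyMemQuick(): both addresses longword-aligned and the size a whole
 * number of longwords. Otherwise the general CopyMem() is the safe choice.
 * (can_copy_quick is a hypothetical helper, not an exec.library call.) */
int can_copy_quick(uintptr_t src, uintptr_t dst, size_t size)
{
    /* Any of the low two bits set in any operand rules out the quick path. */
    return ((src | dst | (uintptr_t)size) & 3u) == 0;
}
```

On an Amiga one would then call CopyMemQuick(source, dest, size) when the test passes and CopyMem(source, dest, size) otherwise.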

--
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, je...@cbmvax.commodore.com BIX: rjesup
The compiler runs
Like a swift-flowing river
I wait in silence. (From "The Zen of Programming") ;-)

smcg...@vax1.tcd.ie

Feb 21, 1991, 6:51:45 AM
In article <19...@cbmvax.commodore.com>, je...@cbmvax.commodore.com (Randell Jesup) writes:
> In article <1991Feb11....@vax1.tcd.ie> smcg...@vax1.tcd.ie writes:
>>Here's a little trick that someone might find useful:
>>(maybe it's common knowledge?)
> Yes.

Not judging by the response I got... Remember, there's always someone lower than
you on the learning curve....

> [example of movem-loop follows..]
> Or you could use CopyMem() (or CopyMemQuick() when you know the source
> and destination are aligned). They use movem-loops when possible. (In
> fact, under 2.0 CopyMem is adaptive to the processor in use).
> Surprising what you can do when you use the OS....
> --
> Randell Jesup, Keeper of AmigaDos, Commodore Engineering.

Hey, I don't doubt the OS is very fast and neat; we all use it quite often, and
it's great, etc. etc. However, as far as giving people a deeper understanding of
68000 programming is concerned, an example of a movem-loop in assembly is a
bit better than a recommendation to use an OS routine.

By writing my example, I wasn't really trying to fulfill someone's desire to
have a fast-copy-memory routine, but instead I wanted to stimulate an interest
in the techniques of using the 68000 efficiently.

If everyone purely relied on OS routines, without knowing how they worked, then
there would be a lot more ignorance about the nitty-gritty techniques of
programming the Amiga.

Re-inventing the wheel is often the best way of educating yourself. I find it
helpful, and I reckon others do too.

Matthew Dillon

Feb 22, 1991, 3:10:08 PM
>>...

>
>Hey, I don't doubt the OS is very fast and neat; we all use it quite often, and
>it's great, etc. etc. However, as far as giving people a deeper understanding of
>68000 programming is concerned, an example of a movem-loop in assembly is a
>bit better than a recommendation to use an OS routine.
>
>By writing my example, I wasn't really trying to fulfill someone's desire to
>have a fast-copy-memory routine, but instead I wanted to stimulate an interest
>in the techniques of using the 68000 efficiently.
>
>Re-inventing the wheel is often the best way of educating yourself. I find it
>helpful, and I reckon others do too.
>...

I generally post this about once a year when the question comes up..
here is a fully working MOVMEM() call that optimizes via MOVEM:

-Matt

Matthew Dillon dil...@Overload.Berkeley.CA.US
891 Regal Rd. uunet.uu.net!overload!dillon
Berkeley, Ca. 94708
USA


; MOVMEM.A
;
; (c)Copyright 1990, Matthew Dillon, All Rights Reserved

                section text,code

;   movmem(src, dst, len)   (ANSI)
;   bcopy(src, dst, len)    (UNIX)
;          A0   A1   D0     DICE-REG
;          A0   A1   D0     internal
;       4(sp) 8(sp) 12(sp)
;
;   The memory move algorithm is somewhat more of a mess
;   since we must do it either ascending or descending.

                xdef    _movmem
                xdef    _bcopy          ; UNIX
                xdef    @movmem
                xdef    @bcopy          ; UNIX

_bcopy:
_movmem:        move.l  4(sp),A0
                move.l  8(sp),A1
                move.l  12(sp),D0
@bcopy:
@movmem:
                cmp.l   A0,A1               ;move to self
                beq     xbmend
                bls     xbmup
xbmdown         adda.l  D0,A0               ;descending copy
                adda.l  D0,A1
                move.w  A0,D1               ;CHECK WORD ALIGNED
                lsr.l   #1,D1
                bcs     xbmdown1
                move.w  A1,D1
                lsr.l   #1,D1
                bcs     xbmdown1
                cmp.l   #259,D0             ;chosen by calculation.
                bcs     xbmdown8

                move.l  D0,D1               ;overhead for bmd44: ~360
                divu    #44,D1
                bvs     xbmdown8            ;too big (> 2,883,540)
                movem.l D2-D7/A2-A6,-(sp)   ;use D2-D7/A2-A6 (11 regs)
                move.l  #44,D0
                bra     xbmd44b
xbmd44a         sub.l   D0,A0               ;8          total 214/44bytes
                movem.l (A0),D2-D7/A2-A6    ;12 + 8*11  4.86 cycles/byte
                movem.l D2-D7/A2-A6,-(A1)   ; 8 + 8*11
xbmd44b         dbf     D1,xbmd44a          ;10
                swap    D1                  ;D0<31:16> already contain 0
                move.w  D1,D0               ;D0 = remainder
                movem.l (sp)+,D2-D7/A2-A6

xbmdown8        move.w  D0,D1               ;D1<2:0> = #bytes left later
                lsr.l   #3,D0               ;divide by 8
                bra     xbmd8b
xbmd8a          move.l  -(A0),-(A1)         ;20         total 50/8bytes
                move.l  -(A0),-(A1)         ;20         = 6.25 cycles/byte
xbmd8b          dbf     D0,xbmd8a           ;10
                sub.l   #$10000,D0
                bcc     xbmd8a
                move.w  D1,D0               ;D0 = 0 to 7 bytes
                and.l   #7,D0
                bne     xbmdown1
xbmend
                move.l  8(sp),D0
                rts

xbmd1a          move.b  -(A0),-(A1)         ;12         total 22/byte
xbmdown1                                    ;           = 22 cycles/byte
xbmd1b          dbf     D0,xbmd1a           ;10
                sub.l   #$10000,D0
                bcc     xbmd1a
                move.l  8(sp),D0
                rts

xbmup           move.w  A0,D1               ;CHECK WORD ALIGNED
                lsr.l   #1,D1
                bcs     xbmup1
                move.w  A1,D1
                lsr.l   #1,D1
                bcs     xbmup1
                cmp.l   #259,D0             ;chosen by calculation
                bcs     xbmup8

                move.l  D0,D1               ;overhead for bmu44: ~360
                divu    #44,D1
                bvs     xbmup8              ;too big (> 2,883,540)
                movem.l D2-D7/A2-A6,-(sp)   ;use D2-D7/A2-A6 (11 regs)
                move.l  #44,D0
                bra     xbmu44b
xbmu44a         movem.l (A0)+,D2-D7/A2-A6   ;12 + 8*11  ttl 214/44bytes
                movem.l D2-D7/A2-A6,(A1)    ;8 + 8*11   4.86 cycles/byte
                add.l   D0,A1               ;8
xbmu44b         dbf     D1,xbmu44a          ;10
                swap    D1                  ;D0<31:16> already contain 0
                move.w  D1,D0               ;D0 = remainder
                movem.l (sp)+,D2-D7/A2-A6

xbmup8          move.w  D0,D1               ;D1<2:0> = #bytes left later
                lsr.l   #3,D0               ;divide by 8
                bra     xbmu8b
xbmu8a          move.l  (A0)+,(A1)+         ;20         total 50/8bytes
                move.l  (A0)+,(A1)+         ;20         = 6.25 cycles/byte
xbmu8b          dbf     D0,xbmu8a           ;10
                sub.l   #$10000,D0
                bcc     xbmu8a
                move.w  D1,D0               ;D0 = 0 to 7 bytes
                and.l   #7,D0
                bne     xbmup1
                move.l  8(sp),D0
                rts

xbmu1a          move.b  (A0)+,(A1)+
xbmup1
xbmu1b          dbf     D0,xbmu1a
                sub.l   #$10000,D0
                bcc     xbmu1a
                move.l  8(sp),D0
                rts

                END
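
For anyone who finds the listing hard to follow, the decisions it makes can be modelled in a few lines of C. This is only a reading of the code above, not Matt's own description: the struct and function names are invented, and addresses are passed as plain integers so the model runs anywhere. It picks the copy direction (descending when the destination lies above the source, so overlapping source bytes are read before they are overwritten), falls back to the byte loop if either working address is odd, and otherwise splits the length into 44-byte MOVEM passes, 8-byte passes and leftover bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* How a copy is split up, following the structure of the routine above.
 * All names here are invented for this sketch. */
typedef struct {
    int    descending;  /* 1: copy from the end down (dst above src) */
    size_t blocks44;    /* 44-byte MOVEM passes */
    size_t chunks8;     /* passes of two MOVE.L (8 bytes) */
    size_t bytes1;      /* single MOVE.B steps */
} CopyPlan;

CopyPlan plan_movmem(uintptr_t src, uintptr_t dst, size_t len)
{
    CopyPlan p = {0, 0, 0, 0};
    uintptr_t s = src, d = dst;

    if (src == dst)
        return p;                   /* move to self: nothing to do */

    /* If dst is above src, an ascending copy could overwrite source bytes
     * before they are read, so work down from the end addresses instead. */
    if (dst > src) {
        p.descending = 1;
        s += len;
        d += len;
    }

    if ((s | d) & 1) {              /* odd address: byte loop only */
        p.bytes1 = len;
        return p;
    }

    /* MOVEM path only from 259 bytes up, and only while the block count
     * fits the 16-bit DIVU quotient that also serves as the DBF counter. */
    if (len >= 259 && len / 44 <= 0xFFFF) {
        p.blocks44 = len / 44;
        len %= 44;
    }

    p.chunks8 = len / 8;
    p.bytes1  = len % 8;
    return p;
}
```

For example, plan_movmem(0x1000, 0x2000, 1200) describes a descending copy of 27 MOVEM passes (1188 bytes), one 8-byte pass and 4 single bytes.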

David Jones

Feb 23, 1991, 7:13:53 AM
>How about this, which takes about 2/3 of the time of the above:
>
>        LEA     Source,A0
>        LEA     Dest,A1
>        MOVE.W  #25-1,D0                ; 25*48 = 1200 bytes (DBRA loops count+1 times)
>Loop:   MOVEM.L (A0)+,D1-D7/A2-A6       ; 12 longwords = 48 bytes
>        MOVEM.L D1-D7/A2-A6,(A1)
>        ADDA.L  #48,A1                  ; since MOVEM can't have (A1)+ as dest. operand
>        DBRA    D0,Loop
>
>Anyone got any other tricks?

Ya. Save yourself some code. Check out CopyMem() in exec.library
(V33 or greater). Disassemble it. Essentially, it is the above code.

--

| The Q-Point David Jones
|\ Amiga S/W development UUCP: d...@qpoint.amiga.ocunix.on.ca
| \ Fido: 1:163/109.8
| \
| \ "I can understand why someone would want to go out, get drunk
| -\---- and wake up the next morning with a splitting headache and
| / \ absolutely no memory of the night before, but I *cannot*
| / \ understand why anyone would want to do that more than once."
|/ \
+---------- - Don Elgee

hugh...@vax1.tcd.ie

Mar 1, 1991, 11:25:11 PM
In article <dej....@qpoint.amiga.ocunix.on.ca>, d...@qpoint.amiga.ocunix.on.ca (David Jones) writes:
>>In article <1991Feb11....@vax1.tcd.ie> smcg...@vax1.tcd.ie writes:
>>How about this, which takes about 2/3 of the time of the above:
>>
>>[..usage of movem deleted..]

>>
>>Anyone got any other tricks?
>
> Ya. Save yourself some code. Check out CopyMem() in exec.library
> (V33 or greater). Disassemble it. Essentially, it is the above code.

Hey cmon man, he doesn't want to hear about supplied software. Often you
find stuff written by someone else, particularly the OS, sucks. You want
one thing quick. It wants something else slow. So you write it _yourself_.
At least that way you know exactly what's going on, how fast, and everyone
will be able to use it. Not just people with V33 or greater, whatever
that is. He asks (if you read the posting) if anyone else has any tricks.
He wants to know if there are any other ways of squeezing more out of what
is basically a not-very-fast-processor. One byte per 4 cycles stinks, so
what'd it be like without movem? Are there any other ways of doing something
else faster; try and get summat out of the machine, if you don't want to
waste your money on a bigger chip in the series? Don't say find out about
the OS, because it is a heap of it. You want _real_optimisation_ for the
specific problem, for which some general ideas may help. Movem is one. The
OS is not. Matt Dillon's program is very nice, coping with non-word
boundaries and everything, but if you want _everything_ out of the machine,
forget those checks. Align your data, and use the plain movems. Shove the
loop in a cupboard, and in-line the code. On a processor running at the
speed of a low 68000, those cycles count. Save them. Don't give a damn about
memory. Remember, only a heartless fiend can get the true max out of the
machine. Work everything to the bloody stumps, and waste everything else.

T.

SICK - the Slightly Intelligent Crazy Rosebi -
We came. We saw. We went away again.
#! r

Lamonte Koop

Mar 3, 1991, 9:36:36 PM
hugh...@vax1.tcd.ie writes:
>In article <dej....@qpoint.amiga.ocunix.on.ca>, d...@qpoint.amiga.ocunix.on.ca (David Jones) writes:
>>>In article <1991Feb11....@vax1.tcd.ie> smcg...@vax1.tcd.ie writes:
>>>How about this, which takes about 2/3 of the time of the above:
>>>
>>>[..usage of movem deleted..]
>>>
>>>Anyone got any other tricks?
>>
>> Ya. Save yourself some code. Check out CopyMem() in exec.library
>> (V33 or greater). Disassemble it. Essentially, it is the above code.
>
>Hey cmon man, he doesn't want to hear about supplied software. Often you
>find stuff written by someone else, particularly the OS, sucks. You want

Not in my experience. Just because the OS is "supplied" or written by
someone else, it doesn't mean you have to go about re-inventing the wheel
because you feel "it sucks"...a feeling which I strongly disagree with. Yes,
the OS has its problems, but it has quite a few excellent points to it as
well.

>one thing quick. It wants something else slow. So you write it _yourself_.
>At least that way you know exactly what's going on, how fast, and everyone
>will be able to use it. Not just people with V33 or greater, whatever
>that is. He asks (if you read the posting) if anyone else has any tricks.
>He wants to know if there are any other ways of squeezing more out of what
>is basically a not-very-fast-processor. One byte per 4 cycles stinks, so
>what'd it be like without movem? Are there any other ways of doing something
>else faster; try and get summat out of the machine, if you don't want to
>waste your money on a bigger chip in the series? Don't say find out about
>the OS, because it is a heap of it. You want _real_optimisation_ for the
>specific problem, for which some general ideas may help. Movem is one. The
>OS is not. Matt Dillon's program is very nice, coping with non-word
>boundaries and everything, but if you want _everything_ out of the machine,
>forget those checks. Align your data, and use the plain movems. Shove the
>loop in a cupboard, and in-line the code. On a processor running at the
>speed of a low 68000, those cycles count. Save them. Don't give a damn about
>memory. Remember, only a heartless fiend can get the true max out of the
>machine. Work everything to the bloody stumps, and waste everything else.

From your comments, I have just a few of my own: First of all, I have
absolutely nothing against optimizing code...in fact I am all for it, and any
ideas pertaining to it. However, your attitude seems to be quite hostile
towards the OS...which is NOT "full of it". In fact, you seem to be the sort
who would write code which crashes just about every machine except a
particular model. This may not be the case, but I don't see how you would get
decently multitasking-friendly applications when you avoid the OS.

Second of all, how do you propose to get anything done...when you insist on
reinventing everything?

>
>T.
>
>SICK - the Slightly Intelligent Crazy Rosebi -
>We came. We saw. We went away again.
>#! r


LaMonte Koop
Internet: lk...@pnet01.cts.com ARPA: crash!pnet01!lk...@nosc.mil
UUCP: {hplabs!hp-sdd ucsd nosc}!crash!pnet01!lkoop
"It's a dog-eat-dog world...and I'm wearing Milk Bone underwear"--Norm

Randell Jesup

Mar 5, 1991, 2:50:31 AM
In article <1991Mar2.0...@vax1.tcd.ie> hugh...@vax1.tcd.ie writes:
>In article <dej....@qpoint.amiga.ocunix.on.ca>, d...@qpoint.amiga.ocunix.on.ca (David Jones) writes:
>> Ya. Save yourself some code. Check out CopyMem() in exec.library
>> (V33 or greater). Disassemble it. Essentially, it is the above code.
>
>Hey cmon man, he doesn't want to hear about supplied software. Often you
>find stuff written by someone else, particularly the OS, sucks. You want
...

>will be able to use it. Not just people with V33 or greater, whatever
>that is.

V33 is 1.2. Anyone who is running anything earlier than 1.2 deserves
10 lashes with a wet noodle (since 1.0 and 1.1 were only available on A1000's,
and they can upgrade in a snap - almost all modern stuff requires 1.2).

>waste your money on a bigger chip in the series? Don't say find out about
>the OS, because it is a heap of it. You want _real_optimisation_ for the
>specific problem, for which some general ideas may help. Movem is one. The
>OS is not. Matt Dillon's program is very nice, coping with non-word
>boundaries and everything, but if you want _everything_ out of the machine,
>forget those checks. Align your data, and use the plain movems. Shove the
>loop in a cupboard, and in-line the code.

Guess what: what you suggest is exactly what's in the OS. There's
CopyMem(), for non-aligned data (à la Matt's), and CopyMemQuick(), for
aligned data. It can't inline the code, but if you're transferring enough
data for movem-loops to make a difference, the cycles for a single
subroutine call to start it are WAY down in the noise (plus you win in that
on a chip-only machine, ROM access can be far faster than RAM access,
depending on video mode).

And if you happen to run your code on 2.0 with an '020 or better,
suddenly your copies get even quicker, since we have separate copy loops
for different processors.

--
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.

Stephan Schaem

Mar 5, 1991, 2:57:42 PM

Talking about people who don't like their OS functions....
If you think something doesn't fit, create your own: why be stuck
with other people's way of thinking?!
I'm not saying replace it, but add to it or extend it.
I don't use Intuition extensively (screens mostly), since I have other
needs and have had to fight too much to get things done the Intuition
way.
The previous example was text: there should be different ways to
handle text, and I find FF or the 2.0 'emulation' not at the text display
'peak'. So when I need special text features I use my own library.
Always using the OS is not ALWAYS the best solution, and shouldn't be the
only way to make things work...