function XPos(theChr:char;Start:smallint;var theStr:Str255):smallint;assembler;
asm
  mov ax,Start     {ax=Start (N of theStr[N] to start search)}
  cmp ax,1         {make sure it's at least 1}
  jl @ZLen         {else Error}
  les di,theStr    {es:[di] = @theStr}
...
How do I get a string's address into the registers these days?
-Paul
Which Delphi version do you use?
> les di,theStr {es:[di] = @theStr}
This only works in D1, AFAIK, or is at least superfluous in 32 bit
Delphi.
Ingvar Nilsen
There is a group dealing with assembler in Delphi, the BASM group.
Your question would be equally (or more) appropriate there.
Don't change the segment registers in a 32 Bit program.
"les di,theStr" loads ES and DI in one opcode.
The 32 Bit way is: "mov EDI,theStr"
DI is 16 Bit wide, EDI is 32 Bit wide.
You should also keep Integer as the type for the "Start"
parameter and load it into EAX (mov EAX,Start). It can
then more easily be used as an index to address the character.
Take a look at the new addressing formats you can use since
the 386.
Dirk Seifert
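As a rough illustration of the 32-bit style described above, here is a minimal sketch. It assumes 32-bit Delphi and its default register calling convention (first parameter in EAX, second in EDX). The name CharAt is made up for this example and is not part of the original code; it does no range checking.

```pascal
{ Hypothetical helper, 32-bit Delphi only: returns S[Index].          }
{ Assumes the default register convention: EAX = @S, EDX = Index.     }
{ S[0] holds the length byte, so S[Index] lives at address @S + Index. }
function CharAt(var S: ShortString; Index: Integer): AnsiChar;
asm
  mov al, [eax + edx]   { base + index addressing, no segment registers }
end;
```

Only EAX is modified here, so none of the registers that must be preserved (EBX, ESI, EDI, EBP) need saving.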
> I'm converting many 16 bit (Turbo Pascal) string functions
> for use in Delphi. The code below first objected to integer
The others have given you some specifics which should be useful. I just
wanted to add that when you make a transition like this, it's a good
idea to "start from scratch" with any assembler code. That is, go back
to the original algorithm in Pascal and then re-verify the need for ASM
in the first place. Delphi's generated code is much better than TP's.
Bob Lee
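To make the "go back to Pascal first" suggestion concrete, here is a minimal sketch of what a plain Pascal counterpart might look like. The original asm body is truncated, so the behaviour is an assumption: return the index of the first occurrence of theChr at or after Start, or 0 if Start is below 1 or nothing is found. The name XPosPas is hypothetical.

```pascal
{ Hypothetical Pascal counterpart of XPos; semantics are assumed, }
{ because the original assembler listing is cut off.              }
function XPosPas(theChr: AnsiChar; Start: Integer;
  var theStr: ShortString): Integer;
var
  i: Integer;
begin
  Result := 0;                      { 0 = not found or bad Start }
  if Start < 1 then
    Exit;
  for i := Start to Length(theStr) do
    if theStr[i] = theChr then
    begin
      Result := i;                  { first match at or after Start }
      Exit;
    end;
end;
```

Timing a version like this against the asm routine is one way to re-verify whether the assembler is still needed.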
Dirk Seifert wrote:
> > les di,theStr {es:[di] = @theStr}
>
> Don't change the segment registers in a 32 Bit program.
> "les di,theStr" loads ES and DI in one opcode.
> The 32 Bit way is: "mov EDI,theStr"
> DI is 16 Bit wide, EDI is 32 Bit wide.
Thanks for the answer.
> You should also keep Integer as type for the "Start"
> parameter and load in into EAX (mov EAX,Start). It can
> then be easier used as an Index to address the character.
> Take a look at the new addressing formats you can use since
> the 386.
I would have been happy just getting by with as few changes from
16 bit as possible but it doesn't appear it will be so easy.
I've only got old manuals, for one, and the D4 help barely scratches the
surface.
For anyone: is there a Delphi source for unbundled asm
documentation? A third-party book to recommend that can be
bought online?
-Paul
Robert Lee wrote:
> I just
> wanted to add that when you make a transition like this, it's a good
> idea to "start from scratch" with any assembler code. That is go back
> to the original algorithm in pascal and then re-verify the need for ASM
> in the first place. Delphi code is much better than TP's.
It will be interesting to find out (D4) since so many of my
programs/projects deal with huge MBs of ASCII strings and their intricate
manipulation. They "scream" right now (in 16 bit) but must be programmed on
my old machine.
-Paul
You might be interested in Ernie Deel's HyperString Library. It's got a
bunch of optimized string handling routines in it. I don't have a URL
handy but a bit of poking around should turn it up.
If your new machine is a PII then I can almost definitely say that
you'll find no need for asm. In all the comparisons that I've done between
Optimized Pascal and ASM on PII machines there is never more than about
a +/- 10% difference between the two.
If you have very large strings you'll likely be most limited by memory
accessing (aka cache misses) rather than true code performance. As
such, focusing on reducing the number of passes through the string will
be the way to go.
Have Fun
Bob Lee
> Robert Lee wrote:
> > If your new machine is a PII then I can almost definitely
> > say that you'll find no need for asm.
>
> Depends of many things, if you sell applications, there are still lots
> of machines out there lower spec-ed than PII.
True, thus my PII qualifier.
>
>
> > Optimized Pascal and ASM on PII machines there is never more than
> > about a +/- 10% difference between the two.
> To write Optimized Pascal I think you should know a little asm!
Definitely. But knowing a little is not the same as writing a little. The problem
with ASM is that it is too opaque to your average Pascal user. Typically even
slightly convoluted Pascal is going to be much more readable/maintainable than asm.
> And 10%
> can be that little extra you need. But most important, myself I find
> asm more convenient in many cases, not because speed - but because it is
> more logic, at least when it comes to bit manipulation, masking flags
> etc. etc. Try writing a compression routine in Pascal/asm with Huffman
> coding.
>
> asm
> mov eax, MyInteger
> bswap eax
> mov MyInteger, eax
> end;
>
> How do you do this with Pascal????
>
These are both perfectly legitimate reasons to use ASM. I never said that there
was never a reason to resort to ASM, only that performance on PII class computers
wasn't one of them.
By the way, you can do the above in Pascal, it's just ridiculously complicated by
comparison. As a function, swapping doesn't get any simpler than ASM:
function Swap(a:integer):integer;
asm
bswap eax
end;
Hard to beat that!
Bob Lee
It depends on many things; if you sell applications, there are still lots
of machines out there with lower specs than a PII.
> Optimized Pascal and ASM on PII machines there is never more than
> about a +/- 10% difference between the two.
To write optimized Pascal I think you should know a little asm! And 10%
can be that little extra you need. But most importantly, I find
asm more convenient in many cases, not because of speed but because it is
more logical, at least when it comes to bit manipulation, masking flags,
etc. Try writing a compression routine in Pascal/asm with Huffman
coding.
But now to my standard question (never got an answer):
FileStream.Read(MyInteger, 4);
MyInteger is read from, let us say a TTF file, and is an offset value
into another record in the TTF file.
Problem: Integers in TTF files are stored big-endian (most significant
byte first), so they are not compatible with Delphi's little-endian
integers on x86.
With BASM I do
asm
mov eax, MyInteger
bswap eax
mov MyInteger, eax
end;
How do you do this with Pascal????
Ingvar Nilsen
>With BASM I do
>
>asm
> mov eax, MyInteger
> bswap eax
> mov MyInteger, eax
>end;
>
This is a 486 instruction, isn't it? Bad boy <g>.
>How do you do this with Pascal????
With a lot of cursing...
--
Stefan Hoffmeister http://www.econos.de/
No private email, please, unless expressly invited.
>With BASM I do
>asm
> mov eax, MyInteger
> bswap eax
> mov MyInteger, eax
>end;
>
>How do you do this with Pascal????
I just can't resist challenges... Here is my solution:
type
THiEndRec = packed record
LoLo: byte; LoHi: byte; HiLo: byte; HiHi: byte;
end;
TLoEndRec = packed record
HiHi: byte; HiLo: byte; LoHi: byte; LoLo: byte;
end;
function SwapByteOrder(Value: DWORD): DWORD;
var
Val : THiEndRec absolute Value;
begin
with TLoEndRec(Result) do
begin
HiHi := Val.HiHi;
HiLo := Val.HiLo;
LoHi := Val.LoHi;
LoLo := Val.LoLo;
end;
end;
If your brain doesn't go hi-lo-hi-hi-lo-lo-hi-lo after all of that,
you got a problem... <g>.
The generated code (with $O+) is:
>unit1.SwapByteOrder: begin
>:0042BDB8 55 push ebp
>:0042BDB9 8BEC mov ebp,esp
>:0042BDBB 83C4F8 add esp,FFFFFFF8
>:0042BDBE 8945FC mov [ebp-04],eax
>unit1.39: HiHi := Val.HiHi;
>:0042BDC1 8A45FF mov al,[ebp-01]
>:0042BDC4 8845F8 mov [ebp-08],al
>unit1.40: HiLo := Val.HiLo;
>:0042BDC7 8A45FE mov al,[ebp-02]
>:0042BDCA 8845F9 mov [ebp-07],al
>unit1.41: LoHi := Val.LoHi;
>:0042BDCD 8A45FD mov al,[ebp-03]
>:0042BDD0 8845FA mov [ebp-06],al
>unit1.42: LoLo := Val.LoLo;
>:0042BDD3 8A45FC mov al,[ebp-04]
>:0042BDD6 8845FB mov [ebp-05],al
>:0042BDD9 8B45F8 mov eax,[ebp-08]
>unit1.44: end;
>:0042BDDC 59 pop ecx
>:0042BDDD 59 pop ecx
>:0042BDDE 5D pop ebp
>:0042BDDF C3 ret
Hmm, I really do prefer:
function SwapByteOrder(Value: DWORD): DWORD;
asm
bswap eax
end;
--
Hallvard Vassbotn,
Senior Software Developer
Falcon R&D, Reuters Norge AS
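For comparison, the byte swap can also be written in plain Pascal with shifts and masks, without records or asm. This is only a sketch; the name SwapBytes32 is hypothetical and does not come from any library.

```pascal
{ Hypothetical helper: reverses the byte order of a 32-bit value. }
function SwapBytes32(Value: Cardinal): Cardinal;
begin
  Result := (Value shl 24)                   { byte 0 -> byte 3 }
         or ((Value and $0000FF00) shl 8)    { byte 1 -> byte 2 }
         or ((Value shr 8) and $0000FF00)    { byte 2 -> byte 1 }
         or (Value shr 24);                  { byte 3 -> byte 0 }
end;
```

For example, SwapBytes32($12345678) yields $78563412. Unlike BSWAP, this uses only instructions available on a 386.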
Why "bad boy"? Because it won't run on 386 processors?
Are you sure that Delphi itself is compatible with 386 processors?
I'm not. Look in graphics.pas (D3) for the bswap instruction.
And I'm not sure there are no Pentium instructions in the Delphi source either.
Anyway, what is the minimum processor for Delphi applications?
>>How do you do this with Pascal????
>
>With a lot of cursing...
Yes!!!
--
Sincerely yours
Anatoly Podgoretsky
Robert Lee wrote:
> You might be interested in Ernie Deel's HyperString Library. It's got a
> bunch of optimized string handling routines in it. I don't have a URL
> handy but a bit of poking around should turn it up.
>
> If your new machine is a PII then I can almost definitely say that
> you'll find no need for asm. In all the comparisons that I've done between
> Optimized Pascal and ASM on PII machines there is never more than about
> a +/- 10% difference between the two.
I'll be doing a little ASM just so I can start timing to find out. My existing
routines are very short but I have many of them....such things as finding the nth
occurrence of char c, that sort of thing. Mostly they were to overcome the
consumption of cpu cycles with the generalized StrLen routine. StrLen could grab
up to 40% of the total program execution otherwise. Likewise, the
copy/delete/insert routines are themselves a bit generalized. Reprogramming just
a specific property (when possible) could accomplish better than 10%. I'll see
how things compare now.
BTW, I'm still without asm documentation. Just an assembler Quick Reference Guide
would likely be enough. How have you learned the new 32 bit instructions?
-Paul
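A routine of the "nth occurrence of char c" kind can be sketched in plain Pascal so that it walks the string once and never calls StrLen. This is an illustration only; the name NthCharPos and its return convention (zero-based offset, or -1 if there is no such occurrence) are assumptions, not taken from the routines mentioned above.

```pascal
{ Hypothetical single-pass search on a null-terminated string. }
{ Returns the zero-based offset of the N-th occurrence of C,   }
{ or -1 if P is nil, N < 1, or there are fewer than N matches. }
function NthCharPos(P: PAnsiChar; C: AnsiChar; N: Integer): Integer;
var
  Start: PAnsiChar;
begin
  Result := -1;
  if (P = nil) or (N < 1) then
    Exit;
  Start := P;
  while P^ <> #0 do                 { stop at the terminator, no StrLen call }
  begin
    if P^ = C then
    begin
      Dec(N);
      if N = 0 then
      begin
        Result := P - Start;        { offset from the start of the string }
        Exit;
      end;
    end;
    Inc(P);
  end;
end;
```

Because the terminator test is folded into the scan, the string is read only once.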
I bought a book a couple of years ago that was quite helpful (assuming
that you already know the essentials of ASM programming). "Pentium
Processor Optimization Tools" by Michael L. Schmit. It's getting a bit
out of date, though. I believe that "Art of Assembly" or something like
that is available online as well. Also, I just plain looked at the code
that Delphi produced.
Bob Lee
Well, if most of the code is breaking the MBs into 255 byte segments to fit
into TP strings, the new huge strings will be a big benefit right off. The
huge strings are dynamically stored on the heap and are limited only by the
amount of RAM available. You could read in all the data at once and then
work on it.
--
Please respond only in the newsgroup. I will not respond
to newgroup messages by e-mail.
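Reading everything into one long string could look roughly like the sketch below. It assumes the Classes and SysUtils units and a file small enough to fit in memory; LoadFileToString is a made-up name for this example.

```pascal
uses
  Classes, SysUtils;

{ Hypothetical helper: loads a whole file into one AnsiString. }
function LoadFileToString(const FileName: string): AnsiString;
var
  Stream: TFileStream;
  Size: Integer;
begin
  Result := '';
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Size := Stream.Size;
    SetLength(Result, Size);               { one heap allocation for all data }
    if Size > 0 then
      Stream.ReadBuffer(Result[1], Size);  { fill the string in a single read }
  finally
    Stream.Free;
  end;
end;
```

The resulting string can then be processed in place, with no 255-byte chunking.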
Colin Sarsfield wrote:
> Check out: http://agner.org/assem/
It had some basic string methods. Just the sort of examples
I was looking for.
Thanks. (And to the other post with book references)
-Paul
I seem to remember that early in the thread someone pointed you to
HyperString. If not, the reference is:
http://efd.home.mindspring.com/tools.htm
Don't reinvent the wheel (and HyperString is more like an SUV than just one
wheel).
PhR
Robert Lee wrote:
> If your new machine is a PII then I can almost definitely say that
> you'll find no need for asm. In all the comparisons that I've done between
> Optimized Pascal and ASM on PII machines there is never more than about
> a +/- 10% difference between the two.
Well, I've had a chance to test a little now and found a case (in asm) where I
can do a lot better than that. This is from Agner Fog's work on Pentium
optimization. It's a substitute for 'StrLen' and is 3 or 4 times faster. I've
modified it for D4 Pascal (below) and ran it on a 166 MHz machine testing
5-million iterations (each) on a 68 character string:
function StrLenX(tStr:PChar):integer;
begin
asm
MOV EAX,tStr {get pointer }
MOV EDX,7
ADD EDX,EAX { pointer+7 used in the end }
{PUSH EBX NOT necessary; entry/exit code does this}
MOV EBX,[EAX] { read first 4 bytes}
ADD EAX,4 { increment pointer}
@L1: LEA ECX,[EBX-01010101H] { subtract 1 from each byte}
XOR EBX,-1 { invert all bytes}
AND ECX,EBX { and these two}
MOV EBX,[EAX] { read next 4 bytes}
ADD EAX,4 { increment pointer}
AND ECX,80808080H { test all sign bits}
JZ @L1 { no zero bytes, continue loop}
TEST ECX,00008080H { test first two bytes}
JNZ @L2 { *was JNZ SHORT @L2*}
SHR ECX,16 { not in the first 2 bytes}
ADD EAX,2
@L2: SHL CL,1 { use carry flag to avoid a branch}
{POP EBX Likewise; see above}
SBB EAX,EDX { compute length}
MOV EBX,EAX {**NEW** (optimizing) }
end
end;
StrLen is important since it is one of the hardest hit functions in string-based
apps. (I mentioned before that I've had programs where 40% of execution time was
spent in StrLen.) One trouble will be that it's often called within other library
functions. The performance of the above comes from reading four bytes at a
time rather than one byte at a time (as with SCASB).
> If you have very large strings you'll likely be most limited by memory
> accessing (aka cache misses) rather than true code performance. As
> such, focusing on reducing the number of passes through the string will
> be the way to go.
That could be a partial effect. I'll know more when I test under real conditions.
I also have just started to look at AnsiString.
-Paul
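The four-bytes-at-a-time test used in the loop above can be isolated into a small Pascal sketch. HasZeroByte is a name made up for this illustration. The expression is nonzero exactly when at least one of the four bytes is zero; overflow and range checks are switched off because the subtraction is meant to wrap.

```pascal
{$Q-,R-}  { the subtraction below relies on wrap-around }

{ Hypothetical helper: True if any of the four bytes in V is zero. }
function HasZeroByte(V: Cardinal): Boolean;
begin
  { A zero byte becomes $FF after subtracting 1 and is $FF when  }
  { inverted, so its top bit survives both ANDs. A nonzero byte  }
  { cannot set that bit unless a borrow came from a zero byte    }
  { below it, so the result is nonzero only if a zero exists.    }
  Result := ((V - $01010101) and (not V) and $80808080) <> 0;
end;
```

For instance, HasZeroByte($41424344) is False and HasZeroByte($41004344) is True.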
> Robert Lee wrote:
>
> > If your new machine is a PII then I can almost definitely say that
(By PII I mean Pentium II, not Pentium or Pentium MMX.)
> Well, I've had a chance to test a little now and found a case (in asm) where I
> can do a lot better than that. This is from Agner Fog's work on Pentium
> optimization. It's a substitute for 'StrLen' and is 3 or 4 times faster. I've
> modified it for D4 Pascal (below) and ran it on a 166 MHz machine testing
(A 166 MHz machine looks like a Pentium.)
Even on a Pentium the factor of 3 to 4 seems a bit high. The other caveat in my
statement was that it had to be *Optimal* Pascal. The builtin functions are not
necessarily optimal.
Here's a quick shot at a Pascal routine. I haven't "optimized" it yet, in fact I
don't think I even have the end test quite right yet, but you should get the idea.
Basically, I just duplicated the ASM algorithm in Pascal. Currently, this clocks in
at about 1.2 x my version of the ASM (see below) on a PII.
function StrLenX1(tStr:PChar):integer;
var
p:^cardinal;
bytes:cardinal;
begin
p:=Pointer(tStr);
dec(p);
repeat
inc(p);
bytes:=(p^-$01010101) and (p^ xor $FFFFFFFF);
until((bytes and $80808080)<>0);
if (bytes and $00008080)=0 then
begin
bytes:=bytes shr 16;
p:=pointer(integer(p)+2);
end;
result:=integer(p)-integer(tStr);
if (bytes and $00000080)=0 then
result:=result+1;
end;
You can make your function an asm function and avoid some of the overhead, and you
must push/pop EBX; it is not saved, even in your version.
function StrLenX(tStr:PChar):integer;
asm
// MOV EAX,tStr {get pointer }
MOV EDX,7
ADD EDX,EAX { pointer+7 used in the end }
PUSH EBX {is necessary; even in your version}
MOV EBX,[EAX] { read first 4 bytes}
ADD EAX,4 { increment pointer}
@L1: LEA ECX,[EBX-01010101H] { subtract 1 from each byte}
XOR EBX,-1 { invert all bytes}
AND ECX,EBX { and these two}
MOV EBX,[EAX] { read next 4 bytes}
ADD EAX,4 { increment pointer}
AND ECX,80808080H { test all sign bits}
JZ @L1 { no zero bytes, continue loop}
TEST ECX,00008080H { test first two bytes}
JNZ @L2 { *was JNZ SHORT @L2*}
SHR ECX,16 { not in the first 2 bytes}
ADD EAX,2
@L2: SHL CL,1 { use carry flag to avoid a branch}
POP EBX { restore EBX; POP leaves the carry flag intact}
SBB EAX,EDX { compute length}
// MOV EBX,EAX {**NEW** (optimizing) }
end;
Bob Lee
>>
StrLen is important since it is one of the hardest hit functions in string-based
apps. (I mentioned before that I've had programs where 40% of execution time was
spent in StrLen.) One trouble will be that it's often called within other library
functions.
<<
There's at least one function, StrCopy, that does its own internal strLen
instead of calling it. And StrCopy is called by quite a few other functions.
PhR
function StrLenX1(tStr:PChar):integer;
var
p:^cardinal;
q:pchar;
bytes,r1,r2:cardinal;
begin
p:=pointer(tStr);
repeat
q:=pchar(p^);
r2:=cardinal(@q[-$01010101]);
r1:=cardinal(q) xor $FFFFFFFF;
bytes:=r1 and r2;
inc(p);
until (bytes and $80808080)<>0;
result:=integer(p)-integer(tStr)-4;
if (bytes and $00008080)=0 then
begin
bytes:=bytes shr 16;
inc(result,2);
end;
if (bytes and $80)=0 then
result:=result+1;
end;
I'll also grant that in terms of beauty, this is not my finest hour. Additionally, if
Delphi's compiler is ever improved again (we can only hope) it may well knock this out of
whack again. Nonetheless, my statement still stands: You don't need to resort to ASM
on a PII.
In defense of the native StrLen: this routine is less robust than theirs. For instance,
this algorithm assumes that there are no funky characters in the string (i.e. nothing
above 127). Also, it actually reads past the end of the string slightly, which could be
considered a no-no.
Thanks for the challenge,
Bob Lee
With AnsiString, length is a simple pointer de-reference.
--
Ernie Deel, EFD Systems
-----------------------------------------------
Any sufficiently advanced technology
is indistinguishable from a rigged demo.
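To illustrate the point, here is a minimal sketch of what Length amounts to for a long string. It assumes the usual layout in which a 32-bit length field sits 4 bytes before the first character; that layout is an implementation detail, so real code should simply call Length. The names PInt32Field and StoredLength are made up for this example.

```pascal
type
  PInt32Field = ^Integer;   { local pointer type, declared for this example }

{ Hypothetical helper: reads the length field stored in front of the }
{ character data. Assumes the length is a 32-bit integer at offset -4. }
function StoredLength(const S: AnsiString): Integer;
begin
  if Pointer(S) = nil then
    Result := 0             { an empty long string is a nil pointer }
  else
    Result := PInt32Field(PAnsiChar(Pointer(S)) - 4)^;
end;
```

No scan for a terminating zero is involved, so the cost does not grow with the length of the string.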
Why didn't I catch this from the start? This doesn't make the algo fragile,
it makes it useless for modern times. For instance — my favorite dash is
high-ansi. Auch, natürlich, jeder civiliziert Sprache.
PhR
No, I was wrong, it does handle values above 127. That's what the xor calc is
for.
Bob Lee
>--
>Ernie Deel, EFD Systems
>-----------------------------------------------
>Any sufficiently advanced technology
>is indistinguishable from a rigged demo.
>
Love your sig, can I steal it? (just kidding)
Thanks,
--------------------------------------------
Brad, Rose & Tia
Robert Lee wrote:
> I'll also grant that in terms of beauty, this is not my finest hour. Additionally, if
> Delphi's compiler is ever improved again (we can only hope) it may well knock this out of
> whack again. None the less, my statement still stands: You don't need to resort to ASM
> on a PII.
Good work. It's true that Pascal can get right down to the ASM instruction level in so many
cases.
> In defense of the native StrLen. This routine is less robust than theirs.
I agree that native routines should have an emphasis on being clean and hardened. Writing an
entire compiler in the optimized code of the ASM example would end up being full of bugs I'm
sure. Then there's the business of the ASM rug being pulled out from under one's feet (which
I've experienced and am doing my conversions now). Still, within the last week, I've learned
a lot of new ASM and will be done shortly since my routines are all small ones. Anyhow, there
has always been something that has worked for me against the competition when it comes to
speed. That's my selling point.
-Paul