On Sun, 17 Aug 2014 15:07:06 -0400, DSF <nota...@address.here> wrote:
> ; Get lengths for str2 in edx and
> ; str1 in ebx.
So, you're getting lengths for both strings first? ...
That means you're scanning each string at least twice: first to get the
length of each, then to do the search and/or comparison of the two. It
might be possible to search and compare and just stop if end-of-string
is reached. That would eliminate the extra scans for the string lengths.
Many C string routines do this. Although, from the code below, it seems
this one doesn't ...
> For comparison, here is the C and assembly produced by the Borland
> compiler for their C source for strstr: [...]
>
I'd suspect that there are only two ways for the assembly routine
to be faster than an efficient and optimized C routine:
1) if it uses x86 string instructions with REP prefix which many
C compilers won't emit
2) if it uses advanced, faster x86 instructions that C can't or
doesn't emit for string code in C, such as x87, MMX, SSEx etc
> /*
> * C/C++ Run Time Library - Version 2.0
> * Copyright (c) 1987, 1996 by Borland International
> * All Rights Reserved.
> */
> char * _RTLENTRY _EXPFUNC strstr(const char *str1, const char *str2)
> {
> int len1 = strlen(str1);
> int len2 = strlen(str2);
> int i,j,k;
Wow. The coder used I, J, and K!
Well, it seems they had a BASIC programmer code this C routine ...
> i = 0;
> for(;;)
> {
> while(i < len1 && str1[i] != str2[0])
> ++i;
... who hadn't learned how to use character pointers in C, yet.
The code would be more compact and faster if the programmer had used
character pointers, incrementing them to move to the next character,
and dereferencing it to retrieve characters for comparison. The code
would be more compact because he'd eliminate all or nearly all locals,
and much of the checks and looping. The subscript operator [] adds an
extra addition and multiplication per usage to the code as well.
I don't use strstr() and don't have a working version, but I'm not a
fan of the fact that pointers weren't used. It might've been easier
for the coder to halt/return/exit when a nul '\0' char was found too,
if he had used pointers.
So, this C code is probably a really bad example of how to implement
strstr(). I haven't looked at how others versions are implemented
since I've never, ever, needed or used it, ever.
You could look at GNU's GLIBC, or Redhat's newlib, or the C library
with a compiler such as OpenWatcom. If needed, I have P.J. Plauger's
"The Standard C Library" book, which implements an entire, compliant,
C library using standard C.
> ...and time for a history
> question. I notice the compiler always adds
> a negative number to esp.
> Was there a time when add was faster than sub?
>
> push ebp
> mov ebp,esp
> add esp,-8 <-????
"The ADD instruction does not distinguish between signed or unsigned
operands." -Intel IA-32 manual
I.e., the use of negative values was just a choice made by the assembler's
authors. E.g., they might've chosen a range of -128 to 127 for imm8
instead
of a range of 0 to 255. Other assemblers, like NASM, do this. Or,
perhaps,
they allowed negative values to allow automatic sizing, e.g., -1 can be
0xFF
or 0xFFFFFFFF depending on the register size of the instruction.
For x86, "add esp,-8" can be encoded in different ways:
81 /0 id ADD r/m32,imm32
83 /0 ib ADD r/m32,imm8
For 83, -8 would be encoded as 248 (0xF8 or 0F8h). For 81, -8 should
be encoded as 268435448 (0xFFFFFFF8). I.e., if you do a hex dump of
the assembled code you should see one of these:
81 C4 F8 FF FF FF
83 C4 F8
Since hexadecimal directly corresponds to bits in bytes and is unsigned,
it's usually better to think of all values in assembly as unsigned, IMO.
I.e., 0 to 255 is 00h to FFh, etc. You're going to be doing hex dumps.
IMO, it's a mistake to use negative values in assembly because of this,
unless the assembler supports an overlapping range of signed and unsigned,
e.g., -128 to 255 for -128 to 127 or 0 to 255. The only place negative
values are convenient in assembly IMO is -1 for all bits set, which is
integer or register size independent, i.e., can be 0xFF or 0xFFFFFFFF
depending on the register size.
> Was there a time when add was faster than sub?
For the 386, 486, and Pentium, that would be: "No." "add" and "sub"
have the same official timings on each respective processor. I don't
have official 286 or 86 (or 186) timings in my notes or more recent
processors. I'd suspect they're the same on each of the earlier
processors too. If they're different, I'd be curious to know why.
Rod Pemberton