I have asked this question before but forgotten the conclusion.
I would like to start a string based challenge. Which function do you want
the most to be optimized?
Best regards
Dennis
> I would like to start a string based challenge. Which function do you want
> the most to be optimized?
In-place concatenation ;)
In-place concatenation, copy, delete and matching without copy.
Example:
if Copy(MyString,5,3)='OK' then
need become: if MatchWithoutCopy(MyString,5,3,'OK')=0 then ...
This will speedup a lot of useless copy mem operation.
I already did these routine for mine RTL, and the speed gain worth the
implementation.
I suggest also to build a Unicode lookup table for fastest UPPERCASE
LOWERCASE.
Cheers.
Roby
"Frank" <Fr...@infront.no> ha scritto nel messaggio
news:42247ca6$1...@newsgroups.borland.com...
> Dunno if it's off-topic in a string challenge, but TStrings.CommaText,
> Text and other DelimitedText getters/setters could use some speedup.
>
> Eric
How about just a TStringList challange.
TStringList is a much wanted challenge and I think we should do it soon.
But I would like to make a simple string challenge first.
Regards
Dennis
Yep, IndexOf, IndexOfName, Values[] and the rest could also use
some serious improvement (though the TStrings structure makes
it difficult to improve them cleanly in some cases).
A full TStringList challenge might a a bit too far reaching IMHO
(too many functions at once), might be preferable to break it down
along more specific functionality and assemble everything later on?
Eric
> A full TStringList challenge might a a bit too far reaching IMHO
> (too many functions at once), might be preferable to break it down
> along more specific functionality and assemble everything later on?
Yes I agree on that. I would personally like to do a .Sort challenge first.
> Eric
Regards
Dennis
What about string comparison?
We have StrComp, but no StrIComp, but I think that PChar functions are less
important than AnsiString functions.
What do you think of
CompareStr?
AnsiCompareStr?
SameText?
Regards
Dennis
A full challenge would allow people to change the entire *inner*
workings of TStringList. They could implement it with a hash, a tree,
or a list.
Also what happens when one person wins .Sort and another person wins
.IndexOfName and they are not compatiable with each other.
The B&V can test all the functions of TStringList giving weighting to
each one. If a person thinks they can win by only improving some of the
functions then they are free to do that.
Richard
"Dennis" <maria...@home3.gvdnet.dk> wrote in message
news:42249e85$1...@newsgroups.borland.com...
Juhani Suhonen
"Joe Bain" <j.b....@a.d.d.o.n.s.y.s.t.e.m.s..c.o.m> wrote in message
news:4224...@newsgroups.borland.com...
First, scuse for my english, i'm french :)
Also, TStringList is too slow !!
IndexOf method can optimized in a few minute.
For example : my method is 4 more fast than RTL !
All in pascal and without specific optimization.
You're a very good group.
I have discovered BASM since 2 weeks and optimize all can i.
I did a pos with Boyer-Moore method if you can interest, in BASM and 4
more fast than pos without specific instructions.
MyIndexOf method :
Chaine = String in french
function NewIndexOf(Liste: TStrings; Chaine: string): integer;
var
Lignes, Cpt, i : integer;
Car1 : Char;
begin
Lignes := Liste.Count;
Car1 := Chaine[1];
Result := -1;
For i:=0 to Lignes-1 do
begin
if Liste.Strings[i][1] = Car1 then
begin
if CompareStr(Liste.Strings[i], Chaine) = 0 then
begin
Result := i;
Break;
end;
end;
end;
end;
Robert
"Juhani Suhonen" <juhani....@a1netti.cmo> wrote in message
news:4224e37a$1...@newsgroups.borland.com...
> I suggested a few weeks ago an AnsiCompareStr challenge. Since it appears
> to be the main bottleneck of TStringlist Sort, Indexof, and Find
operation,
> then it could be a good candidate for a first attempt to opmitize
> TStringlist operations. However, someone suggested that since this
> procedure spend most of its time doing locale-specific things, there isn't
> very much one can do about its performance.
I will take a look at it. It looks like the way to go.
> Richard
Regards
Dennis
We should definately do TStringList soon.
I will put in on top of my todo list and hope that I can remember it this
time ;-)
Regards
Dennis
--
Marco
Yes. Every time I use a long string this makes me cringe.
Rgds,
Martin
Actually, all the T*List* and other containers seem to be slow. For
instance, TobjectQueue is apallingly slow. A simple expandable circular
list seems to be 20 times faster. My guess is that these inefficiences are
inherited from TList and probably infect TstringList, this on top of any
sorting etc. slowness in the TstringList itself.
Rgds,
Martin
> StringMatch ! i.e. StringMatch( mystring, '*fast*')
Something like this? :-) ...
{Compares two long strings with case sensitivity. (cf. Sametext). Returns
true
if they are equal.}
function FastSameText(const S1, S2: string): boolean;
asm
{On entry:
eax = S1
edx = S2}
cmp eax, edx
je @SameStr
test eax, eax
jz @NotSameStr
test edx, edx
jz @NotSameStr
{Compare lengths}
mov ecx, [eax - 4]
cmp ecx, [edx - 4]
jne @NotSameStr
push ebx
{Convert counter to negative based}
lea ebx, [eax + ecx]
add edx, ecx
neg ecx
@CompLoop:
{Compare four characters}
mov eax, [ebx + ecx]
cmp eax, [edx + ecx]
jne @Mismatch
add ecx, 4
js @CompLoop
pop ebx
@SameStr:
mov al, 1
ret
@NotSameStr:
xor eax, eax
ret
@Mismatch:
{A possible mismatch has been encountered}
pop ebx
{Have we read past the end? If so, have we read more than one character
too
many? (There is always a trailing #0 character in long strings, so if
there
is a mismatch with a one character overread we know the strings are
different.)}
cmp ecx, -2
jl @NotSameStr
{If we get here, we have read either two or three characters too many.
There
is always a trailing #0, so for the strings to be the same at least two
characters have to match.}
cmp ax, [edx + ecx]
sete al
end;
1) For many applications a TList can easily be replaced by a dynamic array
(array of pointer, or array of TObject). This gives you much improved
performance because the compiler generates better code, using direct
addressing instead of function calls.
2) All objects in the contnrs unit, including TObjectQueue, use TList and
therefore give poor performance.
3) TStringList does not use a TList, it uses an array of TStringItem (string
+ object). It's still quite inefficient because too complex.
As an example, it implements change notifications which you're usually not
very interested in. A TStringList replacement without OnChange and
OnChanging events would already give you a considerable performance boost.
Regards,
Robert
"Martin James" wrote in message...
A feature of TFastStringList would probably be a direct access to the
internal string array, without function calls. Like the following:
TStringArray = array of string;
TFastStringList = class
private
FStringArray: TStringArray;
public
property Strings: TStringArray read FStringArray write FStringArray;
end;
MyFastStringList.Strings[n] will then compile into a memory reference
without function calls.
The same approach can be followed in a TFastList class, with an array of
pointer, or in a TFastObjectList, with an array of TObject.
Robert
"Atle Smelvær" <zin...@sdf.lonestar.org> wrote in message
news:4225e760$1...@newsgroups.borland.com...
> I would prefer
> a new TFastStringList that is not 100 % compatible but much faster. Any
user
> would have the choice between either the classical TStringList focusing on
> compatibility and the new TFastStringList that focuses on performance.
I think we should make that one too.
We could do some challenges on the RTL TStringList first and at the same
time try to agree on the design of a TFastcodeStringList?
Regards
Dennis
I agree.
Regards
Dennis
What about string comparison?
We have StrComp, but no StrIComp, but I think that PChar functions are less
important than AnsiString functions.
What do you think of
AnsiCompareText
CompareStr?
AnsiCompareStr?
SameText?
Do you most often use the string comparison functions based on the current
locale or the non locale versions such as CompareText?
Perhaps we should do the case sensitive counterpart to CompareText first?
Regards
Dennis
> Do you most often use the string comparison functions based on the
> current locale or the non locale versions such as CompareText?
> Regards
> Dennis
I never use locale when doing string comparison.
Posted a "FasterTList" implementation sometime ago if people
are interested it should still be in the FastCode CVS, haven't
improved it much since we don't use TList anymore here
(but a variant that isn't based on TList).
Eric
80% of the time was spent on the 4th line of the
TStringList.CompareStrings function
Result := AnsiCompareText(S1, S2);
I then searched for 100 random lines using the TStringList.indexof()
function, and again, the program spent about 80% of the time on the exact
same line.
If I had set the property of the TStringList to CaseSensitive, then I guess
a similar amount of time would have been spent on the second line:
Result := AnsiCompareStr(S1, S2)
Also, since the locale versions (AnsiCompareText and AnsiCompareStr) appear
to be about 14 times slower than their non locale versions (CompareText and
CompareStr), then I guess much more speed improvements could be obtained by
optimizing the locale ones.
Richard
"Dennis" <maria...@home3.gvdnet.dk> wrote in message
news:4225fcc0$1...@newsgroups.borland.com...
All call CompareString in Windows.pas which calls CompareStringA in
kernal32.dll. If we instead optimize Windows.CompareString we could
optimize all these functions at once. Of course this would require
people to change there copy of Windows.pas to take advatage of it, as I
do not think we can patch a dll call.
> Posted a "FasterTList" implementation sometime ago if people
> are interested it should still be in the FastCode CVS, haven't
> improved it much since we don't use TList anymore here
> (but a variant that isn't based on TList).
That migth be good as reference for a challenge or as starting point
depending on peoples evaluation of the design.
> Eric
Regards
Dennis
>as I do not think we can patch a dll call.
Easier than you think, just modify the jump table. So this should be
quite doable.
- Asbjørn
-Atle
Very often when you need compatibility with TStrings, you do not really need
blazing speed.
E.g. you usually do not need top performance on the Items of a TComboBox.
> And there should be no problem making this fast without changing the
> design. The notify part only takes a fraction of time, and should be no
> big concern.
Is your claim based on any real tests ?
You can improve quite a lot when starting from TStrings, but wait until you
see the performance of a StringList that does not descend from TStrings...
Kind Regards,
Robert
Yes, but what we need is a all around TStringList, so compatibility with
everything should be very important.
>> And there should be no problem making this fast without changing the
>> design. The notify part only takes a fraction of time, and should be no
>> big concern.
>
> Is your claim based on any real tests ?
> You can improve quite a lot when starting from TStrings, but wait until
> you see the performance of a StringList that does not descend from
> TStrings...
I have profiled TStringList use, and very little time is spend inside the
notify handling. Ofcourse other StringList's that do not descend from
TStrings could be faster. But that does not mean that they would not have
been fast if they did descend form TStrings. The reason someone makes a new
StringList is because TStringList is slow, so ofcourse it would be faster
anyway. If not, they would not spend time making it. So to base speed of
interface design on the fact that some other StringList with another
interface design is faster, is not a good thing to do.
-Atle
In a project we needed an optimised string list to search for duplicates in
a very large database. We started off with a TStrings descendant, then
switched over to a TObject descendant with an interface very close to
TStrings, but based on a dynamic array of string. Performance improvement
was huge.
It is very useful to make an improved TStrings descendant, but be aware that
you'll never get maximum performance with that approach.
Cheers,
Robert
Are there any docs on the what exactly CompareTextA does?
-Atle
-Atle
> But there would be no problem descending from TStrings with your object.
> Everything inside TStrings is virtual, and there is no initial way it
> saves the values.
The virtual methods are at the core of the performance problem.
Calling a virtual function "Get" to obtain a string is a lot slower than
direct memory access.
> Btw. using a dynamic array will give you big penalties whenever you
> increase or decrease the list.
The standard TStringList also needs to grow and shrink its internal list of
pointers. In a standard TStringList each element is a string + an object = 8
bytes per item in the list.
An optimised stringlist does not necessarily implement the Objects property,
limiting the list size to 4 bytes per item. So you only need to allocate or
move around half the memory of the standard TStringList.
Regards,
Robert
It is just one extra lookup. And that is just a few clock cycles.
>> Btw. using a dynamic array will give you big penalties whenever you
>> increase or decrease the list.
>
> The standard TStringList also needs to grow and shrink its internal list
> of pointers. In a standard TStringList each element is a string + an
> object = 8 bytes per item in the list.
> An optimised stringlist does not necessarily implement the Objects
> property, limiting the list size to 4 bytes per item. So you only need to
> allocate or move around half the memory of the standard TStringList.
I was not talking about how TStringList does it at this point, I was
speaking about TStrings. TStrings does not implement any handling of
internal memory, this must be done by a descendant. And we are going to make
a new TStringList, not a descendant of TStringList. You could just as easily
descend your object from TStrings and save things in a dynamic array if you
want to. But dynamic arrays is not a efficient way to do this..
-Atle
> The standard TStringList also needs to grow and shrink its internal
> list of pointers. In a standard TStringList each element is a string
> + an object = 8 bytes per item in the list.
> An optimised stringlist does not necessarily implement the Objects
> property, limiting the list size to 4 bytes per item. So you only
> need to allocate or move around half the memory of the standard
> TStringList.
>
> Regards,
> Robert
Many people depend on the Objects part of TStringList, myself included.
I usally use Name=Value and the Object.
> Many people depend on the Objects part of TStringList, myself included.
> I usally use Name=Value and the Object.
So do we in most of my applications, but not in all.
If you want the best string performance and do not need the Objects,
TStrings and TStringList have a lot of overhead.
We use two different classes TOptimisedStringList and a descendant
TOptimisedObjectsStringList to deal with both cases.
Robert
> It is just one extra lookup. And that is just a few clock cycles.
It's a virtual method call... a lot slower than the straightforward memory
reference, especially if you need to use it like 250 billion times...
> I was not talking about how TStringList does it at this point, I was
> speaking about TStrings. TStrings does not implement any handling of
> internal memory, this must be done by a descendant. And we are going to
> make a new TStringList, not a descendant of TStringList.
Compatibility with TStrings forces you to implement the Objects property.
Any TStrings descendant will be forced to use at least 8 bytes per item in
the list.
> You could just as easily descend your object from TStrings and save things
> in a dynamic array if you want to. But dynamic arrays is not a efficient
> way to do this..
The Strings property will continue to use the slow Get and Put virtual
methods to access the internal list.
BTW, why do you think dynamic arrays are inefficient ?
Robert
-Atle
If you call it 250 billion times, then your problem lies in the routines
that call it 250 billion times. And when you are speaking about a call, you
still do a call even if it is not virtual. So I do not understand why you
say memory reference. The only difference is that the call is double pointer
based. That would mean, get call pointer from lookup table, then call. That
means a very few extra cycles.
>> I was not talking about how TStringList does it at this point, I was
>> speaking about TStrings. TStrings does not implement any handling of
>> internal memory, this must be done by a descendant. And we are going to
>> make a new TStringList, not a descendant of TStringList.
>
> Compatibility with TStrings forces you to implement the Objects property.
> Any TStrings descendant will be forced to use at least 8 bytes per item in
> the list.
Not nessessarily. You could have a property to say if Objects are to be
saved. If not, it could always return nil.. No problem there at all..
>> You could just as easily descend your object from TStrings and save
>> things in a dynamic array if you want to. But dynamic arrays is not a
>> efficient way to do this..
>
> The Strings property will continue to use the slow Get and Put virtual
> methods to access the internal list.
> BTW, why do you think dynamic arrays are inefficient ?
Because every time you add something and change the size of your dynamic
array, the entire array is duplicated and every value inside it is copied to
the new location. Still when only allocating more than needed to eliminate
some of these operations, it still happens whenever the array size is
changed, causing major stalls. By using linked blocks of memory you
eliminate this problem.
-Atle
-Atle
-Atle
"Atle Smelvær" <at...@datagrafikk.no> wrote in message
news:4227323d$1...@newsgroups.borland.com...
StringMatch('this is a cat','*is*cat') = true
fastest I know for this is found in russian qstrings,
function called "Q_testwildtext" which is also
case in-sensitive,
Juhani Suhonen
"Pierre le Riche" <pler...@hotmail.com> wrote in message
news:4225d438$1...@newsgroups.borland.com...
> Hi Marco,
>
>> StringMatch ! i.e. StringMatch( mystring, '*fast*')
>
> Something like this? :-) ...
>
> {Compares two long strings with case sensitivity. (cf. Sametext). Returns
> true
> if they are equal.}
> function FastSameText(const S1, S2: string): boolean;
> asm
> {On entry:
> eax = S1
> edx = S2}
> cmp eax, edx
> je @SameStr
> test eax, eax
> jz @NotSameStr
> test edx, edx
> jz @NotSameStr
> {Compare lengths}
> mov ecx, [eax - 4]
> cmp ecx, [edx - 4]
> jne @NotSameStr
> push ebx
> {Convert counter to negative based}
> lea ebx, [eax + ecx]
> add edx, ecx
> neg ecx
> @CompLoop:
> {Compare four characters}
> mov eax, [ebx + ecx]
> cmp eax, [edx + ecx]
> jne @Mismatch
> add ecx, 4
> js @CompLoop
> pop ebx
> @SameStr:
> mov al, 1
> ret
> @NotSameStr:
> xor eax, eax
> ret
> @Mismatch:
> {A possible mismatch has been encountered}
> pop ebx
> {Have we read past the end? If so, have we read more than one character
> too
> many? (There is always a trailing #0 character in long strings, so if
> there
> is a mismatch with a one character overread we know the strings are
> different.)}
> cmp ecx, -2
> jl @NotSameStr
> {If we get here, we have read either two or three characters too many.
> There
> is always a trailing #0, so for the strings to be the same at least two
> characters have to match.}
> cmp ax, [edx + ecx]
> sete al
> end;
>
>
>
>Compatibility with TStrings forces you to implement the Objects property.
>Any TStrings descendant will be forced to use at least 8 bytes per item in
>the list.
Not necessarily. You could arrange the string pointers and Object
pointers in separate arrays, creating the Object one on demand only.
--
Anders Isaksson, Sweden
BlockCAD: http://w1.161.telia.com/~u16122508/proglego.htm
Gallery: http://w1.161.telia.com/~u16122508/gallery/index.htm
>Are there any docs on the what exactly CompareTextA does?
If you mean CompareString (which is what AnsiCompareText/Str calls for
instance), then it's
http://msdn.microsoft.com/library/en-us/winui/winui/windowsuserinterface/resources/strings/stringreference/stringfunctions/comparestring.asp
(that'll most likely wrap)
Seems very non-trivial to me, however perhaps we can modify it for
special cases, and fall back to the original CompareString for the
harder stuff? Way beyond my abilities anyway :)
- Asbjørn
> Seems very non-trivial to me, however perhaps we can modify it for
> special cases, and fall back to the original CompareString for the
> harder stuff? Way beyond my abilities anyway :)
Maybe if you see a good speed bump you could get the first FastCode
submission to ever be included in Windows. <g>
Will
--
Want native support in Delphi for AMD64/EM64T? Vote here--
> I Think Marco proposed wildcard or wildchar matching, ie :
> StringMatch('this is a cat','*is*cat') = true
Yes!!
--
Marco