Biggest Delphi bottleneck?

Peter Morris [Droopy eyes software]

Jun 28, 2006, 1:41:30 PM

I think the replacement memory manager was a great idea, and I know someone
who has recompiled a very large D7 project in D2006 and noted a
"significant speed increase" but has not timed it, so I can't say how big
"significant" is.

I remember when writing FastStrings I decided to try to write a Pos routine
that worked better for my specific purposes (multiple finds for faster
replacement). However, no matter how many times it was called in a loop, it
gave me hardly any increase in speed. This was because 99.9% of the
bottleneck in StringReplace was in the reallocation of the result string
each time it was changed. If you improve something that takes only 0.01% of
the total time, you can only speed up your app by a maximum of 0.01%.
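
A minimal sketch of avoiding the per-match reallocation described above: count the matches first, size the result string once, then copy the pieces into place. ReplaceAllPrealloc is a hypothetical name (it is not the FastStrings routine), and the sketch assumes PosEx from StrUtils (Delphi 7 and later) and case-sensitive matching.

```pascal
program PreallocReplace;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils;

// Replace every occurrence of OldPat in S with NewPat, allocating the
// result exactly once instead of growing it after each match.
function ReplaceAllPrealloc(const S, OldPat, NewPat: string): string;
var
  Count, SrcPos, DstPos, Hit, Chunk: Integer;
begin
  if (S = '') or (OldPat = '') then
  begin
    Result := S;
    Exit;
  end;

  // Pass 1: count non-overlapping matches.
  Count := 0;
  Hit := PosEx(OldPat, S, 1);
  while Hit > 0 do
  begin
    Inc(Count);
    Hit := PosEx(OldPat, S, Hit + Length(OldPat));
  end;
  if Count = 0 then
  begin
    Result := S;
    Exit;
  end;

  // Single allocation of the final size.
  SetLength(Result, Length(S) + Count * (Length(NewPat) - Length(OldPat)));

  // Pass 2: copy the unchanged chunks and the replacements into place.
  SrcPos := 1;
  DstPos := 1;
  Hit := PosEx(OldPat, S, 1);
  while Hit > 0 do
  begin
    Chunk := Hit - SrcPos;
    if Chunk > 0 then
      Move(S[SrcPos], Result[DstPos], Chunk * SizeOf(Char));
    Inc(DstPos, Chunk);
    if NewPat <> '' then
      Move(NewPat[1], Result[DstPos], Length(NewPat) * SizeOf(Char));
    Inc(DstPos, Length(NewPat));
    SrcPos := Hit + Length(OldPat);
    Hit := PosEx(OldPat, S, SrcPos);
  end;

  // Tail after the last match.
  Chunk := Length(S) - SrcPos + 1;
  if Chunk > 0 then
    Move(S[SrcPos], Result[DstPos], Chunk * SizeOf(Char));
end;

const
  Sample = 'one, two, three';
begin
  Writeln(ReplaceAllPrealloc(Sample, ', ', ' / '));
  // Cross-check against the RTL routine.
  if ReplaceAllPrealloc(Sample, ', ', ' / ') <>
     StringReplace(Sample, ', ', ' / ', [rfReplaceAll]) then
    Halt(1);
end.
```

The search runs twice, which is the trade-off: if the search is cheap relative to the reallocation, as described above, the extra pass should cost little, but that would need timing on the target compiler and memory manager.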

So it leads me to ask the following question. Any opinions as to the
current biggest bottleneck in Delphi?


Pete


Peter Morris [Droopy eyes software]

Jun 28, 2006, 2:26:44 PM

> Do you mean the IDE or the compiled code?

You could give your opinion on both :-)


Jouni Turunen

Jun 28, 2006, 2:17:28 PM

Hi Peter,

>
> So it leads me to ask the following question. Any opinions as to the
> current biggest bottleneck in Delphi?
>

My vote goes to the whole TStringList class. If I have a second
vote, that goes to the compiler in general (plain IA32 only,
no MMX, no SSE) and floating-point stuff.

Regards,
Jouni

--
The Fastcode Project: http://www.fastcodeproject.org/

John Jacobson

Jun 28, 2006, 2:22:09 PM

"Peter Morris [Droopy eyes software]" <pete@NO_droopyeyes_SPAM.com> wrote in
message <44a2bf4b$1...@newsgroups.borland.com>

> Any opinions as to the
> current biggest bottleneck in Delphi?

Do you mean the IDE or the compiled code?

--
***Free Your Mind***

Posted with JSNewsreader Preview 0.9.4.2549


Thaddy

Jun 28, 2006, 2:46:49 PM

> So it leads me to ask the following question. Any opinions as to the
> current biggest bottleneck in Delphi?
>
>
> Pete
>
>

CEO? All other management seems to be o.k. The car parts are a goner,
that's for sure ;)

Ain Valtin

Jun 28, 2006, 3:25:08 PM

On Wed, 28 Jun 2006 18:41:30 +0100, "Peter Morris [Droopy eyes
software]" <pete@NO_droopyeyes_SPAM.com> wrote:

>So it leads me to ask the following question. Any opinions as to the
>current biggest bottleneck in Delphi?

When I look at the SamplingProfiler results for my current project, the
top 3 items are

System.GetDynaMethod
Classes.TList.IndexOf
Classes.TList.Get

but I'm still using D5 (with FastMM and FastObj), so those may not be
"current biggest bottlenecks".

ain

Hannes Danzl[NDD]

Jun 28, 2006, 5:05:11 PM

> Classes.TList.IndexOf

Try the Sorted property :)

--

Hannes Danzl [NexusDB Developer]
Newsgroup archive at http://www.tamaracka.com/search.htm

Peter Morris [Droopy eyes software]

Jun 28, 2006, 5:15:16 PM

> Try the Sorted property :)

TList doesn't have a Sorted property :-)


Charles Appel

Jun 28, 2006, 9:29:30 PM

"Peter Morris [Droopy eyes software]" <pete@NO_droopyeyes_SPAM.com> wrote in
message news:44a2bf4b$1...@newsgroups.borland.com...

> So it leads me to ask the following question. Any opinions as to the
> current biggest bottleneck in Delphi?

I'd like to see the floating point speed improved in
Native 32-Bit.

--
Charles Appel
http://charlesappel.home.mindspring.com/
Home of Chuck's Poker Libraries for Delphi,
Chuck's Video Poker and Chuck's Toys


Hannes Danzl[NDD]

Jun 28, 2006, 11:37:04 PM

Peter Morris [Droopy eyes software] wrote:

> > Try the Sorted property :)
>
> TList doesn't have a Sorted property :-)

right. then we need one :)

Peter Morris [Droopy eyes software]

Jun 29, 2006, 5:16:10 AM

> right. then we need one :)

That would only help if it was acceptable for the list to be sorted; the
items may be in a specific order. I would say we need a private hash table
lookup holding key + index, unless the new "Sorted" property is set to True.

Eric Grange

Jun 29, 2006, 10:13:25 AM

> System.GetDynaMethod

Using virtual methods usually solves this one, unless you're using
TComponent subclasses too much... and are stuck with VCL-declared dynamics.

> Classes.TList.IndexOf

The speed of this one can be fairly easily tripled using a mere "rep
scasd", this may be enough to not make it a bottleneck in many
situations (and can be plugged right in).
Hashing/sorting have more potential, but also represent an overhead risk
for a general-purpose class, so they might be better suited to specialized
list classes.

> Classes.TList.Get

This one can be sped up too.

There are several TList speedups that were investigated back in 2003 in
the mini "FasterTList" experiment (Add, Put, Pack, Move, Exchange could
be sped up significantly IIRC).

Though in practice, I'm not using TList in my code much (got a faster,
more powerful replacement built from the ground up), so the TList
optimizations weren't brought to a conclusive end.
TList internals aren't very well designed to begin with, they're more a
bunch of mismatched list behaviors packed together, and so optimizing it
"cleanly" isn't simple, while replacing it altogether in code is simpler
and cleaner.

Eric

Ain Valtin

Jun 29, 2006, 12:19:23 PM

On Thu, 29 Jun 2006 16:13:25 +0200, Eric Grange
<egra...@SPAMglscene.org> wrote:

>> System.GetDynaMethod
>
>Using virtual methods usually solves this one, unless you're using
>TComponent subclasses too much... and are stuck with VCL-declared dynamics.

In my own code I use only virtual methods. So those calls must come
from RTL \ VCL (isn't TWinControl message handling done using dynamic
methods?).


>> Classes.TList.IndexOf
>
>The speed of this one can be fairly easily tripled using a mere "rep
>scasd", this may be enough to not make it a bottleneck in many
>situations (and can be plugged right in).

Could you elaborate?


>> Classes.TList.Get
>
>This one can be sped up too.
>
>There are several TList speedups that were investigated back in 2003 in
>the mini "FasterTList" experiment (Add, Put, Pack, Move, Exchange could
>be sped up significantly IIRC).

Did some quick googling which turned up some dead links. It seems that
it (FasterTList) was done by FastCoders, but there is nothing about it
on current FastCode site (except "TList.Sort Challenge")?


>Though in practice, I'm not using TList in my code much (got a faster,
>more powerful replacement built from the ground up), so the TList
>optimizations weren't brought to a conclusive end.
>TList internals aren't very well designed to begin with, they're more a
>bunch of mismatched list behaviors packed together, and so optimizing it
>"cleanly" isn't simple, while replacing it altogether in code is simpler
>and cleaner.

My current project makes heavy use of TInterfaceList (which internally
uses TList) and I reckon it would give me a noticeable performance
boost if I replaced it with something faster... I guess I should start
a FastCode challenge <g>


ain

Ryan J. Mills

Jun 29, 2006, 12:51:54 PM

<snip>

>Though in practice, I'm not using TList in my code much (got a faster,
>more powerful replacement built from the ground up), so the TList
</snip>


Would you be willing to share this TList replacement you've got?

Ryan.

Eric Grange

Jun 29, 2006, 1:11:44 PM

> (isn't TWinControl message handling done using dynamic methods?).

Yes.

> Could you elaborate?

This is the implementation we use for IndexOf, should easily be
adaptable to TList and look-alike list containers.

function TObjectList.IndexOf(Item: TObject): Integer;
var
  c : Integer;
  p : ^TObject;
begin
  if (not Assigned(Self)) or (FCount<1) then
    Result:=-1
  else begin
    c:=FCount;
    p:=@FList^[0];
    asm
      mov eax, Item;        // value to search for
      mov ecx, c;           // number of elements left to scan
      mov edx, ecx;         // keep the original count
      push edi;
      mov edi, p;           // start of the pointer array
      repne scasd;          // scan dwords until a match or ecx = 0
      je @@FoundIt
      mov edx, -1;
      jmp @@SetResult;
    @@FoundIt:
      sub edx, ecx;         // index = count - remaining - 1
      dec edx;
    @@SetResult:
      mov Result, edx;
      pop edi;
    end;
  end;
end;

The "rep" prefix is convenient to use, but isn't as fast as a well
designed loop on modern CPUs, so odds are a challenge could come up with
a variant that would be faster than this code.
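
For comparison, a minimal sketch of the plain-loop alternative mentioned above, written without asm and unrolled four elements per iteration. IndexOfUnrolled is a hypothetical helper working on an open array of pointers rather than on TList internals, and whether it actually beats "repne scasd" is an assumption that would need measuring on the target CPU and compiler.

```pascal
program IndexOfLoop;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Linear search over a pointer array, four comparisons per iteration.
function IndexOfUnrolled(const Items: array of Pointer; Item: Pointer): Integer;
var
  I, N: Integer;
begin
  N := Length(Items);
  I := 0;
  while I + 4 <= N do
  begin
    if Items[I] = Item then begin Result := I; Exit; end;
    if Items[I + 1] = Item then begin Result := I + 1; Exit; end;
    if Items[I + 2] = Item then begin Result := I + 2; Exit; end;
    if Items[I + 3] = Item then begin Result := I + 3; Exit; end;
    Inc(I, 4);
  end;
  // Remaining 0..3 elements.
  while I < N do
  begin
    if Items[I] = Item then begin Result := I; Exit; end;
    Inc(I);
  end;
  Result := -1;
end;

var
  Values: array[0..9] of Integer;
  List: array of Pointer;
  I: Integer;
begin
  // Only the addresses of Values are used, never their contents.
  SetLength(List, Length(Values));
  for I := 0 to High(Values) do
    List[I] := @Values[I];
  Writeln(IndexOfUnrolled(List, @Values[6]));  // index of the seventh entry
  Writeln(IndexOfUnrolled(List, nil));         // nil is not in the list
end.
```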

> Did some quick googling which turned up some dead links. It seems that
> it (FasterTList) was done by FastCoders, but there is nothing about it
> on current FastCode site (except "TList.Sort Challenge")?

This link should work:
http://fastcode.cvs.sourceforge.net/*checkout*/fastcode/Source/FasterTList.pas

In GLScene's PersistentClasses.pas unit you'll find a
TPersistentObjectList class which should be able to advantageously
replace TList/TObjectList straight away in most scenarios.

Eric

Eric Grange

Jun 29, 2006, 1:19:42 PM

> Would you be willing to share this TList replacement you've got?

A more limited implementation can be found in GLScene's
PersistentClasses unit:

http://glscene.cvs.sourceforge.net/*checkout*/glscene/Source/Base/PersistentClasses.pas

Eric

Lee_Nover

Jun 29, 2006, 4:01:46 PM

> My current project makes heavy use of TInterfaceList (which internally
> uses TList) and I reckon it would give me a noticeable performance
> boost if I replaced it with something faster... I guess I should start
> a FastCode challenge <g>

TInterfaceList is slow because of locking - it's threadsafe (uses TThreadList internally)

Solerman Kaplon

Jun 30, 2006, 10:27:31 AM

Eric Grange wrote:

> Though in practice, I'm not using TList in my code much (got a faster,
> more powerful replacement built from the ground up), so the TList
> optimizations weren't brought to a conclusive end.
> TList internals aren't very well designed to begin with, they're more a
> bunch of mismatched list behaviors packed together, and so optimizing it
> "cleanly" isn't simple, while replacing it altogether in code is simpler
> and cleaner.

Agreed, but I think optimizing the current TList would yield improvements across
all RTL/VCL, since it's used plenty in many implementations. Worth a Fastcode
target imho :)

Solerman

Dennis

Jun 30, 2006, 11:24:23 AM

Hi

> Agreed, but I think optimizing the current TList would yield improvements
> across all RTL/VCL, since it's used plenty in many implementations.
> Worth a Fastcode target imho :)

I agree. We are working on a sort challenge.

Best regards
Dennis Kjaer Christensen


Eric Grange

Jun 30, 2006, 11:54:43 AM

> Agreed, but I think optimizing the current TList would yield improvements
> across all RTL/VCL, since it's used plenty in many implementations.
> Worth a Fastcode target imho :)

Yes, though the amount of misuse and bug-and-quirks-abuse this class is
seeing will likely mean a massive, complex, B&V tool :)

Eric

Pierre le Riche

Jun 30, 2006, 7:03:44 PM

Hi,

> So it leads me to ask the following question. Any opinions as to the
> current biggest bottleneck in Delphi?

An optimization that could really speed up string equality/inequality tests
would be to compare the lengths first (and thus avoid expensive character by
character comparisons in many cases). At the moment all string comparisons
(including the "=" and "<>" variety) call _LStrCmp in system.pas, which
always compares character by character even if the lengths differ.
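
A minimal sketch of the length-first equality test suggested above. StrEqualLenFirst is a hypothetical helper name, and the sketch assumes CompareMem from SysUtils; whether a given RTL version already short-circuits on length is not something this sketch settles.

```pascal
program LenFirstCompare;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Equality only: reject on differing lengths before touching any character.
function StrEqualLenFirst(const A, B: string): Boolean;
begin
  if Pointer(A) = Pointer(B) then
    Result := True                      // same buffer, or both empty
  else if Length(A) <> Length(B) then
    Result := False                     // cheap exit, no characters read
  else
    Result := CompareMem(Pointer(A), Pointer(B), Length(A) * SizeOf(Char));
end;

var
  S1, S2, S3: string;
begin
  S1 := StringOfChar('x', 100000) + 'a';
  S2 := StringOfChar('x', 100000) + 'ab';
  S3 := StringOfChar('x', 100000) + 'a';
  Writeln(StrEqualLenFirst(S1, S2));    // lengths differ
  Writeln(StrEqualLenFirst(S1, S3));    // same length, same contents
  Writeln(StrEqualLenFirst('', ''));    // both empty
end.
```

This only helps "=" and "<>"; ordering comparisons ("<", ">") still have to look at the characters.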

Another useful optimization would be to allow an alternate syntax for
operator overloading declarations - to allow the passing of records by
reference rather than value. Operator overloading for records is not really
practical in speed critical code (at least not in its current form), because
it causes a lot of unnecessary record copying.

Regards,
Pierre


Martin James

Jun 30, 2006, 8:25:46 PM

> So it leads me to ask the following question. Any opinions as to the
> current biggest bottleneck in Delphi?

TThread.synchronize, (AKA 'stinkronize')

Unfortunately, it would still stink no matter how much optimized assembler
was thrown at it, so I guess this bottleneck will not be getting much
attention here <g>

TQueue/TObjectQueue are pretty hopeless, being based on TList which has
already had its share of posts in this thread. A couple pointers and an
expandable circular buffer would have been fine, but no - we get a TList
with all its memCopying to move all the existing data up one to make room
for a new object. :(
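
A minimal sketch of the "couple of pointers and an expandable circular buffer" idea above. TRingQueue and TPointerArray are hypothetical names, not part of Contnrs, and the sketch is not thread-safe.

```pascal
program RingQueueDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TPointerArray = array of Pointer;

  // FIFO queue over a growable circular buffer: Push and Pop never shift
  // the existing items, they only move the head index and the count.
  TRingQueue = class
  private
    FItems: TPointerArray;
    FHead: Integer;    // index of the oldest item
    FCount: Integer;
    procedure Grow;
  public
    procedure Push(Item: Pointer);
    function Pop: Pointer;
    property Count: Integer read FCount;
  end;

procedure TRingQueue.Grow;
var
  Bigger: TPointerArray;
  I: Integer;
begin
  if Length(FItems) = 0 then
    SetLength(Bigger, 4)
  else
    SetLength(Bigger, Length(FItems) * 2);
  // Unwrap the old contents so the oldest item lands at index 0.
  for I := 0 to FCount - 1 do
    Bigger[I] := FItems[(FHead + I) mod Length(FItems)];
  FItems := Bigger;
  FHead := 0;
end;

procedure TRingQueue.Push(Item: Pointer);
begin
  if FCount = Length(FItems) then
    Grow;
  FItems[(FHead + FCount) mod Length(FItems)] := Item;
  Inc(FCount);
end;

function TRingQueue.Pop: Pointer;
begin
  if FCount = 0 then
    raise Exception.Create('Queue is empty');
  Result := FItems[FHead];
  FHead := (FHead + 1) mod Length(FItems);
  Dec(FCount);
end;

var
  Q: TRingQueue;
  Values: array[0..9] of Integer;
  I: Integer;
begin
  Q := TRingQueue.Create;
  try
    for I := 0 to 2 do
      Q.Push(@Values[I]);
    if Q.Pop <> @Values[0] then Halt(1);
    for I := 3 to 9 do
      Q.Push(@Values[I]);               // forces wrap-around and growth
    for I := 1 to 9 do
      if Q.Pop <> @Values[I] then Halt(1);
    Writeln('FIFO order preserved, items left: ', Q.Count);
  finally
    Q.Free;
  end;
end.
```

The only copying happens in Grow, and doubling the capacity keeps that cost amortized across many pushes.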

Rgds,
Martin


Eric Grange

Jul 1, 2006, 2:37:12 AM

> TQueue/TObjectQueue are pretty hopeless [...]

IMO, pretty much all of Contnrs.pas is up for at least a reimplementation.

It's one of those units I remove straightaway from the "uses" clause (the other
being Math).

Eric

Peter Morris [Droopy eyes software]

Jul 1, 2006, 6:06:12 AM

> I agree. We are working on a sort challenge.

Personally I think you need to improve on Add/Remove/IndexOf (on an unsorted
list) as these are the most commonly used.


Guillem

Jul 1, 2006, 6:22:07 AM

Eric Grange wrote:

> IMO, pretty much all of Contnrs.pas is up for at least a
> reimplementation.
>
> It's one of those units I remove straightaway from the "uses" clause
> (the other being Math).
>
> Eric

out of curiosity, if you remove those units and you need mathematical
functions or list objects, what do you use? Self-created ones? Or are
there some units somewhere that are better/faster-than-the-lightning
and I am not aware of them?

TIA
--
Best regards :)

Guillem Vicens Meier
Dep. Informatica Green Service S.A.
www.clubgreenoasis.com
--
Contribute to the Indy Docs project: http://docs.indyproject.org
--
In order to contact me remove the -nospam

Eric Grange

Jul 3, 2006, 3:27:51 AM

> out of curiosity, if you remove those units and you need mathematical
> functions or list objects, what do you use? Self-created ones?

I use GLScene's VectorGeometry.pas for all common maths functions and
single-precision stuff. The rest come from various origins, from
self-made to JEDI's, to Delphi versions of early Turbo Pascal days or
Fortran-era routines.

Eric

Peter Morris [Droopy eyes software]

Jul 3, 2006, 6:27:16 AM

I have finally thought of the one that affects me the most. It's the time
it takes for the IDE to start up!


Leo

Jul 9, 2006, 12:47:00 PM

I thought: what a great idea and an easy way to improve performance. I have
an algorithm that works with singles and calls Trunc and Round a lot. So I
included VectorGeometry in uses and tested the algorithm. It got 2.5 times
slower! Have you tested Trunc and Round performance? How would you explain my
results?
I used Delphi 6 and Pentium 4. Maybe it has something to do with the calling
convention and might be different on Delphi 2006.

"Eric Grange" <egra...@SPAMglscene.org> wrote in message
news:44a8...@newsgroups.borland.com...

Eric Grange

Jul 10, 2006, 1:47:10 AM

> It got 2.5 times slower! Have you tested Trunc and Round performance?

Round, yes.
Trunc I never use in any performance critical situation. It was included because
some Delphi implementations of Trunc had a bug and would sabotage the 8087
Control Word whenever called, so Trunc & Frac are bug fixes (IIRC D6 and up
had the functions fixed, but D5 is GLScene's baseline version).

> How would you explain my results?

Apart from a variation in branch target alignment, System.pas Round & Trunc
benefit from compiler magic (they get their parameter on the FPU stack), which
means VectorGeometry's implementation can't compete when you're rounding a
computation result (vs rounding a variable, function-call-returned value, etc).

Also watch for the GEOMETRY_NO_ASM option: if set, the functions are just piping
to System.pas Round and Trunc.

> I used Delphi 6 and Pentium 4. Maybe it has something to do with the calling
> convention and might be different on Delphi 2006.

There is a discussion about adding the ability to pass FPU parameters on the
stack. Well, it's in there, but only for compiler magic functions... not for
user functions (plus compiler magic allows their asm to be inlined, which
can't be done with user functions even in D2006).

Eric

Leo

Jul 10, 2006, 12:02:41 PM

> Trunc I never use in any performance critical situation

So how do you Truncate a value?
Round(x-0.5) for positive values?

I have x:single;//always positive
I need to get TruncX,FracX and RoundX

Right now I use:
TruncX:=Trunc(x);
RoundX:=Round(x);
FracX:=x-TruncX;

Any better solutions?


Eric Grange

Jul 10, 2006, 11:44:51 AM

> So how do you Truncate a value?

I rarely ever have to truncate.

> Round(x-0.5) for positive values?

Yes. Though due to banker's rounding, you can have odd side effects if
you're truncating for mathematical purposes (like to mimic a Frac(x)).

When there is a real need to Trunc/Round a lot of values, I use 3DNow! or
SSE (and have the values arranged in a vector/array fashion), with Trunc
being relegated to the slow no-asm codepath.

Alternatively, if you have sections with lots of Trunc and no Round, you
could consider changing the 8087 CW temporarily to truncation mode, and
then just invoke Round (which will then behave like Trunc).
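
A minimal sketch of the temporary truncation-mode trick above, assuming the Math unit's SetRoundMode and TFPURoundingMode (available from Delphi 6 on, so not in a D5 baseline). TruncAll is a hypothetical helper name, and the sketch relies on Round honouring the current FPU rounding mode, which is the behaviour described above for x86.

```pascal
program TruncViaRound;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Math;

// Truncate a whole batch by switching the rounding mode once, calling
// Round in the loop, and restoring the previous mode afterwards.
// Dst is assumed to be at least as long as Src.
procedure TruncAll(const Src: array of Single; var Dst: array of Integer);
var
  Saved: TFPURoundingMode;
  I: Integer;
begin
  Saved := SetRoundMode(rmTruncate);    // returns the mode that was active
  try
    for I := 0 to High(Src) do
      Dst[I] := Round(Src[I]);
  finally
    SetRoundMode(Saved);                // never leave the mode changed
  end;
end;

var
  Src: array[0..3] of Single;
  Dst: array[0..3] of Integer;
  I: Integer;
begin
  Src[0] := 0.5;
  Src[1] := 1.5;
  Src[2] := 2.99;
  Src[3] := 7.0;
  TruncAll(Src, Dst);
  // Print both values side by side so they can be compared by eye.
  for I := 0 to High(Dst) do
    Writeln(Src[I]:0:2, ' -> ', Dst[I], '  (Trunc gives ', Trunc(Src[I]), ')');
end.
```

Restoring the mode in a finally block matters: any floating-point code that runs while truncation mode is active will round differently too.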

> Any better solutions?

Don't have isolated Trunc/Round/Frac, but couple them with something
else. For instance if you have lots of computations that look like

Trunc(x*y+z)

Make yourself a TruncXYZ(x, y, z) that will do the mult+add and then
truncation in one go. Even better if x/y/z are vectors or arrays.
This way the asm to write stays simple and reusable enough while
involving enough work to overcome the call convention overhead.

Just make sure to avoid compiler inefficiency by passing your values in
records, arrays or var params (ie. by address).
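
A minimal sketch of the batched interface described above. TruncMulAdd is a hypothetical name, and the loop body is plain Pascal standing in for the asm or SSE version the advice is really about; the point shown is only the calling shape: one call for the whole batch, with all data passed by address.

```pascal
program BatchTruncMulAdd;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Dst[i] := Trunc(X[i] * Y[i] + Z[i]) for a whole batch in a single call.
// Const and var open arrays are passed by reference, so nothing is copied.
// X, Y and Z are assumed to be at least as long as Dst.
procedure TruncMulAdd(const X, Y, Z: array of Single; var Dst: array of Integer);
var
  I: Integer;
begin
  for I := 0 to High(Dst) do
    Dst[I] := Trunc(X[I] * Y[I] + Z[I]);
end;

const
  // Values chosen to be exactly representable in single precision.
  X: array[0..2] of Single = (1.5, 2.0, 4.0);
  Y: array[0..2] of Single = (2.0, 2.25, 0.5);
  Z: array[0..2] of Single = (0.25, 0.25, 0.75);
var
  Dst: array[0..2] of Integer;
  I: Integer;
begin
  TruncMulAdd(X, Y, Z, Dst);            // 3.25, 4.75 and 2.75 before truncation
  if (Dst[0] <> 3) or (Dst[1] <> 4) or (Dst[2] <> 2) then
    Halt(1);
  for I := 0 to High(Dst) do
    Writeln(Dst[I]);
end.
```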

Eric
