
How about this syntactic candy?


Hilton

Nov 24, 2007, 7:22:46 PM

Hi,

for (int i = 0; i < list.Count; i++)

has a hidden performance hit; i.e. list.Count gets evaluated each time, so
we write something like:

int listCount = list.Count;
for (int i = 0; i < listCount; i++)

How about simply:

for (int i = 0; i < const list.Count; i++)

I think it looks kinda nice and is a simple and clean way to improve performance
(or at least remove the performance hit). Thoughts?

Hilton
P.S.: Disclaimer, I'm not too familiar with the new C# changes so if
something similar is in there, let me know.


Arne Vajhøj

Nov 24, 2007, 7:33:23 PM

Hilton wrote:
> for (int i = 0; i < list.Count; i++)
>
> has a hidden performance hit; i.e. list.Count gets evaluated each time, so
> we write something like:
>
> int listCount = list.Count;
> for (int i = 0; i < listCount; i++)
>
> How about simply:
>
> for (int i = 0; i < const list.Count; i++)
>
> I think it looks kinda nice and is a simple and clean way to improve performance
> (or at least remove the performance hit). Thoughts?

I don't like it.

I think you should write the code as:

for (int i = 0; i < list.Count; i++)

which is the most readable form, and leave the optimization
to the compiler (csc & CLR JIT).

They should optimize that call away, not the programmer.

(I don't know if they already do)
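
For what it's worth, the array case is one the JIT is known to handle.
A sketch (arrays only; whether a property like list.Count gets the same
treatment is exactly the part I don't know):

static int Sum (int[] a)
{
    int sum = 0;
    for (int i = 0; i < a.Length; i++)   // the JIT-friendly form for arrays
    {
        sum += a[i];                     // bounds check can be elided here
    }
    return sum;
}

Keeping "i < a.Length" in the condition lets the JIT prove i is always
in range; hoisting Length into a local can actually defeat that analysis.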

Arne


Steve Thackery

Nov 24, 2007, 9:39:31 PM

I agree with Arne. Do you actually *know* that list.Count gets evaluated
each time and the end result is *actually* slower?

I don't claim to know, but suspect there is a good chance the optimisation
will sort it all out anyway. I once saw a demo of a for loop all compressed
into one line with ingenious shortcuts and contractions, and after the
optimising compiler had been through it, it generated EXACTLY the same
machine code as an alternative loop written out in full.

In fact, I vaguely remember someone explaining that code which is highly
optimised by the writer can even make the optimising compiler produce
*worse* code than a simple, but verbose, version of the algorithm. I can't
remember the product, unfortunately. Could it have been an old Delphi?

I've always believed that optimising at the written code stage is:

1/ undesirable, because it makes the code much harder to understand, debug
and maintain

2/ unlikely to work anyway

SteveT

Arne Vajhøj

Nov 24, 2007, 10:09:08 PM

I made a small test. It seems the overhead of having
.Count in the for loop is about 0.2 nanoseconds per
iteration on my 3 year old PC. That is very little.

Arne

Family Tree Mike

Nov 24, 2007, 10:43:00 PM

"Hilton" wrote:

What should a compiler say about this code (assuming all control loops would
be enhanced):

while (const list.Count > 0)
{
list.RemoveAt(0);
}

Basically, the while control could be thought of as valid, as the const
evaluates to true or false at runtime, so no compiler error. The line that
removes an item "violates" the const condition, so is this flagged as a
compiler error or a runtime error or neither, where the last case leads to
debugging nightmares?

Just a first reaction. I agree with the others, that in a for loop, the
readable version is probably optimized.

Hilton

Nov 24, 2007, 11:52:55 PM

"Family Tree Mike" <FamilyT...@discussions.microsoft.com> wrote in
message news:9071BC43-446A-414C...@microsoft.com...

> "Hilton" wrote:
>
>> Hi,
>>
>> for (int i = 0; i < list.Count; i++)
>>
>> has a hidden performance hit; i.e. list.Count gets evaluated each time,
>> so
>> we write something like:
>>
>> int listCount = list.Count;
>> for (int i = 0; i < listCount; i++)
>>
>> How about simply:
>>
>> for (int i = 0; i < const list.Count; i++)
>>
>> I think it looks kinda nice and is a simple and clean way to improve
>> performance
>> (or at least remove the performance hit). Thoughts?
>>
>> Hilton
>> P.S.: Disclaimer, I'm not too familiar with the new C# changes so if
>> something similar is in there, let me know.
>>
>>
>>
>
> What should a compiler say about this code (assuming all control loops
> would
> be enhanced):
>
> while (const list.Count > 0)
> {
> list.RemoveAt(0);
> }

Well, this is a bug and not what I was proposing anyway. My suggestion was
(only) to be used in "for" loops.


> Basically, the while control could be thought of as valid, as the const
> evaluates to true or false at runtime, so no compiler error. The line
> that
> removes an item "violates" the const condition, so is this flagged as a
> compiler error or a runtime error or neither, where the last case leads to
> debugging nightmares?

Even if it were allowed here, it definitely cannot be evaluated at compile time,
but before we go off on this tangent, I was only proposing this for a "for"
loop.


> Just a first reaction. I agree with the others, that in a for loop, the
> readable version is probably optimized.

I suggest you read:
http://msdn2.microsoft.com/en-us/library/ms998547.aspx
"Note that if these are fields, it may be possible for the compiler to do
this optimization automatically. If they are properties, it is much less
likely. If the properties are virtual, it cannot be done automatically."
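
To illustrate the three cases in that quote, a sketch with hypothetical
types (these are not from the article):

class FieldBased
{
    public int Count;                               // a field: hoisting is feasible
}

class PropertyBased
{
    private int count;
    public int Count { get { return count; } }      // a property: a call, so much less likely
}

class VirtualBased
{
    public virtual int Count { get { return 0; } }  // virtual: the target isn't known at
                                                    // compile time, so no automatic hoisting
}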

Hilton


Hilton

Nov 24, 2007, 11:55:03 PM

Arne wrote:

> I made a small test. It seems the overhead of having
> .Count in the for loop is about 0.2 nanoseconds per
> iteration on my 3 year old PC. That is very little.

That information by itself is meaningless. Did the loop run once or a
billion times? What were you doing in the loop? Was it entirely optimized
out by the compiler? etc etc etc...

I'm not saying that the call time is insignificant, as you suggest; just that
what you wrote does not tell the full story.

Hilton


Arne Vajhøj

Nov 25, 2007, 12:00:29 AM

Hilton wrote:
> Arne wrote:
>> I made a small test. It seems the overhead of having
>> .Count in the for loop is about 0.2 nanoseconds per
>> iteration on my 3 year old PC. That is very little.
>
> That information by itself is meaningless. Did the loop run once or a
> billion times? What were you doing in the loop? Was it entirely optimized
> out by the compiler? etc etc etc...

10 billion times.

A single if statement, but I would not expect the overhead per
iteration to change due to the content of the loop.

It was obviously not entirely optimized out.
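
To put the arithmetic in one place (the two totals here are illustrative
only, chosen to line up with the 0.2 nanosecond figure, not my raw output):

using System;

class PerIteration
{
    static void Main ()
    {
        double secondsWithCount = 4.0;   // .Count in the loop condition (assumed)
        double secondsWithLocal = 2.0;   // count hoisted into a local (assumed)
        double iterations = 10e9;        // "10 billion times"

        double overhead = (secondsWithCount - secondsWithLocal) / iterations;
        Console.WriteLine ("{0} ns per iteration", overhead * 1e9);  // prints 0.2
    }
}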

Arne

Peter Duniho

Nov 25, 2007, 12:32:47 AM

On 2007-11-24 18:39:31 -0800, "Steve Thackery" <nob...@nowhere.com> said:

> I agree with Arne. Do you actually *know* that list.Count gets
> evaluated each time and the end result is *actually* slower?

list.Count had _better_ get evaluated each time. There's nothing about
C# that suggests that it should assume the value in the loop continue
condition won't change. Unless the code in the loop is so simple that
it is guaranteed that the list.Count property won't change during the
execution of the loop, optimizing the expression to avoid reevaluating
list.Count would be wrong.
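
A minimal example of why (hypothetical helper; the body legitimately
changes the count, using System.Collections.ArrayList):

static void RemoveNulls (ArrayList list)
{
    for (int i = 0; i < list.Count; i++)   // Count shrinks as items are removed
    {
        if (list [i] == null)
        {
            list.RemoveAt (i);
            i--;                           // re-examine the slot that shifted down
        }
    }
}

Hoist list.Count into a local here and the loop runs off the end of the
shortened list.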

> I don't claim to know, but suspect there is a good chance the
> optimisation will sort it all out anyway. I once saw a demo of a for
> loop all compressed into one line with ingenious shortcuts and
> contractions, and after the optimising compiler had been through it, it
> generated EXACTLY the same machine code as an alternative loop written
> out in full.

There may be a situation where the compiler cannot tell that list.Count
is constant, but where it actually is. In that case, I could see the
utility in providing the compiler with an explicit statement to that
effect, but it seems to me that the current method required is so easy
and simple, I can't imagine the point in adding something extra to the
language to support the behavior.

One thing I like very much about C# is its simplicity. I think people
can get carried away trying to add all their favorite "syntactical
sugar" to the language, weighing it down with all sorts of stuff that
just makes things harder on the compiler, the spec writers, and the
developers. Things that are added to the language should be _really_
important and provide some significant functionality. I don't think
this example applies.

> In fact, I vaguely remember someone explaining that code which is
> highly optimised by the writer can even make the optimising compiler
> produce *worse* code than a simple, but verbose, version of the
> algorithm. I can't remember the product, unfortunately. Could it have
> been an old Delphi?

I think that's a general truth, regardless of language. Optimizing
compilers often can take advantage of commonly used code structures.
If you write code that's "sneaky", where you've tried to create some
optimization outside of the compiler, it's possible that the compiler
could fail to generate optimal code.

I would expect that to be a rare situation. Compiler technology has
gotten extremely good. But I wouldn't rule it out.

The point is somewhat moot anyway. After all, you shouldn't be
optimizing code that doesn't need optimizing. Write it readable
first, optimize if and when there's a specific performance issue that
needs to be fixed.

> I've always believed that optimising at the written code stage is:
>
> 1/ undesirable, because it makes the code much harder to understand,
> debug and maintain
>
> 2/ unlikely to work anyway

I agree. :)

Pete

Peter Duniho

Nov 25, 2007, 12:42:40 AM

On 2007-11-24 20:52:55 -0800, "Hilton" <nos...@nospam.com> said:

>> What should a compiler say about this code (assuming all control loops
>> would be enhanced):
>>
>> while (const list.Count > 0)
>> {
>> list.RemoveAt(0);
>> }
>
> Well, this is a bug and not what I was proposing anyway. My suggestion was
> (only) to be used in "for" loops.

Well, then what would happen if you wrote this:

for ( ; const list.Count > 0; list.RemoveAt(0));

Not a syntax I'd prefer, but perfectly legal assuming "const" is a
legal keyword in a for() statement.

When you write "const" what is the compiler supposed to assume is
constant? What happens if it's not actually constant?

> [...]


>> Just a first reaction. I agree with the others, that in a for loop, the
>> readable version is probably optimized.
>
> I suggest you read:
> http://msdn2.microsoft.com/en-us/library/ms998547.aspx
> "Note that if these are fields, it may be possible for the compiler to do
> this optimization automatically. If they are properties, it is much less
> likely. If the properties are virtual, it cannot be done automatically."

Well, it all depends. I do agree that properties are more difficult.
But in many cases they will be simple retrievals of a field, and if
it's a class for which the property can be inlined, it can be optimized
just as a field could be.

Given that the programmer would still be responsible for identifying
constant expressions manually anyway, and given the fact that this new
use of the "const" keyword would add complications for the compiler
even as it allows for one specific kind of optimization, it would be
better to just let the programmer take advantage of the language as it
exists now rather than adding a new syntax to it.

Pete

Hilton

Nov 25, 2007, 1:40:07 AM

Did you use an array or a list? Anyway, here are the results on my machine
and the results are significant performance-wise:

foreach takes 3.7 seconds
list.Count takes 1.6 seconds
const list.Count takes 0.88 seconds

So, even if "const" never gets adopted, using a temporary variable
(especially for simple loops) seems to achieve 2x results. The temporary
variable is the same as my proposed const and it is exactly what the
compiler would do anyway. I bet that most for loops loop to a constant and
that we're all losing a lot of efficiency by using "i < list.Count" in the for
condition.

Here is the code - I used an ArrayList so the length wasn't fixed as with an
array.

using System;
using System.Collections;

namespace Test
{
    class Class1
    {
        [STAThread]
        static void Main(string[] args)
        {
            int LOOP_COUNT = 100000;

            IList list = new ArrayList ();

            for (int i = 0; i < 1024; i++)
            {
                list.Add (string.Empty);
            }

            int x = 0;

            DateTime d0 = DateTime.Now;
            for (int loop = 0; loop < LOOP_COUNT; loop++)
            {
                foreach (object o in list)
                {
                    if (o == null) // if (IsNull (o, list))
                    {
                        x++;
                    }
                }
            }
            DateTime d1 = DateTime.Now;

            DateTime d2 = DateTime.Now;
            for (int loop = 0; loop < LOOP_COUNT; loop++)
            {
                for (int i = 0; i < 1024; i++)
                {
                    if (list [i] == null) // if (IsNull (i, list))
                    {
                        x++;
                    }
                }
            }
            DateTime d3 = DateTime.Now;

            DateTime d4 = DateTime.Now;
            for (int loop = 0; loop < LOOP_COUNT; loop++)
            {
                for (int i = 0; i < list.Count; i++)
                {
                    if (list [i] == null) // if (IsNull (i, list))
                    {
                        x++;
                    }
                }
            }
            DateTime d5 = DateTime.Now;

            Console.WriteLine ((d1 - d0).TotalSeconds.ToString ("F3"));
            Console.WriteLine ((d3 - d2).TotalSeconds.ToString ("F3"));
            Console.WriteLine ((d5 - d4).TotalSeconds.ToString ("F3"));
            Console.WriteLine (x);
        }

        static bool IsNull (object o, IList list)
        {
            return o == null;
        }

        static bool IsNull (int i, IList list)
        {
            return list [i] == null;
        }
    }
}


Hilton

Nov 25, 2007, 1:52:24 AM

Peter Duniho wrote:
> On 2007-11-24 20:52:55 -0800, "Hilton" <nos...@nospam.com> said:
>
>>> What should a compiler say about this code (assuming all control loops
>>> would be enhanced):
>>>
>>> while (const list.Count > 0)
>>> {
>>> list.RemoveAt(0);
>>> }
>>
>> Well, this is a bug and not what I was proposing anyway. My suggestion
>> was
>> (only) to be used in "for" loops.
>
> Well, then what would happen if you wrote this:
>
> for ( ; const list.Count > 0; list.RemoveAt(0));
>
> Not a syntax I'd prefer, but perfectly legal assuming "const" is a legal
> keyword in a for() statement.
>
> When you write "const" what is the compiler supposed to assume is
> constant? What happens if it's not actually constant?

I never claimed to develop a syntax that prevented buggy code being written.
:) Anyway, your example would not compile - you changed my suggestion. I
suggested: "for (int i = 0; i < const list.Count; i++)" and I bet the
compiler writers could add this very very simply.

C# and other languages are full of syntactic candy. e.g. A ? B : C. What
about the foreach to loop through stuff even when it is MUCH slower than a
simple for loop. My suggestion speeds things up 2x (see my other post), is
easy to read, and adds no new keywords. For the record, I know the exact
same thing can be done with a temporary variable, but IMHO, "const" is much
easier to read.

Thanks,

Hilton


Jon Skeet [C# MVP]

Nov 25, 2007, 3:16:46 AM

Hilton <nos...@nospam.com> wrote:
> Did you use an array or a list? Anyway, here are the results on my machine
> and the results are significant performance-wise:
>
> foreach takes 3.7 seconds
> list.Count takes 1.6 seconds
> const list.Count takes 0.88 seconds
>
> So, even if "const" never gets adopted, using a temporary variable
> (especially for simple loops) seems to achieve 2x results. The temporary
> variable is the same as my proposed const and it is exactly what the
> compiler would do anyway. I bet that most for loops loop to a constant and
> that we're all losing a lot of efficiency by using "i < list.Count" in the for
> condition.

"A lot of efficiency"? Only if you habitually write loops where the
dominant factor is the looping part itself.

In most code this would be lost in the noise - and where possible, I'd
use foreach despite it being (gasp!) 4 times as slow in your test.

Micro-optimise where you have a bottleneck, not before.

--
Jon Skeet - <sk...@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk

Ian Semmel

Nov 25, 2007, 3:51:01 AM

Now all you have to do is find an application where this "performance
hit" is measurable.

A good program is one which is readily readable and easy to change. If
you are going to waste your time trying to optimize source code in this
manner, you will never get anything done.

While this loop of yours is running, Windows would be processing about
10 zillion messages, writing to the monitor, processing interrupts,
executing other programs, etc.

Peter Duniho

Nov 25, 2007, 4:01:26 AM

On 2007-11-24 22:52:24 -0800, "Hilton" <nos...@nospam.com> said:

> I never claimed to develop a syntax that prevented buggy code being written.

What's the point of "const" here if it's not enforced? The only thing
I can think of that's worse than adding a rarely needed, rarely
beneficial syntax is adding one that actually allows one to lie to the
compiler in a way that _breaks_ the code.

> :) Anyway, your example would not compile - you changed my suggestion. I
> suggested: "for (int i = 0; i < const list.Count; i++)" and I bet the
> compiler writers could add this very very simply.

What's the difference? Are you proposing that the only legal variation
of the syntax would be "[comparison operator] const"? If so, then I
submit that's an even more limited addition (and thus unwarranted) than
I'd originally thought you were talking about.

If not, then perhaps you could be more clear about what it is exactly
you're proposing. So far, you haven't been very specific.

> C# and other languages are full of syntactic candy.

I never said there's never any use for it. But it should add real
value, rather than being some sort of gimmick.

> e.g. A ? B : C.

Well, first of all, lots of languages do okay without that. But more
significantly, IMHO, the reason that syntax has survived so many
generations of this family of languages is that it's so commonly
useful. It _significantly_ abbreviates the code, but in a way that's
still quite readable. More important, it provides a syntax to do
something that would simply not be practical any other way.

The primary alternative would be to write a function to perform the
test. It's not just some abbreviated way to write something the
language supports via other means. It provides a way of representing
an expression with a conditional value, without the overhead of having
to call a function.

In C you could hack it together by taking advantage of a boolean
expression evaluating to 0 or 1, but it would be significantly _less_
readable that way, and much more complicated to write. In C# it gets
even worse, because you'd have to get around the language's reluctance
to treat the bool type as a numerical type in the first place.
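
Something like this sketch, for numeric operands only:

static int Select (bool a, int b, int c)
{
    int flag = Convert.ToInt32 (a);      // true -> 1, false -> 0, spelled out
    return flag * b + (1 - flag) * c;    // b when a is true, otherwise c
}

which is exactly the kind of thing "a ? b : c" exists to spare us.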

IMHO, your suggestion doesn't even come close to solving the kind of
issues that an expression like "A ? B : C" solves.

> What
> about the foreach to loop through stuff even when it is MUCH slower than a
> simple for loop.

What about it? It's true, IEnumerable is slower than the usual simple
mechanisms available via an indexing for() loop. So what? For one,
there's a real code-correctness and maintainability value in having a
typed enumeration of some collection, and for another it would be
unusual for the speed difference to have any real significance in your
code.

And in those rare cases when it does, then fine...write it as an
indexed for() loop instead.

But first of all, don't waste time optimizing until you have shown
there's a need. Sound familiar? And secondly, what's foreach() got to
do with this thread? If you're using foreach(), you're not going to be
able to use "const" as you suggest anyway.

> My suggestion speeds things up 2x (see my other post),

It speeds things up 2x in a very specific test, and a degenerate one at
that. I doubt it would have anywhere close to that kind of difference
in code that actually did something interesting.

> is easy to read,

That's definitely an "eye of the beholder" thing.

> and adds no new keywords.

It does add a new interpretation of an existing keyword, and especially
in the context of C# it's a fairly orthogonal interpretation at that
("const" not having all of the same uses in C# that it has in C++).

> For the record, I know the exact
> same thing can be done with a temporary variable, but IMHO, "const" is much
> easier to read.

Again, "eye of the beholder". I find that it obscures a key detail
with respect to the implementation, one that's obvious and hard-coded
rather than assumed when a local variable is used.

In any case, when you write code that copies a value to a local, you
are _guaranteeing_ the compiler that the value won't change. No such
guarantee is made if you insert "const" in the condition for the loop,
and that's a definite code-correctness problem.

I will pick maintainable and correct code over fast every single time. YMMV.

Pete

Peter Duniho

Nov 25, 2007, 4:17:38 AM

On 2007-11-25 01:01:26 -0800, Peter Duniho <NpOeS...@NnOwSlPiAnMk.com> said:

> [...]


> In any case, when you write code that copies a value to a local, you
> are _guaranteeing_ the compiler that the value won't change. No such
> guarantee is made if you insert "const" in the condition for the loop,
> and that's a definite code-correctness problem.

And just to be clear, in case the way I've put this was too ambiguous:

I do understand that you can still wind up writing a bug even if you
copy the variable to a local. My point is that the "const" keyword is
traditionally used in places where the compiler can actually ensure the
"const"-ness of the affected identifier, which can't be done here. As
well, using a local variable would IMHO make such a bug much more
obvious than something that _looks_ like it's more than just a hint to
the compiler.

Pete

Marc Gravell

Nov 25, 2007, 4:56:21 AM

For info, I tweaked your sample to use a stopwatch, and rather than
hard-coding the 1024 in the middle test, I used:

int j = list.Count;
for (int i = 0; i < j; i++) {...}

since this is what you would need to do anyway (we don't know the size
at compile time). My results don't reproduce your 1.6 => 0.88
observations (I'm using release mode, C# 3, .NET 2.0, command line (no
debugger)):

6047316
1400371
1693672

(numbers are ticks)

OK, it is faster (between 2 & 3) but not by as much as you hinted.
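
The timing part looked roughly like this (a sketch reusing Hilton's
variable names, not my exact code; Stopwatch lives in System.Diagnostics
and has far finer resolution than DateTime.Now):

Stopwatch sw = Stopwatch.StartNew ();
for (int loop = 0; loop < LOOP_COUNT; loop++)
{
    int j = list.Count;                 // hoisted once per outer pass
    for (int i = 0; i < j; i++)
    {
        if (list [i] == null) x++;
    }
}
sw.Stop ();
Console.WriteLine (sw.ElapsedTicks);    // hence the numbers above being in ticks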

Marc

Hilton

Nov 25, 2007, 6:14:01 AM

Peter Duniho wrote:
> On 2007-11-24 22:52:24 -0800, "Hilton" <nos...@nospam.com> said:
>
>> I never claimed to develop a syntax that prevented buggy code being
>> written.
>
> What's the point of "const" here if it's not enforced? The only thing I
> can think of that's worse than adding a rarely needed, rarely beneficial
> syntax is adding one that actually allows one to lie to the compiler in a
> way that _breaks_ the code.

What are you talking about? Lying? Breaking the code?


>> :) Anyway, your example would not compile - you changed my suggestion.
>> I
>> suggested: "for (int i = 0; i < const list.Count; i++)" and I bet the
>> compiler writers could add this very very simply.
>
> What's the difference? Are you proposing that the only legal variation of
> the syntax would be "[comparison operator] const"? If so, then I submit
> that's an even more limited addition (and thus unwarranted) than I'd
> originally thought you were talking about.
>
> If not, then perhaps you could be more clear about what it is exactly
> you're proposing. So far, you haven't been very specific.

In the post to which you're replying, I wrote: "I suggested: "for (int i = 0;
i < const list.Count; i++)""


>> C# and other languages are full of syntactic candy.
>
> I never said there's never any use for it. But it should add real value,
> rather than being some sort of gimmick.
>
>> e.g. A ? B : C.
>
> Well, first of all, lots of languages do okay without that. But more
> significantly, IMHO, the reason that syntax has survived so many
> generations of this family of languages is that it's so commonly useful.
> It _significantly_ abbreviates the code, but in a way that's still quite
> readable. More important, it provides a syntax to do something that would
> simply not be practical any other way.

What? Come on Peter...

x = A ? B : C can easily be written with an "if". If you look at the code,
that is exactly what compilers generally do anyway.

[I've ignored formatting here]
if (A) { x = B; } else { x = C; }


[zap]


> IMHO, your suggestion doesn't even come close to solving the kind of
> issues that an expression like "A ? B : C" solves.

"A ? B : C" solves no 'issues' - it is pure syntactic candy.


> But first of all, don't waste time optimizing until you have shown there's
> a need. Sound familiar?

Well since you don't develop for the Compact Framework, you clearly don't
need to optimize. (Yes, I really am just kidding here...)

Peter, C# has added so much new stuff in the later versions that I
personally feel that C#'s readability has taken a huge nose-dive. As you
say, that is in the eye of the beer-holder, uhh, I mean beholder. I simply
suggested one little syntax change. It really isn't such a big deal.

Hilton


Jon Skeet [C# MVP]

Nov 25, 2007, 7:45:22 AM

Hilton <nos...@nospam.com> wrote:
> > I never said there's never any use for it. But it should add real value,
> > rather than being some sort of gimmick.
> >
> >> e.g. A ? B : C.
> >
> > Well, first of all, lots of languages do okay without that. But more
> > significantly, IMHO, the reason that syntax has survived so many
> > generations of this family of languages is that it's so commonly useful.
> > It _significantly_ abbreviates the code, but in a way that's still quite
> > readable. More important, it provides a syntax to do something that would
> > simply not be practical any other way.
>
> What? Come on Peter...
>
> x = A ? B : C can easily be written with an "if". If you look at the code,
> that is exactly what compilers generally do anyway.
>
> [I've ignored formatting here]
> if (A) { x = B; } else {x = C; }

That's fine if it's the only thing in the statement - but often the
result is used as a method parameter, etc. For instance:

Console.WriteLine ("You have {0} order{1} remaining.",
orders.Count, orders.Count==1 ? "" : "s");

You could have multiple parameters which are conditionalised this way,
at which point the benefit of using the conditional operator can be
significant.
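
For example (illustrative, with files and dirs being int counts):

Console.WriteLine ("{0} file{1} in {2} director{3}.",
    files, files == 1 ? "" : "s",
    dirs, dirs == 1 ? "y" : "ies");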

<snip>

> Well since you don't develop for the Compact Framework, you clearly don't
> need to optimize. (Yes, I really am just kidding here...)
>
> Peter, C# has added so much new stuff in the latter versions that I
> personally feel that C#'s readability has taken a huge nose-dive. As you
> say, that is in the eye of the beer-holder, uhh, I mean beholder. I simply
> suggested one little syntax change. It really isn't such a big deal.

Every syntax change has to justify itself - and the slight gain in
performance (insignificant in most cases) isn't worth it IMO.

The changes in C# 3 actually *vastly* improve readability once you're
used to them. While the new elements such as lambda expressions are
"strangers" to you then yes, it'll be harder to read. The
expressiveness of LINQ etc is really great though, IMO. Just wait until
you've used it for a while. I get frustrated when I write C# 2 these
days...
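
For a flavour of it, an illustrative before/after (hypothetical Order
type; needs C# 3 and System.Linq):

// C# 2:
List<Order> overdue = new List<Order> ();
foreach (Order o in orders)
{
    if (o.DueDate < DateTime.Now)
    {
        overdue.Add (o);
    }
}

// C# 3:
List<Order> overdue2 = orders.Where (o => o.DueDate < DateTime.Now).ToList ();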

Steve Thackery

Nov 25, 2007, 7:42:02 AM

Although Hilton's results disprove my arguments ("instincts" would be a
better term), I want to congratulate him on actually doing some
measurements!

There's a tongue-in-cheek expression - "Don't let the facts ruin a perfectly
good theory" - which we are probably all guilty of at some time or other.
Lots of our debates in this forum are based on opinions rather than facts,
so it's a pleasant contrast to see some real evidence in support of
someone's position!

SteveT

Jon Skeet [C# MVP]

Nov 25, 2007, 8:20:09 AM

True - although it's worth saying that the tests need to be performed
on real-world code in order to prove whether or not they're
significant. Most loops actually do real work, and that's where the
time goes. I would be surprised to see a change like Hilton's making a
significant difference to much genuine code. (It's not impossible, but
I believe it's rare.)

Lew

Nov 25, 2007, 8:38:21 AM

Hilton wrote:
> In the post to which you're replying, I wrote: "I suggested: "for (int i = 0;
> i < const list.Count; i++)""

As many have pointed out, the problem is that list.Count is not constant. It
can change between iterations. Adding "const" to it is therefore a lie, and
breaks the code.

--
Lew

Arne Vajhøj

Nov 25, 2007, 11:08:38 AM

Hilton wrote:
> Did you use an array or a list? Anyway, here are the results on my machine
> and the results are significant performance-wise:
>
> foreach takes 3.7 seconds
> list.Count takes 1.6 seconds
> const list.Count takes 0.88 seconds
>
> So, even if "const" never gets adopted, using a temporary variable
> (especially for simple loops) seems to achieve 2x results. The temporary
> variable is the same as my proposed const and it is exactly what the
> compiler would do anyway. I bet that most for loops loop to a constant and
> that we're all losing a lot of efficiency by using "i < list.Count" in the for
> condition.

1)

for (int loop = 0; loop < LOOP_COUNT; loop++)
{
    for (int i = 0; i < 1024; i++)
    {
        if (list [i] == null) // if (IsNull (i, list))
        {
            x++;
        }
    }
}

is not what you suggested. You suggested:

for (int loop = 0; loop < LOOP_COUNT; loop++)
{
    int n = list.Count;
    for (int i = 0; i < n; i++)
    {
        if (list [i] == null) // if (IsNull (i, list))
        {
            x++;
        }
    }
}

2)

Somehow I think you used .NET 1.1 and not 2.0:

C:\>csc /o+ z.cs
Microsoft (R) Visual C# .NET Compiler version 7.10.6001.4
for Microsoft (R) .NET Framework version 1.1.4322
Copyright (C) Microsoft Corporation 2001-2002. All rights reserved.


C:\>z
2,781
0,656
1,234
0

C:\>csc /o+ z.cs
Microsoft (R) Visual C# 2005 Compiler version 8.00.50727.1433
for Microsoft (R) Windows (R) 2005 Framework version 2.0.50727
Copyright (C) Microsoft Corporation 2001-2005. All rights reserved.


C:\>z
2,344
0,719
1,172
0

the relative gap narrows a bit with a recent CLR.

3)

The relative gap is rather uninteresting.

It is still only 5 nanoseconds per iteration.

For how many apps would that have an impact on total performance?

4)

Changing:

IList list = new ArrayList ();

to:

List<string> list = new List<string> ();

gives:

1,000
0,219
0,297
0

which is still a relative 35% difference, but the absolute gap
has been reduced to 1 nanosecond per iteration.

Using a newer CLR and a newer library is much more important
than tweaking the code manually.

Arne

rossum

Nov 25, 2007, 1:50:27 PM

On Sat, 24 Nov 2007 16:22:46 -0800, "Hilton" <nos...@nospam.com>
wrote:

>Hi,
>
>for (int i = 0; i < list.Count; i++)
>
>has a hidden performance hit; i.e. list.Count gets evaluated each time, so
>we write something like:
>
>int listCount = list.Count;
>for (int i = 0; i < listCount; i++)

You might be able to help the compiler more by adding a const to the
declaration:

const int listCount = list.Count;
for (int i = 0; i < listCount; i++)

rossum

Lew

Nov 25, 2007, 4:23:53 PM

rossum wrote:
> You might be able to help the compiler more by adding a const to the
> declaration:
>
> const int listCount = list.Count;
> for (int i = 0; i < listCount; i++)

Of course, if the list size changes during the loop this will cause trouble.
This is a fine idiom for when you know the size will not change.

--
Lew

Peter Duniho

Nov 25, 2007, 5:03:47 PM

On 2007-11-25 03:14:01 -0800, "Hilton" <nos...@nospam.com> said:

>> What's the point of "const" here if it's not enforced? The only thing I
>> can think of that's worse than adding a rarely needed, rarely beneficial
>> syntax is adding one that actually allows one to lie to the compiler in a
>> way that _breaks_ the code.
>
> What are you talking about? Lying? Breaking the code?

If the compiler does not enforce the "const"-ness of the expression,
then it's possible for the programmer to write "const" even when it's
not. Instant bug, due to an invalid optimization.

I'm not aware of any other use of the keyword "const" where that's
possible, at least not without some explicit "unsafe" code on the part
of the programmer (like casting away the "const" attribute).

>>> :) Anyway, your example would not compile - you changed my suggestion.
>>> I
>>> suggested: "for (int i = 0; i < const list.Count; i++)" and I bet the
>>> compiler writers could add this very very simply.
>>
>> What's the difference? Are you proposing that the only legal variation of
>> the syntax would be "[comparison operator] const"? If so, then I submit
>> that's an even more limited addition (and thus unwarranted) than I'd
>> originally thought you were talking about.
>>
>> If not, then perhaps you could be more clear about what it is exactly
>> you're proposing. So far, you haven't been very specific.
>
> In the post to which you're replying, I wrote: "I suggested: "for (int i = 0;
> i < const list.Count; i++)""

Yes, but you're using a single example to try to explain some more
general rule. What is your more general rule? Both FTM and I have
proposed other examples of code that would conform to _a_ general rule
that includes your example, but you are unsatisfied with those examples.

So what _is_ your general rule that describes what a legal syntax would
be, if not one of the rules we've inferred so far?

Describe the grammar. Providing a single code example is insufficient,
because language specifications rely on grammar descriptions, not code
examples.

>>> C# and other languages are full of syntactic candy.
>>
>> I never said there's never any use for it. But it should add real value,
>> rather than being some sort of gimmick.
>>
>>> e.g. A ? B : C.
>>
>> Well, first of all, lots of languages do okay without that. But more
>> significantly, IMHO, the reason that syntax has survived so many
>> generations of this family of languages is that it's so commonly useful.
>> It _significantly_ abbreviates the code, but in a way that's still quite
>> readable. More important, it provides a syntax to do something that would
>> simply not be practical any other way.
>
> What? Come on Peter...
>
> x = A ? B : C can easily be written with an "if". If you look at the code,
> that is exactly what compilers generally do anyway.

Wrong.

> [I've ignored formatting here]
> if (A) { x = B; } else {x = C; }

The expression is "A ? B : C". Your code is not the expression, it's a
different way of achieving the assignment to the variable "x".

It's true that there is an alternative way to represent "x = A ? B :
C", but a) that's not what was originally being discussed (you've added
the assignment), and b) the new representation isn't actually the
literal equivalent of the original. The original uses a conditional
expression in a single assignment to a variable "x", whereas your
"alternative" uses a conditional expression to choose between two
different code paths assigning to the variable "x".

The difference is subtle and I don't blame you for missing it. But it
does exist, and it is important.

As Jon pointed out, that is just one example of how the expression
could be used. But an if() statement isn't an expression and can't be
used where an expression can, unlike the "?:" operation. While in the
above example, you're correct that you can essentially achieve the same
thing, there are lots of examples that cannot be successfully converted
like that, and even in this case the resulting code is not necessarily
identical.
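
A couple of spots where the grammar wants an expression, so an if()
statement cannot be used in its place (illustrative):

// a field initializer must be an expression; if() cannot appear here
static readonly string Greeting =
    DateTime.Now.Hour < 12 ? "Good morning" : "Good afternoon";

// a single-expression return
static int Clamp (int v)
{
    return v < 0 ? 0 : v;
}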

> [zap]
>> IMHO, your suggestion doesn't even come close to solving the kind of
>> issues that an expression like "A ? B : C" solves.
>
> "A ? B : C" solves no 'issues' - it is pure syntactic candy.

Of course it solves issues. I haven't even said your suggestion
doesn't solve an issue. It just doesn't solve a very important issue,
where as the issue addressed by "A ? B : C" is quite common, and very
important in that there really is no readable, concise equivalent
without such a syntax.

>> But first of all, don't waste time optimizing until you have shown there's
>> a need. Sound familiar?
>
> Well since you don't develop for the Compact Framework, you clearly don't
> need to optimize. (Yes, I really am just kidding here...)
>
> Peter, C# has added so much new stuff in the latter versions that I
> personally feel that C#'s readability has taken a huge nose-dive. As you
> say, that is in the eye of the beer-holder, uhh, I mean beholder. I simply
> suggested one little syntax change. It really isn't such a big deal.

What's not a big deal? IMHO, it'd be pretty dumb to add it to the
language and if it were added I'd say that'd be a pretty big deal.
Fortunately, I doubt those in charge of the language would ever do
something like that.

But is your suggestion itself a big deal? No...I agree, it's not.
Frankly, I have no idea why you've invested so much effort in defending
it.

It's pretty funny, actually. It seems like the people most likely to
refuse to accept any valid commentary regarding their suggestions are
the people who write things like "Thoughts?" as if they are actually
open to commentary. It appears to me that you aren't actually looking
for commentary; you're only interested in someone agreeing with you and
if they don't, well then you just don't have any use for their
contribution.

Well, good luck with that.

Pete

Hilton

Nov 25, 2007, 6:43:31 PM

Jon wrote:

> The changes in C# 3 actually *vastly* improve readability once you're
> used to them. While the new elements such as lambda expressions are
> "strangers" to you then yes, it'll be harder to read. The
> expressiveness of LINQ etc is really great though, IMO. Just wait until
> you've used it for a while. I get frustrated when I write C# 2 these
> days...

That's what I feel like now when I write Java. Sounds corny but C# is a
very clean and neat language. For example "person.setAge
(person.getAge()+1)" is simply person.Age++, that's nice and very readable.
I really haven't got into C#2 or C#3 at all since I'm targeting CF1
devices, which is a pity since I like some (all?) of the new stuff, but some of it looks
C++ish which I found to be 'ugly'. Again, just my initial impressions of
generics etc. As you point out, after using it for a while, I'd probably
change my opinion on it.

Keep well,

Hilton


Hilton

Nov 25, 2007, 7:23:59 PM

Arne,

Arne Vajhøj wrote:
> 1)
>
> for (int loop = 0; loop < LOOP_COUNT; loop++)
> {
> for (int i = 0; i < 1024; i++)
> {
> if (list [i] == null) // if (IsNull (i, list))
> {
> x++;
> }
> }
> }
>
> is not what you suggested. You suggested:
>
> for (int loop = 0; loop < LOOP_COUNT; loop++)
> {
> int n = list.Count;
> for (int i = 0; i < n; i++)
> {
> if (list [i] == null) // if (IsNull (i, list))
> {
> x++;
> }
> }
> }

Good point, thanks for pointing that out and running the numbers.


> 2)
>
> Somehow I think you used .NET 1.1 and not 2.0:
>
> C:\>csc /o+ z.cs
> Microsoft (R) Visual C# .NET Compiler version 7.10.6001.4
> for Microsoft (R) .NET Framework version 1.1.4322
> Copyright (C) Microsoft Corporation 2001-2002. All rights reserved.
>
>
> C:\>z
> 2,781
> 0,656
> 1,234
> 0
>
> C:\>csc /o+ z.cs
> Microsoft (R) Visual C# 2005 Compiler version 8.00.50727.1433
> for Microsoft (R) Windows (R) 2005 Framework version 2.0.50727
> Copyright (C) Microsoft Corporation 2001-2005. All rights reserved.
>
>
> C:\>z
> 2,344
> 0,719
> 1,172
> 0

Very nice that two of them got faster, but the 2nd got slower - interesting,
but of course, it could just be the accuracy of the timing.


> the relative gap narrows a bit with a recent CLR.

That's good to see, I wonder why; i.e. did they speed up method calls or do
some unrolling. The unrolling would cause the code to get larger and might
narrow the gap too (as we see above).


> 3)
>
> The relative gap is rather uninteresting.
>
> It is still only 5 nanoseconds per iteration.
>
> How many apps would that have an impact on total performance ??
>
> 4)
>
> Changing:
>
> IList list = new ArrayList ();
>
> to:
>
> List<string> list = new List<string> ();
>
> gives:
>
> 1,000
> 0,219
> 0,297
> 0

This is interesting. I don't have the later Frameworks installed. If you
could list the IL on these that would be interesting.

Hilton


Hilton

Nov 25, 2007, 7:45:51 PM

Peter wrote:
> If the compiler does not enforce the "const"-ness of the expression, then
> it's possible for the programmer to write "const" even when it's not.
> Instant bug, due to an invalid optimization.

If the compiler doesn't enforce x++ by incrementing x, then is it my bug?

[zap]

>> x = A ? B : C can easily be written with an "if". If you look at the
>> code,
>> that is exactly what compiler generally do anyway.
>
> Wrong.
>
>> [I've ignored formatting here]
>> if (A) { x = B; } else {x = C; }
>
> The expression is "A ? B : C". Your code is not the expression, it's a
> different way of achieving the assignment to the variable "x".
>
> It's true that there is an alternative way to represent "x = A ? B : C",
> but a) that's not what was originally being discussed (you've added the
> assignment), and b) the new representation isn't actually the literal
> equivalent of the original. The original uses a conditional expression in
> a single assignment to a variable "x", where as your "alternative" uses a
> conditional expression to choose between two different code paths
> assigning to the variable "x".
>
> The difference is subtle and I don't blame you for missing it. But it
> does exist, and it is important.

Thanks for your concern, but it really is the same. Using Jon's example, I
expanded a bit to use his code, plus I added the same version using a basic
good ol' "if" statement. You'll be amazed. See how the "if" and the
"non-if" versions are essentially the same? That's an "if" statement right
there, the results go to a 'real' variable, temporary variable, stack (if
any language), but it is still an "if" statement. Yet when I said: "If you
look at the code, that is exactly what compiler generally do anyway.", you
said "Wrong". If you're just trying to prove me wrong all the time, perhaps
you should spend a few minutes of research first. In another thread you
said that Dictionaries would just be a very little bit slower that arrays
and therefore my suggestion was not good. But, did you post any timing, any
code, anything? No. Nothing at all to back up your claims and your
'accusations' of my being wrong.

static void blob ()
{
    int count = 2;

    string plural;
    if (count==1)
    {
        plural = "";
    }
    else
    {
        plural = "s";
    }

    Console.WriteLine ("You have {0} order{1} of 23 order{2}remaining.",
        count, count==1 ? "" : "s", plural);
}

.method private hidebysig static void 'blob'() cil managed
{
// Code size 54 (0x36)
.maxstack 4
.locals init (int32 V_0,
string V_1)
IL_0000: ldc.i4.2
IL_0001: stloc.0
IL_0002: ldloc.0
IL_0003: ldc.i4.1
IL_0004: bne.un.s IL_000e
IL_0006: ldstr ""
IL_000b: stloc.1
IL_000c: br.s IL_0014
IL_000e: ldstr "s"
IL_0013: stloc.1
IL_0014: ldstr "You have {0} order{1} of 23 order{2}remaining."
IL_0019: ldloc.0
IL_001a: box [mscorlib]System.Int32
IL_001f: ldloc.0
IL_0020: ldc.i4.1
IL_0021: beq.s IL_002a
IL_0023: ldstr "s"
IL_0028: br.s IL_002f
IL_002a: ldstr ""
IL_002f: ldloc.1
IL_0030: call void [mscorlib]System.Console::WriteLine(string,
object,
object,
object)

>> It really isn't such a big deal.
>
> What's not a big deal? IMHO, it'd be pretty dumb to add it to the
> language and if it were added I'd say that'd be a pretty big deal.
> Fortunately, I doubt those in charge of the language would ever do
> something like that.
>
> But is your suggestion itself a big deal? No...I agree, it's not.
> Frankly, I have no idea why you've invested so much effort in defending
> it.

I'm not defending it, like I said it is not a big deal. But when you accuse
me of being wrong all the time and mud-sling, that's just not cool.

Hilton


Peter Duniho

Nov 25, 2007, 9:39:59 PM

On 2007-11-25 16:45:51 -0800, "Hilton" <nos...@nospam.com> said:

> Peter wrote:
>> If the compiler does not enforce the "const"-ness of the expression, then
>> it's possible for the programmer to write "const" even when it's not.
>> Instant bug, due to an invalid optimization.
>
> If the compiler doesn't enforce x++ by incrementing x, then it is my bug?

Non sequitur.

> [...]


> The difference is subtle and I don't blame you for missing it. But it
>> does exist, and it is important.
>
> Thanks for your concern, but it really is the same.

No, it's not.

> Using Jon's example, I
> expanded a bit to use his code, plus I added the same version using a basic
> good ol' "if" statement.

Where? What are you talking about?

> You'll be amazed. See how the "if" and the
> "non-if" versions are essentially the same?

No, I don't see any such thing. Did you intend to post something and forget?

> That's an "if" statement right
> there, the results go to a 'real' variable, temporary variable, stack (in
> any language), but it is still an "if" statement. Yet when I said: "If you
> look at the code, that is exactly what compilers generally do anyway.", you
> said "Wrong".

Perhaps I could have put my statement of "Wrong" in a less ambiguous
place. I thought the rest of my post would have made clear where it is
you went wrong, but I can see from your statement ("but it really
is the same") that you just don't get it.

> If you're just trying to prove me wrong all the time, perhaps
> you should spend a few minutes of research first. In another thread you
> said that Dictionaries would just be a very little bit slower that arrays
> and therefore my suggestion was not good. But, did you post any timing, any
> code, anything? No.

Would you like me to? It would be trivial to test.

If I thought there was any chance at all that posting some data would
evoke a retraction from you, I would be more than happy to do your
testing for you. But you've never shown any sign of being able to
acknowledge when you've made a mistake. I'm not interested in wasting
my time on something that you're never going to acknowledge as helpful
anyway.

> Nothing at all to back up your claims and your
> 'accusations' of my being wrong.

The same could be said of your own claims. Why is the onus on me to do
the work, when you are the one making claims that fly in the face of
the known facts? Algorithm order is a very good way of estimating
general performance characteristics, and there's already sufficient
documentation of that to prove your claims wrong without having to run
any actual tests.

All of that information was posted to the thread you're talking about,
and it's quite conclusive. If you want to reject the information, you
need to put some effort into it. I see no reason for me to bother
myself, especially since I have no reason to expect that having
additional data showing you to be wrong would change anything.

> static void blob ()
> {
> int count = 2;
>
> string plural;
> if (count==1)
> {
> plural = "";
> }
> else
> {
> plural = "s";
> }
>
> Console.WriteLine ("You have {0} order{1} of 23 order{2}remaining.",
> count, count==1 ? "" : "s", plural);
> }

I'm not sure what the above code is supposed to prove. I've already
acknowledged in my previous post that in your degenerate case the use
of "?:" can be the same as using an if() statement. The point is that
it _is_ a degenerate case, and there are lots of examples of using that
syntax that _cannot_ be expressed in the same way just using an if()
statement.

[...]


> I'm not defending it, like I said it is not a big deal.

You _are_ defending it. This entire thread is about you defending it.

> But when you accuse
> me of being wrong all the time and mud-sling, that's just not cool.

Why? I can't help it when you're wrong, and I certainly haven't
engaged in "mud-sling". Hint: just saying that you're wrong isn't
mud-slinging.

Pete

Arne Vajhøj

Nov 25, 2007, 9:51:39 PM

Hilton wrote:

> Arne Vajhøj wrote:
>> C:\>csc /o+ z.cs
>> Microsoft (R) Visual C# .NET Compiler version 7.10.6001.4
>> for Microsoft (R) .NET Framework version 1.1.4322
>> Copyright (C) Microsoft Corporation 2001-2002. All rights reserved.
>>
>>
>> C:\>z
>> 2,781
>> 0,656
>> 1,234
>> 0
>>
>> C:\>csc /o+ z.cs
>> Microsoft (R) Visual C# 2005 Compiler version 8.00.50727.1433
>> for Microsoft (R) Windows (R) 2005 Framework version 2.0.50727
>> Copyright (C) Microsoft Corporation 2001-2005. All rights reserved.
>>
>>
>> C:\>z
>> 2,344
>> 0,719
>> 1,172
>> 0
>
> Very nice that two of them got faster, but the 2nd got slower - interesting,
> but of course, it could just accuracy of the timing.

I don't think it is accuracy. I tried with x10 more iterations. Same
picture.

But benchmarks of this type measure the speed of one or two simple
constructs. They are extremely sensitive to small changes in the
optimization. A real program consists of tens of thousands of simple
constructs. An enhanced optimizer may make practically all real programs
faster, but it will almost certainly make some constructs with some
data slower.

I don't think that it says much.

> This is interesting. I don't have the later Framekworks installed.

You should - 1.1 is 4 years old.

Arne

Arne Vajhøj

Nov 25, 2007, 9:52:39 PM

Hilton wrote:
> If you
> could list the IL on these that would be interesting.

.method private hidebysig static void Main(string[] args) cil managed
{
.entrypoint
.custom instance void [mscorlib]System.STAThreadAttribute::.ctor() =
( 01 00 00 00 )
// Code size 459 (0x1cb)
.maxstack 2
.locals init (int32 V_0,
class [mscorlib]System.Collections.Generic.List`1<string> V_1,
int32 V_2,
int32 V_3,
valuetype [mscorlib]System.DateTime V_4,
int32 V_5,
object V_6,
valuetype [mscorlib]System.DateTime V_7,
valuetype [mscorlib]System.DateTime V_8,
valuetype [mscorlib]System.DateTime V_9,
valuetype [mscorlib]System.DateTime V_10,
valuetype [mscorlib]System.DateTime V_11,
bool V_12,
valuetype
[mscorlib]System.Collections.Generic.List`1/Enumerator<string> V_13,
valuetype [mscorlib]System.TimeSpan V_14,
float64 V_15)
IL_0000: nop
IL_0001: ldc.i4 0x186a0
IL_0006: stloc.0
IL_0007: newobj instance void class
[mscorlib]System.Collections.Generic.List`1<string>::.ctor()
IL_000c: stloc.1
IL_000d: ldc.i4.0
IL_000e: stloc.2
IL_000f: br.s IL_0023
IL_0011: nop
IL_0012: ldloc.1
IL_0013: ldsfld string [mscorlib]System.String::Empty
IL_0018: callvirt instance void class
[mscorlib]System.Collections.Generic.List`1<string>::Add(!0)
IL_001d: nop
IL_001e: nop
IL_001f: ldloc.2
IL_0020: ldc.i4.1
IL_0021: add
IL_0022: stloc.2
IL_0023: ldloc.2
IL_0024: ldc.i4 0x400
IL_0029: clt
IL_002b: stloc.s V_12
IL_002d: ldloc.s V_12
IL_002f: brtrue.s IL_0011
IL_0031: ldc.i4.0
IL_0032: stloc.3
IL_0033: call valuetype [mscorlib]System.DateTime
[mscorlib]System.DateTime::get_Now()
IL_0038: stloc.s V_4
IL_003a: ldc.i4.0
IL_003b: stloc.s V_5
IL_003d: br.s IL_0090
IL_003f: nop
IL_0040: nop
IL_0041: ldloc.1
IL_0042: callvirt instance valuetype
[mscorlib]System.Collections.Generic.List`1/Enumerator<!0> class
[mscorlib]System.Collections.Generic.List`1<string>::GetEnumerator()
IL_0047: stloc.s V_13
.try
{
IL_0049: br.s IL_006a
IL_004b: ldloca.s V_13
IL_004d: call instance !0 valuetype
[mscorlib]System.Collections.Generic.List`1/Enumerator<string>::get_Current()
IL_0052: stloc.s V_6
IL_0054: nop
IL_0055: ldloc.s V_6
IL_0057: ldnull
IL_0058: ceq
IL_005a: ldc.i4.0
IL_005b: ceq
IL_005d: stloc.s V_12
IL_005f: ldloc.s V_12
IL_0061: brtrue.s IL_0069
IL_0063: nop
IL_0064: ldloc.3
IL_0065: ldc.i4.1
IL_0066: add
IL_0067: stloc.3
IL_0068: nop
IL_0069: nop
IL_006a: ldloca.s V_13
IL_006c: call instance bool valuetype
[mscorlib]System.Collections.Generic.List`1/Enumerator<string>::MoveNext()
IL_0071: stloc.s V_12
IL_0073: ldloc.s V_12
IL_0075: brtrue.s IL_004b
IL_0077: leave.s IL_0088
} // end .try
finally
{
IL_0079: ldloca.s V_13
IL_007b: constrained. valuetype
[mscorlib]System.Collections.Generic.List`1/Enumerator<string>
IL_0081: callvirt instance void
[mscorlib]System.IDisposable::Dispose()
IL_0086: nop
IL_0087: endfinally
} // end handler
IL_0088: nop
IL_0089: nop
IL_008a: ldloc.s V_5
IL_008c: ldc.i4.1
IL_008d: add
IL_008e: stloc.s V_5
IL_0090: ldloc.s V_5
IL_0092: ldloc.0
IL_0093: clt
IL_0095: stloc.s V_12
IL_0097: ldloc.s V_12
IL_0099: brtrue.s IL_003f
IL_009b: call valuetype [mscorlib]System.DateTime
[mscorlib]System.DateTime::get_Now()
IL_00a0: stloc.s V_7
IL_00a2: call valuetype [mscorlib]System.DateTime
[mscorlib]System.DateTime::get_Now()
IL_00a7: stloc.s V_8
IL_00a9: ldc.i4.0
IL_00aa: stloc.s V_5
IL_00ac: br.s IL_00e7
IL_00ae: nop
IL_00af: ldc.i4.0
IL_00b0: stloc.2
IL_00b1: br.s IL_00d2
IL_00b3: nop
IL_00b4: ldloc.1
IL_00b5: ldloc.2
IL_00b6: callvirt instance !0 class
[mscorlib]System.Collections.Generic.List`1<string>::get_Item(int32)
IL_00bb: ldnull
IL_00bc: ceq
IL_00be: ldc.i4.0
IL_00bf: ceq
IL_00c1: stloc.s V_12
IL_00c3: ldloc.s V_12
IL_00c5: brtrue.s IL_00cd
IL_00c7: nop
IL_00c8: ldloc.3
IL_00c9: ldc.i4.1
IL_00ca: add
IL_00cb: stloc.3
IL_00cc: nop
IL_00cd: nop
IL_00ce: ldloc.2
IL_00cf: ldc.i4.1
IL_00d0: add
IL_00d1: stloc.2
IL_00d2: ldloc.2
IL_00d3: ldc.i4 0x400
IL_00d8: clt
IL_00da: stloc.s V_12
IL_00dc: ldloc.s V_12
IL_00de: brtrue.s IL_00b3
IL_00e0: nop
IL_00e1: ldloc.s V_5
IL_00e3: ldc.i4.1
IL_00e4: add
IL_00e5: stloc.s V_5
IL_00e7: ldloc.s V_5
IL_00e9: ldloc.0
IL_00ea: clt
IL_00ec: stloc.s V_12
IL_00ee: ldloc.s V_12
IL_00f0: brtrue.s IL_00ae
IL_00f2: call valuetype [mscorlib]System.DateTime
[mscorlib]System.DateTime::get_Now()
IL_00f7: stloc.s V_9
IL_00f9: call valuetype [mscorlib]System.DateTime
[mscorlib]System.DateTime::get_Now()
IL_00fe: stloc.s V_10
IL_0100: ldc.i4.0
IL_0101: stloc.s V_5
IL_0103: br.s IL_013f
IL_0105: nop
IL_0106: ldc.i4.0
IL_0107: stloc.2
IL_0108: br.s IL_0129
IL_010a: nop
IL_010b: ldloc.1
IL_010c: ldloc.2
IL_010d: callvirt instance !0 class
[mscorlib]System.Collections.Generic.List`1<string>::get_Item(int32)
IL_0112: ldnull
IL_0113: ceq
IL_0115: ldc.i4.0
IL_0116: ceq
IL_0118: stloc.s V_12
IL_011a: ldloc.s V_12
IL_011c: brtrue.s IL_0124
IL_011e: nop
IL_011f: ldloc.3
IL_0120: ldc.i4.1
IL_0121: add
IL_0122: stloc.3
IL_0123: nop
IL_0124: nop
IL_0125: ldloc.2
IL_0126: ldc.i4.1
IL_0127: add
IL_0128: stloc.2
IL_0129: ldloc.2
IL_012a: ldloc.1
IL_012b: callvirt instance int32 class
[mscorlib]System.Collections.Generic.List`1<string>::get_Count()
IL_0130: clt
IL_0132: stloc.s V_12
IL_0134: ldloc.s V_12
IL_0136: brtrue.s IL_010a
IL_0138: nop
IL_0139: ldloc.s V_5
IL_013b: ldc.i4.1
IL_013c: add
IL_013d: stloc.s V_5
IL_013f: ldloc.s V_5
IL_0141: ldloc.0
IL_0142: clt
IL_0144: stloc.s V_12
IL_0146: ldloc.s V_12
IL_0148: brtrue.s IL_0105
IL_014a: call valuetype [mscorlib]System.DateTime
[mscorlib]System.DateTime::get_Now()
IL_014f: stloc.s V_11
IL_0151: ldloc.s V_7
IL_0153: ldloc.s V_4
IL_0155: call valuetype [mscorlib]System.TimeSpan
[mscorlib]System.DateTime::op_Subtraction(valuetype
[mscorlib]System.DateTime,

valuetype [mscorlib]System.DateTime)
IL_015a: stloc.s V_14
IL_015c: ldloca.s V_14
IL_015e: call instance float64
[mscorlib]System.TimeSpan::get_TotalSeconds()
IL_0163: stloc.s V_15
IL_0165: ldloca.s V_15
IL_0167: ldstr "F3"
IL_016c: call instance string
[mscorlib]System.Double::ToString(string)
IL_0171: call void [mscorlib]System.Console::WriteLine(string)
IL_0176: nop
IL_0177: ldloc.s V_9
IL_0179: ldloc.s V_8
IL_017b: call valuetype [mscorlib]System.TimeSpan
[mscorlib]System.DateTime::op_Subtraction(valuetype
[mscorlib]System.DateTime,

valuetype [mscorlib]System.DateTime)
IL_0180: stloc.s V_14
IL_0182: ldloca.s V_14
IL_0184: call instance float64
[mscorlib]System.TimeSpan::get_TotalSeconds()
IL_0189: stloc.s V_15
IL_018b: ldloca.s V_15
IL_018d: ldstr "F3"
IL_0192: call instance string
[mscorlib]System.Double::ToString(string)
IL_0197: call void [mscorlib]System.Console::WriteLine(string)
IL_019c: nop
IL_019d: ldloc.s V_11
IL_019f: ldloc.s V_10
IL_01a1: call valuetype [mscorlib]System.TimeSpan
[mscorlib]System.DateTime::op_Subtraction(valuetype
[mscorlib]System.DateTime,

valuetype [mscorlib]System.DateTime)
IL_01a6: stloc.s V_14
IL_01a8: ldloca.s V_14
IL_01aa: call instance float64
[mscorlib]System.TimeSpan::get_TotalSeconds()
IL_01af: stloc.s V_15
IL_01b1: ldloca.s V_15
IL_01b3: ldstr "F3"
IL_01b8: call instance string
[mscorlib]System.Double::ToString(string)
IL_01bd: call void [mscorlib]System.Console::WriteLine(string)
IL_01c2: nop
IL_01c3: ldloc.3
IL_01c4: call void [mscorlib]System.Console::WriteLine(int32)
IL_01c9: nop
IL_01ca: ret
} // end of method Class1::Main

Family Tree Mike

Nov 26, 2007, 6:58:00 AM

"Peter Duniho" wrote:

>
> Yes, but you're using a single example to try to explain some more
> general rule. What is your more general rule? Both FTM and I have
> proposed other examples of code that would conform to _a_ general rule
> that includes your example, but you are unsatisfied with those examples.
>
> So what _is_ your general rule that describes what a legal syntax would
> be, if not one of the rules we've inferred so far?
>
> Describe the grammar. Providing a single code example is insufficient,
> because language specifications rely on grammar descriptions, not code
> examples.
>
>

I stepped away for a little bit, but the discussion looks great. What
Peter wrote is what I was trying to get at. With only one example, the
impact on the language cannot be known. You have to have multiple
people interpret how they would use the change.

Hilton

unread,
Nov 26, 2007, 6:11:16 PM11/26/07
to
Peter Duniho wrote:
>> In another thread you
>> said that Dictionaries would just be a very little bit slower than arrays
>> and therefore my suggestion was not good. But, did you post any timing,
>> any
>> code, anything? No.
>
> Would you like me to? It would be trivial to test.

Yes please. Putting aside our disagreements, I really would be interested
to see how much slower (if any) using a dictionary is than "array[x]++".

Thanks,

Hilton


Peter Duniho

unread,
Nov 26, 2007, 7:35:49 PM11/26/07
to

I'm sorry. You appear to have missed the rest of my post. Granted,
you're a big fan of your "zap", trimming anything that you'd rather not
respond to. But I already explained my position. Frankly, nothing
you've ever written in this newsgroup gives me any confidence there
will be a point in putting together such a test.

Like I said, if I thought there was a good chance you'd admit your
original mistake when presented with the numbers, I'd be happy to
invest the time (and I have in fact done so in other discussions
regarding other topics). But there's already been plenty of other
kinds of evidence suggesting a mistake in your original claim that the
dictionary solution is "inefficient", with nary a peep from you
acknowledging it.

So, are you saying that you're prepared to admit you were wrong? If
so, why haven't you acknowledged the other information that's already
been provided? What motivation do I have for writing and running the
code, given the ample evidence that already exists in support of my
statements?

Pete

Hilton

unread,
Nov 27, 2007, 3:37:44 AM11/27/07
to
Peter Duniho wrote:
> On 2007-11-26 15:11:16 -0800, "Hilton" <nos...@nospam.com> said:
>
>> Peter Duniho wrote:
>>>> In another thread you
>>>> said that Dictionaries would just be a very little bit slower than
>>>> arrays
>>>> and therefore my suggestion was not good. But, did you post any
>>>> timing,
>>>> any code, anything? No.
>>>
>>> Would you like me to? It would be trivial to test.
>>
>> Yes please. Putting aside our disagreements, I really would be
>> interested
>> to see how much slower (if any) using a dictionary is than "array[x]++".
>
> I'm sorry. You appear to have missed the rest of my post. Granted,
> you're a big fan of your "zap", trimming anything that you'd rather not
> respond to. But I already explained my position. Frankly, nothing you've
> ever written in this newsgroup gives me any confidence there will be a
> point in putting together such a test.

Frankly, I don't care what you think about me or what I write, I honestly
truly don't; I care about how much slower (if any) using a dictionary is
than "array[x]++". Because you have yet to produce any code or timings to
back up what you've claimed in this thread (I and others have), other people
would tell you to "put up or shut up", but I'd rather simply ignore your
ranting posts and poll the other readers here to see if anyone has a few
minutes to write a "dictionary versus array" time trial. [I would, but I
deliberately need to keep (only) earlier versions of the .NET Framework on my
work machine.]


> Like I said, if I thought there was a good chance you'd admit your
> original mistake when presented with the numbers, I'd be happy to invest
> the time (and I have in fact done so in other discussions regarding other
> topics). But there's already been plenty of other kinds of evidence
> suggesting a mistake in your original claim that the dictionary solution
> is "inefficient", with nary a peep from you acknowledging it.
>
> So, are you saying that you're prepared to admit you were wrong?

Yes, I always am when given cold, hard facts. The nice thing about being
proven wrong is that you learn something.

Hilton


Peter Duniho

unread,
Nov 27, 2007, 4:23:01 AM11/27/07
to
On 2007-11-27 00:37:44 -0800, "Hilton" <nos...@nospam.com> said:

>> I'm sorry. You appear to have missed the rest of my post. Granted,
>> you're a big fan of your "zap", trimming anything that you'd rather not
>> respond to. But I already explained my position. Frankly, nothing you've
>> ever written in this newsgroup gives me any confidence there will be a
>> point in putting together such a test.
>
> Frankly, I don't care what you think about me or what I write, I honestly
> truly don't; I care about how much slower (if any) using a dictionary is
> than "array[x]++".

For someone who cares, you sure are a lot more willing to tell people
they are wrong (i.e. those who proposed a dictionary solution are
suggesting an "inefficient" solution) than to actually provide any code
to back up your claim.

I find it ironic that you should take me to task for making what you
call unfounded claims of you being wrong without providing code to
prove it when you're the one who initially claimed the other replies
were wrong in the first place. Just as it's not clear what grammatical
rules you intend to apply to this "const" keyword use in a for()
statement, it's not really clear what rules you're following under
which I am somehow obligated to provide your test code, while you are
not.

My only objection was your dismissal of the other suggestions as
"inefficient". The only thing I suggested you were wrong about was
that you were telling people they were wrong. And yet, somehow, the
obligation to "put up or shut up" has somehow fallen on me exclusively,
according to you.

As for whether you care what I think, you miss my point. It's not
important whether you care what I think about you. I'm simply
explaining that there's been nothing to suggest that any effort on my
part to provide more detailed information than what has already been
provided will change anything about your attitude.

Just as in this thread you've avoided providing any actual details
regarding the grammar rules for your new use of "const", in the other
thread you've carefully avoided even revisiting the question about
whether the dictionary solution is "inefficient". That makes me
suspect that you already realize the error, and that even if it were
shown that the dictionary solution can in fact be efficient, you would
either just ignore that or come up with some reason to dismiss the
demonstration as irrelevant.

Do I care whether this information matters to you? No. I am simply
offering it so that you understand better why I am not strongly
motivated to provide the test code that you yourself ought to be
providing anyway. If it's not useful information to you, that's fine.

> Because you have yet to produce any code or timings to
> back up what you've claimed in this thread (I and others have)

Um. You are confusing your threads. This thread is not the
array/dictionary thread. Why would I post code in this thread? I
haven't made any claims that require the posting of code to demonstrate.

> , other people
> would tell you to "put up or shut up", but I'd rather simply ignore your
> ranting posts and poll the other readers here to see if anyone has a few
> minutes to write a "dictionary versus array" time trial. [I would, but I
> deliberately need to keep (only) earlier versions of the .NET Framework on my
> work machine.]

If you have no suitable machine on which to test a version of .NET
Framework that supports the Dictionary<T> class, in what way are you
even qualified to make a statement regarding the efficiency of the
Dictionary<T> class?

>> So, are you saying that you're prepared to admit you were wrong?
>
> Yes, I always am when given cold, hard facts.

I see. Other than the "cold, hard facts" that have already been
provided, you mean. Is there something about things like...

 * The algorithm order for Dictionary<T>, being nearly equivalent to
an array lookup, strongly suggests that the performance of the two
solutions is basically equivalent as well. Is it that you don't believe
in order analysis of algorithms? Against your religion? What?

* The "?:" operator (which frankly is a red herring here anyway,
but I don't mind humoring your assertion that it's relevant) provides a
syntax that is in many cases not possible to duplicate with some
reasonably convenient alternative.

...that you find to not be "cold, hard facts"?

And with respect to the second point, which I've limited only to
commenting about the "convenience" of an alternative: while I haven't
put much thought into it, I suspect that there are some scenarios that
simply can't produce the same compiled code. But regardless, for sure
there are scenarios that can't produce the same compiled code without a
much greater amount of extra code (as opposed to the trivial one extra
line of code required to deal with your "const" scenario).

Even a simple example like this one:

SomeMethod(fFlag ? 10 : 20);

would require at a minimum five new lines of code (nine, if you like
braces around all blocks as I do), and that assumes that the compiler
will be smart enough to optimize out the local variable required (it
_should_ be, but not all compilers are created equally). It's trivial
to write even more complex expressions that cause the needed
alternative code to balloon to enormous proportions. (And yes, I'm
insisting on no more than one statement per line of code...I like my
code readable, thankyouverymuch).
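
Spelled out with braces, replacing that single statement takes
something like this (SomeMethod and fFlag being the same hypothetical
names as in the example above):

int arg;     // the extra local the compiler should optimize away
if (fFlag)
{
    arg = 10;
}
else
{
    arg = 20;
}
SomeMethod(arg);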

Give me some reason to believe that posting any actual timing
information for the array/dictionary thread will make a genuine
difference in your attitude, and I just might take some time to do it.
Otherwise, I just don't see the point.

Pete

Jon Skeet [C# MVP]

unread,
Nov 27, 2007, 5:02:04 AM11/27/07
to
On Nov 27, 8:37 am, "Hilton" <nos...@nospam.com> wrote:
> Frankly, I don't care what you think about me or what I write, I honestly
> truly don't; I care about how much slower (if any) using a dictionary is
> than "array[x]++". Because you have yet to produce any code or timings to
> back up what you've claimed in this thread (I and others have), other people
> would tell you to "put up or shut up", but I'd rather simply ignore your
> ranting posts and poll the other readers here to see if anyone has a few
> minutes to write a "dictionary versus array" time trial. [I would, but I
> deliberately need to keep (only) earlier versions of the .NET Framework on my
> work machine.]

Okay, here's a fairly quick bit of code. I don't like it because it
uses mutable structs for your array-based version, but that's what you
were suggesting. I haven't optimised the dictionary code at all.

The program takes three parameters:
1) The number of iterations to use
2) The range of values (e.g. 100 means all values are in the range
0-99)
3) The number of samples

Code is at the bottom, here are the results:
test 100000 100 100 // Range=sample size
ArrayCountSorter: 4207
DictionaryCountSorter: 4020

test 100000 100 1000 // 10 times as many samples as range
ArrayCountSorter: 8625
DictionaryCountSorter: 15900

test 10000 100 10000 // 100 times as many samples as range
ArrayCountSorter: 5053
DictionaryCountSorter: 11486

test 10000 10 100000 // 10,000 times as many samples as range
ArrayCountSorter: 45711
DictionaryCountSorter: 109260

test 100000 1000 100 // 1/10 as many samples as range
ArrayCountSorter: 36110
DictionaryCountSorter: 4999

test 10000 10000 100 // 1/100 as many samples as range
ArrayCountSorter: 50147
DictionaryCountSorter: 521

I've conducted more tests, but here's the gist of the results:
1) When the range ~= sample size, the two algorithms are roughly
equal, with the dictionary approach just about winning
2) As the sample size increases without changing the range, the array
approach is better than the dictionary approach, but by *about* a
factor of 2 (increasing slightly, but not dramatically) as the
proportion changes
3) As the range size increases without changing the sample size, the
dictionary approach is *much, much* better.

Given a choice between "twice as fast in certain situations" and
"scales well and is never *very* slow" I know which I'd say is a
better *general* option (given no further information).

Oh, and the dictionary approach has the added benefit of being generic
across types, so you could use it for string-based samples etc with
*no* code changes :)


Here's the code:

using System;
using System.Collections.Generic;
using System.Diagnostics;

// Mutable struct for the array-based version. (Mutable structs are
// generally bad news, but this is what was being proposed.)
struct Pair
{
    public int Key;
    public int Value;

    public Pair(int key)
    {
        Key = key;
        Value = 0;
    }
}

public class Test
{
    static void Main(string[] args)
    {
        int iterations = int.Parse(args[0]);
        int range = int.Parse(args[1]);
        int samples = int.Parse(args[2]);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            ArrayCountSorter sorter = new ArrayCountSorter(range);
            sorter.CountAndSort(CreateData(range, samples));
        }
        sw.Stop();
        Console.WriteLine("ArrayCountSorter: {0}", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            DictionaryCountSorter<int> sorter =
                new DictionaryCountSorter<int>();
            sorter.CountAndSort(CreateData(range, samples));
        }
        sw.Stop();
        Console.WriteLine("DictionaryCountSorter: {0}",
            sw.ElapsedMilliseconds);
    }

    static IEnumerable<int> CreateData(int range, int samples)
    {
        // Always use the same seed - always get the same data
        Random rng = new Random(0);
        for (int i = 0; i < samples; i++)
        {
            yield return rng.Next(range);
        }
    }
}

class ArrayCountSorter
{
    int range;

    public ArrayCountSorter(int range)
    {
        this.range = range;
    }

    public IEnumerable<Pair> CountAndSort(IEnumerable<int> input)
    {
        // One counter slot per possible value, whether or not it
        // actually occurs in the input
        Pair[] values = new Pair[range];
        for (int i = 0; i < range; i++)
        {
            values[i] = new Pair(i);
        }
        foreach (int datum in input)
        {
            values[datum].Value++;
        }
        // Sort by descending count
        Array.Sort(values, delegate(Pair p1, Pair p2)
            { return p2.Value.CompareTo(p1.Value); });
        return values;
    }
}

class DictionaryCountSorter<T>
{
    public IEnumerable<KeyValuePair<T, int>> CountAndSort(IEnumerable<T> input)
    {
        // Entries exist only for values actually seen in the input
        Dictionary<T, int> histogram = new Dictionary<T, int>();
        foreach (T datum in input)
        {
            int count;
            histogram.TryGetValue(datum, out count);
            histogram[datum] = count + 1;
        }
        List<KeyValuePair<T, int>> list =
            new List<KeyValuePair<T, int>>(histogram);
        // Sort by descending count
        list.Sort(delegate(KeyValuePair<T, int> p1, KeyValuePair<T, int> p2)
            { return p2.Value.CompareTo(p1.Value); });
        return list;
    }
}
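
As a quick illustration of that genericity point, the same sorter
counts strings with no code changes (a hypothetical usage snippet, not
part of the test program):

DictionaryCountSorter<string> wordSorter =
    new DictionaryCountSorter<string>();
string[] words = { "a", "b", "a", "c", "a" };
foreach (KeyValuePair<string, int> pair in wordSorter.CountAndSort(words))
{
    Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
}
// Prints "a: 3" first; b and c follow with a count of 1 each (the
// relative order of equal counts is not guaranteed by the sort).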

Jon

Hilton

unread,
Nov 27, 2007, 7:29:45 PM11/27/07
to
Jon,

Fantastic, thanks, very interesting. When I have some time here I'll
take a closer look at the code, but on a first pass it looks like a tie
for small sample sizes, growing to about 2x in the array's favor for
larger sample sizes. With very few samples relative to the range, the
sorting takes the majority of the time and the Dictionary wins.

Jon, assuming you were going to write it using an array, how would you store
the info given that you don't like the struct code below?

Thanks, and thanks again for the code, numbers, and time to put this
together.

Hilton


"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:6bbeb1a2-2151-43c0...@s36g2000prg.googlegroups.com...

Jon Skeet [C# MVP]

unread,
Nov 27, 2007, 7:38:15 PM11/27/07
to
Hilton <nos...@nospam.com> wrote:
> Fantastic, thanks, very interesting. When I have some time here I'll take a
> closer look at the code, but as a first pass it looks like a tie for few
> samples going to about 2x for a larger sample size. With a very few
> samples, the sorting then takes the majority of the time and the Dictionary
> wins.
>
> Jon, assuming you were going to write it using an array, how would you store
> the info given that you don't like the struct code below?

I suspect I'd use two arrays instead, one for the keys and one for the
values. Or I'd try a mutable reference type. Or use KeyValuePair and
create a new one each time, copying the key.

Mutable structs are bad news. They often behave in ways which are
logical but unexpected.
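
For instance, the mutable-reference-type option might look like this
(a sketch with invented names, not code from the thread):

class Counter
{
    public readonly int Key;
    public int Value;   // safe to mutate: Counter is a class, so
                        // values[datum].Value++ updates one shared
                        // object - there is no struct copy to lose

    public Counter(int key)
    {
        Key = key;
    }
}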

Peter Duniho

unread,
Nov 27, 2007, 7:57:49 PM11/27/07
to
On 2007-11-27 02:02:04 -0800, "Jon Skeet [C# MVP]" <sk...@pobox.com> said:

> [...]


> I've conducted more tests, but here's the gist of the results:
> 1) When the range ~= sample size, the two algorithms are roughly
> equal, with the dictionary approach just about winning
> 2) As the sample size increases without changing the range, the array
> approach is better than the dictionary approach, but by *about* a
> factor of 2 (increasing slightly, but not dramatically) as the
> proportion changes
> 3) As the range size increases without changing the sample size, the
> dictionary approach is *much, much* better.

For what it's worth, while my own tests have produced similar results
(sorry, yes...I actually have done the tests, and while the
mature thing to do might have been to go ahead and post the results,
I'm bugged enough by the issues I've described that I just don't feel
like it at the moment), I've got some observations (including two
caveats, listed first):

1) The array implementation can be sped up quite a lot when the
input range is sparsely populated by creating a new list of the counted
values that include only those counts greater than 0. Without doing
that, the cost of sorting all those values that weren't even in the
input data create a huge liability as compared to the dictionary
solution.

The relative outcomes are still the same, but the dictionary solution
only winds up enjoying a smaller advantage (for a 1:100 range:samples
ratio, it's more like 10X instead of 100X).

Of course, you have to know in advance this is going to be an issue if
you're going to make a design decision based on this. In my own tests,
I just always did the copy before sorting; when the data isn't sparse,
it still doesn't add much to the total time cost, and when it is it
makes a very dramatic difference in performance.
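
In code, the pre-sort copy amounts to something like this (a sketch
written against the Pair struct from Jon's program):

// Keep only the counts actually seen; sorting a mostly-zero array of
// size 'range' is what costs so much when the input is sparse.
List<Pair> nonZero = new List<Pair>();
foreach (Pair p in values)
{
    if (p.Value > 0)
    {
        nonZero.Add(p);
    }
}
nonZero.Sort(delegate(Pair p1, Pair p2)
    { return p2.Value.CompareTo(p1.Value); });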

2) I also found that when running multiple tests in the same
process, subsequent tests seem to enjoy some benefit from the execution
of the first test. That is, the first test is always slowed down
relative to the time it would take had it been performed later. This
accounted for as much as a 5-10% increase in time cost for a test.

In my own tests, I accounted for that by running a throw-away test to
start with, and of course the difference is small enough that it
wouldn't affect relative comparisons that were significant (I don't
consider a 10% time difference a significant enough performance
difference to qualify as sole justification for a change in design).
But it does mean that if the same effect happens on your computer, with
the array implementation always being run first it's always a bit
handicapped.
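
The throw-away run itself is trivial (a sketch reusing the types from
Jon's program):

// Warm-up pass: absorbs JIT compilation and other first-run costs so
// the first timed test isn't handicapped; its timing is discarded.
new ArrayCountSorter(range).CountAndSort(CreateData(range, samples));
new DictionaryCountSorter<int>().CountAndSort(CreateData(range, samples));
// ...then start the Stopwatch and run the timed loops as before.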

3) I was surprised to find that using an enumerator to iterate
through the sample data was actually faster than an explicit for() loop
(even one that uses a constant expression for the loop terminator :) )

4) In the dictionary solution, it is faster to call Contains() and
then retrieve and reassign the value, than to call TryGetValue(). As
with the iteration difference, I have not bothered to look closely at
the .NET implementation to better understand why the difference exists.
But it does (my suspicion is that TryGetValue() basically calls
Contains(), and so calling that method simply adds an extra call to the
loop).
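
Concretely, the two inner-loop variants being compared are these (a
sketch; note that Dictionary<TKey,TValue> actually spells the
membership test ContainsKey, with histogram and datum as in Jon's
code):

// Variant A: TryGetValue - one lookup; a missing key just leaves
// count at its default value of 0.
int count;
histogram.TryGetValue(datum, out count);
histogram[datum] = count + 1;

// Variant B: membership test, then read and write via the indexer -
// the entry is located again on each indexer access.
if (histogram.ContainsKey(datum))
{
    histogram[datum] = histogram[datum] + 1;
}
else
{
    histogram[datum] = 1;
}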

Anyway, I appreciate that there's at least one person around here calm
and rational enough to just provide the numbers, regardless of whether
they will be heeded or not. :) And of course, if Hilton promises to
stop ducking the questions and respond favorably to the evidence he
claims to require, I will go ahead and post my own results.

Finally, if nothing else I find that the topic of the dictionary versus
array implementation _strongly_ reinforces the point about not engaging
in premature optimization.

In even such a simple scenario, I came up with 11 different
implementations of the two algorithms, each with varying performance.
Two implementations were awful (sorting a large range of zeros was very
costly, as was using exceptions to deal with missing keys in a
dictionary), two were not terrible but still enough slower that you'd
have to have a really good reason to choose them (those involved using
a class to contain data in the dictionary, but the performance cost of
the memory management for that outweighed the benefit of not having to
look up a key twice when iterating through the input data), and the
remaining seven varied in ways similar to those you've reported here.

A person would have to do a lot of research, and have a _very_ solid
knowledge of what input data their algorithm was going to be processing
before they could make any sort of informed decision regarding
algorithm choice based on performance goals. You would have to know
with absolute certainty that the data fell into the narrow scenario
that favors the array implementation in order to choose that
implementation.

Any statement regarding which algorithm is better based on performance
is clearly premature absent that research. The data clearly demonstrate
that the array implementation is faster only in a very specific
scenario, and even then not by all that much, while the dictionary
implementation is equal or better in the majority of cases; and when it
is noticeably better, it is better by a huge factor (a much greater
difference than when it's slower).

Pete

Jon Skeet [C# MVP]

unread,
Nov 27, 2007, 8:23:51 PM11/27/07
to
Peter Duniho <NpOeS...@NnOwSlPiAnMk.com> wrote:

<snip>

> 4) In the dictionary solution, it is faster to call Contains() and
> then retrieve and reassign the value, than to call TryGetValue(). As
> with the iteration difference, I have not bothered to look closely at
> the .NET implementation to better understand why the difference exists.
> But it does (my suspicion is that TryGetValue() basically calls
> Contains(), and so calling that method simply adds an extra call to the
> loop).

Really? Wow - that's a real surprise. I must investigate that further.
(Reflector shows that TryGetValue doesn't call Contains, but both call
FindEntry - so calling Contains and then the indexer to read will end
up finding the entry twice, but TryGetValue only finds the entry once.)

I'll look at it more tomorrow - it's time for bed now.

> Anyway, I appreciate that there's at least one person around here calm
> and rational enough to just provide the numbers, regardless of whether
> they will be heeded or not. :)

You call me rational now... see if you still do after reading this:

http://msmvps.com/blogs/jon.skeet/archive/2007/11/28/don-t-call-us-we-ll-call-you-push-enumerators.aspx

I blame Marc Gravell.

<snip>

Peter Duniho

unread,
Nov 27, 2007, 9:12:34 PM11/27/07
to
On 2007-11-27 17:23:51 -0800, Jon Skeet [C# MVP] <sk...@pobox.com> said:

> Really? Wow - that's a real surprise. I must investigate that further.
> (Reflector shows that TryGetValue doesn't call Contains, but both call
> FindEntry - so calling Contains and then the indexer to read will end
> up finding the entry twice, but TryGetValue only finds the entry once.)

I was surprised too. The reason I even compared the two is that I had
assumed that TryGetValue would do the check for containment and
retrieval as a single operation. If anything, I expected it to be
faster. But in my tests it's not.

I would not at all be surprised if looking at the code shows that
TryGetValue does indeed appear to have more efficient code (ie
retrieving at the same time as checking for the key's existence). But
_something_ is going on to slow it down. If nothing else, it's yet
another illustration of how optimizing code is best left for when
there's an actual problem to be solved, and when some real world data
can be obtained regarding actual performance.

> I'll look at it more tomorrow - it's time for bed now.
>
>> Anyway, I appreciate that there's at least one person around here calm
>> and rational enough to just provide the numbers, regardless of whether
>> they will be heeded or not. :)
>
> You call me rational now... see if you still do after reading this:
>
> http://msmvps.com/blogs/jon.skeet/archive/2007/11/28/don-t-call-us-we-ll-call-you-push-enumerators.aspx

Well, even the most rational among us need to escape once in a while. :)

For what it's worth "push" enumeration isn't unheard of. I would
consider any sort of enumeration involving a callback to be an example
of that, of which there are examples in the Windows API and which I've
used in other contexts as well (I even used it in an image processing
class I wrote once).

Still, I don't know enough about LINQ to really "get" what your
article is all about, but if you think the design in your article is
crazy, who am I to argue? :)

> I blame Marc Gravell.

Sounds good to me. Seems like we have precious few things to blame
on him, so why not? I'm sure he can handle the load. :)

Pete

Jon Skeet [C# MVP]

unread,
Nov 28, 2007, 2:12:50 PM11/28/07
to
Peter Duniho <NpOeS...@NnOwSlPiAnMk.com> wrote:
> > Really? Wow - that's a real surprise. I must investigate that further.
> > (Reflector shows that TryGetValue doesn't call Contains, but both call
> > FindEntry - so calling Contains and then the indexer to read will end
> > up finding the entry twice, but TryGetValue only finds the entry once.)
>
> I was surprised too. The reason I even compared the two is that I had
> assumed that TryGetValue would do the check for containment and
> retrieval as a single operation. If anything, I expected it to be
> faster. But in my tests it's not.

I'll need to test it myself - it'll be most odd if your tests show it
to be one way and mine show something else :(

> For what it's worth "push" enumeration isn't unheard of. I would
> consider any sort of enumeration involving a callback to be an example
> of that, of which there are examples in the Windows API and which I've
> used in other contexts as well (I even used it in an image processing
> class I wrote once).

Well, it was the way of using it that was particularly odd. However,
we've got a better way of doing it :)

> Still, I don't know enough about LINQ to really "get" what you're
> article is all about but if you think the design in your article is
> crazy, who am I to argue? :)

The basic thrust is to do efficient grouped aggregations.

For instance, consider a stream of newsgroup posts, and you want to
build a "most prolific posters" list. With LINQ in its current form,
that can't be done very efficiently (in terms of memory usage). The
push stuff helps with that a lot. The new design will also allow for
multiple aggregates to be computed at once, somewhat coincidentally.
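
In rough outline, the push style looks like this (an invented sketch
of the idea, not the API from the blog post):

using System.Collections.Generic;

// The source pushes each post's author into the aggregator as the
// stream goes past; no posts are buffered, so memory use stays flat.
class ProlificPosterCounter
{
    private readonly Dictionary<string, int> counts =
        new Dictionary<string, int>();

    public void Push(string author)
    {
        int count;
        counts.TryGetValue(author, out count);
        counts[author] = count + 1;
    }

    public IDictionary<string, int> Counts
    {
        get { return counts; }
    }
}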



> > I blame Marc Gravell.
>
> Sounds good to me. Seems like we have precious little things to blame
> on him, so why not? I'm sure he can handle the load. :)

His generic maths stuff is evil too. I hadn't looked at it properly
before this afternoon. Ouch, but cool ouch :)
