Not normal for the same program to be faster in C# than in C++ [Visual Studio 2019]

Paolo Ferraresi

Aug 5, 2021, 3:02:04 PM
Hello, my name is Paolo Ferraresi and I program in both C# and C++, as a
hobby and for study.
(Sorry for my bad English.)
I like both C# and C++ and I take no side in the language religion wars. I
find C# very convenient for almost any application, but C++ should be
considered when maximum efficiency and performance are required.
I have been on vacation for a few days and, partly for fun, I wrote a few
lines that implement the sieve of Eratosthenes.
I normally never write exactly the same program in both C# and C++, but
for fun I decided to write two equivalent programs, without .NET or STL
containers, using only built-in types and arrays, and without library
algorithms, only for loops over arrays.

Here is the C# code:
using System;
using System.Diagnostics;
class Program
{
    const uint N = 2147483591; // Maximum array size in C#
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        bool[] A = new bool[N];
        for (uint i = 2; i < N; ++i) A[i] = true;
        for (uint i = 2; i < N; ++i)
            if (A[i])
                for (uint j = i; i * j < N; ++j)
                    A[i * j] = false;
        sw.Stop();

        Console.WriteLine("Elapsed time {0} ms",
            sw.ElapsedMilliseconds);
        Console.Write("Press a key... ");
        Console.ReadKey();
    }
}

Here is the C++ code:
#include <iostream>
#include <iterator>
#include <array>
#include <vector>
#include <chrono>
#include <algorithm>
using namespace std;
int main()
{
    const unsigned int N = 2147483591;
    auto Tstart = chrono::high_resolution_clock::now();
    bool* A = new bool[N];
    fill(A, A + N, true);
    for (unsigned int i = 2; i < N; ++i)
        if (A[i])
            for (unsigned int j = i; i * j < N; ++j)
                A[i * j] = false;
    auto Tend = chrono::high_resolution_clock::now();
    chrono::duration<double, std::milli> diff = Tend - Tstart;
    cout << "Elapsed time " << diff.count() << "ms\n";
    delete[] A;
    cout << "Press ENTER ";
    cin.get();
    return 0;
}

I understand very well that a game like this proves almost nothing, but I
remained convinced that the same code (and I repeat, almost 1:1 the same,
not arrays in one and the STL in the other) cannot be faster in C# than in
C++, so imagine my surprise when the results came out:

C# (release build): 23093 ms, (48630 ms in debug build).
C++(release build): 33516 ms, (44906 ms in debug build).

I came to the conclusion that maybe the problem is specific to my setup
and does not affect others. Apart from the absolute numbers, which will
differ depending on the hardware each of us has, please check whether C++
turns out faster for you, which is what I honestly expect.
The only thing I changed was the build configuration, from debug to
release, leaving everything else at its default, except that I always
select the x64 platform.

Finally, I use Windows 10 Pro and Visual Studio 2019 Community Edition,
and as I mentioned, the code was compiled for the x64 platform.
The CPU is an AMD Ryzen Threadripper 3970X.

If any of you would like to try it, could you then explain what is going
wrong on my machine? Thanks, bye!
Greetings from Italy! :)

Paolo Ferraresi
fp....@alice.it
[I would guess the difference is something unrelated to the loops, such as how the
two runtime systems allocate a two gigabyte array. -John]

George Neuner

Aug 6, 2021, 9:41:39 AM
On Thu, 5 Aug 2021 18:24:48 -0000 (UTC), Paolo Ferraresi
<fp....@alice.it> wrote:
:
a C# program
:
a C++ program
:
>
>I understand very well that the validity of such a game is almost null,
>but since I remain convinced that the same code (and I repeat the same
>almost 1:1, not that one used arrays and the other STL) cannot be faster
>in C# than in C++, imagine the surprise when the results came out:
>
>C# (release build): 23093 ms, (48630 ms in debug build).
>C++(release build): 33516 ms, (44906 ms in debug build).
>
>I came to the conclusion that maybe I have a specific problem from me and
>not from others. I mean apart from the numerical values, which will be
>different depending on the hardware each of us has, try to see if C++
>turns out faster from you, which is what I expect, honestly.
>Also I changed build from debug to release and that's it, leaving
>everything default, except that I always put x64 platform.
>
>Finally, I use Windows 10 Pro, Visual Studio 2019 community edition and as
>I mentioned the code was compiled for x64 platform.
>The CPU is AMD Ryzen Threadripper 3970X.
>[I would guess the difference is something unrelated to the loops, such as how the
>two runtime systems allocate a two gigabyte array. -John]


My guess is that the issue is how (and what) you are timing.


Multitasking operating systems royally screw up attempts to accurately
time things. If you want to compare code, you should run many
iterations of each version and compare their /average/ running times.


As John mentioned, you are timing allocation of the array. Heap
management is very different in these two languages, so the time to
allocate things is, in general, not comparable.

You shouldn't time the array initialization either unless you do it
the same way in both programs. The templated fill() algorithm in C++
will not necessarily be equivalent to the inline C# code - it depends
on your compiler settings. [see below]


I would modify your programs like so (in pseudocode):

total = 0
allocate array
for N iterations
    initialize array
    start = current time
    run the sieve
    stop = current time
    total += (stop - start)
average = total / N

And then run for N = 50 (or more) to filter out multitasking related
noise in the individual timings.
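
For the C++ side, the loop described above might look roughly like the following. This is only a sketch under stated assumptions: the names `sieve` and `average_sieve_ms` are hypothetical, the array is a `std::vector<unsigned char>` allocated by the caller rather than `new bool[N]`, `std::chrono::steady_clock` replaces `high_resolution_clock`, and the indices are 64-bit so that `i * j` cannot wrap around.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

constexpr unsigned char kUnmarked = 1;  // "still considered prime"

// Sieve loop from the original post, with 64-bit indices (an assumption of
// this sketch) so that the product i * j cannot wrap around.
void sieve(std::vector<unsigned char>& flags) {
    const std::uint64_t n = flags.size();
    for (std::uint64_t i = 2; i < n; ++i)
        if (flags[i])
            for (std::uint64_t j = i; i * j < n; ++j)
                flags[i * j] = 0;
}

// Averages the sieve time over `runs` repetitions. The caller allocates
// `flags` once; re-initialization happens outside the timed region.
double average_sieve_ms(std::vector<unsigned char>& flags, int runs) {
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;
    std::chrono::duration<double, std::milli> total{0.0};
    for (int r = 0; r < runs; ++r) {
        std::fill(flags.begin(), flags.end(), kUnmarked);  // initialize (not timed)
        const auto start = Clock::now();
        sieve(flags);                                      // run the sieve (timed)
        const auto stop = Clock::now();
        total += stop - start;
    }
    return total.count() / runs;
}
```

A caller would allocate the vector once, for example `std::vector<unsigned char> flags(N);`, and then print `average_sieve_ms(flags, 50)`. Because the caller still owns the array afterwards, the compiler cannot treat the sieve's stores as dead.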


To really be fair you need to find out what optimizations are being
done by the C# and dotNET JIT compilers (which work together), and
adjust your C++ compiler to do the equivalent. Simply doing a
'release' compile in both languages is not sufficient: in general C++
is harder to optimize than C#, and many of the possible optimizations
are disabled by default because they can break code that does not
comply with their requirements. Except in 'unsafe' code, C# largely
makes it impossible for code to not comply with its optimization
requirements.

But start with more accurate timing.


Hope this helps,
George

Dmitry A. Kazakov

Aug 8, 2021, 11:14:17 AM
On 2021-08-06 04:58, George Neuner wrote:

> I would modify your programs like so (in pseudo):
>
> total = 0
> allocate array
> for N iterations
> initialize array
> start = current time
> run the seive
> stop = current time
> total += (stop - start)
> average = total / N

Another technique is to factor out looping and other overheads by
running a reference loop that omits the work being measured:

start = current time
for N iterations
    initialize array
    run the sieve
end loop;
total1 = current time - start

start = current time
for N iterations
    initialize array
end loop;
total2 = current time - start

average = (total1 - total2) / N -- sieve only
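
A possible C++ rendering of this subtraction technique is sketched below. The names `time_loop` and `sieve_only_ms` are hypothetical, the array is assumed to be a caller-allocated `std::vector<unsigned char>`, and the reference loop is run first so that the array holds the sieve result on return. With small arrays the difference can come out negative because of timer noise.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

using Millis = std::chrono::duration<double, std::milli>;
constexpr unsigned char kUnmarked = 1;

// Hypothetical helper: times `runs` repetitions of body() as a single block.
template <typename Body>
Millis time_loop(int runs, Body body) {
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < runs; ++r) body();
    return std::chrono::steady_clock::now() - start;
}

// Sieve loop from the original post, with 64-bit indices (an assumption of
// this sketch) so that i * j cannot wrap around.
void sieve(std::vector<unsigned char>& flags) {
    const std::uint64_t n = flags.size();
    for (std::uint64_t i = 2; i < n; ++i)
        if (flags[i])
            for (std::uint64_t j = i; i * j < n; ++j)
                flags[i * j] = 0;
}

// Average time of the sieve alone: (init + sieve) minus (init only).
double sieve_only_ms(std::vector<unsigned char>& flags, int runs) {
    assert(runs > 0 && !flags.empty());
    // Reading one element into a volatile discourages the compiler from
    // discarding the loop bodies; it is not a hard guarantee.
    volatile unsigned char sink = 0;
    const Millis init_only = time_loop(runs, [&] {
        std::fill(flags.begin(), flags.end(), kUnmarked);
        sink = flags.back();
    });
    const Millis init_and_sieve = time_loop(runs, [&] {
        std::fill(flags.begin(), flags.end(), kUnmarked);
        sieve(flags);
        sink = flags.back();
    });
    return (init_and_sieve.count() - init_only.count()) / runs;
}
```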

P.S. Optimizations are a usual suspect in ruining benchmark measurements.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

gah4

Aug 8, 2021, 4:01:57 PM
On Sunday, August 8, 2021 at 8:14:17 AM UTC-7, Dmitry A. Kazakov wrote:

(snip)

> P.S. Optimizations is a usual suspect of ruining benchmark measures.

Yes. But in this case, one might want to include some optimizations.

Note, though, that a good optimizer could optimize out all the loops, as no
output depends on them. Some programs I know output the total number
of primes found, which stops that from happening.
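
As a rough illustration of that idea, the sieve can return a prime count that the program then prints, so the result is observable. The function name `count_primes_below` is hypothetical, and the sketch assumes a `std::vector<unsigned char>` and 64-bit indices rather than the posted `bool` array and `unsigned int`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Runs the sieve from the original post on [0, n) and returns the number of
// primes found. Printing this value makes the loops observable, so an
// optimizer cannot remove them as dead code.
std::size_t count_primes_below(std::size_t n) {
    std::vector<unsigned char> is_prime(n, 1);
    for (std::uint64_t i = 2; i < n; ++i)
        if (is_prime[i])
            for (std::uint64_t j = i; i * j < n; ++j)
                is_prime[i * j] = 0;

    std::size_t count = 0;
    for (std::size_t i = 2; i < n; ++i)  // 0 and 1 are not prime
        count += is_prime[i];
    return count;
}
```

Printing the count after the timer has stopped, for example `std::cout << count_primes_below(N) << '\n';`, keeps the output cost out of the measurement. It also gives a quick correctness check, since the C# and C++ versions should report the same number.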

Also, compilers can do some calculations at compile time. I don't expect
it for this, but that does ruin some benchmarks. There are stories of complicated
benchmarks being done entirely at compile time, except for output of the result.

I would have used a j += i loop. Not that multiply is that slow on modern processors,
but it is a big part of the loop. One compiler might optimize that for you.
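
A sketch of the two inner-loop forms side by side follows; the function names are hypothetical. One assumption matters here: with 32-bit unsigned arithmetic, as in both posted programs, the product i * j wraps modulo 2^32 once i reaches 65536, so this sketch uses 64-bit indices in both variants.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Inner loop as in the original post: one multiplication per store.
std::vector<unsigned char> sieve_multiply(std::size_t n) {
    std::vector<unsigned char> is_prime(n, 1);
    for (std::uint64_t i = 2; i < n; ++i)
        if (is_prime[i])
            for (std::uint64_t j = i; i * j < n; ++j)
                is_prime[i * j] = 0;
    return is_prime;
}

// Same stores in the same order, but the index k advances by adding i.
// k starts at i * i, which fits in 64 bits for any n below 2^32.
std::vector<unsigned char> sieve_add(std::size_t n) {
    std::vector<unsigned char> is_prime(n, 1);
    for (std::uint64_t i = 2; i < n; ++i)
        if (is_prime[i])
            for (std::uint64_t k = i * i; k < n; k += i)
                is_prime[k] = 0;
    return is_prime;
}
```

Both functions leave entries 0 and 1 untouched and should produce identical arrays, which makes it easy to check one against the other before timing them.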

One might store the array as bits (8 bool/byte), the other as bytes. It isn't so
obvious which one is faster, but often the 1 bool/byte is faster, until you run out
of real memory.
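
To make the bits-versus-bytes trade-off concrete, here is a minimal sketch of a bit-packed flag array driving the same sieve. The class `BitFlags` and the function `count_primes_bit_packed` are hypothetical names, and 64-bit words are an arbitrary choice for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit-packed flag array: 64 flags per word, all initially set.
class BitFlags {
public:
    explicit BitFlags(std::size_t n)
        : words_((n + 63) / 64, ~std::uint64_t{0}), size_(n) {}
    bool test(std::size_t i) const { return (words_[i >> 6] >> (i & 63)) & 1u; }
    void clear(std::size_t i) { words_[i >> 6] &= ~(std::uint64_t{1} << (i & 63)); }
    std::size_t size() const { return size_; }
    std::size_t bytes() const { return words_.size() * sizeof(std::uint64_t); }
private:
    std::vector<std::uint64_t> words_;
    std::size_t size_;
};

// The sieve from the original post on top of BitFlags. Each clear() is a
// read-modify-write of a whole word instead of a single byte store.
std::size_t count_primes_bit_packed(std::size_t n) {
    BitFlags flags(n);
    for (std::uint64_t i = 2; i < n; ++i)
        if (flags.test(i))
            for (std::uint64_t j = i; i * j < n; ++j)
                flags.clear(i * j);

    std::size_t count = 0;
    for (std::size_t i = 2; i < n; ++i)
        count += flags.test(i) ? 1 : 0;
    return count;
}
```

For N = 2147483591 the byte-per-flag array needs about 2 GiB, while this packed form needs about 256 MiB. The price is the extra shift and mask work on every access, which is why it is not obvious in advance which layout runs faster.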

How much real memory do you have? And the speed might depend in complicated
ways on the memory management system.

And note that you aren't comparing languages, but two compilers implementing
those languages (which is why it goes here).
[In this case, the documentation says they both allocate a byte for each bool but the other
stuff is all possible. Also remember C++ is a traditional compiler, while C# is bytecode and JIT. -John]