Not good for Ada endorsement

Richard Iswara

unread,

Jul 7, 2021, 10:21:57 AM7/7/21

to

I haven't seen this posted before so apologies if redundant.
A couple days ago this person posted on YouTube this clip: https://www.youtube.com/watch?v=pv4Yq35Chx0.
In the video it was run against Pascal and Ada lost by being 50% slower than Pascal on Prime Sieving. Also shown as 2 orders of magnitude slower than the fastest implementation of that day. See on video at 20:45.
This is the base implementation link: https://github.com/PlummersSoftwareLLC/Primes/blob/drag-race/PrimeAda/solution_1/src/main.adb.
I know it's trivial, and probably click bait on the video person, but this is not a good PR on Ada reputation. The guy said do a better implementation to get a better score. So can Ada implementation do better?

Luke A. Guest

unread,

Jul 7, 2021, 11:07:07 AM7/7/21

to

On 07/07/2021 15:21, Richard Iswara wrote:
> I haven't seen this posted before so apologies if redundant.
> A couple days ago this person posted on YouTube this clip: https://www.youtube.com/watch?v=pv4Yq35Chx0.

I watched this finally yesterday as it kept popping up. I started
running the entire project last night as running the Ada one on it's own
didn't do anything. It's still running, It's been on PrimeR all day!

> In the video it was run against Pascal and Ada lost by being 50% slower than Pascal on Prime Sieving. Also shown as 2 orders of magnitude slower than the fastest implementation of that day. See on video at 20:45.
> This is the base implementation link: https://github.com/PlummersSoftwareLLC/Primes/blob/drag-race/PrimeAda/solution_1/src/main.adb.
> I know it's trivial, and probably click bait on the video person, but this is not a good PR on Ada reputation. The guy said do a better implementation to get a better score. So can Ada implementation do better?
>

Don't know as I don't know the algorithm, but I did clone the repo to
look at it. I would be comparing Ada with C, C++, Pascal and any other
compiled language. Not just Pascal.

Doctor Who

unread,

Jul 7, 2021, 11:19:15 AM7/7/21

to

There are many algorithms to find primes, some are optimized some not.
It makes no sense comparing languages with the same algorithm, I'm
sure every language can be made faster with a language-optimized
prime-finding algorithm.

Simon Wright

unread,

Jul 7, 2021, 3:21:29 PM7/7/21

to

Doctor Who <d...@tardis.org> writes:

> It makes no sense comparing languages with the same algorithm,

I think it makes little sense to compare languages with _different_
algorithms.

Doctor Who

unread,

Jul 7, 2021, 3:49:48 PM7/7/21

to

On Wed, 07 Jul 2021 20:21:26 +0100, Simon Wright <si...@pushface.org>
wrote:

it depends if you are comparing algorithms or comparing languages ...

Luke A. Guest

unread,

Jul 7, 2021, 4:40:06 PM7/7/21

to

On 07/07/2021 20:49, Doctor Who wrote:
>> I think it makes little sense to compare languages with _different_
>> algorithms.
>
> it depends if you are comparing algorithms or comparing languages ...
>

It's a comparison of languages using the same algorithm, but you can
also add in other algorithms, just like there are parallel versions.

Richard Iswara

unread,

Jul 7, 2021, 11:46:31 PM7/7/21

to

It is supposed to be a basic Sieve of Erastosthenes searching primes under 1 million.
Odd number search only, can be multi threaded and skip ahead. See the rules at: https://github.com/plummerssoftwarellc/Primes/blob/drag-race/CONTRIBUTING.md#rules.
Indirectly it is a comparison of implementation and tools benchmarking. Looking at the gpr file, there is no compile switch used, not even an "-o2" switch.

Jeffrey R. Carter

unread,

Jul 8, 2021, 4:20:35 AM7/8/21

to

On 7/8/21 5:46 AM, Richard Iswara wrote:
> Indirectly it is a comparison of implementation and tools benchmarking. Looking at the gpr file, there is no compile switch used, not even an "-o2" switch.

Compiling with -gnatp -O3 would undoubtedly speed it up (suppressing checks is
justified since execution with checks active shows that no checks fail).

Looking casually at the code, the map could be replaced by a constant, as
Sieve_Size is hard coded to 1,000,000, and the filling of the map is included in
the timing. The calculation of the square root of 1,000,000 could be replaced by
a constant. The array of Boolean could be constrained to 3 .. Sieve_Size. The
function that simply returns (others => True) could be replaced by the
aggregate, though optimization will probably do that. Long_Long_Integer could be
replaced by a type with range 0 .. 2 ** 31 - 1, though I don't know if that
would have any effect. The first inner loop in the sieve algorithm could be
eliminated, in which case the initialization of Num could also be removed.

--
Jeff Carter
"My legs are gray, my ears are gnarled, my eyes are old and bent."
Monty Python's Life of Brian
81

Dmitry A. Kazakov

unread,

Jul 8, 2021, 4:42:42 AM7/8/21

to

On 2021-07-08 10:20, Jeffrey R. Carter wrote:
> On 7/8/21 5:46 AM, Richard Iswara wrote:
>> Indirectly it is a comparison of implementation and tools
>> benchmarking. Looking at the gpr file, there is no compile switch
>> used, not even an "-o2" switch.
>
> Compiling with -gnatp -O3 would undoubtedly speed it up (suppressing
> checks is justified since execution with checks active shows that no
> checks fail).
>
> Looking casually at the code, the map could be replaced by a constant,
> as Sieve_Size is hard coded to 1,000,000, and the filling of the map is
> included in the timing. The calculation of the square root of 1,000,000
> could be replaced by a constant. The array of Boolean could be
> constrained to 3 .. Sieve_Size. The function that simply returns (others
> => True) could be replaced by the aggregate, though optimization will
> probably do that. Long_Long_Integer could be replaced by a type with
> range 0 .. 2 ** 31 - 1, though I don't know if that would have any
> effect. The first inner loop in the sieve algorithm could be eliminated,
> in which case the initialization of Num could also be removed.

Exactly, this is what is wrong with such contests. The solution is
non-scalable giving advantage to low-level languages like C. Scalability
and readability has the price that hobby-sized instances work poorly.

P.S. I would also suggest ensuring the Boolean array not packed. If not
with compiler switches and pragmas then by declaring a custom Boolean
1,2,4,8 bytes long depending on the target architecture. It is not a
fair play, guys!

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Luke A. Guest

unread,

Jul 8, 2021, 4:52:41 AM7/8/21

to

On 08/07/2021 09:20, Jeffrey R. Carter wrote:
> On 7/8/21 5:46 AM, Richard Iswara wrote:
>> Indirectly it is a comparison of implementation and tools
>> benchmarking. Looking at the gpr file, there is no compile switch
>> used, not even an "-o2" switch.

I did wonder about that as I haven't got around to looking through it yet.

> Compiling with -gnatp -O3 would undoubtedly speed it up (suppressing
> checks is justified since execution with checks active shows that no
> checks fail).

I got nothing from the output of trying to run just the Ada version.

> Looking casually at the code, the map could be replaced by a constant,
> as Sieve_Size is hard coded to 1,000,000, and the filling of the map is
> included in the timing. The calculation of the square root of 1,000,000
> could be replaced by a constant. The array of Boolean could be
> constrained to 3 .. Sieve_Size. The function that simply returns (others
> => True) could be replaced by the aggregate, though optimization will
> probably do that. Long_Long_Integer could be replaced by a type with
> range 0 .. 2 ** 31 - 1, though I don't know if that would have any

In the video he mentions Pascal's version only being 32 bit and his is 64.

Jeffrey R. Carter

unread,

Jul 8, 2021, 6:42:27 AM7/8/21

to

On 7/8/21 10:20 AM, Jeffrey R. Carter wrote:
>
> Compiling with -gnatp -O3 would undoubtedly speed it up (suppressing checks is
> justified since execution with checks active shows that no checks fail).
>
> Looking casually at the code, the map could be replaced by a constant, as
> Sieve_Size is hard coded to 1,000,000, and the filling of the map is included in
> the timing. The calculation of the square root of 1,000,000 could be replaced by
> a constant. The array of Boolean could be constrained to 3 .. Sieve_Size. The
> function that simply returns (others => True) could be replaced by the
> aggregate, though optimization will probably do that. Long_Long_Integer could be
> replaced by a type with range 0 .. 2 ** 31 - 1, though I don't know if that
> would have any effect. The first inner loop in the sieve algorithm could be
> eliminated, in which case the initialization of Num could also be removed.

Compiling the original code with

gnatmake prime_sieve.adb

gives 408 passes in 5 seconds.

Making the changes listed above (I used Interfaces.Integer_32) and compiling with

gnatmake -O3 prime_sieve.adb

(to ensure that no checks fail) gives 2087 passes in 5 seconds, for a factor of 5.1.

Applying that to the reported 67 passes/second for the original on the test
system implies that this version, compiled with checks enabled and optimization,
would give 343 passes/second.

Luke A. Guest

unread,

Jul 8, 2021, 6:51:26 AM7/8/21

to

On 08/07/2021 11:42, Jeffrey R. Carter wrote:

Here's the playlist
https://youtube.com/playlist?list=PLF2KJ6Gy3cZ5Er-1eF9fN1Hgw_xkoD9V1 the
second video is where he sets up Python, C# and C++.

> Compiling the original code with
>
> gnatmake prime_sieve.adb
>
> gives 408 passes in 5 seconds.
>
> Making the changes listed above (I used Interfaces.Integer_32) and
> compiling with
>
> gnatmake -O3 prime_sieve.adb
>
> (to ensure that no checks fail) gives 2087 passes in 5 seconds, for a
> factor of 5.1.

He shows the C++ jumping from 4645 (https://youtu.be/D3h62rgewZM?t=1246)
to 7,436 (https://youtu.be/D3h62rgewZM?t=1306) passes going from 32 to
64 bit.

He also uses clang's -Ofast option to compile for speed over size.

Jeffrey R. Carter

unread,

Jul 8, 2021, 7:12:30 AM7/8/21

to

Going back to 64-bit integers gives 1999; with -gnatp, 2098

Luke A. Guest

unread,

Jul 8, 2021, 1:38:19 PM7/8/21

to

On 08/07/2021 12:12, Jeffrey R. Carter wrote:

>> He also uses clang's -Ofast option to compile for speed over size.
>
> Going back to 64-bit integers gives 1999; with -gnatp, 2098
>

I've uploaded my version here
https://github.com/Lucretia/Primes/tree/add-optimised-ada/PrimeAda/solution_2

It's as straight a conversion from the C++ to Ada. I intend to keep
optimising it.

Luke A. Guest

unread,

Jul 8, 2021, 1:44:05 PM7/8/21

to

As a quick test, I removed the Pack attribute from the Bits array and
got the following speed:

debug0:
Passes: 1108, Time: 5.003198, Avg: 0.004515521, Limit: 1000000,
Count1: 78498, Count2: 78498, Valid: TRUE
Lucretia;1108; 5.003198;algorithm=base,faithful=yes,bits=8

optimised:
Passes: 3286, Time: 5.000592, Avg: 0.001521786, Limit: 1000000,
Count1: 78498, Count2: 78498, Valid: TRUE
Lucretia;3286; 5.000592;algorithm=base,faithful=yes,bits=8

Luke A. Guest

unread,

Jul 8, 2021, 3:16:34 PM7/8/21

to

On 08/07/2021 18:43, Luke A. Guest wrote:
> debug0:
> Passes: 1108, Time: 5.003198, Avg: 0.004515521, Limit: 1000000,
> Count1: 78498, Count2: 78498, Valid: TRUE
> Lucretia;1108; 5.003198;algorithm=base,faithful=yes,bits=8
>
> optimised:
> Passes: 3286, Time: 5.000592, Avg: 0.001521786, Limit: 1000000,
> Count1: 78498, Count2: 78498, Valid: TRUE
> Lucretia;3286; 5.000592;algorithm=base,faithful=yes,bits=8

I've updated again, moving the main function to the generic, strange
results.

This also now used packed and unpacked boolean arrays.

I'd be interested to see what numbers your machines get, mine's a tad
ancient now.

Luke A. Guest

unread,

Jul 8, 2021, 3:16:54 PM7/8/21

to

On 08/07/2021 18:37, Luke A. Guest wrote:

Richard Iswara

unread,

Jul 8, 2021, 10:47:59 PM7/8/21

to

Thank you. Well done.

Paul Rubin

unread,

Jul 9, 2021, 4:10:37 AM7/9/21

to

"Luke A. Guest" <lag...@archeia.com> writes:
> It's as straight a conversion from the C++ to Ada. I intend to keep
> optimising it.

I'd be interested in seeing a straight conversion of Pascal to Ada, if
the existing Ada code differs significantly from the existing Pascal
code in a way that affects the speed and isn't a straightforward
conversion.

Egil H H

unread,

Jul 9, 2021, 4:24:02 AM7/9/21

to

On Friday, July 9, 2021 at 10:10:37 AM UTC+2, Paul Rubin wrote:

One significant difference between the original Ada version and the Pascal and C++ versions, is
that Ada is missing a Num := Factor before the first inner loop.
(Luke fixed that in his version, though)

Egil H H

unread,

Jul 9, 2021, 4:34:00 AM7/9/21

to

On my computer, at least, GNAT is not happy about incrementing Number by 2
in the inner loop. I get a bit better performance by modifying the check to

if Number mod 2 /= 0 and then Sieve.Bits(Index_Type(Number)) then

and then incrementing Number by 1.

Or even rewriting to a for-loop

Is_Prime : for Foo in Index_Type (Factor)..Sieve.Size loop
if Foo mod 2 /= 0 and then Sieve.Bits (Foo) then
Factor := Integer(Foo);
exit Is_Prime;
end if;
end loop Is_Prime;

Jeffrey R. Carter

unread,

Jul 9, 2021, 5:16:26 AM7/9/21

to

On 7/9/21 10:33 AM, Egil H H wrote:
>
> On my computer, at least, GNAT is not happy about incrementing Number by 2
> in the inner loop. I get a bit better performance by modifying the check to
>
> if Number mod 2 /= 0 and then Sieve.Bits(Index_Type(Number)) then
>
> and then incrementing Number by 1.

Since both operands are positive, mod and rem give the same results, and rem is
usually a bit faster. It's normally not an issue, but in this case it's done
billions of times, so it might make a difference.

--
Jeff Carter
"Oh Lord, bless this thy hand grenade, that with it thou
mayst blow thine enemies to tiny bits, in thy mercy."
Monty Python and the Holy Grail
24

Jeffrey R. Carter

unread,

Jul 9, 2021, 5:21:05 AM7/9/21

to

On 7/9/21 11:16 AM, Jeffrey R. Carter wrote:
> On 7/9/21 10:33 AM, Egil H H wrote:
>>
>> On my computer, at least, GNAT is not happy about incrementing Number by 2
>> in the inner loop. I get a bit better performance by modifying the check to
>>
>> if Number mod 2 /= 0 and then Sieve.Bits(Index_Type(Number)) then
>>
>> and then incrementing Number by 1.
>
> Since both operands are positive, mod and rem give the same results, and rem is
> usually a bit faster. It's normally not an issue, but in this case it's done
> billions of times, so it might make a difference.

Note also that short-circuit forms (and then) require disabling processor-level
optimizations and may have a negative effect on execution time when used
unnecessarily.

Lucretia

unread,

Jul 15, 2021, 11:13:07 AM7/15/21

to

On Friday, 9 July 2021 at 10:21:05 UTC+1, Jeffrey R. Carter wrote:
> On 7/9/21 11:16 AM, Jeffrey R. Carter wrote:
> >> if Number mod 2 /= 0 and then Sieve.Bits(Index_Type(Number)) then
> >>
> >> and then incrementing Number by 1.
> >
> > Since both operands are positive, mod and rem give the same results, and rem is
> > usually a bit faster. It's normally not an issue, but in this case it's done
> > billions of times, so it might make a difference.
> Note also that short-circuit forms (and then) require disabling processor-level
> optimizations and may have a negative effect on execution time when used
> unnecessarily.

You don't need those as the algorithm always starts at 3 and increments by 2.

I managed to get just under 4000 passes with a 1 bit array, but not using Ada's packed array. That's actually the slowest method, strangely.

Jeffrey R. Carter

unread,

Jul 15, 2021, 11:56:40 AM7/15/21

to

On 7/15/21 5:13 PM, Lucretia wrote:
>
> I managed to get just under 4000 passes with a 1 bit array, but not using Ada's packed array. That's actually the slowest method, strangely.

So you have an array of a modular type, calculate an index and bit within it,
mask out that bit, and compare it to zero? I would have thought an unpacked
array of Boolean would be fastest.

A packed array of Boolean requires all the operations above, plus shifting the
bit to the LSB and treating the result as a Boolean, so it may not be that
surprising that it's slower.

--
Jeff Carter
"How'd you like to hide the egg and gurgitate
a few saucers of mocha java?"
Never Give a Sucker an Even Break
101

Anh Vo

unread,

Jul 15, 2021, 12:29:52 PM7/15/21

to

On Thursday, July 15, 2021 at 8:56:40 AM UTC-7, Jeffrey R. Carter wrote:
> On 7/15/21 5:13 PM, Lucretia wrote:
> >
> > I managed to get just under 4000 passes with a 1 bit array, but not using Ada's packed array. That's actually the slowest method, strangely.
> So you have an array of a modular type, calculate an index and bit within it,
> mask out that bit, and compare it to zero? I would have thought an unpacked
> array of Boolean would be fastest.
>
> A packed array of Boolean requires all the operations above, plus shifting the
> bit to the LSB and treating the result as a Boolean, so it may not be that
> surprising that it's slower.

Should fixed lower bound index array (-gnatX) speed it up?

Anh Vo

Lucretia

unread,

Jul 15, 2021, 12:29:52 PM7/15/21

to

On Thursday, 15 July 2021 at 16:56:40 UTC+1, Jeffrey R. Carter wrote:
> On 7/15/21 5:13 PM, Lucretia wrote:
> >
> > I managed to get just under 4000 passes with a 1 bit array, but not using Ada's packed array. That's actually the slowest method, strangely.
> So you have an array of a modular type, calculate an index and bit within it,
> mask out that bit, and compare it to zero? I would have thought an unpacked
> array of Boolean would be fastest.

Doesn't seem to be. Also, the guy is using the packed bit as the final test of all the languages.

> A packed array of Boolean requires all the operations above, plus shifting the
> bit to the LSB and treating the result as a Boolean, so it may not be that

Don't need to shift to the LSB, only need to shift the 1 to the bit location you want to test, invert and then check against 0.

> surprising that it's slower.

I know. I would've thought the compiler would've handled the packed array as a special case and optimised as much as possible there.

Dmitry A. Kazakov

unread,

Jul 15, 2021, 12:49:09 PM7/15/21

to

On 2021-07-15 18:29, Lucretia wrote:
> On Thursday, 15 July 2021 at 16:56:40 UTC+1, Jeffrey R. Carter wrote:
>> On 7/15/21 5:13 PM, Lucretia wrote:
>>>
>>> I managed to get just under 4000 passes with a 1 bit array, but not using Ada's packed array. That's actually the slowest method, strangely.
>> So you have an array of a modular type, calculate an index and bit within it,
>> mask out that bit, and compare it to zero? I would have thought an unpacked
>> array of Boolean would be fastest.
>
> Doesn't seem to be.

Unlikely unless compiled into machine-specific instructions. You should
inspect the code.

> Also, the guy is using the packed bit as the final test of all the languages.
>
>> A packed array of Boolean requires all the operations above, plus shifting the
>> bit to the LSB and treating the result as a Boolean, so it may not be that
>
> Don't need to shift to the LSB, only need to shift the 1 to the bit location you want to test, invert and then check against 0.

Usually masks are either tabulated in a look-up table or are constants
in a case choice.

>> surprising that it's slower.
>
> I know. I would've thought the compiler would've handled the packed array as a special case and optimised as much as possible there.

It is still unlikely to be faster than all arrays. You should probably
try to use different element types and inspect the code.

Lucretia

unread,

Jul 15, 2021, 1:30:38 PM7/15/21

to

On Thursday, 15 July 2021 at 17:29:52 UTC+1, Anh Vo wrote:

> > A packed array of Boolean requires all the operations above, plus shifting the
> > bit to the LSB and treating the result as a Boolean, so it may not be that
> > surprising that it's slower.
> Should fixed lower bound index array (-gnatX) speed it up?

On GCC 9.3.0:

-gnatX Language extensions permitted

Jeffrey R. Carter

unread,

Jul 15, 2021, 5:08:42 PM7/15/21

to

On 7/15/21 6:29 PM, Lucretia wrote:
> On Thursday, 15 July 2021 at 16:56:40 UTC+1, Jeffrey R. Carter wrote:
>
>> A packed array of Boolean requires all the operations above, plus shifting the
>> bit to the LSB and treating the result as a Boolean, so it may not be that
>
> Don't need to shift to the LSB, only need to shift the 1 to the bit location you want to test, invert and then check against 0.

You know that that is enough, and may be what you're doing manually, but the
compiler may not know that. If A is a packed array of Boolean, then A (I) has
type Boolean. Unless the compiler can optimize it (and maybe it can), it would
normally need to shift the bit down so it can be treated as a value of type
Boolean, and then apply whatever you do with the resulting Boolean value.

Simon Wright

unread,

Jul 16, 2021, 12:27:40 PM7/16/21

to

Ah, but which ones?

Simon Wright

unread,

Jul 16, 2021, 12:28:18 PM7/16/21

to

Ah, but which ones?

Mace Ayres

unread,

Jul 18, 2021, 7:03:05 PM7/18/21

to

I doubt if any industrial software engineering in Aviations, railroads, control system in Europe or in the US, beyond, web programmers are going to abandon Ada, back to Pascal, over such kiddy code comparisons.

Paul Rubin

unread,

Jul 18, 2021, 9:00:23 PM7/18/21

to

It's still a matter of concern if straightforward code is that much
harder to compile with Ada, that an advanced Ada compiler (GNAT)
produces slower code than a relatively simple Pascal compiler does
(depending on what Pascal compiler it was, of course).

Anh Vo

unread,

Jul 23, 2021, 1:04:13 PM7/23/21

to

Indeed, it does. As posted on Reddit Ada, I got 5476 Passes after changing array types to arrays having fixed lower bound index of 0 on lines 9 and 10 and compiling it using switch -gnatX -02. By the way, I used GNAT Studio 2021 running on Windows 10.

Luke A. Guest

unread,

Jul 23, 2021, 1:13:11 PM7/23/21

to

On 23/07/2021 18:04, Anh Vo wrote:

>> Should fixed lower bound index array (-gnatX) speed it up?
>
> Indeed, it does. As posted on Reddit Ada, I got 5476 Passes after changing array types to arrays having fixed lower bound index of 0 on lines 9 and 10 and compiling it using switch -gnatX -02. By the way, I used GNAT Studio 2021 running on Windows 10.
>

I as I stated that flag enables gnat extensions, usually the next
language revision. I'm on the FSF compiler, the dockerfile is set of gcc 11.

Luke A. Guest

unread,

Jul 23, 2021, 1:56:00 PM7/23/21

to

Oh, I now think it's the 2012 extension which enables you do have (1 ..
<>) or (0 .. <>) in array types, rather than a more useful command line
option to normalise array indices to 0.