
C++ threads vs. Ada tasks - surprised


Maciej Sobczak

Aug 16, 2009, 5:34:11 PM
In another thread (pun intended) the discussion on language design
with regard to threads/tasks got obstructed, so I'm starting a new
one.

An important point that was discussed there was the potential gain
from having tasking built into the language. I have argued that
there is no *functional* gain (there is an obvious gain in
readability, though), or rather that all compiler tricks can be
performed with a standard API as well.

As I'm a pragmatist and had a couple of free minutes to spend, I have
written a very simple "benchmark" that uses a shared buffer with
capacity 1 (a unit buffer) and two tasks, one putting values into the
buffer and one consuming them. The idea was to test the performance of
synchronization and condition checks.
A special value in the buffer is used to finish the whole program
(the "poison pill" concept from another thread), and since only two
different values are needed, a Boolean type was used.
The number of iterations is given as an argument to the program (no
error checking is done, as that is not the point here).

I have also written an equivalent program in C++ with the direct use
of POSIX threads API (again, no checks, etc.).

To be frank: I expected the C++ version to be a hell of a lot
faster, as the Ada tasking model is conceptually much more complex
and I therefore expected tons of overhead there.
My expectations were wrong, as the Ada version proved to be ~2x faster
on my machine (with GNAT 4.4.0 and g++ 4.4.0, taken from the same
package, both versions compiled with -O2).

I have no idea how to explain this other than by the "task morph"
that the standard allows to happen when calling protected entries,
so that the entry call is actually executed not by the task that is
calling and waiting, but by the task that happened to open the
relevant barrier. This could, in principle, reduce the number of
thread switches.
I would very much welcome insight from some GNAT implementers to
confirm whether that was actually possible here.

After this exercise I have no choice (frankly) but to admit that
built-in threading support is better *also* in terms of performance.
Damned benchmarks... ;-)

For the sake of completeness, here are the complete sources for both
versions. I welcome any suggestions for improving them; there is
always some possibility that I got something totally wrong...
(note: error checking was omitted intentionally, and both programs
*must* be called with a single positive parameter as the number of
iterations)

-- Ada version:
with Ada.Command_Line;
with Ada.Text_IO;

procedure Test is

   Iterations : Positive :=
     Positive'Value (Ada.Command_Line.Argument (1));

   --  Unit buffer: holds at most one value at a time.
   protected Buffer is
      entry Put (X : in Boolean);
      entry Get (X : out Boolean);
   private
      Value : Boolean;
      Full  : Boolean := False;
   end Buffer;

   protected body Buffer is

      entry Put (X : in Boolean) when not Full is
      begin
         Value := X;
         Full  := True;
      end Put;

      entry Get (X : out Boolean) when Full is
      begin
         X    := Value;
         Full := False;
      end Get;

   end Buffer;

   task Producer;
   task body Producer is
   begin
      for I in 1 .. Iterations - 1 loop
         Buffer.Put (False);
      end loop;
      Buffer.Put (True);  --  the "poison pill"
   end Producer;

   task Consumer;
   task body Consumer is
      X     : Boolean;
      Count : Natural := 0;
   begin
      loop
         Buffer.Get (X);
         Count := Count + 1;
         exit when X;
      end loop;
      Ada.Text_IO.Put_Line
        ("Executed " & Natural'Image (Count) & " iterations");
   end Consumer;

begin
   null;
end Test;

// C++ version (for POSIX systems):
#include <iostream>
#include <sstream>
#include <pthread.h>

// Unit buffer: holds at most one value at a time.
class Buffer
{
public:
    Buffer()
    {
        pthread_mutex_init(&mtx_, NULL);
        pthread_cond_init(&full_, NULL);
        pthread_cond_init(&empty_, NULL);

        is_full_ = false;
    }

    ~Buffer()
    {
        pthread_mutex_destroy(&mtx_);
        pthread_cond_destroy(&full_);
        pthread_cond_destroy(&empty_);
    }

    void put(bool x)
    {
        pthread_mutex_lock(&mtx_);
        while (is_full_)
        {
            pthread_cond_wait(&empty_, &mtx_);
        }

        value_ = x;
        is_full_ = true;

        pthread_cond_signal(&full_);
        pthread_mutex_unlock(&mtx_);
    }

    void get(bool & x)
    {
        pthread_mutex_lock(&mtx_);
        while (not is_full_)
        {
            pthread_cond_wait(&full_, &mtx_);
        }

        x = value_;
        is_full_ = false;

        pthread_cond_signal(&empty_);
        pthread_mutex_unlock(&mtx_);
    }

private:
    bool value_;
    bool is_full_;

    pthread_mutex_t mtx_;
    pthread_cond_t full_;
    pthread_cond_t empty_;
};

Buffer buf;

int iterations;

void * producer(void *)
{
    for (int i = 0; i != iterations - 1; ++i)
    {
        buf.put(false);
    }
    buf.put(true);  // the "poison pill"

    return NULL;
}

void * consumer(void *)
{
    int count = 0;
    bool x;
    while (true)
    {
        buf.get(x);
        ++count;
        if (x)
        {
            break;
        }
    }

    std::cout << "Executed " << count
              << " iterations" << std::endl;

    return NULL;
}

int main(int argc, char * argv[])
{
    std::istringstream ss(argv[1]);
    ss >> iterations;

    pthread_t producer_th;
    pthread_t consumer_th;

    pthread_create(&producer_th, NULL, producer, NULL);
    pthread_create(&consumer_th, NULL, consumer, NULL);

    pthread_join(producer_th, NULL);
    pthread_join(consumer_th, NULL);
}

--
Maciej Sobczak * www.msobczak.com * www.inspirel.com

Database Access Library for Ada: www.inspirel.com/soci-ada

Stephen Leake

Aug 17, 2009, 3:12:29 AM
Maciej Sobczak <see.my....@gmail.com> writes:

> An important point that was discussed there was the potential gain
> from having tasking built into the language. I have argued that
> there is no *functional* gain (there is an obvious gain in
> readability, though), or rather that all compiler tricks can be
> performed with a standard API as well.

This makes sense, except that with a higher-level tasking model, there
is more room for optimization of simple cases.

> To be frank: I expected the C++ version to be a hell of a lot
> faster, as the Ada tasking model is conceptually much more complex
> and I therefore expected tons of overhead there.

I would not have expected much speed difference; Ada tasks map quite
nicely to Posix threads.

> My expectations were wrong, as the Ada version proved to be ~2x
> faster on my machine (with GNAT 4.4.0 and g++ 4.4.0, taken from the
> same package, both versions compiled with -O2).

Interesting!

Getting this benchmark on the shootout page would be nice :).

> After this exercise I have no choice (frankly) but to admit that
> built-in threading support is better *also* in terms of performance.
> Damned benchmarks... ;-)

You could explain this by saying the Ada implementation took advantage
of what it knows about tasking, while the C++ implementation did _not_
take advantage of any knowledge about the Posix tasking API.

So the challenge is to add that knowledge to g++, and see if it
actually helps.

I guess the issue is "what compiler/run-time tricks were used to make
the Ada code faster than the simple calls to Posix"?

You could try using -gnatD to see what lower-level constructs GNAT
used.

And there's some sort of "trace" facility in Linux, that reports calls
to "system" routines; that might be useful (I've never used it).

--
-- Stephe

Frederik Sausmikat

Aug 17, 2009, 4:04:15 AM
Maciej Sobczak wrote:

> My expectations were wrong, as the Ada version proved to be ~2x faster
> on my machine (with GNAT 4.4.0 and g++ 4.4.0, taken from the same
> package, both versions compiled with -O2).

Just FYI, on my machine the C++ version is the one that is ~2x
faster (both gnat and g++ are version 4.3.3 here).

Regards, Freddy

Maciej Sobczak

Aug 17, 2009, 4:14:22 AM
On 17 Sie, 09:12, Stephen Leake <stephen_le...@stephe-leake.org>
wrote:

> You could explain this by saying the Ada implementation took advantage
> of what it knows about tasking, while the C++ implementation did _not_
> take advantage of any knowledge about the Posix tasking API.

Exactly, but that was the point of the exercise.
In theory the applicability of compiler tricks is the same and the C++
compiler could do them as well, but in practice it does not.

BTW - I have tried to make it more difficult for Ada by splitting the
program into separate units (buffer, producer and consumer all in
separate packages), but the result was the same.

> I guess the issue is "what compiler/run-time tricks were used to make
> the Ada code faster than the simple calls to Posix"?

Right, this is my question.
Another idea that comes to mind is that the Ada version might be
using spinlocks instead of immediately waiting on a condition
variable, which in this particular scenario could avoid condvars
altogether.
Again, some insight from the GNAT implementers would be appreciated.

Maciej Sobczak

Aug 17, 2009, 4:17:08 AM
On 17 Sie, 10:04, Frederik Sausmikat <frederik.sausmi...@gmx.de>
wrote:

Interesting.

One of my professors used to say: statistics is like a bikini; what
it shows is very suggestive, but the most important stuff is still
hidden. ;-)

Tomek Wałkuski

Aug 17, 2009, 4:28:23 AM
On 17 Sie, 10:04, Frederik Sausmikat <frederik.sausmi...@gmx.de>
wrote:
> Just FYI, on my machine the C++ version is the one being ~2x faster
> (both gnat and g++ are version 4.3.3 here).
>
How many tests have you performed?

I have done ~20 runs of both versions and the Ada version is on
average ~1.5x faster.

Frederik Sausmikat

Aug 17, 2009, 5:39:42 AM

I just did 100 runs of each version with a constant number of
iterations and averaged the execution times. The result is the same
as before, with the C++ version being about 2x faster than the Ada
version.

Ludovic Brenta

Aug 17, 2009, 5:44:02 AM
Frederik Sausmikat wrote on comp.lang.ada:

I guess this must depend on the underlying platform as well as the
options used when building GCC and the test program, then.

--
Ludovic Brenta.

John McCabe

Aug 17, 2009, 7:12:20 AM
On Sun, 16 Aug 2009 14:34:11 -0700 (PDT), Maciej Sobczak
<see.my....@gmail.com> wrote:

>In another thread (pun intended) the discussion on language design
>with regard to threads/tasks got obstructed, so I'm starting a new
>one.

<..snip..>

Have you considered redoing this using the Boost.Thread library? I
imagine it's a fairly direct mapping onto the POSIX calls you've
used, by the looks of it, but...

John B. Matthews

Aug 17, 2009, 11:20:57 AM
In article
<f02a0d10-acaa-47d3...@d4g2000yqa.googlegroups.com>,
Maciej Sobczak <see.my....@gmail.com> wrote:

[...]


> My expectations were wrong, as the Ada version proved to be ~2x faster
> on my machine (with GNAT 4.4.0 and g++ 4.4.0, taken from the same
> package, both versions compiled with -O2).

FWIW, the Ada version appears to be ~25% faster on Mac OS X 10.5.8
with gcc 4.3.4.

$ make cleaner run
rm -f *.o *.ali b~* core
rm -f *.s a.out atest ctest
Darwin 9.8.0; gcc 4.3.4
gnatmake atest -cargs -O2 -bargs -shared -largs -dead_strip
gcc -c -O2 atest.adb
gnatbind -shared -x atest.ali
gnatlink atest.ali -shared-libgcc -dead_strip
Darwin 9.8.0; gcc 4.3.4
g++ -O2 ctest.cpp -o ctest
time ./atest 500000
Executed 500000 iterations
3.48 real 1.97 user 3.33 sys
time ./ctest 500000
Executed 500000 iterations
4.58 real 2.91 user 5.02 sys

[...]

--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>

vlc

Aug 17, 2009, 3:44:57 PM
On Aug 17, 9:12 am, Stephen Leake <stephen_le...@stephe-leake.org>
wrote:

> And there's some sort of "trace" facility in Linux, that reports calls
> to "system" routines; that might be useful (I've never used it).

It's called strace.

Ira Baxter

Aug 18, 2009, 4:36:01 PM

"Maciej Sobczak" <see.my....@gmail.com> wrote in message
news:f02a0d10-acaa-47d3...@d4g2000yqa.googlegroups.com...

> An important point that was discussed there was the potential gain
> from having tasking built into the language. I have argued that
> there is no *functional* gain (there is an obvious gain in
> readability, though), or rather that all compiler tricks can be
> performed with a standard API as well.

I can't speak for Ada, but it is clear that a compiler that understands
the existence of tasks can generate better code than one can get
from a pure library implementation. The compiler can preallocate
space for the task stack and task control blocks, can prefill those
areas as needed, can generate specialized calls on the scheduler,
can optimize away task interactions that it can prove are unnecessary, etc.
Finally, having the compiler know about the tasks can enable easier
expression of the task-based problem, easing coding effort,
debugging and maintenance.

We designed a parallel language, PARLANSE, to take advantage of
this for SMP computing. See
http://www.semanticdesigns.com/Products/PARLANSE
(It isn't intended to be competition for Ada :)


--
Ira Baxter, CTO
www.semanticdesigns.com

