
ANN: Storage pool for Ada 2005 with bindings to Apache Runtime Pools library


Brad Moore

Mar 24, 2011, 10:00:04 AM
This is the initial release of a storage pool for Ada 2005
called Deepend, which binds to the Apache Portable Runtime (APR)
Pools library.

Key features
- The pool may deallocate all of its storage at once, rather than
having to perform Unchecked_Deallocation one object at a time.
- There is no need to call Unchecked_Deallocation with this pool;
it is essentially a no-op.
- Provides subpool capabilities, where a pool object may be a
subpool of another pool object. The lifetime of a subpool
object extends to the lifetime of the ultimate top-level pool
object. Subpools may in turn have subpools of their own.
- Fast storage management that should be more efficient than the
garbage-collection strategies used in other languages.
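The subpool lifetime rules above can be sketched with a toy model (in
Python rather than Ada, purely to illustrate the semantics; the class
and method names here are illustrative, not Deepend's actual API):

```python
class Pool:
    """Toy arena allocator: objects persist until this pool, or any
    ancestor pool, is destroyed; no per-object deallocation."""

    def __init__(self, parent=None):
        self.objects = []
        self.subpools = []
        if parent is not None:
            parent.subpools.append(self)

    def subpool(self):
        # A subpool's lifetime is bounded by its parent's lifetime.
        return Pool(parent=self)

    def allocate(self, obj):
        self.objects.append(obj)   # no matching free is ever needed
        return obj

    def destroy(self):
        # Destroying a pool reclaims its subpools first, then its own
        # storage: the "deallocate everything at once" model.
        for sp in self.subpools:
            sp.destroy()
        self.objects.clear()
        self.subpools.clear()

root = Pool()
child = root.subpool()
child.allocate("tree node")
root.destroy()   # reclaims the subpool's storage as well
```

The point of the sketch is the destroy cascade: releasing a top-level
pool releases every subpool beneath it in one operation.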

The latest stable release and older releases may be downloaded from:

https://sourceforge.net/projects/deepend/files/

Those who want the current development version of the source can
fetch it with git (http://git-scm.com/) by issuing the following
commands:

mkdir sandbox
cd sandbox
git clone git://deepend.git.sourceforge.net/gitroot/deepend/deepend

The current development version will typically correspond to the latest
stable release, but may at times be unstable while new features are
being worked on.


Low-level bindings to the Apache Runtime Pools library were recently
used for a submission to the Computer Language Benchmarks Game's
binary-trees benchmark, and moved Ada into the number 2 spot behind C.
On my machine, the Ada version actually runs 10% faster than the C
version, but for some reason the benchmark has C ahead of Ada.
It may be that the number of worker threads isn't tuned correctly for
the benchmark hardware, or that there are compiler version differences,
or other differences related to the target platform.

See
http://shootout.alioth.debian.org/u64q/benchmark.php?test=binarytrees&lang=all

Although the submission code does not use Deepend, it has since been
reworked to use Deepend to see whether going through Ada's storage
pool mechanism affects performance, and no noticeable performance
impact was found.

Brad Moore

Shark8

Mar 24, 2011, 11:54:59 AM
Nicely done.

Brian Drummond

Mar 24, 2011, 11:59:48 AM
On Thu, 24 Mar 2011 08:00:04 -0600, Brad Moore wrote:

> This is the initial release of a storage pool for Ada 2005 called
> Deepend, that binds to the Apache Runtime Pools library.
>

> The latest stable release and older releases may be downloaded from;
>
> https://sourceforge.net/projects/deepend/files/

Excellent!

> Low-level Bindings to the Apache Runtime Pools library were recently
> used for a submission to the Computer Language Benchmarks game, binary
> tree benchmark, and moved Ada into the number 2 spot behind C. On my
> machine, the Ada version actually runs 10% faster than the C version,
> but for some reason the benchmark has C ahead of Ada. It may be that the
> number of worker threads isn't tuned correctly for the benchmark
> hardware, or compiler version differences, or other differences related
> to the target platform.

Great work, and certainly blows the doors off my puny efforts!

You may be right about tuning the number of threads; on my (AMD Phenom)
system, my version (#3) gave the same runtime for 4 or 8 tasks, but on
the test system (Intel Q6600) 8 tasks was about 10% slower than 4. (The
memory footprint was doubled, suggesting memory or cache limitations on
the Intel system).

It may be worth posting the Deepend version - either there, or is there a
place on Rosetta for it? - as a demonstration of the flexibility of Ada's
storage pools.

- Brian

Brad Moore

Mar 24, 2011, 5:25:44 PM
On 24/03/2011 9:59 AM, Brian Drummond wrote:
> Great work, and certainly blows the doors off my puny efforts!
>
> You may be right about tuning the number of threads; on my (AMD Phenom)
> system, my version (#3) gave the same runtime for 4 or 8 tasks, but on
> the test system (Intel Q6600) 8 tasks was about 10% slower than 4. (The
> memory footprint was doubled, suggesting memory or cache limitations on
> the Intel system).
>
> It may be worth posting the Deepend version - either there, or is there a
> place on Rosetta for it? - as a demonstration of the flexibility of Ada's
> storage pools.
>
> - Brian

Thanks for your version also. In particular, the output-generation code
in your version saved me from having to fiddle around with getting the
output format right.

I actually set the number of workers to 5, which was a bit surprising to
me. I believe there are 9 iterations, which is why the number
of workers doesn't come out to an even number. On my system, an AMD
quad-core, 5 workers gave me the best time. I was thinking 9 would
have been the best number.

It may be that 4 is a better number on their machine. Perhaps I should
ask the maintainers of the benchmark to try running with 4 workers to
see if that does any better.

I thought about posting the Deepend version. (There are actually two
versions: one that uses nested access types and relies on Ada's ability
to clean up objects when access types get finalized, allocating with
the "new" operator; and a second that calls Deepend's generic allocate
procedure, which lets you use a single access type with different
pool objects.) The reason I decided against posting was mostly that
the version already there involves less source code, and might be
better for language comparisons.

I'm not aware of Rosetta. I'll see if I can find that site.

Thanks,
Brad

Brad Moore

Mar 25, 2011, 1:25:53 AM
On 24/03/2011 3:25 PM, Brad Moore wrote:
> I actually set the number of workers to 5, which was a bit surprising to
> me. I believe there are 9 iterations, which is why the number
> of workers doesn't come out to an even number. On my system, an AMD
> quad-core, 5 workers gave me the best time. I was thinking 9 would
> have been the best number.

Actually, thinking about it some more, it makes sense to me that 5
workers would be the best choice for 9 iterations and 4 processors.

Initially, all 5 workers should proceed at the same rate (assuming
that processor affinity is not set on the tasks). The 5 workers should
migrate as needed between the 4 processors to ensure fair sharing
of the processing resources.
Four of the workers will be given two iterations, while one will be
given a single iteration.

The worker with the single iteration will finish first, at which time
the other workers should each have roughly one full iteration left.
From that point on there are four workers with even workloads on
four processors.

The workers then proceed until all the work is complete, and all
processors remain fully loaded for the entire run.
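This reasoning can be checked with a short simulation under an
idealized fair-share model: each of the n still-running workers gets
min(1, processors/n) of a processor, and every iteration costs one
unit of work. This is only a sketch; it ignores task migration and
cache effects:

```python
def makespan(loads, cpus):
    """Finish time of workers with the given iteration counts under
    idealized fair-share scheduling on `cpus` processors."""
    remaining = sorted(loads)
    t = 0.0
    while remaining:
        n = len(remaining)
        rate = min(1.0, cpus / n)             # per-worker progress rate
        dt = remaining[0] / rate              # time until smallest load finishes
        t += dt
        remaining = [r - rate * dt for r in remaining[1:]
                     if r - rate * dt > 1e-9]
    return t

# 9 iterations split among workers on a 4-processor machine:
five = makespan([2, 2, 2, 2, 1], cpus=4)      # 5 workers, loads 2,2,2,2,1
four = makespan([3, 2, 2, 2], cpus=4)         # 4 workers, loads 3,2,2,2
nine = makespan([1] * 9, cpus=4)              # 9 workers, one iteration each
```

Under this toy model, 5 workers and 9 workers both finish in about
2.25 time units, while 4 workers take 3.0 (the worker holding three
iterations finishes last while other processors sit idle); real
timings will of course differ.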

Brad
