Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

How much overhead for a new templated class?

31 views
Skip to first unread message

jacob navia

unread,
May 21, 2012, 4:37:27 PM5/21/12
to
I am writing a containers library in C. The generic list container
adds for a new type:

code: 2495 bytes
data: 488 bytes

for a list container containing around 60 entry points.

This means that you will pay 2983 bytes for each instatiation
of the list template (one of the biggest around)

I would be interested to know the corresponding vaue of the
C++ stl. How many bytes of code it will cost you to instantiate
a list template for a new type?

Thanks in advance.



Ian Collins

unread,
May 21, 2012, 5:16:46 PM5/21/12
to
> a list template for a new type?\

I'm sure you can check that for yourself, but with g++, about 1K to
instantiate a list<int> and push_back a value.

--
Ian Collins

Dombo

unread,
May 21, 2012, 6:36:41 PM5/21/12
to
Op 21-May-12 22:37, jacob navia schreef:
That depends on the compiler, compiler settings, the type for which the
list template is instantiated and the other types are instantiated.

For example:

#include <list>
#include <string>

class A
{
public:
A(): foo(0) {}
long foo;
};

class B
{
public:
B() {}
std::string bar;
};

int main()
{
std::list<void*> lpv;
std::list<A*> lpa;
std::list<B*> lpb;
std::list<A> la;
std::list<B> lb;

lpv.push_back(new A);
lpa.push_back(new A);
lpb.push_back(new B);
la.push_back(A());
lb.push_back(B());

return 0;
}


If you compile this code Visual C++ 2010 with optimizations enabled all
push_back() method calls result in the same code being called regardless
of the list type (in other words the code is not replicated for each
template instantiation). The code for the constructors is also shared
between std::list<void*>, std::list<A*>, std::list<B*> and std::list<A>,
but not for std::list<B>; std::list<B> has its own constructor code.

My observation is that template instantiation for pointer types often
share code with template instantiations for other pointer types. I.e.
instantiating a list template with another pointer type in itself does
not add code. With non-pointer types only some parts, if any, are shared
between template instantiations. Of course YMMV because of all the
variables involved that can affect the outcome.

Ian Collins

unread,
May 21, 2012, 7:28:18 PM5/21/12
to
On 05/22/12 08:37 AM, jacob navia wrote:
Rather than fussing over a few bytes, how about a performance comparison?

Here's a simple (related to some work I have been doing recently) benchmark:

Generate a list of 10,000,000 random longs (call it random)
use that list to instantiate another list (call it list)
sort list
sum and empty list by removing the first entry until empty.

What would be your containers library solution and how does it compare
to this C++ solution?

#include <list>
#include <iostream>
#include <stdint.h>
#include <stdlib.h>

// For system hi-res timer, add your own here.
//
int64_t hiresTime();

const unsigned entries = 10000000;

typedef std::list<long> List;

void fill( List& list )
{
srand48(42);
for( unsigned n = 0; n < entries; ++n )
list.push_back(lrand48());
}

uint64_t get( List& list )
{
uint64_t result(0);
while( !list.empty() )
{
result += list.front();
list.pop_front();
}
return result;
}

double delta( int64_t start, int64_t now )
{
return (now-start) / 1000000000.0;
}

int main()
{
using std::cout;
using std::endl;

List random;

int64_t start = hiresTime();
fill( random );
int64_t now = hiresTime();

cout << "To generate " << delta(start, now) << 'S' << endl;

start = now;
List list(random);
now = hiresTime();

cout << "To create " << delta(start, now) << 'S' << endl;

start = now;
list.sort();
now = hiresTime();

cout << "To sort " << delta(start, now) << 'S' << endl;

start = now;
uint64_t n = get( list );
now = hiresTime();

cout << "To empty " << delta(start, now) << 'S' << endl;

cout << n << endl;
}

--
Ian Collins

Ian Collins

unread,
May 21, 2012, 7:31:33 PM5/21/12
to
On 05/22/12 11:28 AM, Ian Collins wrote:
> On 05/22/12 08:37 AM, jacob navia wrote:
>> I am writing a containers library in C. The generic list container
>> adds for a new type:
>>
>> code: 2495 bytes
>> data: 488 bytes
>>
>> for a list container containing around 60 entry points.
>>
>> This means that you will pay 2983 bytes for each instatiation
>> of the list template (one of the biggest around)
>>
>> I would be interested to know the corresponding vaue of the
>> C++ stl. How many bytes of code it will cost you to instantiate
>> a list template for a new type?
>
> Rather than fussing over a few bytes, how about a performance comparison?
>
> Here's a simple (related to some work I have been doing recently) benchmark:
>
> Generate a list of 10,000,000 random longs (call it random)
> use that list to instantiate another list (call it list)
> sort list
> sum and empty list by removing the first entry until empty.

I forget to add include error checking for allocation failure.

--
Ian Collins

Juha Nieminen

unread,
May 22, 2012, 1:59:26 AM5/22/12
to
jacob navia <ja...@spamsink.net> wrote:
> I would be interested to know the corresponding vaue of the
> C++ stl. How many bytes of code it will cost you to instantiate
> a list template for a new type?

It depends on which member functions of the templated class you are
calling. The compiler will instantiate only those functions that get
called and skip the rest.

(This is, in fact, not just an optimization, but a *necessity*. The compiler
*must* instantiate only what is being called because some templated classes
actually depend on this behavior.)

Juha Nieminen

unread,
May 22, 2012, 2:05:46 AM5/22/12
to
Ian Collins <ian-...@hotmail.com> wrote:
> Rather than fussing over a few bytes, how about a performance comparison?

A performance comparison would be quite moot because the biggest bottleneck
in the whole thing is memory allocation. By far. A performance comparison
would only make sense if one version uses more memory allocations than the
other (for example, I have seen "generic" linked lists in C that actually
make two allocations per element rather than the one that std::list makes,
hence basically making the C version twice as slow as std::list).

It can be quite surprising how much time the program spends in memory
allocation alone. For instance, if you instantiate a std::set and add
some tens of millions of elements to it, it will take many seconds on
a typical computer. You'd think that the majority of the time is spent
re-balancing the tree after each insertion. You'd be wrong. Something
like 80-90% of the time is spent allocating memory!

Using a very fast allocator with std::set or std::list can speed it up
quite considerably, if many insertions and deletions are performed.
(The great thing about the C++ standard data containers is that they
support user-created memory allocators, something that's way more
difficult to do in a "generic" C container; as everything else.)

jacob navia

unread,
May 22, 2012, 2:45:24 AM5/22/12
to
Le 22/05/12 08:05, Juha Nieminen a écrit :
> Ian Collins<ian-...@hotmail.com> wrote:
>> Rather than fussing over a few bytes, how about a performance comparison?
>
> A performance comparison would be quite moot because the biggest bottleneck
> in the whole thing is memory allocation. By far.

Yes. With a fast allocator, my software runs at the same
speed than C++. This due to bottleneck is in:

Memory I/O
Allocation

> A performance comparison
> would only make sense if one version uses more memory allocations than the
> other (for example, I have seen "generic" linked lists in C that actually
> make two allocations per element rather than the one that std::list makes,
> hence basically making the C version twice as slow as std::list).
>

My software makes one allocation per element.

> It can be quite surprising how much time the program spends in memory
> allocation alone. For instance, if you instantiate a std::set and add
> some tens of millions of elements to it, it will take many seconds on
> a typical computer. You'd think that the majority of the time is spent
> re-balancing the tree after each insertion. You'd be wrong. Something
> like 80-90% of the time is spent allocating memory!
>

That number is correct, that's why I decided to use (for single linked
lists)

struct element {
struct element *Next;
char data[1];
};

> Using a very fast allocator with std::set or std::list can speed it up
> quite considerably, if many insertions and deletions are performed.
> (The great thing about the C++ standard data containers is that they
> support user-created memory allocators, something that's way more
> difficult to do in a "generic" C container; as everything else.)

My software supports user supplied allocators. The default allocator
object is:

typedef struct tagAllocator {
void *(*malloc)(size_t); /* Function to allocate memory */
void (*free)(void *); /* Function to release it */
void *(*realloc)(void *,size_t);/* Function to resize a block of
memory */
void *(*calloc)(size_t,size_t);
} ContainerAllocator;

You can replace it with your own allocator. For instance you can
supply a GC (Boehm's for instance) allocator, or a custom one,
as you wish.

Ian Collins

unread,
May 22, 2012, 4:07:59 AM5/22/12
to
On 05/22/12 06:05 PM, Juha Nieminen wrote:
> Ian Collins<ian-...@hotmail.com> wrote:
>> Rather than fussing over a few bytes, how about a performance comparison?
>
> A performance comparison would be quite moot because the biggest bottleneck
> in the whole thing is memory allocation. By far. A performance comparison
> would only make sense if one version uses more memory allocations than the
> other (for example, I have seen "generic" linked lists in C that actually
> make two allocations per element rather than the one that std::list makes,
> hence basically making the C version twice as slow as std::list).

Hence my simple use of std::list, I just wanted some idea how the two
compare (particularly with regard to the sort) before any optimisations
are added.

--
Ian Collins

Ian Collins

unread,
May 22, 2012, 4:09:59 AM5/22/12
to
On 05/22/12 06:45 PM, jacob navia wrote:
> Le 22/05/12 08:05, Juha Nieminen a écrit :
>> Ian Collins<ian-...@hotmail.com> wrote:
>>> Rather than fussing over a few bytes, how about a performance comparison?
>>
>> A performance comparison would be quite moot because the biggest bottleneck
>> in the whole thing is memory allocation. By far.
>
> Yes. With a fast allocator, my software runs at the same
> speed than C++. This due to bottleneck is in:

Can you provide a simple, unoptimised example? I'm genuinely curios how
the two compare both before and after allocator optimisation.

--
Ian Collins

A. Bolmarcich

unread,
May 22, 2012, 4:55:04 PM5/22/12
to
On 2012-05-22, Juha Nieminen <nos...@thanks.invalid> wrote:
> Using a very fast allocator with std::set or std::list can speed it up
> quite considerably, if many insertions and deletions are performed.
> (The great thing about the C++ standard data containers is that they
> support user-created memory allocators, something that's way more
> difficult to do in a "generic" C container; as everything else.)

An alternative with C is to put the responsibility for allocation on the
program using containers rather on the functions that implement the containers.
An object that could be in a container would be a struct that included a member
used by the functions that implement the containers.

For a singly linked list the type of that member would be a pointer to a
struct that the field is in, as in:

struct point3d {
int x, y, z;
struct point3d *link;
}

Functions to add an object to or remove it from a container would use that
field and would not allocate container objects.

In general, the link field would be a struct with multiple members.
0 new messages