
alloca and c++/proposal


Gianni Mariani

Mar 17, 2003, 12:21:22 AM

alloca provides semantics that C++ classes just don't have. It's
something I've missed in C++ - especially in a real-time method, where
malloc/heap allocation is the wrong thing.

I looked through Boost - nothing! (maybe I need to take another look :)

So I came up with this, this morning - BUT I'm boosted if I could find
a way to get the exception-in-constructor semantics correct without
making the interface suck. I also could not find a way to avoid using
***UGGGH**** macros!

Basically, it works like this:

AllocAAware(); // Place this macro at the beginning of a method

// create objects on the stack.... at whim
// vvvvvvvvvvvvvvvvvv - constructor parameters
YY * l_yy = AllocANew(YY)( "this is a yy" );

// create objects on the stack.... without constructor parameters
XX * l_xx = AllocANew(XX);


Some optimizations one could make are:

a) Objects with trivial destructors don't get placed in the list of
destructors to be called.

b) Provide a method to destroy an object before the enclosing method returns.

c) A similar technique could be used in an allocator where all objects in
the allocator can be removed.

I propose that the next revision of the C++ standard provide a way to
1. "new" objects as "auto" memory in a similar fashion to this example.
2. Correctly deallocate memory allocated within blocks (per semantics of
AllocABlockNew below).

//
// alloca_cpp.h
//
// This is an example of how to use alloca in a way more
// consistent with C++.
//
// There *ARE* a number of issues with object alignment and with
// exceptions thrown in the constructor or destructor - especially the
// constructor, because the destructor WILL be called even if the
// constructor fails.
//
// Objects allocated with AllocANew are deleted when the method returns
// - just like a local variable.
//


#include <iostream>
#include <string>
#include <stdlib.h>
#include <alloca.h>  // non-standard; on some platforms alloca is declared in <malloc.h> or <stdlib.h>

class AllocaHead;

class AllocAElement
{
public:

    AllocAElement * m_next;

    virtual void Cleanup() = 0;

protected:

    void LinkUp( AllocaHead * io_head );
};

class AllocaHead
{
public:

    AllocAElement * m_head;

    AllocaHead()
      : m_head( 0 )
    {
    }

    ~AllocaHead();

};

void AllocAElement::LinkUp( AllocaHead * io_head )
{
    m_next = io_head->m_head;
    io_head->m_head = this;
}

inline AllocaHead::~AllocaHead()
{
    AllocAElement * l_next;

    while ( ( l_next = m_head ) != 0 ) {

        m_head = l_next->m_next;

        try
        {
            l_next->Cleanup();
        }
        catch ( ... )
        {
        }
    }
}


template<class XT>
class AllocAWrapper
  : public AllocAElement
{

public:

    inline static int Size()
    {
        return sizeof( AllocAWrapper<XT> ) + sizeof( XT );
    }

    AllocAWrapper( AllocaHead * io_head )
    {
        LinkUp( io_head );
    }

    inline XT * Get()
    {
        return reinterpret_cast<XT *>( this + 1 );
    }

    virtual void Cleanup()
    {
        Get()->~XT();
    }

};


// ======== AllocANew =================================================
/**
* AllocANew is used instead of new. Objects constructed using AllocANew
* can only be destroyed when the method returns.
*/

#define AllocANew(_TX) new ((\
_alloca_l_helper = alloca( AllocAWrapper<_TX>::Size() ),\
(new(static_cast<AllocAWrapper<_TX>*>(_alloca_l_helper))AllocAWrapper<_TX>(_alloca_l_ah))->Get()\
)) _TX\
// macro end


// ======== AllocABlockNew ============================================
/**
* AllocABlockNew can be used instead of new. Objects constructed
* using AllocABlockNew are destroyed when the block containing the
* AllocABlockAware goes out of scope. NOTE::: Memory allocated is not
* reclaimed until the method returns (this may make this whole thing a
* bad idea).
*/

#define AllocABlockNew(_TX) new ((\
_alloca_l_bhelper = alloca( AllocAWrapper<_TX>::Size() ),\
(new(static_cast<AllocAWrapper<_TX>*>(_alloca_l_bhelper))AllocAWrapper<_TX>(_alloca_l_bah))->Get()\
)) _TX\
// macro end

// ======== AllocAAware ===============================================
/**
* AllocAAware must be placed at the beginning of any method that uses
* AllocANew.
*/

#define AllocAAware() \
AllocaHead _alloca_l_ah[1]; \
register void * _alloca_l_helper \
// end macro

// ======== AllocABlockAware ==========================================
/**
* AllocABlockAware must be placed at the beginning of the block in
* which the AllocABlockNew objects must be retained.
*/

#define AllocABlockAware() \
AllocaHead _alloca_l_bah[1]; \
register void * _alloca_l_bhelper \
// end macro

/////////////////////////////////////
//
//
// example for alloca_cpp
//

class XX
{
public:

    std::string xx;

    XX()
    {
        std::cout << "XX constructor\n";
    }

    ~XX()
    {
        std::cout << "XX destructor\n";
    }

};

class YY
{
public:

    std::string xx;

    YY( const char * i_str )
      : xx( i_str )
    {
        std::cout << "YY constructor " << xx << "\n";
    }

    ~YY()
    {
        std::cout << "YY destructor " << xx << "\n";
    }

};

class ZZ
{
public:

    int xx;

    ZZ( int i_zz )
      : xx( i_zz )
    {
        std::cout << "ZZ constructor " << xx << "\n";
    }

    ~ZZ()
    {
        std::cout << "ZZ destructor " << xx << "\n";
    }

};


void foo()
{
    AllocAAware();

    XX * l_xx = AllocANew(XX);
    YY * l_yy = AllocANew(YY)( "this is a yy" );
    XX * l_xx1 = AllocANew(XX);

    for ( int i = 1; i < 10; i ++ )
    {
        AllocABlockAware();

        // WARNING - memory not reclaimed until method exits
        AllocABlockNew(ZZ)(i);
        AllocABlockNew(YY)("loopy");

        if ( i == 5 )
        {
            // Allocate something in the full method scope
            AllocANew(ZZ)(2222);
        }
    }
}


int main()
{
    foo();
}

---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std...@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]

Jack Klein

Mar 17, 2003, 2:33:43 PM
On Mon, 17 Mar 2003 05:21:22 +0000 (UTC), gi2n...@mariani.ws (Gianni
Mariani) wrote in comp.lang.c++:

>
> alloca provides semantics that C++ classes just don't have. It's
> somthing I've missed in C++ - especially in a real-time method,
> malloc/heap allocation is the wrong thing.

The only responses you received to your original post in comp.lang.c++
told you that your entire discussion was off-topic there.

comp.std.c++ is indeed the newsgroup to post to if you want to propose
a new language or library feature for a future version of the C++
standard. But this is still off-topic in comp.lang.c++, which deals
primarily with today's C++ standard. Cross-posting it to comp.std.c++
does not change this, so you should have left comp.lang.c++ out.

Two other points:

I still think that if you design code that needs to allocate memory in
"real-time methods", either you have a design flaw or you do not
really understand what real time means. It is not synonymous with
"really, really fast". Perhaps you should visit news:comp.realtime.

Also, before you can propose standardizing some templates and
functions built around alloca(), you really need to propose adding
alloca() to a future version of the C++ standard. Your post does not
address the merits of adding it at all, just talks about what you want
to do with it should that happen.

I would suggest you start by posting arguments for adding alloca()
itself to standard C++. If and when such a thing occurs, the rest of
your post would be topical for comp.lang.c++.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq

Alexander Terekhov

Mar 17, 2003, 3:24:49 PM

Jack Klein wrote:
>
> On Mon, 17 Mar 2003 05:21:22 +0000 (UTC), gi2n...@mariani.ws (Gianni
> Mariani) wrote in comp.lang.c++:
>
> >
> > alloca provides semantics that C++ classes just don't have. It's
> > somthing I've missed in C++ - especially in a real-time method,
> > malloc/heap allocation is the wrong thing.
>
> The only responses you received to your original post in comp.lang.c++
> told you that your entire discussion was off-topic there.

That's not true (well, there was a minute window, of course ;-) ).

<copy&paste>

Gianni Mariani wrote:
>
> Alexander Terekhov wrote:
> > Gianni Mariani wrote: [... a lot ...]
> >
> > Uhmm, what's wrong with something along the lines of:
> >
> > scoped_ptr_like_beast_that_simply_invokes_dtor< T >
> > p(new(throw_bad_alloc_if_null(alloca(sizeof T))) T(/*...*/));
> >
>
> If you infer you can call a method then this will not work because upon
> return from a method call, the alloca storage in question disappears.

How does this apply to the "code" above? It does the following:

1. calls alloca()

2. checks the result and throws if it's null

void* throw_bad_alloc_if_null(void* p) {
    if (!p) throw std::bad_alloc();
    return p;
}

3. invokes placement new to create an object

4. passes the address to a newly created local instance of
"scoped_ptr_like_beast_that_simply_invokes_dtor", which invokes the
destructor when it goes out of scope (a sketch of such a guard follows
the quoted exchange below).

[...]
> The main issues I have with this approach is defining what should be
> done if exceptions are thrown.

See above. Did I really miss something?

</copy&paste>
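A minimal sketch of the destructor-only guard described in the exchange above
(the class name is an illustrative assumption, not an actual Boost component):
it never deallocates, it only runs the destructor at scope exit, which is all
that alloca'd storage needs.

template <class T>
class dtor_only_guard
{
public:
    explicit dtor_only_guard( T * p ) : m_p( p ) {}
    ~dtor_only_guard() { m_p->~T(); }       // destroy, never deallocate

    T * operator->() const { return m_p; }
    T & operator*() const  { return *m_p; }
    T * get() const        { return m_p; }

private:
    T * m_p;
    dtor_only_guard( const dtor_only_guard & );             // non-copyable
    dtor_only_guard & operator=( const dtor_only_guard & );
};

// Usage, assuming alloca() and throw_bad_alloc_if_null() are available:
//   dtor_only_guard<T> p( new ( throw_bad_alloc_if_null( alloca( sizeof( T ) ) ) ) T( /*args*/ ) );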

regards,
alexander.

--
http://groups.google.com/groups?th=ed5dffd0a1e4d799

Allan W

Mar 17, 2003, 8:02:28 PM
gi2n...@mariani.ws (Gianni Mariani) wrote

> alloca provides semantics that C++ classes just don't have. It's
> somthing I've missed in C++

What is it about alloca that you miss? Any reason you can't just use
auto variables instead?

> - especially in a real-time method,
> malloc/heap allocation is the wrong thing.

Please give an example.

> Basically, it works like this:
>
> AllocAAware(); // Place this macro at the beginning of a method
>
> // create objects on the stack.... at whim
> // vvvvvvvvvvvvvvvvvv - constructor parameters
> YY * l_yy = AllocANew(YY)( "this is a yy" );
>
> // create objects on the stack.... without constructor parameters
> XX * l_xx = AllocANew(XX);

What does this accomplish that isn't handled by:
YY yy("this is a yy");
YY * l_yy = &yy;
XX xx;
XX * l_xx = &xx;

> Some optimizations one could make are:
>
> a) Objects with trivial destructors don't get placed in list of
> destructors to be called.
>
> b) Provide a method to destroy object before enclosing method returns
>
> c) Similar technique could be used in an allocator where all objects in
> an allocator can be removed.
>
> I propose that the next revision in the C++ standard provide a way to
> 1. "new" objects as "auto" memory in a similar fashion to this example.
> 2. Correctly deallocate memory allocated within blocks (per semantics of
> AllocABlockNew below).

Maybe I'm slow; I think I'm missing your point entirely.

Hyman Rosen

Mar 18, 2003, 12:21:19 AM
Allan W wrote:
> What is it about alloca that you miss?
> Any reason you can't just use auto variables instead?

Perhaps for the same reason that you use 'new',
instead of using static variables?

> Maybe I'm slow; I think I'm missing your point entirely.

Just like 'new', alloca lets you decide at runtime how much
space to allocate. It simply comes from a different memory
area, and is automatically released upon function exit.
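A small illustrative sketch (not from the original posts) of that point: the
size below is known only at run time, yet the storage still has automatic
lifetime, so nothing has to be freed explicitly.

#include <cstddef>
#include <cstring>
#include <alloca.h>  // non-standard; on some platforms alloca is declared in <malloc.h>

void make_c_string( const char * src, std::size_t n )
{
    // n is a run-time value; a plain auto array could not be sized like this
    char * buf = static_cast<char *>( alloca( n + 1 ) );
    std::memcpy( buf, src, n );
    buf[ n ] = '\0';
    // ... use buf ...
}   // buf's storage vanishes here - no free()/delete, no heap traffic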

E. Robert Tisdale

Mar 18, 2003, 2:51:25 PM
Jack Klein wrote:

> comp.std.c++ is indeed the newsgroup to post to if you want to propose
> a new language or library feature for a future version of the C++
> standard. But this is still off-topic in comp.lang.c, which deals
> primarily with today's C++ standard. Cross posting it to comp.std.c++
> does not change this, so you should have left comp.lang.c++ out.

Why did you crosspost your response to comp.std.c++?

Michael S

Mar 18, 2003, 2:52:31 PM

===================================== MODERATOR'S COMMENT:

Please trim quotes to a reasonable level when adding only
a single paragraph.

===================================== END OF MODERATOR'S COMMENT
gi2n...@mariani.ws (Gianni Mariani) wrote in message news:<b52hqv$4...@dispatch.concentric.net>...

Please take into account that alloca is not free, at least not on
architectures with a small number of general-purpose registers. Alloca
costs you one GPR (the frame pointer), because you can no longer use the
stack pointer to access local variables/parameters. It just so happens
that the most popular architecture in the world (x86) has only seven GPRs.
Any further reduction of this already small number is unlikely to be
welcome.

Francis Glassborow

Mar 19, 2003, 2:51:48 PM
In article <b52hqv$4...@dispatch.concentric.net>, Gianni Mariani
<gi2n...@mariani.ws> writes

>alloca provides semantics that C++ classes just don't have. It's
>somthing I've missed in C++ - especially in a real-time method,
>malloc/heap allocation is the wrong thing.

But, IIRC, alloca is not part of Standard C either. C99 goes some way to
solving this problem with its VLAs, and I guess C++ will need to look at
those to see if they are a cost-effective addition for C++0x.

--
ACCU Spring Conference 2003 April 2-5
The Conference you cannot afford to miss
Check the details: http://www.accuconference.co.uk/
Francis Glassborow ACCU

Gianni Mariani

Mar 19, 2003, 2:53:36 PM
Allan W wrote:
> gi2n...@mariani.ws (Gianni Mariani) wrote
>
>>alloca provides semantics that C++ classes just don't have. It's
>>somthing I've missed in C++
>
>
> What is it about alloca that you miss? Any reason you can't just use
> auto variables instead?
>
>
>>- especially in a real-time method,
>>malloc/heap allocation is the wrong thing.
>
>
> Please give an example.

I can construct all kinds of examples, nothing simple.

Say I want to construct a temporary list of strings.

class string_node
{
public:
    string_node * next;
    std::string str;

    string_node( string_node * i_next, const std::string & i_str )
      : next( i_next ),
        str( i_str )
    {
    }
};

void someclass::foo( string_provider & provider )
{
    AllocAAware();

    // create a temporary list of strings

    string_node * list = 0;
    int count = 0;

    std::string l_str;

    while ( provider.get_next( l_str ) )
    {
        list = AllocANew(string_node)( list, l_str );
        count ++;

        if ( count > magic_number )
        {
            return; // by magic - the list is deleted
        }
    }

    if ( count == magic_number )
    {
        process_list( list );
    }

    return; // by magic - the list is deleted here
}

You could use this in some kind of deserializer where you're validating
the number of strings in the input before processing. In reality I
would likely use a different "string" class.

This is just one of many examples.

There should also be an AllocANewArray(T,int) version, as in most cases
where I would use one of these it would be as an array.

... I can find lots of examples. When I used C - many moons ago - I used
alloca quite a lot; I've obviously avoided it in C++.


>
>
>>Basically, it works like this:
>>
>> AllocAAware(); // Place this macro at the beginning of a method
>>
>> // create objects on the stack.... at whim
>> // vvvvvvvvvvvvvvvvvv - constructor parameters
>> YY * l_yy = AllocANew(YY)( "this is a yy" );
>>
>> // create objects on the stack.... without constructor parameters
>> XX * l_xx = AllocANew(XX);
>
>
> What does this accomplish that isn't handled by:
> YY yy("this is a yy");
> YY * l_yy = &yy;
> XX xx;
> XX * l_xx = &xx;

It's an example of usage of the interface. It was not meant to be
anything else.

Geert-Jan Giezeman

Mar 19, 2003, 2:54:09 PM

"Gianni Mariani" <gi2n...@mariani.ws> wrote in message
news:b52hqv$4...@dispatch.concentric.net...

>
> alloca provides semantics that C++ classes just don't have. It's
> somthing I've missed in C++ - especially in a real-time method,
> malloc/heap allocation is the wrong thing.

Perhaps it is interesting to see how C# deals with this. In unsafe mode it
has a keyword (stackalloc) for exactly this purpose.
See http://www.jaggersoft.com/csharp_standard/25.7.htm

From the example there:
{
...
char* buffer = stackalloc char[16];
...
}

a stackalloc initializer is used to allocate a buffer of 16 characters on
the stack. The buffer is automatically discarded when the method returns.

Ron Natalie

Mar 19, 2003, 5:42:28 PM

"Gianni Mariani" <gi2n...@mariani.ws> wrote in message news:b56cvk$a...@dispatch.concentric.net...

> You could use this in some kind of deserializer where you're validating
> the number of strings in the input before processing. In reality I
> would likely use a different "string" class.

This is C++ after all. Some sort of specialized container defined as an
automatic variable will perform the same function.

> ... I can find lots of examples. When I used C - many moons ago, I used
> alloca quite alot, I've obviously avoided it using C++.

I have been programming performance applications and operating systems
in C since 1977. I have never found a compelling need to use alloca, even
when it was available. I have less tolerance for such kludges in C++.

Kaz Kylheku

Mar 20, 2003, 1:15:52 PM
gi2n...@mariani.ws (Gianni Mariani) wrote in message news:<b52hqv$4...@dispatch.concentric.net>...
> alloca provides semantics that C++ classes just don't have. It's
> somthing I've missed in C++ - especially in a real-time method,
> malloc/heap allocation is the wrong thing.

The alloca function is a nonstandard library extension; it is not in
the ANSI/ISO C language.

The C language now has variable length arrays to give most of the
semantics of alloca behind a saner interface.

It doesn't make much sense to work VLAs into C++ other than for C
compatibility, because the std::vector type already exists.

Nothing prevents a compiler implementor from stack-allocating a
std::vector object, particularly one that receives an initial size in
its constructor and is not thereafter resized for the remainder of its
lifetime.

That is to say, the definition of a std::vector variable can, in
principle, be recognized by the compiler and handled specially.

This is an optimization that need not leak into the programming
language syntax and semantics. It's already enough of a hint to the
compiler that the vector object is declared in automatic storage.
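A hedged sketch (illustrative names only) of the pattern being described: the
vector is sized once at construction and never resized, so an implementation
could, in principle, back it with stack or pooled storage without any new
syntax.

#include <cstddef>
#include <vector>

double sum_of_squares( std::size_t n )
{
    std::vector<double> work( n );      // sized once, never resized below

    for ( std::size_t i = 0; i < n; ++i )
        work[ i ] = double( i ) * double( i );

    double total = 0.0;
    for ( std::size_t i = 0; i < n; ++i )
        total += work[ i ];
    return total;
}                                       // released deterministically here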

If your vendor is reluctant to provide this optimization under the
existing language, chances are they will be reluctant also to support
it under the guise of new language features.

In other words, you don't necessarily get ahead by getting a committee
to throw more syntax at an issue.

Allan W

Mar 20, 2003, 3:40:20 PM
> Allan W wrote:
> > Please give an example.

gi2n...@mariani.ws (Gianni Mariani) wrote

std::auto_ptr<std::list<std::string> > list(new std::list<std::string>);
std::string l_str;

while ( provider.get_next( l_str ) ) {

    list->push_front(l_str);
    if (list->size() > magic_number)
        return; // By magic -- the list is deleted
}
if ( list->size() == magic_number ) process_list(*list);

return; // by magic - the list is deleted here
}

---

Allan W

Mar 20, 2003, 3:40:34 PM
gi2n...@mariani.ws (Gianni Mariani) wrote

> Say I want to construct a temporary list of strings.
>
> class string_node
> {
> public:
> string_node * next;
> std::string str;
> string_node( string_node * i_next, const std::string & i_str )
> : next( i_next ),
> str( i_str )
> {
> }
> };
>
> void someclass::foo( string_provider & provider )
> {
> AllocAAware();
>
> // create a temporary list of strings
>
> string_node * list = 0;
> int count = 0;
>
>
> std::string l_str;
>
> while ( provider.get_next( l_str ) )
> {
> list = AllocANew(string_node)( list, l_str );
> count ++;
>
> if ( count > magic_number )
> {
> return; // by magic - the list is deleted
> }
> }
>
> if ( count == magic_number )
> {
> process_list( list );
> }
>
> return // by magic - the list is deleted here
> }

I didn't point it out in the message I posted a moment ago -- but this
would have undefined behavior. When the memory containing std::string
is released without calling the std::string destructor, the best you
can hope for is a memory leak.

Allan W

Mar 20, 2003, 4:25:39 PM
hyr...@mail.com (Hyman Rosen) wrote in message

> Just like 'new', alloca lets you decide at runtime how much
> space to allocate.

Just like malloc() then, too.

> It simply comes from a different memory
> area,

Specifically the stack. But why is that superior?

You're not supposed to rely on the machine address in a pointer... you're
just supposed to use it as

> and is automatically released upon function exit.

Aha! The very definition of alloca() in C.

But an auto_ptr variable also automatically releases memory... AND it
properly cleans up (i.e. calls the destructor).

Neil Butterworth

Mar 20, 2003, 4:38:25 PM

"Allan W" <all...@my-dejanews.com> wrote in message
news:7f2735a5.03032...@posting.google.com...

> hyr...@mail.com (Hyman Rosen) wrote in message

> Just like malloc() then, too.


>
> > It simply comes from a different memory
> > area,
>
> Specifically the stack. But why is that superior?

Possibly because stack allocation, which simply involves changing a
processor register, is typically much faster than dynamic allocation, which
may involve system calls etc.


NeilB

Ron Natalie

Mar 20, 2003, 5:02:19 PM

"Neil Butterworth" <neil_but...@lineone.net> wrote in message news:3e7a3...@mk-nntp-1.news.uk.worldonline.com...

>
> Possibly because stack allocation, which simply involves changing a
> processor register, is typically much faster than dynamic allocation, which
> may involve system calls etc.
>
Stack allocation, once it's more than a trivial amount, will most likely cause
a page fault on most architectures, so it incurs a similar amount of overhead.


Gianni Mariani

Mar 21, 2003, 2:32:52 AM
Allan W wrote:
> gi2n...@mariani.ws (Gianni Mariani) wrote
>

>
>

> I didn't point it out in the message I posted a moment ago -- but this
> would have undefined behavior. When the memory containing std::string
> is released without calling the std::string destructor, the best you
> can hope for is a memory leak.
>


I really don't know what you're referring to. If you refer to the
example in the original posting, whose interface I used in my previous
post, then the destructor IS called.

Your example accomplishes essentially the same thing; however, it has a
significantly higher performance cost and cannot be considered real-time.

G

LLeweLLyn

Mar 21, 2003, 12:18:35 PM
all...@my-dejanews.com (Allan W) writes:

> hyr...@mail.com (Hyman Rosen) wrote in message
> > Just like 'new', alloca lets you decide at runtime how much
> > space to allocate.
>
> Just like malloc() then, too.
>
> > It simply comes from a different memory
> > area,
>
> Specifically the stack. But why is that superior?

Better cache coherency. Less overhead.

> You're not supposed to rely on the machine address in a pointer... you're
> just supposed to use it as
>
> > and is automatically released upon function exit.
>
> Aha! The very definition of alloca() in C.
>
> But an auto_ptr variable also automatically releases memory... AND it
> properly cleans up (i.e. calls the destructor).

[snip]

Which is why the OP raised this issue - he wants stack objects with
runtime-determined size, and he wants their destructors called on
scope exit.

Ron Natalie

Mar 21, 2003, 1:56:58 PM

"LLeweLLyn" <llewe...@xmission.dot.com> wrote in message news:m1bs05p...@localhost.localdomain...

>
> Better cache coherency. Less overhead.

If you're talking about more than a few bytes, the above statement isn't even right.

> Which is why the OP raised this issue - he wants stack objects with
> runtime-determined size, and he wants their destructors called on
> scope exit.

That's why we have auto class variables in C++. Then it's not even a special
case whether you allocate them on the stack, as a subobject of another object,
with static duration, etc. ... they all very deterministically clean themselves up at the
end of their determined lifetime.

Allan W

Mar 21, 2003, 4:12:39 PM
gi2n...@mariani.ws (Gianni Mariani) wrote

> Allan W wrote:
> > I didn't point it out in the message I posted a moment ago -- but this
> > would have undefined behavior. When the memory containing std::string
> > is released without calling the std::string destructor, the best you
> > can hope for is a memory leak.
>
> I really don't know what you're referring to. If you refer to the
> example in the original posting, whose interface I used in my previous
> post, then the destructor IS called.

Maybe I read it wrong -- and I don't have time to re-read it.
But if AllocANew() works similarly to alloca() -- that is, the memory
simply disappears when the current function ends -- then nothing
would have called the destructor.

> Your example performs essentially the same thing however has a
> significantly higher cost on performance and cannot be considered real-time.

According to some experts that I trust, *NO* program that performs
dynamic allocation can be considered real-time. Real-time systems
are likely to allocate as much memory as needed during initialization,
and then never touch dynamic allocation again until shutdown.

Gianni Mariani

Mar 24, 2003, 6:29:00 PM
Allan W wrote:
> gi2n...@mariani.ws (Gianni Mariani) wrote

...

>
> According to some experts that I trust, *NO* program that performs
> dynamic allocation can be considered real-time. Real-time systems
> are likely to allocate as much memory as needed during initialization,
> and then never touch dynamic allocation again until shutdown.

Exactly. That's why you need to use alloca.

G

Ron Natalie

Mar 25, 2003, 2:07:52 PM

"Gianni Mariani" <gi2n...@mariani.ws> wrote in message news:b5gjht$b...@dispatch.concentric.net...

> > According to some experts that I trust, *NO* program that performs
> > dynamic allocation can be considered real-time. Real-time systems
> > are likely to allocate as much memory as needed during initialization,
> > and then never touch dynamic allocation again until shutdown.
>
> Exactly. That's why you need to use alloca.

What does that mean? What's the difference between dynamic allocation
on the heap and what is essentially dynamic allocation on the stack? There's not
a significant difference in the cost to allocate, and if you run out, either way
you're going to lose.

Andrew Johnson

Mar 25, 2003, 3:23:10 PM
Ron Natalie wrote:

> "Gianni Mariani" <gi2n...@mariani.ws> wrote:
>
>>Exactly. That's why you need to use alloca.
>
> What does that mean? What's the difference between dynamic allocation
> in the heap and essentially dynamic allocation on the stack. There's not
> a significant difference in the cost to allocate and if you run out either way
> you're going to lose.

Surely there *is* a very significant difference in the allocation cost.
On some architectures an alloca() only has to subtract the allocation size
from the stack pointer and return the old value of the stack pointer.
That is completely deterministic behaviour, you can't write a dynamic
allocator anywhere near that fast, and it's often possible to statically
check the stack usage to guarantee that you'll never run out.

- Andrew
--
There are 10 types of people in the world:
Those who understand binary, and those who don't.

Sam Holden

Mar 25, 2003, 3:32:56 PM
On Tue, 25 Mar 2003 19:07:52 +0000 (UTC), Ron Natalie <r...@sensor.com> wrote:
>
> "Gianni Mariani" <gi2n...@mariani.ws> wrote in message news:b5gjht$b...@dispatch.concentric.net...
>> > According to some experts that I trust, *NO* program that performs
>> > dynamic allocation can be considered real-time. Real-time systems
>> > are likely to allocate as much memory as needed during initialization,
>> > and then never touch dynamic allocation again until shutdown.
>>
>> Exactly. That's why you need to use alloca.
>
> What does that mean? What's the difference between dynamic allocation
> in the heap and essentially dynamic allocation on the stack. There's not
> a significant difference in the cost to allocate and if you run out either way
> you're going to lose.

Huh?

Allocating on the stack is as fast as adding a number to the stack pointer;
deallocating is as fast as subtracting a number from it.

On the heap, allocation and deallocation involve lots of unpredictable time costs
due to fragmentation issues, and finding an available block of sufficient
size.

On my system this code (which is C, not C++)

#include <stdlib.h>
#include <alloca.h>   /* non-standard; alloca may be declared in <malloc.h> on some platforms */

void heap(int n) {
    free(malloc(n));
}

void stack(int n) {
    alloca(n);
}

int main() {
    int i;
    srand(123456789);
    for (i = 0; i < 1000000; ++i)
        heap(rand() % 1000000);
    return 0;
}

takes (on one run, consistent with a number of runs)

real 0m4.955s
user 0m1.370s
sys 0m3.590s

to run. Replacing the call to heap with a call to stack in main() gives:

real 0m0.135s
user 0m0.130s
sys 0m0.000s


That seems to be a significant difference to me.

--
Sam Holden

Ron Natalie

Mar 25, 2003, 4:05:01 PM

"Andrew Johnson" <a...@aps.anl.gov> wrote in message news:b5qb2o$mv3$1...@milo.mcs.anl.gov...

>
> Surely there *is* a very significant difference in the allocation cost.

He was talking about the determinism of not doing dynamic allocation. To
do this assumes the stack has some pre-fixed size. Nothing prevents a
general-purpose dynamic allocator from doing similarly.

> On some architectures an alloca() only has to subtract the allocation size
> from the stack pointer and return the old value of the stack pointer.

If the stack isn't of some pre-fixed size, then growing the stack like this
will cause a page fault and the system will have to provide more memory to back
up the larger virtual size. This is not significantly harder than what it does
when you have to go after more dynamic memory. That was what I was
referring to.

Even addressing your issue, it's possible (and I'm working up the details)
to have a mini-heap allocator that does the same thing without having to
change the semantics of the existing language, nor having to figure out
what you're going to do about the fact that C++ objects have a definite
lifetime (how are the destructors on these alloca-resident memory invoked?).

If you add a LIFO allocation function (see the sketch after this list), you can have:

1. Allocation with the exact same efficiency as alloca
2. Deallocation with the same efficiency as long as you are destroying things
consistant with LIFO.
3. Fallback mode when you don't release the memory LIFO.
4. Use of the existing C++ behavior with respect to the lifetimes of objects.
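A minimal sketch of such a LIFO arena (the names are assumptions, not Ron's
actual design); it only shows how points 1 and 2 reduce to pointer bumps, and
omits the fallback path of point 3 and the object-lifetime handling of point 4:

#include <cstddef>
#include <new>

class LifoArena
{
public:
    explicit LifoArena( std::size_t bytes )
      : m_buf( new char[ bytes ] ), m_top( m_buf ), m_end( m_buf + bytes )
    {
    }

    ~LifoArena() { delete [] m_buf; }

    void * allocate( std::size_t n )        // point 1: alloca-like pointer bump
    {
        if ( std::size_t( m_end - m_top ) < n ) throw std::bad_alloc();
        void * p = m_top;
        m_top += n;
        return p;
    }

    void deallocate( void * p )             // point 2: LIFO release, bump back
    {
        m_top = static_cast<char *>( p );   // p must be the most recent allocation
    }

private:
    char * m_buf;
    char * m_top;
    char * m_end;
};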

> That is completely deterministic behaviour, you can't write a dynamic
> allocator anywhere near that fast, and it's often possible to statically
> check the stack usage to guarantee that you'll never run out.
>

How on earth are you going to achieve the last one? The whole point of
this argument was that you had an allocation, the size of which is not
known at compile time. Otherwise, you'd just create the objects as simple
auto objects.

Michael S

Mar 26, 2003, 1:23:39 PM
Your benchmark suffers from an obvious flaw - you don't access the
allocated memory. That hides a dark side of alloca - an OS stack
overflow handler never shows up.
Here is my benchmark. Slightly longer than yours, but IMHO it's worth
it.

#include <stdlib.h>
#include <malloc.h>
#include <memory.h>
#include <time.h>
#include <stdio.h>

void heap(int n)
{
void *p = malloc(n);
free(p);
}

void stack(int n)
{
alloca(n);
}

void heapa(int n)
{
void *p = malloc(n);
memset(p, 0, n);
free(p);
}

void stacka(int n)
{
void *p= alloca(n);
memset(p, 0, n);
}

const int ITER_NO = 1000000;
const int SRAND_SEED = 123456789;
int main()
{
int i, it, MAX_BLOCK_SZ;
clock_t t0, t1;

for (MAX_BLOCK_SZ = 100; MAX_BLOCK_SZ < 10001; MAX_BLOCK_SZ *= 10)
{
printf("MAX_BLOCK_SZ = %d.\n", MAX_BLOCK_SZ);
for (it = 0; it < 3; ++it)
{
srand(SRAND_SEED);
t0 = clock();
for (i = 0; i < ITER_NO; ++i)
heap(rand() % MAX_BLOCK_SZ);
t1 = clock();
printf("malloc: %6.0f ms, ", (t1-t0)*
(1000.0/CLOCKS_PER_SEC));

srand(SRAND_SEED);
t0 = clock();
for (i = 0; i < ITER_NO; ++i)
stack(rand() % MAX_BLOCK_SZ);
t1 = clock();
printf("alloca: %6.0f ms, ", (t1-t0)*
(1000.0/CLOCKS_PER_SEC));

srand(SRAND_SEED);
t0 = clock();
for (i = 0; i < ITER_NO; ++i)
heapa(rand() % MAX_BLOCK_SZ);
t1 = clock();
printf("malloc+a: %6.0f ms, ", (t1-t0)*
(1000.0/CLOCKS_PER_SEC));

srand(SRAND_SEED);
t0 = clock();
for (i = 0; i < ITER_NO; ++i)
stacka(rand() % MAX_BLOCK_SZ);
t1 = clock();
printf("alloca+a: %6.0f ms.", (t1-t0)*
(1000.0/CLOCKS_PER_SEC));
printf("\n");
}
}
return 0;
}

Following results were obtained on my old PC (Pentium-2/233 VC 5):
MAX_BLOCK_SZ = 100.
malloc: 1162 ms, alloca: 341 ms, malloc+a: 1392 ms, alloca+a: 611 ms.
malloc: 1151 ms, alloca: 351 ms, malloc+a: 1392 ms, alloca+a: 611 ms.
malloc: 1151 ms, alloca: 341 ms, malloc+a: 1392 ms, alloca+a: 611 ms.
MAX_BLOCK_SZ = 1000.
malloc: 1982 ms, alloca: 341 ms, malloc+a: 2984 ms, alloca+a: 1603 ms.
malloc: 1982 ms, alloca: 341 ms, malloc+a: 2994 ms, alloca+a: 1613 ms.
malloc: 1992 ms, alloca: 341 ms, malloc+a: 2974 ms, alloca+a: 1602 ms.
MAX_BLOCK_SZ = 10000.
malloc: 1473 ms, alloca: 390 ms, malloc+a: 5588 ms, alloca+a: 8442 ms.
malloc: 1472 ms, alloca: 391 ms, malloc+a: 5588 ms, alloca+a: 8442 ms.
malloc: 1472 ms, alloca: 391 ms, malloc+a: 5588 ms, alloca+a: 8452 ms.

As you can see, alloca shines when the allocation size is small, but it
loses rather badly when the size of the allocated block exceeds the system
page size. As you would expect, the compiler is faster than the C run-time
library, which is faster than the OS page-fault handler.

Now, this benchmark still shows alloca in a better light than it
deserves, because it compares it with a general-purpose allocator. The
correct comparison would put it against a functionally equivalent
allocator, for example one based on the std::stack<char> class. My guess is
that alloca would do nearly as well on short blocks and would lose even
worse than in my test on longer ones.

Andrew Johnson

Mar 26, 2003, 7:10:20 PM
Ron Natalie wrote:

> "Andrew Johnson" <a...@aps.anl.gov> wrote:
>
>>On some architectures an alloca() only has to subtract the allocation size
>>from the stack pointer and return the old value of the stack pointer.
>
> If the stack isn't of some pre-fixed size, then the growing the stack like this
> will cause a page fault and the system will have to put more memory to back
> up the larger virtual size. This is not significantly harder than what it does
> when you have to go after more dynamic memory. That was what I was
> referring to.

On systems with paged memory that is the case, but those systems aren't
deterministic anyway. Real-time systems for which the speed and
determinism of a stack-based allocator would be useful *don't have page
faults*. I spend most of my time working with such systems running the
vxWorks RTOS, which has a single flat memory space.

> Even addressing your issue, it's possible (and I'm working up the details)
> to have a mini-heap allocator that does the same thing without having to
> change the semantics of the existing language, nor having to figure out
> what you're going to do about the fact that C++ objects have a definite
> lifetime (how are the destructors on these alloca-resident memory invoked?).
>
> If you add a LIFO allocation function, you can have:
>
> 1. Allocation with the exact same efficiency as alloca
> 2. Deallocation with the same efficiency as long as you are destroying things
> consistant with LIFO.
> 3. Fallback mode when you don't release the memory LIFO.
> 4. Use of the existing C++ behavior with respect to the lifetimes of objects.

Good idea, although that doesn't allow as much flexibility because the
alloca approach allows the same fixed size stack space to be used for auto
objects in one routine and alloca objects in another, with no overhead in
switching between the two models. A LIFO allocator can't do that, and the
systems where this matters will often be limited on RAM anyway.

I'm not a compiler writer, but I don't see how an alloca object's lifetime
differs significantly from that of an auto object with function scope, so
the same destructor approach could be used - am I confused there?

>>That is completely deterministic behaviour, you can't write a dynamic
>>allocator anywhere near that fast, and it's often possible to statically
>>check the stack usage to guarantee that you'll never run out.
>
> How on earth are you going to achieve the last one? The whole poing of
> this argument was that you had an allocation, the size of which is not
> known at compile time. Otherwise, you'd just create the objects as simple
> auto objects.

We've already seen an example of this earlier in the thread: I don't know
which of several different kinds/sizes of objects I'm going to be creating
and destroying in each iteration of a loop, but I do know the maximum size
and can allocate stack space accordingly.

- Andrew
--
There are 10 types of people in the world:
Those who understand binary, and those who don't.

---

Andrew Johnson

Mar 26, 2003, 7:17:10 PM
Michael S wrote:
> Your benchmark suffers from the obvious flow - you don't access
> allocated memory. It hides a dark side of alloca - an OS stack
> overflow handler never shows up.
<snip>

> Following results were obtained on my old PC (Pentium-2/233 VC 5):
> MAX_BLOCK_SZ = 100.
> malloc: 1162 ms, alloca: 341 ms, malloc+a: 1392 ms, alloca+a: 611 ms.
> malloc: 1151 ms, alloca: 351 ms, malloc+a: 1392 ms, alloca+a: 611 ms.
> malloc: 1151 ms, alloca: 341 ms, malloc+a: 1392 ms, alloca+a: 611 ms.
> MAX_BLOCK_SZ = 1000.
> malloc: 1982 ms, alloca: 341 ms, malloc+a: 2984 ms, alloca+a: 1603 ms.
> malloc: 1982 ms, alloca: 341 ms, malloc+a: 2994 ms, alloca+a: 1613 ms.
> malloc: 1992 ms, alloca: 341 ms, malloc+a: 2974 ms, alloca+a: 1602 ms.
> MAX_BLOCK_SZ = 10000.
> malloc: 1473 ms, alloca: 390 ms, malloc+a: 5588 ms, alloca+a: 8442 ms.
> malloc: 1472 ms, alloca: 391 ms, malloc+a: 5588 ms, alloca+a: 8442 ms.
> malloc: 1472 ms, alloca: 391 ms, malloc+a: 5588 ms, alloca+a: 8452 ms.

Not all systems have paged memory, and especially not those real-time
systems where alloca could be useful for reasons of speed and determinism.
Below are my results from a system running vxWorks (flat memory space,
no paging). As it doesn't actually support alloca() I used gcc's ability
to create variable-sized char arrays instead, and I had to change the time
measurement method as well, but my code was otherwise identical to yours.
I ran this on a 133MHz PowerPC (MPC-750) CPU.

mv5100> allocaTime
MAX_BLOCK_SZ = 100.
malloc: 2128 ms, alloca: 125 ms, malloc+a: 2272 ms, alloca+a: 255 ms.
malloc: 2128 ms, alloca: 125 ms, malloc+a: 2272 ms, alloca+a: 255 ms.
malloc: 2128 ms, alloca: 125 ms, malloc+a: 2272 ms, alloca+a: 255 ms.
MAX_BLOCK_SZ = 1000.
malloc: 2152 ms, alloca: 127 ms, malloc+a: 2814 ms, alloca+a: 776 ms.
malloc: 2152 ms, alloca: 127 ms, malloc+a: 2814 ms, alloca+a: 776 ms.
malloc: 2152 ms, alloca: 127 ms, malloc+a: 2814 ms, alloca+a: 776 ms.
MAX_BLOCK_SZ = 10000.
malloc: 2570 ms, alloca: 127 ms, malloc+a: 8089 ms, alloca+a: 5633 ms.
malloc: 2570 ms, alloca: 127 ms, malloc+a: 8089 ms, alloca+a: 5633 ms.
malloc: 2570 ms, alloca: 127 ms, malloc+a: 8089 ms, alloca+a: 5633 ms.
value = 0 = 0x0

On my system, the act of accessing the allocated memory has no significant
effect on the difference between the time taken to call malloc() and to
grab that same memory from the stack.

- Andrew
--
There are 10 types of people in the world:
Those who understand binary, and those who don't.

---

Fergus Henderson

Mar 27, 2003, 5:41:37 PM
a...@aps.anl.gov (Andrew Johnson) writes:

> Below are my results from a system running vxWorks (flat memory space,
>no paging). As it doesn't actually support alloca() I used gcc's ability
>to create variable-sized char arrays instead, and I had to change the time
>measurement method as well,

If you are using gcc, you should be able to use __builtin_alloca(),
rather than needing to use variable-sized char arrays. But the two
should perform similarly anyway.

>mv5100> allocaTime
>MAX_BLOCK_SZ = 100.
>malloc: 2128 ms, alloca: 125 ms, malloc+a: 2272 ms, alloca+a: 255 ms.

Note that the benchmark loop body also included a call to rand(), a %
(modulo) operation, and a function call. These might take significant
time, perhaps a large fraction of the 125ms time above. So the real
ratio of alloca speed to malloc speed might be even higher than in
those figures.

--
Fergus Henderson <f...@cs.mu.oz.au> | "I have always known that the pursuit
The University of Melbourne | of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh> | -- the last words of T. S. Garp.

Gianni Mariani

Mar 28, 2003, 10:02:02 AM
Fergus Henderson wrote:
> a...@aps.anl.gov (Andrew Johnson) writes:
>

> Note that the benchmark loop body also included a call to rand(), a %
> (modulo) operation, and a function call. These might take significant
> time, perhaps a large fraction of the 125ms time above. So the real
> ratio of alloca speed to malloc speed might be even higher than in
> those figures.
>

You're right. In a previous life I implemented a high-performance
malloc, and it turned out that rand took a significant portion of the test
time.

So, as a rule of thumb: alloca *on average* takes 1-2 cycles if the
parameter is constant and 4-5 cycles if it's variable, while malloc *on
average* takes 50 cycles.
