Array of lambdas

1,612 views
Skip to first unread message

dgutson .

unread,
Jun 24, 2015, 11:03:12 AM6/24/15
to std-proposals

The reason that this cannot be done

const std::array<decltype([](){ return 0;}), 3> x = {
[](){return 1;},
[](){return 2;},
[](){return 3;}
};

is that each lambda has its own unique type?
Of course this can be done with an array of std::function.

Daniel.

dgutson .

unread,
Jun 24, 2015, 11:06:48 AM6/24/15
to std-proposals

The reason I'm posting it here is to analyze whether such code should make sense or not. At least from a performance POV, that example I think would be faster than the array of std::function.

>
> Daniel.

Nevin Liber

unread,
Jun 24, 2015, 11:20:16 AM6/24/15
to std-pr...@isocpp.org
Because you have captureless lambdas, this should work:

const std::array<int(*)(), 3> x = {


[](){return 1;},
[](){return 2;},
[](){return 3;}
};

Note:  You cannot use decltype on the lambda, because you are not allowed to use a lambda expression in an unevaluated context.
--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com(847) 691-1404

Shachar Shemesh

unread,
Jun 24, 2015, 11:38:28 AM6/24/15
to std-pr...@isocpp.org
std::function is horrible. It allocates dynamically, causing it to be useless in many cases that would otherwise need it.

However, even had that not been the case, calling a lmbda is faster, for precisely the same reason your array won't work. When you call a lmbda, its address is being resolved by the linker, not at run time. This is achieved by giving each lambda its own type. If you check the sizeof of a captureless lambda, it will be 1 (mostly because C++ forbids types of size zero). It simply does not carry any useful information beyond its static type.

Shachar

David Krauss

unread,
Jun 24, 2015, 11:41:57 AM6/24/15
to std-pr...@isocpp.org, dgutson .
On 2015–06–24, at 11:03 PM, dgutson . <daniel...@gmail.com> wrote:

The reason that this cannot be done

is that each lambda has its own unique type?

Correct.

Of course this can be done with an array of std::function.

Since those lambdas are all captureless, it can also be done with an array of void(*)().

If you do have captures, but you’re interested in something lighter than std::function, ideally you could pass it a non-allocating allocator. Then it’s basically a function pointer plus space for three word-size captures. (More or less… but usually three.)

Unfortunately, this doesn’t currently work in GCC (doesn’t find the function constructor) nor Clang (tries to call non_allocator::allocate). It does work under my own implementation… likely to work in real life the long run, FWIW.

template< typename t = char >
struct non_allocator : std::allocator< t > {
    t * allocate( std::size_t ) = delete; // Don’t really allocate anything!

    template< typename u >
    non_allocator( non_allocator< u > ) {}
    template< typename u >
    struct rebind { typedef non_allocator< u > other; };
};

const std::array<std::function< int() >, 3> x = {
{ std::allocator_arg, 
non_allocator<>{}, [](){return 1;} }, // OK: small functions don’t need heap allocation
{ std::allocator_arg, non_allocator<>{}, [](){return 2;} },
{ std::allocator_arg, non_allocator<>{}, [](){return 3;} },
{ std::allocator_arg, non_allocator<>{}, [ cap = std::array< int, 20 >() ](){return 4;} } // Error: not small
};

Evgeny Panasyuk

unread,
Jun 24, 2015, 12:01:06 PM6/24/15
to std-pr...@isocpp.org
24.06.2015 18:03, dgutson .:
> The reason that this cannot be done
>
> const std::array<decltype([](){ return 0;}), 3> x = {
> [](){return 1;},
> [](){return 2;},
> [](){return 3;}
> };
>
> is that each lambda has its own unique type?

Yes, each lambda has its own type. As others said - in your case you can
just convert lambda to pointer.
In general case, when you cannot convert lambda to pointer, another
option is to use some kind of hetergogeneus container, like std::tuple
or boost::fusion::vector, for example:
http://coliru.stacked-crooked.com/a/5947c7c35194321f

double x = 0.5;
int y = 11;
char z = 'a';

auto closures = fusion::make_vector
(
[&]{ return x; },
[&]{ return y; },
[&]{ return z; }
);

fusion::for_each(closures, [](auto a)
{
cout << a() << endl;
});


--
Evgeny Panasyuk

Nevin Liber

unread,
Jun 24, 2015, 12:01:10 PM6/24/15
to std-pr...@isocpp.org
On 24 June 2015 at 10:38, Shachar Shemesh <sha...@lingnu.com> wrote:
std::function is horrible. It allocates dynamically, causing it to be useless in many cases that would otherwise need it.

Given that most std::function implementations implement the small object optimization, and a captureless lambda is about as small an object as you can have, can you elaborate on why you think dynamic allocation is an in the case (other than the size of the array might be a little bigger)?  (Note:  it could be if the held object had a throwing move constructor and the particular implementation of std::function had a noexcept move constructor, but captureless lambdas don't have throwing move constructors),

Shachar Shemesh

unread,
Jun 24, 2015, 2:01:03 PM6/24/15
to std-pr...@isocpp.org
On 24/06/15 18:41, David Krauss wrote:

On 2015–06–24, at 11:03 PM, dgutson . <daniel...@gmail.com> wrote:

The reason that this cannot be done

is that each lambda has its own unique type?

Correct.

Of course this can be done with an array of std::function.

Since those lambdas are all captureless, it can also be done with an array of void(*)().

If you do have captures, but you’re interested in something lighter than std::function, ideally you could pass it a non-allocating allocator. Then it’s basically a function pointer plus space for three word-size captures. (More or less… but usually three.)
Can you elaborate how you reached this number? Or is this just your estimation of the average?

For a lambda where the capture is composed only of variables passed by reference, one pointer ought to have been enough. This is the situation in, e.g., D (where a delegate is defined to be the size of two pointers, one of which is the pointer to the actual function). At least under gcc, this is, sadly, not the case: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53097

Shachar

Shachar Shemesh

unread,
Jun 24, 2015, 2:10:05 PM6/24/15
to std-pr...@isocpp.org
On 24/06/15 19:00, Nevin Liber wrote:

Given that most std::function implementations implement the small object optimization, and a captureless lambda is about as small an object as you can have, can you elaborate on why you think dynamic allocation is an in the case (other than the size of the array might be a little bigger)?
I don't know about "most". What I can do is relate my personal experience. While working on fakeroot-ng (http://fakeroot-ng.lingnu.com), I had to pass a function with context between two threads. I created an interface based on std::function. When it gave horrid performance, I ran a profiler to find out why. The reason was the dynamic allocation. See http://sourceforge.net/p/fakerootng/source/ci/efc254eccaa212fe9cb96133a9004e1ade9c638c/ for the commit that dumps the use of std::function.

Part of the blame for this might be gcc's implementation of by reference capture variables. Instead of saving just the pointer to the frame from which these variables can be found, gcc saves a reference to each and every such variable. See my reply to David to the open bug about it. This greatly increases the size of lambda types, and may be the reason that in place allocation doesn't kick in.

Either way, in practice, std::function is useless for low latency programming with gcc.


 (Note:  it could be if the held object had a throwing move constructor and the particular implementation of std::function had a noexcept move constructor, but captureless lambdas don't have throwing move constructors),
No, that's not it.

Shachar

Thiago Macieira

unread,
Jun 24, 2015, 7:26:48 PM6/24/15
to std-pr...@isocpp.org
On Wednesday 24 June 2015 21:00:57 Shachar Shemesh wrote:
> For a lambda where the capture is composed only of variables passed by
> reference, one pointer ought to have been enough.

Qualification required: variables in the same scope that are captured by
reference.

If those variables are in different scopes, then the it needs to capture the
different scopes.

> This is the situation
> in, e.g., D (where a delegate is defined to be the size of two pointers,
> one of which is the pointer to the actual function). At least under gcc,
> this is, sadly, not the case:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53097

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

David Krauss

unread,
Jun 24, 2015, 7:59:46 PM6/24/15
to std-pr...@isocpp.org

On 2015–06–25, at 2:00 AM, Shachar Shemesh <sha...@lingnu.com> wrote:

 (More or less… but usually three.)
Can you elaborate how you reached this number? Or is this just your estimation of the average?

The is the amount of storage recommended in [func.wrap.func.con] §20.9.12.2.1/5: “Implementations are encouraged to avoid the use of dynamically allocated memory for small callable objects, for example, where f’s target is an object holding only a pointer or reference to an object and a member function pointer.” It is the amount of storage provided by GCC, Clang, and most third-party libraries. A PTMF is two words, and an object pointer is one, on typical ABIs.


On 2015–06–25, at 7:26 AM, Thiago Macieira <thi...@macieira.org> wrote:

Qualification required: variables in the same scope that are captured by 
reference.

If those variables are in different scopes, then the it needs to capture the 
different scopes.

One pointer should be enough, and it would usually be a pointer to the current stack frame. The only scopes from which you can capture are inside the current function, and the only eligible variables are of automatic storage duration.

I’ve raised this issue occasionally for years. Thanks, Sachar, for the bug. The response I recall is that the common ABI defines the layout of lambda functions. I’m not sure how seriously this is taken.

Richard Smith

unread,
Jun 24, 2015, 8:22:32 PM6/24/15
to std-pr...@isocpp.org
The Itanium C++ ABI does not do so (though it should, in the cases where the layout is visible across translation units). The problem is that it's extremely difficult to use a pointer to the stack frame or similar. For instance, it depends on a number of factors that are not visible until some stage in code generation, and must be decided during parsing / semantic analysis because it affects observable properties of the closure type, such as its size. It would also force not just the layout of the closure type, but the stack frame layout of the enclosing function, to be part of the ABI if it applied to lambdas in inline functions or function templates, which is a completely unreasonable thing for an ABI to require. It's also not clear that it's unambiguously a good thing in the remaining cases: it puts some pretty stringent constraints on the stack layout of the enclosing function, and very likely makes it uninlineable.

David Krauss

unread,
Jun 24, 2015, 11:26:53 PM6/24/15
to std-pr...@isocpp.org
On 2015–06–25, at 8:22 AM, Richard Smith <ric...@metafoo.co.uk> wrote:

The Itanium C++ ABI does not do so (though it should, in the cases where the layout is visible across translation units).

… and this implies that the lambda layout should not be sensitive to stack frame layout, because the inlined function body in TU #1 may call the un-inlined copy from TU #2 if it tries to recurse.

The problem is that it's extremely difficult to use a pointer to the stack frame or similar. For instance, it depends on a number of factors that are not visible until some stage in code generation, and must be decided during parsing / semantic analysis because it affects observable properties of the closure type, such as its size.

Huh? The point is to decouple the size of the closure from its reference captures.

But, yeah, copies of an unrolled loop (for example) could have different stack layouts.

It would also force not just the layout of the closure type, but the stack frame layout of the enclosing function, to be part of the ABI if it applied to lambdas in inline functions or function templates, which is a completely unreasonable thing for an ABI to require. It's also not clear that it's unambiguously a good thing in the remaining cases: it puts some pretty stringent constraints on the stack layout of the enclosing function, and very likely makes it uninlineable.

I see. New idea, then: implement a local, invisible struct in the scope of the outermost capture. Populate it with the reference captures, adding assignment operations as targeted variables’ lifetimes begin. The lambda points to the invisible struct. In the common case, the assignments can be hoisted out of loops or such, all back out to the scope with the struct.

The indirection should only add one memory access per lambda call, and locality is good so it should never have much negative performance impact. The performance benefit with std::function on the other hand is significant.

Richard Smith

unread,
Jun 24, 2015, 11:46:40 PM6/24/15
to std-pr...@isocpp.org
On Wed, Jun 24, 2015 at 8:26 PM, David Krauss <pot...@gmail.com> wrote:

On 2015–06–25, at 8:22 AM, Richard Smith <ric...@metafoo.co.uk> wrote:

The Itanium C++ ABI does not do so (though it should, in the cases where the layout is visible across translation units).

… and this implies that the lambda layout should not be sensitive to stack frame layout, because the inlined function body in TU #1 may call the un-inlined copy from TU #2 if it tries to recurse.

The problem is that it's extremely difficult to use a pointer to the stack frame or similar. For instance, it depends on a number of factors that are not visible until some stage in code generation, and must be decided during parsing / semantic analysis because it affects observable properties of the closure type, such as its size.

Huh? The point is to decouple the size of the closure from its reference captures.

Right, but it means the decision of whether to perform the optimization must be made immediately. It can't be deferred until we have enough information to know whether it's a good idea.
 
But, yeah, copies of an unrolled loop (for example) could have different stack layouts.

It would also force not just the layout of the closure type, but the stack frame layout of the enclosing function, to be part of the ABI if it applied to lambdas in inline functions or function templates, which is a completely unreasonable thing for an ABI to require. It's also not clear that it's unambiguously a good thing in the remaining cases: it puts some pretty stringent constraints on the stack layout of the enclosing function, and very likely makes it uninlineable.

I see. New idea, then: implement a local, invisible struct in the scope of the outermost capture. Populate it with the reference captures, adding assignment operations as targeted variables’ lifetimes begin. The lambda points to the invisible struct. In the common case, the assignments can be hoisted out of loops or such, all back out to the scope with the struct.

The indirection should only add one memory access per lambda call, and locality is good so it should never have much negative performance impact. The performance benefit with std::function on the other hand is significant.

That's an interesting approach. The only obvious costs are the extra indirection (which, as you say, should be cheap) and extra stack usage (but even there, we win if there's more than one such lambda object alive at once for any particular function).

Shachar Shemesh

unread,
Jun 25, 2015, 12:57:12 AM6/25/15
to std-pr...@isocpp.org
On 25/06/15 02:26, Thiago Macieira wrote:
On Wednesday 24 June 2015 21:00:57 Shachar Shemesh wrote:
For a lambda where the capture is composed only of variables passed by
reference, one pointer ought to have been enough. 
Qualification required: variables in the same scope that are captured by 
reference.

If those variables are in different scopes, then the it needs to capture the 
different scopes.
The lambda is defined while inside one scope. Everything accessible by the lmbda's definition is accessible from within this one scope, be it global, a member of the enclosing class (in which case "this" is accessible from within the scope) or an automatic variable. I see no reason for a lambda's capture that gets arguments by reference to be more than one pointer in length (unless there is an ABI where the scope itself is more, in which case that length).

Shachar

Shachar Shemesh

unread,
Jun 25, 2015, 1:02:34 AM6/25/15
to std-pr...@isocpp.org
On 25/06/15 03:22, Richard Smith wrote:
It would also force not just the layout of the closure type, but the stack frame layout of the enclosing function
That's true whether you concentrate the size of the lambda to a single (whatever) pointer or not.

If you want to employ loop unrolling and duplicate a variable, it doesn't matter if you are storing a single pointer to the frame or a pointer to the variable in question. Either way, you cannot duplicate that variable if there is an outside reference to it.

Also, bear in mind that a lambda's capture cannot reference variables not yet defined in the function inside which it is defined. As such, I doubt your observation makes a difference in this particular case.

I might be missing something. If that's the case, please provide an example to illustrate your point.

Shachar

Richard Smith

unread,
Jun 25, 2015, 6:20:54 PM6/25/15
to std-pr...@isocpp.org
On Wed, Jun 24, 2015 at 10:02 PM, Shachar Shemesh <sha...@lingnu.com> wrote:
On 25/06/15 03:22, Richard Smith wrote:
It would also force not just the layout of the closure type, but the stack frame layout of the enclosing function
That's true whether you concentrate the size of the lambda to a single (whatever) pointer or not.

Read the next part of the sentence that you snipped:

"to be part of the ABI". If the closure type only holds pointers to locals, the lambda call operator doesn't need to know anything about the frame layout of the enclosing function. If it just holds a pointer to the enclosing stack frame, it needs to know where within that frame the local variables live.

If you want to employ loop unrolling and duplicate a variable, it doesn't matter if you are storing a single pointer to the frame or a pointer to the variable in question. Either way, you cannot duplicate that variable if there is an outside reference to it.

Also, bear in mind that a lambda's capture cannot reference variables not yet defined in the function inside which it is defined. As such, I doubt your observation makes a difference in this particular case.

I might be missing something. If that's the case, please provide an example to illustrate your point.

Shachar

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposal...@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

Thiago Macieira

unread,
Jun 26, 2015, 11:52:57 AM6/26/15
to std-pr...@isocpp.org
It's easy to create a counter-example. Just make the lambda leave the scope it
is created in:

auto create_lambda(int &i, int &j)
{
return [&]{ ++i; ++j; return i + j; }
}

The above clearly requires two word-sized references to be stored in the
lambda.

Shachar Shemesh

unread,
Jun 26, 2015, 3:01:56 PM6/26/15
to std-pr...@isocpp.org
On 26/06/15 18:52, Thiago Macieira wrote:
It's easy to create a counter-example. Just make the lambda leave the scope it 
is created in:

auto create_lambda(int &i, int &j)
{
	return [&]{ ++i; ++j; return i + j; }
}

The above clearly requires two word-sized references to be stored in the 
lambda.

I stand corrected. There is also the case where the lambda is only accessing members of the class in which method it was defined, where you wouldn't want to limit the lambda's scope to the method in which it was defined.

Please note, however, that the compiler can easily detect those cases, and act accordingly. In my specific case, it could replace the scope's reference with a copy of "this". In your case it does, indeed, have no choice but to store two references inside the capture.

Either way, I've realized that this behavior bothers me less than the fact I don't have a "pointer to lambda" construct. std::function is too versatile, which costs it in performance. I'll open a separate thread for that, however.

Shachar

Shachar Shemesh

unread,
Jun 26, 2015, 3:05:59 PM6/26/15
to std-pr...@isocpp.org
On 26/06/15 01:20, Richard Smith wrote:
On Wed, Jun 24, 2015 at 10:02 PM, Shachar Shemesh <sha...@lingnu.com> wrote:
On 25/06/15 03:22, Richard Smith wrote:
It would also force not just the layout of the closure type, but the stack frame layout of the enclosing function
That's true whether you concentrate the size of the lambda to a single (whatever) pointer or not.

Read the next part of the sentence that you snipped:

"to be part of the ABI". If the closure type only holds pointers to locals, the lambda call operator doesn't need to know anything about the frame layout of the enclosing function. If it just holds a pointer to the enclosing stack frame, it needs to know where within that frame the local variables live.
While true, I would love to know why it is important. The compiler needs to know the frame layout in order to create the lambda object. I fail to see how storing that information inside the capture initialization code, rather than inside the generated body code, provides any advantage.

Also, I take issue with the ABI even bothering to define this structure. The only "interface" to define here is how you go about calling a lambda, passing it its capture. The capture layout is defined in the same place (same compilation unit, same function, same everything) as the code that uses that layout, and it thus doesn't need to be subject to any standardization.

Shachar
Reply all
Reply to author
Forward
0 new messages