This post is meant to be a continuation of the old subthread that
currently is named "C++0x lambda expressions".
I have gathered together my own thoughts - what was previously flying in
the air and sometimes lacked consistency.
It helped me to review my own points.
It is available here:
http://www.msobczak.com/prog/articles/lambdacpp.html
Any comments are welcome.
--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
...some type... a={lambda x,y: return (x*y-1)*c;};
you can define a class C that inherits from std::binary_function with
types of x, y and needed result and keeps c as its attribute. Then you
can write
C a = C(c);
Michael
Maciej Sobczak wrote:
> Hi,
>
> This post is meant to be a continuation of the old subthread that
> currently is named "C++0x lambda expressions".
> I have gathered together my own thoughts - what was previously flying
> in the air and sometimes lacked consistency.
> It helped me to review my own points.
>
> It is available here:
>
> http://www.msobczak.com/prog/articles/lambdacpp.html
>
> Any comments are welcome.
I've read the article, and have been following the thread this came from,
and I, too, tend to prefer a syntax like (from the above article):
std::sort(b, e, bool (int a, int b) { return a < b; });
or, if types are unknown:
std::sort(b, e, template<class T> bool (const T &a, const T &b) { return a < b; })
(However, as Maciej has pointed out, you might quite often have enough
context to not need type deduction here.)
rather than something like:
std::sort(b, e, _1 < _2); // Boost.Lambda version
std::sort(b, e, {a, b: a < b }) // Syntax deduced from one of Dave A's postings
Later in this posting I'll come to how the first versions may be transformed
by the compiler, but suffice it to say for now that they will both enable
inline calls to the lambda function.
My reasons for preferring the former forms are as follows (some of these
arguments are also used in the article):
- They are in a way a "natural" extension to current syntax. A lambda
function is essentially an unnamed function, and how better to represent
this in C++ than... a function (template) without a name? Even people not
knowing about the feature might be able to guess what it does, if they are
familiar with C++.
- It is explicit about the type of the parameters and the return value (if
any). Some might prefer it not to be so, preferring type inference, instead.
I also find type inference, such as how it's done in Haskell, to be rather
elegant, but C++ isn't Haskell, and I don't know if we should make it so.
The type inference in itself may be handy, but perhaps not the syntax
(however, I understand that's not a big issue).
Making type inference work for C++, in general (all functions, not just
unnamed ones), may be rather difficult, due to the separate compilation
model. What will the function signature be? And if we allow type inference
for lambda functions, only, we're making a special case, with lambda
functions having "powers" not available to ordinary functions.
Function templates are one way C++ supports type inference, and it's
perfectly applicable here. This also enables full control over parameter
types, as pointed out in the article.
- It supports meaningful names for the parameters (also like the last
example, but unlike the BLL one).
- The point in the article about the ease with which one may move between
the lambda version and a normal function, for example if one finds the
lambda version becoming too complex, or perhaps useful in other contexts.
- As it doesn't require additional type inference, it's essentially just a
text transformation, so it should be relatively easy to implement.
However, one of my main reasons for posting, is to point out that, when it
comes to the implementation of this, I think it would be better to perform a
transformation/rewrite like Dave A. has suggested, rather than a "naïve"
rewrite from the syntax, as used in the article. That is, one may rewrite:
std::sort(b, e, bool (int a, int b) { return a < b; });
as:
struct __compiler_generated
{
    bool operator()(int a, int b) const // Possibly static
    { return a < b; }
};
std::sort(b, e, __compiler_generated());
rather than:
bool __compiler_generated(int a, int b)
{ return a < b; }
std::sort(b, e, __compiler_generated);
The first version enables the call to the lambda function to be inlined,
while the second one, given that a pointer to function is passed to
std::sort, makes it much less likely that the call will be inlined. I
suggest you (Maciej) update the paper with the first rewrite rule, if you
agree.
Another thing, I feel that using "lambda" for everything unnamed (as is done
in the article) may be confusing. "lambda" comes from functional programming
(well, from lambda calculus, originally), where it's used to mean unnamed
_functions_. Thus, I think we should restrict the use of "lambda" to that,
and instead use unnamed classes and objects, when we talk about that, rather
than "lambda classes" or "lambda objects", or some such.
Regards,
Terje
So far I don't find it exciting because the lambdas are not closures. They
don't have access to variables from their enclosing scope. That to me is
more important than being able to define them at their point of use.
-- Dave Harris, Nottingham, UK
Terje Slettebø wrote:
>>http://www.msobczak.com/prog/articles/lambdacpp.html
> However, one of my main reasons for posting, is to point out that, when it
> comes to the implementation of this, I think it would be better to perform a
> transformation/rewrite like Dave A. has suggested, rather than a "naïve"
> rewrite from the syntax, as used in the article. That is, one may rewrite:
>
> std::sort(b, e, bool (int a, int b) { return a < b; });
>
> as:
>
> struct __compiler_generated
> {
>     bool operator()(int a, int b) const // Possibly static
>     { return a < b; }
> };
>
> std::sort(b, e, __compiler_generated());
>
> rather than:
>
> bool __compiler_generated(int a, int b)
> { return a < b; }
>
> std::sort(b, e, __compiler_generated);
>
> The first version enables the call to the lambda function to be inlined,
> while the second one, given that a pointer to function is passed to
> std::sort, makes it much less likely that the call will be inlined. I
> suggest you (Maciej) update the paper with the first rewrite rule, if you
> agree.
I acknowledge your point, but there are reasons to keep it unchanged.
Not the whole world is functor-oriented and I would find it extremely
frustrating if the lambda could be used with std::sort, but not with
std::qsort. Or with std::find but not with std::bsearch. And so on.
The point is that there are tons of C libraries (or C++ libraries that
use pointers to functions instead of functors), just waiting to benefit
from lambda. Ruling them out would be a disaster.
On the other hand, some variants are of course possible, like:
std::sort(b, e, inline bool (int a, int b) { return a < b; });
^^^^^^
A smart enough compiler would discover that std::sort can accept functors
and rewrite the whole thing in this direction, potentially benefiting
from inlining. On the other hand, the following is still possible,
because the result of "function lambda" is normally a pointer to
function, not a function object:
atexit(void () { puts("Good bye!"); });
And of course, the "class lambda" always allows you to write true
functor explicitly if you are really sure you need it - the distinction
between "function lambda" and "class lambda" was made exactly to give
you this control.
So:
// 1.
std::sort(b, e, bool (int a, int b) { return a < b; });
// 2.
std::sort(b, e,
    struct { bool operator()(int a, int b) const { return a < b; } }()
);
// 3.
std::sort(b, e, inline bool (int a, int b) { return a < b; });
In 1. you ask for pointer to function (not likely inlined, but
appropriate for C libraries), in 2. you ask for function object
(appropriate for inlining) and in 3. you ask for the "smart mode"
(rewrite to functor and if it's ill-formed, silently fall back to pointer
to function).
I will update the article with this.
> Another thing, I feel that using "lambda" for everything unnamed (as is done
> in the article) may be confusing.
Yes, I realize it. The word "anonymous" would be much more appropriate
in the broader context.
Regards,
--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/
It seems that you mistook lambda expressions for on-the-fly
compilation.
> On the other hand, you can easily do what you want if you define a
> functor object with all needed additional parameters as its attributes.
> Instead of writing
>
> ....some type... a={lambda x,y: return (x*y-1)*c;};
>
> you can define a class C that inherits from std::binary_function with
> types of x, y and needed result and keeps c as its attribute. Then you
> can write
>
> C a = C(c);
Well, so why not let a compiler do it for us? OP's paper shows this kind
of transformation. Have you read it?
BR
Grzegorz
--
Free C++ frontend library: http://opencxx.sourceforge.net
China from the inside: http://www.staryhutong.com
Myself: http://www.dziupla.net/gj/cv
Dave Harris wrote:
> So far I don't find it exciting because the lambdas are not closures. They
> don't have access to variables from their enclosing scope. That to me is
> more important than being able to define them at their point of use.
The opportunities that come with such access are great, but at the same
time it may be a place where Bad Things will achieve their peak density.
That's why I'm sceptical about this concept.
I think it may be instructive to ask for comments where there is an
already existing experience with it. What is the opinion of those who
actually used it in other languages?
Does it help, in total?
Is it easy to write/read/modify/maintain?
--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/
Agreed.
> The opportunities that come with such access are great, but at the same
> time it may be a place where Bad Things will achieve their peak density.
> That's why I'm sceptical about this concept.
>
> I think it may be instructive to ask for comments where there is an
> already existing experience with it. What is the opinion of those who
> actually used it in other languages?
You should read Guy L. Steele's two papers on function closures in a
lexically scoped Lisp from the 70s.
ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-353.pdf
ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-379.pdf
http://library.readscheme.org/page1.html
Closures are incredibly useful and powerful. Furthermore, they work
amazingly naturally in a lexically scoped language like C++.
> Does it help, in total?
I'd think that in the context of the STL containers, lambda closures would
be seriously useful, since they allow you to write functions that refer to
the current state (in the closed-over free variables) while iterating over
one or more containers.
> Is it easy to write/read/modify/maintain?
I'd prefer that lambda expressions be flagged by a keyword, e.g.
int lambda (int a, int b) { <statements> }
but with some creative indenting it works fine. (The lambda could help
creative text editors format nicely as you type.)
David J. Littleboy
Tokyo, Japan
The idea that stuff should be kept out of the language because it could be
misused hasn't really been one of C++'s major design goals, has it? :-)
| I think it may be instructive to ask for comments where there is an
| already existing experience with it. What is the opinion of those who
| actually used it in other languages?
| Does it help, in total?
| Is it easy to write/read/modify/maintain?
It's really powerful; with it we have (anonymous) local function support.
br
-Thorsten
Yes. After I had posted this, I also read your posting about possible code
bloat from multiple instantiations, so there are arguments either way.
However, a sort with an indirect call per comparison (rather than a simple
comparison) would hardly be very efficient (compared to what it could be).
OTOH, if most work is done by things other than the comparison/indirect
call, then it may be different.
> And of course, the "class lambda" always allows you to write true
> functor explicitly if you are really sure you need it - the distinction
> between "function lambda" and "class lambda" was made exactly to give
> you this control.
> So:
>
> // 1.
> std::sort(b, e, bool (int a, int b) { return a < b; });
> // 2.
> std::sort(b, e,
> struct { bool operator()(int a, int b) const { return a < b; } }()
> );
> // 3.
> std::sort(b, e, inline bool (int a, int b) { return a < b; });
>
> In 1. you ask for pointer to function (not likely inlined, but
> appropriate for C libraries), in 2. you ask for function object
> (appropriate for inlining) and in 3. you ask for the "smart mode"
> (rewrite to functor and if its ill-formed, silently fall back to pointer
> to function).
2 tends to be a little too wordy, even for my taste (i.e. I tend to find 1
(or 3) acceptable as it looks like C++), and I think it shouldn't be
necessary to use "inline" to enable "smart mode".
However, depending on how you specify the grammar, you may need something
like "inline" to disambiguate, anyway, or the compiler may need to do some
lookahead, to understand the meaning of the code. To illustrate:
f( int() { ... });
"int()" is a valid expression, today, so is this a function, f, being called
with a lambda function, or is it being called with int(), followed by some
(surprising?) braces?
Regarding the use of functions or objects, again. If you were to allow bound
variables from the surrounding context in your lambda function, then
function objects provide handy closures for them. And, no, I don't think you
should have to write this code yourself (as 2 hints at). :) Contrived
example:
void f(int n1, int n2)
{
    std::sort(b, e, bool (int a, int b) { return a*n1 < b*n2; } );
}
Possible translation:
struct __compiler_generated
{
    __compiler_generated(int n1, int n2) : n1(n1), n2(n2) {}
    bool operator()(int a, int b) const { return a*n1 < b*n2; }
    int n1;
    int n2;
};
void f(int n1, int n2)
{
    std::sort(b, e, __compiler_generated(n1, n2));
}
Some other arguments for function objects, compared to functions:
- Function objects are the way things are typically done today, with things
like std::plus, bind1st, etc. Lambda would automate the creation of such
function objects.
- They are first-class citizens in C++, being values (something functions
are not, but function pointers are). You can't pass functions to or return
them from other functions, but you can do so with function objects. The same
goes for partial application (supplying some parameters to yield a new
function), and functional composition.
Regards,
Terje
In Ada, it is common to use (named) local procedures and functions
as parameters of generic instantiations, as opposed to the function
object approach used by C++. The local routine just keeps its state
in the local variables of its containing scope. Everyone there likes
doing it that way.
I think you're wrong about this. If I want to e.g. sort "person" classes on
the lastname, then with lambda I may do (using the syntax in the article):
std::sort(b, e, bool(const person &a, const person &b) { return a.lastname < b.lastname; } )
How can you argue that just because the function is written inline, the
design is done after the code is written?
In any case, evolutionary design (before, during, or after the code is
written) is gaining momentum as a successful development technique (agile
development), rather than the "traditional" way of trying to predict the
whole design from the start, and often getting it wrong, making it either
too complex, or not flexible enough in the right places.
Anyway, that's really a different discussion, and unless you can present a
convincing argument for exactly how lambda functions would prevent up-front
design, I think we should drop it.
> One of the problems with lambda expressions in C++ is that C++ is a
> relatively strongly typed language. Expressions used in C++ have to
> belong to/inherit from/have transformation to some type. If you invent
> a new type, you would have nothing to do with it outside of its
> environment.
I don't understand what you mean, here. Even with type inference, the type
will still be a normal type in the C++ type system; you just don't have to
specify it (just like function templates).
> An additional difficulty is that a natural implementation will be
> self-modifying code which doesn't give you good security (viruses,
> testability problems).
I think you've misunderstood the kind of lambda being discussed here. They
are simply unnamed functions, meaning that you can have the point of
definition and the point of use close to each other, which tends to improve
cohesion and readability, IOW good software design. The code won't be
"self-modifying" at runtime, nor at compile-time.
> On the other hand, you can easily do what you want if you define a
> functor object with all needed additional parameters as its attributes.
> Instead of writing
>
> ...some type... a={lambda x,y: return (x*y-1)*c;};
>
> you can define a class C that inherits from std::binary_function with
> types of x, y and needed result and keeps c as its attribute. Then you
> can write
>
> C a = C(c);
Sure, but that means you have to write a class for every operation you want
to do, and you have to put it in the global namespace, possibly far from its
use. Apart from the above mentioned advantages of lambda, it also helps to
reduce the "syntactic noise" of defining such functions.
Regards,
Terje
I agree it's good to be able to specify a function rather than a struct.
However, although the syntax for function falls naturally out of the
existing grammar, it still seems a bit unobvious to me. Also the use of
"inline" in (3) is not obvious either, and it's making the compiler be too
clever (so clever I'm not sure humans will always be able to follow it),
and perhaps the backtracking will lead to implementation problems.
I'd go back to the suggestion from the thread of using "inline" as the
function name. And use "struct" if you want a functor instead.
// (1) Function pointer.
std::sort(b, e, bool inline(int a, int b) { return a < b; });
// (2) Struct.
std::sort(b, e, struct { bool operator()(int a, int b) const { return a < b; } }() );
// (3) Abbreviated struct.
std::sort(b, e, bool struct(int a, int b) { return a < b; });
Here (3) has the same effect as (2) but with fewer tokens. Although my (1)
is a token longer than yours, I think the extra clarity is worth it.
Incidentally, I wasn't entirely happy with the trailing () in (2), but if
they can be omitted from (3) that would be fine. Then we would allow:
return new struct { bool operator()(int a, int b) const { return a < b; } };
but not:
return new bool struct(int a, int b) const { return a < b; };
because the first has a type where the second has an object. This seems
natural to me, and it might satisfy some of James Kanze's concerns.
-- Dave Harris, Nottingham, UK
> actually used it in other languages?
> Does it help, in total?
> Is it easy to write/read/modify/maintain?
I have a little experience with nested functions in Pascal.
Not quite the same thing, but similar intent.
The equivalent in C++ would be:
void foo(iter b, iter e) {
    bool lessint(int a, int b) { return a < b; }
    std::sort(b, e, lessint);
}
To be honest, I don't have a lot of experience with nested functions.
I do know that overuse of this feature can be just as bad as overuse
of any other feature, i.e. in C++, overuse of overloaded operators
or macros. However, for simple cases I found nested functions allowed
me to express myself elegantly.
I know that nested functions aren't quite the same thing as lambdas,
but I think that some of the problems solved with lambdas are also
solved with nested functions. I'd like to see this concept explored
by someone that has more experience with it than I do.
Yes. Very much so. At least in latently typed languages, whether
dynamically type-checked like Smalltalk and Lisp, or statically
type-checked like Haskell.
In languages with mandatory manifest type declarations, eg Java, less so,
because the verbosity of the type declarations get in the way. However,
that's an issue with anonymous functions generally, not with closures. Of
course they can be abused, as can any feature.
I don't know of any language that has closures which doesn't also have
garbage collection, but I think that's because languages without GC are
rare. I suspect that practical experience of this combination will have to
be gained in C++ itself.
It might be useful, or at least interesting, to know the relative
importance of the 3 kinds of anonymous function:
(a) No access to enclosing scope.
(b) Read-only access to enclosing scope.
(c) Write access to enclosing scope.
Java actually forbids (c), which is occasionally annoying.
Do you have any reason to think closures will be harder to maintain than
equivalent code? At the moment, if you use an anonymous function you may
find a small change to the spec forces you to rewrite your routine to not
use one, precisely because they cannot access anything from the enclosing
scope.
For example:
void print_to_cout( iterator first, iterator last ) {
    for_each( first, last, void struct( int x ) {
        cout << x << "\n";
    } );
}
if we want the output stream to be an argument, this seems natural:
void print_to( iterator first, iterator last, ostream &os ) {
    for_each( first, last, void struct( int x ) {
        os << x << "\n"; // Not allowed?
    } );
}
and if it is not supported, then we have to either use binders, or
hand-craft a struct with an ostream reference and a constructor to
initialise it. Frankly I'd just as soon use a for-loop. But the above, I
think, is quite readable.
-- Dave Harris, Nottingham, UK
> > It is available here:
> > http://www.msobczak.com/prog/articles/lambdacpp.html
> > Any comments are welcome.
> So far I don't find it exciting because the lambdas are not
> closures. They don't have access to variables from their
> enclosing scope. That to me is more important than being able
> to define them at their point of use.
I've not yet read the paper, but as I remember the discussions,
I thought that there was a consensus concerning the ability to
access local variables. The discussion dragged on for a long time
over syntax, and the examples didn't use this ability, nor
did they require that the lambda expression end up as a functor
class, but I thought that this was just for reasons of
simplification, and because there wasn't any disagreement here.
And BTW: I'd avoid the word closure for this. I could easily be
wrong, but when I hear the word closure, I imagine some sort of
extension of lifetime as well -- something that would, for
example, allow me to write a class using local variables, and
return it from the function, with the guarantee that the local
variables would continue to exist until the class was through
with them. While this could possibly be a useful feature too,
it definitely goes well beyond anything which was discussed in
the previous thread.
--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
If the code is templated, like most in C++'s standard library, I don't
think there is any difference between using a functor or an inline function.
qsort() is slower because it can't inline the call through the function
pointer; std::sort() can.
| Regarding the use of functions or objects, again. If you were to allow bound
| variables from the surrounding context in your lambda function, then
| function objects provide handy closures for them. And, no, I don't think you
| should have to write this code, yourself (as 2 hints to). :) Contrived
| example:
|
| void f(int n1, int n2)
| {
| std::sort(b, e, bool (int a, int b) { return a*n1 < b*n2; } )
| }
|
| Possible translation:
|
| struct __compiler_generated
| {
| __compiler_generated(int n1, int n2) : n1(n1), n2(n2) {}
|
| bool operator()(int a, int b) { return a*n1 < b*n2; }
|
| int n1;
| int n2;
| };
These two variables should be (non-const) references by default to support
local functions acting on outer-scope function parameters.
Then we should also think of how we might do a copy, if we want. Perhaps like:
void f(int n1, int n2)
{
    // copy n1, n2
    std::sort(b, e, bool (int a, int b) { int n1 = ::n1, n2 = ::n2; return a*n1 < b*n2; } );
}
Similarly, we can make the parameters const references this way.
br
-Thorsten
| Possible translation:
||
|| struct __compiler_generated
|| {
|| __compiler_generated(int n1, int n2) : n1(n1), n2(n2) {}
||
|| bool operator()(int a, int b) { return a*n1 < b*n2; }
||
|| int n1;
|| int n2;
|| };
|
| These two variable should be (non-const) references by default to support
| local
| functions acting on outer-scope function parameters.
|
| Then we should also think of how we might do a copy, if we want. Perhaps like
|
| void f(int n1, int n2)
| {
| // copy n1, n2
| std::sort(b, e, bool (int a, int b) { int n1 = ::n1, n2 = ::n2; return a*n1 < b*n2; } )
| }
|
| Similarly, we can make the parameters const references this way.
Hmm... we should probably be able to say
std::sort( b, e, bool( int a, int b ) const { .... });
to make all the variables const references.
> And BTW: I'd avoid the word closure for this. I could easily be
> wrong, but when I head the word closure, I imagine some sort of
> extension of lifetime as well -- something that would, for
> example, allow me to write a class using local variables, and
> return it from the function, with the guarantee that the local
> variables would continue to exist until the class was through
> with them.
I've seen "downward closure" used for the case where there is no
lifetime extension and "full closure" for the case where there is
lifetime extension. Functional languages usually have full closures,
and procedural languages with local functions usually have downward
closures (Pascal has them; Ada currently doesn't, but they are
available in at least one compiler -- GNAT -- as an extension, and I
think there is a proposal to add them in Ada 2005 in a safer way than
GNAT).
I'm strongly in favor of downward closures. In my Ada days I remember
ending up using the GNAT extension because it was too painful to work
around the unavailability of downward closures (it was the second item
on my list of wanted changes in Ada; btw AFAIK there are proposals for
all those items for the next Ada revision).
Yours,
--
Jean-Marc
Some nits: The term "latently typed" refers to languages where values
carry around a type tag at run-time. Type errors like trying to add an
integer to a string are only detected and signalled during the execution
of a program. "Latently typed" is just another way of saying
"dynamically typed." Scheme and Common Lisp are latently-typed languages
(although Common Lisp allows for optional type declarations). Scripting
languages tend to be latently-typed.
In contrast, the term "statically typed" refers to languages where
expressions are typed at compile time. Some statically-typed languages
require (manifest) type declarations, and some infer the types directly
from the code without the need for type declarations. Miranda, Haskell,
and SML are statically-typed languages.
(The terms "strong" and "weak" typing seem to refer to how hard/easy it
is to circumvent the type system, or to the degree to which the type
information exists in the first place.)
The issue of support for (real) lambda (by way of closures as you bring
up) is orthogonal to the issue of types.
(Java and C++ are more accurately described as being a mixture of static
and dynamic typing in that given the ability to subtype, an object may
carry around a tag at run-time specifying to which of a set of subtypes
the object really belongs.)
> I don't know of any language that has closures which doesn't also have
> garbage collection, but I think that's because languages without GC are
> rare. I suspect that practical experience of this combination will have to
> be gained in C++ itself. [...]
Languages with (real) lambda universally support GC because you can't do
(real) closures without it. Lambda is a good thing. Static typing is
also a good thing, but it takes more work to do it well, which is why
scripting languages tend to punt on the issue and just go with latent
typing.
(This is not to say that I think adding (real) lambda to C++ is a good
idea--which is not to say that I think adding some of the things being
described in this thread to C++ is a bad idea--as long as it's not
called lambda.)
-thant
--
"Now, the temptation is going to be, by well-meaning people such as
yourself and others here, as we run up to the issue, to get me to
negotiate with myself in public," -- George W. Bush, December 20, 2004
[...]
> http://www.msobczak.com/prog/articles/lambdacpp.html
>
> Any comments are welcome.
Interesting stuff, but don't call it "lambda". It's not.
-thant
> I'm strongly in favor of upward closures. In my Ada days I remember
> ending up using GNAT extension because it was too painfull to work
> around the unavailability of upward closures (it was the second item
> on my wanted changes in Ada; btw AFAIK there are propositions for all
> those items for next Ada revision).
I recall a fairly long discussion about closures in the standards committee
quite a number of years ago. The fundamental problem, which no one could
figure out how to solve, was how to avoid an incompatible change to the
layout of function pointers.
The trouble is that if you allow local functions, then a function pointer
effectively becomes two addresses: the address of the code and the address
of the corresponding activation record. Which means that such a feature is
difficult to introduce without forcing compiler vendors to break link
compatibility with previous versions.
One possibility would be to follow Modula-3's lead and allow pointers only
to global functions. However, that would make it impossible to pass local
functions to standard-library algorithms, a restriction that I suspect most
would find intolerable.
I suggest we support both function lambdas and object lambdas, and only
allow object lambdas to use variables from their enclosing scope.
void print_cout( int *first, int *last ) {
    for_each( first, last, void inline( int x ) {
        cout << x;
    } );
}
void print_stream( int *first, int *last, ostream &os ) {
    for_each( first, last, void struct( int x ) {
        os << x;
    } );
}
The first lambda would generate an unnamed function; the argument to
for_each is a pointer-to-function with all that implies. It cannot use
local variables for the reason you describe.
The second lambda would generate a struct, and the argument to for_each is
an instance of that struct. The struct would have a reference to the
stream, a constructor which initialised it, and an operator().
I don't see any implementation problems. It does mean that the two kinds
of lambdas can do different things, but to me that seems natural. They
*are* different.
Incidentally, the above example demonstrates why I think closures will make
maintenance easier rather than harder. Without them, changing
print_cout() to take an ostream argument would require a complete rewrite.
-- Dave Harris, Nottingham, UK
No, it doesn't. That's "dynamically typed", as opposed to "statically
typed". "Latently typed" means that the types are not stated explicitly in
the program source. The opposite is "manifestly typed".
Haskell is an example of a latently typed language which can be statically
type-checked. Instead of manifest type declarations it uses type inference
from the known types of literals. (You can add your own type declarations
as well, either for clarity or to correct when the compiler infers
wrongly, and the compiler will infer from those too.) I don't know of a
language which combines dynamic type checking with manifestly declared
types, but it is logically possible and I suspect would be a very good
idea. Interface-based programming, such as COM and some styles of Java,
comes close.
C++ is mostly manifestly typed. Some people want to add type inference, eg
in the form "auto i = c.begin()" in which the static type of i is not
explicit but must be deduced. Then C++ would support some latent typing.
This would not affect whether it was statically type checked. Latent
typing and static type checking are a powerful combination.
I didn't invent this terminology; here are a couple of web pages which use
it:
http://osteele.com/archives/2003/08/test-versus-type
http://onestepback.org/articles/usingruby/strong.html
Many other references mix them up. That's usually because they don't
consider all the possibilities; they conflate "latent" with "dynamic"
which may have confused you. Some people go on to use "inferred types" for
latent, but that could mean Fortran's feature of deducing a variable's
type from its name: "i" is always integer, "r" is always real etc. It
seems to me that all 4 concepts are important and need names, and the
names I use here are the clearest.
> The issue of support for (real) lambda (by way of closures as you bring
> up) is orthogonal to the issue of types.
It is orthogonal to the issue of type-checking, but not orthogonal to the
issue of whether types need to be declared. This is because the
declarations make the lambda more verbose, and an aim of lambda is to be
concise.
In Java, for example, exception specifications are mandatory. So a Java
lambda may need to declare its return type, the types of all its
arguments, and the types of all the exceptions it can throw. That can add
up to so many names that it is clearer to separate out the function and
not use lambda at all.
-- Dave Harris, Nottingham, UK
| The trouble is that if you allow local functions, then a function pointer
| effectively becomes two addresses: the address of the code and the address
| of the corresponding activation record. Which means that such a feature is
| difficult to introduce without forcing compiler vendors to break link
| compatibility with previous versions.
why is that a problem? Or maybe, why is that
a new problem (isn't it normal for compilers to do that)?
br
-Thorsten
> "Jean-Marc Bourguet" <j...@bourguet.org> wrote in message
> news:pxbekgp...@news.bourguet.org...
>
> > I'm strongly in favor of upward closures. In my Ada days I remember
> > ending up using GNAT extension because it was too painfull to work
> > around the unavailability of upward closures (it was the second item
> > on my wanted changes in Ada; btw AFAIK there are propositions for all
> > those items for next Ada revision).
>
> I recall a fairly long discussion about closures in the standards
> committee quite a number of years ago. The fundamental problem,
> which no one could figure out how to solve, was how to avoid an
> incompatible change to the layout of function pointers.
GCC (gnat is the Ada part of GCC, like g++ is the C++ part; the backend
is common) uses trampolines for that (and BTW local functions are
available in C as a GCC extension). It generates code on the stack
(leading to problems when someone makes the stack non-executable, but if
the stack is the most convenient place to put it, it is not the only one
-- I think GCC is now able to handle that correctly, but Gaby could
perhaps comment) which encapsulates the wide pointer and passes a normal
pointer to the generated code.
> One possibility would be to follow Modula-3's lead and allow
> pointers only to global functions. However, that would make it
> impossible to pass local functions to standard-library algorithms, a
> restriction that I suspect most would find intolerable.
That was also the Ada solution and, as I wrote above, that led me to
use the GNAT extension... when you have local functions (named or
unnamed), not being able to pass them is painful. I seem to remember
that using generics (templates) there was a possibility to work around
the restriction in Ada. I guess that the Ada 200X way of making this
safe (the problem with wide pointers -- encapsulated or not -- is that
their lifetime may exceed that of the encapsulated activation record,
rendering them unsafe, exactly the same problem that can result from
passing a pointer to a local variable) is to add Pascal-like function
parameters whose address can't be taken and saved (thus ensuring that
their lifetime can never exceed that of the passed activation record).
Yours,
--
Jean-Marc
> "Andrew Koenig" <a...@acm.org> wrote in message
> news:bQzFd.8758$w62....@bgtnsc05-news.ops.worldnet.att.net...
>
> | The trouble is that if you allow local functions, then a function
> | pointer effectively becomes two addresses: the address of the code
> | and the address of the corresponding activation record. Which
> | means that such a feature is difficult to introduce without
> | forcing compiler vendors to break link compatibility with previous
> | versions.
>
> why is that a problem? Or maybe, why is that
> a new problem (isn't it normal for compilers to do that)?
Most compiler vendors (the major exception is gcc, and even they are
getting better on this point) tend to be careful not to introduce
gratuitous incompatibilities with previous versions. Sun, for example,
still offers the possibility to get binary compatibility with its
pre-standard compilers. That's for C++. For C, binary compatibility is
the common behavior for everybody, and it is common for an OS vendor to
have an official ABI definition for C.
For the present point, I think the problem would not be the
compatibility with previous compilers but the compatibility with C.
Even if it is not demanded by the standard, some compilers offer the
additional guarantee that a pointer to a linkage-"C++" function is
compatible with a pointer to a linkage-"C" function. Dropping that
could be difficult for them. It would be the result of having fat
pointers for functions. (Changing the C ABI is surely *not* an option.)
As written elsewhere, fat pointers are not the only solution; gcc uses
trampolines for that.
Yours,
--
Jean-Marc
> why is that a problem? Or maybe, why is that
> a new problem (isn't it normal for compilers to do that)?
It was a problem because some of the vendors on the committee said in no
uncertain terms that they would not break compatibility that way.
> No, it doesn't. That's "dynamically typed", as opposed to "statically
> typed". "Latently typed" means that the types are not stated explicitly in
> the program source. The opposite is "manifestly typed".
> I didn't invent this terminology; here are a couple of web pages which use
> it:
> http://osteele.com/archives/2003/08/test-versus-type
> http://onestepback.org/articles/usingruby/strong.html
My understanding has always been that latently typed was another term
for dynamically typed, and that the opposite of "manifestly typed" was
"type inferred" or "implicitly typed." The first web page I googled
concurred with my understanding. However, I will admit that as I dig
around, there seems to be no real agreement on the terminology, which
seems to have changed over the years. And indeed the problem seems to be
that people have not considered various axes-of-features as independent
of each other. Note that the first article you cite uses the term in a
way that seems to correlate more with my understanding than with yours:
"Python is an example of latent typing: types are always checked, just
not at compile time."
Regardless, my apologies for starting an argument over terminology.
>>The issue of support for (real) lambda (by way of closures as you bring
>>up) is orthogonal to the issue of types.
>
>
> It is orthogonal to the issue of type-checking, but not orthogonal to the
> issue of whether types need to be declared. This is because the
> declarations make the lambda more verbose, and an aim of lambda is to be
> concise.
The aim of lambda is to enable functional composition, and the power of
functional composition goes *way* beyond its ability to *syntactically*
shorten a computer program:
http://lists.cs.wisc.edu/archive/pl-seminar/2002/msg00020.shtml
Sorry the example is in SML, but C++ just doesn't cut it for this kind
of thing--even with the modifications being discussed in this thread. (I
have the same program in Scheme lying around somewhere.) SML infers
types as much as it can. But sometimes you need to declare types, and
you can always add (redundant) type declarations. Doing so in the
example above would do little to detract from the point it was making
about the power of functional composition.
-thant
--
"Gold: monetary unit of the peaceable kingdom. Dollah: toilet
paper of the war whores." --machinehead
[...]
| Sorry the example is in SML, but C++ just doesn't cut it for this kind
| of thing--even with the modifications being discussed in this thread. (I
| have the same program in Scheme lying around somewhere.) SML infers
| types as much as it can. But sometimes you need to declare types, and
| you can always add (redundant) type declarations. Doing so in the
| example above would do little to detract from the point it was making
| about the power of functional composition.
And the price of that deceptive facility, i.e. Hindley-Milner typing,
is the ban on overloading -- except when the language itself declares
that it knows about some symbols (e.g. "+") but does not give
programmers ways to overload them. OCaml chose to remain consistent --
and carried the logic to the extreme where, for instance, "*" is for
integers and "*." is for floating point. Haskell removed that
limitation through "type classes", which in C++ would correspond to
abstract class templates -- the "Haskell dictionaries" would correspond
to "C++ vtables".
Decades of going in circles, with increasing resources spent on the
obfuscation of common sense.
--
Gabriel Dos Reis
g...@integrable-solutions.net
| "Thorsten Ottosen" <nes...@cs.auc.dk> wrote in message
| news:41e70727$0$48316$1472...@news.sunsite.dk...
|
| > why is that a problem? Or maybe, why is that
| > a new problem (isn't it normal for compilers to do that)?
|
| It was a problem because some of the vendors on the committee said in no
| uncertain terms that they would not break compatibility that way.
Indeed. Although I'm very sympathetic to the unnamed functions, block
with values, lambdas, closures or whatever they end up being called, I
consider that the compatibility problem should not be underestimated.
--
Gabriel Dos Reis
g...@integrable-solutions.net
[...]
| As written elsewhere, fat pointers are not the only solution; gcc uses
| trampolines for that.
yes, but it is not a universal option.
--
Gabriel Dos Reis
g...@integrable-solutions.net
Yes. For example:
int sum( int *first, int *last ) {
int s = 0;
for_each( first, last, void struct( int item ) {
s += item;
} );
return s;
}
should do the obvious thing, and not always return 0.
> Then we should also.think of how we might do a copy, if we want.
> Perhaps like
>
> void f(int n1, int n2)
> {
> // copy n1, n2
> std::sort(b, e, bool (int a, int b) { int n1 = ::n1, n2 = ::n2;
> return a*n1 < b*n2; } )
> }
I am not keen on that syntax, because ::n1 already has meaning. Also, I
don't see why the copy has to be made in the anonymous function. Why not
just copy them on the stack and use references to those? I would hope and
expect that the compiler would be smart enough to optimise away
unnecessary indirection.
The only case where you really need a copy in the anonymous object itself
is when the lifetimes differ:
Proc *make_adder( int inc ) {
return new int( int arg ) { return arg + inc; };
}
This case is difficult partly because it gets into the memory management
issue. Also because the type "Proc" is moot. If it is a typedef for
int()(int), then where is there space in the pointer to store the inc
copy? If it is class type, then where does the lambda say that it inherits
from it?
My initial position is that this kind of thing need only be supported by
going to a more verbose syntax:
Proc *make_adder( int inc ) {
return new struct : Proc {
int inc_copy = inc;
int operator()( int arg ) { return arg + inc_copy; };
}
}
and frankly the benefit over the current language:
Proc *make_adder( int inc ) {
struct Local : Proc {
int inc_copy;
Local( int inc ) : inc_copy( inc ) {}
int operator()( int arg ) { return arg + inc_copy; }
};
return new Local( inc );
}
is too small to worry about anyway. In other words, I suggest we support
downward funargs only at this stage.
-- Dave Harris, Nottingham, UK
Functional composition is a good thing, and it's something we can already
do in C++. The lambda proposals don't so much enable it as provide a nicer
syntax for it. All of the proposals for lambda so far are things which can
be translated directly into 1998 C++.
Which isn't to say that the benefits are only syntactic. The translation
would add details, and the lack of irrelevant detail is part of what
raises the abstraction level. Your SML code omits 3 kinds of detail:
(1) Explicit memory management.
(2) Explicit type declarations.
(3) Names of anonymous functions.
If all 3 were put back in, the code would still be structured the same but
it wouldn't be so clear or so abstract. And I doubt it would still fit in
a page.
I don't think we have an argument here. We're both making the point that
just addressing (3) on its own won't give C++ the full benefits of lambda
as found in SML et al, and that the proposal shouldn't be judged as if it
would.
If you are saying that (1) is more important than (2), then I'm not
disagreeing. I just think both are factors. I couldn't honestly talk about
the success of lambdas in functional languages without mentioning type
inference.
-- Dave Harris, Nottingham, UK
Why would std::sort be able to inline a function, if passed a pointer to
function, whereas qsort would not? I.e. how would it be able to translate
the following:
inline bool comp(int a,int b) { return a < b; }
std::sort(b, e, comp);
In both cases a function pointer is called, and unless the compiler does a
whole-program analysis, or some such, it's unlikely that it will be able to
see beyond the call to std::sort, and substitute the inline function inside
of it.
> | Regarding the use of functions or objects, again. If you were to allow
bound
> | variables from the surrounding context in your lambda function, then
> | function objects provide handy closures for them. And, no, I don't think
you
> | should have to write this code, yourself (as 2 hints to). :) Contrived
> | example:
> |
> | void f(int n1, int n2)
> | {
> | std::sort(b, e, bool (int a, int b) { return a*n1 < b*n2; } )
> | }
> |
> | Possible translation:
> |
> | struct __compiler_generated
> | {
> | __compiler_generated(int n1, int n2) : n1(n1), n2(n2) {}
> |
> | bool operator()(int a, int b) { return a*n1 < b*n2; }
> |
> | int n1;
> | int n2;
> | };
>
> These two variable should be (non-const) references by default to support
> local
> functions acting on outer-scope function parameters.
In this case, I assumed a compiler smart enough to realise that the
parameters weren't modified in the functor, and therefore stored them by
value, avoiding the indirection of using references. ;)
Regards,
Terje
I thought the directions went the other way. "Downward" for when the
object is passed to called routines. "Upward" for when the object is
returned to callers. I suppose it depends on whether you think of the
stack as growing upwards or downwards. I usually think of main() as being
the top-most function, with the functions it calls below it.
A search for "upward closure" produced nothing useful, but searching for
"upward funarg" or "downward funarg" supported my view.
-- Dave Harris, Nottingham, UK
> My understanding has always been that latently typed was another term
> for dynamically typed, and that the opposite of "manifestly typed" was
> "type inferred" or "implicitly typed." The first web page I googled
> concurred with my understanding. However, I will admit that as I dig
> around, there seems to be no real agreement on the terminology, which
> seems to have changed over the years. And indeed the problem seems to be
> that people have not considered various axes-of-features as independent
> of each other. Note that the first article you cite uses the term in a
> way that seems to correlate more with my understanding than with yours:
>
> "Python is an example of latent typing: types are always checked, just
> not at compile time."
That's also inaccurate, in most senses of the word "checked." It's
fairly rare for any Python program to ask about the type of a variable.
--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com
I'll try one more time: Whether a programming language supports
functional composition and how a programming language handles types HAVE
NOTHING TO DO WITH EACH OTHER. I've written exactly the same program in
Scheme. (In fact, I wrote it in Scheme before I wrote it in SML.) Scheme
is dynamically typed, and (with the right libraries) supports
overloading just fine. And I have no doubt the same program could be
easily written in Haskell and OCaml (and Miranda, which, if I remember
correctly, requires type declarations). The difference between all these
languages and C++ is that they support what elsewhere in this thread are
called "full closures." Not coincidentally, they all provide automatic
memory management.
(I happen to prefer SML's module system to OCaml's OO support, but that
is an entirely different subject.)
-thant
| > If the code is templated, like most in C++'s standard library, I don't
| > think there is any difference between using a functor or an inline
| function.
| > qsort() is slower because it can't inline the function pointer call,
| > std::sort() can.
|
| Why would std::sort be able to inline a function, if passed a pointer to
| function, whereas qsort would not? I.e. how would it be able to translate
| the following:
|
| inline bool comp(int a,int b) { return a < b; }
|
| std::sort(b, e, comp);
|
| In both cases a function pointer is called, and unless the compiler does a
| whole-program analysis, or some such, it's unlikely that it will be able to
| see beyond the call to std::sort, and substitute the inline function inside
| of it.
Whole-program analysis should not be necessary. If the function is inline,
it's present in the same TU as std::sort(). I haven't checked all my
compilers, but I assume most modern ones will do it.
-Thorsten
> I don't know of a
> language which combines dynamic type checking with manifestly declared
> types, but it is logically possible and I suspect would be a very good
> idea.
PHP 5 does this. You may optionally declare the type in a signature, and if
the passed-in object is not (derived from) this type, it will give a
run-time error. PHP is dynamically typed.
> I didn't invent this terminology; here are a couple of web pages which use
> it:
> http://osteele.com/archives/2003/08/test-versus-type
> http://onestepback.org/articles/usingruby/strong.html
>
> Many other references mix them up.
Yes. Often you see discussions about "strong vs weak typing", when they're
really about static vs dynamic typing. Dynamically typed languages (at least
scripting languages) tend also to have more implicit conversions. However,
as I understand, this is not the same as "weak typing" (or is it? See the
following). BCPL is an example of a weakly typed language (as is e.g.
assembly code), where type errors are not detected. However, can this
definition be used for languages with "eager conversions"? When a
language - such as PHP - permits implicit conversions between almost
anything (with more or less reasonable results), isn't it then also
easier to get undetected type errors, due to using an object in an
improper way, while the conversions let you "get away with it" without
error messages?
> That's usually because they don't
> consider all the possibilities; they conflate "latent" with "dynamic"
> which may have confused you.
"latent" typing is a terminology that I find confusing in itself, and I
suspect that many wouldn't know (or agree about) what it means, without
reading a definition, somewhere. "implicit" vs "explicit" typing I think
gives a more, er, explicit way of stating what it's about.
> Some people go on to use "inferred types" for
> latent, but that could mean Fortran's feature of deducing a variable's
> type from its name: "i" is always integer, "r" is always real etc.
Hm, well, "type inference" is another common terminology for implicit/latent
typing, and I hardly think people would think of type inferred from the
"names" when they hear about inferred types. At least I think there's
less chance of that than of someone understanding what "latent typing"
means without a prior definition. Then again, it's hard to tell what
"most people"
would think.
> It seems to me that all 4 concepts are important and need names
Indeed. As well as the strong vs weak distinction. I'm also wondering about
the eager/not eager conversion distinction, unless it may be folded into
strong/weak.
>, and the names I use here are the clearest.
There may be some debate about that. :) One of the pages you give link to
above uses "explicit"/"implicit" as the terminology (with
"manifest"/"latent" in parenthesis).
> > The issue of support for (real) lambda (by way of closures as you bring
> > up) is orthogonal to the issue of types.
>
> It is orthogonal to the issue of type-checking, but not orthogonal to the
> issue of whether types need to be declared. This is because the
> declarations make the lambda more verbose, and an aim of lambda is to be
> concise.
>
> In Java, for example, exception specifications are mandatory. So a Java
> lambda may need to declare its return type, the types of all its
> arguments, and the types of all the exceptions it can throw. That can add
> up to so many names that it is clearer to separate out the function and
> not use lambda at all.
It's interesting that you should mention Java, as Java has its own lambda
"hack", in the form of anonymous inner classes (often used for event
listeners). However, using inheritance from an interface, this always
relies on dynamic dispatch.
Regards,
Terje
> Jean-Marc Bourguet <j...@bourguet.org> writes:
>
> [...]
>
> | As written elsewhere, fat pointer is not the only solution, gcc use
> | trampoline for that.
>
> yes, but it is not a universal option.
But I wonder if there are situations where neither solution is
acceptable.
Fat pointers are about as universal as possible; the only major problem
seems to be compatibility with the past.
For trampolines to be unavailable, we must be in a situation where it
is impossible for a process to generate code and then execute it in the
process's address space. Architecturally, that means a
Harvard-architecture processor. Those are quite rare and, as far as I
know, used mostly in situations like embedded systems where
compatibility is not such a big issue.
Then there are constraints set by the OS. I know of none where it would
be a problem to have a secondary stack, marked executable, on which to
build trampoline wrappers (perhaps the AS-400 -- I never remember the
?Series nomenclature).
Yours,
--
Jean-Marc
>> "Python is an example of latent typing: types are always checked, just
>>not at compile time."
>
>
> That's also inaccurate, in most senses of the word "checked." It's
> fairly rare for any Python program to ask about the type of a variable.
I'm no Python expert, but my impression was that it was similar to
Scheme in that types aren't associated with variables at all, but with
values. That is, a variable could be assigned e.g. an integer value one
moment, and e.g. a string the next. In this kind of a language, values
carry around with them information about their type, and the types of
*values* (not variables) are checked when the program tries to do
something with them (e.g. add them together). [1] So the quoted
statement not only concurs with my understanding of Python, but also
with my otherwise supposedly inaccurate understanding of the term
"latently typed." [2]
And again, the main point is that latent...er...dynamic typing has
nothing to do with lambda (beyond the fact that they both as a rule only
exist in languages that support automatic memory management).
-thant
[1] Good compilers will try, for performance reasons, to eliminate
run-time type checks where they have enough information to do so, and
Common Lisp, I understand, allows the programmer to provide optional
type declarations for this purpose.
[2] SML doesn't (for the most part) require type declarations, but it
does all its type checking at compile time. By the above definition, SML
is not latently typed.
[...]
> Functional composition is a good thing, and it's something we can already
> do in C++. The lambda proposals don't so much enable it as provide a nicer
> syntax for it. All of the proposals for lambda so far are things which can
> be translated directly into 1998 C++.
Functional composition in C++ is extremely limited, and referring to the
extensions proposed in this thread as "lambda" is misleading.
> Which isn't to say that the benefits are only syntactic. The translation
> would add details, and the lack of irrelevant detail is part of what
> raises the abstraction level. Your SML code omits 3 kinds of detail:
>
> (1) Explicit memory management.
> (2) Explicit type declarations.
> (3) Names of anonymous functions.
>
> If all 3 were put back in, the code would still be structured the same but
> it wouldn't be so clear or so abstract. And I doubt it would still fit in
> a page. [...]
At the end of this post I've appended my lexer generator rewritten to
eliminate all anonymous functions (and currying), and to include
complete explicit type declarations. (I've omitted the comments and the
testing code.) The code is less concise, but not by much. #2 and #3 are
nice things for a language to make optional when it can, but it should
be clear that the expressive power of functional composition is
something else entirely.
As for #1, it is a necessary, but not sufficient, precondition for
providing the advantages of functional composition.
-thant
---begin example---
type stream = char list * char list;
type streams = stream list;
type matcher = streams -> streams;
type matchers = matcher list;
fun tok (t : char) : matcher =
let fun f (s,h::r) = if h=t then SOME (h::s,r) else NONE
| f _ = NONE
in List.mapPartial f
end
fun applyToStreams (f : matcher, s : streams) : streams = f s
fun seq (rules : matchers) : matcher =
let fun f (streams : streams) : streams =
foldl applyToStreams streams rules
in f end
fun bar (rules : matchers) : matcher =
let fun f (streams : streams) : streams =
let fun g (f : matcher) : streams = f streams
in List.concat (map g rules) end
in f end
fun star (rule : matcher) : matcher =
let fun f (streams : streams) : streams =
let fun loop (streams : streams) : streams list =
case (rule streams) of
[] => []
| results => results::(loop results)
in List.concat (streams::(loop streams)) end
in f end
fun pos (rule : matcher) : matcher = seq [rule, star rule]
fun opt (rule : matcher) : matcher = bar [rule, fn x => x]
fun range (a : char, b : char) : matcher =
if b<a then range (b,a) else
let fun loop (i : int) : matchers = if (i<=Char.ord b)
then (tok (Char.chr i)) :: loop (i+1)
else []
in bar (loop (Char.ord a))
end
fun lit (s : string) : matcher = seq (map tok (String.explode s))
fun set (s : string) : matcher = bar (map tok (String.explode s))
---end example---
I agree, but I wonder if everyone sees the consequences of what you're
asking. :-)
You start to allude to the problem when you continue:
>I don't see why the copy has to be made in the anonymous function. Why not
>just copy them on the stack and uses references to those? I would hope and
>expect that the compiler would be smart enough to optimise away
>unnecessary indirection.
But it's not an optimization question, it's a type safety question, as you
then go on to discuss w.r.t. lifetimes.
We all know for_each, but often you'll pass a lambda off to something that
keeps it around for longer than your stack frame exists, and frankly you
can't always know if it will do that (e.g., it might be in a third-party
library for which you have only the binaries). For s physically on the
stack, that isn't type-safe: After the stack shrinks and re-grows, with
some other function's frame occupying the location where s used to be,
that lambda can be (ab)used to write in to that other function's space. I
suspect this is worse than most security holes because I'll bet it would
make a fairly high-precision weapon.
But I agree strongly that you should be able to write the above. I also
agree strongly that type safety mustn't be compromised. Can you have your
cake and eat it too? Sure; do what C# and other languages have done, and
use closures. Specifically, the trick is that for a given function call,
part of the local variables (the ones that aren't used in lambdas) are
laid out as usual in the stack frame, but the local variables like s that
are used in a lambda are allocated together on the heap. That ensures their
lifetimes will continue, and both the function and the callees that get
the lambda get exactly what they expect.
Note that I didn't say when they get deallocated. Reference counting works
fine for this, as do other forms of GC. If you like lambdas, they pretty
slickly hand you exactly this great reason to like GC too, even if you
don't at first want to like GC. :-)
So I don't believe any extra syntax is needed. BTW, I would prefer to
spell lambda functions without "struct":
for_each( first, last, void ( int item ) {
s += item;
} );
It's a little more lookahead to find the { in the worst case, but I don't
think it's ambiguous. I'd also support having both lambda expressions
_and_ lambda functions, because for little stuff like this there's a lot
to like about
for_each( first, last, s += _1 );
as the Boost Lambda folks will tell you, I'm sure. :-) If I were ever to
put this into a compiler, it would take strong counterarguments to
dissuade me from supporting both lambda expressions and lambda functions.
Herb
---
Herb Sutter (www.gotw.ca) (www.pluralsight.com/blogs/hsutter)
Convener, ISO WG21 (C++ standards committee) (www.gotw.ca/iso)
Contributing editor, C/C++ Users Journal (www.gotw.ca/cuj)
Architect, Developer Division, Microsoft (www.gotw.ca/microsoft)
OK; I'm not disagreeing with that. I've continued to use that name only
because I've not seen a good alternative. I don't like "anonymous
function" because these things are not functions, and I think "closure"
has the same or worse problems as "lambda". What name do you suggest we
use?
Also, what further features would be needed before you'd accept that C++
had genuine lambdas?
> At the end of this post I've appended my lexer generator rewritten to
> eliminate all anonymous functions (and currying), and to include
> complete explicit type declarations.
I agree that GC is more important than anonymous functions and explicit
type declarations.
I'm afraid I don't understand SML well enough to really follow what your
code is doing. In particular, to see what a C++ 1998 version would look
like, using structs for closures and smart pointers for garbage
collection. I imagine it would be several times more verbose, but would
still be recognisably structured the same.
-- Dave Harris, Nottingham, UK
Agreed, of course: but the same issue arises whenever we pass pointers or
references. The library needs to document whether it holds onto the
object, or preferably, use some kind of smart-pointer. I'm not convinced
that anonymous functions deserve special treatment.
> But I agree strongly that you should be able to write the above. I also
> agree strongly that type safety mustn't be compromised. Can you have
> your cake and eat it too? Sure; do what C# and other languages
> have done, and use closures. Specifically, the trick is that for a
> given function call, part of the local variables (the ones that
> aren't used in lambdas) are laid out as usual in the stack frame,
> but the local variables like s that are used in a lambda are allocated
> together on the heap.
> and both the function and the callees that get the lambda get
> exactly what they expect.
Although on the one hand that sounds cool, it also sounds like quite a
heavyweight solution. It sounds like the variables need to be copyable,
for example. The heap allocation is something that might fail, so there
are implications for exception safety. And it adds run-time overhead which
surely goes against the principle that you don't pay for what you don't
need.
I think the variables need to be copyable in order to provide access to
arguments:
int sum( int *first, int *last, int s=0 ) {
for_each( first, last, void struct( int item ) {
s += item;
} );
return s;
}
Here the caller of sum will allocate s on the stack, so it needs to be
copied from the stack to the heap for the closure. Not a problem for an
int, or indeed for any class with an accessible copy constructor, and
usually if the copy constructor is accessible to the caller of sum() it
will also be accessible within sum() - but not always.
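For comparison, the C++98 desugaring of that example is a function object capturing s by reference (AddToSum is an illustrative name, not part of any proposal):

```cpp
#include <algorithm>
#include <cassert>

// Hand-written C++98 analogue of the proposed lambda: the closure is
// an explicit function object holding a reference to s. std::for_each
// copies the functor by value, so its members must be copyable
// (a reference member is).
struct AddToSum {
    int& s;
    explicit AddToSum(int& s_) : s(s_) {}
    void operator()(int item) const { s += item; }
};

int sum(int* first, int* last, int s = 0) {
    std::for_each(first, last, AddToSum(s));
    return s;
}
```

Note that here s lives in sum()'s stack frame for the whole call, so no heap copy arises; the copyability question only appears once lifetimes must be extended.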
Local variables (ie non-arguments) can be allocated on the heap in the
first place, so no copy is needed for those, but we need to be clear about
whether that is just an optional optimisation of a temporary like the RVO
(ie whether the copy constructor must be accessible), or whether it is
required (which implies different semantics for arguments and locals).
> Note that I didn't say when they get deallocated. Reference counting
> works fine for this, as do other forms of GC.
It's still a can of worms, though, especially if you are trying to do it
behind the scenes. It makes a pointer to an anonymous function be a very
different type to a pointer to a normal class (or function). I was hoping
these new types could interact seamlessly with existing code.
My feeling is that heap allocation should be explicitly requested by the
programmer, probably with the keyword "new". And if they want GC, they can
use their own smart pointer class. Or else it can be added to the language
in a general way, rather than as a special case for closures only.
Frankly I think if using closures means an extra heap allocation, many
programmers won't use them. They will go back to writing for-loops. And
exception safety is also a big issue, at least theoretically. You have
turned my no-throw sum() into one that can throw bad_alloc.
> So I don't believe any extra syntax is needed.
No extra syntax, but different and incompatible semantics for the
suggested syntax. We need to decide now whether to go this route, because
it can't be retrofitted later.
> BTW, I would prefer to spell lambda functions without "struct"
What do you intend to do about binary compatibility with plain function
pointers?
-- Dave Harris, Nottingham, UK
| On 15 Jan 2005 23:50:25 -0500, bran...@cix.co.uk (Dave Harris) wrote:
| >nes...@cs.auc.dk (Thorsten Ottosen) wrote (abridged):
| >> These two variables should be (non-const) references by default to
| >> support local functions acting on outer-scope function parameters.
| >
| >Yes. For example:
| >
| > int sum( int *first, int *last ) {
| > int s = 0;
| > for_each( first, last, void struct( int item ) {
| > s += item;
| > } );
| > return s;
| > }
| >
| >should do the obvious thing, and not always return 0.
|
| I agree, but I wonder if everyone sees the consequences of what you're
| asking. :-)
|
| You start to allude to the problem when you continue:
|
| >I don't see why the copy has to be made in the anonymous function. Why not
| >just copy them on the stack and use references to those? I would hope and
| >expect that the compiler would be smart enough to optimise away
| >unnecessary indirection.
|
| But it's not an optimization question, it's a type safety question, as you
| then go on to discuss w.r.t. lifetimes.
|
| We all know for_each, but often you'll pass a lambda off to something that
| keeps it around for longer than your stack frame exists, and frankly you
But a library/program which does that with current C++ is already
broken, isn't it? I mean if one keeps a reference to a local variable
that may not exist long enough, one is already in trouble.
[...]
| But I agree strongly that you should be able to write the above. I also
| agree strongly that type safety mustn't be compromised. Can you have your
| cake and eat it too? Sure; do what C# and other languages have done, and
I believe it is more accurate to say "do what other languages are
doing and C# has copied" ;-)
--
Gabriel Dos Reis
g...@integrable-solutions.net
Allan W wrote:
>I have a little experience with nested functions in Pascal.
>Not quite the same thing, but similar intent.
>
>The equivalent in C++ would be:
>
>. void foo(iter b, iter e) {
>. bool lessint(int a, int b) { return a < b; }
>. std::sort(b, e, lessint);
>. }
>
>To be honest, I don't have a lot of experience with nested functions.
>I do know that overuse of this feature can be just as bad as overuse
>of any other feature, i.e. in C++, overuse of overloaded operators
>or macros. However, for simple cases I found nested functions allowed
>me to express myself elegantly.
>
I also used Pascal, and as far as I remember, the described usage of
local functions was **not** possible (i.e. returning pointers to them).
The reason is quite simple (although solvable), if you consider the more
obvious usages of inner functions, in cases like
. void foo(iter b, iter e) {
. bool lessint(int a) { return a < *b; }
. int c = 42;
. lessint(c);
. }
or even worse
. void foo(iter b, iter e) {
. int a = 42;
. bool lessint() { return a < *b; }
. lessint();
. }
Greetings from Bremen,
Daniel Krügler
Are folks looking for a keyword? Or merely terminology? How do people
describe Pascal's ability to define functions within functions? (...as
constrained in their use as they are...)
> Also, what further features would be needed before you'd accept that C++
> had genuine lambdas?
The concept of adding genuine lambda to C++ hurts my brain.
> I'm afraid I don't understand SML well enough to really follow what your
> code is doing. [...]
I'm proud to say I did come up with this stuff on my own, but it turns
out in the literature it's called "parser combinators."
-thant
> David Abrahams wrote:
>> Thant Tessman [quoting a referenced article] wrote:
>
>>> "Python is an example of latent typing: types are always checked, just
>>>not at compile time."
>>
>>
>> That's also inaccurate, in most senses of the word "checked." It's
>> fairly rare for any Python program to ask about the type of a variable.
>
> I'm no Python expert, but my impression was that it was similar to
> Scheme in that types aren't associated with variables at all, but with
> values.
Yes, you're right. Technically I should have said "It's fairly rare
for any Python program to ask about the type of an object."
> That is, a variable could be assigned e.g. an integer value one
> moment, and e.g. a string the next. In this kind of a language, values
> carry around with them information about their type, and the types of
> *values* (not variables) are checked when the program tries to do
> something with them (e.g. add them together).
Except in Python there's no explicit checking of the type's
identity(**). The interpreter treats all objects the same way. If
the object's type has an "__add__" method, it gets called. Otherwise,
there's an exception.
(**) except in a very few places as an optimization, and, for some
reason, to make sure the first argument to a method has the right
type.
> [1] So the quoted statement not only concurs with my understanding
> of Python, but with my otherwise supposedly inaccurate
> understanding of the term "latently typed." [2]
I'm not making any claims about your understanding of latent typing.
I'm just trying to make the point that dynamic type systems often
don't do any typechecking.
> And again, the main point is that latent...er...dynamic typing has
> nothing to do with lambda (beyond the fact that they both as a rule
> only exist in languages that support automatic memory management).
Agreed.
--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com
> On 15 Jan 2005 23:50:25 -0500, bran...@cix.co.uk (Dave Harris) wrote:
>>nes...@cs.auc.dk (Thorsten Ottosen) wrote (abridged):
>>> These two variables should be (non-const) references by default to
>>> support local functions acting on outer-scope function parameters.
>>
>>Yes. For example:
>>
>> int sum( int *first, int *last ) {
>> int s = 0;
>> for_each( first, last, void struct( int item ) {
>> s += item;
>> } );
>> return s;
>> }
>>
>>should do the obvious thing, and not always return 0.
>
> I agree, but I wonder if everyone sees the consequences of what you're
> asking. :-)
<snip>
> We all know for_each, but often you'll pass a lambda off to something that
> keeps it around for longer than your stack frame exists, and frankly you
> can't always know if it will do that (e.g., it might be in a third-party
> library for which you have only the binaries).
Once you have to look at a library's source code, the game is over.
Either its documentation tells you what it does, or you shouldn't use
it. If a third party library is going to accept arbitrary function
objects and store copies somewhere, it has to tell you so, because you
could always write a function object containing pointers or references
to stack variables. There's nothing special about lambdas in this
regard.
[That said, if it's accepting arbitrary function objects, the
third-party library is a template library, so you can look at the
source]
> For s physically on the stack, that isn't type-safe: After the stack
> shrinks and re-grows, with some other function's frame occupying the
> location where s used to be, that lambda can be (ab)used to write in
> to that other function's space. I suspect this is worse than most
> security holes because I'll bet it would make a fairly
> high-precision weapon.
I have a really hard time understanding this argument. As Dave
Harris has posted elsewhere, it's no worse a problem than is created
by the ordinary ability to pass pointers and references around.
> But I agree strongly that you should be able to write the above. I also
> agree strongly that type safety mustn't be compromised.
Type safety isn't any further compromised by this feature than it is
by the rest of C++. And by "the rest of C++" I mean the ordinary part
that we think of as typesafe -- the part that doesn't include
dangerous shenanigans like casts.
> Can you have your cake and eat it too? Sure; do what C# and other
> languages have done, and use closures.
Ugh, this seems just like the conversations we've had about what
garbage collection for C++ should look like. Rather than do something
that fits naturally into the C++ model of computation, you're suggesting
that we do something that's just like what Microsoft has done for
reasons that were particular to Microsoft's own language or system.
> Specifically, the trick is that for a given function call,
> part of the local variables (the ones that aren't used in lambdas) are
> laid out as usual in the stack frame, but the local variables like s that
> are used in a lambda are allocated together on the heap.
> lifetimes will continue, and both the function and the callees that get
> the lambda get exactly what they expect.
Then I have to pay for something I won't use. Why should the for_each
above cause a dynamic allocation and deallocation? In fact, I'd have
to pay for it even if there were a branch around the for_each that was
taken 90% of the time.
> Note that I didn't say when they get deallocated. Reference counting works
> fine for this, as do other forms of GC. If you like lambdas, they pretty
> slickly hand you exactly this great reason to like GC too, even if you
> don't at first want to like GC. :-)
I like GC anyway, but lambdas don't make it any more likeable. In
fact, if getting GC meant that I'd have to pay for a dynamic
allocation in each stack frame that used a local variable in a lambda,
I'd oppose it.
--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com
IMHO, the problem with this is that the lambda expression doesn't
explicitly specify the type(s) for its contextual binding or closure.
Assuming that the compiler implements it as an anonymous structure with
a member variable and constructor, what is the type of the member
variable - int or int &? Maybe it should be int * or, perhaps
shared_ptr<int>?
My argument would be that the syntax should have a way of specifying
this. In the case of other languages where I've seen lambda implemented
(e.g. Lisp or Python) everything is a reference, so the issue doesn't
arise. It's not quite that simple in C++, with the lifetime issues
already mentioned.
A couple of years ago I proposed a syntax (and even implemented a
preprocessor) that would go something like this:
int sum( int *first, int *last, int s=0 ) {
for_each( first, last, lambda ( int item ) {
__ref(s) += item;
} );
return s;
}
where "__ref(s)" indicates that "s" should be stored by reference in
the closure.
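As a point of comparison, the capture-list syntax that C++11 eventually standardized makes exactly this by-reference-vs-by-copy choice explicit at the capture site, rather than with a marker like __ref inside the body. A minimal sketch in that style:

```cpp
#include <algorithm>
#include <cassert>

// C++11 capture lists spell the choice out: [&s] stores a reference
// to s in the closure, [s] would store a copy.
int sum(int* first, int* last, int s = 0) {
    std::for_each(first, last, [&s](int item) { s += item; });
    return s;
}
```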
I also toyed with the problem of returning lambda expressions from
functions, but that starts to get rather messy - it's difficult to
specify the exact type of the returned object. e.g.
template<typename T>
lambda bool (T) (const T &) // Return type (!)
match (const T &target) {
return lambda bool (const T &candidate) {
return candidate == __ctx(target);
};
}
// ...
std::vector<std::string> v;
// ...
std::vector<std::string>::iterator
i = find_if (v.begin(), v.end()
, match (std::string("hello")));
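A hand-written C++98 equivalent of that match factory shows what the messy return type is standing in for: the closure over target becomes an explicit function-object class (Matcher is an illustrative name):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// The "closure" the lambda syntax would generate: a named class
// holding a copy of target, comparing candidates against it.
template <typename T>
class Matcher {
    T target_;
public:
    explicit Matcher(const T& target) : target_(target) {}
    bool operator()(const T& candidate) const { return candidate == target_; }
};

// The factory: trivially nameable once the class has a name.
template <typename T>
Matcher<T> match(const T& target) { return Matcher<T>(target); }
```

The difficulty the post describes is precisely that the anonymous version has no utterable name for this return type.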
I still have some stuff on this on my webpage
(http://home.clara.net/raoulgough/clamp/index.html) but be warned that
the preprocessor probably generates code that doesn't work with
compilers that do two-phase name lookup.
--
Raoul Gough.
[...]
> I'm just trying to make the point that dynamic type systems often
> don't do any typechecking.
The process of method dispatch *is* type checking--dynamic type
checking. I know this is merely more bickering over terminology, but I
again find myself concurring with the terminology as used by the author
of the referenced article. Contrast what's going on in Python with what
happens in C/C++ if a program misidentifies the type of an object on
the other end of a void pointer.
If by "type checking" you mean the application of a formal type system
(as described e.g. by Pierce in "Types and Programming Languages"),
then I would agree that Python employs no formal type system.
[...]
-thant
> David Abrahams wrote:
>
> [...]
>
>> I'm just trying to make the point that dynamic type systems often
>> don't do any typechecking.
>
> The process of method dispatch *is* type checking--dynamic type
> checking. I know this is merely more bickering over terminology, but I
> again find myself concurring with the terminology as used by the author
> of the referenced article. Contrast what's going on in Python with what
> happens in C/C++ if a program misidentifies the type of an object on
> the other end of a void pointer.
Doing that comparison doesn't leave me any more convinced that
"dynamic type checking" is apt than I was previously. In particular,
that's because Python programs practically never attempt to identify
the type of an object.
> If by "type checking" you mean the application of a formal type system
> (as described e.g. by Pierce in "Types and Programming Languages"),
That qualifies. Something less rigorous qualifies too: what your C++
compiler does when you try to call a function, matching argument types
against parameter types.
> then I would agree that Python employs no formal type system.
--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com
Exactly right on both counts. On the latter, many of the functional
languages have been around for long time... :-)
Herb
---
Herb Sutter (www.gotw.ca) (www.pluralsight.com/blogs/hsutter)
Convener, ISO WG21 (C++ standards committee) (www.gotw.ca/iso)
Contributing editor, C/C++ Users Journal (www.gotw.ca/cuj)
Architect, Developer Division, Microsoft (www.gotw.ca/microsoft)
Actually I regret drawing fire by saying the word "C#" in this newsgroup.
:-) I'm suggesting borrowing from the functional languages.
>I like GC anyway, but lambdas don't make it any more likeable. In
>fact, if getting GC meant that I'd have to pay for a dynamic
>allocation in each stack frame that used a local variable in a lambda,
>I'd oppose it.
It's just an alternative.
Herb
---
Herb Sutter (www.gotw.ca) (www.pluralsight.com/blogs/hsutter)
Convener, ISO WG21 (C++ standards committee) (www.gotw.ca/iso)
Contributing editor, C/C++ Users Journal (www.gotw.ca/cuj)
Architect, Developer Division, Microsoft (www.gotw.ca/microsoft)
> On 18 Jan 2005 18:21:34 -0500, David Abrahams <da...@boost-consulting.com>
> wrote:
>>> Can you have your cake and eat it too? Sure; do what C# and other
>>> languages have done, and use closures.
>>
>>Ugh, this seems just like the conversations we've had about what
>>garbage collection for C++ should look like. Rather than do something
>>that fits naturally into the C++ model of computation, your suggesting
>>that we do something that's just like what Microsoft has done for
>>reasons that were particular to Microsoft's own language or system.
>
> Actually I regret drawing fire by saying the word "C#" in this
> newsgroup. :-) I'm suggesting borrowing from the functional
> languages.
I think it's a good idea to be aware of this option, and we should
explore how it fits into C++. Personally I think we'd have to make
explicit requests for that functionality, in order to keep from paying
undue penalties, and when we get to that point there are probably
library solutions for implementing it that don't cost much in terms of
expressivity.
I regret having such a strong reaction to your posting, but actually
it had little to do with your mention of C#. I know you are planning
to give informative talks to non-committee members about the
committee's work on lambda lambda, and since there is little agreement
(even in the committee) about what lambda should look like I'd hate to
see public opinion become biased strongly towards such a solution
without an adequate exploration of its costs and of alternatives.
--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com
Yes. At minimum, compilers can warn when using a local variable in an
lvalue context, but I agree it could be desirable to be even more
explicit. I thought we were just throwing out ideas and current
thinking; I'm not ready to commit to a specific proposal.
> I regret having such a strong reaction to your posting, but actually
> it had little to do with your mention of C#. I know you are planning
> to give informative talks to non-committee members about the
> committee's work on lambda lambda, and since there is little agreement
> (even in the committee) about what lambda should look like I'd hate to
> see public opinion become biased strongly towards such a solution
> without an adequate exploration of its costs and of alternatives.
When Francis rounded us up to talk about "future C++" at ACCU and asked
us for placeholder topics, I randomly picked a noun to slot in a topic.
For exactly the reasons you note, including that there isn't even any
real lambda proposal at all, I've since replaced it with the completely
unspecified topic of "Something Cool in C++0x." With apologies to Scott
Meyers. :-) The placeholder abstract is "What will the topic be? Lambda
functions? Threads? Security? Whatever it is, it will be cool, exciting,
and relevant to the near-term future of C++. There might even be two
cool things! Come to listen or come to debate, but most of all come and
enjoy."
I'll be surprised if I don't end up deciding to talk about some aspect
of concurrency.
Herb
Agreed... I think that to get dynamic allocation you should need to
request it explicitly, ideally using the keyword "new".
But if that is done, it might be reasonable to also adopt what Herb Sutter
suggests. Eg:
struct Proc { ... };
Proc *make_adder( int to_add ) {
return new int struct( int item ): Proc { return item+to_add; };
}
where the keyword "new" means both that the anonymous object is created on
the heap, and that the lifetimes of referenced variables are extended
appropriately. (The ": Proc" is inheritance, not a type declaration. Proc
would have a virtual operator()(int).)
"Extended appropriately" is a bit ticklish to specify exactly. I had in
mind that cases like this should work:
std::pair<Proc *, Proc *> make_procs( int arg ) {
int local = 0;
Proc *p1 = new int struct( int item ) : Proc {
local += item;
arg += item;
return local;
};
Proc *p2 = new int struct( int item ) : Proc {
arg += item + local;
return arg;
};
return std::pair<Proc *,Proc *>( p1, p2 );
}
with both procs sharing arg and local. I think that means we need 3 heap
allocations here, the 3rd one being for the stack frame. Arg needs to be
copied into it. The procs need to refer to the variables in that stack
frame by reference still. The stack frame needs to live as long as the
longer of the two procs, but that can be achieved with simple
reference-counting in the anonymous object's constructor and destructor.
How the Procs themselves are deleted is up to the programmer as normal.
The base class Proc needs a virtual destructor, of course.
It's ticklish but doable: you can easily visualise the equivalent C++1998
code. Most real cases would be simpler. If there is only one lambda, we
only need one heap allocation.
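To make that equivalent code concrete, here is one way the desugaring could look, with std::shared_ptr standing in for the simple reference counting described above. The names Frame, Proc1, and Proc2 are mine, not from any proposal:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// The base class the lambdas inherit from, per the ": Proc" idea.
struct Proc {
    virtual int operator()(int item) = 0;
    virtual ~Proc() {}
};

// The shared heap-allocated "stack frame": arg is copied in from the
// caller's stack, local is allocated on the heap in the first place.
struct Frame {
    int arg;
    int local;
    explicit Frame(int a) : arg(a), local(0) {}
};

struct Proc1 : Proc {
    std::shared_ptr<Frame> f;  // shared ownership = the ref count
    explicit Proc1(std::shared_ptr<Frame> fr) : f(fr) {}
    int operator()(int item) { f->local += item; f->arg += item; return f->local; }
};

struct Proc2 : Proc {
    std::shared_ptr<Frame> f;
    explicit Proc2(std::shared_ptr<Frame> fr) : f(fr) {}
    int operator()(int item) { f->arg += item + f->local; return f->arg; }
};

std::pair<Proc*, Proc*> make_procs(int arg) {
    std::shared_ptr<Frame> frame(new Frame(arg));
    return std::pair<Proc*, Proc*>(new Proc1(frame), new Proc2(frame));
}
```

The frame outlives make_procs and dies with whichever proc is deleted last; deleting the procs themselves remains the caller's job, as the post says.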
-- Dave Harris, Nottingham, UK
I have argued that it doesn't need to. int & is the only reasonable
choice.
There is some debate over what it is a reference to: whether the int is
left on the stack with limited lifetime, or whether its lifetime should be
extended by being created or moved to the heap behind the scenes. In my
view it is not reasonable to get a heap allocation without using the
keyword new.
If you want a shared_ptr or an int * or a copy or anything else, create one
on the stack and the lambda can refer to that.
> I also toyed with the problem of returning lambda expressions from
> functions, but that starts to get rather messy - it's difficult to
> specify the exact type of the returned object.
I have argued that inheritance can do this job. The class of the lambda
expression is anonymous, but we can specify its base class and return a
pointer/reference to that.
-- Dave Harris, Nottingham, UK
Wouldn't copying or moving it onto the heap effectively mean value
semantics anyway? I mean it no longer has a reference to the original
object in the "extended lifetime" case - imagine if the original gets
modified somehow (e.g. by aliasing) then it can become visible whether
the lambda closure has a reference or a copy, so it can't truly remain
behind the scenes.
>
> If you want a shared_ptr or an int * or a copy or anything else, create
> one on the stack and the lambda can refer to that.
I would guess there are cases where the value has to be copied to avoid
having a dangling reference. My next question would be whether you
think it should be up to the compiler to decide when to make a copy and
when to store a reference? I don't imagine it can "know" which one to
choose in all cases.
>
>
> > I also toyed with the problem of returning lambda expressions from
> > functions, but that starts to get rather messy - it's difficult to
> > specify the exact type of the returned object.
>
> I have argued that inheritance can do this job. The class of the lambda
> expression is anonymous, but we can specify its base class and return a
> pointer/reference to that.
But you'd still have to determine the signature of a virtual function
in the base class. You'd also have to return it polymorphically and not
as a dangling reference, probably implying a reference counted pointer.
Maybe that's not such a big issue, but I remember the thing that really
convinced me my pre-processor wasn't going to cut it was trying to
implement a generic Curry function. If you can suggest some syntactic
way of doing this in (enhanced) C++, and the corresponding semantics in
plain C++, I'd be interested in seeing it! Here's an implementation for
clamp (my preprocessor) for currying a two-parameter function into a
single-parameter function:
template<typename Lambda>
lambda
typename Lambda::result_type // fn rtn type
(Lambda, typename Lambda::argument_type_1) // fn closure
(typename Lambda::argument_type_2) // fn param type
curry1 (Lambda const &fn, typename Lambda::argument_type_1 push)
{
// Return an anonymous function that takes a single
// parameter, from a function that takes two and an
// initial argument (like std::bind1st)
typedef typename Lambda::result_type result_type;
typedef typename Lambda::argument_type_2 argument_type;
return lambda result_type (argument_type p) {
return __ctx(fn) (__ctx(push), p);
};
}
The three-to-two parameter function version is even worse looking. A
really general-purpose one escapes me.
For those who haven't seen it before, Currying binds function arguments
one at a time, generating intermediate functions with one fewer
parameter each time. e.g. "curry (&std::plus<int>)(3)(4);" would
evaluate to 7, where "curry (&std::plus<int>)(3);" is an anonymous
function with one int parameter.
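A rough C++98 sketch of that two-step call is possible if we specialize to int arguments; it is the fully generic version that is hard to express. Curried, Bound1, and curry are illustrative names, not the poster's clamp API:

```cpp
#include <cassert>
#include <functional>

// Result of binding the first argument: a one-parameter functor.
template <typename F>
class Bound1 {
    F f_;
    int first_;
public:
    Bound1(F f, int first) : f_(f), first_(first) {}
    int operator()(int second) const { return f_(first_, second); }
};

// Wraps a binary functor so arguments can be supplied one at a time.
template <typename F>
class Curried {
    F f_;
public:
    explicit Curried(F f) : f_(f) {}
    Bound1<F> operator()(int first) const { return Bound1<F>(f_, first); }
};

template <typename F>
Curried<F> curry(F f) { return Curried<F>(f); }
```

With this, `curry(std::plus<int>())(3)(4)` yields 7 and `curry(std::plus<int>())(3)` is a one-argument function object; generalizing over arity and argument types is where hand-written C++ runs out of steam.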
I think Thant Tessman pointed out even more powerful functional
composition features in another part of this thread which (I guess)
just aren't possible in a language like C++.
--
Raoul Gough.
> For those who haven't seen it before, Currying binds function arguments
> one at a time, generating intermediate functions with one fewer
> parameter each time. e.g. "curry (&std::plus<int>)(3)(4);" would
> evaluate to 7, where "curry (&std::plus<int>)(3);" is an anonymous
> function with one int parameter.
That's not currying, but "partial evaluation." Weren't you around for
this thread?
http://article.gmane.org/gmane.comp.lib.boost.devel/116105
--
Dave Abrahams
Boost Consulting
Rather, a curried function is a function that allows function arguments
to be bound one at a time (i.e. "partially evaluated"). The above is
both currying and partial evaluation.
-thant
--
"The nationalist not only does not disapprove of atrocities
committed by his own side, but he has a remarkable capacity
for not even hearing about them." -- George Orwell
Has there been any talk in the committee about having future
standards differentiate between warnings and errors? I know that
currently, the only time that a compiler is required to issue a
diagnostic is when it has detected an error, but AFAIK every
single compiler already makes this distinction.
I think that it might make sense for future standards to say that
certain situations have the following behavior, but a diagnostic
warning is required during compilation. Maybe (in the name of
codifying existing practice) we could make
if (a=0) { dosomething(); }
generate a warning message, as so many compilers already do.
David Abrahams wrote:
> > I know you are planning
> > to give informative talks to non-committee members about the
> > committee's work on lambda lambda,
I'm having enough trouble with the concept of lambda. Not sure
what "lambda lambda" is, but it sounds like a fraternity! Is there
some other name that would still be unambiguous?
OK, my knowledge of functional programming and associated terminology
is a little rusty, but I think the general point about higher-order
functions is still significant. I was basically using the curry
function (or partial evaluator or whatever) as a benchmark for my
implementation of "C++ with lambda", and it didn't work out too well! I
guess I agree with Thant on this one - the C++ language doesn't have
the facilities for dealing with higher-order functions. Adding some
kind of anonymous function syntax won't introduce it either, although
it might make the lack more obvious.
--
Raoul Gough.
> David Abrahams wrote:
>> "Raoul Gough" <Raoul...@yahoo.co.uk> writes:
>>
>>
>>>For those who haven't seen it before, Currying binds function arguments
>>>one at a time, generating intermediate functions with one fewer
>>>parameter each time. e.g. "curry (&std::plus<int>)(3)(4);" would
>>>evaluate to 7, where "curry (&std::plus<int>)(3);" is an anonymous
>>>function with one int parameter.
>>
>>
>> That's not currying, but "partial evaluation." Weren't you around for
>> this thread?
>>
>> http://article.gmane.org/gmane.comp.lib.boost.devel/116105
>
> Rather, a curried function is a function that allows function arguments
> to be bound one at a time (i.e. "partially evaluated"). The above is
> both currying and partial evaluation.
If you read the thread there you'll see that I explicitly disagree
with that definition, and assert that Haskell's (and other functional
languages') function-calling syntax leads to that conflation. When
you're dealing with a language like C++, where parentheses are
required to call a function, they don't turn out to be equivalent
ideas.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
If it is created on the heap in the first place, the issue doesn't arise.
That's what should be done for local variables. It's only arguments which
need to be copied. That copy can't be hidden, of course, because the
object may have a user-defined copy constructor. However, the lambda
should still access the copied object by reference, and any other local
uses should be pointed at the copy too. There can't be any non-local uses,
because the object has local scope.
For example:
BigInt make( BigInt a, BigInt &b ) {
BigInt c( 42 );
BigInt d( 99 );
register_lambda( new void struct() {
++a; ++b; ++c;
} );
BigInt e( 12 );
return a + b + c + d + e;
};
Becomes something like:
struct frame__ {
BigInt a;
BigInt &b;
BigInt c;
frame__( const BigInt &a, BigInt &b ) :
a(a), b(b), c(42) {}
};
struct lambda__ {
BigInt &a;
BigInt &b;
BigInt &c;
lambda__( BigInt &a, BigInt &b, BigInt &c ) :
a(a), b(b), c(c) {}
void operator()() {
++a; ++b; ++c;
}
};
BigInt make( BigInt a, BigInt &b ) {
frame__ *pFrame__ = new frame__( a, b );
BigInt d( 99 );
lambda__ *pLambda__ = new lambda__(
pFrame__->a, pFrame__->b, pFrame__->c );
register_lambda( pLambda__ );
BigInt e( 12 );
return pFrame__->a + pFrame__->b + pFrame__->c + d + e;
};
(In practice lambda__ and frame__ would be combined into a single object,
for just 1 heap allocation, but that is an optimisation which might not be
possible if there are multiple lambdas sharing local variables. Also, the
lambda__ object could have a single pointer to pFrame__, rather than 3
references.)
Anyway, the point is that 'a' must be copied, but then all uses can refer
to that copy. There can be no aliases outside of make() because 'a' is
local to make(). If make() itself takes 'a's address, it will use the
address of pFrame__->a.
'b' does not need to be copied because it is a reference already. Or
rather, it is the reference which is copied not the object referred to. If
the object is changed by an alias, the lambda will see those changes as
expected.
'c' does not need to be copied because it can be created on the heap in
the first place. It cannot be aliased because it is local.
'd' and 'e' are not used by the lambda expression. I propose that they be
left on the stack (ie that their lifetimes not be extended). In general
that would make the code harder to translate into C++1998 without getting
the order of construction wrong. We may need to interleave construction of
heap and stack objects, which means using placement new, solving the
alignment problems etc. But it is straightforward for the compiler to get
right.
> My next question would be whether you think it should be up to
> the compiler to decide when to make a copy and when to store a
> reference? I don't imagine it can "know" which one to choose in
> all cases.
The compiler should do the heap allocation if and only if the lambda is
created with the keyword "new".
> But you'd still have to determine the signature of a virtual function
> in the base class. You'd also have to return it polymorphically and not
> as a dangling reference, probably implying a reference counted pointer.
> Maybe that's not such a big issue, but I remember the thing that really
> convinced me my pre-processor wasn't going to cut it was trying to
> implement a generic Curry function.
If we ignore the problems of genericity and memory management, it's easy
enough.
struct Base1 {
    virtual ~Base1() {}
    virtual BigInt operator()( BigInt a ) = 0;
};

struct Base2 {
    virtual ~Base2() {}
    virtual BigInt operator()( BigInt a, BigInt b ) = 0;
};

Base1 *curry( Base2 *pFn, BigInt b ) {
    return new BigInt struct( BigInt a ) : Base1 {
        return (*pFn)( a, b );
    };
}
We can add memory management by using the appropriate smart pointers. We
can add genericity by turning the above into templates. The result may be
verbose and clumsy, but that isn't really anything to do with lambda. It's
just that C++ templates are not ideal for this kind of thing. If we extend
the language with garbage collection and type inference we will be able to
do better - but those should be separate proposals.
-- Dave Harris, Nottingham, UK
I did read the thread and I can't figure out what you're talking about.
From the book "Functional Programming" by Anthony J. Field, Peter G.
Harrison (1988):
"This idea of treating a function of n arguments as a concatenation of
single-argument functions is called 'currying' after the mathematician
H.B. Curry. [I]n Miranda, a function-valued object is created by
applying a defined function to fewer arguments than appear on the
left-hand side of its definition; this is sometimes called partial
application."
On the other hand, your definition:
> Currying transforms a function taking a single tuple argument into a function taking multiple arguments, and uncurrying reverses the process.
seems to describe not what a curried function is, but merely a typical
way of creating a curried function within certain languages within which
functions are curried by default, and within which a tuple is merely a
kind of data structure. Within such languages, the described
transformation definitely deserves the term "currying," but otherwise
seems to miss the point entirely.
The definition you cite:
> http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?curried+function
might be misinterpreted as describing the use of lambda to facilitate
partial application as an *alternative* to currying, but a function that
is curried because the language defines them that way by default is
indistinguishable from a function that is curried using lambda within
the same language. Either way, the point is that the function allows for
partial application, which is a form of functional composition.
A lot of people (including me) have described functions that turn a
function expecting n arguments plus the first argument into a function
expecting n-1 arguments as "curry." This is slightly inaccurate in that
it is a combination of "currying" and "partial application." I thought
that this was your objection. But if your claim is that this has nothing
to do with currying, you're simply mistaken.
-thant
> I did read the thread and I can't figure out what you're talking about.
>
> From the book "Functional Programming" by Anthony J. Field, Peter G.
> Harrison (1988):
>
> "This idea of treating a function of n arguments as a concatenation of
> single-argument functions is called 'currying' after the mathematician
> H.B. Curry. [I]n Miranda, a function-valued object is created by
> applying a defined function to fewer arguments than appear on the
> left-hand side of its definition; this is sometimes called partial
> application."
>
> On the other hand, your definition:
>
>> Currying transforms a function taking a single tuple argument into
>> a function taking multiple arguments, and uncurrying reverses the
>> process.
>
> seems to describe not what a curried function is, but merely a typical
> way of creating a curried function within certain languages within which
> functions are curried by default
I don't understand what that might mean. But it's probably due to the
fact that we have different concepts of what "curried" means ;-)
> and within which a tuple is merely a
> kind of data structure.
When is a tuple not "a kind of data structure?"
I'm talking about C++ here, where tuples are certainly data structures
and f(x)(y)(z) is just anti-idiomatic.
By the way, this argument is based on one made to me by Brian
McNamara, who is much more familiar with these functional programming
concepts than I am. So I may have misunderstood him, or simply given
too much credence to what he said based on his greater expertise.
That said, as I understand it, in most "functional languages,"
f x y z
is an ordinary 3-argument function call. The way you express that in
C++ is
f(x, y, z)
To write f(x, y, z) in the functional languages, you need to define f
as a function of one argument: a tuple. So f(x, y, z) in those
languages doesn't mean the same thing as it does in C++. In those
languages there just happens to be an equivalence between
f x y z
and
(((f x) y) z)
We don't have a similar equivalence in C++; it would be the same as
saying that:
f(x, y, z)
is always equivalent to
f(x)(y)(z)
which we know to be false.
My definition is based on the accepted semantics of the curry and
uncurry functions.
- fun curry f = fn x => fn y => f(x, y);
- fun uncurry f = fn(x, y) => f x y;
Passing a function to curry yields a "curried function," by my
definition. But, oops, I think I got it backwards.
Currying transforms a function taking multiple arguments into a
function taking a single tuple argument, and uncurrying reverses
the process.
Whew, that's better.
> Within such languages, the described transformation definitely
> deserves the term "currying," but otherwise seems to miss the point
> entirely.
Maybe so.
> The definition you cite:
>
>> http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?curried+function
>
> might be misinterpreted as describing the use of lambda to facilitate
> partial application as an *alternative* to currying, but a function that
> is curried because the language defines them that way by default
I don't know what you mean by that.
> is indistinguishable from a function that is curried using lambda
> within the same language. Either way, the point is that the function
> allows for partial application, which is a form of functional
> composition.
I distinguish functional composition from partial application, and
it's hard for me to see one as a form of the other.
> A lot of people (including me) have described functions that turn a
> function expecting n arguments plus the first argument into a
> function expecting n-1 arguments as "curry." This is slightly
> inaccurate in that it is a combination of "currying" and "partial
> application."
Yes, that's true. curry just operates on a function, not a function
and a first argument.
> I thought that this was your objection. But if your claim is that
> this has nothing to do with currying, you're simply mistaken.
I'm not claiming that. It certainly is, as you say, related to
currying because it combines currying with partial application.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
David Abrahams wrote:
[...]
> That said, as I understand it, in most "functional languages,"
>
> f x y z
>
> is an ordinary 3-argument function call.
It's not an "ordinary" function call (from the point of view of the C++
programmer) in that in languages like Haskell, Miranda, and SML, f is
already by default curried, which means that it by default supports
partial application. This means you don't have to provide all the
arguments at once. If you provide less than all the arguments, you get
as a result a function expecting the rest of the arguments. The magical
thing is that you can do this as often as you want. In SML:
fun add x y = x + y;
val add_seven = add 7;
val add_one = add 1;
Given the above:
add_seven 3;
evaluates to
10
and
add_one 3;
evaluates to
4
The "add" function is a "curried" function. The "add_seven" and
"add_one" functions are produced by "partial application" of the "add"
function.
There are, however, languages like Scheme and C++ in which functions are
not by default curried. They do not, by default, support partial
application. If such a language also happens to support lambda (like
Scheme), it is easy to turn such a function into a curried function.
What apparently makes the issue confusing is that in such a language you
can no longer use the 'native' multi-argument tuple calling convention
to evaluate such a function. You must use a sequence of single-argument
calls.
To further confuse the issue, languages that do curry functions by
default may also support tuples, but as a plain data structure that is
not necessarily tied to function evaluation. In such a language, a tuple
looks like a single value in the same way a C++ structure looks like a
single value. If you pass (2,3) to a function, you are not passing two
arguments, but a single argument consisting of a tuple of two values. In
this rather strained sense, such a function is "uncurried." (Don't let
the parentheses distract you. They designate tuple syntax, not function
call syntax.)
With the note that you got them backwards in your last post (you got
them right the first time), that's what's going on here:
fun uncurry f = fn x => fn y => f(x, y);
fun curry f = fn(x, y) => f x y;
Note also that "fn" is just another name for "lambda."
But all this is a distraction from the main point, which is partial
application. Currying and partial application are merely two sides of
the same coin.
-thant
> I'm going to try to keep this reply short. For one thing, you keep
> saying things that make me think you get it, but then you say things
> that make me think you don't get it. So I don't get what you get. Folks
> that are interested in this stuff should learn a functional programming
> language for themselves. It's fun, and, frankly, easier than
> mastering C++.
I know LISP. Not functional enough for you?
<snip>
I'm going to keep this reply even shorter.
> But all this is a distraction from the main point, which is partial
> application. Currying and partial application are merely two sides of
> the same coin.
The point I'm trying to make is, "not really, in the context of a
language like C++." The Boost.Bind library does partial function
application, e.g. bind(f, _1, "foo") or bind(f, "foo", _1) [**]. As
far as I can tell, that has nothing to do with currying.
Incidentally, it also does functional composition, which seems very
distinct from the other two concepts.
[**] Note that there's no restriction on the order of bound arguments.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
> But all this is a distraction from the main point, which is partial
> application. Currying and partial application are merely two sides of
> the same coin.
David Abrahams wrote:
> The point I'm trying to make is, "not really, in the context of a
> language like C++." The Boost.Bind library does partial function
> application, e.g. bind(f, _1, "foo") or bind(f, "foo", _1) [**]. As
> far as I can tell, that has nothing to do with currying.
The process of building the function object to play the role of closure
over a supplied function argument can be described as currying.
> Incidentally, it also does functional composition, which seems very
> distinct from the other two concepts.
It's distinct, but the first two concepts are a way of enabling the
third (as the Boost Bind documentation describes).
-thant
--
"Death has a tendency to encourage a depressing view of war."
-- Donald Rumsfeld
I see what you mean. I'd missed your example in another sub-thread when
I posted my original comments, so it makes more sense to me now. I
still don't agree with what you're saying, but it makes sense :-)
Allocating the local variables on the heap means an extra heap
allocation and level of indirection when you don't necessarily need it,
i.e. the compiler is just guessing that you need it done that way. If
you go with value semantics (or better yet, some syntax to specify the
semantics) then the programmer can choose. e.g. if you want heap
allocations, you would use a smart pointer as the local variables,
which then get copied (i.e. the *pointers* get copied) into the
anonymous functor. This might be less convenient in the case you do
want heap allocations, but it is (IMHO) more in line with the rest of
C++ and the "if you don't use it you don't pay for it" approach.
Well it sounds like it might be possible for the compiler to decide,
but it looks complicated. What's actually wrong with the idea of using
value (i.e. copying) semantics by default? Or, for that matter, having
some kind of syntax to choose between them? In the case of my
pre-processor, I just introduced two keywords __ref and __ctx which
told it which way to go. This was also necessary because the
preprocessor isn't smart enough to understand the scope of a symbol in
order to know whether it is a global or needs to go in the closure.
Well, the funny thing about this is that the base classes are no longer
anonymous! You have to be able to name the type, and you can't have one
generic "Functor" base class, since the virtual function signatures
must differ depending on the contents of the lambda construct. So then
how do you know what the different base classes are called? How do you
actually write a generic example, especially given that you can't have
virtual member function templates. For my preprocessor, if you omit the
body of the lambda function, it emits the name of the type, which makes
it possible (though awkward) to return a lambda object from a function.
> We can add memory management by using the appropriate smart pointers.
> We can add genericity by turning the above into templates. The result
> may be verbose and clumsy, but that isn't really anything to do with
> lambda. It's just that C++ templates are not ideal for this kind of
> thing. If we extend the language with garbage collection and type
> inference we will be able to do better - but those should be separate
> proposals.
Can you suggest how to do a generic comparison functor generator? This
is a pretty simple example which doesn't even requiring deducing
function parameter types (which is why curry is too hard). Here's what
it looks like in clamp:
template<typename T>
lambda bool (T) (const T &)
match (const T &target) {
    // Generate a functor which matches
    // its argument against a value
    // stored in its closure
    return lambda bool (const T &candidate) {
        return candidate == __ctx(target);
    };
}

// usage
std::vector<std::string>::iterator
    i = find_if (v.begin(), v.end()
                 , match (std::string("hello")));
Could you actually present an example of something like this in the
syntax you would propose? It's easy to speculate, but an actual
implementation is probably better for discussion purposes.
--
Raoul Gough.
The important thing is to get the semantics right. If that is done
correctly, the compiler can make it efficient.
For example:
BigInt sum( BigInt *first, BigInt *last ) {
    BigInt s = 0;
    for_each( first, last, void struct( BigInt &item ) {
        s += item;
    } );
    return s;
}
would become like:
struct lambda__ {
    BigInt s;
    lambda__() : s(0) {}
    void operator()( BigInt &item ) {
        s += item;
    }
};

BigInt sum( BigInt *first, BigInt *last ) {
    lambda__ local__;
    for_each( first, last, local__ );
    return local__.s;
}
with no heap allocation, no unnecessary copying and no unnecessary
indirection.
BigInt demo() {
    BigInt s = 0;
    register_lambda( new void struct( BigInt &item ) {
        s += item;
    } );
    return s;
}
would become like:
struct lambda__ { /* as before */ };

BigInt demo() {
    lambda__ *pLocal__ = new lambda__;
    register_lambda( pLocal__ );
    return pLocal__->s;
}
with exactly 1 heap allocation, which has been requested explicitly by the
programmer using "new" and is unavoidable given that register_lambda() can
hold onto its argument indefinitely. There is an indirection in the return
statement, but that too is unavoidable given that s is used from two
places.
> If you go with value semantics (or better yet, some syntax to
> specify the semantics) then the programmer can choose. e.g. if you
> want heap allocations, you would use a smart pointer as the
> local variables, which then get copied (i.e. the *pointers* get
> copied) into the anonymous functor.
In my view having an 's' in the lambda which is different to the 's' in
the enclosing scope defeats much of the objective of adding lambda in the
first place. It's especially confusing if there is no visible declaration
of the second 's', and if there is a declaration, the syntax is getting
needlessly verbose in a context where conciseness is especially important.
With value semantics, the sum() example as written would return 0 always.
I think you'd have to write it as:
BigInt sum( BigInt *first, BigInt *last ) {
    BigInt s = 0;
    BigInt *ps = &s;
    for_each( first, last, void struct( BigInt &item ) {
        *ps += item; // Mustn't use s here.
    } );
    return s;
}
which is more verbose and complicated. The demo example is even worse:
BigInt demo() {
    boost::shared_ptr<BigInt> ps( new BigInt( 0 ) );
    register_lambda( new void struct( BigInt &item ) {
        *ps += item;
    } );
    return *ps;
}
Now we have 2 heap allocations, and it is much harder for the compiler to
optimise the second one away.
> This might be less convenient in the case you do want heap allocations,
> but it is (IMHO) more in line with the rest of C++ and the "if you don't
> use it you don't pay for it" approach.
My previous article picked a complex case where there were two lambdas
which shared a variable. I wanted to show how that could be handled. The
apparent inefficiency is because of the complexity. Simpler cases can be
optimised by the compiler, so my approach is also "if you don't use it you
don't pay for it".
> Well it sounds like it might be possible for the compiler to decide,
> but it looks complicated.
It leads to simple user code.
> What's actually wrong with the idea of using value (i.e. copying)
> semantics by default?
It gives the wrong semantics in cases like sum(). It is unobvious and
dangerous to have two objects called 's' when only one was declared.
> Or, for that matter, having some kind of syntax to choose between
> them?
I dislike this idea mainly because I don't believe it is necessary. Also
because I believe that any explicit syntax will lead to verbosity, and we
especially want lambdas to be concise.
The specific syntax using __ctx(s) to mean that s will be copied strikes
me as a bit bizarre. What happens if __ctx(s) is used twice in the same
lambda? Two copies? Can I mix __ctx(s) with __ref(s) in the same lambda?
Isn't it simpler if there is only 1 s and 's' always refers to it?
> Can you suggest how to do a generic comparison functor generator?
template <typename T>
struct lambda {
    virtual ~lambda() {}
    virtual lambda<T> *clone() = 0;
    virtual bool operator() ( const T &t ) = 0;
};

template <typename T>
struct lambda_handle {
    lambda<T> *p;
    ~lambda_handle() { delete p; }
    lambda_handle( lambda<T> *p ) : p(p) {}
    lambda_handle( const lambda_handle<T> &rhs ) :
        p(rhs.p->clone()) {}
    bool operator() ( const T &t ) {
        return p->operator()( t );
    }
};

template <typename T>
lambda_handle<T> match( const T &target ) {
    const T target_copy( target );
    return new bool struct( const T &candidate ) : lambda<T> {
        return candidate == target_copy;
    };
}
Used as:
std::vector<std::string>::iterator i =
    find_if( v.begin(), v.end(), match(std::string("hello")) );
I don't believe it is necessary to copy the target, but I copied it anyway
to show how it is done. It doesn't require new syntax or concepts.
I imagine your implementation, if it works, does something similar to the
above, and the main difference is that it hides more of the machinery
behind the scenes. In other words, you surely have a value/handle class
that can be copied, and you surely have some runtime indirection
equivalent to my virtual functions, and some heap allocation.
I am not sure if it is a good idea to hide the machinery. It risks forcing
the overheads on all uses of lambda, even when they are not needed. Also,
I feel it is fairly important to allow users to create classes with the
usual class syntax (ie not the new lambda syntax), but which are
compatible with lambdas. So I think it is important that this stuff be
expressible as classes and that the machinery not be too hidden.
Of course, the machinery can be made more generic and put in a library.
I've not checked, but I expect my lambda and lambda_handle classes can be
replaced by Alexandrescu's Functor framework from "Modern C++ Design".
Probably boost::function could be used too. If not those then something
like them. I don't believe lambda adds new problems, and the old problems
of genericity and memory management have already been solved.
-- Dave Harris, Nottingham, UK
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
> The important thing is to get the semantics right. If that is done
> correctly, the compiler can make it efficient.
Even if you can actually make a compiler this smart, it only takes one
opaque (non-inlined) function call inside the lambda expression to
defeat this optimization.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
> Raoul...@yahoo.co.uk (Raoul Gough) wrote (abridged):
[snip]
>> If you go with value semantics (or better yet, some syntax to
>> specify the semantics) then the programmer can choose. e.g. if you
>> want heap allocations, you would use a smart pointer as the
>> local variables, which then get copied (i.e. the *pointers* get
>> copied) into the anonymous functor.
>
> In my view having an 's' in the lambda which is different to the 's' in
> the enclosing scope defeats much of the objective of adding lambda in the
> first place. It's especially confusing if there is no visible declaration
> of the second 's', and if there is a declaration, the syntax is getting
> needlessly verbose in a context where conciseness is especially important.
I think there should be some kind of simple way to get the default
behaviour (whatever that might be) but there also has to be some way
to control it explicitly, IMHO.
>
> With value semantics, the sum() example as written would return 0 always.
> I think you'd have to write it as:
>
> BigInt sum( BigInt *first, BigInt *last ) {
> BigInt s = 0;
> BigInt *ps = &s;
> for_each( first, last, void struct( BigInt &item ) {
> *ps += item; // Mustn't use s here.
> } );
> return s;
> }
>
> which is more verbose and complicated. The demo example is even worse:
>
> BigInt demo() {
> boost::shared_ptr<BigInt> ps( new BigInt( 0 ) );
> register_lambda( new void struct( BigInt &item ) {
> *ps += item;
> } );
> return *ps;
> }
>
> Now we have 2 heap allocations, and it is much harder for the compiler to
> optimise the second one away.
If I understand correctly, you're advocating something like having the
lifetime of a local variable automatically be as long as the longest
lived lambda construct that references it. Maybe this can be done by
the compiler - it sounds a bit scary to me, especially since we're
considering only simple examples at the moment. It probably implies
some kind of reference counting behind the scenes, which it might be
possible to defeat (e.g. by cyclic data structures). The thing for me
is, stack frames are relatively simple and well understood, and what
you're advocating (IIUC) is a pretty fundamental change.
>> This might be less convenient in the case you do want heap allocations,
>> but it is (IMHO) more in line with the rest of C++ and the "if you don't
>> use it you don't pay for it" approach.
>
> My previous article picked a complex case where there were two lambdas
> which shared a variable. I wanted to show how that could be handled. The
> apparent inefficiency is because of the complexity. Simpler cases can be
> optimised by the compiler, so my approach is also "if you don't use it you
> don't pay for it".
>
>
>> Well it sounds like it might be possible for the compiler to decide,
>> but it looks complicated.
>
> It leads to simple user code.
I've got nothing against simple code where the semantics are simple. I
don't like the idea of seemingly simple code that does the right thing
"by magic".
>
>
>> What's actually wrong with the idea of using value (i.e. copying)
>> semantics by default?
>
> It gives the wrong semantics in cases like sum(). It is unobvious and
> dangerous to have two objects called 's' when only one was declared.
>
>
>> Or, for that matter, having some kind of syntax to choose between
>> them?
>
> I dislike this idea mainly because I don't believe it is necessary. Also
> because I believe that any explicit syntax will lead to verbosity, and we
> especially want lambdas to be concise.
I think there are two questions - what does the compiler do in the
simple cases, and are there cases where the programmer has to be able
to specify other semantics? I would say the simplest cases should use
value semantics and there does have to be some means of specifying
alternatives.
>
> The specific syntax using __ctx(s) to mean that s will be copied strikes
> me as a bit bizarre. What happens if __ctx(s) is used twice in the same
> lambda? Two copies? Can I mix __ctx(s) with _ref(s) in the same lambda?
> Isn't it simpler if there is only 1 s and 's' always refers to it?
Actually I think the __ctx keyword is only necessary because the
pre-processor I wrote doesn't keep track of declarations and scopes,
so it doesn't know when a symbol refers to a scope surrounding the
lambda construct. In fact, it doesn't even know what's an identifier
and what isn't - it just counts braces and identifies its own
keywords. I think a real compiler wouldn't need the __ctx keyword, but
that doesn't necessarily mean that __ref is unnecessary (assuming that
__ctx would be the default).
I think we're starting to see the difficulties imposed by the core
language - and your lambda_handle<T> only works for single-parameter
anonymous functions returning bool. In reality, you would need
something more verbose than that, in order to cover general
cases. Annoyingly, the caller also has to know the kind of function
you're returning, unless we have something like:
automatic_typename func = function_generator(x);
In clamp (my preprocessor) you generate the name of the type by using
the lambda keyword, the closure and parameter lists, and omitting the
function body. It's not pretty, but I don't see any way to avoid
something like this.
>
> I imagine your implementation, if it works, does something similar to the
> above, and the main difference is that it hides more of the machinery
> behind the scenes. In other words, you surely have a value/handle class
> that can be copied, and you surely have some runtime indirection
> equivalent to my virtual functions, and some heap allocation.
Not necessarily heap allocation - it uses an internal pointer to
member function, allowing the same type to be used by multiple
different anonymous functions (and for the objects to be
passed/returned by value). The following match example:
// Define a templated function that returns a function object
template<typename T>
lambda bool (T) (const T &) match (const T &target)
{
    return lambda bool (const T &candidate) {
        return candidate == __ctx(target);
    };
}
... generates code like this:
// Template used for all anonymous functions with one closure
// parameter and one call parameter
template <typename T1, typename T2, typename T3>
struct lambda_template_1_1
{
    T1 (lambda_template_1_1::*mFn) (T3); // PTMF
    T2 mContext1;

    typedef T1 result_type;
    typedef T2 context_type_1;
    typedef T3 argument_type_1;
    typedef T3 argument_type;

    // Constructor accepts PTMF and closure
    lambda_template_1_1 (T1 (lambda_template_1_1::*fn) (T3), T2 c1)
        : mFn (fn)
        , mContext1 (c1)
    { }

    // Forwarding function
    T1 operator() (T3 p1)
    {
        return (this->*mFn)(p1);
    }

    T1 fn1 (T3);

    // Any other anonymous functions with same number
    // of closure and call parameters:
    //
    // T1 fn2 (T3);
    // ...

private:
    lambda_template_1_1 operator= (lambda_template_1_1 const &);
};

// Generator for first anonymous function in the program
// with one closure and one call parameter
template <typename T1, typename T3>
struct lambda_generator_1
{
    template <typename T2>
    static lambda_template_1_1 <T1, T2, T3>
    generate (T2 p1)
    {
        return lambda_template_1_1 <T1, T2, T3> (
            &lambda_template_1_1 <T1, T2, T3>::fn1, p1);
    }
};

// Other generators...

// Body of the first anonymous function
template <typename T1, typename T2, typename T3>
T1 lambda_template_1_1<T1, T2, T3>::fn1 (T3 candidate)
{
    return candidate == mContext1;
}

// Other function bodies...

// The implementation of the original match function
template<typename T>
lambda_template_1_1<bool, T, const T &> match (const T &target)
{
    return lambda_generator_1<bool, const T &>::generate (target);
}
>
> I am not sure if it is a good idea to hide the machinery. It risks forcing
> the overheads on all uses of lambda, even when they are not needed. Also,
> I feel it is fairly important to allow users to create classes with the
> usual class syntax (ie not the new lambda syntax), but which are
> compatible with lambdas. So I think it is important that this stuff be
> expressible as classes and that the machinery not be too hidden.
Hmmmm... you might have a point there. But then again, I think the
fundamental problem is that the language doesn't have very good tools
for dealing with function objects in general. Introducing anonymous
functions just makes this problem more prominent.
>
> Of course, the machinery can be made more generic and put in a library.
> I've not checked, but I expect my lambda and lambda_handle classes can be
> replaced by Alexandrescu's Functor framework from "Modern C++ Design".
> Probably boost::function could be used too. If not those then something
> like them.
Yes, I think I should have made my generated code compatible with
boost functors or something along those lines (I mean things like the
names of the member typedefs). It wouldn't be a major change.
> I don't believe lambda adds new problems, and the old problems
> of genericity and memory management have already been solved.
I'm maybe not quite so convinced of that :-)
--
Raoul Gough.
export LESS='-X'
Can you give an example of what you mean? One in which the optimisation
gives the wrong result because of something an opaque function does?
It seems to me that the called function cannot make it go wrong because it
does not have access to the local variables of the calling function. Only
the lambda expression itself has that access, and of course the lambda
expression cannot be opaque.
And I believe the compiler is already allowed to allocate stack frames, or
parts of stack frames, from the heap; and it can decide whether to do so
on a function-by-function or variable-by-variable basis. Stack frames are
not objects. But maybe I am misunderstanding something.
-- Dave Harris, Nottingham, UK
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
> da...@boost-consulting.com (David Abrahams) wrote (abridged):
>> > The important thing is to get the semantics right. If that is done
>> > correctly, the compiler can make it efficient.
>>
>> Even if you can actually make a compiler this smart, it only takes one
>> opaque (non-inlined) function call inside the lambda expression to
>> defeat this optimization.
>
> Can you give an example of what you mean? One in which the
> optimisation gives the wrong result because of something an opaque
> function does?
Well, we'd never do that. If the compiler couldn't prove that no
references to the local variable were going to outlive the variable's
block scope, it would be forced to put the variable on the heap, thus
defeating the optimization. If the called function's definition is
not available in the caller's translation unit, and a reference to a
local is passed, well: no optimization.
> It seems to me that the called function cannot make it go wrong because it
> does not have access to the local variables of the calling
> function.
It does if the lambda expression passes such a reference.
void frobnicate(double& x, int& n); // defined elsewhere;
// might store &n
int n = 3;
std::for_each(a.begin(), a.end(), frobnicate(_1, n));
// or however you want to spell the lambda syntax
> Only the lambda expression itself has that access, and of course the
> lambda expression cannot be opaque.
>
> And I believe the compiler is already allowed to allocate stack
> frames, or parts of stack frames, from the heap; and it can decide
> whether to do so on a function-by-function or variable-by-variable
> basis. Stack frames are not objects. But maybe I am misunderstanding
> something.
Sure, it can do all of those things. But no compiler I have ever seen
does that today -- it's an anti-optimization. Under your scheme,
IIUC, keeping auto variables on the stack would become an optimization
that was only allowed when it was provably okay.
While we're on the subject, it seems perverse to do this sort of
deduction for lambdas only and not for all other uses of references
and pointers to automatic variables.
void frobnicate(double&, int&);
int n = 3;
double pie = 3.14;
frobnicate(pie, n); // shouldn't we allocate n on the heap?
As I've said earlier in this thread, trying to automatically manage
the lifetimes of local variables in this way seems to run counter to
the spirit of C++.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
Did I not explain that variables would only ever be moved to the heap if
the lambda expression is preceded with the keyword "new"? The compiler
doesn't make that choice. It's a programmer decision, whether they want
the lambda to outlive the function defining it.
This example:
BigInt sum( BigInt *first, BigInt *last ) {
BigInt s = 0;
for_each( first, last, void struct( BigInt &item ) {
s += item;
} );
return s;
}
does not use a heap allocation because there is no "new" before the lambda
expression. No compiler smarts required. And this example:
BigInt demo() {
BigInt s = 0;
register_lambda( new void struct( BigInt &item ) {
s += item;
} );
return s;
}
does need a heap allocation because there is a "new" before the "void
struct".
The thing is, a function might have 2 lambda expressions, both using
"new", and which share a local variable. Then that variable needs to
out-live both lambdas, so it conceptually needs its own heap allocation.
In general, N lambdas means N+1 heap allocations. The optimisation is to
just use 1 heap allocation if there is only 1 lambda. You can get
cleverer, but that simple rule will be enough for the common cases.
> While we're on the subject, it seems perverse to do this sort of
> deduction for lambdas only and not for all other uses of references
> and pointers to automatic variables.
In my view there should never be a heap allocation which is not explicitly
requested by the programmer with the keyword "new".
> void frobnicate(double&, int&);
> int n = 3;
> double pie = 3.14;
> frobnicate(pie, n); // shouldn't we allocate n on the heap?
Only if, like the lambda expression, it uses "new". As in:
void frobnicate(double&, int&);
int *n = new int(3);
double *pie = new double(3.14);
frobnicate(*pie, *n);
-- Dave Harris, Nottingham, UK
I am suggesting the default semantics is reference, and if you want copy
semantics you make the copy yourself outside of the lambda expression.
But I am talking about copying the variables, not the lambda value itself.
> If I understand correctly, you're advocating something like having the
> lifetime of a local varible automatically be as long as the longest
> lived lambda construct that references it.
Yes. Actually I am suggesting two kinds of lambda, one with auto/stack
lifetime and one with heap lifetime. The programmer uses the "new" keyword
to say which they want - that's not a compiler choice. You've described
the semantics of the heap version.
> Maybe this can be done by the compiler - it sounds a bit scary to
> me, especially since we're considering only simple examples at
> the moment. It probably implies some kind of reference counting
> behind the scenes, which it might be possible to defeat (e.g. by
> cyclic data structures).
Reference counting is a viable implementation. I don't see how you can
possibly get cycles. Each lambda expression can point to its enclosing
scope, but the enclosing scope cannot point back. All the pointing is
uni-directional so there can't be cycles.
I hadn't previously thought about lambdas nested inside other lambdas, so
thanks for bringing it up, but I believe the above logic applies.
> The thing for me is, stack frames are relatively simple and
> well understood, and what you're advocating (IIUC) is a
> pretty fundamental change.
I don't believe you can avoid heap allocations and keep correct semantics
in all cases, without forcing static polymorphism (ie templates). I want a
match() function that returns a result I can use with non-template code.
(More on this below.)
> I think we're starting to see the difficulties imposed by the core
> language - and your lambda_handle<T> only works for single-parameter
> anonymous functions returning bool. In reality, you would need
> something more verbose than that, in order to cover general
> cases.
Indeed. I've not studied the boost stuff, but it seems to me we end up
with:
template <typename T>
lambda<bool, (T)> match( const T &target ) {
return new bool struct( const T &candidate ) {
return candidate == target;
};
}
where "lambda" is a template similar to boost::function. The boost design
avoids the need for the lambda expression to inherit from a standard base
class, which helps the syntax.
> Annoyingly, the caller also has to know the kind of function
> you're returning [...]
> In clamp (my preprocessor) you generate the name of the type by using
> the lambda keyword, the closure and parameter lists, and omitting the
> function body. It's not pretty, but I don't see any way to avoid
> something like this.
Your example code helped me understand your thinking and why you're so
keen to use copy semantics. Thanks.
A difference between my and your approach is that your returned type
includes the closure argument list and mine doesn't. I have mixed feelings
about this. I appreciate that your approach gives you value semantics
without needing a heap allocation. However, I feel the closure argument
list is a detail of the lambda expression which callers should not be
exposed to.
For example, I want to write:
template <typename T>
lambda<bool, (T)> match2( T target1, T target2 ) {
return new bool struct( const T &candidate ) {
return candidate == target1 || candidate == target2;
};
}
and have the type returned by match2() be the same as the type returned by
match(). The caller of the lambda should not need to know what the lambda
is doing or how many variables it uses internally. Without the facility to
hide those details, I don't think you have a complete lambda solution.
Do you think it would be reasonable to add something like boost::function
as another layer on top of the lambda_template_1_1 template?
> Not necessarily heap allocation - it uses an internal pointer to
> member function, allowing the same type to be used by multiple
> different anonymous functions (and for the objects to be
> passed/returned by value).
It seems to me that using member functions means that the compiler needs
to see every lambda expression in the program before it can create the
lambda_template_1_1 template, which makes it hard to support dynamic
linking.
Which I see as a big flaw, but not an essential one. You can use
non-member functions instead, and still keep the return-by-value without
needing a heap allocation. Along the lines of:
template <typename T1, typename T2, typename T3>
struct lambda_template_1_1 {
T1 (*pFn) ( lambda_template_1_1<T1,T2,T3> *pThis, T3 );
T2 mContext1;
// Forwarding function
T1 operator() (T3 p1) {
return (*pFn)(this, p1);
}
// ...
};
// Body of the first anonymous function
template <typename T1, typename T2, typename T3>
T1 fn1( lambda_template_1_1<T1, T2, T3> *pThis, T3 candidate ) {
return candidate == pThis->mContext1;
}
Now the template does not depend on the number of anonymous functions
which use it. We can add new anonymous functions without recompiling the
code which defines existing ones. No special link-time support is needed.
-- Dave Harris, Nottingham, UK
> da...@boost-consulting.com (David Abrahams) wrote (abridged):
>> Well, we'd never do that. If the compiler couldn't prove that no
>> references to the local variable were going to outlive the variable's
>> block scope, it would be forced to put the variable on the heap, thus
>> defeating the optimization.
>
> Did I not explain that variables would only ever be moved to the heap if
> the lambda expression is preceded with the keyword "new"?
If so, I missed it; sorry.
> The thing is, a function might have 2 lambda expressions, both using
> "new", and which share a local variable. Then that variable needs to
> out-live both lambdas, so it conceptually needs its own heap allocation.
> In general, N lambdas means N+1 heap allocations.
How could a single lambda ever require more than one heap allocation?
> The optimisation is to just use 1 heap allocation if there is only 1
> lambda. You can get cleverer, but that simple rule will be enough
> for the common cases.
How is that an optimization?
>> While we're on the subject, it seems perverse to do this sort of
>> deduction for lambdas only and not for all other uses of references
>> and pointers to automatic variables.
>
> In my view there should never be a heap allocation which is not explicitly
> requested by the programmer with the keyword "new".
So I've heard.
>
>> void frobnicate(double&, int&);
>> int n = 3;
>> double pie = 3.14;
>> frobnicate(pie, n); // shouldn't we allocate n on the heap?
>
> Only if, like the lambda expression, it uses "new". As in:
>
> void frobnicate(double&, int&);
> int *n = new int(3);
> double *pie = new double(3.14);
> frobnicate(*pie, *n);
Ah, but there's a whole layer of weird semantics you have in the
lambda case: you're talking about giving regular local variables a
lifetime that extends beyond their scope, which never happens
otherwise.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
Call it a pessimisation to do otherwise, if you prefer.
> Ah, but there's a whole layer of weird semantics you have in the
> lambda case: you're talking about giving regular local variables a
> lifetime that extends beyond their scope, which never happens
> otherwise.
Agreed.
It is a little bit like extending the lifetimes of temporary objects when
they are bound to const references; only a little bit, but there is
precedent.
Saying that it is weird is not itself a killer argument against it. It
seemed to me to be a practical way to get reference semantics. And if we
don't have reference semantics we get implicit copying, which I think is
treacherous; and we need more convoluted code to work around the copying
when we need to get information back out of the lambda expression.
-- Dave Harris, Nottingham, UK
> Raoul...@yahoo.co.uk (Raoul Gough) wrote (abridged):
>> I think there should be some kind of simple way to get the default
>> behaviour (whatever that might be) but there also has to be some way
>> to control it explicitly, IMHO.
>
> I am suggesting the default semantics is reference, and if you want copy
> semantics you make the copy yourself outside of the lambda expression.
It's interesting enough to discuss possible implementations, but I
wonder what the motivation for defaulting to reference semantics is?
It just seems to me that value semantics is not only easier to define
and implement, but also fits closer to the C origins of pass-by-value.
Of course it also allows the possibility of reference semantics via
pointer indirection, at the cost of less convenient syntax in this
case.
[snip]
>> Maybe this can be done by the compiler - it sounds a bit scary to
>> me, especially since we're considering only simple examples at
>> the moment. It probably implies some kind of reference counting
>> behind the scenes, which it might be possible to defeat (e.g. by
>> cyclic data structures).
>
> Reference counting is a viable implementation. I don't see how you can
> possibly get cycles. Each lambda expression can point to its enclosing
> scope, but the enclosing scope cannot point back. All the pointing is
> uni-directional so there can't be cycles.
I haven't really thought of a plausible example, but I wouldn't go so
far as to claim that such an example cannot exist. For example, what
about the following:
lambda_type *foo() {
lambda_type *functor = new struct {
// something referencing functor
};
return functor;
}
Actually, the whole issue of how to manage your "new" lambda objects
is not clear to me. I think you suggested using a handle class, so
this would probably be something like:
lambda_handle functor = new struct {
// something referencing functor
};
[snip]
>> I think we're starting to see the difficulties imposed by the core
>> language - and your lambda_handle<T> only works for single-parameter
>> anonymous functions returning bool. In reality, you would need
>> something more verbose than that, in order to cover general
>> cases.
>
> Indeed. I've not studied the boost stuff, but it seems to me we end up
> with:
>
> template <typename T>
> lambda<bool, (T)> match( const T &target ) {
> return new bool struct( const T &candidate ) {
> return candidate == target;
> };
> }
>
> where "lambda" is a template similar to boost::function. The boost design
> avoids the need for the lambda expression to inherit from a standard base
> class, which helps the syntax.
Well, I tried the following program (with a somewhat outdated version
of boost):
#include <boost/lambda/lambda.hpp>
int match (int target) {
return (boost::lambda::_1 == target);
}
int main() {
match(4);
}
Clearly attempting to return the lambda body through type "int"
produces a compilation error. The error includes the real type of the
expression, which looks like this:
boost::lambda::lambda_functor<
boost::lambda::lambda_functor_base<
boost::lambda::relational_action<
boost::lambda::equal_action
>,
boost::tuples::tuple<
boost::lambda::lambda_functor<
boost::lambda::placeholder<1>
>,
const int, boost::tuples::null_type, [...]
> > >
I think this shows the problem with naming the type of the lambda
expression pretty clearly. I don't really understand the boost lambda
library, but I assume the "const int" in there is the type of the
closure parameter.
>
>
>> Annoyingly, the caller also has to know the kind of function
>> you're returning [...]
>> In clamp (my preprocessor) you generate the name of the type by using
>> the lambda keyword, the closure and parameter lists, and omitting the
>> function body. It's not pretty, but I don't see any way to avoid
>> something like this.
>
> Your example code helped me understand your thinking and why you're so
> keen to use copy semantics. Thanks.
Yes, I wanted to avoid virtual functions so that the objects could be
passed by value. My original motivation stemmed from a discussion with
Thant Tessman a few years ago in this newsgroup, and one of the issues
was whether garbage collection was a big factor in implementing lambda
(IIRC). In my opinion, the biggest issues actually arise from the need
to name types explicitly (ignoring deduced contexts for a minute).
>
> A difference between my and your approach is that your returned type
> includes the closure argument list and mine doesn't. I have mixed feelings
> about this. I appreciate that your approach gives you value semantics
> without needing a heap allocation. However, I feel the closure argument
> list is a detail of the lambda expression which callers should not be
> exposed to.
>
> For example, I want to write:
>
> template <typename T>
> lambda<bool, (T)> match2( T target1, T target2 ) {
> return new bool struct( const T &candidate ) {
> return candidate == target1 || candidate == target2;
> };
> }
>
> and have the type returned by match2() be the same as the type returned by
> match(). The caller of the lambda should not need to know what the lambda
> is doing or how many variables it uses internally. Without the facility to
> hide those details, I don't think you have a complete lambda solution.
Yes, using virtual functions would avoid the need for including the
closure parameters in the return type. However, it does introduce the
expense of having a handle class (presumably with reference counting),
or requiring everybody to deal with pointers to functors everywhere.
It's a trade-off, and I wanted to avoid reference counting or other
forms of garbage collection.
>
> Do you think it would be reasonable to add something like boost::function
> as another layer on top of the lambda_template_1_1 template?
>
Yes, I should certainly have made it compatible to one extent or
another, but hopefully not so far as using types like the one shown
above.
This is a good point; I wish I'd thought of that originally. If I get
time to update clamp, I'll include this along with correct code for
two-phase name lookup (which will be a lot trickier to do, I expect).
--
Raoul Gough.
export LESS='-X'
> It is a little bit like extending the lifetimes of temporary objects if
> they are bound to const references; only a little bit but there is
> precedent.
>
> Saying that it is weird is not itself a killer argument against it. It
> seemed to me to be a practical way to get reference semantics.
You can have reference semantics without automatic lifetime
management. I don't believe automatic lifetime management is
appropriate for C++. I think the number of cases where it will
actually be needed is sufficiently small that it makes sense to do it
manually, or with a little library help.
> And if we don't have reference semantics we get implicit copying,
> which I think is treacherous;
> and we need more convoluted code to work around the copying when we
> need to get information back out of the lambda expression.
I agree that implicit copying would be bad. That's why I proposed
just holding ordinary references to stack variables mentioned in a
lambda. In fact, if your proposal only allows
reference semantics with automatic lifetime management
and
copying
I would oppose it vigorously, because IMO the most natural and
efficient approach is to use reference semantics with no lifetime
management.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
If lambda supplies built in garbage collection, it will be used for that:
struct foo;
typedef lambda<foo&()> shared_foo;
struct foo
{
shared_foo x;
};
shared_foo make_foo()
{
foo f;
return new foo& struct() { return f; };
}
shared_foo f = make_foo();
f().x = f;
Which gives you a cycle.
Daniel
OK.
> I agree that implicit copying would be bad. That's why I proposed
> just holding ordinary references to stack variables mentioned in a
> lambda. In fact, if your proposal only allows
>
> reference semantics with automatic lifetime management
>
> and
>
> copying
>
> I would oppose it vigorously, because IMO the most natural and
> efficient approach is to use reference semantics with no lifetime
> management.
I agree. And that's not my proposal. My proposal is an extension of
yours, and compatible with it. I'd happily jettison the extra if it
endangered getting lambdas into the language. However, I also think it is
important, in support of your proposal, to note that it can be extended in
this way, if in the future we decide we want functions to be able to
return lambdas.
-- Dave Harris, Nottingham, UK
> da...@boost-consulting.com (David Abrahams) wrote (abridged):
>> I agree that implicit copying would be bad. That's why I proposed
>> just holding ordinary references to stack variables mentioned in a
>> lambda. In fact, if your proposal only allows
>>
>> reference semantics with automatic lifetime management
>>
>> and
>>
>> copying
>>
>> I would oppose it vigorously, because IMO the most natural and
>> efficient approach is to use reference semantics with no lifetime
>> management.
>
> I agree. And that's not my proposal.
Good; I wasn't sure.
> My proposal is an extension of yours, and compatible with it. I'd
> happily jettison the extra if it endangered getting lambdas into the
> language. However, I also think it is important, in support of your
> proposal, to note that it can be extended in this way, if in the
> future we decide we want functions to be able to return lambdas.
That sounds good.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
My proposal doesn't supply general purpose garbage collection. That would
need a different, orthogonal proposal. I have merely pointed out that the
deletion of the stack frame can be piggy-backed onto the deletion of the
lambda object(s). The programmer is still responsible for deleting those
lambda objects that she creates on the heap with "new".
> struct foo;
> typedef lambda<foo&()> shared_foo;
>
> struct foo
> {
> shared_foo x;
> };
>
> shared_foo make_foo()
> {
> foo f;
> return new foo& struct() { return f; }
> }
>
> shared_foo f = make_foo();
> f().x = f;
>
> Which gives you a cycle.
Yes, but not a harmful one. You have a new expression without a
corresponding delete. Either you have proper garbage collection, in which
case it will be able to handle cycles, or else you are not, in which case
it is a memory leak. Lambda doesn't change this.
Incidentally, the type "lambda<foo&()>" is not something I have proposed.
Nor does Raoul propose it as far as I can tell. Your use of that type
makes it hard to translate your code into C++98 to see what is actually
going on.
-- Dave Harris, Nottingham, UK
I see the main use being downward funargs; in other words, passing them to
std::for_each and similar. There reference semantics does the right thing
in the simplest, clearest and most efficient way.
Upward funargs, in other words functions which return lambda values, I see
as relatively rare and esoteric. David Abrahams has argued that this case
doesn't need direct support at all; that we should have to write out the
classes by hand. Although I would accept that, I also wanted to explore
how upward funargs could be supported as an extension to the basic case.
Also, I see reference semantics as being what "closure" means. We can see
a variable in the enclosing scope, and the lambda expression refers to
that variable. It doesn't refer to some other variable that happens to
have the same name and type and value but is different. I don't know of
any language in which closures work like that.
I'm less sure about this now than I was when I started, though.
> lambda_type *foo() {
> lambda_type *functor = new struct {
> // something referencing functor
> };
>
> return functor;
> }
Here functor is a plain pointer, not a counted one.
At some point the programmer will cause the lambda object to be deleted.
Nothing the compiler does behind the scenes will change that. And when the
lambda object is deleted, the associated stack frame will be deleted too.
If the compiler uses reference counting to achieve that, then it will be
with a separate system to any user-defined count and the user will not be
able to affect it. In this case the count will never be more than 1.
-- Dave Harris, Nottingham, UK
I see how this makes sense. The question of default behaviour didn't
arise in my preprocessor because the programmer has to do something
explicit to refer to variables from the surrounding scope, and so
specifies references or copies via __ref or __ctx.
>
> Upward funargs, in other words functions which return lambda values, I
> see as relatively rare and esoteric. David Abrahams has argued that
> this case doesn't need direct support at all; that we should have to
> write out the classes by hand. Although I would accept that, I also
> wanted to explore how upward funargs could be supported as an
> extension to the basic case.
This also raises the issue of explicitly naming the type of the
lambda-generated functor. I haven't seen a really satisfactory solution
to this.
For the record, I *don't* like the idea of implicitly creating
references which will then almost certainly be dangling.
Why not? I thought we also saw an example where a function "registered"
a lambda functor, e.g. in a global variable. If we treat these functors
as first class objects, you can have arbitrarily many references to a
stack frame. Then you can also have this:
lambda_type *foo() {
int x = 0;
lambda_type functor1 = struct {
// something referencing x
};
return new struct {
// something referencing functor1
};
}
Or some such. Maybe functor1 also gets allocated via new. Anyway, now
functor1 and foo's return value both reference the stack frame.
Deleting the returned pointer cannot clean up the frame via simple
reference counting.
Seems to me that what you're proposing would require proper garbage
collection, for the very good reason that you are introducing implicit
reference semantics, as seen in Java, Python, lisp, etc.
--
Raoul Gough.
Because in this case only one functor is created per frame. In other cases
there may be more, of course. Eg:
struct Base {
virtual ~Base() = 0;
virtual int operator()() = 0;
};
void fillit( std::vector<Base *> &vec ) {
int local = 0;
for (size_t i = 0; i < vec.size(); ++i)
vec[i] = new int struct() : Base {
return ++local;
};
}
Here the count can be arbitrarily large, so we need something as powerful
as reference counting.
Responsibility for deleting the elements lies with the programmer, as
before. Somewhere there will be a loop with delete vec[i]. When the last
element of the vector is deleted, the stack frame must be deleted too.
Reference counting is still sufficient.
> I thought we also saw an example where a function "registered"
> a lambda functor, e.g. in a global variable. If we treat these functors
> as first class objects, you can have arbitrarily many references to a
> stack frame.
Yes, so you need reference counting with arbitrarily high counts.
> Then you can also have this:
>
> lambda_type *foo() {
> int x = 0;
> lambda_type functor1 = struct {
> // something referencing x
> };
> return new struct {
> // something referencing functor1
> };
> }
Let's see how it might be translated into C++98. (Apologies for the
length, but I doubt I can get away with not showing the code.)
struct functor2_t__;

struct frame__ {
    int x;

    struct functor1_t__ {
        frame__ *pFrame__;
        functor1_t__( frame__ *pFrame__ ) : pFrame__(pFrame__) {
        }
        int &operator()() {
            ++pFrame__->x; // Something referencing x.
            return pFrame__->x;
        }
    };

    functor1_t__ functor1__;
    functor2_t__ *pFunctor2__;
    size_t count__;

    frame__() : x(0), functor1__(this), pFunctor2__(0), count__(0) {
    }
    void addref__() {
        ++count__;
    }
    void release__() {
        if (--count__ == 0)
            delete this;
    }
};

struct functor2_t__ {
    frame__ *pFrame__;
    functor2_t__( frame__ *pFrame__ ) : pFrame__(pFrame__) {
        pFrame__->addref__();
    }
    functor2_t__( const functor2_t__ &rhs ) : pFrame__(rhs.pFrame__) {
        pFrame__->addref__();
    }
    ~functor2_t__() {
        pFrame__->release__();
    }
    int operator()() {
        // Something referencing functor1.
        return pFrame__->functor1__();
    }
};

functor2_t__ *foo() {
    frame__ *pFrame = new frame__;
    pFrame->addref__();
    pFrame->pFunctor2__ = new functor2_t__( pFrame );
    pFrame->release__();
    return pFrame->pFunctor2__;
}
This is another case where the refcounting can be optimised away, but I
include it anyway. It would be more efficient to move x into functor1_t__
and skip the back pointer. I also gave the frame a pointer to functor2;
allocating functor1 with new would be similar.
A key point is that functor2_t__ is not refcounted. Its instance is
created by the user with new, so it must be deleted by the user with
delete. The pointer from stack frame to functor is not a counted
pointer.
Another key point is that functor1 is embedded in the stack frame, so does
not outlive the stack frame, so does not keep the stack frame alive. The
pointer from functor1_t__ to frame__ is not a counted pointer.
> Or some such. Maybe functor1 also gets allocated via new.
If so, then the functor1_t__ is no longer embedded in frame__. Then the
pointer from functor1_t__ to frame__ becomes counted.
> Anyway, now functor1 and foo's return value both reference the
> stack frame.
Your code doesn't show a return value for functor1. I added a reference to
x. It could return a reference to itself if desired. Since it is not a
counted reference, it makes no difference.
It can't return a direct reference to the stack frame, because stack
frames are not user-level objects. It can return references to variables
in the stack frame, like x.
> Deleting the returned pointer cannot clean up the frame via simple
> reference counting.
The function returns a pointer to a functor2. When that is deleted, its
destructor will decrement the frame's refcount and the frame will be
deleted. It works.
I really can't see how you can make it go wrong. Only one object is
reference counted, namely the stack frame. How can there be a loop?
-- Dave Harris, Nottingham, UK
> Raoul...@yahoo.co.uk (Raoul Gough) wrote (abridged):
>>> In this case the count will never be more than 1.
>>
>> Why not?
>
> Because in this case only one functor is created per frame. In other cases
> there may be more, of course. Eg:
>
> struct Base {
> virtual ~Base() = 0;
> virtual int operator()() = 0;
> };
>
> void fillit( std::vector<Base *> &vec ) {
> int local = 0;
> for (size_t i = 0; i < vec.size(); ++i)
> vec[i] = new int struct() : Base {
> return ++local;
> };
> }
>
> Here the count can be arbitrarily large, so we need something as powerful
> as reference counting.
>
> Responsibility for deleting the elements lies with the programmer, as
> before. Somewhere there will be a loop with delete vec[i]. When the last
> element of the vector is deleted, the stack frame must be deleted too.
> Reference counting is still sufficient.
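Spelled out in ordinary C++ (with a hand-written functor class standing in for the proposed anonymous-struct syntax), the quoted example might desugar roughly as follows; frame__ and lambda__ are illustrative names only:

```cpp
#include <cstddef>
#include <vector>

struct Base {
    virtual ~Base() {}
    virtual int operator()() = 0;
};

// One heap-allocated frame shared by every functor created in fillit.
struct frame__ {
    int refcount = 0;
    int local = 0;
};

// Stand-in for the proposed "new int struct() : Base { return ++local; }".
struct lambda__ : Base {
    frame__* frame;                          // counted back pointer
    explicit lambda__(frame__* f) : frame(f) { ++frame->refcount; }
    ~lambda__() { if (--frame->refcount == 0) delete frame; }
    int operator()() { return ++frame->local; }
};

void fillit(std::vector<Base*>& vec) {
    frame__* f = new frame__;                // `local` promoted to the heap
    for (std::size_t i = 0; i < vec.size(); ++i)
        vec[i] = new lambda__(f);            // count grows with the vector
    if (f->refcount == 0) delete f;          // nothing captured it
}
```

All the functors share the single `local`; whichever element is deleted last also frees the frame.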
Are you really proposing to extend the lifetime of *all* local
variables if even one of them is used from within a lambda expression?
That seems as though it could wreak havoc with all kinds of
expectations. Consider that a scoped mutex lock might be used, for
example.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
No. The idea is that referring to a variable from the enclosing scope
would extend the lifetime of that variable, but not any others.
-- Dave Harris, Nottingham, UK
> da...@boost-consulting.com (David Abrahams) wrote (abridged):
>> Are you really proposing to extend the lifetime of *all* local
>> variables if even one of them is used from within a lambda expression?
>
> No. The idea is that referring to a variable from the enclosing scope
> would extend the lifetime of that variable, but not any others.
Oh. You seemed to be saying that you wanted to reference count the
whole stack frame.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
> Raoul...@yahoo.co.uk (Raoul Gough) wrote (abridged):
[snip]
>> Or some such. Maybe functor1 also gets allocated via new.
>
> If so, then the functor1_t__ is no longer embedded in frame__. Then the
> pointer from functor1_t__ to frame__ becomes counted.
Does it matter whether the object is actually embedded in frame__ or
not? What if frame__ contains an auto_ptr to it? e.g.
std::auto_ptr<functor2> foo () {
std::auto_ptr<functor1> f1p (new struct {
// some recursive use of f1p
});
return std::auto_ptr<functor2> (new struct {
// some other use of f1p
});
}
[snip]
>
> I really can't see how you can make it go wrong. Only one object is
> reference counted, namely the stack frame. How can there be a loop?
It's just that we know reference counting doesn't correctly handle
cyclic data structures. Even if we can't think of an example
immediately, how do you prove that such an example can't exist? What
is it about the lambda syntax you propose that excludes the
possibility of cyclic references?
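For reference, the failure mode being worried about here is easy to reproduce with any counted pointer; the open question is whether the scheme's invariant (only functor-to-frame edges are counted) rules it out. A generic illustration with std::shared_ptr (anachronistic for this thread, but the semantics match Boost's shared_ptr):

```cpp
#include <memory>

// Two objects holding counted pointers to each other: after the last
// external reference is dropped, both counts stay at 1, so neither
// destructor ever runs.
struct Node {
    std::shared_ptr<Node> other;
    static int alive;                 // tracks live Nodes for demonstration
    Node()  { ++alive; }
    ~Node() { --alive; }
};
int Node::alive = 0;

void leak_cycle() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->other = b;                     // counted edge a -> b
    b->other = a;                     // counted edge b -> a closes the loop
}   // a and b go out of scope, but both Nodes survive: a leak
```

In the stack-frame scheme only the frame is counted and the frame never holds a counted pointer back to a functor, so this particular loop cannot close; the debate is whether that invariant can be proven to hold in general.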
--
Raoul Gough.
export LESS='-X'
> da...@boost-consulting.com (David Abrahams) wrote (abridged):
>> Are you really proposing to extend the lifetime of *all* local
>> variables if even one of them is used from within a lambda expression?
>
> No. The idea is that referring to a variable from the enclosing scope
> would extend the lifetime of that variable, but not any others.
What if your functor references local variable foo, which itself holds
a (plain) pointer or reference to a different local variable bar. If
you don't extend the lifetime of the whole frame then foo becomes a
dangling reference.
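The scenario can be transliterated into modern capture-by-copy terms (make, foo and bar are illustrative names; invoking the result is undefined behaviour, which is precisely the point):

```cpp
#include <functional>

// Only `foo` (the pointer) is kept alive by the closure; `bar`, the int
// it points at, dies with the rest of the stack frame.
std::function<int()> make() {
    int bar = 42;
    int* foo = &bar;                  // plain pointer into the same frame
    return [foo] { return *foo; };    // foo survives; bar does not
}
// Calling make()() dereferences a dangling pointer: undefined behaviour.
```

Extending foo's lifetime but not bar's leaves exactly the dangling reference described above.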
--
Raoul Gough.
export LESS='-X'
Closures that survive the dynamic scope of their creation impose
significant demands on the memory management model. They are not
that different from an object that retains a pointer/reference to
a local variable.
The closure is essentially a pair <environment, code>. In Java,
the "environment" contains copies of captured locals, which works
out reasonably well for objects (they're only references and they're
garbage collected), but for "builtin" types (int, boolean, etc...)
the environment captures copies that are distinct (identity-wise)
from the locals.
One solution is to create a second-class closure type incapable
of surviving dynamic scope. That is: something you can put in
an argument list, but cannot express as a local, global, or instance
variable. I think some dialects of Pascal had something of this sort:
procedure foo(var a: integer; procedure b(x: real)); (* OK *)
var b : procedure (x:real); (* syntax error *)
In C++, this probably requires introducing new syntax and semantics,
since function pointers and functors are not constrained in this
manner.
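The identity point about captured builtins can be shown directly; here C++'s capture-by-value plays the role of Java's environment copy (by_copy is an illustrative name):

```cpp
#include <functional>

// The closure's copy of `n` is a distinct object (identity-wise) from
// the local `n`: later writes to the local are invisible through it.
std::function<int()> by_copy() {
    int n = 1;
    auto f = [n] { return n; };   // environment captures a copy of n
    n = 99;                       // mutates the local only
    return f;
}
```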
--
A. Kanawati
NO.anto...@comcast.net
Indeed. That's what happens. It's not my intention to remove dangling
references from the whole language.
I believe it's what happens with your approach, too.
-- Dave Harris, Nottingham, UK
> Raoul...@yahoo.co.uk (Raoul Gough) wrote (abridged):
>> What if your functor references local variable foo, which itself holds
>> a (plain) pointer or reference to a different local variable bar. If
>> you don't extend the lifetime of the whole frame then foo becomes a
>> dangling reference.
>
> Indeed. That's what happens. It's not my intention to remove dangling
> references from the whole language.
Halfway solutions are often harmful, because they lead to confusion
and a false sense of security (or insecurity, and the feature doesn't
get used). I'd rather leave out the implicit lifetime management and
do it the way we always have, if extended lifetime is required:
explicitly allocate the data on the heap, and manage it with an
appropriate mechanism such as shared_ptr.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
> da...@boost-consulting.com (David Abrahams) wrote (abridged):
>> I'd rather leave out the implicit lifetime management and
>> do it the way we always have, if extended lifetime is required:
>> explicitly allocate the data on the heap, and manage it with an
>> appropriate mechanism such as shared_ptr.
>
> Do you mean, without using lambda expressions? Or do you think lambda
> expressions should use value semantics? I don't see what you have in mind
> here. Giving the lambda object a reference to a shared_ptr won't achieve
> lifetime extension.
Oh, I see the problem. Yeah, a tough nut to crack. I guess I'd be
inclined to annotate uses of variables that should be copied into the
lambda expression. Probably the way to do it is to define a general
notation that means "evaluate this immediately instead of lazily". So
_1 + immediate(a + b)
would store a copy of (a + b) in the lambda expression.
_1 + *immediate(p)
would store a copy of p in the lambda expression. Obviously the
correct notation for "immediate" is up for grabs.
Cheers,