N4009: return type of erase_if

226 views
Skip to first unread message

jgot...@gmail.com

unread,
May 24, 2014, 9:45:04 PM5/24/14
to std-pr...@isocpp.org
N4009 proposes a new template function, erase_if, that takes a reference to container and a predicate as parameters and erases every element of the container for which that predicate is true. It is overloaded for each container type, so for instance the vector version is

template <class T, class A, class Predicate>
 
void erase_if(vector<T, A>& c, Predicate pred);



I think this is a very good idea except for the return type of the function.  Often, when coders are erasing more than one element from a container they need to know the number of elements that were erased.  While for std::vector it would be easy to compute this by saving the size before the erasure and after the erase_if operator and subtracting, if a container does not provide size() or has a linear-time size() function this isn't that easy. Also, the version of container's erase() function that takes a value returns the number of elements erased.  Therefore, I think the erase_if function should return the number of elements that were erased. It can be declared as

template <class T, class A, class Predicate>
 
typename vector<T, A>::size_type erase_if(vector<T, A>& c, Predicate pred);



Joe Gottman

David Krauss

unread,
May 25, 2014, 12:25:27 AM5/25/14
to std-pr...@isocpp.org

On 2014–05–25, at 9:45 AM, jgot...@gmail.com wrote:

Therefore, I think the erase_if function should return the number of elements that were erased. It can be declared as

template <class T, class A, class Predicate>
 
typename vector<T, A>::size_type erase_if(vector<T, A>& c, Predicate pred);

Agreed. Also, I don’t buy the proposal’s reasons for being a (non-generic) non-member. It’s making a lot of excuses not to simply apply list::remove_if to the other containers.

Considering list, why doesn’t list::remove_if already return size_type?

Dain Bray

unread,
May 25, 2014, 12:44:59 PM5/25/14
to std-pr...@isocpp.org
Making it a non-member allows you to use type tags to implement it, instead of re-implementing it for every container, so you really only need to specialize for 1) containers that support erase(remove_if()) 2) list/forward list 3) associative containers

 I don't really care if it returns the # removed, but having implemented this years ago in my own code base, I found 40 cases where I had used it, and I don't recall ever wanting the # removed.

gmis...@gmail.com

unread,
May 26, 2014, 7:32:11 PM5/26/14
to std-pr...@isocpp.org, jgot...@gmail.com
Hi
I'm keen on N4009.  I'm not strongly opposed to the return value being changed, but I think void return is the right answer.
I say this because while I've found knowing a count of erased items occasionally useful, I've mostly not needed it in my own code.

I also think if you want to know a count, you can supply a predicate that does the counting. Right?
This would seem to be inline with the don't pay for what you don't use principle.

I can also imagine (perhaps in a future stl) where you are erasing over some kind of container that is stream like making the count logically infinite. That's not a well thought out worry or probably relevant here; but it might be something to think about in future designs.

If there is going to be a return value though, I'm not sure which is the more useful return value to return: a count or what's been erased, or an index to the position of the first erased item? Which is more useful? Or perhaps a pair containing both values? This would not be valid for all container types but it could be zero in that case I imagine.

Ultimately though I'm of the opinion the void return value is the right answer, on the basis that if you want a count you can count it yourself easy enough via the predicate and not forcing the algorithm to provide a count might make the implementation easier and provide more optimisation opportunities.

Thanks

David Krauss

unread,
May 26, 2014, 8:09:33 PM5/26/14
to std-pr...@isocpp.org
On 2014–05–26, at 12:44 AM, Dain Bray <dain...@gmail.com> wrote:

Making it a non-member allows you to use type tags to implement it, instead of re-implementing it for every container, so you really only need to specialize for 1) containers that support erase(remove_if()) 2) list/forward list 3) associative containers

list and forward_list need separate implementations using splice and splice_after, respectively.

Users wouldn’t get to use those tags, and so need to reimplement it for their own containers. The standard library could use a CRTP base to write the member function generically, or let the members simply dispatch to a generic non-member. (This is arguably simpler than tag dispatching.) Library details shouldn’t be an interface consideration; by generic I meant for user-defined containers.

I don't really care if it returns the # removed, but having implemented this years ago in my own code base, I found 40 cases where I had used it, and I don't recall ever wanting the # removed.

I don’t think this should be a vote. Sometimes you need to strictly account for the size of a set which is not stored in a container. If the result is discarded, the compiler will be able to elide its computation after inlining. Seldom does an increment instruction impact performance anyway. There’s really no trade-off.

Magnus Fromreide

unread,
May 27, 2014, 2:35:24 AM5/27/14
to std-pr...@isocpp.org, jgot...@gmail.com
On Mon, May 26, 2014 at 04:32:10PM -0700, gmis...@gmail.com wrote:
> Hi
>
> On Sunday, May 25, 2014 1:45:04 PM UTC+12, jgot...@gmail.com wrote:
>
> > N4009 <https://isocpp.org/files/papers/n4009.txt> proposes a new template
If the predicate is supposed to do any processing then I think the right
return value might be the predicate, similarly to for_each.

This should also work well together with the proxy support from N4035.

/MF

David Krauss

unread,
May 27, 2014, 3:23:01 AM5/27/14
to std-pr...@isocpp.org
On 2014–05–27, at 2:35 PM, Magnus Fromreide <ma...@lysator.liu.se> wrote:

If the predicate is supposed to do any processing then I think the right
return value might be the predicate, similarly to for_each.

Ideally the user won’t be writing their own predicate, and even if they do, it’s easier to have it write to a reference to a variable, than to give it a member getter function. I don’t think many folks ever availed of for_each like that, and these days most predicates are lambdas.

Optimization is easy with a local variable that is modified in a loop and then perhaps left unused. It gets harder when that variable is a member of a predicate object, or accessed by it through a reference. Returning size_type provides the available information at essentially no cost. Not returning size_type makes the one-size-fits-all solution impossible, for no real benefit.

This should also work well together with the proxy support from N4035.

I don’t see the connection. A predicate should not be used as a proxy for a side-effect result, and changing operator size_type() to operator auto() would only be even more obscure.

vadim.pet...@gmail.com

unread,
May 27, 2014, 7:14:15 AM5/27/14
to std-pr...@isocpp.org
I'm curious, is there any general policy in the library on changing or not changing "return void" to something in any way useful?

Because assuming the statement
If the result is discarded, the compiler will be able to elide its computation after inlining.
is true, a lot of member functions, for example, could potentially benefit from returning *this instead of void, but they still return void for some reason.

David Krauss

unread,
May 27, 2014, 12:18:03 PM5/27/14
to std-pr...@isocpp.org
On 2014–05–27, at 7:14 PM, vadim.pet...@gmail.com wrote:

I'm curious, is there any general policy in the library on changing or not changing "return void" to something in any way useful?

Because assuming the statement
If the result is discarded, the compiler will be able to elide its computation after inlining.
is true, a lot of member functions, for example, could potentially benefit from returning *this instead of void, but they still return void for some reason.

Returning this (or *this) is of no help to the CPU, which already has access to the value from wherever it got it prior to the call.

Some ABIs (well, at least one, for Windows x86 if I recall correctly) return this from constructors, so as to relieve register pressure, but that’s a special case of integration between the optimizer and the ABI.

Anyway, I’m unclear on the connection with my statement. Optimizers can perform lifetime analysis on local variables, but variables accessed through pointers or references are trickier. Inlining a function may allow its local variables to become locals in the calling function, and then lifetime analysis may discover that the “number erased” result is never used aside from increment instructions. If all this works correctly, which is reasonable to suppose but far from a sure thing, then counting but ignoring the number of erased elements is free. That’s all I’m saying.

Matthew Woehlke

unread,
May 27, 2014, 12:44:50 PM5/27/14
to std-pr...@isocpp.org
On 2014-05-27 08:45, David Krauss wrote:
> On 2014-05-27, at 7:14 PM, vadim.pet...@gmail.com wrote:
>> I'm curious, is there any general policy in the library on changing
>> or not changing "return void" to something in any way useful?
>>
>> Because assuming the statement If the result is discarded, the
>> compiler will be able to elide its computation after inlining. is
>> true, a lot of member functions, for example, could potentially
>> benefit from returning *this instead of void, but they still return
>> void for some reason.
>
> Returning this (or *this) is of no help to the CPU, which already has
> access to the value from wherever it got it prior to the call.

I suspect the benefit here isn't optimization, but rather code writing,
e.g. to be able to write:

std::vector<int> v;
v.clear().push_back(2).insert(v.begin(), 1); // etc.

(Not the best example, but hopefully you get the idea.)

--
Matthew

David Krauss

unread,
May 27, 2014, 10:17:25 PM5/27/14
to std-pr...@isocpp.org

On 2014–05–28, at 12:44 AM, Matthew Woehlke <mw_t...@users.sourceforge.net> wrote:

> On 2014-05-27 08:45, David Krauss wrote:
>> On 2014-05-27, at 7:14 PM, vadim.pet...@gmail.com wrote:
>>> I'm curious, is there any general policy in the library on changing
>>> or not changing "return void" to something in any way useful?
>>>
>>> Because assuming the statement If the result is discarded, the
>>> compiler will be able to elide its computation after inlining. is
>>> true, a lot of member functions, for example, could potentially
>>> benefit from returning *this instead of void, but they still return
>>> void for some reason.
>>
>> Returning this (or *this) is of no help to the CPU, which already has
>> access to the value from wherever it got it prior to the call.
>
> I suspect the benefit here isn't optimization, but rather code writing,
> e.g. to be able to write:

There was probably some misunderstanding about what I meant by “elide computation.”

> std::vector<int> v;
> v.clear().push_back(2).insert(v.begin(), 1); // etc.
>
> (Not the best example, but hopefully you get the idea.)

As good an example as any. That’s not the direction of C++. A carriage return and a variable declaration never killed anyone.


jgot...@gmail.com

unread,
May 27, 2014, 10:54:01 PM5/27/14
to std-pr...@isocpp.org, mw_t...@users.sourceforge.net


On Tuesday, May 27, 2014 12:44:50 PM UTC-4, Matthew Woehlke wrote:

I suspect the benefit here isn't optimization, but rather code writing,
e.g. to be able to write:

  std::vector<int> v;
  v.clear().push_back(2).insert(v.begin(), 1); // etc.

(Not the best example, but hopefully you get the idea.)



I get the idea, and this example accidentally shows why this is not a good idea.  The v.begin() passed to the insert() function might be the value of begin() at the beginning of the statement.  If the push_back() allocated memory (i.e. if v started with capacity() ==  0), then that iterator will be invalidated before the insert can be done, resulting in undefined behavior.  

Matthew Woehlke

unread,
May 28, 2014, 11:28:53 AM5/28/14
to std-pr...@isocpp.org
(Getting way OT...) If that's actually possible I'd be inclined to call
it a bug in the language. I'm having a hard time imagining under what
circumstances it's desirable to evaluate the arguments for a method
invocation before you are able to know what object the method will be
invoked against.

Of course, you could always write instead:

v.insert(v.clear().push_back(2).begin(), 1);

...but that's just getting really ugly.

There are probably better examples of methods that return void that
could return themselves where call chaining would be more useful /
intuitive and less error prone, but I agree that going crazy with it is
probably not good.

--
Matthew

David Krauss

unread,
May 28, 2014, 8:13:43 PM5/28/14
to std-pr...@isocpp.org
On 2014–05–28, at 11:28 PM, Matthew Woehlke <mw_t...@users.sourceforge.net> wrote:

On 2014-05-27 22:54, jgot...@gmail.com wrote:
On Tuesday, May 27, 2014 12:44:50 PM UTC-4, Matthew Woehlke wrote:
I suspect the benefit here isn't optimization, but rather code writing, 
e.g. to be able to write: 

 std::vector<int> v; 
 v.clear().push_back(2).insert(v.begin(), 1); // etc. 

(Not the best example, but hopefully you get the idea.) 

I get the idea, and this example accidentally shows why this is not a good 
idea.  The v.begin() passed to the insert() function might be the value of 
begin() at the beginning of the statement.  If the push_back() allocated 
memory (i.e. if v started with capacity() ==  0), then that iterator will 
be invalidated before the insert can be done, resulting in undefined 
behavior.   

(Getting way OT...) If that's actually possible I'd be inclined to call
it a bug in the language.

The real bug is that we don’t have a way to explicitly express such ambiguity without putting things all on one line.

If you want subexpressions to be evaluated in deterministic order, then write in Java.

I'm having a hard time imagining under what
circumstances it's desirable to evaluate the arguments for a method
invocation before you are able to know what object the method will be
invoked against.

You (the programmer) are responsible for making order of evaluation inconsequential. The compiler should be free to schedule things for the convenience of the CPU.

Reply all
Reply to author
Forward
0 new messages