STL algorithm member function problem

li...@givemefish.com

unread,

Nov 15, 2006, 4:59:15 PM11/15/06

to

Hi all,

I have an application that works on a series of file names stored in a
vector<string>.
I have a set of rules that I want to apply to that list to narrow the
list down. Each rule operates on a single file name, and it determine
whether the filename should remain or be removed from the list of
filenames. There are several different types of rules, each inherited
from a base class Rule, which is inherited from unary_function. The
list of rules is stored in a vector as boost shared_ptr objects of type
Rule*.

I want to iterate over all the rules in the list and apply them to the
filename list using the remove_if algorithm. But I am having trouble
with the remove_if call.

Thus the code looks something like this:

class Rule : public std::unary_function<std::string, bool> {
public:
virtual result_type operator()(const argument_type&) const =0;
};

typedef boost::shared_ptr<Rule> RulePtr;

///////////////
// Concrete rules are inherited from the Rule class and define
// the operator()

.....

vector<string> fileList;
// .... fill fileList with a series of file names ...

//Now remove items from the fileList according to the rules.
//Iterate over all the rules in the rules vector. Apply the rule
to each of
// items in the fileList and remove it if the rule returns true.
for(vector<RulePtr>::const_iterator ruleIter = m_rules.begin();
ruleIter != m_rules.end(); ++ruleIter) {

// This is the problem line...
std::remove_if(fileList.begin(), fileList.end(), ptr_fun(
xxx );
}

Can someone help me with the correct syntax for the remove_if function?
I can't seem to get it right.

Thanks,
Matt.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Alberto Ganesh Barbati

unread,

Nov 15, 2006, 9:07:47 PM11/15/06

to

li...@givemefish.com ha scritto:

>
> class Rule : public std::unary_function<std::string, bool> {
> public:
> virtual result_type operator()(const argument_type&) const =0;
> };
>
> typedef boost::shared_ptr<Rule> RulePtr;
>
> ///////////////
> // Concrete rules are inherited from the Rule class and define
> // the operator()
>
> .....
>
> vector<string> fileList;
> // .... fill fileList with a series of file names ...
>
> //Now remove items from the fileList according to the rules.
> //Iterate over all the rules in the rules vector. Apply the rule
> to each of
> // items in the fileList and remove it if the rule returns true.
> for(vector<RulePtr>::const_iterator ruleIter = m_rules.begin();
> ruleIter != m_rules.end(); ++ruleIter) {
>
> // This is the problem line...
> std::remove_if(fileList.begin(), fileList.end(), ptr_fun(
> xxx );
> }

This code suffer of a common but very serious bug. But let's answer the
question first and the bug later.

If Rule wasn't a polymorphic class, the correct answer would be the
simplest one:

std::remove_if(fileList.begin(), fileList.end(), *ruleIter);

However in your case it won't work, because the static type of *ruleIter
is Rule which is an abstract class which can't be passed by-value to
remove_if. Notice that making Rule non-abstract doesn't help because of
the slicing problem: as it's passed by-value the dynamic type is lost.

Passing by-value and polymorphism are things that don't go together
well. Functors and predicate are usually passed by-value, so you should
have realized by now that having a polymorphic predicate wasn't a good
idea in the first place.

What I suggest is to change the design like this:

class RuleBase {
public:
virtual bool operator()(const std::string&) const =0;
};

class Rule : public std::unary_function<std::string, bool> {

boost::shared_ptr<RuleBase> rule;

public:
Rule(const boost::shared_ptr<RuleBase>& r) : rule(r)
{}

// nonvirtual!
result_type operator()(const argument_type& s) const
{
return (*rule)(s);
}
};

Of course now all your rules shall inherit from RuleBase, not Rule.

Then you have two alternatives:

1) you replace your vector<RulePtr> with vector<Rule> (a Rule is nothing
more than a RulePtr after all) and write:

std::remove_if(fileList.begin(), fileList.end(), *ruleIter);

2) you keep using your vector<RulePtr> and write:

std::remove_if(fileList.begin(), fileList.end(), Rule(*ruleIter));

Now for the bug...

The bad news is that despite what the name remove_if suggests, the
algorithm std::remove_if does *not* actually remove elements from the
container (read very carefully the footnote [1] in
http://www.sgi.com/tech/stl/remove_if.html again and again until you
understand this point, before continuing to read).

So the correct loop to do all removal is (using suggestion n.1 above):

vector<string> end = fileList.end();
for(vector<Rule>::const_iterator ruleIter = m_rules.begin();
ruleIter != m_rules.end(); ++ruleIter)
{
end = std::remove_if(fileList.begin(), end, *ruleIter);
}
fileList.erase(end, fileList.end());

However, if you have a lot of strings and a lot of predicates, the
performance of this loop are suboptimal because strings are moved a lot
of times. It is much better IMHO to move the loop *in the predicate*, as in:

struct RuleSet // add unary_function here if you want, it's not needed
{
const vector<RulePtr>& rules;

RuleSet(const vector<RulePtr>& r) : rules(r)
{}

bool operator()(const string& s) const
{

for(vector<RulePtr>::const_iterator ruleIter = m_rules.begin();
ruleIter != m_rules.end(); ++ruleIter)
{

if((*ruleIter)(s))
return true;
}
return false;
}
};

(notice that in this case you don't need to follow neither of my two
advices described above)

Now the entire code will magically turn up to be, simply:

fileList.erase(
std::remove_if(fileList.begin(), fileList.end(), RuleSet(m_rules)),
fileList.end());

Simple, isn't it?

HTH,

Ganesh

Markus Moll

unread,

Nov 15, 2006, 9:10:17 PM11/15/06

to

Hi

li...@givemefish.com wrote:

> I want to iterate over all the rules in the list and apply them to the
> filename list using the remove_if algorithm. But I am having trouble
> with the remove_if call.
>
> Thus the code looks something like this:
>
> class Rule : public std::unary_function<std::string, bool> {
> public:
> virtual result_type operator()(const argument_type&) const =0;
> };
>
> typedef boost::shared_ptr<Rule> RulePtr;

This means "Rule" should have a virtual destructor.

> vector<string> fileList;
> // .... fill fileList with a series of file names ...
>
> //Now remove items from the fileList according to the rules.
> //Iterate over all the rules in the rules vector. Apply the rule
> to each of
> // items in the fileList and remove it if the rule returns true.
> for(vector<RulePtr>::const_iterator ruleIter = m_rules.begin();
> ruleIter != m_rules.end(); ++ruleIter) {
>
> // This is the problem line...
> std::remove_if(fileList.begin(), fileList.end(), ptr_fun(
> xxx );
> }

Two problems:

1. remove_if is not doing what you think it does. remove_if(b,e,pred)
returns an iterator k such that the range [b,k) contains those elements x
for which pred(x) is false. Nevertheless, no elements are actually removed
from the underlying container (you couldn't, only given the iterators. And
maybe you couldn't at all. Consider int arr[42] }

2. ptr_fun is not doing what you think it does. It is just a wrapper around
ordinary pointers to functions, so that you suddenly have the member
typedefs argument_type and result_type. The predicate you are looking for
is *ruleIter.

Correcting both, we get:

vector<string>::iterator current_end = fileList.end();

for(vector<RulePtr>::const_iterator ruleIter = m_rules.begin();
ruleIter != m_rules.end(); ++ruleIter) {

current_end = std::remove_if(fileList.begin(), current_end, *ruleIter);
}

// you might want to erase trailing garbage...
fileList.erase(current_end, fileList.end());

Markus

Carl Barron

unread,

Nov 16, 2006, 5:15:21 AM11/16/06

to

In article <1163606212.5...@h48g2000cwc.googlegroups.com>,
<li...@givemefish.com> wrote:

remove_if does not remove anything that is fileLIst.size() [before
remove_if(...) ] is equal to fileList.size() [after remove_if(...)]

remove_if returns an iterator so that
[fileList.begin(),result_of_remove_if) is a range containing the non
removed items and [result_of_remove_if,fileList.end()) is garbage.
if you want to really remove the garbage then
fileList.erase
(
std::remove_if(fileList.begin(),fileList.end(),pred),
fileList.end()
);
actually removes the items satisfying pred. that is those
whose for which pred returns true.

struct remove_by_rule
{
std::vector<std::string> &fileList;
remove_by_rule(std::vector<std::string> &a):fileList(a){}
void operator () (RulePtr p)
{
fileList.erase
(
std::remove_if(fileList.begin(),fileList.end(),*p),
fileList.end()
);
}
};

int main()
{
// ...
std::for_each(m_rules.begin(),m_rules.end(),
remove_rule(fileList) );
// ...
}

easier to read in my opinion and avoids explicitly naming iterator
types for iterators local to the loops actually performed. Holding
a non constant reference in the functor remove_by_rule assures the
same vector will be used no matter how many copies of remove_by_rule
are preformed and those copies will be 'cheap'. It also avoids copying
after the for_each() is performed.

iwongu

unread,

Nov 16, 2006, 5:18:15 AM11/16/06

to

If you can use boost::shared_ptr, you might be able to use
boost::function.

<code>
typedef vector<function<bool (string)> > RuleGrp;
RuleGrp rules;

rules.push_back(YourPredicate1());
rules.push_back(YourPredicate2());

// filtering
vector<string>::iterator last = filelist.end();

for(RuleGrp::iterator iter = rules.begin();
iter != rules.end(); ++iter)
{
last = remove_if(filelist.begin(), last, *iter)
}

filelist.erase(last, filelist.end());
</code>

HTH,

iwongu

li...@givemefish.com

unread,

Nov 16, 2006, 5:25:57 AM11/16/06

to

Markus Moll wrote:
> Hi
>
> li...@givemefish.com wrote:
>
> > I want to iterate over all the rules in the list and apply them to the
> > filename list using the remove_if algorithm. But I am having trouble
> > with the remove_if call.
> >
> > Thus the code looks something like this:
> >
> > class Rule : public std::unary_function<std::string, bool> {
> > public:
> > virtual result_type operator()(const argument_type&) const =0;
> > };
> >
> > typedef boost::shared_ptr<Rule> RulePtr;
>
> This means "Rule" should have a virtual destructor.

Thanks. It actually does, but to keep everything brief, I condensed
all the code.

> > vector<string> fileList;
> > // .... fill fileList with a series of file names ...
> >
> > //Now remove items from the fileList according to the rules.
> > //Iterate over all the rules in the rules vector. Apply the rule
> > to each of
> > // items in the fileList and remove it if the rule returns true.
> > for(vector<RulePtr>::const_iterator ruleIter = m_rules.begin();
> > ruleIter != m_rules.end(); ++ruleIter) {
> >
> > // This is the problem line...
> > std::remove_if(fileList.begin(), fileList.end(), ptr_fun(
> > xxx );
> > }
>
> Two problems:
>
> 1. remove_if is not doing what you think it does. remove_if(b,e,pred)
> returns an iterator k such that the range [b,k) contains those elements x
> for which pred(x) is false. Nevertheless, no elements are actually removed
> from the underlying container (you couldn't, only given the iterators. And
> maybe you couldn't at all. Consider int arr[42] }

Thanks.

> 2. ptr_fun is not doing what you think it does. It is just a wrapper around
> ordinary pointers to functions, so that you suddenly have the member
> typedefs argument_type and result_type. The predicate you are looking for
> is *ruleIter.
>
> Correcting both, we get:
>
> vector<string>::iterator current_end = fileList.end();
> for(vector<RulePtr>::const_iterator ruleIter = m_rules.begin();
> ruleIter != m_rules.end(); ++ruleIter) {
>
> current_end = std::remove_if(fileList.begin(), current_end, *ruleIter);
> }
>
> // you might want to erase trailing garbage...
> fileList.erase(current_end, fileList.end());
>
> Markus
>

This doesn't work, however. Derefencing the iterator returns a
RulePtr, which is typedef'd as a boost::shared_ptr<Rule>. Derefencing
this, i.e., (*(*ruleIter)), results in an error that the compiler
cannot construct the abstract base class Rule.

Thanks,
Matt.

Alberto Ganesh Barbati

unread,

Nov 16, 2006, 5:23:05 AM11/16/06

to

Markus Moll ha scritto:

>
> 1. remove_if is not doing what you think it does. remove_if(b,e,pred)
> returns an iterator k such that the range [b,k) contains those elements x
> for which pred(x) is false. Nevertheless, no elements are actually removed
> from the underlying container (you couldn't, only given the iterators. And
> maybe you couldn't at all. Consider int arr[42] }
>
> 2. ptr_fun is not doing what you think it does. It is just a wrapper around
> ordinary pointers to functions, so that you suddenly have the member
> typedefs argument_type and result_type. The predicate you are looking for
> is *ruleIter.
>
> Correcting both, we get:
>
> vector<string>::iterator current_end = fileList.end();
> for(vector<RulePtr>::const_iterator ruleIter = m_rules.begin();
> ruleIter != m_rules.end(); ++ruleIter) {
>
> current_end = std::remove_if(fileList.begin(), current_end, *ruleIter);
> }
>

This code is not doing what you think it does ;-) It suffers the slicing
problem. In fact it won't even compile because Rule is an abstract
class, which can't be passed by-value, but would it be incorrect
regardless of that. See my other post for details.

Ganesh

Seungbeom Kim

unread,

Nov 16, 2006, 8:40:27 AM11/16/06

to

Alberto Ganesh Barbati wrote:

> If Rule wasn't a polymorphic class, the correct answer would be the
> simplest one:
>
> std::remove_if(fileList.begin(), fileList.end(), *ruleIter);
>
> However in your case it won't work, because the static type of *ruleIter
> is Rule which is an abstract class which can't be passed by-value to
> remove_if. Notice that making Rule non-abstract doesn't help because of
> the slicing problem: as it's passed by-value the dynamic type is lost.

ruleIter has the type vector<RulePtr>::const_iterator, so *ruleIter has
the type RulePtr, which is a typedef for boost::shared_ptr<Rule>.
Therefore, there's no problem with abstractness or slicing.

Maybe it was **ruleIter that you meant? Then there can be such problems.
But I guess there is a way to wrap a pointer to Rule into a predicate
class and pass it to std::remove_if. I tried

std::mem_fun(&Rule::operator());

and it was fine, but I couldn't bind a RulePtr into the first argument
(even with Rule* as RulePtr):

typedef Rule* RulePtr;
RulePtr p;
std::bind1st(std::mem_fun(&Rule::operator()), p);

gave me errors. The error disappeared when I changed the argument of
Rule::operator() from const argument_type& to argument_type. But even
that doesn't seem to help with boost::shared_ptr<Rule> as RulePtr.
Can someone explain this situation?

--
Seungbeom Kim

Alberto Ganesh Barbati

unread,

Nov 16, 2006, 12:52:54 PM11/16/06

to

Carl Barron ha scritto:

> struct remove_by_rule
> {
> std::vector<std::string> &fileList;
> remove_by_rule(std::vector<std::string> &a):fileList(a){}
> void operator () (RulePtr p)
> {
> fileList.erase
> (
> std::remove_if(fileList.begin(),fileList.end(),*p),
> fileList.end()
> );
> }
> };
>
> int main()
> {
> // ...
> std::for_each(m_rules.begin(),m_rules.end(),
> remove_rule(fileList) );
> // ...
> }
>
> easier to read in my opinion and avoids explicitly naming iterator
> types for iterators local to the loops actually performed. Holding
> a non constant reference in the functor remove_by_rule assures the
> same vector will be used no matter how many copies of remove_by_rule
> are preformed and those copies will be 'cheap'. It also avoids copying
> after the for_each() is performed.
>

Except that it doesn't work because *p is an abstract class and that
it's very inefficient because strings may be moved many times. It's
better to keep remove_if/erase as the "outer loop" rather than keep it
in the "inner loop".

Ganesh

James Kanze

unread,

Nov 16, 2006, 1:12:19 PM11/16/06

to

> > .....

> std::remove_if(fileList.begin(), fileList.end(), *ruleIter);

Supposing, of course, that he changed the vector to contain
Rule, and not RulePtr.

> However in your case it won't work, because the static type of *ruleIter
> is Rule which is an abstract class which can't be passed by-value to
> remove_if. Notice that making Rule non-abstract doesn't help because of
> the slicing problem: as it's passed by-value the dynamic type is lost.

> Passing by-value and polymorphism are things that don't go together
> well. Functors and predicate are usually passed by-value, so you should
> have realized by now that having a polymorphic predicate wasn't a good
> idea in the first place.

Which of course could be considered a major design flaw.

Of course, it's possible to make value based objects behave
polymorphically, by means of the letter-envelope idiom.

> What I suggest is to change the design like this:

> class RuleBase {
> public:
> virtual bool operator()(const std::string&) const =0;
> };

> class Rule : public std::unary_function<std::string, bool> {
> boost::shared_ptr<RuleBase> rule;

> public:
> Rule(const boost::shared_ptr<RuleBase>& r) : rule(r)
> {}

> // nonvirtual!
> result_type operator()(const argument_type& s) const
> {
> return (*rule)(s);
> }
> };

> Of course now all your rules shall inherit from RuleBase, not Rule.

That's a sort of a variant of the letter/envelope idiom. In a
case like this, where the interface is extremely narrow, it may
even be preferable (just as using shallow copy instead of deep
works if the objects are inmutable). The letter/envelope idiom
has the advantage that you don't need to duplicate the interface
in two separate, formally unrelated classes. (It also has the
advantage that it is an established pattern, immediately
recognizable as such. And the disadvantage that if you forget
to override one of the virtual functions in a derived class, the
compiler won't complain, since the virtual functions in the base
are not pure virtual. The code will core dump rather quickly,
however.)

[...]

> However, if you have a lot of strings and a lot of predicates, the
> performance of this loop are suboptimal because strings are moved a lot
> of times. It is much better IMHO to move the loop *in the predicate*, as in:

The loop over the rules, you mean.

Independantly of the performance issues, I'd favor this from the
design point of view. You apply *one* rule, which just happens
to be a composite rule. You can even make the solution more
complex, with and's and or's of the different rules; his case is
just a degenerate OR node:

while ( current != end && !(*current)( s ) ) {
++ current ;
}
return current != end ;

For and:

while ( current != end && (*current)( s ) ) {
++ current ;
}
return current == end ;

For completeness, let's not forget not:

return ! (*current)( s ) ;

Obviously, you can put OR nodes in the list of an AND, and vice
versa. Add an REMatch node, based on boost::regular_expression,
and you can create just about any condition imaginable.

(In fact, it's probably possible to construct just about any
rule with a single regular expression. But the expression
becomes extremely complex, and boost::regular_expression
requires it to be built up as a string. My own
RegularExpression class allows the or'ing of already constructed
regular expressions, and converts the results to a DFA on the
fly, so it would be much faster than the looping constructs
above, but it only handles pure regular expressions, and none of
the extensions that are almost always present elsewhere.)

--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Carl Barron

unread,

Nov 16, 2006, 9:03:29 PM11/16/06

to

In article <fZX6h.44807$uv5.3...@twister1.libero.it>, Alberto Ganesh
Barbati <Alberto...@libero.it> wrote:

A solution using boost/tr1 function<bool(string const &)> avoids any
slicing discusion, but I am not so sure there is ever a conversion from
Rule & to Rule in the expansion of remove_if. [shared_ptr::operator *()
returns a Rule & not a Rule].

As far as copying strings, there is the same # of copies performed
whether a home made loop is done or for_each is performed since the
container is held by reference, The # of copies from std::remove_if is
the same if the erase is performed immediately or delayed by using an
iterator pair and erasing upon exit from for_each. If the copying is a
problem use a list<string> and list<string>::remove_if() member
function and no copies will be performed and erasure is automatic.

I don't see it being seriously inefficient since references are passed
and not containers and the code is merely replacing a homemade loop
with with explicit iterator types and template type deduction not to
clutter the code with those nested types.

By the way it compiles with CW9.

Carl Barron

unread,

Nov 17, 2006, 3:43:53 AM11/17/06

to

In article <161120061749010683%cbarr...@adelphia.net>, Carl Barron
<cbarr...@adelphia.net> wrote:

> In article <fZX6h.44807$uv5.3...@twister1.libero.it>, Alberto Ganesh
> Barbati <Alberto...@libero.it> wrote:
>
> > Except that it doesn't work because *p is an abstract class and that
> > it's very inefficient because strings may be moved many times. It's
> > better to keep remove_if/erase as the "outer loop" rather than keep it
> > in the "inner loop".
> >
> A solution using boost/tr1 function<bool(string const &)> avoids any
> slicing discusion, but I am not so sure there is ever a conversion from
> Rule & to Rule in the expansion of remove_if. [shared_ptr::operator *()
> returns a Rule & not a Rule].
>
> As far as copying strings, there is the same # of copies performed
> whether a home made loop is done or for_each is performed since the
> container is held by reference,

> [snip]
Ignore the response as it compiles because it does not work.

I prefer a container of function<bool(const std::string &)> It does not
require any derivation from a base class for any of the functor
objects, functions, or even member functions of a given class/struct.

Further I like the idea for testing each string in the listing and
removing those which succeed for any of the container<function<...> >.

If copying is a problem then consider finding the element to remove,
swapping it with the last entry, decrementing end then if end is
returned the range[end, container.end()) can be erased.

assuming that [test_begin,test.end) is a sequence of iterators so that
it in the sequence (*it)(const std::string &) is a tests if a condition
is satified by the argument. then

BI is a bidirectional iterator, and For is a forward_iterator.

template <class BI,class For>
BI remove_if_any
(
BI begin,BI end,
For test_begin,For test_end
)
{
while((begin = std::find_if(begin,end,any_of(test_begin,test_end)))
!=end)
{
using std::swap;
if(--end == begin)
{
break; // one entry left and we want it gone...
}
swap(*begin,*end);
}
return end;
}

behaves like std::remove_if does not erase anything.
This swaps the strings out of the sequence but rearranges the order
of the strings. If order is unimportant and the size and or # is large
this may be more efficient.

Markus Moll

unread,

Nov 18, 2006, 4:41:24 PM11/18/06

to

Hi

Alberto Ganesh Barbati wrote:

> This code is not doing what you think it does ;-) It suffers the slicing
> problem. In fact it won't even compile because Rule is an abstract
> class, which can't be passed by-value, but would it be incorrect
> regardless of that. See my other post for details.

Yeah, I realized too late that some kind of reference-wrapper would be
needed. I promise not to post untested code anymore... (well, say, for the
next two days... ;-) )

Markus

Alberto Ganesh Barbati

unread,

Nov 19, 2006, 4:30:51 PM11/19/06

to

Carl Barron ha scritto:

>
> If copying is a problem then consider finding the element to remove,
> swapping it with the last entry, decrementing end then if end is
> returned the range[end, container.end()) can be erased.
>

> <code snipped>

>
> behaves like std::remove_if does not erase anything.
> This swaps the strings out of the sequence but rearranges the order
> of the strings. If order is unimportant and the size and or # is large
> this may be more efficient.
>

Using swap instead of assignment is definitely a good idea when swapping
is cheaper than copying. However, you don't need to reinvent the wheel.
The standard library already has an algorithm that essentially is a
remove_if with swapping, it's std::partition. The only difference is
that it takes a logically negated predicate. Notice that if the order if
important, you can also use std::stable_partition, although it's going
to be less efficient, of course.

Ganesh