Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Pruning a std::map efficiently?

13 views
Skip to first unread message

Jorgen Grahn

unread,
Apr 1, 2009, 4:04:54 AM4/1/09
to
Around here, March--April is apple tree pruning time. I have a slighly
different problem:

I have a std::map<K, T> where I want to move all T (or pair<K, T>)
which match some predicate Pred to some other container. The map may
have a size in the millions, and Pred often matches all, or 50%, or
10% or so of the elements.

I'd prefer to use a standard algorithm for this, rather than iterating
manually, dealing with iterator invalidation and so on. Avoiding
repeated tree rebalancing would be welcome, but I suspect it's not
possible.

What's a good std algorithm for this? I guess std::remove_if() doesn't
work on maps -- since you cannot reorder the elements.

Since T is a pointer type in my case, I have some other options. I can
build a new map with only the !Pred elements in it, swap it with the
original, and blow away the original. Not sure that I would gain
performance from that (even more rebalancing?) but at least the code
would be clean ...

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!

alasha...@gmail.com

unread,
Apr 1, 2009, 6:59:28 AM4/1/09
to
Hello,

This could be one implementation: copy_if is not part of the standard
library, but is easy to implement (see The C++ Programming Language)

template<typename In, typename Out, typename Pred>
Out copy_if(In first, In last, Out res, Pred Pr)
{
while (first != last)
{
if (Pr(*first))
*res++ = *first;
++first;
}
return res;
}

It is then possible to copy from map m1 to m2, applying predicate
Pred:

copy_if( m1.begin(), m1.end(), std::inserter( m2, m2.begin() ),
Pred );

Regards

Jorgen Grahn

unread,
Apr 1, 2009, 9:41:08 AM4/1/09
to
On Wed, 1 Apr 2009 03:59:28 -0700 (PDT), alasha...@gmail.com <alasha...@gmail.com> wrote:
> Hello,
>
> This could be one implementation: copy_if is not part of the standard
> library, but is easy to implement (see The C++ Programming Language)
>
> template<typename In, typename Out, typename Pred>
> Out copy_if(In first, In last, Out res, Pred Pr)
...

>
> It is then possible to copy from map m1 to m2, applying predicate
> Pred:
>
> copy_if( m1.begin(), m1.end(), std::inserter( m2, m2.begin() ),
> Pred );

Thanks, but it's not really what I asked for. That was part of my
fallback solution, which I knew how to implement, but felt was
suboptimal.

Three things I noted while going down that road:

- std::copy_if() doesn't exist, but std::remove_copy_if() does,
and has the same semantics except the predicate must be negated.
I am a bit surprised Stroustrup didn't mention that.

- It doesn't help me, since back_inserter doesn't work on std::map. Or
is there another way to fill a std::map via std::remove_copy_if()?

- But the loop is trivial enough in this case, and I can
build my new tree more efficiently using the hinted
m.insert(m.end(), val) -- by a factor 5 in my experiments.

alasha...@gmail.com

unread,
Apr 1, 2009, 12:44:07 PM4/1/09
to
Hello,

Although still not providing the optimal solution you are seeking, I
think I can comment on two points:

> - It doesn't help me, since back_inserter doesn't work on std::map. Or
>   is there another way to fill a std::map via std::remove_copy_if()?

std::inserter works with std::map - it uses the std::map::insert
member function


> - But the loop is trivial enough in this case, and I can
>   build my new tree more efficiently using the hinted
>   m.insert(m.end(), val) -- by a factor 5 in my experiments.

You can similarly do:

remove_copy_if(
m1.begin(), m1.end(), std::inserter( m2, m2.end() ), Pred );


Also, it might be much more efficient (depending on the use you make
of the map object) to create an iterator adaptor that would allow you
to iterate over members of the container that meet a predicate, rather
than deleting elements, or copying to another container (see the
Boost.Iterator Library). Of course I would not be able to tell if that
is of use to your particular application.

Regards

James Kanze

unread,
Apr 1, 2009, 4:04:56 PM4/1/09
to
On Apr 1, 10:04 am, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
> Around here, March--April is apple tree pruning time. I have a
> slighly different problem:

> I have a std::map<K, T> where I want to move all T (or pair<K,
> T>) which match some predicate Pred to some other container.
> The map may have a size in the millions, and Pred often
> matches all, or 50%, or 10% or so of the elements.

> I'd prefer to use a standard algorithm for this, rather than
> iterating manually, dealing with iterator invalidation and so
> on. Avoiding repeated tree rebalancing would be welcome, but I
> suspect it's not possible.

> What's a good std algorithm for this? I guess std::remove_if()
> doesn't work on maps -- since you cannot reorder the elements.

Correct.

> Since T is a pointer type in my case, I have some other
> options. I can build a new map with only the !Pred elements in
> it, swap it with the original, and blow away the original. Not
> sure that I would gain performance from that (even more
> rebalancing?) but at least the code would be clean ...

If you're going to drop the original map, then there's no point
in removing the elements from it. Some variant on the
non-existant copy_if could be used. A more interesting solution
would be to write it yourself, using the hinting form of insert;
this should speed things up, since there would be no need for
std::map to find the correct insertion point for each insert.
If you had copy_if, I think using an insert_iterator would also
work like this:

std::map< K, T > newMap ;
copy_if( oldMap.begin(), oldMap.end(),
std::inserter( newMap, newMap.end() ),
predicate ) ;

--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jorgen Grahn

unread,
Apr 2, 2009, 7:25:35 AM4/2/09
to
On Wed, 1 Apr 2009 13:04:56 -0700 (PDT), James Kanze <james...@gmail.com> wrote:
> On Apr 1, 10:04 am, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
>> Around here, March--April is apple tree pruning time. I have a
>> slighly different problem:
>
>> I have a std::map<K, T> where I want to move all T (or pair<K,
>> T>) which match some predicate Pred to some other container.
>> The map may have a size in the millions, and Pred often
>> matches all, or 50%, or 10% or so of the elements.
>
>> I'd prefer to use a standard algorithm for this, rather than
>> iterating manually, dealing with iterator invalidation and so
>> on. Avoiding repeated tree rebalancing would be welcome, but I
>> suspect it's not possible.
>
>> What's a good std algorithm for this? I guess std::remove_if()
>> doesn't work on maps -- since you cannot reorder the elements.
>
> Correct.
>
>> Since T is a pointer type in my case, I have some other
>> options. I can build a new map with only the !Pred elements in
>> it, swap it with the original, and blow away the original. Not
>> sure that I would gain performance from that (even more
>> rebalancing?) but at least the code would be clean ...
>
> If you're going to drop the original map, then there's no point
> in removing the elements from it.

It's the other way around for me: since I am afraid to remove elements
while iterating, I'm thinking of using "copy, swap, kill the
original". When I mentioned "even more rebalancing" above, I was
thinking of rebalancing during insertion into the new tree, not during
deletion in the original.

Actually that's what I have settled for for now. I may revisit the
problem later, maybe with a pooled allocator, or replacing the
std::map with a hash-based container.

Thanks,

guy.tr...@gmail.com

unread,
Apr 2, 2009, 11:04:14 AM4/2/09
to
On 1 Apr, 09:04, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
<snip>

> I'd prefer to use a standard algorithm for this, rather than iterating
> manually, dealing with iterator invalidation and so on.

Avoiding iterator invalidation isn't too difficult here:

while( it != map.end() )
{
if( remove_condition( it ) )
it = map.erase( it );
else
++it;
}

should do the trick. It might be more efficient if you are removing
large numbers of elements to identify ranges of unwanted elements and
remove them with erase( first, last ).

HTH - Guy

Jorgen Grahn

unread,
Apr 3, 2009, 9:07:17 AM4/3/09
to
On Wed, 1 Apr 2009 09:44:07 -0700 (PDT), alasha...@gmail.com <alasha...@gmail.com> wrote:
...

> Also, it might be much more efficient (depending on the use you make
> of the map object) to create an iterator adaptor that would allow you
> to iterate over members of the container that meet a predicate, rather
> than deleting elements, or copying to another container (see the
> Boost.Iterator Library).

Good point. Keeping the "erased" elements in the map and manually
ignoring them, right?

> Of course I would not be able to tell if that
> is of use to your particular application.

Me neither, at this stage :-)

Jorgen Grahn

unread,
Apr 3, 2009, 9:14:04 AM4/3/09
to
On Thu, 2 Apr 2009 08:04:14 -0700 (PDT), guy.tr...@gmail.com <guy.tr...@gmail.com> wrote:
> On 1 Apr, 09:04, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
> <snip>
>> I'd prefer to use a standard algorithm for this, rather than iterating
>> manually, dealing with iterator invalidation and so on.
>
> Avoiding iterator invalidation isn't too difficult here:
>
> while( it != map.end() )
> {
> if( remove_condition( it ) )
> it = map.erase( it );
> else
> ++it;
> }
>
> should do the trick.

Hm, map.erase( it ) returns void on my system (Gnu). Do you have a
library extension, perhaps?

I have to admit that I didn't look into the iterator invalidation
rules at all myself.

> It might be more efficient if you are removing
> large numbers of elements to identify ranges of unwanted elements and
> remove them with erase( first, last ).

Not likely to be useful in my case, but a good idea. (BTW, it seems
that the Gnu implementation turns that into a series of erase() at the
lowest level, but I guess another std::map implementation could be
more efficient).

Jerry Coffin

unread,
Apr 3, 2009, 11:36:56 AM4/3/09
to
In article <slrngtc2nc.2...@frailea.sa.invalid>,
grahn...@snipabacken.se says...

> On Thu, 2 Apr 2009 08:04:14 -0700 (PDT), guy.tr...@gmail.com <guy.tr...@gmail.com> wrote:

[ ... ]

> > Avoiding iterator invalidation isn't too difficult here:
> >
> > while( it != map.end() )
> > {
> > if( remove_condition( it ) )
> > it = map.erase( it );
> > else
> > ++it;
> > }
> >
> > should do the trick.
>
> Hm, map.erase( it ) returns void on my system (Gnu). Do you have a
> library extension, perhaps?

Yes, but it's pretty easy to get by without it as well:

if (remove_condition(it))
map.erase(it++);
else
++it;

In this case, the post-increment probably doesn't really cost much (if
anything) compared to a pre-increment. In particular, the compiler's
going to create a copy of the iterator to pass to the function anyway,
so the fact that a post-increment requires creating a copy of the
iterator doesn't really add any complexity or work.

Ther's also an implicit (but widespread) assumption that copying
iterators is cheap in any case -- just for one example, the standard
algorithms (almost?) all receive them by value, not reference.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Michael Mol

unread,
Apr 3, 2009, 12:43:51 PM4/3/09
to
On Apr 3, 11:36 am, Jerry Coffin <jcof...@taeus.com> wrote:
> Yes, but it's pretty easy to get by without it as well:
>
>         if (remove_condition(it))
>                 map.erase(it++);
>         else
>                 ++it;
>
> In this case, the post-increment probably doesn't really cost much (if
> anything) compared to a pre-increment. In particular, the compiler's
> going to create a copy of the iterator to pass to the function anyway,
> so the fact that a post-increment requires creating a copy of the
> iterator doesn't really add any complexity or work.

Is the iterator still valid to perform an increment on after you've
erased it? I'm fairly certain you can't dereference it, and I
presumed that that extended to increment and decrement.

Kai-Uwe Bux

unread,
Apr 3, 2009, 1:50:51 PM4/3/09
to
Michael Mol wrote:

> On Apr 3, 11:36 am, Jerry Coffin <jcof...@taeus.com> wrote:
>> Yes, but it's pretty easy to get by without it as well:
>>
>> if (remove_condition(it))
>> map.erase(it++);
>> else
>> ++it;
>>
>> In this case, the post-increment probably doesn't really cost much (if
>> anything) compared to a pre-increment. In particular, the compiler's
>> going to create a copy of the iterator to pass to the function anyway,
>> so the fact that a post-increment requires creating a copy of the
>> iterator doesn't really add any complexity or work.
>
> Is the iterator still valid to perform an increment on after you've
> erased it?

No.

> I'm fairly certain you can't dereference it, and I
> presumed that that extended to increment and decrement.

Yes.

Note: that does not happen in the code snippet from above. A sequence point
between the start of the erase() function and the evaluation of its
argument ensures that the side-effect of incrementing it has taken place
before the entry marked by the previous value of it is being erased.


Best

Kai-Uwe Bux

Jerry Coffin

unread,
Apr 3, 2009, 3:33:56 PM4/3/09
to
In article <7bbc95f4-e9a5-48b8-b99b-f59905278e28
@j8g2000yql.googlegroups.com>, mik...@gmail.com says...

The iterator is no longer valid after the erasure, but that doesn't
matter -- we're not incrementing it after the erasure. By the current
(soon to be obsolete for C++) terminology, there is a sequence point
after the arguments to a function are evaluated, but before the function
itself executes. That means the increment has to take place _before_ the
erasure.

guy.tr...@gmail.com

unread,
Apr 3, 2009, 7:15:41 PM4/3/09
to
On Apr 3, 2:14 pm, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
> Hm, map.erase( it ) returns void on my system (Gnu).  Do you have a
> library extension, perhaps?

Oh, that's interesting: I'm not sure if it's an MS extension or if
other implementations and docs haven't caught up with a change. Their
docs seem to disagree with other docs on the net and I don't have (or
at least can't find) access to the latest standard. A bit annoying
from a portability standpoint if they have unilaterally made this
extension without explicitly flagging it as such.

[erasing range instead of single elements]

> Not likely to be useful in my case, but a good idea. (BTW, it seems
> that the Gnu implementation turns that into a series of erase() at the
> lowest level, but I guess another std::map implementation could be
> more efficient).

Yes - just peeked at MS's and it's the same there.

Guy

James Kanze

unread,
Apr 4, 2009, 3:37:23 AM4/4/09
to
On Apr 3, 3:14 pm, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
> On Thu, 2 Apr 2009 08:04:14 -0700 (PDT),
> guy.trist...@gmail.com <guy.trist...@gmail.com> wrote:
> > On 1 Apr, 09:04, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
> > <snip>
> >> I'd prefer to use a standard algorithm for this, rather
> >> than iterating manually, dealing with iterator invalidation
> >> and so on.

> > Avoiding iterator invalidation isn't too difficult here:

> > while( it != map.end() )
> > {
> > if( remove_condition( it ) )
> > it = map.erase( it );
> > else
> > ++it;
> > }

> > should do the trick.

> Hm, map.erase( it ) returns void on my system (Gnu). Do you
> have a library extension, perhaps?

This was a rather obvious error in the standard (which has been
corrected in the next draft). Officially, map<>::erase should
return void, but better implementations have always had it
returning the iterator, and this will be required in the next
release of the standard.

0 new messages