Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Counter Class -- Bag/Multiset

14 views
Skip to first unread message

Raymond Hettinger

unread,
Jan 22, 2009, 1:11:37 PM1/22/09
to
The collections module in Python 2.7 and Python 3.1 has gotten a new
Counter class that works like bags and multisets in other languages.

I've adapted it for Python2.5/2.6 so people can start using it right
away:
http://docs.python.org/dev/library/collections.html#counter-objects

Here's a link to the docs for the new class:
http://code.activestate.com/recipes/576611/


Raymond

Raymond Hettinger

unread,
Jan 22, 2009, 1:36:24 PM1/22/09
to
> I've adapted it for Python2.5/2.6 so people can start using it right
> away:

That should just be Python2.6.

Raymond Hettinger

unread,
Jan 22, 2009, 3:41:13 PM1/22/09
to
> That should just be Python2.6.

Fixed. Now runs of Python 2.5 as well.

Giovanni Bajo

unread,
Jan 22, 2009, 8:41:38 PM1/22/09
to

Hi Raymond,

* I'm not a native speaker, but why use the word "Counter"? A "counter"
to my ear sounds like a number that is increased each time an event
occurs; the website counter, eg, comes to mind. I can understanda its
meaning probably stretches to "an object that counts", but I really can't
think of it as a group of object, or a container of object. Moreover, I
find it a much more useful abstraction the idea of a "multi-set" (that
is, a set where elements can appear with multiple cardinality), rather
than stressing the concept of "counting" how many times each element
appears in the set.

* I find it *very* confusing c.items() vs c.elements(). Items and
elements are synonymous (again, in my understanding of English).

All in all, I think I prefer your previous bag class:
http://code.activestate.com/recipes/259174/

--
Giovanni Bajo
Develer S.r.l.
http://www.develer.com

Chris Rebert

unread,
Jan 22, 2009, 8:49:42 PM1/22/09
to Giovanni Bajo, pytho...@python.org
On Thu, Jan 22, 2009 at 5:41 PM, Giovanni Bajo <ra...@develer.com> wrote:
> On Thu, 22 Jan 2009 10:11:37 -0800, Raymond Hettinger wrote:
>
>> The collections module in Python 2.7 and Python 3.1 has gotten a new
>> Counter class that works like bags and multisets in other languages.
>>
>> I've adapted it for Python2.5/2.6 so people can start using it right
>> away:
>> http://docs.python.org/dev/library/collections.html#counter-objects
>>
>> Here's a link to the docs for the new class:
>> http://code.activestate.com/recipes/576611/
>
> Hi Raymond,
>
> * I'm not a native speaker, but why use the word "Counter"? A "counter"
> to my ear sounds like a number that is increased each time an event
> occurs; the website counter, eg, comes to mind. I can understanda its
> meaning probably stretches to "an object that counts", but I really can't
> think of it as a group of object, or a container of object. Moreover, I
> find it a much more useful abstraction the idea of a "multi-set" (that
> is, a set where elements can appear with multiple cardinality), rather
> than stressing the concept of "counting" how many times each element
> appears in the set.
>
> * I find it *very* confusing c.items() vs c.elements(). Items and
> elements are synonymous (again, in my understanding of English).

I concur and would like to say additionally that having Counter's
len() be the number of *unique* items as opposed to just the number of
items seems a bit counterintuitive.

Cheers,
Chris

--
Follow the path of the Iguana...
http://rebertia.com

Paul Rubin

unread,
Jan 22, 2009, 8:54:57 PM1/22/09
to
Giovanni Bajo <ra...@develer.com> writes:
> * I'm not a native speaker, but why use the word "Counter"?

I agree with this, the new functionality is welcome but I think
the traditional term "multiset" or "bag" would have been better.

Terry Reedy

unread,
Jan 22, 2009, 9:42:11 PM1/22/09
to pytho...@python.org
Chris Rebert wrote:
> On Thu, Jan 22, 2009 at 5:41 PM, Giovanni Bajo <ra...@develer.com> wrote:
>> On Thu, 22 Jan 2009 10:11:37 -0800, Raymond Hettinger wrote:
>>
>>> The collections module in Python 2.7 and Python 3.1 has gotten a new
>>> Counter class that works like bags and multisets in other languages.
>>>
>>> I've adapted it for Python2.5/2.6 so people can start using it right
>>> away:
>>> http://docs.python.org/dev/library/collections.html#counter-objects
>>>
>>> Here's a link to the docs for the new class:
>>> http://code.activestate.com/recipes/576611/
>> Hi Raymond,
>>
>> * I'm not a native speaker, but why use the word "Counter"? A "counter"
>> to my ear sounds like a number that is increased each time an event
>> occurs; the website counter, eg, comes to mind. I can understanda its
>> meaning probably stretches to "an object that counts",

Yes, little clicker devices for instances.

>> but I really can't
>> think of it as a group of object, or a container of object.

Me neither.

>> Moreover, I
>> find it a much more useful abstraction the idea of a "multi-set" (that
>> is, a set where elements can appear with multiple cardinality), rather
>> than stressing the concept of "counting" how many times each element
>> appears in the set.
>>
>> * I find it *very* confusing c.items() vs c.elements(). Items and
>> elements are synonymous (again, in my understanding of English).
>

> I concur and would like to say additionally that having Counter's
> len() be the number of *unique* items as opposed to just the number of
> items seems a bit counterintuitive.

bag/multiset/counter has been implemented as dict with items as keys and
counts as values, and len(dict) = #keys, so unique items is what I
expect. Giving the bag a sum-of-counts attributes also might be nice,
though.

msrac...@gmail.com

unread,
Jan 23, 2009, 2:29:48 AM1/23/09
to
On Jan 22, 5:41 pm, Giovanni Bajo <ra...@develer.com> wrote:
> * I find it *very* confusing c.items() vs c.elements(). Items and
> elements are synonymous (again, in my understanding of English).

Would have used the term "items" but that term has a different meaning
in the context of dictionaries where "items" means (key,value) pairs.
The word "elements" is used for members of the set object and has a
parallel meaning in the context of multisets. Since the class is a
dict subclass (as recommended by Guido), the term "items" was off
limits because it means something else. So, "elements" as used in the
sets documentation was the next, natural choice.

Raymond

msrac...@gmail.com

unread,
Jan 23, 2009, 2:37:14 AM1/23/09
to

The term counter was what was originally approved. At first, I didn't
like it but found that it had some advantages. The main advantage is
that there are *bazillions* of programmers (including some very good
ones) who have never heard the term multiset or bag but have an
instant,
intuitive understanding of counting and counters.

Also, in the context of this implementation, "multiset" would not be a
good choice. The mathematical entity, multiset, is defined with
elements
having a count of one or more. In contrast, the Counter class allows
counts to go to zero or become negative. In addition, I worried that
calling it a multiset would suggest that it has a set-like API instead
of a dict-like API. With the former, you write c.add(elem). With the
latter, you write c[elem] += 1. So the name, "multiset" would be
misleading.

Bike-shed discussions aside (everyone has an opinion of how to name
things),
how do you guys like the functionality? Do you find it useable in
real-world
use cases?

Raymond

Paul Rubin

unread,
Jan 23, 2009, 3:06:47 AM1/23/09
to
msrac...@gmail.com writes:
> how do you guys like the functionality? Do you find it useable in
> real-world use cases?

It's interesting. I wouldn't have thought of that API--I'm used to
populating defaultdict(int) in the obvious ways for this sort of
thing--but it is attractive and I think it will be useful. Thanks for
implementing it.

bearoph...@lycos.com

unread,
Jan 23, 2009, 3:24:35 AM1/23/09
to
Raymond Hettinger:

> The collections module in Python 2.7 and Python 3.1 has gotten a new
> Counter class that works like bags and multisets in other languages.

Very nice. Python std lib is growing more data structures, increasing
the power of a default Python installation. I can remove more and more
modules from my bag of tricks.

I like the name Bag(), it's shorter. Names are important enough.

Are keys restricted to be long and int values only? Or are they
general (referenced) objects + a control of their integral nature?

I think you can add better explanations to this, like an example:
c += Counter() # remove zero and negative counts

Bye,
bearophile

bearoph...@lycos.com

unread,
Jan 23, 2009, 3:27:03 AM1/23/09
to
bearophile:

> Are keys restricted to be long and int values only? Or are they
> general (referenced) objects + a control of their integral nature?

Here I meant values, sorry.
Also: what's the rationale of allowing negative values too?

Bye,
bearophile

msrac...@gmail.com

unread,
Jan 23, 2009, 3:53:04 AM1/23/09
to

The dict values can be anything, but only ints make sense in the
context of bags/multisets/counters etc.
Since it is a dict subclass, we really have no control of the
content. It's pretty much a "consenting adults" data structure (much
like the heapq module which cannot enforce a heap condition on the
underlying list which is exposed to the user).

To block negative values, the setter methods would have to be
overridden and would dramatically slow down the major use cases for
counters (and we would not be able to let people freely convert to and
from arbitrary dicts). Also, we would have to introduce and document
some kind of exception for attempts to set a negative value (i.e. c[x]
-= 1 could raise an exception). This would unnecessarily complicate
the API in an attempt to limit what a user can do (perhaps barring
them from a use case they consider to be important).

So, if you don't want negative counts, I recommend that you don't
subtract more than you started off with ;-)

Raymond

Terry Reedy

unread,
Jan 23, 2009, 4:03:54 AM1/23/09
to pytho...@python.org
msrac...@gmail.com wrote:

> The term counter was what was originally approved. At first, I didn't
> like it but found that it had some advantages. The main advantage is
> that there are *bazillions* of programmers (including some very good
> ones) who have never heard the term multiset or bag but have an
> instant,
> intuitive understanding of counting and counters.
>
> Also, in the context of this implementation, "multiset" would not be a
> good choice. The mathematical entity, multiset, is defined with
> elements
> having a count of one or more. In contrast, the Counter class allows
> counts to go to zero or become negative. In addition, I worried that
> calling it a multiset would suggest that it has a set-like API instead
> of a dict-like API. With the former, you write c.add(elem). With the
> latter, you write c[elem] += 1. So the name, "multiset" would be
> misleading.

Agreed. I withdraw my objection.

Terry Reedy

unread,
Jan 23, 2009, 4:10:20 AM1/23/09
to pytho...@python.org
bearoph...@lycos.com wrote:

> Also: what's the rationale of allowing negative values too?

I can guess two:
1) Nuisance to check given that Python does not have a count (0,1,2...)
type.
2. Useful. + = items on hand; - = items back-ordered.
Bank account go negative also ;-). So does elevation.
Similarly, an app could set 0 to mean 'desired quantity on hand', so
non-zero count is surplus or deficit.

I might not have thought to allow this, but it seems to open new
applications. (This definitely makes bag or multiset inappropriate as
names.)

tjr

Giovanni Bajo

unread,
Jan 23, 2009, 7:05:02 AM1/23/09
to Chris Rebert, pytho...@python.org
On 1/23/2009 2:49 AM, Chris Rebert wrote:
> On Thu, Jan 22, 2009 at 5:41 PM, Giovanni Bajo <ra...@develer.com> wrote:
>> On Thu, 22 Jan 2009 10:11:37 -0800, Raymond Hettinger wrote:
>>
>>> The collections module in Python 2.7 and Python 3.1 has gotten a new
>>> Counter class that works like bags and multisets in other languages.
>>>
>>> I've adapted it for Python2.5/2.6 so people can start using it right
>>> away:
>>> http://docs.python.org/dev/library/collections.html#counter-objects
>>>
>>> Here's a link to the docs for the new class:
>>> http://code.activestate.com/recipes/576611/
>> Hi Raymond,
>>
>> * I'm not a native speaker, but why use the word "Counter"? A "counter"
>> to my ear sounds like a number that is increased each time an event
>> occurs; the website counter, eg, comes to mind. I can understanda its
>> meaning probably stretches to "an object that counts", but I really can't
>> think of it as a group of object, or a container of object. Moreover, I
>> find it a much more useful abstraction the idea of a "multi-set" (that
>> is, a set where elements can appear with multiple cardinality), rather
>> than stressing the concept of "counting" how many times each element
>> appears in the set.
>>
>> * I find it *very* confusing c.items() vs c.elements(). Items and
>> elements are synonymous (again, in my understanding of English).
>
> I concur and would like to say additionally that having Counter's
> len() be the number of *unique* items as opposed to just the number of
> items seems a bit counterintuitive.

In fact, I think that it makes sense when you're stressing the fact that
it's just a dictionary of objects and their counters (which the name
"Counter" in fact stresses).

And my main objection is exactly this: a multiset is a differnt abstract
data structure (which is probably worth its own ABC in Python); the fact
that can be implemented through a mapping of objects and counters is
just its concrete implementation. This was perfectly represented by
Raymond's previous bag class, that was *using* a dictionary instead of
*being* a dictionary.

MRAB

unread,
Jan 23, 2009, 12:20:29 PM1/23/09
to pytho...@python.org
Terry Reedy wrote:

> bearoph...@lycos.com wrote:
>
>> Also: what's the rationale of allowing negative values too?
>
> I can guess two:
> 1) Nuisance to check given that Python does not have a count
> (0,1,2...) type.
> 2. Useful. + = items on hand; - = items back-ordered. Bank account go
> negative also ;-). So does elevation.
> Similarly, an app could set 0 to mean 'desired quantity on hand', so
> non-zero count is surplus or deficit.
>
> I might not have thought to allow this, but it seems to open new
> applications. (This definitely makes bag or multiset inappropriate
> as names.)
>
I would've limited the counts to non-negative values too, but being able
to store negative values does allow credit/debit or in-stock/on-order.
In such cases, the name 'Counter' makes more sense.

pataphor

unread,
Jan 24, 2009, 10:02:14 AM1/24/09
to
On Thu, 22 Jan 2009 10:11:37 -0800 (PST)
Raymond Hettinger <pyt...@rcn.com> wrote:

> The collections module in Python 2.7 and Python 3.1 has gotten a new
> Counter class that works like bags and multisets in other languages.

I like that! Now that we have a multiset or Counter I think a
redefinition of itertools.permutations is in order.

For example something like this:

def genperm(D, R = []):
#generate the permutations of a multiset
if not max(D.values()):
yield tuple(R)
else:
for k,v in sorted(D.items()):
if v:
D[k] -= 1
for g in genperm(D,R+[k]):
yield g
D[k] += 1

def perm(seq):
D = {}
for x in seq:
D[x] = D.get(x,0)+1
for X in genperm(D):
yield X

def test():
for i,x in enumerate(perm('aabbcc')):
print i,x
print len(set(perm('aabbcc')))

if __name__ == '__main__':
test()

The dictionary I'm using here could be seamlessly replaced by your
Counter class.

By the way, I like your itertools recipes a lot, but I miss something
that repeats elements n times, an xcycle or ncycle or whatever, like I
used in this (older code, sorry for the name overlap):

def repeat(x,n = 0):
i = 0
while i < n :
yield x
i += 1

def xcycle(seq,n):
while 1:
for x in seq:
for y in repeat(x,n):
yield y

def counter(symbols,width):
base = len(symbols)
R = []
k = width
while k:
k -= 1
R.append(xcycle(symbols,base**k))
nc = base**width
while nc:
yield list(x.next() for x in R)
nc -=1

def test():
symbols = '01'
width = 4
for state in counter(symbols,width):
print state

if __name__=='__main__':
test()

I think such a thing could be handy for generating more kinds of
periodic sequences. By the way, itertools is starting to feel like it's
becoming some sort of alternate programming design all by itself. I
wonder when it is going to break loose from python! I like that
way of thinking a lot, thanks.

P.

0 new messages