On Thu, 22 Jan 2009 10:11:37 -0800, Raymond Hettinger wrote: > The collections module in Python 2.7 and Python 3.1 has gotten a new > Counter class that works like bags and multisets in other languages.
* I'm not a native speaker, but why use the word "Counter"? A "counter" to my ear sounds like a number that is increased each time an event occurs; the website counter, eg, comes to mind. I can understanda its meaning probably stretches to "an object that counts", but I really can't think of it as a group of object, or a container of object. Moreover, I find it a much more useful abstraction the idea of a "multi-set" (that is, a set where elements can appear with multiple cardinality), rather than stressing the concept of "counting" how many times each element appears in the set.
* I find it *very* confusing c.items() vs c.elements(). Items and elements are synonymous (again, in my understanding of English).
> * I'm not a native speaker, but why use the word "Counter"? A "counter" > to my ear sounds like a number that is increased each time an event > occurs; the website counter, eg, comes to mind. I can understanda its > meaning probably stretches to "an object that counts", but I really can't > think of it as a group of object, or a container of object. Moreover, I > find it a much more useful abstraction the idea of a "multi-set" (that > is, a set where elements can appear with multiple cardinality), rather > than stressing the concept of "counting" how many times each element > appears in the set.
> * I find it *very* confusing c.items() vs c.elements(). Items and > elements are synonymous (again, in my understanding of English).
I concur and would like to say additionally that having Counter's len() be the number of *unique* items as opposed to just the number of items seems a bit counterintuitive.
Chris Rebert wrote: > On Thu, Jan 22, 2009 at 5:41 PM, Giovanni Bajo <ra...@develer.com> wrote: >> On Thu, 22 Jan 2009 10:11:37 -0800, Raymond Hettinger wrote:
>>> The collections module in Python 2.7 and Python 3.1 has gotten a new >>> Counter class that works like bags and multisets in other languages.
>> * I'm not a native speaker, but why use the word "Counter"? A "counter" >> to my ear sounds like a number that is increased each time an event >> occurs; the website counter, eg, comes to mind. I can understanda its >> meaning probably stretches to "an object that counts",
Yes, little clicker devices for instances.
>> but I really can't >> think of it as a group of object, or a container of object.
Me neither.
>> Moreover, I >> find it a much more useful abstraction the idea of a "multi-set" (that >> is, a set where elements can appear with multiple cardinality), rather >> than stressing the concept of "counting" how many times each element >> appears in the set.
>> * I find it *very* confusing c.items() vs c.elements(). Items and >> elements are synonymous (again, in my understanding of English).
> I concur and would like to say additionally that having Counter's > len() be the number of *unique* items as opposed to just the number of > items seems a bit counterintuitive.
bag/multiset/counter has been implemented as dict with items as keys and counts as values, and len(dict) = #keys, so unique items is what I expect. Giving the bag a sum-of-counts attributes also might be nice, though.
On Jan 22, 5:41 pm, Giovanni Bajo <ra...@develer.com> wrote:
> * I find it *very* confusing c.items() vs c.elements(). Items and > elements are synonymous (again, in my understanding of English).
Would have used the term "items" but that term has a different meaning in the context of dictionaries where "items" means (key,value) pairs. The word "elements" is used for members of the set object and has a parallel meaning in the context of multisets. Since the class is a dict subclass (as recommended by Guido), the term "items" was off limits because it means something else. So, "elements" as used in the sets documentation was the next, natural choice.
On Jan 22, 5:54 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> Giovanni Bajo <ra...@develer.com> writes: > > * I'm not a native speaker, but why use the word "Counter"?
> I agree with this, the new functionality is welcome but I think > the traditional term "multiset" or "bag" would have been better.
The term counter was what was originally approved. At first, I didn't like it but found that it had some advantages. The main advantage is that there are *bazillions* of programmers (including some very good ones) who have never heard the term multiset or bag but have an instant, intuitive understanding of counting and counters.
Also, in the context of this implementation, "multiset" would not be a good choice. The mathematical entity, multiset, is defined with elements having a count of one or more. In contrast, the Counter class allows counts to go to zero or become negative. In addition, I worried that calling it a multiset would suggest that it has a set-like API instead of a dict-like API. With the former, you write c.add(elem). With the latter, you write c[elem] += 1. So the name, "multiset" would be misleading.
Bike-shed discussions aside (everyone has an opinion of how to name things), how do you guys like the functionality? Do you find it useable in real-world use cases?
msrache...@gmail.com writes: > how do you guys like the functionality? Do you find it useable in > real-world use cases?
It's interesting. I wouldn't have thought of that API--I'm used to populating defaultdict(int) in the obvious ways for this sort of thing--but it is attractive and I think it will be useful. Thanks for implementing it.
> The collections module in Python 2.7 and Python 3.1 has gotten a new > Counter class that works like bags and multisets in other languages.
Very nice. Python std lib is growing more data structures, increasing the power of a default Python installation. I can remove more and more modules from my bag of tricks.
I like the name Bag(), it's shorter. Names are important enough.
Are keys restricted to be long and int values only? Or are they general (referenced) objects + a control of their integral nature?
I think you can add better explanations to this, like an example: c += Counter() # remove zero and negative counts
On Jan 23, 12:27 am, bearophileH...@lycos.com wrote:
> bearophile:
> > Are keys restricted to be long and int values only? Or are they > > general (referenced) objects + a control of their integral nature?
> Here I meant values, sorry. > Also: what's the rationale of allowing negative values too?
The dict values can be anything, but only ints make sense in the context of bags/multisets/counters etc. Since it is a dict subclass, we really have no control of the content. It's pretty much a "consenting adults" data structure (much like the heapq module which cannot enforce a heap condition on the underlying list which is exposed to the user).
To block negative values, the setter methods would have to be overridden and would dramatically slow down the major use cases for counters (and we would not be able to let people freely convert to and from arbitrary dicts). Also, we would have to introduce and document some kind of exception for attempts to set a negative value (i.e. c[x] -= 1 could raise an exception). This would unnecessarily complicate the API in an attempt to limit what a user can do (perhaps barring them from a use case they consider to be important).
So, if you don't want negative counts, I recommend that you don't subtract more than you started off with ;-)
msrache...@gmail.com wrote: > The term counter was what was originally approved. At first, I didn't > like it but found that it had some advantages. The main advantage is > that there are *bazillions* of programmers (including some very good > ones) who have never heard the term multiset or bag but have an > instant, > intuitive understanding of counting and counters.
> Also, in the context of this implementation, "multiset" would not be a > good choice. The mathematical entity, multiset, is defined with > elements > having a count of one or more. In contrast, the Counter class allows > counts to go to zero or become negative. In addition, I worried that > calling it a multiset would suggest that it has a set-like API instead > of a dict-like API. With the former, you write c.add(elem). With the > latter, you write c[elem] += 1. So the name, "multiset" would be > misleading.
bearophileH...@lycos.com wrote: > Also: what's the rationale of allowing negative values too?
I can guess two: 1) Nuisance to check given that Python does not have a count (0,1,2...) type. 2. Useful. + = items on hand; - = items back-ordered. Bank account go negative also ;-). So does elevation. Similarly, an app could set 0 to mean 'desired quantity on hand', so non-zero count is surplus or deficit.
I might not have thought to allow this, but it seems to open new applications. (This definitely makes bag or multiset inappropriate as names.)
>> * I'm not a native speaker, but why use the word "Counter"? A "counter" >> to my ear sounds like a number that is increased each time an event >> occurs; the website counter, eg, comes to mind. I can understanda its >> meaning probably stretches to "an object that counts", but I really can't >> think of it as a group of object, or a container of object. Moreover, I >> find it a much more useful abstraction the idea of a "multi-set" (that >> is, a set where elements can appear with multiple cardinality), rather >> than stressing the concept of "counting" how many times each element >> appears in the set.
>> * I find it *very* confusing c.items() vs c.elements(). Items and >> elements are synonymous (again, in my understanding of English).
> I concur and would like to say additionally that having Counter's > len() be the number of *unique* items as opposed to just the number of > items seems a bit counterintuitive.
In fact, I think that it makes sense when you're stressing the fact that it's just a dictionary of objects and their counters (which the name "Counter" in fact stresses).
And my main objection is exactly this: a multiset is a differnt abstract data structure (which is probably worth its own ABC in Python); the fact that can be implemented through a mapping of objects and counters is just its concrete implementation. This was perfectly represented by Raymond's previous bag class, that was *using* a dictionary instead of *being* a dictionary. -- Giovanni Bajo Develer S.r.l. http://www.develer.com
Terry Reedy wrote: > bearophileH...@lycos.com wrote:
>> Also: what's the rationale of allowing negative values too?
> I can guess two: > 1) Nuisance to check given that Python does not have a count > (0,1,2...) type. > 2. Useful. + = items on hand; - = items back-ordered. Bank account go > negative also ;-). So does elevation. > Similarly, an app could set 0 to mean 'desired quantity on hand', so > non-zero count is surplus or deficit.
> I might not have thought to allow this, but it seems to open new > applications. (This definitely makes bag or multiset inappropriate > as names.)
I would've limited the counts to non-negative values too, but being able to store negative values does allow credit/debit or in-stock/on-order. In such cases, the name 'Counter' makes more sense.
Raymond Hettinger <pyt...@rcn.com> wrote: > The collections module in Python 2.7 and Python 3.1 has gotten a new > Counter class that works like bags and multisets in other languages.
I like that! Now that we have a multiset or Counter I think a redefinition of itertools.permutations is in order.
For example something like this:
def genperm(D, R = []): #generate the permutations of a multiset if not max(D.values()): yield tuple(R) else: for k,v in sorted(D.items()): if v: D[k] -= 1 for g in genperm(D,R+[k]): yield g D[k] += 1
def perm(seq): D = {} for x in seq: D[x] = D.get(x,0)+1 for X in genperm(D): yield X
def test(): for i,x in enumerate(perm('aabbcc')): print i,x print len(set(perm('aabbcc')))
if __name__ == '__main__': test()
The dictionary I'm using here could be seamlessly replaced by your Counter class.
By the way, I like your itertools recipes a lot, but I miss something that repeats elements n times, an xcycle or ncycle or whatever, like I used in this (older code, sorry for the name overlap):
def repeat(x,n = 0): i = 0 while i < n : yield x i += 1
def xcycle(seq,n): while 1: for x in seq: for y in repeat(x,n): yield y
def counter(symbols,width): base = len(symbols) R = [] k = width while k: k -= 1 R.append(xcycle(symbols,base**k)) nc = base**width while nc: yield list(x.next() for x in R) nc -=1
def test(): symbols = '01' width = 4 for state in counter(symbols,width): print state
if __name__=='__main__': test()
I think such a thing could be handy for generating more kinds of periodic sequences. By the way, itertools is starting to feel like it's becoming some sort of alternate programming design all by itself. I wonder when it is going to break loose from python! I like that way of thinking a lot, thanks.