I have two related lists:
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]
What I need is a list holding, for each item, the mean of the values in x
that share its label in y ('a', 'b' or 'c'), while keeping the number of
items (len) unchanged:
w = [1.5, 1.5, 8, 4, 4, 4]
I have looked at iter(tools) and next(), but that did not help me. I'm
a bit stuck here, so your help is appreciated!
thanks!
Dimitri
First pass: for each string in 'y', count how many times it occurs and sum
the corresponding values from 'x' (zip/izip and defaultdict are useful for
these).
Second pass: build the result list by looking up each item's mean
(total / count).
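The two passes described above could be sketched roughly as follows. This is only an illustration assuming Python 3; group_means, totals and counts are made-up names, not something from the original post.

```python
from collections import defaultdict

def group_means(values, keys):
    """Return one mean per item, taken over all values sharing that item's key."""
    totals = defaultdict(float)  # key -> running sum of its values
    counts = defaultdict(int)    # key -> number of occurrences

    # First pass: accumulate the total and the count for each key.
    for key, value in zip(keys, values):
        totals[key] += value
        counts[key] += 1

    # Second pass: look up the mean for each position, keeping length and order.
    return [totals[key] / counts[key] for key in keys]

x = [1, 2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']
print(group_means(x, y))  # [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
```

Because the means are looked up per key, this also works when equal keys are not adjacent in 'y'.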
from __future__ import division

def group(keys, values):
    # requires None not in keys (and at least one item)
    groups = []
    cur_key = None
    cur_vals = None
    for key, val in zip(keys, values):
        if key != cur_key:
            # a new run starts: store the finished one, if any
            if cur_key is not None:
                groups.append((cur_key, cur_vals))
            cur_vals = [val]
            cur_key = key
        else:
            cur_vals.append(val)
    groups.append((cur_key, cur_vals))
    return groups

def average(lst):
    return sum(lst) / len(lst)

def process(x, y):
    result = []
    for key, vals in group(y, x):
        avg = average(vals)
        for i in range(len(vals)):
            result.append(avg)
    return result

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

print(process(x, y))
#=> [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
It could be tweaked to use itertools.groupby(), but it would probably
be less efficient/clear.
Cheers,
Chris
--
http://blog.rebertia.com
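For reference, an itertools.groupby() variant of the run-based approach might look like the sketch below. It assumes Python 3 division and, like group() above, averages each consecutive run of equal keys separately; run_means is a made-up name.

```python
from itertools import groupby

def run_means(values, keys):
    """Average each consecutive run of equal keys and repeat the mean per item."""
    result = []
    # Pair each key with its value, then group consecutive pairs by key.
    for _key, pairs in groupby(zip(keys, values), key=lambda pair: pair[0]):
        run = [value for _, value in pairs]
        result.extend([sum(run) / len(run)] * len(run))
    return result

x = [1, 2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']
print(run_means(x, y))  # [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
```

Note that groupby() only merges adjacent equal keys, so a key that reappears later in 'y' starts a new group with its own mean.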
Nobody expects object-orientation (or the Spanish Inquisition):
#-------------------------
from collections import defaultdict

class Tally:
    def __init__(self, id=None):
        self.id = id
        self.total = 0
        self.count = 0

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

# gather data
tally_dict = defaultdict(Tally)
for i in range(len(x)):
    obj = tally_dict[y[i]]
    obj.id = y[i]
    obj.total += x[i]
    obj.count += 1

# process data
result_list = []
for key in sorted(tally_dict):
    obj = tally_dict[key]
    mean = 1.0 * obj.total / obj.count
    result_list.extend([mean] * obj.count)

print(result_list)
#-------------------------
-John
<snip>
> # gather data
> tally_dict = defaultdict(Tally)
> for i in range(len(x)):
>     obj = tally_dict[y[i]]
>     obj.id = y[i]    <--- statement redundant, remove it
>     obj.total += x[i]
>     obj.count += 1
-John
>> obj.id = y[i] <--- statement redundant, remove it
Sorry for the thrashing! It's more correct to say that the Tally class
doesn't require an "id" attribute at all. So the code becomes:
#---------
from collections import defaultdict

class Tally:
    def __init__(self):
        self.total = 0
        self.count = 0

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

# gather data
tally_dict = defaultdict(Tally)
for i in range(len(x)):
    obj = tally_dict[y[i]]
    obj.total += x[i]
    obj.count += 1

# process data
# (walking the keys in sorted order lines up with y only because
# y itself is already in sorted order)
result_list = []
for key in sorted(tally_dict):
    obj = tally_dict[key]
    mean = 1.0 * obj.total / obj.count
    result_list.extend([mean] * obj.count)

print(result_list)
#---------
-John
This kinda looks like you used the wrong data structure.
Maybe you should have used a dict, like:
{'a': [1, 2], 'c': [5, 0, 7], 'b': [8]} ?
> I have looked at iter(tools) and next(), but that did not help me. I'm
> a bit stuck here, so your help is appreciated!
As said, I'd have used a dict in the first place, so let's transform this
straightforwardly into one:
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

# initialize dict
d={}
for idx in set(y):
    d[idx]=[]

#collect values
for i, idx in enumerate(y):
    d[idx].append(x[i])
print("d is now a dict of lists: %s" % d)

#calculate average
for key, values in d.items():
    d[key]=sum(values)/len(values)
print("d is now a dict of averages: %s" % d)

# build the final list
w = [ d[key] for key in y ]
print("w is now the list of averages, corresponding with y:\n \
\n x: %s \n y: %s \n w: %s \n" % (x, y, w))
Output is:
d is now a dict of lists: {'a': [1, 2], 'c': [5, 0, 7], 'b': [8]}
d is now a dict of averages: {'a': 1.5, 'c': 4.0, 'b': 8.0}
w is now the list of averages, corresponding with y:
x: [1, 2, 8, 5, 0, 7]
y: ['a', 'a', 'b', 'c', 'c', 'c']
w: [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
Could have used a defaultdict to avoid dict initialisation, though.
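A defaultdict(list) version of the same idea might look like this. It is only a sketch assuming Python 3 (true division); the name averages is made up for the illustration.

```python
from collections import defaultdict

x = [1, 2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

# defaultdict(list) creates the empty list on first access,
# so no separate initialisation loop is needed.
d = defaultdict(list)
for key, value in zip(y, x):
    d[key].append(value)

# one average per key, computed once
averages = {key: sum(vals) / len(vals) for key, vals in d.items()}

# build the final list
w = [averages[key] for key in y]
print(w)  # [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
```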
Or write a custom class:
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

class A:
    def __init__(self):
        self.store={}
    def add(self, key, number):
        if key in self.store:
            self.store[key].append(number)
        else:
            self.store[key] = [number]

a=A()

# collect data
for idx, val in zip(y,x):
    a.add(idx, val)

# build the final list:
w = [ sum(a.store[key])/len(a.store[key]) for key in y ]
print("w is now the list of averages, corresponding with y:\n \
\n x: %s \n y: %s \n w: %s \n" % (x, y, w))
Produces the same output, of course.
Note that neither solution is very efficient (the second one recomputes
the mean for every item of y), but who cares ;)
> thanks!
No Problem,
Michael
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

def f(a, b, v={}):
    # v is a mutable default argument, deliberately shared across calls
    try:
        v[a].append(b)
    except KeyError:
        v[a] = [b]
    def g(a):
        return sum(v[a])/len(v[a])
    return g

# the inner list is built completely (filling v) before any g is called
w = [g(i) for g,i in [(f(i,v),i) for i,v in zip(y,x)]]
print("w is now the list of averages, corresponding with y:\n \
\n x: %s \n y: %s \n w: %s \n" % (x, y, w))
Output:
w is now the list of averages, corresponding with y:
x: [1, 2, 8, 5, 0, 7]
y: ['a', 'a', 'b', 'c', 'c', 'c']
w: [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
Regards,
Michael
>>> [sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]
[1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
Peter
... pwned.
Should be the fastest and shortest way to do it.
I tried to do something like this, but my brain hurt while trying to
visualize list comprehension evaluation orders ;)
Regards,
Michael
Heh. Yep, I avoided OO for this. Seems like a functional problem.
My solution is functional on the outside, imperative on the inside.
You could add recursion here, but I don't think it would be as
straightforward.
def num_dups_at_head(lst):
    assert len(lst) > 0
    val = lst[0]
    i = 1
    while i < len(lst) and lst[i] == val:
        i += 1
    return i

def smooth(x, y):
    result = []
    while x:
        cnt = num_dups_at_head(y)
        avg = sum(x[:cnt]) * 1.0 / cnt
        result += [avg] * cnt
        x = x[cnt:]
        y = y[cnt:]
    return result
> On 09.03.2010 13:02, Peter Otten wrote:
>>>>> [sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]
>> [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
>> Peter
>
> ... pwned.
> Should be the fastest and shortest way to do it.
It may be short, but it is not particularly efficient: it rescans both lists
for every item. A dict-based approach is probably the fastest. If y is
guaranteed to be sorted, itertools.groupby() may also be worth a try.
$ cat tmp_average_compare.py
from __future__ import division
from collections import defaultdict
try:
    from itertools import izip as zip
except ImportError:
    pass

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

def f(x=x, y=y):
    p = defaultdict(int)
    q = defaultdict(int)
    for a, b in zip(x, y):
        p[b] += a
        q[b] += 1
    return [p[b]/q[b] for b in y]

def g(x=x, y=y):
    return [sum(a for a,b in zip(x,y)if b==c)/y.count(c)for c in y]

if __name__ == "__main__":
    print(f())
    print(g())
    assert f() == g()

$ python3 -m timeit -s 'from tmp_average_compare import f, g' 'f()'
100000 loops, best of 3: 11.4 usec per loop
$ python3 -m timeit -s 'from tmp_average_compare import f, g' 'g()'
10000 loops, best of 3: 22.8 usec per loop
Peter
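With only six items the gap above is about a factor of two. Since the one-liner rescans both lists for every item, the gap should widen as the lists grow. The sketch below is one way to check that on your own machine; it assumes Python 3, and dict_based, one_liner and make_input are made-up names with made-up test data. No timings are quoted here because they depend on the machine.

```python
import timeit
from collections import defaultdict

def dict_based(x, y):
    # one pass to accumulate totals and counts, one pass to build the result
    totals = defaultdict(int)
    counts = defaultdict(int)
    for a, b in zip(x, y):
        totals[b] += a
        counts[b] += 1
    return [totals[b] / counts[b] for b in y]

def one_liner(x, y):
    # rescans x and y completely for every item of y
    return [sum(a for a, b in zip(x, y) if b == c) / y.count(c) for c in y]

def make_input(n):
    # hypothetical data: n integers, labels cycling through 26 letters
    xs = list(range(n))
    ys = [chr(ord('a') + i % 26) for i in range(n)]
    return xs, ys

for n in (100, 200, 400):
    xs, ys = make_input(n)
    assert dict_based(xs, ys) == one_liner(xs, ys)
    t_dict = timeit.timeit(lambda: dict_based(xs, ys), number=20)
    t_one = timeit.timeit(lambda: one_liner(xs, ys), number=20)
    print("n=%d  dict-based: %.4fs  one-liner: %.4fs" % (n, t_dict, t_one))
```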
What results are you expecting if you have multiple runs of 'a' in a
longer list?
BTW I recognize that my solution would be inefficient for long lists,
unless the underlying list implementation had copy-on-write. I'm
wondering what the easiest fix would be. I tried a quick shot at
islice(), but the lack of len() thwarted me.
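One possible fix, sketched below, is to walk the lists with indices instead of repeatedly slicing off the head. It assumes Python 3 and lists of equal length; smooth_indexed is a made-up name. Each run is still sliced once to sum it, but the total copying stays proportional to the list length.

```python
def smooth_indexed(x, y):
    """Like smooth(), but tracks positions instead of re-slicing x and y."""
    result = []
    n = len(y)
    start = 0
    while start < n:
        # find the end of the run of equal labels beginning at start
        end = start + 1
        while end < n and y[end] == y[start]:
            end += 1
        cnt = end - start
        avg = sum(x[start:end]) / cnt  # copies only this one run
        result.extend([avg] * cnt)
        start = end
    return result

x = [1, 2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']
print(smooth_indexed(x, y))  # [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
```

As for multiple runs: this version, like smooth(), averages each run on its own, so x = [1, 2, 3] with y = ['a', 'b', 'a'] gives [1.0, 2.0, 3.0], whereas a dict-based approach would average both 'a' items together.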
I converged to the same solution but had an extra reduction step in
case there were a lot of repeats in the input. I think it is a good
compromise between efficiency, readability and succinctness.
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

from collections import defaultdict

totdct = defaultdict(int)
cntdct = defaultdict(int)
for name, num in zip(y,x):
    totdct[name] += num
    cntdct[name] += 1
avgdct = {name : totdct[name]/cnts for name, cnts in cntdct.items()}
w = [avgdct[name] for name in y]