My code snippet reads data from Excel ranges. The first row and first
column are the column headers and row headers respectively. After reading
the range I build a dict.
................'A'..............'B'
'ab'............3................5
'cd'............7................2
'cd'............9................1
'ac'............7................2
d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
However, as you can see, there are two rows that start with 'cd', and
dicts, AFAIK, do not accept duplicate keys.
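To make the concern concrete, here is a minimal sketch (Python 3, using the sample values from the table above; the name `rows` is made up for this example). Assigning to a key that already exists raises no error. The later value simply replaces the earlier one:

```python
# Sample rows from the table above, as (row label, A, B) tuples.
rows = [("ab", 3, 5), ("cd", 7, 2), ("cd", 9, 1), ("ac", 7, 2)]

d = {}
for label, a, b in rows:
    d[(label, "A")] = a
    d[(label, "B")] = b

print(d[("cd", "A")])  # 9: the second 'cd' row replaced the first
print(len(d))          # 6 keys, not 8
```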
What is the best workaround for this? Should I discard dicts? Should I
somehow have under 'cd'... a list of values?
One of the difficulties I find here is that I want to be able to
easily sum all the values for each row key: 'ab', 'cd' and 'ac'.
However, using lists inside dicts makes it a difficult issue for me.
What is the best approach for this problem? Can anybody help?
Normally dicts are used if you want to access your data at a later point
in time by the key name.
Do you want to be able to do this?
Then what would you expect to receive for d[('cd','A')] ?
The first value? the second value? both values?
Could you perhaps rename further occurrences of 'cd' to 'cd1', 'cd2',
'cd3', ... ?
Not knowing your exact context makes it difficult to suggest solutions.
Perhaps you could switch to a list containing tuples of (rowname, rowdict):
l = [ ('ab', { 'A': 3 , 'B': 5 } ),
      ('cd', { 'A': 7 , 'B': 2 } ),
      ('cd', { 'A': 9 , 'B': 1 } ),
      ('ac', { ... } ),
    ]
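A minimal sketch of how such a list could be used, in Python 3 and assuming that values from rows sharing a name should be added up per column. The helper name `row_totals` is made up for this example, and the values for 'ac' are taken from the table in the original post:

```python
from collections import defaultdict

l = [
    ("ab", {"A": 3, "B": 5}),
    ("cd", {"A": 7, "B": 2}),
    ("cd", {"A": 9, "B": 1}),
    ("ac", {"A": 7, "B": 2}),
]

def row_totals(rows):
    """Add up the values per (row name, column) over all rows with that name."""
    totals = defaultdict(int)
    for rowname, rowdict in rows:
        for col, value in rowdict.items():
            totals[(rowname, col)] += value
    return dict(totals)

totals = row_totals(l)
print(totals[("cd", "A")])  # 16
print(totals[("cd", "B")])  # 3
```

Because the list keeps both 'cd' rows separately, nothing is lost, and other interpretations (first row only, last row only) remain possible later.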
bye
N
> Hello,
>
> My code snippet reads data from excel ranges. First row and first column
> are column headers and row headers respectively. After reading the range
> I build a dict.
>
> ................'A'..............'B'
> 'ab'............3................5
> 'cd'............7................2
> 'cd'............9................1
> 'ac'............7................2
>
> d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>
> However, as you can see there are two rows that start with 'cd', and
> dicts, AFAIK do not accept duplicates.
> One of the difficulties I find here is that I want to be able to easily
> sum all the values for each row key: 'ab', 'cd' and 'ac'. However,
> using lists inside dicts makes it a difficult issue for me.
Given the sample above, what answer do you expect for summing the 'cd'
row? There are four reasonable answers:
7 + 2 = 9
9 + 1 = 10
7 + 2 + 9 + 1 = 19
Error
You need to decide what you want to do before asking how to do it.
--
Steven
Steven,
What I need is that sum(('cd','A')) gives me 16, sum(('cd','B')) gives
me 3.
I apologize for not having made it clear.
But you really *do* want lists inside the dict if you want to be
able to call sum() on them. You want to map the tuple ('cd','A')
to the list [7,9] so you can sum the results. And if you plan to
sum the results, it's far easier to have one-element lists and
just sum them, instead of having to special case "if it's a list,
sum it, otherwise, return the value". So I'd use something like
from collections import defaultdict
import csv

f = open(INFILE, newline='')  # INFILE: path to your CSV export
r = csv.reader(f, ...)  # "..." stands for whatever dialect options you need
headers = next(r)  # discard the headers
d = defaultdict(list)
for (label, a, b) in r:
    d[(label, 'A')].append(int(a))
    d[(label, 'B')].append(int(b))
# ...
for (label, col), value in d.items():
    print(label, col, 'sum =', sum(value))
Alternatively, if you don't need to store the intermediate
values, and just want to store the sums, you can accrue them as
you go along:
d = defaultdict(int)
for (label, a, b) in r:
    d[(label, 'A')] += int(a)
    d[(label, 'B')] += int(b)
# ...
for (label, col), value in d.items():
    print(label, col, 'sum =', value)
Both are untested, but I'm pretty sure they're both viable,
modulo my sleep-deprived eyes.
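For what it's worth, here is a self-contained Python 3 sketch of both variants that runs without a file. The CSV text in `SAMPLE` is a stand-in built from the table in the original post, and the names `SAMPLE`, `read_rows`, `values` and `sums` are made up for this example:

```python
import csv
import io
from collections import defaultdict

# Stand-in for the real file: the sample table as CSV text (an assumption).
SAMPLE = "label,A,B\nab,3,5\ncd,7,2\ncd,9,1\nac,7,2\n"

def read_rows(text):
    """Parse CSV text into (label, a, b) tuples, skipping the header row."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [(label, int(a), int(b)) for label, a, b in reader]

rows = read_rows(SAMPLE)

# Variant 1: keep every value in a list and sum on demand.
values = defaultdict(list)
for label, a, b in rows:
    values[(label, "A")].append(a)
    values[(label, "B")].append(b)

# Variant 2: keep only the running sums.
sums = defaultdict(int)
for label, a, b in rows:
    sums[(label, "A")] += a
    sums[(label, "B")] += b

print(values[("cd", "A")], sum(values[("cd", "A")]))  # [7, 9] 16
print(sums[("cd", "B")])                              # 3
```

The list variant costs a little more memory but keeps the individual values around in case something other than the sum is needed later.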
-tkc
Have you tried xlrd? (http://www.python-excel.org/)
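A rough sketch of how that might look in Python 3 with the xlrd package from that site, assuming it is installed, the data sits on the first sheet, and the layout matches the sample in the original post (header row first, row labels in the first column). The file name "data.xls" and the helper names `read_sheet_rows` and `sum_by_label` are hypothetical; `open_workbook`, `sheet_by_index`, `nrows` and `row_values` are xlrd calls. Note that numeric cells come back as floats, and that recent xlrd releases read only the older .xls format:

```python
from collections import defaultdict

def read_sheet_rows(path):
    """Return the first sheet of an Excel workbook as a list of row lists."""
    import xlrd  # third-party; imported here so the rest runs without it
    sheet = xlrd.open_workbook(path).sheet_by_index(0)
    return [sheet.row_values(i) for i in range(sheet.nrows)]

def sum_by_label(rows):
    """Sum each column per row label; rows[0] is the header row."""
    headers = rows[0][1:]
    sums = defaultdict(float)
    for row in rows[1:]:
        label = row[0]
        for col, value in zip(headers, row[1:]):
            sums[(label, col)] += value
    return dict(sums)

# Usage with a real file (hypothetical name):
# sums = sum_by_label(read_sheet_rows("data.xls"))

# The same function applied to the sample table, typed in by hand:
sample = [["", "A", "B"], ["ab", 3, 5], ["cd", 7, 2], ["cd", 9, 1], ["ac", 7, 2]]
print(sum_by_label(sample)[("cd", "A")])  # 16.0
```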
Best,
-- Yinon