Duplicate keys in dict?

vsoler

Mar 7, 2010, 11:23:13 AM

Hello,

My code snippet reads data from Excel ranges. The first row and first
column are column headers and row headers respectively. After reading
the range I build a dict.

................'A'..............'B'
'ab'............3................5
'cd'............7................2
'cd'............9................1
'ac'............7................2

d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...

However, as you can see, there are two rows that start with 'cd', and
dicts, AFAIK, do not accept duplicate keys.

What is the best workaround for this? Should I discard dicts? Should I
somehow have under 'cd'... a list of values?

One of the difficulties I find here is that I want to be able to
easily sum all the values for each row key: 'ab', 'cd' and 'ac'.
However, using lists inside dicts makes it a difficult issue for me.

What is the best approach for this problem? Can anybody help?

News123

Mar 7, 2010, 11:46:02 AM

vsoler wrote:
> Hello,
>
> My code snippet reads data from excel ranges. First row and first
> column are column headers and row headers respectively. After reading
> the range I build a dict.
>
> ................'A'..............'B'
> 'ab'............3................5
> 'cd'............7................2
> 'cd'............9................1
> 'ac'............7................2
>
> d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>
> However, as you can see there are two rows that start with 'cd', and
> dicts, AFAIK do not accept duplicates.

Normally, dicts are used if you want to access your data at a later point
in time by the key name.

Do you want to be able to do this?

Then what would you expect to receive for d[('cd','A')] ?

The first value? The second value? Both values?

Could you perhaps replace further occurrences of 'cd' with 'cd1', 'cd2',
'cd3', ...?

Not knowing your exact context makes it difficult to suggest solutions.

Perhaps you could switch to a list of (rowname, rowdict) tuples:

l = [ ('ab', { 'A': 3 , 'B': 5 } ),
      ('cd', { 'A': 7 , 'B': 2 } ),
      ('cd', { 'A': 9 , 'B': 1 } ),
      ('ac', { ... } ),
]
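
As a rough sketch of how the per-row sums could be taken from such a list (the 'ac' values are read off the table in the question, and row_totals is a made-up helper name, not something from the original post):

```python
from collections import defaultdict

# Sample rows from the question; duplicate row names are fine in a list.
rows = [
    ('ab', {'A': 3, 'B': 5}),
    ('cd', {'A': 7, 'B': 2}),
    ('cd', {'A': 9, 'B': 1}),
    ('ac', {'A': 7, 'B': 2}),
]

def row_totals(rows):
    """Add up every cell of every row that shares a row name."""
    totals = defaultdict(int)
    for name, cells in rows:
        totals[name] += sum(cells.values())
    return dict(totals)

print(row_totals(rows))  # {'ab': 8, 'cd': 19, 'ac': 9}
```

This assumes that rows with the same name should be added together, which the question had not yet confirmed at this point in the thread.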

bye

N

Steven D'Aprano

Mar 7, 2010, 11:53:55 AM

On Sun, 07 Mar 2010 08:23:13 -0800, vsoler wrote:

> Hello,
>
> My code snippet reads data from excel ranges. First row and first column
> are column headers and row headers respectively. After reading the range
> I build a dict.
>
> ................'A'..............'B'
> 'ab'............3................5
> 'cd'............7................2
> 'cd'............9................1
> 'ac'............7................2
>
> d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>
> However, as you can see there are two rows that start with 'cd', and
> dicts, AFAIK do not accept duplicates.

> One of the difficulties I find here is that I want to be able to easily
> sum all the values for each row key: 'ab', 'cd' and 'ac'. However,
> using lists inside dicts makes it a difficult issue for me.

Given the sample above, what answer do you expect for summing the 'cd'
row? There are four reasonable answers:

7 + 2 = 9
9 + 1 = 10
7 + 2 + 9 + 1 = 19
Error

You need to decide what you want to do before asking how to do it.
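
For illustration only, the four outcomes listed above might look like this in code. The variable names and the strict_insert helper are hypothetical, chosen here just to make each policy concrete:

```python
# The two 'cd' rows from the sample, as (A, B) pairs.
cd_rows = [(7, 2), (9, 1)]

first_row_only = sum(cd_rows[0])             # 7 + 2
last_row_only = sum(cd_rows[-1])             # 9 + 1
all_rows = sum(sum(row) for row in cd_rows)  # 7 + 2 + 9 + 1

def strict_insert(table, key, value):
    """Hypothetical helper: treat a repeated key as an error."""
    if key in table:
        raise KeyError(f"duplicate key: {key!r}")
    table[key] = value

print(first_row_only, last_row_only, all_rows)  # 9 10 19
```

A plain dict assignment silently gives the "last row wins" behaviour, so the other three policies have to be coded explicitly.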

--
Steven

vsoler

Mar 7, 2010, 12:13:13 PM

On 7 mar, 17:53, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:

Steven,

What I need is that sum(('cd','A')) gives me 16, sum(('cd','B')) gives
me 3.

I apologize for not having made it clear.

Tim Chase

Mar 7, 2010, 2:11:18 PM

to vsoler, Python-list

vsoler wrote:
> On 7 mar, 17:53, Steven D'Aprano <st...@REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Sun, 07 Mar 2010 08:23:13 -0800, vsoler wrote:
>>> Hello,
>>> My code snippet reads data from excel ranges. First row and first column
>>> are column headers and row headers respectively. After reading the range
>>> I build a dict.
>>> ................'A'..............'B'
>>> 'ab'............3................5
>>> 'cd'............7................2
>>> 'cd'............9................1
>>> 'ac'............7................2
>>> d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>>> However, as you can see there are two rows that start with 'cd', and
>>> dicts, AFAIK do not accept duplicates.
>>> One of the difficulties I find here is that I want to be able to easily
>>> sum all the values for each row key: 'ab', 'cd' and 'ac'. However,
>>> using lists inside dicts makes it a difficult issue for me.
>

> What I need is that sum(('cd','A')) gives me 16, sum(('cd','B')) gives
> me 3.

But you really *do* want lists inside the dict if you want to be
able to call sum() on them. You want to map the tuple ('cd','A')
to the list [7,9] so you can sum the results. And if you plan to
sum the results, it's far easier to have one-element lists and
just sum them, instead of having to special case "if it's a list,
sum it, otherwise, return the value". So I'd use something like

import csv
from collections import defaultdict

f = open(INFILE, newline='')
r = csv.reader(f, ...)
headers = next(r)  # read and discard the header row
d = defaultdict(list)
for (label, a, b) in r:
    # one list per (row label, column) key; duplicates just append
    d[(label, 'a')].append(int(a))
    d[(label, 'b')].append(int(b))
# ...
for (label, col), values in d.items():
    print(label, col, 'sum =', sum(values))

Alternatively, if you don't need to store the intermediate
values, and just want to store the sums, you can accrue them as
you go along:

d = defaultdict(int)
for (label, a, b) in r:
    # missing keys start at 0, so += works on first sight of a key
    d[(label, 'a')] += int(a)
    d[(label, 'b')] += int(b)
# ...
for (label, col), value in d.items():
    print(label, col, 'sum =', value)

Both are untested, but I'm pretty sure they're both viable,
modulo my sleep-deprived eyes.
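
To try the idea without a real input file, here is a small self-contained sketch of the running-sum variant. It assumes the data has been exported as CSV with a header row; SAMPLE, sum_cells and per_label are names invented for this example, and the column names are taken from the header rather than hard-coded:

```python
import csv
import io
from collections import defaultdict

# Stand-in for the spreadsheet export; the layout (label, A, B) is assumed.
SAMPLE = """label,A,B
ab,3,5
cd,7,2
cd,9,1
ac,7,2
"""

def sum_cells(lines):
    """Return {(label, column): total}, adding up rows with the same label."""
    reader = csv.reader(lines)
    headers = next(reader)  # ['label', 'A', 'B']
    totals = defaultdict(int)
    for label, *values in reader:
        for column, value in zip(headers[1:], values):
            totals[(label, column)] += int(value)
    return dict(totals)

totals = sum_cells(io.StringIO(SAMPLE))
print(totals[('cd', 'A')], totals[('cd', 'B')])  # 16 3

# Collapse the column part of the key to get one total per row label.
per_label = defaultdict(int)
for (label, _column), value in totals.items():
    per_label[label] += value
print(dict(per_label))  # {'ab': 8, 'cd': 19, 'ac': 9}
```

The second loop covers the original request for a single sum per row key ('ab', 'cd', 'ac') on top of the per-cell sums.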

-tkc

Yinon Ehrlich

Mar 9, 2010, 7:27:59 AM

On Mar 7, 6:23 pm, vsoler <vicente.so...@gmail.com> wrote:
> Hello,
>
> My code snippet reads data from excel ranges. First row and first
> column are column headers and row headers respectively. After reading
> the range I build a dict.
>

> What is the best approach for this problem? Can anybody help?

Have you tried xlrd? (http://www.python-excel.org/)
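
A possible sketch of reading the range with the xlrd package and summing duplicates. It assumes the layout from the question (row 0 holds column headers, column 0 holds row labels); read_cells and total_by_key are hypothetical helper names, and 'data.xls' in the usage note is a placeholder path. Note that xlrd 2.0 and later only reads the older .xls format:

```python
from collections import defaultdict

def read_cells(path, sheet_index=0):
    """Yield (row_label, column_label, value) for each data cell of a sheet."""
    import xlrd  # third-party; imported here so total_by_key works without it

    sheet = xlrd.open_workbook(path).sheet_by_index(sheet_index)
    headers = sheet.row_values(0)
    for i in range(1, sheet.nrows):
        row = sheet.row_values(i)
        for column, value in zip(headers[1:], row[1:]):
            yield row[0], column, value

def total_by_key(cells):
    """Sum values that share the same (row_label, column_label) key."""
    totals = defaultdict(int)
    for row_label, column, value in cells:
        totals[(row_label, column)] += value
    return dict(totals)
```

Usage would be along the lines of total_by_key(read_cells('data.xls')). Numeric cells come back from xlrd as floats, so the totals will be floats too.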
Best,
-- Yinon