[Python-ideas] Dict joining using + and +=


João Matos

Feb 27, 2019, 11:29:46 AM
to Python-Ideas
Hello,

I would like to propose that instead of using this (applies to Py3.5 and upwards)
dict_a = {**dict_a, **dict_b}

we could use
dict_a = dict_a + dict_b

or even better
dict_a += dict_b


Best regards,

João Matos

Rhodri James

Feb 27, 2019, 11:50:29 AM
to python...@python.org
On 27/02/2019 16:25, João Matos wrote:
> Hello,
>
> I would like to propose that instead of using this (applies to Py3.5 and upwards)
> dict_a = {**dict_a, **dict_b}
>
> we could use
> dict_a = dict_a + dict_b
>
> or even better
> dict_a += dict_b

While I don't object to the idea of concatenating dictionaries, I feel
obliged to point out that this last is currently spelled
dict_a.update(dict_b)
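
A minimal sketch of the two existing spellings mentioned so far. The contents of `dict_a` and `dict_b` are made up for illustration; the point is only that the unpacking form builds a new dict while `update` mutates in place.

```python
dict_a = {"x": 1, "y": 2}
dict_b = {"y": 20, "z": 30}

# PEP 448 unpacking (Python 3.5+): builds a new dict, later keys win.
merged = {**dict_a, **dict_b}

# In-place spelling: applied to a copy here so dict_a stays untouched.
updated = dict_a.copy()
updated.update(dict_b)
```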

--
Rhodri James *-* Kynesim Ltd
_______________________________________________
Python-ideas mailing list
Python...@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Guido van Rossum

Feb 27, 2019, 12:06:44 PM
to Rhodri James, Python-Ideas
On Wed, Feb 27, 2019 at 8:50 AM Rhodri James <rho...@kynesim.co.uk> wrote:
On 27/02/2019 16:25, João Matos wrote:
> I would like to propose that instead of using this (applies to Py3.5 and upwards)
> dict_a = {**dict_a, **dict_b}
>
> we could use
> dict_a = dict_a + dict_b
>
> or even better
> dict_a += dict_b

While I don't object to the idea of concatenating dictionaries, I feel
obliged to point out that this last is currently spelled
dict_a.update(dict_b)

This is likely to be controversial. But I like the idea. After all, we have `list.extend(x)` ~~ `list += x`. The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.

--
--Guido van Rossum (python.org/~guido)

George Castillo

Feb 27, 2019, 12:36:00 PM
to gu...@python.org, Python-Ideas
The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.

This would mean that addition, at least in this particular instance, is not a commutative operation.  Are there other places in Python where this is the case?

~ George


Guido van Rossum

Feb 27, 2019, 12:38:21 PM
to George Castillo, Python-Ideas
On Wed, Feb 27, 2019 at 9:34 AM George Castillo <gmca...@gmail.com> wrote:
The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.

This would mean that addition, at least in this particular instance, is not a commutative operation.  Are there other places in Python where this is the case?

Yes there are. 'a' + 'b' is not the same as 'b' + 'a'.

For non-numbers we only require + to be associative, i.e. a + b + c == (a + b) + c == a + (b + c).

That is satisfied for this proposal.
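
A small sketch of that claim, assuming the "d2 wins" rule. The helper `merge` is a hypothetical stand-in for the proposed `d1 + d2`, and the sample dicts are invented; this checks one case and is not a proof.

```python
def merge(d1, d2):
    """Hypothetical stand-in for the proposed d1 + d2: d2 wins on overlap."""
    result = d1.copy()
    result.update(d2)
    return result

a = {"k": 1, "only_a": 0}
b = {"k": 2}
c = {"k": 3, "only_c": 9}

left = merge(merge(a, b), c)
right = merge(a, merge(b, c))
associative = left == right               # grouping does not matter here
commutative = merge(a, b) == merge(b, a)  # operand order does matter
```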
 

Oleg Broytman

Feb 27, 2019, 12:39:40 PM
to python...@python.org
On Wed, Feb 27, 2019 at 09:05:20AM -0800, Guido van Rossum <gu...@python.org> wrote:
> On Wed, Feb 27, 2019 at 8:50 AM Rhodri James <rho...@kynesim.co.uk> wrote:
>
> > On 27/02/2019 16:25, João Matos wrote:
> > > I would like to propose that instead of using this (applies to Py3.5 and
> > upwards)
> > > dict_a = {**dict_a, **dict_b}
> > >
> > > we could use
> > > dict_a = dict_a + dict_b
> > >
> > > or even better
> > > dict_a += dict_b
> >
> > While I don't object to the idea of concatenating dictionaries, I feel
> > obliged to point out that this last is currently spelled
> > dict_a.update(dict_b)
> >
>
> This is likely to be controversial. But I like the idea. After all, we have
> `list.extend(x)` ~~ `list += x`. The key conundrum that needs to be solved
> is what to do for `d1 + d2` when there are overlapping keys. I propose to
> make d2 win in this case, which is what happens in `d1.update(d2)` anyways.
> If you want it the other way, simply write `d2 + d1`.

That is, ``d1 + d2`` is::

d = d1.copy()
d.update(d2)
return d

> --
> --Guido van Rossum (python.org/~guido)

Oleg.
--
Oleg Broytman https://phdru.name/ p...@phdru.name
Programmers don't die, they just GOSUB without RETURN.

E. Madison Bray

Feb 27, 2019, 12:40:58 PM
to George Castillo, Python-Ideas
On Wed, Feb 27, 2019 at 6:35 PM George Castillo <gmca...@gmail.com> wrote:
>>
>> The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.
>
>
> This would mean that addition, at least in this particular instance, is not a commutative operation. Are there other places in Python where this is the case?

Sure:

>>> a = "A"
>>> b = "B"
>>> a + b == b + a
False

MRAB

Feb 27, 2019, 1:15:50 PM
to python...@python.org
> Are there any advantages of using '+' over '|'?

Anders Hovmöller

Feb 27, 2019, 1:23:15 PM
to João Matos, Python-Ideas
I dislike the asymmetry with sets:

> {1} | {2}
{1, 2}

To me it makes sense that if + works for dict then it should for set too.

/ Anders

Michael Selik

Feb 27, 2019, 1:42:39 PM
to Anders Hovmöller, Python-Ideas
On Wed, Feb 27, 2019 at 10:22 AM Anders Hovmöller <bo...@killingar.net> wrote:
I dislike the asymmetry with sets:

> {1} | {2}
{1, 2}

To me it makes sense that if + works for dict then it should for set too.

/ Anders

> On 27 Feb 2019, at 17:25, João Matos <jcrm...@gmail.com> wrote:
>
> Hello,
>
> I would like to propose that instead of using this (applies to Py3.5 and upwards)
> dict_a = {**dict_a, **dict_b}
>
> we could use
> dict_a = dict_a + dict_b


The dict subclass collections.Counter overrides the update method for adding values instead of overwriting values.


Counter also uses +/__add__ for a similar behavior.

    >>> c = Counter(a=3, b=1)
    >>> d = Counter(a=1, b=2)
    >>> c + d # add two counters together:  c[x] + d[x]
    Counter({'a': 4, 'b': 3})

At first I worried that changing base dict would cause confusion for the subclass, but Counter seems to share the idea that update and + are synonyms.
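
A short sketch comparing the two Counter operations. The first pair reproduces the example above; the second pair is an added illustration of one way they differ (`+` keeps only positive counts, `update` does not filter), so "synonyms" holds only approximately.

```python
from collections import Counter

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)

added = c + d                 # new Counter, counts summed

updated = Counter(c)          # copy, then sum counts in place
updated.update(d)

# One difference: + drops counts that are not positive, update keeps them.
kept = Counter(a=1)
kept.update(Counter(a=-1))               # 'a' stays, with count 0
dropped = Counter(a=1) + Counter(a=-1)   # 'a' is removed entirely
```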

Guido van Rossum

Feb 27, 2019, 1:49:40 PM
to Michael Selik, Python-Ideas
Great, this sounds like a good argument for + over |. The other argument is that | for sets *is* symmetrical, while + is used for other collections where it's not symmetrical. So it sounds like + is a winner here.
 

João Matos

Feb 27, 2019, 2:06:25 PM
to python...@python.org
Hello,

Great.
Because I don't program in any other language except Python, I can't make the PR (with the C code).
Maybe someone who programs in C can help?

Best regards,

João Matos

Guido van Rossum

Feb 27, 2019, 2:10:10 PM
to João Matos, Python-Ideas
On Wed, Feb 27, 2019 at 11:06 AM João Matos <jcrm...@gmail.com> wrote:
Great.
Because I don't program in any other language except Python, I can't make the PR (with the C code).
Maybe someone who programs in C can help?

First we need a PEP, and for a PEP you need a core dev interested in sponsoring the PEP. And I'm not it. Is there a core dev who is interested in sponsoring or co-authoring this PEP?
 

David Mertz

Feb 27, 2019, 3:12:38 PM
to George Castillo, python-ideas
"foo" + "bar" != "bar" + "foo"

Brandt Bucher

Feb 27, 2019, 3:19:12 PM
to João Matos, python...@python.org
I’d like to try my hand at implementing this, if nobody else is interested. I should be able to have something up today.

Brandt

Antoine Pitrou

Feb 27, 2019, 3:20:58 PM
to python...@python.org
On Wed, 27 Feb 2019 10:48:21 -0800
Guido van Rossum <gu...@python.org> wrote:
>
> Great, this sounds like a good argument for + over |. The other argument is
> that | for sets *is* symmetrical, [...]

As much as it can be:

>>> {-0.0} | {0.0}
{-0.0}
>>> {0.0} | {-0.0}
{0.0}

;-)

Antoine.

Steven D'Aprano

Feb 27, 2019, 6:52:58 PM
to python...@python.org
On Wed, Feb 27, 2019 at 10:34:43AM -0700, George Castillo wrote:
> >
> > The key conundrum that needs to be solved is what to do for `d1 + d2` when
> > there are overlapping keys. I propose to make d2 win in this case, which is
> > what happens in `d1.update(d2)` anyways. If you want it the other way,
> > simply write `d2 + d1`.
>
>
> This would mean that addition, at least in this particular instance, is not
> a commutative operation. Are there other places in Python where this is
> the case?

Strings, bytes, lists, tuples.

In this case, I wouldn't call it dict addition, I would call it a union
operator. That suggests that maybe we match sets and use | for union.

That also suggests d1 & d2 for the intersection between two dicts, but
which value should win?

More useful than intersection is, I think, dict subtraction: d1 - d2
being a new dict with the keys/values from d1 which aren't in d2.
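
One possible reading of that subtraction, sketched as a plain function. The name `dict_sub` is hypothetical, and the choice to ignore the values of `d2` is an assumption.

```python
def dict_sub(d1, d2):
    """Hypothetical d1 - d2: items of d1 whose keys are not in d2."""
    return {k: v for k, v in d1.items() if k not in d2}

d1 = {"a": 1, "b": 2, "c": 3}
d2 = {"b": 99}
result = dict_sub(d1, d2)     # the value stored under "b" in d2 is ignored
```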



--
Steven

Guido van Rossum

Feb 27, 2019, 7:08:50 PM
to Steven D'Aprano, Python-Ideas
OK, you're it. Please write a PEP for this.

Brandt Bucher

Feb 27, 2019, 11:22:51 PM
to gu...@python.org, Python-Ideas
Here is a working implementation of dictionary addition, for consideration with the PEP:

Serhiy Storchaka

Feb 28, 2019, 2:18:13 AM
to python...@python.org
27.02.19 20:48, Guido van Rossum wrote:

>
> On Wed, Feb 27, 2019 at 10:42 AM Michael Selik <mi...@selik.org> wrote:
>
> The dict subclass collections.Counter overrides the update method
> for adding values instead of overwriting values.
>
> https://docs.python.org/3/library/collections.html#collections.Counter.update
>
> Counter also uses +/__add__ for a similar behavior.
>
>     >>> c = Counter(a=3, b=1)
>     >>> d = Counter(a=1, b=2)
>     >>> c + d # add two counters together:  c[x] + d[x]
>     Counter({'a': 4, 'b': 3})
>
> At first I worried that changing base dict would cause confusion for
> the subclass, but Counter seems to share the idea that update and +
> are synonyms.
>
>
> Great, this sounds like a good argument for + over |. The other argument
> is that | for sets *is* symmetrical, while + is used for other
> collections where it's not symmetrical. So it sounds like + is a winner
> here.

Counter uses + for a *different* behavior!

>>> Counter(a=2) + Counter(a=3)
Counter({'a': 5})

I do not understand why we discuss a new syntax for dict merging if we
already have a syntax for dict merging: {**d1, **d2} (which works with
*all* mappings). Doesn't this contradict the Zen?

Eric V. Smith

Feb 28, 2019, 3:43:18 AM
to python...@python.org

I'd help out.

Eric

James Lu

Feb 28, 2019, 7:41:30 AM
to Serhiy Storchaka, python...@python.org
I agree with Storchaka here. The advantage of existing dict merge syntax is that it will cause an error if the object is not a dict or dict-like object, thus preventing people from doing bad things.

Karthikeyan

Feb 28, 2019, 12:43:49 PM
to João Matos, Python-Ideas
Just to add to the discussion this was brought up previously as part of PEP 448 unpacking generalizations that also added {**x, **y} to merge two dicts in Python 3.5.


The previous thread is worth reading as some of the points still stand even with {**x, **y} added.



--
Regards,
Karthikeyan S

Greg Ewing

Feb 28, 2019, 4:20:56 PM
to python...@python.org
Serhiy Storchaka wrote:
> I do not understand why we discuss a new syntax for dict merging if we
> already have a syntax for dict merging: {**d1, **d2} (which works with
> *all* mappings).

But that always returns a dict. A '+' operator could be implemented
by other mapping types to return a mapping of the same type.
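
A quick sketch of both points: the unpacking form accepts different mapping types, and the result is always a plain dict. The two input types are picked only as examples.

```python
from collections import OrderedDict
from types import MappingProxyType

d1 = OrderedDict(a=1, b=2)
d2 = MappingProxyType({"b": 20, "c": 30})   # read-only mapping, not a dict

merged = {**d1, **d2}         # works for both mapping types
result_type = type(merged)    # plain dict, not OrderedDict
```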

--
Greg

Guido van Rossum

Feb 28, 2019, 11:23:05 PM
to Serhiy Storchaka, Eric Smith, Python-Ideas
On Wed, Feb 27, 2019 at 11:18 PM Serhiy Storchaka <stor...@gmail.com> wrote:
27.02.19 20:48, Guido van Rossum wrote:
>
> On Wed, Feb 27, 2019 at 10:42 AM Michael Selik <mi...@selik.org> wrote:
>
>     The dict subclass collections.Counter overrides the update method
>     for adding values instead of overwriting values.
>
>     https://docs.python.org/3/library/collections.html#collections.Counter.update
>
>     Counter also uses +/__add__ for a similar behavior.
>
>          >>> c = Counter(a=3, b=1)
>          >>> d = Counter(a=1, b=2)
>          >>> c + d # add two counters together:  c[x] + d[x]
>          Counter({'a': 4, 'b': 3})
>
>     At first I worried that changing base dict would cause confusion for
>     the subclass, but Counter seems to share the idea that update and +
>     are synonyms.
>
>
> Great, this sounds like a good argument for + over |. The other argument
> is that | for sets *is* symmetrical, while + is used for other
> collections where it's not symmetrical. So it sounds like + is a winner
> here.

Counter uses + for a *different* behavior!

 >>> Counter(a=2) + Counter(a=3)
Counter({'a': 5})

Well, you can see this as a special case. The proposed + operator on Mappings returns a new Mapping whose keys are the union of the keys of the two arguments; the value is the single value for a key that occurs in only one of the arguments, and *somehow* combined for a key that's in both. The way of combining values is up to the type of Mapping. For dict, the second value wins (not so different from {'a': 1, 'a': 2}, which becomes {'a': 2}). But for other Mappings, the combination can be done differently -- and Counter chooses to add the two values.
 
I do not understand why we discuss a new syntax for dict merging if we
already have a syntax for dict merging: {**d1, **d2} (which works with
*all* mappings). Is not this contradicts the Zen?

But (as someone else pointed out) {**d1, **d2} always returns a dict, not the type of d1 and d2.

Also, I'm sorry for PEP 448, but even if you know about **d in simpler contexts, if you were to ask a typical Python user how to combine two dicts into a new one, I doubt many people would think of {**d1, **d2}. I know I myself had forgotten about it when this thread started! If you were to ask a newbie who has learned a few things (e.g. sequence concatenation) they would much more likely guess d1+d2.

The argument for + over | has been mentioned elsewhere already.
> I'd help out.

Please do! I tried to volunteer Steven D'Aprano but I think he isn't interested in pushing through a controversial PEP.

The PEP should probably also propose d1-d2.

Karthikeyan

Feb 28, 2019, 11:59:24 PM
to gu...@python.org, Serhiy Storchaka, Eric Smith, Python-Ideas
The PEP should probably also propose d1-d2.

What would be the output of this? Does this return a new dictionary where keys in d2 are removed in d1 like sets?

>>> d = dict((i, i) for i in range(5))
>>> e = dict((i, i) for i in range(4, 10))
>>> d
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
>>> e
{4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>> d.items() - e.items()
{(0, 0), (1, 1), (3, 3), (2, 2)}
>>> dict(d.items() - e.items())
{0: 0, 1: 1, 3: 3, 2: 2}

--
Regards,
Karthikeyan S

Hasan Diwan

Mar 1, 2019, 12:00:55 AM
to Python-Ideas
Do we really need a "+" and a "-" operation on dictionaries? [dictinstance.update({k:v}) for k,v in dictinstance.items()] does handle merges already. And I'm assuming that "-" should return the difference -- set(d1.keys()) - set(d2.keys()), right? -- H
--
If you wish to request my time, please do so using bit.ly/hd1AppointmentRequest.

Sent from my mobile device

Steven D'Aprano

Mar 1, 2019, 12:56:13 AM
to python...@python.org
On Thu, Feb 28, 2019 at 08:59:30PM -0800, Hasan Diwan wrote:

> Do we really need a "+" and a "-" operation on dictionaries?
> [dictinstance.update({k:v}) for k,v in dictinstance.items()] does handle
> merges already.

I don't think that does what you intended. That merges dictinstance with
itself (a no-op!), but one item at a time, so in the slowest, most
inefficient way possible.

Writing a comprehension for its side-effects is an anti-pattern that
should be avoided. You are creating a (potentially large) list of Nones
which has to be created, then garbage collected.
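
To make the cost concrete, a small sketch contrasting the two spellings when merging two different dicts. The names `d1`, `d2`, `target_slow` and `target_fast` are illustrative only.

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 20, "c": 30}

# Anti-pattern: a list of Nones is built only to be discarded.
target_slow = dict(d1)
discarded = [target_slow.update({k: v}) for k, v in d2.items()]

# Direct spelling: one call, no throwaway list.
target_fast = dict(d1)
target_fast.update(d2)
```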


> And I'm assuming that "-" should return the difference --
> set(d1.keys()) - set(d2.keys()), right?

No. That throws away the values associated with the keys.



P.S. As per Guido's ~~command~~ request *wink* I'm writing a PEP for
this. I should have a draft ready later this evening.


--
Steven

fhsxfhsx

Mar 1, 2019, 1:09:45 AM
to python...@python.org
Considering potential ambiguity, I suggest `d1.append(d2)` so we can have an additional argument saying `d1.append(d2, mode="some mode that tells how this function behaves")`.
If we are really to have the new syntax `d1 + d2`, I suggest leaving it for `d1.append(d2, mode="strict")` which raises an error when there are duplicate keys. The semantics are natural and clear when two dicts have no overlapping keys.
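
A rough sketch of what such a strict mode could look like as a free function. `merge_strict` is a hypothetical name, and raising `KeyError` is an assumption; the suggestion above does not say which exception to use.

```python
def merge_strict(d1, d2):
    """Hypothetical strict merge: refuse to merge when keys overlap."""
    duplicates = d1.keys() & d2.keys()
    if duplicates:
        raise KeyError(f"duplicate keys: {duplicates!r}")
    return {**d1, **d2}

ok = merge_strict({"a": 1}, {"b": 2})

try:
    merge_strict({"a": 1}, {"a": 2})
except KeyError:
    raised = True
else:
    raised = False
```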


 

Serhiy Storchaka

Mar 1, 2019, 1:30:18 AM
to python...@python.org
28.02.19 23:19, Greg Ewing wrote:

> Serhiy Storchaka wrote:
>> I do not understand why we discuss a new syntax for dict merging if we
>> already have a syntax for dict merging: {**d1, **d2} (which works with
>> *all* mappings).
>
> But that always returns a dict. A '+' operator could be implemented
> by other mapping types to return a mapping of the same type.

And this opens a non-easy problem: how to create a mapping of the same
type? Not all mappings, and even not all dict subclasses have a copying
constructor.
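
One standard-library illustration of the problem, offered as an example rather than as the case the author had in mind: `defaultdict` expects a factory as its first argument, so the usual `type(x)(x)` copying idiom fails for it.

```python
from collections import defaultdict

dd = defaultdict(list, a=[1])

# type(x)(x) copies a plain dict, but defaultdict expects a factory first.
try:
    type(dd)(dd)
except TypeError:
    copy_via_constructor_failed = True
else:
    copy_via_constructor_failed = False

# The type's own copy() method does preserve the subclass and the factory.
clone = dd.copy()
```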

Adrien Ricocotam

Mar 1, 2019, 1:46:13 AM
to fhsxfhsx, python...@python.org
I really like this idea. It’s not obvious how to deal with key conflicts and I don’t think replacing by the keys of the second dict is that obviously a good behaviour. With the actual merging ({**d1, **d2}) it works the same as when you build a custom dict so it’s usually known by people. If we add a new syntax/function, we might think of better behaviors. 

IMO, and I might be wrong, merging two mappings that have common keys is an error. Thus we would need a clean way to combine two dicts. A simple way could be adding a key function that takes the values of each merged dict and returns the new value:

d1 = ...
d2 = ...

d1.merge(d2, key=lambda values: values[0])

That’s an example, I don’t like the syntax. 
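
A sketch of the same idea as a standalone function. `merge_with` and its two-argument `combine` callback are hypothetical; the callback takes the two values separately rather than a `values` sequence as in the example above.

```python
def merge_with(d1, d2, combine):
    """Hypothetical merge: combine(v1, v2) resolves keys present in both."""
    result = dict(d1)
    for key, value in d2.items():
        result[key] = combine(result[key], value) if key in result else value
    return result

d1 = {"a": 1, "b": 2}
d2 = {"b": 20, "c": 30}

first_wins = merge_with(d1, d2, lambda v1, v2: v1)
last_wins = merge_with(d1, d2, lambda v1, v2: v2)
summed = merge_with(d1, d2, lambda v1, v2: v1 + v2)
```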

On Fri 1 Mar 2019 at 07:09, fhsxfhsx <fhsx...@126.com> wrote:
Considering potential ambiguity, I suggest `d1.append(d2)` so we can have an additional argument saying `d1.append(d2, mode="some mode that tells how this function behaves")`.
If we are really to have the new syntax `d1 + d2`, I suggest leaving it for `d1.append(d2, mode="strict")` which raises an error when there are duplicate keys. The semantics are natural and clear when two dicts have no overlapping keys.


 


Serhiy Storchaka

Mar 1, 2019, 1:48:51 AM
to python...@python.org
01.03.19 06:21, Guido van Rossum wrote:

> On Wed, Feb 27, 2019 at 11:18 PM Serhiy Storchaka <stor...@gmail.com> wrote:
> Counter uses + for a *different* behavior!
>
>  >>> Counter(a=2) + Counter(a=3)
> Counter({'a': 5})
>
>
> Well, you can see this as a special case. The proposed + operator on
> Mappings returns a new Mapping whose keys are the union of the keys of
> the two arguments; the value is the single value for a key that occurs
> in only one of the arguments, and *somehow* combined for a key that's in
> both. The way of combining keys is up to the type of Mapping. For dict,
> the second value wins (not so different as {'a': 1, 'a': 2}, which
> becomes {'a': 2}). But for other Mappings, the combination can be done
> differently -- and Counter chooses to add the two values.

Currently Counter += dict works and Counter + dict is an error. With
this change Counter + dict will return a value, but it will be different
from the result of the += operator.
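
The current behaviour described here can be checked with a short sketch (the sample counts are invented; this shows the behaviour without the proposed change):

```python
from collections import Counter

c = Counter(a=1)
c += {"a": 2}                 # Counter.__iadd__ accepts a plain dict

try:
    Counter(a=1) + {"a": 2}   # Counter.__add__ returns NotImplemented
except TypeError:
    plus_raises = True
else:
    plus_raises = False
```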

Also, if a custom dict subclass implemented the plus operator with
different semantics which support addition with a dict, this change
will break it, because dict + CustomDict will call dict.__add__ instead
of CustomDict.__radd__. Adding support for new operators to builtin
types is dangerous.

> I do not understand why we discuss a new syntax for dict merging if we
> already have a syntax for dict merging: {**d1, **d2} (which works with
> *all* mappings). Is not this contradicts the Zen?
>
>
> But (as someone else pointed out) {**d1, **d2} always returns a dict,
> not the type of d1 and d2.

And this saves us from the hard problem of creating a mapping of the
same type. Note that the reference implementations discussed above make d1 +
d2 always return a dict. dict.copy() returns a dict.

> Also, I'm sorry for PEP 448, but even if you know about **d in simpler
> contexts, if you were to ask a typical Python user how to combine two
> dicts into a new one, I doubt many people would think of {**d1, **d2}. I
> know I myself had forgotten about it when this thread started! If you
> were to ask a newbie who has learned a few things (e.g. sequence
> concatenation) they would much more likely guess d1+d2.

Perhaps the better solution is to update the documentation.

Ivan Levkivskyi

Mar 1, 2019, 4:04:29 AM
to Serhiy Storchaka, python-ideas
On Thu, 28 Feb 2019 at 07:18, Serhiy Storchaka <stor...@gmail.com> wrote:
[...]


I do not understand why we discuss a new syntax for dict merging if we
already have a syntax for dict merging: {**d1, **d2} (which works with
*all* mappings). Is not this contradicts the Zen?

FWIW there are already three ways for lists/sequences:

[*x, *y]
x + y
x.extend(y)  # in-place version

We already have the first and third for dicts/mappings, so I don't see a big problem in adding a + for dicts.
Also, this is not really new syntax, just implementing a couple of dunders for a builtin class.

So I actually like this idea.

--
Ivan


Steven D'Aprano

Mar 1, 2019, 5:47:06 AM
to python...@python.org
On Fri, Mar 01, 2019 at 08:47:36AM +0200, Serhiy Storchaka wrote:

> Currently Counter += dict works and Counter + dict is an error. With
> this change Counter + dict will return a value, but it will be different
> from the result of the += operator.

That's how list.__iadd__ works too: ListSubclass + list will return a
value, but it might not be the same as += since that operates in place
and uses a different dunder method.

Why is it a problem for dicts but not a problem for lists?


> Also, if the custom dict subclass implemented the plus operator with
> different semantic which supports the addition with a dict, this change
> will break it, because dict + CustomDict will call dict.__add__ instead
> of CustomDict.__radd__.

That's not how operators work in Python or at least that's not how they
worked the last time I looked: if the behaviour has changed without
discussion, that's a breaking change that should be reverted.

Obviously I can't show this with dicts, but here it is with lists:

py> class MyList(list):
...     def __radd__(self, other):
...         print("called subclass first")
...         return "Something"
...
py> [1, 2, 3] + MyList()
called subclass first
'Something'


This is normal, standard behaviour for Python operators: if the right
operand is a subclass of the left operand, the reflected method __r*__
is called first.


> Adding support for new operators to builtin
> types is dangerous.

Explain what makes new operators more dangerous than old operators
please.


> > I do not understand why we discuss a new syntax for dict merging if we
> > already have a syntax for dict merging: {**d1, **d2} (which works with
> > *all* mappings). Is not this contradicts the Zen?
> >
> >
> >But (as someone else pointed out) {**d1, **d2} always returns a dict,
> >not the type of d1 and d2.
>
> And this saves us from the hard problem of creating a mapping of the
> same type.

What's wrong with doing this?

new = type(self)()

Or the equivalent from C code. If that doesn't work, surely that's the
fault of the subclass, the subclass is broken, and it will raise an
exception.

I don't think it is our responsibility to do anything more than call
the subclass constructor. If that's broken, then so be it.
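
A sketch of that approach in pure Python. `TypedAddDict` and `Config` are hypothetical classes, and the sketch assumes the subclass can be constructed with no arguments.

```python
class TypedAddDict(dict):
    """Hypothetical base class whose + preserves the subclass type."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = type(self)()    # relies on a no-argument constructor
        new.update(self)
        new.update(other)     # right operand wins on overlapping keys
        return new

class Config(TypedAddDict):
    pass

merged = Config(a=1) + {"a": 2, "b": 3}
```

If a subclass requires constructor arguments, `type(self)()` raises, which is the "so be it" outcome described above.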


Possibly relevant: I've always been frustrated and annoyed at classes
that hardcode their own type into methods. E.g. something like:

class X:
    def spam(self, arg):
        return X(eggs)
        # Wrong! Bad! Please use type(self) instead.

That means that each subclass has to override every method:

class MySubclass(X):
    def spam(self, arg):
        # Do nothing except change the type returned.
        return type(self)( super().spam(arg) )


This gets really annoying really quickly. Try subclassing int, for
example, where you have to override something like 30+ methods and do
nothing but wrap calls to super.


--
Steven

Steven D'Aprano

Mar 1, 2019, 6:18:09 AM
to python...@python.org
On Thu, Feb 28, 2019 at 07:40:25AM -0500, James Lu wrote:

> I agree with Storchaka here. The advantage of existing dict merge
> syntax is that it will cause an error if the object is not a dict or
> dict-like object, thus preventing people from doing bad things.

What sort of "bad things" are you afraid of?


--
Steven

INADA Naoki

Mar 1, 2019, 7:01:18 AM
to João Matos, Python-Ideas
I dislike adding more operator overloads to builtin types.

str concatenation is not commutative, but it satisfies a in (a+b), and b in (a+b).
There is no loss.

In the case of dict + dict, it is not only a sum. A value may be lost.

   {"a":1} + {"a":2} = ?

In the case of a.update(b), it's clear that b wins.
In the case of a + b, "which wins" or "exception raised on duplicated key?" is unclear to me.
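
The contrast can be sketched as follows. The sample strings and dicts are made up, and `{**d1, **d2}` stands in for the proposed `d1 + d2` under the "second operand wins" rule.

```python
a, b = "spam", "eggs"
str_lossless = a in (a + b) and b in (a + b)

d1, d2 = {"a": 1}, {"a": 2}
merged = {**d1, **d2}         # stand-in for the proposed d1 + d2
value_lost = 1 not in merged.values()
```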

Regards,

On Thu, Feb 28, 2019 at 1:28 AM João Matos <jcrm...@gmail.com> wrote:
Hello,

I would like to propose that instead of using this (applies to Py3.5 and upwards)
dict_a = {**dict_a, **dict_b}

we could use
dict_a = dict_a + dict_b

or even better
dict_a += dict_b


Best regards,

João Matos


--
INADA Naoki  <songof...@gmail.com>

Chris Angelico

Mar 1, 2019, 7:48:22 AM
to Python-Ideas
On Fri, Mar 1, 2019 at 11:00 PM INADA Naoki <songof...@gmail.com> wrote:
>
> I dislike adding more operator overload to builtin types.
>
> str is not commutative, but it satisfies a in (a+b), and b in (a+b).
> There are no loss.
>
> In case of dict + dict, it not only sum. There may be loss value.
>
> {"a":1} + {"a":2} = ?
>
> In case of a.update(b), it's clear that b wins.
> In case of a + b, "which wins" or "exception raised on duplicated key?" is unclear to me.

Picking semantics can be done as part of the PEP discussion, and
needn't be a reason for rejecting the proposal before it's even made.
We have at least one other precedent to consider:

>>> {1} | {1.0}
{1}
>>> {1.0} | {1}
{1.0}

I have absolutely no doubt that these kinds of questions will be
thoroughly hashed out (multiple times, even) before the PEP gets to
pronouncement.

ChrisA

INADA Naoki

Mar 1, 2019, 7:59:38 AM
to Chris Angelico, Python-Ideas
On Fri, Mar 1, 2019 at 9:47 PM Chris Angelico <ros...@gmail.com> wrote:
>
> On Fri, Mar 1, 2019 at 11:00 PM INADA Naoki <songof...@gmail.com> wrote:
> >
> > I dislike adding more operator overload to builtin types.
> >
> > str is not commutative, but it satisfies a in (a+b), and b in (a+b).
> > There are no loss.
> >
> > In case of dict + dict, it not only sum. There may be loss value.
> >
> > {"a":1} + {"a":2} = ?
> >
> > In case of a.update(b), it's clear that b wins.
> > In case of a + b, "which wins" or "exception raised on duplicated key?" is unclear to me.
>
> Picking semantics can be done as part of the PEP discussion, and
> needn't be a reason for rejecting the proposal before it's even made.

Yes. I am only saying that no semantics seems clear to me. I am not
discussing which one is best.
I only said that I dislike it. We must be free to express likes and dislikes, no?

> We have at least one other precedent to consider:
>
> >>> {1} | {1.0}
> {1}
> >>> {1.0} | {1}
> {1.0}

That is just because of the behavior of int and float. It is not caused by
set behavior.
Sets keep "no loss" semantics from the point of view of equality.

>>> {1} <= ({1} | {1.0})
True
>>> {1.0} <= ({1} | {1.0})
True

So dict + dict is totally different from set | set.
dict + dict has loss at the equality level.

--
INADA Naoki <songof...@gmail.com>

Steven D'Aprano

Mar 1, 2019, 8:10:59 AM
to python...@python.org
On Fri, Mar 01, 2019 at 08:59:45PM +0900, INADA Naoki wrote:
> I dislike adding more operator overload to builtin types.
>
> str is not commutative, but it satisfies a in (a+b), and b in (a+b).
> There are no loss.

Is this an invariant you expect to apply for other classes that support
the addition operator?

5 in (5 + 6)

[1, 2, 3] in ([1, 2, 3] + [4, 5, 6])


Since it doesn't apply for int, float, complex, list or tuple, why do
you think it must apply to dicts?


> In case of dict + dict, it not only sum. There may be loss value.

Yes? Why is that a problem?


> {"a":1} + {"a":2} = ?

Would you like to argue that Counter.__add__ is a mistake for the same
reason?

Counter(('a', 1)) + Counter(('a', 2)) = ?


For the record, what I expected the above to do turned out to be
*completely wrong* when I tried it. I expected Counter({'a': 3}) but the
actual results are Counter({'a': 2, 1: 1, 2: 1}).
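
The surprise comes from how the constructor reads its argument, as this sketch shows: a tuple is treated as an iterable of elements to count, while a mapping supplies the counts directly.

```python
from collections import Counter

from_iterable = Counter(("a", 1))     # counts the elements "a" and 1
from_mapping = Counter({"a": 1})      # "a" has count 1

total_iterable = Counter(("a", 1)) + Counter(("a", 2))
total_mapping = Counter({"a": 1}) + Counter({"a": 2})
```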

Every operation is going to be mysterious if you have never
learned what it does:

from array import array
a = array('i', [1, 2, 3])
b = array('i', [10, 20, 30])
a + b = ?

Without trying it or reading the docs, should that be an
error, or concatenation, or element-wise addition?
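
For readers who do want to try it, a runnable version of the snippet:

```python
from array import array

a = array("i", [1, 2, 3])
b = array("i", [10, 20, 30])
combined = a + b              # concatenation, as for lists
```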


> In case of a.update(b), it's clear that b wins.

It wasn't clear to me when I was a beginner and first came across
dict.update. I had to learn what it did by experimenting with manual
loops until it made sense to me.


> In case of a + b, "which wins" or "exception raised on duplicated key?" is
> unclear to me.

Many things are unclear to me too. That doesn't make them any less
useful.



--
Steven

Steven D'Aprano

Mar 1, 2019, 8:19:15 AM
to python...@python.org
On Fri, Mar 01, 2019 at 09:58:08PM +0900, INADA Naoki wrote:

> >>> {1} <= ({1} | {1.0})
> True
> >>> {1.0} <= ({1} | {1.0})
> True
>
> So dict + dict is totally different than set | set.
> dict + dict has los at equality level.


Is that an invariant you expect to apply to other uses of the +
operator?

py> x = -1
py> x <= (x + x)
False

py> [999] <= ([1, 2, 3] + [999])
False



--
Steven

INADA Naoki

Mar 1, 2019, 8:21:05 AM
to Steven D'Aprano, python-ideas
On Fri, Mar 1, 2019 at 10:10 PM Steven D'Aprano <st...@pearwood.info> wrote:
>
> On Fri, Mar 01, 2019 at 08:59:45PM +0900, INADA Naoki wrote:
> > I dislike adding more operator overload to builtin types.
> >
> > str is not commutative, but it satisfies a in (a+b), and b in (a+b).
> > There are no loss.
>
> Is this an invariant you expect to apply for other classes that support
> the addition operator?
>
> 5 in (5 + 6)

I meant more high level semantics: "no loss". Not only "in".
So my example about set used "<=" operator.

5 + 6 is sum of 5 and 6.

>
> [1, 2, 3] in ([1, 2, 3] + [4, 5, 6])
>

Neither [1,2,3] nor [4,5,6] is lost in the result.

>
> Since it doesn't apply for int, float, complex, list or tuple, why do
> you think it must apply to dicts?
>

You misunderstood my "no loss" expectation.


>
> > In case of dict + dict, it not only sum. There may be loss value.
>
> Yes? Why is that a problem?
>

That's reason enough for me to dislike it.

>
> > {"a":1} + {"a":2} = ?
>
> Would you like to argue that Counter.__add__ is a mistake for the same
> reason?
>

In Counter's case, it's clear. In case of dict, it's unclear.

> Counter(('a', 1)) + Counter(('a', 2)) = ?
>
>
> For the record, what I expected the above to do turned out to be
> *completely wrong* when I tried it. I expected Counter({'a': 3}) but the
> actual results are Counter({'a': 2, 1: 1, 2: 1}).

That's just because you misunderstood Counter's initializer argument.
It has nothing to do with how the + or | operator is overloaded.
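
A minimal sketch of the difference, using only the standard library (the variable names are illustrative): an iterable passed to Counter is counted element by element, while a mapping supplies the counts directly.

```python
from collections import Counter

# An iterable argument is counted element by element, so the tuple
# ('a', 1) contributes one 'a' and one 1.
from_iterable = Counter(('a', 1)) + Counter(('a', 2))

# A mapping argument supplies the counts directly, so the counts for
# 'a' are added together.
from_mapping = Counter({'a': 1}) + Counter({'a': 2})

print(from_iterable)
print(from_mapping)
```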

>
> Every operation is going to be mysterious if you have never
> learned what it does:
>
> from array import array
> a = array('i', [1, 2, 3])
> b = array('i', [10, 20, 30])
> a + b = ?
>
> Without trying it or reading the docs, should that be an
> error, or concatenation, or element-wise addition?
>

I never said every operator must behave the way everyone expects.
Don't straw man.


--
INADA Naoki <songof...@gmail.com>

INADA Naoki

Mar 1, 2019, 8:48:54 AM
to Steven D'Aprano, python-ideas
>
>
> Is that an invariant you expect to apply to other uses of the +
> operator?
>
> py> x = -1
> py> x <= (x + x)
> False
>
> py> [999] <= ([1, 2, 3] + [999])
> False
>

Please calm down. I meant that each type implements "sum"
according to the semantics of the type, in a lossless way.
What "lossless" means depends on the semantics of the type.

-1 + -1 = -2 is a sum in numerical semantics. There is no loss.

[1, 2, 3] + [999] = [1, 2, 3, 999] is (lossless) sum in sequence semantics.

So what about {"a": 1} + {"a": 2}. Is there (lossless) sum in dict semantics?

* {"a": 1} -- It seems {"a": 2} is lost in dict semantics. Should it
  really be called "sum"?
* {"a": 2} -- It seems {"a": 1} is lost in dict semantics. Should it
  really be called "sum"?
* {"a": 3} -- It seems a bit curious compared with + on sequences,
  because [2]+[3] is not [5].
  It looks more like Counter than a container.
* ValueError -- Hmm, it looks ugly to me.
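
The four candidate results above can be sketched as plain functions. The function names are hypothetical, chosen only for this illustration; nothing like them exists on dict.

```python
def merge_left_wins(d1, d2):
    # On a duplicate key, the value from d1 is kept.
    return {**d2, **d1}


def merge_right_wins(d1, d2):
    # On a duplicate key, the value from d2 is kept (what update() does).
    return {**d1, **d2}


def merge_adding(d1, d2):
    # On a duplicate key, the two values are added, Counter-style.
    result = dict(d1)
    for key, value in d2.items():
        result[key] = result[key] + value if key in result else value
    return result


def merge_strict(d1, d2):
    # On a duplicate key, refuse to guess and raise instead.
    duplicates = d1.keys() & d2.keys()
    if duplicates:
        raise ValueError(f"duplicate keys: {duplicates!r}")
    return {**d1, **d2}


print(merge_left_wins({"a": 1}, {"a": 2}))
print(merge_right_wins({"a": 1}, {"a": 2}))
print(merge_adding({"a": 1}, {"a": 2}))
```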

So I don't think "sum" fits dict semantics.

Regards,
--
INADA Naoki <songof...@gmail.com>

Rémi Lapeyre

Mar 1, 2019, 9:08:04 AM
to python...@python.org
I’m having trouble understanding the semantics of d1 + d2.

I think mappings are more complicated than sequences, so some things
seem not obvious to me.

What would be OrderedDict1 + OrderedDict2, in which positions would be
the resulting keys, which value would be used if the same key is
present in both?

What would be defaultdict1 + defaultdict2?

It seems to me that subclasses of dict are complex mappings for which
« merging » may be less obvious than for sequences.

Ivan Levkivskyi

Mar 1, 2019, 9:20:25 AM
to INADA Naoki, python-ideas
On Fri, 1 Mar 2019 at 13:48, INADA Naoki <songof...@gmail.com> wrote:
>
>
> Is that an invariant you expect to apply to other uses of the +
> operator?
>
> py> x = -1
> py> x <= (x + x)
> False
>
> py> [999] <= ([1, 2, 3] + [999])
> False
>

Please calm down.  I meant each type implements "sum"
in semantics of the type, in lossless way.
What "lossless" means is changed by the semantics of the type.

-1 + -1 = -2 is sum in numerical semantics.  There are no loss.

TBH I don't understand what is lossless about numeric addition. What is the definition of lossless?
Clearly some information is lost, since you can't uniquely restore two numbers you add from the result.

Unless you define what lossless means, there will be just more misunderstandings.

--
Ivan


Rhodri James

Mar 1, 2019, 9:33:41 AM
to python...@python.org
On 01/03/2019 14:06, Rémi Lapeyre wrote:
> I’m having issues to understand the semantics of d1 + d2.

That's understandable, clouds of confusion have been raised. As far as
I can tell it's pretty straightforward: d = d1 + d2 is equivalent to:

>>> d = d1.copy()
>>> d.update(d2)

All of your subsequent questions then become "What does
DictSubclassInQuestion.update() do?" which should be well defined.
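
As a minimal sketch of that equivalence (the helper name `merged` is hypothetical, not an existing API), the copy-then-update recipe can be wrapped in a function and tried on a dict subclass:

```python
from collections import OrderedDict


def merged(d1, d2):
    # Equivalent of the proposed d1 + d2: copy the left operand,
    # then let its own update() decide what happens with d2.
    d = d1.copy()
    d.update(d2)
    return d


d1 = OrderedDict({"a": 1, "b": 2, "c": 3})
d2 = OrderedDict({"d": 4, "b": 5})
d = merged(d1, d2)

# "b" keeps its original position but takes the value from d2;
# "d" is new and goes to the end.
print(d)
```

Neither operand is modified, and the result has the type that d1.copy() returns.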


--
Rhodri James *-* Kynesim Ltd

Neil Girdhar

Mar 1, 2019, 9:38:05 AM
to python-ideas
I agree with you here.  You might want to start a different thread with this idea and possibly come up with a PEP.  There might be some pushback for efficiency's sake, so you might have to reel in your proposal to collections.abc mixin methods and UserDict methods.

Regarding the proposal, I agree with the reasoning put forward by Guido and I like it.  I think there should be:
* d1 + d2
* d1 += d2
* d1 - d2
* d1 -= d2

which are roughly (ignoring steve's point about types)
* {**d1, **d2}
* d1.update(d2)
* {k: v for k, v in d1.items() if k not in d2}
* for k in list(d1): if k in d2: del d1[k]

Seeing this like this, there should be no confusion about what the operators do.
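
A rough sketch of the four operations as ordinary functions. The names dict_add, dict_iadd, dict_sub and dict_isub are hypothetical stand-ins for the proposed operators, and the in-place subtraction removes the keys that are present in d2, mirroring the d1 - d2 comprehension.

```python
def dict_add(d1, d2):
    # d1 + d2: new dict, values from d2 win on duplicate keys.
    return {**d1, **d2}


def dict_iadd(d1, d2):
    # d1 += d2: update d1 in place and return it.
    d1.update(d2)
    return d1


def dict_sub(d1, d2):
    # d1 - d2: new dict with the keys of d2 removed.
    return {k: v for k, v in d1.items() if k not in d2}


def dict_isub(d1, d2):
    # d1 -= d2: remove the keys of d2 from d1 in place.
    for k in list(d1):
        if k in d2:
            del d1[k]
    return d1


x = {"a": 1, "b": 2}
y = {"b": 20, "c": 30}
print(dict_add(x, y))
print(dict_sub(x, y))
```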

I understand the points people made about the Zen of Python.  However, I think that just like with lists, we tend to use l1+l2 when combining lists and [*l1, x, *l2, y] when combining lists and elements.  Similarly, I think {**d1, **d2} should only be written when there are also key value pairs, like {**d1, k: v, **d2, k2: v2}.

Best,

Neil

INADA Naoki

Mar 1, 2019, 9:40:31 AM
to Ivan Levkivskyi, python-ideas
Sorry, my English is not good enough to explain my mental model.

I meant no skipping, no ignoring, no throwing away.

In the case of 1+2=3, neither 1 nor 2 is skipped, ignored, or thrown away.

On the other hand, in the case of {a:1, b:2}+{a:2}={a:2, b:2}, I feel {a:1} is skipped, ignored, or thrown away.  I used "lost" to describe that.

And I used "lossless" to mean "nothing is lost", not "reversible".

If that isn't understandable to you, please ignore me.

I think Rémi’s comment is very similar to my thinking.  Merging mappings is more complex than concatenating sequences, and it seems hard to call it a "sum".

Regards,


2019年3月1日(金) 23:19 Ivan Levkivskyi <levki...@gmail.com>:

Stefan Behnel

Mar 1, 2019, 9:41:39 AM
to python...@python.org
Rémi Lapeyre schrieb am 01.03.19 um 15:06:

> I’m having issues to understand the semantics of d1 + d2.
>
> I think mappings are more complicated than sequences it some things
> seems not obvious to me.
>
> What would be OrderedDict1 + OrderedDict2, in which positions would be
> the resulting keys, which value would be used if the same key is
> present in both?

The only reasonable answer I can come up with is:

1) unique keys from OrderedDict1 are in the same order as before
2) duplicate keys and new keys from OrderedDict2 come after the keys from
d1, in their original order in d2 since they replace keys in d1.

Basically, the expression says: "take a copy of d1 and add the items from
d2 to it". That's exactly what you should get, whether the mappings are
ordered or not (and dict are ordered by insertion in Py3.6+).


> What would be defaultdict1 + defaultdict2?

No surprises here, the result is a copy of defaultdict1 (using the same
missing-key function) with all items from defaultdict2 added.

Remember that the order of the two operands matters. The first always
defines the type of the result, the second is only added to it.
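
A small sketch of the defaultdict case under the copy-then-update reading (the variable names are illustrative): the result keeps the missing-key function of the first operand, and the second operand only contributes items.

```python
from collections import defaultdict

dd1 = defaultdict(list, {"a": [1]})
dd2 = defaultdict(int, {"b": 2})

# Equivalent of the proposed dd1 + dd2.
result = dd1.copy()
result.update(dd2)

# The left operand decides the type and the default factory.
print(type(result).__name__, result.default_factory)
print(result["missing"])
```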


> It seems to me that subclasses of dict are complex mappings for which
> « merging » may be less obvious than for sequences.

It's the same for subclasses of sequences.

Stefan

Dan Sommers

Mar 1, 2019, 9:45:41 AM
to python...@python.org
I don't mean to put words into anyone's mouth, but I think I
see what INADA Naoki means: in other cases of summation,
the result somehow includes or contains both operands. In
the case of summing dicts, though, some of the operands are
"lost" in the process.

I'm sure that I'm nowhere near as prolific as many of the
members of this list, but I don't remember ever merging
dicts (and a quick grep of my Python source tree confirms
same), so I won't comment further on the actual issue at
hand.

Eric V. Smith

Mar 1, 2019, 9:50:10 AM
to python...@python.org
On 3/1/2019 9:38 AM, INADA Naoki wrote:
> Sorry, I'm not good at English enough to explain my mental model.
>
> I meant no skip, no ignorance, no throw away.
>
> In case of 1+2=3, both of 1 and 2 are not skipped, ignored or thrown away.
>
> On the other hand, in case of {a:1, b:2}+{a:2}={a:2, b:2}, I feel {a:1}
> is skipped, ignored, or thrown away.  I used "lost" to explain it.
>
> And I used "lossless" for "there is no lost".  Not for reversible.
>
> If it isn't understandable to you, please ignore me.
>
> I think Rémi’s comment is very similar to my thought.  Merging mapping
> is more complex than concatenate sequence and it seems hard to call it
> "sum".

I understand Inada to be saying that each value on the LHS (as shown
above) affects the result on the RHS. That's the case with addition of
ints and other types, but not so with the proposed dict addition. As he
says, the {a:1} doesn't affect the result. The result would be the same
if this key wasn't present in the first dict, or if the key had a
different value.

This doesn't bother me, personally. I'm just trying to clarify.

Eric

>
> Regards,
>
>
> 2019年3月1日(金) 23:19 Ivan Levkivskyi <levki...@gmail.com

> <mailto:levki...@gmail.com>>:


>
> On Fri, 1 Mar 2019 at 13:48, INADA Naoki <songof...@gmail.com
> <mailto:songof...@gmail.com>> wrote:
>
> >
> >
> > Is that an invariant you expect to apply to other uses of the +
> > operator?
> >
> > py> x = -1
> > py> x <= (x + x)
> > False
> >
> > py> [999] <= ([1, 2, 3] + [999])
> > False
> >
>
> Please calm down.  I meant each type implements "sum"
> in semantics of the type, in lossless way.
> What "lossless" means is changed by the semantics of the type.
>
> -1 + -1 = -2 is sum in numerical semantics.  There are no loss.
>
>
> TBH I don't understand what is lossless about numeric addition. What
> is the definition of lossless?
> Clearly some information is lost, since you can't uniquely restore
> two numbers you add from the result.
>
> Unless you define what lossless means, there will be just more
> misunderstandings.
>
> --
> Ivan
>
>
>

Stefan Behnel

Mar 1, 2019, 10:04:34 AM
to python...@python.org
Rémi Lapeyre schrieb am 01.03.19 um 15:50:

> Le 1 mars 2019 à 15:41:52, Stefan Behnel a écrit:
>
>> Rémi Lapeyre schrieb am 01.03.19 um 15:06:
>>> I’m having issues to understand the semantics of d1 + d2.
>>>
>>> I think mappings are more complicated than sequences it some things
>>> seems not obvious to me.
>>>
>>> What would be OrderedDict1 + OrderedDict2, in which positions would be
>>> the resulting keys, which value would be used if the same key is
>>> present in both?
>>
>> The only reasonable answer I can come up with is:
>>
>> 1) unique keys from OrderedDict1 are in the same order as before
>> 2) duplicate keys and new keys from OrderedDict2 come after the keys from
>> d1, in their original order in d2 since they replace keys in d1.
>>
>> Basically, the expression says: "take a copy of d1 and add the items from
>> d2 to it". That's exactly what you should get, whether the mappings are
>> ordered or not (and dict are ordered by insertion in Py3.6+).
>
> Thanks Stefan for your feedback, unless I’m mistaken this does not work like

> Rhodri suggested, he said:
>
> I can tell it's pretty straightforward:
>
> d = d1 + d2 is equivalent to:
>
> >>> d = d1.copy()
> >>> d.update(d2)
>
> But doing this:
>
> >>> d1 = OrderedDict({"a": 1, "b": 2, "c": 3})
> >>> d2 = OrderedDict({"d": 4, "b": 5})
> >>> d = d1.copy()
> >>> d.update(d2)
> >>> d
> OrderedDict([('a', 1), ('b', 5), ('c', 3), ('d', 4)])
>
> It looks like that the semantics are either not straightforward or what you
> proposed is not the only reasonable answer. Am I missing something?

No, I was, apparently. In Py3.7:

>>> d1 = {"a": 1, "b": 2, "c": 3}
>>> d1
{'a': 1, 'b': 2, 'c': 3}
>>> d2 = {"d": 4, "b": 5}
>>> d = d1.copy()
>>> d.update(d2)
>>> d
{'a': 1, 'b': 5, 'c': 3, 'd': 4}

I think the behaviour makes sense when you know how it's implemented (keys
are stored separately from values). I would have been less surprised if the
keys had also been reordered, but well, this is how it is now in Py3.6+, so
this is how it's going to work also for the operator.

No *additional* surprises here. ;)

Ivan Levkivskyi

Mar 1, 2019, 10:33:22 AM
to Eric V. Smith, python-ideas
On Fri, 1 Mar 2019 at 14:50, Eric V. Smith <er...@trueblade.com> wrote:
On 3/1/2019 9:38 AM, INADA Naoki wrote:
> Sorry, I'm not good at English enough to explain my mental model.
>
> I meant no skip, no ignorance, no throw away.
>
> In case of 1+2=3, both of 1 and 2 are not skipped, ignored or thrown away.
>
> On the other hand, in case of {a:1, b:2}+{a:2}={a:2, b:2}, I feel {a:1}
> is skipped, ignored, or thrown away.  I used "lost" to explain it.
>
> And I used "lossless" for "there is no lost".  Not for reversible.
>
> If it isn't understandable to you, please ignore me.
>
> I think Rémi’s comment is very similar to my thought.  Merging mapping
> is more complex than concatenate sequence and it seems hard to call it
> "sum".

I understand Inada to be saying that each value on the LHS (as shown
above) affects the result on the RHS. That's the case with addition of
ints and other types, but not so with the proposed dict addition. As he
says, the {a:1} doesn't affect the result. The result would be the same
if this key wasn't present in the first dict, or if the key had a
different value.

This doesn't bother me, personally. I'm just trying to clarify.

OK, thanks for explaining! So more formally speaking, you want to say that for other examples of '+' in Python
x1 + y == x2 + y if and only if x1 == x2, while for the proposed '+' for dicts there may be many different x_i such that
x_i + y gives the same result.

This doesn't bother me either, since this is not a critical requirement for addition. I would say this is rather a coincidence
than a conscious decision.
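
To make the property concrete, here is a minimal sketch with the {**x, **y} spelling standing in for the proposed operator (the helper name `merge` is an assumption for illustration). Two different left operands give the same result, which cannot happen with int or list addition.

```python
def merge(x, y):
    # Stand-in for the proposed x + y on dicts.
    return {**x, **y}


x1 = {"a": 1, "b": 2}
x2 = {"a": 100, "b": 2}
y = {"a": 2}

# Different left operands, same merged result: the old value of "a"
# has no influence on the outcome.
print(x1 == x2, merge(x1, y) == merge(x2, y))

# For lists, equal results imply equal left operands.
print([1] + [9] == [2] + [9])
```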

--
Ivan


Stefan Behnel

Mar 1, 2019, 10:35:49 AM
to python...@python.org
Eric V. Smith schrieb am 01.03.19 um 15:49:
> I understand Inada to be saying that each value on the LHS (as shown above)
> affects the result on the RHS. That's the case with addition of ints and
> other types, but not so with the proposed dict addition. As he says, the
> {a:1} doesn't affect the result. The result would be the same if this key
> wasn't present in the first dict, or if the key had a different value.
>
> This doesn't bother me, personally.

+1

Stefan

INADA Naoki

Mar 1, 2019, 10:58:12 AM
to Ivan Levkivskyi, Eric V. Smith, python-ideas
>
> OK, thanks for explaining! So more formally speaking, you want to say that for other examples of '+' in Python
> x1 + y == x2 + y if and only if x1 == x2, while for the proposed '+' for dicts there may be many different x_i such that
> x_i + y gives the same result.
>

That's a bit different from what I have in mind. I'm OK with violating
"x1 + y == x2 + y if and only if x1 == x2" if it's not
important for the semantics of the type of x1, x2, and y.
A mapping is defined by its key: value pairs. That's the core part. I don't want
to call an operator that loses key: value pairs a "sum".
That's why I think this proposal is a more serious abuse of the + operator.

By the way, in the case of sequences, `len(a) + len(b) == len(a + b)`. In
the case of sets, `len(a) + len(b) >= len(a | b)`.
The proposed operation looks more similar to `set | set` than to `seq + seq`
from this point of view.
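
A quick sketch of those length relations, with {**a, **b} standing in for the proposed dict operation (all values are made up for illustration):

```python
seq_a, seq_b = [1, 2], [2, 3]
set_a, set_b = {1, 2}, {2, 3}
dict_a, dict_b = {1: "x", 2: "y"}, {2: "z", 3: "w"}

# Sequences: concatenation always preserves the total length.
seq_len = len(seq_a + seq_b)

# Sets: the union can be smaller than the sum of the sizes.
set_len = len(set_a | set_b)

# Dicts: merging behaves like the set case, because keys are unique.
dict_len = len({**dict_a, **dict_b})

print(seq_len, set_len, dict_len)
```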

I'm not proposing | instead of +. I just mean that the difference between
dict.update() and seq+seq is not
smaller than the difference between dict.update() and set|set.

If | seems not to fit this operation, + seems not to fit it either.

--
INADA Naoki <songof...@gmail.com>

Stefan Behnel

Mar 1, 2019, 10:59:30 AM
to python...@python.org
Rémi Lapeyre schrieb am 01.03.19 um 16:44:

> Le 1 mars 2019 à 16:04:47, Stefan Behnel a écrit:
>> I think the behaviour makes sense when you know how it's implemented (keys
>> are stored separately from values).
>
> Is a Python user expected to know the implementation details of all mappings
> though?

No, it just helps _me_ in explaining the behaviour to myself. Feel free to
look it up in the documentation if you prefer.


>> I would have been less surprised if the
>> keys had also been reordered, but well, this is how it is now in Py3.6+, so
>> this is how it's going to work also for the operator.
>>
>> No *additional* surprises here. ;)
>

> There are never any surprises left once all details have been carefully worked
> out, but having `+` for mappings makes it look like an easy operation whose
> meaning is unambiguous and obvious.
>
> I’m still not convinced that the meaning is obvious, and gave an example
> in my other message where I think it could be ambiguous.

What I meant was that it's obvious in the sense that it is no new behaviour
at all. It just provides an operator for behaviour that is already there.

We are not discussing the current behaviour here. That ship has long sailed
with the release of Python 3.6 beta 1 back in September 2016. The proposal
that is being discussed here is the new operator.

Guido van Rossum

Mar 1, 2019, 2:35:01 PM
to Serhiy Storchaka, Python-Ideas
On Thu, Feb 28, 2019 at 10:30 PM Serhiy Storchaka <stor...@gmail.com> wrote:
28.02.19 23:19, Greg Ewing пише:

> Serhiy Storchaka wrote:
>> I do not understand why we discuss a new syntax for dict merging if we
>> already have a syntax for dict merging: {**d1, **d2} (which works with
>> *all* mappings).
>
> But that always returns a dict. A '+' operator could be implemented
> by other mapping types to return a mapping of the same type.

And this opens a non-easy problem: how to create a mapping of the same
type? Not all mappings, and even not all dict subclasses have a copying
constructor.

There's a possible compromise solution for this. We already do this for Sequence and MutableSequence: Sequence does *not* define __add__, but MutableSequence *does* define __iadd__, and the default implementation just calls self.extend(other). I propose the same for Mapping (do nothing) and MutableMapping: make the default __iadd__ implementation call self.update(other).
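
A minimal sketch of what such a default could look like, written on a toy subclass rather than on the ABC itself. The class name SimpleMapping and its _data attribute are hypothetical; the __iadd__ body is the suggested behaviour, analogous to MutableSequence.__iadd__, which is built on extend().

```python
from collections.abc import MutableMapping


class SimpleMapping(MutableMapping):
    """Toy mapping that stores its items in a private dict."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __iadd__(self, other):
        # Suggested default: delegate to update() and return self,
        # so that "m += other" mutates m in place.
        self.update(other)
        return self


m = SimpleMapping(a=1)
same_object = m
m += {"a": 2, "b": 3}
print(dict(m), m is same_object)
```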

Looking at the code for Counter, its __iadd__ and __add__ behave subtly differently from Counter.update(): __iadd__ and __add__ (and __radd__) drop values that are <= 0, while update() does not. That's all fine -- Counter is not bound by the exact same semantics as dict (starting with its update() method, which adds values rather than overwriting).
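
A short sketch of that difference, using only collections.Counter (the variable names are illustrative):

```python
from collections import Counter

# update() adds counts and keeps the key even when the total is zero.
updated = Counter(a=1)
updated.update({'a': -1})

# + builds a new Counter and drops counts that are <= 0.
added = Counter(a=1) + Counter(a=-1)

# += works in place but also drops counts that are <= 0.
inplace = Counter(a=1)
inplace += Counter(a=-1)

print('a' in updated, 'a' in added, 'a' in inplace)
```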

Anyways, the main reason to prefer d1+d2 over {**d1, **d2} is that the latter is highly non-obvious except if you've already encountered that pattern before, while d1+d2 is what anybody familiar with other Python collection types would guess or propose. And the default semantics for subclasses of dict that don't override these are settled with the "d = d1.copy(); d.update(d2)" equivalence.

--
--Guido van Rossum (python.org/~guido)

Greg Ewing

Mar 1, 2019, 5:34:44 PM
to python...@python.org
Serhiy Storchaka wrote:
> And this opens a non-easy problem: how to create a mapping of the same
> type?

That's the responsibility of the class implementing the + operator.

There doesn't have to be any guarantee that a subclass of it will
automatically return an instance of the subclass (many existing
types provide no such guarantee, e.g. + on strings), so whatever
strategy it uses doesn't have to be part of its public API.

--
Greg

Raymond Hettinger

Mar 2, 2019, 2:15:27 PM
to Guido van Rossum, Serhiy Storchaka, Python-Ideas

> On Mar 1, 2019, at 11:31 AM, Guido van Rossum <gu...@python.org> wrote:
>
> There's a compromise solution for this possible. We already do this for Sequence and MutableSequence: Sequence does *not* define __add__, but MutableSequence *does* define __iadd__, and the default implementation just calls self.extend(other). I propose the same for Mapping (do nothing) and MutableMapping: make the default __iadd__ implementation call self.update(other).

Usually, it's easy to add methods to classes without creating disruption, but ABCs are more problematic. If MutableMapping grows an __iadd__() method, what would that mean for existing classes that register as MutableMapping but don't already implement __iadd__? When "isinstance(m, MutableMapping)" returns True, is it a promise that the API is fully implemented? Is this something that mypy could, would, or should complain about?

> Anyways, the main reason to prefer d1+d2 over {**d1, **d2} is that the latter is highly non-obvious except if you've already encountered that pattern before

I concur. The latter is also an eyesore and almost certain to be a stumbling block when reading code.

That said, I'm not sure we actually need a short-cut for "d=e.copy(); d.update(f)". Code like this comes-up for me perhaps once a year. Having a plus operator on dicts would likely save me five seconds per year.

If the existing code were in the form of "d=e.copy(); d.update(f); d.update(g); d.update(h)", converting it to "d = e + f + g + h" would be a tempting but algorithmically poor thing to do (because the behavior is quadratic). Most likely, the right thing to do would be "d = ChainMap(e, f, g, h)" for a zero-copy solution or "d = dict(ChainMap(e, f, g, h))" to flatten the result without incurring quadratic costs. Both of those are short and clear.
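
One caveat worth checking before swapping the spellings: ChainMap searches its maps from first to last, so the first mapping wins on duplicate keys. To reproduce the last-wins result of repeated update() calls, the maps go in reverse order. A minimal sketch with made-up dicts e, f, g, h:

```python
from collections import ChainMap

e = {"x": "e", "only_e": 1}
f = {"x": "f"}
g = {"x": "g", "only_g": 3}
h = {"x": "h"}

# Copy once, then update in place: each item is copied a bounded
# number of times, and the last dict wins.
d = e.copy()
for other in (f, g, h):
    d.update(other)

# ChainMap in the original order: the first mapping wins instead.
first_wins = dict(ChainMap(e, f, g, h))

# ChainMap in reverse order matches the update() result.
last_wins = dict(ChainMap(h, g, f, e))

print(d["x"], first_wins["x"], last_wins["x"])
```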

Lastly, I'm still bugged by use of the + operator for replace-logic instead of additive-logic. With numbers and lists and Counters, the plus operator creates a new object where all the contents of each operand contribute to the result. With dicts, some of the contents for the left operand get thrown-away. This doesn't seem like addition to me (IIRC that is also why sets have "|" instead of "+").


Raymond

francismb

Mar 2, 2019, 3:26:15 PM
to python...@python.org

On 3/2/19 8:14 PM, Raymond Hettinger wrote:
> Lastly, I'm still bugged by use of the + operator for replace-logic instead of additive-logic. With numbers and lists and Counters, the plus operator creates a new object where all the contents of each operand contribute to the result. With dicts, some of the contents for the left operand get thrown-away. This doesn't seem like addition to me (IIRC that is also why sets have "|" instead of "+").
+1, it's a good point. IMHO the proposed overloading (of meaning) for + and
+= is too much/unclear. If the idea is to 'join' dicts, why not use
"d.join(...here the other dicts ...)"

Regards,
--francis

francismb

Mar 2, 2019, 5:03:39 PM
to python...@python.org

On 2/27/19 7:14 PM, MRAB wrote:
> Are there any advantages of using '+' over '|'?
or for e.g. '<=' (d1 <= d2) over '+' (d1 + d2)

MRAB

Mar 2, 2019, 5:15:21 PM
to python...@python.org
On 2019-03-02 22:02, francismb wrote:
>
> On 2/27/19 7:14 PM, MRAB wrote:
>> Are there any advantages of using '+' over '|'?
> or for e.g. '<=' (d1 <= d2) over '+' (d1 + d2)
>
'<=' is for comparison, less-than-or-equal (in the case of sets, subset,
which is sort of the same kind of thing). Using it for anything else in
Python would be too confusing.

francismb

Mar 3, 2019, 8:28:44 AM
to python...@python.org


On 3/2/19 11:11 PM, MRAB wrote:
> '<=' is for comparison, less-than-or-equal (in the case of sets, subset,
> which is sort of the same kind of thing). Using it for anything else in
> Python would be too confusing.
Understandable, so the proposed overloading (of meaning) for <= is also
too much/unclear.

francismb

Mar 3, 2019, 8:37:40 AM
to python...@python.org

On 2/27/19 7:14 PM, MRAB wrote:
> Are there any advantages of using '+' over '|'?
or '<-' (d1 <- d2) meaning merge priority (overriding policy for equal
keys) on the right dict, and may be '->' (d1 -> d2) merge priority on
the left dict over '+' (d1 + d2) ?

E.g.:
>>> d1 = {'a':1, 'b':1 }
>>> d2 = {'a':2 }
>>> d3 = d1 -> d2
>>> d3
{'a':1, 'b':1 }

>>> d1 = {'a':1, 'b':1 }
>>> d2 = {'a':2 }
>>> d3 = d1 <- d2
>>> d3
{'a':2, 'b':1 }

Regards,
--francis

Ivan Levkivskyi

Mar 4, 2019, 5:38:39 AM
to Raymond Hettinger, Serhiy Storchaka, Python-Ideas
On Sat, 2 Mar 2019 at 19:15, Raymond Hettinger <raymond....@gmail.com> wrote:

> On Mar 1, 2019, at 11:31 AM, Guido van Rossum <gu...@python.org> wrote:
>
> There's a compromise solution for this possible. We already do this for Sequence and MutableSequence: Sequence does *not* define __add__, but MutableSequence *does* define __iadd__, and the default implementation just calls self.extend(other). I propose the same for Mapping (do nothing) and MutableMapping: make the default __iadd__ implementation call self.update(other).

Usually, it's easy to add methods to classes without creating disruption, but ABCs are more problematic. If MutableMapping grows an __iadd__() method, what would that mean for existing classes that register as MutableMapping but don't already implement __iadd__? When "isinstance(m, MutableMapping)" returns True, is it a promise that the API is fully implemented? Is this something that mypy could, would, or should complain about?

Just to clarify the situation, currently Mapping and MutableMapping are not protocols from both runtime and mypy points of view. I.e. they don't have the structural __subclasshook__() (as e.g. Iterable), and are not declared as Protocol in typeshed. So to implement these (and be considered a subtype by mypy) one needs to explicitly subclass them (register() isn't supported by mypy). This means that adding a new method will not cause any problems here, since the new method will be non-abstract with a default implementation that calls update() (the same way as for MutableSequence).

The only potential for confusion I see is if there is a class that de facto implements the current MutableMapping API and was made a subclass (at runtime) of MutableMapping using register(). Then after we add __iadd__, users of that class might expect that __iadd__ is implemented, while it might not be. This is however OK I think, since register() is already not type-safe. Also there is a simple way to find out whether there are any subclasses of MutableMapping in typeshed that don't have __iadd__: one can *try* declaring MutableMapping.__iadd__ as abstract, and mypy will error on all such subclasses.
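
The register() situation can be seen today with an existing mixin method such as update(). In this sketch the class DuckMapping is hypothetical: it implements only the five abstract methods and is registered rather than subclassed, so isinstance() succeeds while the mixin methods are absent.

```python
from collections.abc import MutableMapping


class DuckMapping:
    """Implements the abstract MutableMapping methods without inheriting."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


# Virtual subclass: affects isinstance/issubclass only.
MutableMapping.register(DuckMapping)

m = DuckMapping()
is_virtual_subclass = isinstance(m, MutableMapping)
has_update = hasattr(m, "update")
print(is_virtual_subclass, has_update)
```

A newly added default __iadd__ would be missing from such a class in the same way that update() is missing here.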

--
Ivan


Serhiy Storchaka

Mar 4, 2019, 8:30:22 AM
to python...@python.org
01.03.19 21:31, Guido van Rossum пише:

> On Thu, Feb 28, 2019 at 10:30 PM Serhiy Storchaka
> <stor...@gmail.com
> <mailto:stor...@gmail.com>> wrote:
> And this opens a non-easy problem: how to create a mapping of the same
> type? Not all mappings, and even not all dict subclasses have a copying
> constructor.
>
>
> There's a compromise solution for this possible. We already do this for
> Sequence and MutableSequence: Sequence does *not* define __add__, but
> MutableSequence *does* define __iadd__, and the default implementation
> just calls self.extend(other). I propose the same for Mapping (do
> nothing) and MutableMapping: make the default __iadd__ implementation
> call self.update(other).

This LGTM for mappings. But the problem with dict subclasses still
exists. If we use the copy() method for creating a copy, d1 + d2 will
always return a dict (unless the plus operator or copy() is redefined
in a subclass). If we use the constructor of the left argument's type, there
will be problems with subclasses with incompatible constructors (e.g.
defaultdict).
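
Both halves of that problem can be checked with a short sketch. MyDict is a hypothetical subclass used only for illustration; the defaultdict behaviour is that of the standard library.

```python
from collections import defaultdict


class MyDict(dict):
    """Plain dict subclass that does not override copy()."""


# dict.copy() is inherited unchanged, so it returns a plain dict.
copied = MyDict(a=1).copy()

# Calling the left operand's type as a copying constructor fails for
# defaultdict, whose first positional argument is the default factory.
dd = defaultdict(list, a=[1])
try:
    rebuilt = type(dd)(dd)
    error = None
except TypeError as exc:
    error = exc

print(type(copied).__name__, type(error).__name__)
```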

> Anyways, the main reason to prefer d1+d2 over {**d1, **d2} is that the
> latter is highly non-obvious except if you've already encountered that
> pattern before, while d1+d2 is what anybody familiar with other Python
> collection types would guess or propose. And the default semantics for
> subclasses of dict that don't override these are settled with the "d =
> d1.copy(); d.update(d2)" equivalence.

Dicts are not like lists or deques, or even sets. Iterating dicts
produces keys, but not values. The "in" operator tests a key, but not a
value.

It is not that I like to add an operator for dict merging, but dicts are
more like sets than sequences: they can not contain duplicated keys and
the size of the result of merging two dicts can be less than the sum of
their sizes. Using "|" looks more natural to me than using "+". We
should look at discussions for using the "|" operator for sets, if the
alternative of using "+" was considered, I think the same arguments for
preferring "|" for sets are applicable now for dicts.

But is merging two dicts a common enough problem that it needs a new
operator to solve it? I need to merge dicts maybe no more than once
or twice a year, and I am fine with using the update() method.
Perhaps {**d1, **d2} can be more appropriate in some cases, but I have
not encountered such cases yet.

INADA Naoki

Mar 4, 2019, 8:43:09 AM
to Serhiy Storchaka, python-ideas
On Mon, Mar 4, 2019 at 10:29 PM Serhiy Storchaka <stor...@gmail.com> wrote:
>
> It is not that I like to add an operator for dict merging, but dicts are
> more like sets than sequences: they can not contain duplicated keys and
> the size of the result of merging two dicts can be less than the sum of
> their sizes. Using "|" looks more natural to me than using "+". We
> should look at discussions for using the "|" operator for sets, if the
> alternative of using "+" was considered, I think the same arguments for
> preferring "|" for sets are applicable now for dicts.
>

I concur with Serhiy. While I don't like adding operators to dict, the proposed +/-
looks more similar to set |/- than to seq +/-.
If we're going to add such set-like operations, the operators can be:

* dict & dict_or_set
* dict - dict_or_set
* dict | dict

Especially, dict - set can be more useful than proposed dict - dict.
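
A rough sketch of what these set-like operations could compute, written as functions. The names restrict_keys and drop_keys are hypothetical; `keys` may be a set or a dict, since `in` tests keys in both cases.

```python
def restrict_keys(d, keys):
    # Possible meaning of "dict & dict_or_set": keep only the items
    # of d whose key also appears in keys.
    return {k: v for k, v in d.items() if k in keys}


def drop_keys(d, keys):
    # Possible meaning of "dict - dict_or_set": remove the items of d
    # whose key appears in keys.
    return {k: v for k, v in d.items() if k not in keys}


config = {"host": "localhost", "port": 8080, "debug": True}

print(restrict_keys(config, {"host", "port"}))
print(drop_keys(config, {"debug"}))
print(drop_keys(config, {"port": None}))
```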


> But is merging two dicts a common enough problem that needs introducing
> an operator to solve it? I need to merge dicts maybe not more than one
> or two times by year, and I am fine with using the update() method.

+1.

Adding a new method to a builtin should have a high bar.
Adding a new operator to a builtin should have a higher bar.
Adding new syntax should have the highest bar.

--
INADA Naoki <songof...@gmail.com>

Serhiy Storchaka

Mar 4, 2019, 8:44:58 AM
to python...@python.org
01.03.19 12:44, Steven D'Aprano пише:

> On Fri, Mar 01, 2019 at 08:47:36AM +0200, Serhiy Storchaka wrote:
>
>> Currently Counter += dict works and Counter + dict is an error. With
>> this change Counter + dict will return a value, but it will be different
>> from the result of the += operator.
>
> That's how list.__iadd__ works too: ListSubclass + list will return a
> value, but it might not be the same as += since that operates in place
> and uses a different dunder method.
>
> Why is it a problem for dicts but not a problem for lists?

Because the plus operator for lists predated any list subclasses.

>> Also, if the custom dict subclass implemented the plus operator with
>> different semantic which supports the addition with a dict, this change
>> will break it, because dict + CustomDict will call dict.__add__ instead
>> of CustomDict.__radd__.
>
> That's not how operators work in Python or at least that's not how they
> worked the last time I looked: if the behaviour has changed without
> discussion, that's a breaking change that should be reverted.

You are right.

> What's wrong with doing this?
>
> new = type(self)()
>
> Or the equivalent from C code. If that doesn't work, surely that's the
> fault of the subclass, the subclass is broken, and it will raise an
> exception.

Try to do this with defaultdict.

Note that none of builtin sequences or sets do this. For good reasons
they always return an instance of the base type.
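
A minimal sketch of the two points above, using hypothetical subclasses `MyList` and `MySet` (the names are illustrative and not from the thread). It assumes current CPython behaviour: the built-in `+` and `|` hand back the base type, and `type(self)()` on a `defaultdict` gives an instance without the original default factory.

```python
from collections import defaultdict

class MyList(list):   # hypothetical subclass, for illustration only
    pass

class MySet(set):     # hypothetical subclass, for illustration only
    pass

list_result = MyList([1]) + [2]
set_result = MySet({1}) | {2}

# Both operators hand back the plain base type, not the subclass.
print(type(list_result).__name__)
print(type(set_result).__name__)

dd = defaultdict(list, a=[1])
fresh = type(dd)()            # the "new = type(self)()" approach
print(fresh.default_factory)  # the factory from dd is not carried over
```

Whether losing the factory counts as a defect or as acceptable behaviour is exactly what the thread is debating; the sketch only shows what happens.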

David Mertz

Mar 4, 2019, 9:44:11 AM
to Serhiy Storchaka, python-ideas
On Mon, Mar 4, 2019, 8:30 AM Serhiy Storchaka <stor...@gmail.com> wrote:
But is merging two dicts a common enough problem that needs introducing
an operator to solve it? I need to merge dicts maybe not more than one
or two times by year, and I am fine with using the update() method.
Perhaps {**d1, **d2} can be more appropriate in some cases, but I did not encounter such cases yet.

Like other folks in the thread, I also want to merge dicts three times per year. And every one of those times, collections.ChainMap is the right way to do that non-destructively, and without copying.

Steven D'Aprano

Mar 4, 2019, 11:45:13 AM
to python...@python.org
On Mon, Mar 04, 2019 at 09:42:53AM -0500, David Mertz wrote:
> On Mon, Mar 4, 2019, 8:30 AM Serhiy Storchaka <stor...@gmail.com> wrote:
>
> > But is merging two dicts a common enough problem that needs introducing
> > an operator to solve it? I need to merge dicts maybe not more than one
> > or two times by year, and I am fine with using the update() method.
> > Perhaps {**d1, **d2} can be more appropriate in some cases, but I did not
> > encounter such cases yet.
> >
>
> Like other folks in the thread, I also want to merge dicts three times per
> year.

I'm impressed that you have counted it with that level of accuracy. Is
it on the same three days each year, or do they move about? *wink*


> And every one of those times, collections.ChainMap is the right way to
> do that non-destructively, and without copying.

Can you elaborate on why ChainMap is the right way to merge multiple
dicts into a single, new dict?

ChainMap also seems to implement the opposite behaviour to that usually
desired: first value seen wins, instead of last:

py> from collections import ChainMap
py> cm = ChainMap({'a': 1}, {'b': 2}, {'a': 999})
py> cm
ChainMap({'a': 1}, {'b': 2}, {'a': 999})
py> dict(cm)
{'a': 1, 'b': 2}


If you know ahead of time which order you want, you can simply reverse
it:

# prefs = site_defaults + user_defaults + document_prefs
prefs = dict(ChainMap(document_prefs, user_defaults, site_defaults))

but that seems a little awkward to me, and reads backwards. I'm used to
thinking reading left-to-right, not right-to-left.

ChainMap seems, to me, to be ideal for implementing "first wins"
mappings, such as emulating nested scopes, but not so ideal for
update/merge operations.
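
A small sketch of the ordering difference described above. The three preference dicts and their contents are made up for illustration; only the names come from the example in this message.

```python
from collections import ChainMap

# Hypothetical preference dicts, named after the example above.
site_defaults = {"font": "serif", "size": 10, "colour": "black"}
user_defaults = {"size": 12}
document_prefs = {"font": "mono", "size": 14}

# "Last seen wins": apply updates from least to most specific.
updated = {}
for layer in (site_defaults, user_defaults, document_prefs):
    updated.update(layer)

# "First seen wins": list the layers from most to least specific.
chained = dict(ChainMap(document_prefs, user_defaults, site_defaults))

print(updated == chained)
```

The two spellings agree on the resulting mapping; they differ only in the order the layers must be written.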


--
Steven

Steven D'Aprano

Mar 4, 2019, 11:51:19 AM
to python...@python.org
On Mon, Mar 04, 2019 at 03:43:48PM +0200, Serhiy Storchaka wrote:
> 01.03.19 12:44, Steven D'Aprano wrote:
> >On Fri, Mar 01, 2019 at 08:47:36AM +0200, Serhiy Storchaka wrote:
> >
> >>Currently Counter += dict works and Counter + dict is an error. With
> >>this change Counter + dict will return a value, but it will be different
> >>from the result of the += operator.
> >
> >That's how list.__iadd__ works too: ListSubclass + list will return a
> >value, but it might not be the same as += since that operates in place
> >and uses a different dunder method.
> >
> >Why is it a problem for dicts but not a problem for lists?
>
> Because the plus operator for lists predated any list subclasses.

That doesn't answer my question. Just because it is older is no
explanation for why this behaviour is not a problem for lists, or a
problem for dicts.

[...]


> >What's wrong with doing this?
> >
> > new = type(self)()
> >
> >Or the equivalent from C code. If that doesn't work, surely that's the
> >fault of the subclass, the subclass is broken, and it will raise an
> >exception.
>
> Try to do this with defaultdict.

I did. It seems to work fine with my testing:

py> defaultdict()
defaultdict(None, {})

is precisely the behaviour I would expect.

If it isn't the right thing to do, then defaultdict can override __add__
and __radd__.


> Note that none of builtin sequences or sets do this. For good reasons
> they always return an instance of the base type.

What are those good reasons?


--
Steven

Dan Sommers

Mar 4, 2019, 12:58:07 PM
to python...@python.org
On 3/4/19 10:44 AM, Steven D'Aprano wrote:

> If you know ahead of time which order you want, you can simply reverse
> it:
>
> # prefs = site_defaults + user_defaults + document_prefs
> prefs = dict(ChainMap(document_prefs, user_defaults, site_defaults))
>
> but that seems a little awkward to me, and reads backwards. I'm used to
> thinking reading left-to-right, not right-to-left.

I read that as use document preferences first, then user
defaults, then site defaults, exactly as I'd explain the
functionality to someone else.

So maybe we're agreeing: if you think in terms of updating
a dictionary of preferences, then maybe it reads backwards,
but if you think of implementing features, then adding
dictionaries of preferences reads backwards.

David Mertz

Mar 4, 2019, 1:44:30 PM
to Steven D'Aprano, python-ideas
On Mon, Mar 4, 2019, 11:45 AM Steven D'Aprano <st...@pearwood.info> wrote:
> Like other folks in the thread, I also want to merge dicts three times per
> year.

I'm impressed that you have counted it with that level of accuracy. Is it on the same three days each year, or do they move about? *wink*

To be respectful, I always merge dicts on Eid al-Fitr, Diwali, and Lent. I was speaking approximately, since those do not appear to line up with the same Gregorian year.

> And every one of those times, collections.ChainMap is the right way to do that non-destructively, and without copying.

Can you elaborate on why ChainMap is the right way to merge multiple dicts into a single, new dict?

Zero-copy.


ChainMap also seems to implement the opposite behaviour to that usually desired: first value seen wins, instead of last:

True, the semantics are different, but equivalent, to the proposed dict addition. I put the key I want to "win" first rather than last.

If you know ahead of time which order you want, you can simply reverse it:

This seems nonsensical. If I write, at some future time, 'dict1+dict2+dict3' I need exactly as much to know "ahead of time" which keys I intend to win.

Guido van Rossum

Mar 4, 2019, 2:26:03 PM
to python-ideas
* Dicts are not like sets because the ordering operators (<, <=, >, >=) are not defined on dicts, but they implement subset comparisons for sets. I think this is another argument pleading against | as the operator to combine two dicts.

* Regarding how to construct the new set in __add__, I now think this should be done like this:

class dict:
    <other methods>
    def __add__(self, other):
        <checks that other makes sense, else return NotImplemented>
        new = self.copy()  # A subclass may or may not choose to override
        new.update(other)
        return new

AFAICT this will give the expected result for defaultdict -- it keeps the default factory from the left operand (i.e., self).
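
A minimal sketch of that behaviour, using a hypothetical helper `merged()` as a stand-in for the proposed `left + right` (the helper is an assumption for illustration; `dict.__add__` itself does not exist at the time of this message).

```python
from collections import defaultdict

def merged(left, right):
    """Hypothetical stand-in for the proposed left + right."""
    new = left.copy()   # defaultdict.copy() keeps default_factory
    new.update(right)   # right-hand values win on overlapping keys
    return new

dd = defaultdict(list, {"a": [1]})
result = merged(dd, {"a": [2], "b": [3]})

print(type(result).__name__, result.default_factory)
result["missing"].append(0)   # still behaves like a defaultdict
```

The left operand is left unchanged, and the result keeps both its type and its default factory.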

* Regarding how often this is needed, we know that this is proposed and discussed at length every few years, so I think this will fill a real need.

* Regarding possible anti-patterns that this might encourage, I'm not aware of problems around list + list, so this seems an unwarranted worry to me.

Neil Girdhar

Mar 4, 2019, 3:07:41 PM
to python-ideas, python-ideas
On Mon, Mar 4, 2019 at 2:26 PM Guido van Rossum <gu...@python.org> wrote:
>
> * Dicts are not like sets because the ordering operators (<, <=, >, >=) are not defined on dicts, but they implement subset comparisons for sets. I think this is another argument pleading against | as the operator to combine two dicts.
>

I feel like dict should be treated like sets with the |, &, and -
operators since in mathematics a mapping is sometimes represented as a
set of pairs with unique first elements. Therefore, I think the set
metaphor is stronger.

> * Regarding how to construct the new set in __add__, I now think this should be done like this:
>
> class dict:
>     <other methods>
>     def __add__(self, other):
>         <checks that other makes sense, else return NotImplemented>
>         new = self.copy()  # A subclass may or may not choose to override
>         new.update(other)
>         return new

I like that, but it would be inefficient to do that for __sub__ since
it would create elements that it might later delete.

def __sub__(self, other):
    new = self.copy()
    for k in other:
        del new[k]
    return new

is less efficient than

def __sub__(self, other):
    return type(self)({k: v for k, v in self.items() if k not in other})

when copying v is expensive. Also, users would probably not expect
values that don't end up being returned to be copied.

>
> AFAICT this will give the expected result for defaultdict -- it keeps the default factory from the left operand (i.e., self).
>
> * Regarding how often this is needed, we know that this is proposed and discussed at length every few years, so I think this will fill a real need.
>
> * Regarding possible anti-patterns that this might encourage, I'm not aware of problems around list + list, so this seems an unwarranted worry to me.
>

I agree with these points.

Best,

Neil
> --
> --Guido van Rossum (python.org/~guido)
>
> --
>
> ---
> You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/zfHYRHMIAdM/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to python-ideas...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Guido van Rossum

Mar 4, 2019, 3:22:42 PM
to Neil Girdhar, python-ideas, python-ideas
On Mon, Mar 4, 2019 at 12:12 PM Neil Girdhar <miste...@gmail.com> wrote:
On Mon, Mar 4, 2019 at 2:26 PM Guido van Rossum <gu...@python.org> wrote:
>
> * Dicts are not like sets because the ordering operators (<, <=, >, >=) are not defined on dicts, but they implement subset comparisons for sets. I think this is another argument pleading against | as the operator to combine two dicts.
>

I feel like dict should be treated like sets with the |, &, and -
operators since in mathematics a mapping is sometimes represented as a
set of pairs with unique first elements.  Therefore, I think the set
metaphor is stronger.

That ship has long sailed.
 
> * Regarding how to construct the new set in __add__, I now think this should be done like this:
>
> class dict:
>     <other methods>
>     def __add__(self, other):
>         <checks that other makes sense, else return NotImplemented>
>         new = self.copy()  # A subclass may or may not choose to override
>         new.update(other)
>         return new

I like that, but it would be inefficient to do that for __sub__ since
it would create elements that it might later delete.

def __sub__(self, other):
    new = self.copy()
    for k in other:
        del new[k]
    return new

is less efficient than

def __sub__(self, other):
    return type(self)({k: v for k, v in self.items() if k not in other})

when copying v is expensive.  Also, users would probably not expect
values that don't end up being returned to be copied.

No, the values won't be copied -- it is a shallow copy that only increfs the keys and values.
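
A quick sketch of what "shallow copy" means here; the dict contents are made up for illustration.

```python
original = {"k": ["a", "large", "value"]}
shallow = original.copy()

# The new dict holds references to the same value objects.
same_object = shallow["k"] is original["k"]
print(same_object)

# Removing a key from the copy leaves the original dict untouched.
del shallow["k"]
print("k" in original)
```

So copying first and deleting afterwards never duplicates the values themselves, only the references to them.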

Neil Girdhar

Mar 4, 2019, 3:33:48 PM
to Guido van Rossum, python-ideas, python-ideas
On Mon, Mar 4, 2019 at 3:22 PM Guido van Rossum <gu...@python.org> wrote:
>
> On Mon, Mar 4, 2019 at 12:12 PM Neil Girdhar <miste...@gmail.com> wrote:
>>
>> On Mon, Mar 4, 2019 at 2:26 PM Guido van Rossum <gu...@python.org> wrote:
>> >
>> > * Dicts are not like sets because the ordering operators (<, <=, >, >=) are not defined on dicts, but they implement subset comparisons for sets. I think this is another argument pleading against | as the operator to combine two dicts.
>> >
>>
>> I feel like dict should be treated like sets with the |, &, and -
>> operators since in mathematics a mapping is sometimes represented as a
>> set of pairs with unique first elements. Therefore, I think the set
>> metaphor is stronger.
>
>
> That ship has long sailed.

Maybe, but reading through the various replies, it seems that if you
are adding "-" to be analogous to set difference, then the combination
operator should be analogous to set union "|". And it also opens an
opportunity to add set intersection "&". After all, how do you filter
a dictionary to a set of keys?

>>> d = {'some': 5, 'extra': 10, 'things': 55}
>>> d &= {'some', 'allowed', 'options'}
>>> d
{'some': 5}
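
For comparison, a sketch of how the same filtering can be written without a new operator, assuming the intent is "keep only the keys that are in the allowed set". The name `allowed` is introduced here for illustration. Key views already support set operations such as `&`.

```python
d = {'some': 5, 'extra': 10, 'things': 55}
allowed = {'some', 'allowed', 'options'}

# d.keys() is set-like, so & gives the keys present in both.
filtered = {k: d[k] for k in d.keys() & allowed}
print(filtered)
```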

>>
>> > * Regarding how to construct the new set in __add__, I now think this should be done like this:
>> >
>> > class dict:
>> >     <other methods>
>> >     def __add__(self, other):
>> >         <checks that other makes sense, else return NotImplemented>
>> >         new = self.copy()  # A subclass may or may not choose to override
>> >         new.update(other)
>> >         return new
>>
>> I like that, but it would be inefficient to do that for __sub__ since
>> it would create elements that it might later delete.
>>
>> def __sub__(self, other):
>>     new = self.copy()
>>     for k in other:
>>         del new[k]
>>     return new
>>
>> is less efficient than
>>
>> def __sub__(self, other):
>>     return type(self)({k: v for k, v in self.items() if k not in other})
>>
>> when copying v is expensive. Also, users would probably not expect
>> values that don't end up being returned to be copied.
>
>
> No, the values won't be copied -- it is a shallow copy that only increfs the keys and values.

Oh right, good point. Then your way is better since it would preserve
any other data stored by the dict subclass.

Guido van Rossum

Mar 4, 2019, 3:41:21 PM
to Neil Girdhar, python-ideas, python-ideas
Honestly I would rather withdraw the subtraction operators than reopen the discussion about making dict more like set.

Christopher Barker

Mar 4, 2019, 3:59:40 PM
to gu...@python.org, Neil Girdhar, python-ideas, python-ideas
On Mon, Mar 4, 2019 at 12:41 PM Guido van Rossum <gu...@python.org> wrote:
Honestly I would rather withdraw the subtraction operators than reopen the discussion about making dict more like set.

+1

I think the "dicts are like more-featured sets" view is a math-geek perspective, and unlikely to make things clearer for the bulk of users. It may even make things less clear.

We need to be careful -- there are a lot more math geeks on this list than in the general Python coding population.

Simply adding "+" is a non-critical nice to have, but it seems unlikely to really confuse anyone.

-CHB


--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython

Neil Girdhar

Mar 4, 2019, 4:28:36 PM
to Christopher Barker, Guido van Rossum, python-ideas, python-ideas
On Mon, Mar 4, 2019 at 3:58 PM Christopher Barker <pyth...@gmail.com> wrote:
>
>
>
> On Mon, Mar 4, 2019 at 12:41 PM Guido van Rossum <gu...@python.org> wrote:
>>
>> Honestly I would rather withdraw the subtraction operators than reopen the discussion about making dict more like set.

I think that's unfortunate.
>
>
> +1
>
> I think the "dicts are like more-featured sets" view is a math-geek perspective, and unlikely to make things clearer for the bulk of users. It may even make things less clear.

I'd say reddit has some pretty "common users", and they're having a
discussion of this right now
(https://www.reddit.com/r/Python/comments/ax4zzb/pep_584_add_and_operators_to_the_builtin_dict/).
The most popular comment is how it should be |.

Anyway, I think that following the mathematical metaphors tends to
make things more intuitive in the long run. Python is an adventure.
You learn it for years and then it all makes sense. If dict uses +,
yes, new users might find that sooner than |. However, when they
learn set union, I think they will wonder why it's not consistent with
dict union.

The PEP's main justification for + is that it matches Counter, but
Counter adds the values, whereas | doesn't touch the values. I
think it would be good to at least make a list of pros and cons of
each proposed syntax.
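
A short sketch of the distinction, for reference. `{**d1, **d2}` stands in for the proposed dict merge, which is an assumption about its semantics taken from earlier in the thread; Counter's own `|` is shown only for comparison and is not the proposed dict operator.

```python
from collections import Counter

c1 = Counter(a=1, b=2)
c2 = Counter(a=3)

added = c1 + c2      # values for shared keys are summed
unioned = c1 | c2    # Counter's | keeps the maximum count per key

d1 = {"a": 1, "b": 2}
d2 = {"a": 3}
replaced = {**d1, **d2}   # update-style merge: right-hand value replaces

print(added, unioned, replaced)
```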

Paul Moore

Mar 4, 2019, 4:35:48 PM
to Guido van Rossum, Neil Girdhar, python-ideas, python-ideas
On Mon, 4 Mar 2019 at 20:42, Guido van Rossum <gu...@python.org> wrote:
>
> Honestly I would rather withdraw the subtraction operators than reopen the discussion about making dict more like set.

I'm neutral on dict addition, but dict subtraction seemed an odd
extension to the proposal. Using b in a - b solely for its keys, and
ignoring its values, seems weird to me. Even if dict1 - dict2 were
added to the language, I think I'd steer clear of it as being too
obscure.

I'm not going to get sucked into this debate, but I'd be happy to see
the subtraction operator part of the proposal withdrawn.

Paul

Steven D'Aprano

Mar 4, 2019, 5:57:45 PM
to python...@python.org
On Sat, Mar 02, 2019 at 11:14:18AM -0800, Raymond Hettinger wrote:

> If the existing code were in the form of "d=e.copy(); d.update(f);
> d.update(g); d.update(h)", converting it to "d = e + f + g + h" would
> be a tempting but algorithmically poor thing to do (because the
> behavior is quadratic).

I mention this in the PEP. Unlike strings, but like lists and tuples, I
don't expect that this will be a problem in practice:

- it's easy to put repeated string concatenation in a tight loop;
  it is harder to think of circumstances where one needs to
  concatenate lists or tuples, or merge dicts, in a tight loop;

- it's easy to have situations where one is concatenating thousands
  of strings; it's harder to imagine circumstances where one would be
  merging more than three or four dicts;

- concatenation s1 + s2 + ... for strings, lists or tuples results
  in a new object of length equal to the sum of the lengths of each
  of the inputs, so the output is constantly growing; but merging
  dicts d1 + d2 + ... typically results in a smaller object of
  length equal to the number of unique keys.
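
A minimal sketch of the two merge strategies under discussion. `{**acc, **d}` stands in for the proposed `acc + d`, which is an assumption for illustration, and the input dicts are made up.

```python
from functools import reduce

dicts = [{"a": 1}, {"b": 2}, {"a": 3, "c": 4}]

# Pairwise merging, as d1 + d2 + d3 would do: each step builds a new dict
# containing everything merged so far.
pairwise = reduce(lambda acc, d: {**acc, **d}, dicts, {})

# Single accumulator: one dict is updated in place.
accumulated = {}
for d in dicts:
    accumulated.update(d)

print(pairwise == accumulated, len(accumulated), sum(len(d) for d in dicts))
```

Both give the same mapping; the pairwise form re-copies the accumulated keys at every step, which only matters when many or very large dicts are chained.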


> Most likely, the right thing to do would be
> "d = ChainMap(e, f, g, h)" for a zero-copy solution or "d =
> dict(ChainMap(e, f, g, h))" to flatten the result without incurring
> quadratic costs. Both of those are short and clear.

And both result in the opposite behaviour of what you probably intended
if you were trying to match e + f + g + h. Dict merging/updating
operates on "last seen wins", but ChainMap is "first seen wins". To get
the same behaviour, we have to write the dicts in opposite order
compared to update, from most to least specific:

# least specific to most specific
prefs = site_defaults + user_defaults + document_prefs

# most specific to least
prefs = dict(ChainMap(document_prefs, user_defaults, site_defaults))

To me, the latter feels backwards: I'm applying document prefs first, and
then trusting that the ChainMap doesn't overwrite them with the
defaults. I know that's guaranteed behaviour, but every time I read it
I'll feel the need to check :-)


> Lastly, I'm still bugged by use of the + operator for replace-logic
> instead of additive-logic. With numbers and lists and Counters, the
> plus operator creates a new object where all the contents of each
> operand contribute to the result. With dicts, some of the contents
> for the left operand get thrown-away. This doesn't seem like addition
> to me (IIRC that is also why sets have "|" instead of "+").

I'm on the fence here. Addition seems to be the most popular operator
(it often gets requested) but you might be right that this is more like
a union operation than concatenation or addition operation. MRAB also
suggested this earlier.

One point in its favour is that + goes nicely with - but on the other
hand, sets have | and - with no + and that isn't a problem.


--
Steven

Steven D'Aprano

Mar 4, 2019, 6:12:58 PM
to python...@python.org
On Mon, Mar 04, 2019 at 11:56:54AM -0600, Dan Sommers wrote:
> On 3/4/19 10:44 AM, Steven D'Aprano wrote:
>
> > If you know ahead of time which order you want, you can simply reverse
> > it:
> >
> > # prefs = site_defaults + user_defaults + document_prefs
> > prefs = dict(ChainMap(document_prefs, user_defaults, site_defaults))
> >
> > but that seems a little awkward to me, and reads backwards. I'm used to
> > thinking reading left-to-right, not right-to-left.
>
> I read that as use document preferences first, then user
> defaults, then site defaults, exactly as I'd explain the
> functionality to someone else.

If you explained it to me like that, with the term "use", I'd think that
the same feature would be done three times: once with document prefs,
then with user defaults, then site defaults.

Clearly that's not what you mean, so I'd then have to guess what you
meant by "use", since you don't actually mean use. That would leave me
trying to guess whether you meant that *site defaults* overrode document
prefs or the other way.

I don't like guessing, so I'd probably explicitly ask: "Wait, I'm
confused, which wins? It sounds like site defaults wins, surely that's
not what you meant."


> So maybe we're agreeing: if you think in terms of updating
> a dictionary of preferences, then maybe it reads backwards,
> but if you think of implementing features, then adding
> dictionaries of preferences reads backwards.

Do you think "last seen wins" is backwards for dict.update() or for
command line options?


--
Steven

Dan Sommers

Mar 4, 2019, 6:50:45 PM
to python...@python.org
On 3/4/19 5:11 PM, Steven D'Aprano wrote:
> On Mon, Mar 04, 2019 at 11:56:54AM -0600, Dan Sommers wrote:
>> On 3/4/19 10:44 AM, Steven D'Aprano wrote:
>>
>> > If you know ahead of time which order you want, you can simply reverse
>> > it:
>> >
>> > # prefs = site_defaults + user_defaults + document_prefs
>> > prefs = dict(ChainMap(document_prefs, user_defaults, site_defaults))
>> >
>> > but that seems a little awkward to me, and reads backwards. I'm used to
>> > thinking reading left-to-right, not right-to-left.
>>
>> I read that as use document preferences first, then user
>> defaults, then site defaults, exactly as I'd explain the
>> functionality to someone else.
>
> If you explained it to me like that, with the term "use", I'd think that
> the same feature would be done three times: once with document prefs,
> then with user defaults, then site defaults.
>
> Clearly that's not what you mean, so I'd then have to guess what you
> meant by "use", since you don't actually mean use. That would leave me
> trying to guess whether you meant that *site defaults* overrode document
> prefs or the other way.
>
> I don't like guessing, so I'd probably explicitly ask: "Wait, I'm
> confused, which wins? It sounds like site defaults wins, surely that's
> not what you meant."

You're right: "use" is the wrong word. Perhaps "prefer"
is more appropriate. To answer the question of which wins:
the first one in the list [document, user, site] that
contains a given preference in question. Users don't see
dictionary updates; they see collections of preferences in
order of priority.

Documentation is hard. :-)

Sorry.

>> So maybe we're agreeing: if you think in terms of updating
>> a dictionary of preferences, then maybe it reads backwards,
>> but if you think of implementing features, then adding
>> dictionaries of preferences reads backwards.
>
> Do you think "last seen wins" is backwards for dict.update() or for
> command line options?

As a user, "last seen wins" is clearly superior for command
line options. As a programmer, because object methods
operate on their underlying object, it's pretty obvious that
d1.update(d2) starts with d1 and applies the changes
expressed in d2, which is effectively "last seen wins."

If I resist the temptation to guess in the face of
ambiguity, though, I don't think that d1 + d2 is any less
ambiguous than a hypothetical dict_update(d1, d2) function.
When I see a + operator, I certainly don't think of one
operand or the other winning.

Brett Cannon

Mar 4, 2019, 8:55:45 PM
to Neil Girdhar, python-ideas, python-ideas
On Mon, Mar 4, 2019 at 1:29 PM Neil Girdhar <miste...@gmail.com> wrote:
On Mon, Mar 4, 2019 at 3:58 PM Christopher Barker <pyth...@gmail.com> wrote:
>
>
>
> On Mon, Mar 4, 2019 at 12:41 PM Guido van Rossum <gu...@python.org> wrote:
>>
>> Honestly I would rather withdraw the subtraction operators than reopen the discussion about making dict more like set.

I think that's unfortunate.
>
>
> +1
>
> I think the "dicts are like more-featured sets" view is a math-geek perspective, and unlikely to make things clearer for the bulk of users. It may even make things less clear.

I'd say reddit has some pretty "common users", and they're having a
discussion of this right now
(https://www.reddit.com/r/Python/comments/ax4zzb/pep_584_add_and_operators_to_the_builtin_dict/).
The most popular comment is how it should be |.

Anyway, I think that following the mathematical metaphors tends to
make things more intuitive in the long run.

Only if you know the mathematical metaphors. ;)
 
  Python is an adventure.
You learn it for years and then it all makes sense.  If dict uses +,
yes, new users might find that sooner than |.  However, when they
learn set union, I think they will wonder why it's not consistent with
dict union.

Not to me. I barely remember that | is supported for sets, but I sure know about + and lists (and strings, etc.) and I'm willing to bet the vast majority of folks are the same; addition is much more widely known than set theory.
 

The PEP's main justification for + is that it matches Counter, but
Counter adds the values, whereas | doesn't touch the values.   I
think it would be good to at least make a list of pros and cons of
each proposed syntax.

I suspect Steven will add more details to a Rejected Ideas section.
 

> We need to be careful -- there are a lot more math geeks on this list than in the general Python coding population.
>
> Simply adding "+" is a non-critical nice to have, but it seems unlikely to really confuse anyone.

I agree with Chris.

-Brett
 
>
> -CHB
>
>
> --
> Christopher Barker, PhD
>
> Python Language Consulting
>   - Teaching
>   - Scientific Software Development
>   - Desktop GUI and Web Development
>   - wxPython, numpy, scipy, Cython

Raymond Hettinger

Mar 5, 2019, 12:53:11 AM
to Guido van Rossum, python-ideas


> On Mar 4, 2019, at 11:24 AM, Guido van Rossum <gu...@python.org> wrote:
>
> * Regarding how often this is needed, we know that this is proposed and discussed at length every few years, so I think this will fill a real need.

I'm not sure that conclusion follows from the premise :-) Some ideas get proposed routinely because they are obvious things to propose, not because people actually need them. One hint is that the proposals always have generic variable names, "d = d1 + d2", and another is that they are almost never accompanied by actual use cases or real code that would be made better. I haven't seen anyone in this thread say they would use this more than once a year or that their existing code was unclear or inefficient in any way. The lack of dict addition support in other languages (Java, for example) is another indicator that there isn't a real need -- afaict there is nothing about Python that would cause us to have a unique requirement that other languages don't have.

FWIW, there are some downsides to the proposal -- it diminishes some of the unifying ideas about Python that I typically present on the first day of class:

* One notion is that the APIs nudge users toward good code. The "copy.copy()" function has to be imported -- that minor nuisance is a subtle hint that copying isn't good for you. Likewise for dicts, writing "e=d.copy(); e.update(f)" is a minor nuisance that either serves to dissuade people from unnecessary copying or at least will make very clear what is happening. The original motivating use case for ChainMap() was to make a copy-free replacement for excessively slow dict additions in ConfigParser. Giving a plus-operator to mappings is an invitation to writing code that doesn't scale well.

* Another unifying notion is that the star-operator represents repeat addition across multiple data types. It is a nice demo to show that "a * 5 == a + a + a + a + a" where "a" is an int, float, complex, str, bytes, tuple, or list. Giving __add__() to dicts breaks this pattern.

* When teaching dunder methods, the usual advice regarding operators is to use them only when their meaning is unequivocal; otherwise, have a preference for named methods where the method name clarifies what is being done -- don't use train+car to mean train.shunt_to_middle(car). For dicts that would mean not having the plus-operator implement something that isn't inherently additive (it applies replace/overwrite logic instead), that isn't commutative, and that isn't linear when applied in succession (d1+d2+d3).

* In the advanced class where C extensions are covered, the organization of the slots is shown as a guide to which methods make sense together: tp_as_number, tp_as_sequence, and tp_as_mapping. For dicts to gain the requisite methods, they will have to become numbers (in the sense of filling out the tp_as_number slots). That will slow down the abstract methods that search the slot groups, skipping over groups marked as NULL. It also exposes method groups that don't typically appear together, blurring their distinction.

* Lastly, there is a vague piece of zen-style advice, "if many things in the language have to change to implement idea X, it stops being worth it". In this case, it means that every dict-like API and the related abstract methods and typing equivalents would need to grow support for addition in mappings (would it even make sense to add to shelve objects or os.environ objects together?)
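
A short sketch of two of the points in the list above: the repetition identity for existing types, and the fact that update-style merging is not commutative. `{**d1, **d2}` stands in for the proposed `d1 + d2` (an assumption for illustration), and the sample values are chosen so that float arithmetic is exact.

```python
# Repetition equals repeated addition for these built-in types.
samples = [3, 1.5, 1 + 2j, "ab", b"ab", (1, 2), [1, 2]]
for a in samples:
    assert a * 5 == a + a + a + a + a

# Update-style merging is order-sensitive when keys overlap.
d1 = {"k": "left"}
d2 = {"k": "right"}
left_then_right = {**d1, **d2}
right_then_left = {**d2, **d1}
print(left_then_right, right_then_left)
```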

That's my two cents worth. I'm ducking out now (nothing more to offer on the subject). Guido's participation in the thread has given it an air of inevitability so this post will likely not make a difference.


Raymond

Amber Yust

Mar 5, 2019, 1:19:54 AM
to Raymond Hettinger, python-ideas
Adding the + operator for dictionaries feels like it would be a mistake in that it offers at most sugar-y benefits, but introduces the significant drawback of making it easier to introduce unintended errors. This would be the first instance of "addition" where the result can potentially lose/overwrite data (lists and strings both preserve the full extent of each operand; Counters include the full value from each operand, etc).

Combining dictionaries is fundamentally an operation that requires more than one piece of information, because there's no single well-defined way to combine a pair of them. Off the top of my head, I can think of at least 2 different common options (replacement aka .update(), combination of values a la Counter). Neither of these is really a more valid "addition" of dictionaries.

For specific dict-like subclasses, addition may make sense - Counter is a great example of this, because the additional context adds definition to the most logical method via which two instances would be combined. If anything, this seems like an argument to avoid implementing __add__ on dict itself, to leave the possibility space open for interpretation by more specific classes.

Anders Hovmöller

Mar 5, 2019, 2:06:31 AM3/5/19
to Amber Yust, python-ideas

> Adding the + operator for dictionaries feels like it would be a mistake in that it offers at most sugar-y benefits, but introduces the significant drawback of making it easier to introduce unintended errors.

I disagree. This argument only really applies to the case "a = a + b", not "a = b + c". Making it easier and more natural to produce code that doesn't mutate in place is something that should reduce errors, not make them more common.

The big mistake here was * for strings, which is unusual, would be just as well served by a method, and ensures that type errors blow up much later than they could have. The same kind of mistake, getting dicts when you expected numbers, is a much stronger argument against this proposal in my opinion. Let's not create another pitfall! The current syntax is a bit unwieldy but is really fine.

/ Anders

Serhiy Storchaka

Mar 5, 2019, 2:24:30 AM3/5/19
to python...@python.org
04.03.19 21:24, Guido van Rossum wrote:

> * Dicts are not like sets because the ordering operators (<, <=, >, >=)
> are not defined on dicts, but they implement subset comparisons for
> sets. I think this is another argument pleading against | as the
> operator to combine two dicts.

Well, I suppose that the next proposition will be to implement the
ordering operators for dicts. Because why not? Lists and numbers support
them. /sarcasm/

Jokes aside, dicts have more in common with sets than with sequences.
Neither can contain duplicated keys/elements. Both have constant
computational complexity for the containment test. For both, the size of
the merge/union can be less than the sum of the sizes of the original
containers. Both have the same restriction on keys/elements (hashability).

> * Regarding how to construct the new set in __add__, I now think this
> should be done like this:
>
> class dict:
>     <other methods>
>     def __add__(self, other):
>         <checks that other makes sense, else return NotImplemented>
>         new = self.copy()  # A subclass may or may not choose to override
>         new.update(other)
>         return new
>
> AFAICT this will give the expected result for defaultdict -- it keeps
> the default factory from the left operand (i.e., self).

No builtin type that implements __add__ uses the copy() method. Dict
would be the only exception to the general rule.

And it would be much less efficient than {**d1, **d2}.
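
A small sketch of the behaviour under discussion, using a hypothetical helper `add_dicts` in place of the quoted dict.__add__ (which does not exist on dict). It only illustrates the defaultdict point and makes no claim about efficiency.

```python
from collections import defaultdict

def add_dicts(left, right):
    """Hypothetical stand-in for the quoted __add__: copy, then update."""
    new = left.copy()      # defaultdict.copy() keeps default_factory
    new.update(right)
    return new

d1 = defaultdict(list, {"a": [1]})
d2 = {"a": [2], "b": [3]}

merged = add_dicts(d1, d2)
print(type(merged).__name__)   # defaultdict
print(merged.default_factory)  # <class 'list'>
print(dict(merged))            # {'a': [2], 'b': [3]}

# The dict display always builds a plain dict, so the factory is dropped.
plain = {**d1, **d2}
print(type(plain).__name__)    # dict
```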

> * Regarding how often this is needed, we know that this is proposed and
> discussed at length every few years, so I think this will fill a real need.

And every time this proposal was rejected. What has changed
since it was rejected the last time? We now have the expression form of
dict merging ({**d1, **d2}), which should decrease the need for the
plus operator for dicts.

Steven D'Aprano

Mar 5, 2019, 4:34:48 AM3/5/19
to python...@python.org
On Mon, Mar 04, 2019 at 09:34:34PM +0000, Paul Moore wrote:
> On Mon, 4 Mar 2019 at 20:42, Guido van Rossum <gu...@python.org> wrote:
> >
> > Honestly I would rather withdraw the subtraction operators than
> > reopen the discussion about making dict more like set.

As some people have repeatedly pointed out, we already have four ways
to spell dict merging:

- in-place dict.update;
- copy, followed by update;
- use a ChainMap;
- the obscure new {**d1, ...} syntax.

But if there's a good way to get dict difference apart from a manual
loop or comprehension, I don't know it.
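
For reference, the four spellings listed above can be sketched as follows, together with the comprehension that dict difference currently requires. The sample dicts are made up for illustration.

```python
from collections import ChainMap

d1 = {"a": 1, "b": 2}
d2 = {"b": 3, "c": 4}

# 1. In-place dict.update (mutates its target).
target = {"a": 1, "b": 2}
target.update(d2)

# 2. Copy, followed by update (leaves d1 untouched).
copied = d1.copy()
copied.update(d2)

# 3. ChainMap: the first mapping wins lookups, so d2 goes first.
chained = dict(ChainMap(d2, d1))

# 4. Unpacking into a dict display (Python 3.5+).
unpacked = {**d1, **d2}

# Dict difference: no operator, so a comprehension is needed.
difference = {k: v for k, v in d1.items() if k not in d2}

print(target, copied, chained, unpacked, difference)
```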

So from my perspective, even though most of the attention has been on
the merge operator, I'd rather keep the difference operator.

As far as making dicts "more like set", I'm certainly not proposing
that. The furthest I'd go is bow to the consensus if it happened to
decide that | is a better choice than + (but that seems unlikely).


> I'm neutral on dict addition, but dict subtraction seemed an odd
> extension to the proposal. Using b in a - b solely for its keys, and
> ignoring its values, seems weird to me.

The PEP currently says that dict subtraction requires the right-hand
operand to be a dict. That's the conservative choice that follows the
example of list addition (it requires a list, not just any iterable) and
avoids breaking changes to code that uses operator-overloading:

mydict - some_object

works if some_object overloads __rsub__. If dict.__sub__ was greedy in
what it accepted, it could break such code. Better (in my opinion) to be
less greedy by only allowing dicts.

dict -= on the other hand can take any iterable of keys, as the
right-hand operand's __rsub__ isn't called.
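
A sketch of why a conservative __sub__ matters for operator overloading. `SubDict` and `Marker` are hypothetical classes invented for this illustration; the PEP's actual implementation may differ.

```python
class SubDict(dict):
    """Hypothetical dict subclass with a conservative __sub__."""

    def __sub__(self, other):
        if not isinstance(other, dict):
            # Declining here lets Python try other.__rsub__ instead.
            return NotImplemented
        return type(self)({k: v for k, v in self.items() if k not in other})


class Marker:
    """Hypothetical object that overloads reflected subtraction."""

    def __rsub__(self, other):
        return ("Marker.__rsub__ called with", other)


d = SubDict(a=1, b=2)
print(d - {"a": 0})   # {'b': 2}
print(d - Marker())   # ('Marker.__rsub__ called with', {'a': 1, 'b': 2})
```

If __sub__ instead accepted any iterable, the second expression would never reach Marker.__rsub__.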

Oh, another thing the PEP should gain... a use-case for dict
subtraction. Here's a few:

(1) You have a pair of dicts D and E, and you want to update D with only
the new keys from E:

D.update(E - D)

which I think is nicer than writing a manual loop:

D.update({k:E[k] for k in (E.keys() - D.keys())})
# or
D.update({k:v for k,v in E.items() if k not in D})


(This is a form of update with "first seen wins" instead of the usual
"last seen wins".)


(2) You have a dict D, and you want to unconditionally remove keys from
a blacklist, e.g.:

all_users = {'username': user, ...}
allowed_users = all_users - banned_users


(3) You have a dict, and you want to ensure there's no keys that
you didn't expect:

if (d := actual-expected):
    print('unexpected key:value pairs', d)
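
The three use-cases can be tried today with a small helper. `dict_diff` is a hypothetical stand-in for the proposed `-` operator, and the sample data is invented.

```python
def dict_diff(left, right):
    """Hypothetical stand-in for the proposed left - right."""
    return {k: v for k, v in left.items() if k not in right}


# (1) Update D with only the new keys from E ("first seen wins").
D = {"a": 1, "b": 2}
E = {"b": 99, "c": 3}
D.update(dict_diff(E, D))
print(D)  # {'a': 1, 'b': 2, 'c': 3}

# (2) Remove blacklisted keys.
all_users = {"alice": 1, "bob": 2, "eve": 3}
banned_users = {"eve": "spammer"}
allowed_users = dict_diff(all_users, banned_users)
print(allowed_users)  # {'alice': 1, 'bob': 2}

# (3) Report keys that were not expected.
actual = {"x": 1, "y": 2}
expected = {"x": 0}
unexpected = dict_diff(actual, expected)
if unexpected:
    print("unexpected key:value pairs", unexpected)
```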


> Even if dict1 - dict2 were
> added to the language, I think I'd steer clear of it as being too
> obscure.

Everything is obscure until people learn it and get used to it.


--
Steven

Jimmy Girardet

Mar 5, 2019, 4:43:51 AM3/5/19
to python...@python.org
Indeed the "obscure" argument should be thrown away.

The `|` operator in sets seems to be evident to everyone on this list
but I would be curious to know how many people first got a TypeError
doing set1 + set2 and then found set1 | set2 in the doc.

Except for math geeks, `|` is always something obscure.


>> Even if dict1 - dict2 were
>> added to the language, I think I'd steer clear of it as being too
>> obscure.
> Everything is obscure until people learn it and get used to it.
>
>


Inada Naoki

Mar 5, 2019, 4:58:20 AM3/5/19
to Jimmy Girardet, python-ideas
On Tue, Mar 5, 2019 at 6:42 PM Jimmy Girardet <ij...@netc.fr> wrote:
>
> Indeed the "obscure" argument should be thrown away.
>
> The `|` operator in sets seems to be evident to everyone on this list
> but I would be curious to know how many people first got a TypeError
> doing set1 + set2 and then found set1 | set2 in the doc.
>
> Except for math geeks, `|` is always something obscure.
>

Interesting point.

In Japan, we learn sets in high school, not in university. And I think
it's a good idea for people using the `set` type to learn about sets in math.
So I don't think "union" is only for math geeks.

But we use "A ∪ B" in math. `|` is borrowed from "bitwise OR" in C.
And "bitwise" operators are for "geeks".

Although I'm not in favor of adding `+` to set, it would be worth adding
`+` to set too, for consistency, if it is added to dict.

FWIW, Scala uses `++` to join all containers.
Kotlin uses `+` to join all containers.
(ref https://discuss.python.org/t/pep-584-survey-of-other-languages-operator-overload/977)

Regards,

--
Inada Naoki <songof...@gmail.com>

Steven D'Aprano

Mar 5, 2019, 5:00:19 AM3/5/19
to python...@python.org
On Mon, Mar 04, 2019 at 03:33:36PM -0500, Neil Girdhar wrote:

> Maybe, but reading through the various replies, it seems that if you
> are adding "-" to be analogous to set difference, then the combination
> operator should be analogous to set union "|".

That's the purpose of this discussion, to decide whether dict merging is
more like addition/concatenation or union :-)

> And it also opens an
> opportunity to add set intersection "&".

What should intersection do in the case of matching keys?

I see the merge + operator as a kind of update, whether it makes a copy
or does it in place, so to me it is obvious that "last seen wins" should
apply just as it does for the update method.

But dict *intersection* is a more abstract operation than merge/update.
And that leads to the problem, what do you do with the values?

{key: "spam"} & {key: "eggs"}

# could result in any of:

{key: "spam"}
{key: "eggs"}
{key: ("spam", "eggs")}
{key: "spameggs"}
an exception
something else?

Unlike "update", I don't have any good use-cases to prefer any one of
those over the others.
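
To illustrate how open the choice is, here is a sketch where the value policy is passed in explicitly. The helper `intersect` and its `resolve` parameter are hypothetical and not part of any proposal in this thread.

```python
def intersect(d1, d2, resolve):
    """Hypothetical helper: keep shared keys, combine values with resolve."""
    return {k: resolve(d1[k], d2[k]) for k in d1 if k in d2}


a = {"key": "spam", "only_a": 1}
b = {"key": "eggs", "only_b": 2}

first_wins = intersect(a, b, lambda x, y: x)
last_wins = intersect(a, b, lambda x, y: y)
both = intersect(a, b, lambda x, y: (x, y))
joined = intersect(a, b, lambda x, y: x + y)

print(first_wins)  # {'key': 'spam'}
print(last_wins)   # {'key': 'eggs'}
print(both)        # {'key': ('spam', 'eggs')}
print(joined)      # {'key': 'spameggs'}
```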


> After all, how do you filter a dictionary to a set of keys?
>
> >> d = {'some': 5, 'extra': 10, 'things': 55}
> >> d &= {'some', 'allowed', 'options'}
> >> d
> {'some': 5}

new = d - (d - allowed)

{k:v for (k,v) in d.items() if k in allowed}
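
A runnable version of the filtering example with the data from the quoted message. The double difference is emulated with keys views, since dict has no `-` operator; that emulation is an assumption about how the proposal would behave.

```python
d = {"some": 5, "extra": 10, "things": 55}
allowed = {"some", "allowed", "options"}

# Comprehension: keep only the allowed keys.
filtered = {k: v for k, v in d.items() if k in allowed}
print(filtered)  # {'some': 5}

# Emulating d - (d - allowed): first find the keys to drop, then drop them.
dropped = d.keys() - allowed
double_diff = {k: v for k, v in d.items() if k not in dropped}
print(double_diff)  # {'some': 5}
```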


> >> > * Regarding how to construct the new set in __add__, I now think this should be done like this:
> >> >
> >> > class dict:
> >> >     <other methods>
> >> >     def __add__(self, other):
> >> >         <checks that other makes sense, else return NotImplemented>
> >> >         new = self.copy()  # A subclass may or may not choose to override
> >> >         new.update(other)
> >> >         return new
> >>
> >> I like that, but it would be inefficient to do that for __sub__ since
> >> it would create elements that it might later delete.
> >>
> >> def __sub__(self, other):
> >>     new = self.copy()
> >>     for k in other:
> >>         del new[k]
> >>     return new
> >>
> >> is less efficient than
> >>
> >> def __sub__(self, other):
> >>     return type(self)({k: v for k, v in self.items() if k not in other})

I don't think you should be claiming what is more or less efficient
unless you've actually profiled them for speed and memory use. Often,
but not always, the two are in opposition: we make things faster by
using more memory, and save memory at the cost of speed.

Your version of __sub__ creates a temporary dict, which then has to be
copied in order to preserve the type. It's not obvious to me that that's
faster or more memory efficient than building a dict then deleting keys.

(Remember that dicts aren't lists, and deleting keys is an O(1)
operation.)
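
One way to settle the question is to measure. The sketch below times both strategies with timeit; the function names and sample sizes are arbitrary choices, pop() is used instead of del so that keys missing from d do not raise, and the timings will vary by machine and Python version.

```python
import timeit


def sub_copy_delete(d, other):
    """Copy everything, then delete the unwanted keys."""
    new = d.copy()
    for k in other:
        new.pop(k, None)  # pop avoids KeyError for keys missing from d
    return new


def sub_comprehension(d, other):
    """Build a temporary dict of survivors, then convert to type(d)."""
    return type(d)({k: v for k, v in d.items() if k not in other})


d = {i: i for i in range(10_000)}
other = {i: None for i in range(0, 10_000, 2)}

# Both strategies must agree before the timings mean anything.
assert sub_copy_delete(d, other) == sub_comprehension(d, other)

for func in (sub_copy_delete, sub_comprehension):
    seconds = timeit.timeit(lambda: func(d, other), number=200)
    print(f"{func.__name__}: {seconds:.4f} s for 200 runs")
```

This measures speed only; memory use would need a separate tool such as tracemalloc.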


--
Steven

Steven D'Aprano

Mar 5, 2019, 5:22:41 AM3/5/19
to python...@python.org
On Mon, Mar 04, 2019 at 10:18:13PM -0800, Amber Yust wrote:

> Adding the + operator for dictionaries feels like it would be a mistake in
> that it offers at most sugar-y benefits, but introduces the significant
> drawback of making it easier to introduce unintended errors.

What sort of errors?

I know that some (mis-)features are "bug magnets" that encourage people
to write buggy code, but I don't see how this proposal is worse than
dict.update().

In one way it is better, since D + E returns a new dict, instead of
over-writing the data in D. Ask any functional programmer, and they'll
tell you that we should avoid side-effects.


> This would be
> the first instance of "addition" where the result can potentially
> lose/overwrite data (lists and strings both preserve the full extent of
> each operand; Counters include the full value from each operand, etc).

I don't see why this is relevant to addition. It doesn't even apply to
numeric addition! If I give you the result of an addition:

101

say, you can't tell what the operands were. And that's not even getting
into the intricacies of floating point addition, which can violate
associativity:

``(a + b) + c`` is not necessarily equal to ``a + (b + c)``

and distributivity:

``x*(a + b)`` is not necessarily equal to ``x*a + x*b``


even for well-behaved, numeric floats (not NANs or INFs).
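
Both failures are easy to reproduce with ordinary values. The particular numbers below are just one convenient choice, assuming IEEE 754 double-precision floats as used by CPython.

```python
a, b, c = 0.1, 0.2, 0.3

# Associativity: the grouping changes the rounded result.
print((a + b) + c)   # 0.6000000000000001
print(a + (b + c))   # 0.6
print((a + b) + c == a + (b + c))   # False

# Distributivity: multiplying the sum differs from summing the products.
x = 10.0
print(x * (a + b))    # 3.0000000000000004
print(x * a + x * b)  # 3.0
print(x * (a + b) == x * a + x * b)   # False
```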


> Combining dictionaries is fundamentally an operation that requires more
> than one piece of information, because there's no single well-defined way
> to combine a pair of them.

Indeed. But some ways are more useful than others.


> Off the top of my head, I can think of at least
> 2 different common options (replacement aka .update(), combination of
> values a la Counter). Neither of these is really a more valid "addition" of
> dictionaries.

That's why we have subclasses and operator overloading :-)

By far the most commonly requested behaviour for this is copy-and-
update (or merge, if you prefer). But subclasses are free to define it
as they will, including:

- add values, as Counter already does;
- raise an exception if there is a duplicate key;
- "first seen wins"

or anything else.
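
As a sketch of the last two options, here are two hypothetical subclasses (the names `StrictDict` and `FirstWinsDict` are invented for illustration) that define + differently.

```python
class StrictDict(dict):
    """Hypothetical subclass: + raises if any key appears in both operands."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        duplicates = self.keys() & other.keys()
        if duplicates:
            raise KeyError(f"duplicate keys: {duplicates!r}")
        new = type(self)(self)
        new.update(other)
        return new


class FirstWinsDict(dict):
    """Hypothetical subclass: on a clash, the left operand's value is kept."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = type(self)(other)
        new.update(self)  # left operand overrides the right one
        return new


print(FirstWinsDict(a=1, b=2) + {"b": 99, "c": 3})  # {'b': 2, 'c': 3, 'a': 1}
print(StrictDict(a=1) + {"b": 2})                   # {'a': 1, 'b': 2}

try:
    StrictDict(a=1) + {"a": 2}
except KeyError as exc:
    print("rejected:", exc)
```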



--
Steven

Serhiy Storchaka

Mar 5, 2019, 5:47:58 AM3/5/19
to python...@python.org
04.03.19 15:29, Serhiy Storchaka wrote:
> Using "|" looks more natural to me than using "+". We
> should look at discussions for using the "|" operator for sets, if the
> alternative of using "+" was considered, I think the same arguments for
> preferring "|" for sets are applicable now for dicts.

See the Python-Dev thread with the subject "Re: Re: PEP 218 (sets);
moving set.py to Lib" starting from
https://mail.python.org/pipermail/python-dev/2002-August/028104.html

Pål Grønås Drange

Mar 5, 2019, 6:12:03 AM3/5/19
to python...@python.org

I just wanted to mention this since it hasn't been brought up, but none of these work

a.keys() + b.keys()
a.values() + b.values()
a.items() + b.items()

However, the following do work:

a.keys() | b.keys()
a.items() | b.items()

Perhaps they work by coincidence (being set types), but I think it's worth bringing up, since a naive/natural Python implementation of dict addition/union would possibly involve the |-operator.
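
A quick check of the behaviour described above, with made-up sample dicts. Note that the union of items views keeps both pairs for a clashing key, so it does not by itself decide which value wins.

```python
a = {"x": 1, "y": 2}
b = {"y": 3, "z": 4}

# Keys and items views are set-like, so | works and returns a set.
keys_union = a.keys() | b.keys()
items_union = a.items() | b.items()
print(sorted(keys_union))    # ['x', 'y', 'z']
print(sorted(items_union))   # [('x', 1), ('y', 2), ('y', 3), ('z', 4)]

# + is not defined on any of the views.
for view_a, view_b in ((a.keys(), b.keys()),
                       (a.values(), b.values()),
                       (a.items(), b.items())):
    try:
        view_a + view_b
    except TypeError as exc:
        print("TypeError:", exc)
```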

Pål

Rhodri James

Mar 5, 2019, 6:50:24 AM3/5/19
to python...@python.org
On 05/03/2019 09:42, Jimmy Girardet wrote:
> Indeed the "obscure" argument should be thrown away.
>
> The `|` operator in sets seems to be evident to everyone on this list
> but I would be curious to know how many people first got a TypeError
> doing set1 + set2 and then found set1 | set2 in the doc.

Every. Single. Time.

I don't use sets a lot (purely by happenstance rather than choice), and
every time I do I have to go and look in the documentation because I
expect the union operator to be '+'.

> Except for math geeks, `|` is always something obscure.

Two thirds of my degree is in maths, and '|' is still something I don't
associate with sets. It would be unreasonable to expect '∩' and '∪' as
the operators, but reasoning from '-' for set difference I always expect
'+' and '*' as the union and intersection operators. Alas my hopes are
always cruelly crushed :-)

--
Rhodri James *-* Kynesim Ltd

Greg Ewing

Mar 5, 2019, 5:15:09 PM3/5/19
to python...@python.org
Rhodri James wrote:
> I have to go and look in the documentation because I
> expect the union operator to be '+'.

Anyone raised on Pascal is likely to find + and * more
natural. Pascal doesn't have bitwise operators, so it
re-uses + and * for set operations. I like the economy
of this arrangement -- it's not as if there's any
other obvious meaning that + and * could have for sets.

--
Greg

Raymond Hettinger

Mar 5, 2019, 9:06:59 PM3/5/19
to Greg Ewing, python...@python.org

> On Mar 5, 2019, at 2:13 PM, Greg Ewing <greg....@canterbury.ac.nz> wrote:
>
> Rhodri James wrote:
>> I have to go and look in the documentation because I expect the union operator to be '+'.
>
> Anyone raised on Pascal is likely to find + and * more
> natural. Pascal doesn't have bitwise operators, so it
> re-uses + and * for set operations. I like the economy
> of this arrangement -- it's not as if there's any
> other obvious meaning that + and * could have for sets.

The language SETL (the language of sets) also uses + and * for set operations.¹

For us though, the decision to use | and & is set in stone. The time for debating the decision was 19 years ago.²


Raymond


¹ https://www.linuxjournal.com/article/6805
² https://www.python.org/dev/peps/pep-0218/

Guido van Rossum

Mar 5, 2019, 9:27:01 PM3/5/19
to Raymond Hettinger, Python-Ideas
On Tue, Mar 5, 2019 at 6:07 PM Raymond Hettinger <raymond....@gmail.com> wrote:

> On Mar 5, 2019, at 2:13 PM, Greg Ewing <greg....@canterbury.ac.nz> wrote:
>
> Rhodri James wrote:
>> I have to go and look in the documentation because I expect the union operator to be '+'.
>
> Anyone raised on Pascal is likely to find + and * more
> natural. Pascal doesn't have bitwise operators, so it
> re-uses + and * for set operations. I like the economy
> of this arrangement -- it's not as if there's any
> other obvious meaning that + and * could have for sets.

The language SETL (the language of sets) also uses + and * for set operations.¹
 
So the secret is out: Python inherits a lot from SETL, through ABC -- ABC was heavily influenced by SETL.
 
¹ https://www.linuxjournal.com/article/6805
² https://www.python.org/dev/peps/pep-0218/

Davide Rizzo

Mar 8, 2019, 8:07:52 AM3/8/19
to Michael Selik, Python-Ideas
> Counter also uses +/__add__ for a similar behavior.
>
> >>> c = Counter(a=3, b=1)
> >>> d = Counter(a=1, b=2)
> >>> c + d # add two counters together: c[x] + d[x]
> Counter({'a': 4, 'b': 3})
>
> At first I worried that changing base dict would cause confusion for the subclass, but Counter seems to share the idea that update and + are synonyms.

Counter is a moot analogy. Counter's + and - operators follow the
rules of numeric addition and subtraction:

>>> c = Counter({"a": 1})
>>> c + Counter({"a": 5})
Counter({'a': 6})
>>> c + Counter({"a": 5}) - Counter({"a": 4})
Counter({'a': 2})

Which also means that in most cases (c1 + c2) - c2 == c1 which is not
something you would expect with the suggested "dictionary addition"
operation. As a side note, this is not true in general for Counters
because of how subtraction handles 0. E.g.

>>> c0 = Counter({"a": 0})
>>> c1 = Counter({"a": 1})
>>> (c0 + c1) - c1
Counter()
>>> (c0 + c1) - c1 == c0
False

---

The current intuition of how + and - work doesn't apply literally to
this suggestion:

1) numeric types are their own story
2) most built-in sequences imply concatenation for + and have no subtraction
3) numpy-like arrays behave closer to numbers
4) Counters mimic numbers in some ways, and while their addition is
reminiscent of concatenation (but order is not relevant), they also have subtraction
5) sets have difference, which is probably the closest to what you would
expect from dict subtraction, but no + operator

---

I understand the arguments against a | operator for dicts but I don't
entirely agree with them. dict is obviously a different type of object
than all the others I've mentioned, even mathematically, and there is
no clear precedent. If sets happened to maintain insertion order, like
dicts after 3.6/3.7, I would expect the union operator to also
preserve the order. Before 3.6 we probably would have seen dicts as
closer to sets from that point of view, and this suggested addition as
closer to set union.

The question of symmetry ({"a": 1} + {"a": 2}) is an important one and
I would consider not enforcing one resolution in PEP 584, and instead
leave this undefined (i.e. in the resulting dict, the value could be
either 1 or 2, or just completely undefined to also be compatible with
Counter-like semantics in the same PEP). This is something to consider
carefully if the plan is to make the new operators part of Mapping.
It's not obvious that all mappings should implement this the same way,
and a survey of what is being done by other implementations of Mappings
would be useful. On the other hand leaving it undefined might make it
harder to standardize it later, once other implementations have
defined their own behavior.

This question is probably on its own a valid argument against the
proposal. When it comes to dicts (and not Mappings in general) {**d1,
**d2} or d.update() already have clearly-defined semantics. The new
proposal for a merge() operation might be more useful. The added value
would be the ability to add two mappings regardless of concrete type.
But it's with Mappings in general that this proposal is the most
problematic.

On the other hand the subtraction operator is probably less
controversial and immediately useful (the idiom to remove keys from a
dictionary is not obvious).
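
For comparison, these are two common ways to remove a set of keys from a dictionary today. The sample data is invented, and neither spelling is claimed to be the canonical idiom.

```python
d = {"a": 1, "b": 2, "c": 3}
to_remove = {"b", "c", "missing"}

# Non-mutating: build a new dict without the unwanted keys.
remaining = {k: v for k, v in d.items() if k not in to_remove}
print(remaining)  # {'a': 1}

# In place: pop with a default so absent keys do not raise KeyError.
for key in to_remove:
    d.pop(key, None)
print(d)  # {'a': 1}
```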