[Python-ideas] PEP: Dict addition and subtraction

Steven D'Aprano

Mar 1, 2019, 11:27:54 AM
to python...@python.org
Attached is a draft PEP on adding + and - operators to dict for
discussion.

This should probably go here:

https://github.com/python/peps

but due to technical difficulties at my end, I'm very limited in what I
can do on Github (at least for now). If there's anyone who would like to
co-author and/or help with the process, that will be appreciated.


--
Steven
dict_addition_pep.txt

INADA Naoki

Mar 1, 2019, 11:49:07 AM
to Steven D'Aprano, python-ideas
> If the keys are not strings, it currently works in CPython, but it may not work with other implementations, or future versions of CPython[2].

I don't think so. https://bugs.python.org/issue35105 and
https://mail.python.org/pipermail/python-dev/2018-October/155435.html
are about kwargs. I think non-string keys are allowed for
{**d1, **d2} by the language.
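
A quick check of that distinction, as a minimal sketch: the keyword-argument form dict(d1, **d2) is the one that needs string keys, while the display form {**d1, **d2} takes any hashable key. The variable names are illustrative only, and the TypeError is what CPython 3 raises.

```python
d1 = {1: "one"}
d2 = {2: "two"}

# Dict display unpacking accepts keys of any hashable type.
merged = {**d1, **d2}

# Passing d2 as keyword arguments requires string keys.
try:
    dict(d1, **d2)
    kwargs_error = None
except TypeError as exc:
    kwargs_error = exc
```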

--
INADA Naoki <songof...@gmail.com>
_______________________________________________
Python-ideas mailing list
Python...@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Brandt Bucher

Mar 1, 2019, 11:50:04 AM
to Steven D'Aprano, python...@python.org
I’ve never been part of this process before, but I’m interested in learning and helping any way I can.

My addition implementation is attached to the bpo, and I’m working today on bringing it in line with the PEP in its current form (specifically, subtraction operations).

<dict_addition_pep.txt>

Eric V. Smith

Mar 1, 2019, 11:53:24 AM
to Steven D'Aprano, python...@python.org
Hi, Steven.

I can help you with it. I added it as PEP 584. I had to add the PEP
headers, but didn't do any other editing.

I'm going to be out of town for the next 2 weeks, so I might be slow in
responding.

Eric

Neil Girdhar

Mar 1, 2019, 12:05:55 PM
to python-ideas
Looks like a good start.

I think you should replace all of the lines:
 if isinstance(other, dict):
with
 if isinstance(self, type(other)):


If other is an instance of a dict subclass, then it should be the one to process the addition. On the other hand, if self is an instance of the derived type, then we are free to do the combination.


I think you should also change this wording:


"the result type will be the type of the left operand"


since the result type will be negotiated between the operands (even in your implementation).


__sub__ can be implemented more simply as a dict comprehension.


Don't forget to return self in __isub__ and __iadd__ or they won't work.


I think __isub__ would be simpler like this:

    def __isub__(self, it):
        if it is self:
            self.clear()
        else:
            for value in it:
                del self[value]
        return self

I don't see why you would bother looking for keys (iter will do that anyway).
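
As a rough sketch of the two suggestions above (a comprehension-based __sub__, and an __isub__ that returns self), here is a hypothetical subclass. The class name SubtractableDict is made up for illustration, and two choices are assumptions rather than anything the draft PEP specifies: keys missing from the left operand are silently ignored, and __sub__ returns the left operand's type.

```python
class SubtractableDict(dict):
    """Hypothetical subclass used only to illustrate the suggestions."""

    def __sub__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        # Keep only the items whose key does not appear in other.
        return type(self)({k: v for k, v in self.items() if k not in other})

    def __isub__(self, other):
        if other is self:
            self.clear()
        else:
            for key in other:
                # Assumption: keys missing from self are silently ignored.
                self.pop(key, None)
        return self  # without this, "d -= e" would rebind d to None


d = SubtractableDict(a=1, b=2, c=3)
e = d - {"b": 0}
d -= ["a", "zzz"]
```

Iterating over the right operand directly is enough here: a dict yields its keys, and any other iterable yields the keys to remove.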

Guido van Rossum

Mar 1, 2019, 2:09:42 PM
to Eric V. Smith, Python-Ideas
Thanks -- FYI I renamed the file to .rst (per convention for PEPs in ReST format) and folded long text lines.
--
--Guido van Rossum (python.org/~guido)

Brett Cannon

Mar 1, 2019, 2:43:02 PM
to Brandt Bucher, python-ideas
On Fri, Mar 1, 2019 at 8:50 AM Brandt Bucher <brandt...@gmail.com> wrote:
> I’ve never been part of this process before, but I’m interested in learning and helping any way I can.

Thanks!

> My addition implementation is attached to the bpo, and I’m working today on bringing it in line with the PEP in its current form (specifically, subtraction operations).


When your proposed patch is complete, Brandt, just ask Steven to update the PEP to mention that there's a proposed implementation attached to the issue tracking the idea.

-Brett

Brandt Bucher

Mar 1, 2019, 7:12:03 PM
to Steven D'Aprano, python-ideas
While working through my implementation, I've come across a couple of inconsistencies with the current proposal:

> The merge operator will have the same relationship to the dict.update method as the list concatenation operator has to list.extend, with dict difference being defined analogously.

I like this premise. += for lists *behaves* like extend, and += for dicts *behaves* like update.

However, later in the PEP it says:

> Augmented assignment will just call the update method. This is analogous to the way list += calls the extend method, which accepts any iterable, not just lists.

In your Python implementation samples from the PEP, dict subclasses will behave differently from how list subclasses do. List subclasses, without overrides, return *list* objects for bare "+" operations (and "+=" won't call an overridden "extend" method). So a more analogous pseudo-implementation (if that's what we seek) would look like:

    def __add__(self, other):
        if isinstance(other, dict):
            new = dict.copy(self)
            dict.update(new, other)
            return new
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, dict):
            new = dict.copy(other)
            dict.update(new, self)
            return new
        return NotImplemented

    def __iadd__(self, other):
        if isinstance(other, dict):
            dict.update(self, other)
            return self
        return NotImplemented

This is what my C looks like right now. We can choose to update these semantics to be "nicer" to subclasses, but I don't see any precedent for it (lists, sets, strings, etc.).
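
The list precedent mentioned above can be checked with a small sketch. MyList and its extend_calls counter are hypothetical names used only for this illustration, and the behaviour shown is what CPython does; other implementations may differ.

```python
class MyList(list):
    """Hypothetical list subclass that counts calls to its extend()."""

    def __init__(self, *args):
        super().__init__(*args)
        self.extend_calls = 0

    def extend(self, iterable):
        self.extend_calls += 1
        super().extend(iterable)


a = MyList([1])
b = MyList([2])

plus_result = a + b   # list.__add__ builds a plain list, not a MyList
a += [3]              # list.__iadd__ does not look up the overridden extend
a.extend([4])         # only the explicit call goes through the override
```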

Brandt

Neil Girdhar

Mar 1, 2019, 8:59:39 PM
to python-ideas, Steven D'Aprano, python-ideas
I think that sequence should be fixed. 

--

---
You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/jq5QVTt3CAI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to python-ideas...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Steven D'Aprano

Mar 1, 2019, 10:58:42 PM
to python...@python.org
Executive summary:

- I'm going to argue for subclass-preserving behaviour;

- I'm not wedded to the idea that dict += should actually call the
update method, so long as it has the same behaviour;

- __iadd__ has no need to return NotImplemented or type-check its
argument.

Details below.


On Fri, Mar 01, 2019 at 04:10:44PM -0800, Brandt Bucher wrote:

[...]
> In your Python implementation samples from the PEP, dict subclasses will
> behave differently from how list subclasses do. List subclasses, without
> overrides, return *list* objects for bare "+" operations

Right -- and I think they are wrong to do so, for reasons I explained
here:

https://mail.python.org/pipermail/python-ideas/2019-March/055547.html

I think the standard handling of subclasses in Python builtins is wrong,
and I don't wish to emulate that wrong behaviour without a really good
reason. Or at least a better reason than "other methods break
subclassing unless explicitly overloaded, so this should do so too".

Or at least not without a fight :-)



> (and "+=" won't call an overridden "extend" method).

I'm slightly less opinionated about that. Looking more closely into the
docs, I see that they don't actually say that += calls list.extend:

    s.extend(t)     extends s with the contents of t (for
    or s += t       the most part the same as s[len(s):len(s)] = t)

https://docs.python.org/3/library/stdtypes.html#mutable-sequence-types

only that they have the same effect. So the wording re lists calling
extend certainly needs to be changed. But that doesn't mean that we must
change the implementation. We have a choice:

- regardless of what lists do, we define += for dicts as literally
calling dict.update; the more I think about it, the less I like this.

- Or we say that += behaves similarly to update, without actually
calling the method. I think I prefer this.

(The second implies that += either contains a duplicate of the
update logic, or that += and update both delegate to a private, C-level
function that does most of the work.)

I think that the second approach (define += as having the equivalent
semantics of update but without actually calling the update method) is
probably better. That decouples the two methods, allows subclasses to
change one without necessarily changing the other.


> So a more analogous
> pseudo-implementation (if that's what we seek) would look like:
>
>     def __add__(self, other):
>         if isinstance(other, dict):
>             new = dict.copy(self)
>             dict.update(new, other)
>             return new
>         return NotImplemented

We should not require the copy method.

The PEP should be more explicit that the approximate implementation does
not imply the copy() and update() methods are actually called.


>     def __iadd__(self, other):
>         if isinstance(other, dict):
>             dict.update(self, other)
>             return self
>         return NotImplemented

I don't agree with that implementation.

According to PEP 203, which introduced augmented assignment, the
sequence of calls in ``d += e`` is:

1. Try to call ``d.__iadd__(e)``.

2. If __iadd__ is not present, try ``d.__add__(e)``.

3. If __add__ is missing too, try ``e.__radd__(d)``.

but my tests suggest this is inaccurate. I think the correct behaviour
is this:

1. Try to call ``d.__iadd__(e)``.

2. If __iadd__ is not present, or if it returns NotImplemented,
try ``d.__add__(e)``.

3. If __add__ is missing too, or if it returns NotImplemented,
fail with TypeError.

In other words, e.__radd__ is not used.
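
The fallback in step 2 (an __iadd__ that returns NotImplemented hands over to __add__) can be observed with a minimal sketch. The class Acc and its calls log are hypothetical and exist only for this illustration; the sketch exercises that one step and does not test __radd__.

```python
class Acc:
    """Hypothetical class: __iadd__ declines, __add__ does the work."""

    def __init__(self, items):
        self.items = list(items)
        self.calls = []

    def __iadd__(self, other):
        self.calls.append("__iadd__")
        return NotImplemented  # decline, so the interpreter tries __add__

    def __add__(self, other):
        self.calls.append("__add__")
        new = Acc(self.items + list(other))
        new.calls = self.calls  # share the log so the order stays visible
        return new


d = Acc([1])
original = d
d += [2]  # __iadd__ is tried first, then __add__; d is rebound to a new object
```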

We don't want dict.__iadd__ to try calling __add__, since the latter is
more restrictive and less efficient than the in-place merge. So there is
no need for __iadd__ to return NotImplemented. It should either succeed
on its own, or fail hard:

    def __iadd__(self, other):
        self.update(other)
        return self

Except that the actual C implementation won't call the update method
itself, but will follow the same semantics.

See the docstring for dict.update for details of what is accepted by
update.


--
Steven

Brandt Bucher

Mar 2, 2019, 6:39:34 PM
to Steven D'Aprano, Python-Ideas
Hi Steven.

Thanks for the clarifications. I've pushed a complete working patch (with tests) to GitHub. It's linked to the bpo issue.


Right now, it's pretty much a straight reimplementation of your Python examples. I plan to update it periodically to keep it in sync with any changes, and to make a few optimizations (for example, when operands are identical or empty).

Let me know if you have any questions/suggestions. Stoked to learn and help out with this process! :)

Brandt

James Lu

Mar 3, 2019, 9:29:35 PM
to python...@python.org
I propose that the + sign merge two python dictionaries such that if there are conflicting keys, a KeyError is thrown.

This way, d1 + d2 isn’t just another obvious way to do {**d1, **d2}. The second syntax makes it clear that a new dictionary is being constructed and that d2 overrides keys from d1.

One can reasonably expect or imagine a situation where a section of code that expects to merge two dictionaries with non-conflicting keys commits a semantic error if it merges two dictionaries with conflicting keys.

To better explain, imagine a program where options is a global variable storing parsed values from the command line.

    def verbose_options():
        if options.verbose:
            return {'verbose': True}
        return {}

    def quiet_options():
        if options.quiet:
            return {'verbose': False}
        return {}

If we were to define an options() function, return {**quiet_options(), **verbose_options()} implies that verbose overrules quiet; whereas return quiet_options() + verbose_options() implies that verbose and quiet cannot be used simultaneously. I am not aware of another easy way in Python to merge dictionaries while checking for non-conflicting keys.

Compare:

    import sys

    def settings():
        return {**quiet_options(), **verbose_options()}

    def settings():
        try:
            return quiet_options() + verbose_options()
        except KeyError:
            print('conflicting options used', file=sys.stderr)
            sys.exit(1)

***

This is a simple scenario, but you can imagine more complex ones as well. Does --quiet-stage-1 loosen --verbose? Does --quiet-stage-1 conflict with --verbose-stage-1? Does --verbosity=5 override --verbosity=4 or cause an error?

Having {**, **} and + do different things provides a convenient and Pythonic way to model such relationships in code. Indeed, you can even combine the two syntaxes in the same expression to show a mix of overriding and exclusionary behavior.

Anyways, I think it’s a good idea to have this semantic difference in behavior so Python developers have a good way to communicate what is expected of the two dictionaries being merged inside the language. This is like an assertion without

Again, I propose that the + sign merge two python dictionaries such that if there are conflicting keys, a KeyError is thrown, because such “non-conflicting merge” behavior would be useful in Python. It gives clarifying power to the + sign. The + and the {**, **} should serve different roles.

In other words, explicit + is better than implicit {**, **}, unless explicitly suppressed. Here + is explicit whereas {**, **} is implicitly allowing inclusive keys, and the KeyError is expressed suppressed by virtue of not using the {**, **} syntax. People expect the + operator to be commutative, while the {**, **} syntax prompts further examination by virtue of its “weird” syntax.

INADA Naoki

Mar 3, 2019, 9:54:58 PM
to Steven D'Aprano, python-ideas
I think the "Current Alternatives" section must mention the long-standing idiom,
in addition to {**d1, **d2}:

d3 = d1.copy()
d3.update(d2)

It is obvious and easily discoverable, although it takes two lines.
"There is no obvious way" and "there is at least one obvious way" are
very different.
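
For comparison, a minimal sketch of the two spellings side by side. The sample dictionaries are made up for illustration; both forms leave d1 untouched and let values from d2 win.

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 20, "c": 3}

# The two-line idiom: copy, then update the copy in place.
d3 = d1.copy()
d3.update(d2)

# The single-expression spelling.
d4 = {**d1, **d2}
```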



--
INADA Naoki <songof...@gmail.com>

Stefan Behnel

Mar 4, 2019, 3:42:30 AM
to python...@python.org
James Lu wrote on 04.03.19 at 03:28:
> I propose that the + sign merge two python dictionaries such that if there are conflicting keys, a KeyError is thrown.

Please, no. That would be really annoying.

If you need that feature, it can become a new method on dicts.

Stefan

Jimmy Girardet

Mar 4, 2019, 4:23:15 AM
to python...@python.org

Hi,

I'm fairly new to this list, but every time there is a proposal, the answer is "what are you trying to solve?".

Since z = {**x, **y} and z.update(y) exist, I can't find the answer.

Stefan Behnel

Mar 4, 2019, 4:53:06 AM
to python...@python.org
Jimmy Girardet wrote on 04.03.19 at 10:12:
> I'm not old on this list but every time there is a proposal, the answer
> is "what are you trying to solve ?".
>
> Since
>
> |z ={**x,**y} and z.update(y) Exists, I can"t find the answer.

I think the main intention is to close a gap in the language.

[1,2,3] + [4,5,6]

works for lists and tuples,

{1,2,3} | {4,5,6}

works for sets, but joining two dicts isn't simply

{1:2, 3:4} + {5:6}

but requires either some obscure syntax or a statement instead of a simple
expression.

The proposal is to enable the obvious syntax for something that should be
obvious.
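
The gap can be seen in a short sketch, run on an interpreter where dict defines no + operator (as was the case when this thread was written). The result names are illustrative only.

```python
lists = [1, 2, 3] + [4, 5, 6]
tuples = (1, 2, 3) + (4, 5, 6)
sets = {1, 2, 3} | {4, 5, 6}

# dict defines no + operator, so the analogous expression fails.
try:
    {1: 2, 3: 4} + {5: 6}
    dict_plus_error = None
except TypeError as exc:
    dict_plus_error = exc

# The existing expression form, for comparison.
merged = {**{1: 2, 3: 4}, **{5: 6}}
```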

Stefan

INADA Naoki

Mar 4, 2019, 5:16:45 AM
to Stefan Behnel, python-ideas
On Mon, Mar 4, 2019 at 6:52 PM Stefan Behnel <stef...@behnel.de> wrote:
>
> I think the main intentions is to close a gap in the language.
>
> [1,2,3] + [4,5,6]
>
> works for lists and tuples,
>
> {1,2,3} | {4,5,6}
>
> works for sets, but joining two dicts isn't simply
>
> {1:2, 3:4} + {5:6}
>

Operators are syntax borrowed from math.

* Operators are used for concatenation and repetition (Kleene star) in
regular languages.
https://en.wikipedia.org/wiki/Regular_language
seq + seq and seq * N are very similar to these, although Python uses +
instead of the middle dot (not in ASCII) for concatenation.

* set relates directly to sets in math. | is a well-known operator for union.

* In the case of merging dicts, I don't know of any obvious background in
math or computer science.

So I feel it's very natural that dict doesn't have an operator for merging.
Isn't "for consistency with other types" a wrong consistency?

> but requires either some obscure syntax or a statement instead of a simple
> expression.
>
> The proposal is to enable the obvious syntax for something that should be
> obvious.

dict.update is obvious already. Why statement is not enough?

Regards,

Jimmy Girardet

Mar 4, 2019, 5:28:16 AM
to python...@python.org

> but requires either some obscure syntax or a statement instead of a simple
> expression.
>
> The proposal is to enable the obvious syntax for something that should be
> obvious.
>
> Stefan

The discussions on this list show that the behavior of `+` operator with
dict will never be obvious (first wins or second wins or add results or
raise Exception). So the user will always have to look at the doc or
test it to know the intended behavior.

That said, [1,2] + [3] equals [1,2,3] but not [1,2,[3]] and that was
not obvious to me, and I survived.

James Lu

Mar 4, 2019, 10:02:29 AM
to python...@python.org


> On Mar 4, 2019, at 3:41 AM, Stefan Behnel <stef...@behnel.de> wrote:
>
> James Lu wrote on 04.03.19 at 03:28:
>> I propose that the + sign merge two python dictionaries such that if there are conflicting keys, a KeyError is thrown.
>
> Please, no. That would be really annoying.
>
> If you need that feature, it can become a new method on dicts.
>
> Stefan
If you want to merge it without a KeyError, learn and use the more explicit {**d1, **d2} syntax.

Stefan Behnel

Mar 4, 2019, 10:03:47 AM
to python...@python.org
INADA Naoki wrote on 04.03.19 at 11:15:
> Why statement is not enough?

I'm not sure I understand why you're asking this, but a statement is "not
enough" because it's a statement and not an expression. It does not replace
the convenience of an expression.

Stefan

James Lu

Mar 4, 2019, 10:10:37 AM
to python...@python.org
On Mar 4, 2019, at 4:51 AM, Stefan Behnel <stef...@behnel.de> wrote:
>
> Jimmy Girardet wrote on 04.03.19 at 10:12:
>> I'm not old on this list but every time there is a proposal, the answer
>> is "what are you trying to solve ?".
>>
>> Since
>>
>> z = {**x, **y} and z.update(y) exist, I can't find the answer.
>
> I think the main intention is to close a gap in the language.
>
>    [1,2,3] + [4,5,6]
>
> works for lists and tuples,
>
>    {1,2,3} | {4,5,6}
>
> works for sets, but joining two dicts isn't simply
>
>    {1:2, 3:4} + {5:6}
>
> but requires either some obscure syntax or a statement instead of a simple
> expression.
>
> The proposal is to enable the obvious syntax for something that should be
> obvious.

Rebutting my “throw KeyError on conflicting keys for +” proposal:
Indeed but + is never destructive in those contexts: duplicate list items are okay because they’re ordered, duplicated set items are okay because they mean the same thing (when two sets contain the same item and you merge the two the “containing” means the same thing), but duplicate dict keys mean different things. 

How many situations would you need to make a copy of a dictionary and then update that copy and override old keys from a new dictionary?

It’s better to have two different syntaxes for different situations. 

The KeyError of my proposal is a feature, a sign that something is wrong, a sign an invariant is being violated. Yes, {**, **} syntax looks abnormal and ugly. That’s part of the point– how many times have you needed to create a copy of a dictionary and update that dictionary with overriding keys from a new dictionary? It’s much more common to have non-conflicting keys. The ugliness of the syntax makes one pause and think and ask: “Why is it important that the keys from this dictionary override the ones from another dictionary?”

PROPOSAL EDIT: I think KeyError should only be thrown if the same keys from two dictionaries have values that are not __eq__.
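
To make the edited proposal concrete, here is a minimal sketch of the behaviour as a plain function rather than an operator. The name strict_merge is hypothetical; the rule it implements (raise KeyError only when a shared key maps to values that are not equal) is taken from the paragraph above.

```python
def strict_merge(d1, d2):
    """Hypothetical helper: merge, raising KeyError on conflicting values."""
    merged = dict(d1)
    for key, value in d2.items():
        # A shared key is only a conflict if the two values differ.
        if key in merged and merged[key] != value:
            raise KeyError(key)
        merged[key] = value
    return merged


ok = strict_merge({"a": 1, "b": 2}, {"b": 2, "c": 3})

try:
    strict_merge({"verbose": True}, {"verbose": False})
    conflict = None
except KeyError as exc:
    conflict = exc
```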

James Lu

Mar 4, 2019, 10:13:09 AM
to python...@python.org

> On Mar 4, 2019, at 10:02 AM, Stefan Behnel <stef...@behnel.de> wrote:
>
> INADA Naoki wrote on 04.03.19 at 11:15:
>> Why statement is not enough?
>
> I'm not sure I understand why you're asking this, but a statement is "not
> enough" because it's a statement and not an expression. It does not replace
> the convenience of an expression.
>
> Stefan
There is already an expression for key-overriding merge. Why do we need a new one?

Steven D'Aprano

Mar 4, 2019, 10:27:10 AM
to python...@python.org
On Mon, Mar 04, 2019 at 10:01:23AM -0500, James Lu wrote:

> If you want to merge it without a KeyError, learn and use the more explicit {**d1, **d2} syntax.

In your previous email, you said the {**d ...} syntax was implicit:

    In other words, explicit + is better than implicit {**, **}, unless
    explicitly suppressed. Here + is explicit whereas {**, **} is
    implicitly allowing inclusive keys, and the KeyError is expressed
    suppressed by virtue of not using the {**, **} syntax.

It is difficult to take your "explicit/implicit" argument seriously when
you cannot even decide which is which.



--
Steven

Steven D'Aprano

Mar 4, 2019, 11:26:57 AM
to python...@python.org
On Mon, Mar 04, 2019 at 10:09:32AM -0500, James Lu wrote:

> How many situations would you need to make a copy of a dictionary and
> then update that copy and override old keys from a new dictionary?

Very frequently.

That's why we have a dict.update method, which if I remember correctly,
was introduced in Python 1.5 because people were frequently re-inventing
the same wheel:

    def update(d1, d2):
        for key in d2.keys():
            d1[key] = d2[key]


You should have a look at how many times it is used in the standard
library:

[steve@ando cpython]$ cd Lib/
[steve@ando Lib]$ grep -U "\.update[(]" *.py */*.py | wc -l
373

Now some of those are false positives (docstrings, comments, non-dicts,
etc) but that still leaves a lot of examples of wanting to override old
keys. This is a very common need. Wanting an exception if the key
already exists is, as far as I can tell, very rare.

It is true that many of the examples in the std lib involve updating an
existing dict, not creating a new one. But that's only to be expected:
since Python didn't provide an obvious functional version of update,
only an in-place version, naturally people get used to writing
in-place code.

(Think about how long we made do without sorted(). I don't know about
other people, but I now find sorted indispensable, and probably use it
ten or twenty times more often than the in-place version.)


[...]


> The KeyError of my proposal is a feature, a sign that something is
> wrong, a sign an invariant is being violated.

Why is "keys are unique" an invariant?

The PEP gives a good example of when this "invariant" would be
unnecessarily restrictive:

For example, updating default configuration values with
user-supplied values would most often fail under the
requirement that keys are unique::

prefs = site_defaults + user_defaults + document_prefs
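
With the syntax that already exists, the same layering can be sketched as below. The three dictionaries and their contents are hypothetical sample data; later sources override earlier ones.

```python
site_defaults = {"font": "serif", "size": 10, "colour": "black"}
user_defaults = {"size": 12}
document_prefs = {"colour": "blue"}

# Later dictionaries win, so document settings override user settings,
# which in turn override the site-wide defaults.
prefs = {**site_defaults, **user_defaults, **document_prefs}
```

Under a "keys must be unique" rule this merge would raise as soon as the user overrides any site default.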


Another example would be when reading command line options, where the
most common convention is for "last option seen" to win:

[steve@ando Lib]$ grep --color=always --color=never "zero" f*.py
fileinput.py: numbers are zero; nextfile() has no effect.
fractions.py: # the same way for any finite a, so treat a as zero.
functools.py: # prevent their ref counts from going to zero during

and the output is printed without colour.

(I've slightly edited the above output so it will fit in the email
without wrapping.)

The very name "update" should tell us that the most useful behaviour is
the one the devs decided on back in 1.5: have the last seen value win.
How can you update values if the operation raises an error if the key
already exists? If this behaviour is ever useful, I would expect that it
will be very rare.

An update or merge is effectively just running through a loop setting
the value of a key. See the pre-Python 1.5 function above. Having update
raise an exception if the key already exists would be about as useful as
having ``d[key] = value`` raise an exception if the key already exists.

Unless someone can demonstrate that the design of dict.update() was a
mistake, and the "require unique keys" behaviour is more common, then
I maintain that for the very rare cases you want an exception, you can
subclass dict and overload the __add__ method:

    # Intentionally simplified version.
    def __add__(self, other):
        if self.keys() & other.keys():
            raise KeyError
        return super().__add__(other)


> The ugliness of the syntax makes one pause
> and think and ask: “Why is it important that the keys from this
> dictionary override the ones from another dictionary?”

Because that is the most common and useful behaviour. That's what it
means to *update* a dict or database, and this proposal is for an update
operator.

The ugliness of the existing syntax is not a feature, it is a barrier.


--
Steven

Rhodri James

Mar 4, 2019, 11:46:30 AM
to python...@python.org
On 04/03/2019 15:12, James Lu wrote:
>
>> On Mar 4, 2019, at 10:02 AM, Stefan Behnel <stef...@behnel.de> wrote:
>>
>> INADA Naoki wrote on 04.03.19 at 11:15:
>>> Why statement is not enough?
>>
>> I'm not sure I understand why you're asking this, but a statement is "not
>> enough" because it's a statement and not an expression. It does not replace
>> the convenience of an expression.
>>
>> Stefan
> There is already an expression for key-overriding merge. Why do we need a new one?

Because the existing one is inobvious, hard to discover and ugly.

--
Rhodri James *-* Kynesim Ltd

Del Gan

Mar 4, 2019, 6:31:08 PM
to Rhodri James, python...@python.org
Hi.

> the augmented assignment version allows anything the ``update`` method allows, such as iterables of key/value pairs

I am a little surprised by this choice.

First, this means that "a += b" would not be equivalent to "a = a +
b". Are there other built-in types which act differently if called with
the operator or augmented assignment version?

Secondly, that would imply I would no longer be able to infer the type
of "a" while reading "a += [('foo', 'bar')]". Is it a list? A dict?

Those two points make me uncomfortable with "+=" strictly behaving
like ".update()".

2019-03-04 17:44 UTC+01:00, Rhodri James <rho...@kynesim.co.uk>:

Guido van Rossum

Mar 4, 2019, 6:58:59 PM
to Del Gan, Python-Ideas
On Mon, Mar 4, 2019 at 3:31 PM Del Gan <delg...@gmail.com> wrote:
> the augmented assignment version allows anything the ``update`` method allows, such as iterables of key/value pairs

> I am a little surprised by this choice.
>
> First, this means that "a += b" would not be equivalent to "a = a +
> b". Are there other built-in types which act differently if called with
> the operator or augmented assignment version?

Yes. The same happens for lists. [1] + 'a' is a TypeError, but a += 'a' works:

>>> a = [1]
>>> a + 'a'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate list (not "str") to list
>>> a += 'a'
>>> a
[1, 'a']
>>>
 
> Secondly, that would imply I would no longer be able to infer the type
> of "a" while reading "a += [('foo', 'bar')]". Is it a list? A dict?

Real code more likely looks like "a += b" and there you already don't have much of a clue -- the author of the code should probably communicate this using naming conventions or type annotations.
 
> Those two points make me uncomfortable with "+=" strictly behaving
> like ".update()".

And yet that's how it works for lists. (Note that dict.update() still has capabilities beyond +=, since you can also invoke it with keyword args.)
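
A small sketch of the existing behaviour referred to here: list += takes any iterable, and dict.update() takes a mapping, an iterable of key/value pairs, or keyword arguments. The variable names and values are illustrative only.

```python
a = [1]
a += "ab"                    # += accepts any iterable, like list.extend
# a is now [1, 'a', 'b']

d = {}
d.update([("foo", "bar")])   # an iterable of key/value pairs
d.update({"x": 1})           # a mapping
d.update(y=2)                # keyword arguments, which an operator cannot take
```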

James Lu

Mar 4, 2019, 7:46:25 PM
to python...@python.org
On Mar 4, 2019, at 10:25 AM, Steven D'Aprano <st...@pearwood.info> wrote:
> On Mon, Mar 04, 2019 at 10:01:23AM -0500, James Lu wrote:
>
>> If you want to merge it without a KeyError, learn and use the more explicit {**d1, **d2} syntax.
>
> In your previous email, you said the {**d ...} syntax was implicit:
>
>    In other words, explicit + is better than implicit {**, **}, unless
>    explicitly suppressed.  Here + is explicit whereas {**, **} is
>    implicitly allowing inclusive keys, and the KeyError is expressed
>    suppressed by virtue of not using the {**, **} syntax.
>
> It is difficult to take your "explicit/implicit" argument seriously when
> you cannot even decide which is which.

I misspoke.

Yes, + is explicit. {**, **} is implicit. 

My argument:

We should set the standard that + is for non-conflicting merge and {**, **} is for overriding merge. That standard should be so that + explicitly asserts that the keys will not conflict whereas {**d1, **d2} is ambiguous on why d2 is overriding d1.^

^Presumably you’re making a copy of d1 so why should d3 have d2 take priority? The syntax deserves a comment, perhaps explaining that items from d2 are newer in time or that the items in d1 are always nonces. 

The + acts as an implicit assertion and an opportunity to catch an invariant violation or data input error.

Give me an example of a situation where you need a third dictionary from two existing dictionaries and having conflict where a key has a different value in both is desirable behavior. 

The situation where non-conflicting merge is what’s desired is more common and in that case throwing an exception in the case of a conflicting value is a good thing, a way to catch code smell.

James Lu

Mar 4, 2019, 7:54:23 PM
to python...@python.org
On Mar 4, 2019, at 11:25 AM, Steven D'Aprano <st...@pearwood.info> wrote:

>> How many situations would you need to make a copy of a dictionary and
>> then update that copy and override old keys from a new dictionary?
>
> Very frequently.
>
> That's why we have a dict.update method, which if I remember correctly,
> was introduced in Python 1.5 because people were frequently re-inventing
> the same wheel:
>
>    def update(d1, d2):
>        for key in d2.keys():
>            d1[key] = d2[key]
>
> You should have a look at how many times it is used in the standard
> library:
>
> [steve@ando cpython]$ cd Lib/
> [steve@ando Lib]$ grep -U "\.update[(]" *.py */*.py | wc -l
> 373
>
> Now some of those are false positives (docstrings, comments, non-dicts,
> etc) but that still leaves a lot of examples of wanting to override old
> keys. This is a very common need. Wanting an exception if the key
> already exists is, as far as I can tell, very rare.

It is very rare when you want to modify an existing dictionary. It’s not rare at all when you’re creating a new one.

> It is true that many of the examples in the std lib involve updating an
> existing dict, not creating a new one. But that's only to be expected:
> since Python didn't provide an obvious functional version of update,
> only an in-place version, naturally people get used to writing
> in-place code.

My question was “How many situations would you need to make a copy of a dictionary and then update that copy and override old keys from a new dictionary?” Try to really think about my question, instead of answering only half of it to dismiss my point.

James Lu

Mar 4, 2019, 8:02:41 PM3/4/19
to python...@python.org

> On Mar 4, 2019, at 11:25 AM, Steven D'Aprano <st...@pearwood.info> wrote:
>
> The PEP gives a good example of when this "invariant" would be
> unnecessarily restrictive:
>
> For example, updating default configuration values with
> user-supplied values would most often fail under the
> requirement that keys are unique::
>
> prefs = site_defaults + user_defaults + document_prefs
>
>
> Another example would be when reading command line options, where the
> most common convention is for "last option seen" to win:
>
> [steve@ando Lib]$ grep --color=always --color=never "zero" f*.py
> fileinput.py: numbers are zero; nextfile() has no effect.
> fractions.py: # the same way for any finite a, so treat a as zero.
> functools.py: # prevent their ref counts from going to zero during
>
Indeed, in this case you would want to use {**, **} syntax.
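
For reference, a minimal sketch of that layered-preferences case using the existing unpacking syntax. The three dicts and their keys are invented for illustration; only the last-seen-wins result matters here.

```python
# Hypothetical preference layers, from least to most specific.
site_defaults = {'font': 'serif', 'size': 10, 'colour': 'black'}
user_defaults = {'size': 12}
document_prefs = {'font': 'mono', 'colour': 'blue'}

# Later layers override earlier ones; none of the inputs is modified.
prefs = {**site_defaults, **user_defaults, **document_prefs}
print(prefs)
```

A "unique keys" rule would reject this merge outright, because 'size', 'font' and 'colour' each appear in more than one layer.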


> and the output is printed without colour.
>
> (I've slightly edited the above output so it will fit in the email
> without wrapping.)
>
> The very name "update" should tell us that the most useful behaviour is
> the one the devs decided on back in 1.5: have the last seen value win.
> How can you update values if the operation raises an error if the key
> already exists? If this behaviour is ever useful, I would expect that it
> will be very rare.

> An update or merge is effectively just running through a loop setting
> the value of a key. See the pre-Python 1.5 function above. Having update
> raise an exception if the key already exists would be about as useful as
> having ``d[key] = value`` raise an exception if the key already exists.
>
> Unless someone can demonstrate that the design of dict.update() was a
> mistake

You’re making a logical mistake here. + isn’t supposed to have .update’s behavior and it never was supposed to.

> , and the "require unique keys" behaviour is more common,

I just have. 99% of the time you want to have keys from one dict override another, you’d be better off doing it in-place and so would be using .update() anyways.

> then
> I maintain that for the very rare cases you want an exception, you can
> subclass dict and overload the __add__ method:

Well, yes, the whole point is to define the best default behavior.

James Lu

Mar 4, 2019, 8:04:21 PM3/4/19
to python...@python.org
By the way, my “no same keys with different values” proposal would not apply to +=.

David Foster

Mar 5, 2019, 12:38:59 AM3/5/19
to python...@python.org
I have seen a ton of discussion about what dict addition should do, but
have seen almost no mention of dict difference.

This lack of discussion interest combined with me not recalling having
needed the proposed subtraction semantics personally makes me wonder if
we should hold off on locking in subtraction semantics just yet. Perhaps
we could just scope the proposal to dictionary addition only for now?

If I *were* to define dict difference, my intuition suggests supporting
a second operand that is any iterable of keys and not just dicts.
(Augmented dict subtraction is already proposed to accept such a broader
second argument.)
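
A rough sketch of what that could look like as a plain function, assuming the second operand is any iterable of hashable keys. The name dict_difference is made up, and this is not the PEP's reference implementation.

```python
def dict_difference(d, keys):
    """Return a new dict with every key found in `keys` removed.

    `keys` may be any iterable of hashable objects, including another
    dict (iterating a dict yields its keys, so its values are ignored).
    """
    drop = set(keys)
    return {k: v for k, v in d.items() if k not in drop}


prefs = {'a': 1, 'b': 2, 'c': 3}
without_list = dict_difference(prefs, ['a', 'z'])   # keys that are absent are ignored
without_dict = dict_difference(prefs, {'b': 99})    # the value 99 plays no role
```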

David Foster | Seattle, WA, USA

On 3/1/19 8:26 AM, Steven D'Aprano wrote:
> Attached is a draft PEP on adding + and - operators to dict for
> discussion.
>
> This should probably go here:
>
> https://github.com/python/peps
>
> but due to technical difficulties at my end, I'm very limited in what I
> can do on Github (at least for now). If there's anyone who would like to
> co-author and/or help with the process, that will be appreciated.
>
>
>
>

Brandt Bucher

Mar 5, 2019, 1:49:13 AM3/5/19
to David Foster, python...@python.org
I agree with David here. Subtraction wasn’t even part of the original discussion — it seems that it was only added as an afterthought because Guido felt they were natural to propose together and formed a nice symmetry.

It’s odd that RHS values are not used at all, period. Further, there’s no precedent for bulk sequence/mapping removals like this... except for sets, for which it is certainly justified.

I’ve had the opportunity to play around with my reference implementation over the last few days, and despite my initial doubts, I have *absolutely* fallen in love with dictionary addition — I even accidentally tried to += two dictionaries at work on Friday (a good, but frustrating, sign). For context, I was updating a module-level mapping with an imported one, a use case I hadn’t even previously considered.

I have tried to fall in love with dict subtraction the same way, but every code sketch/test I come up with feels contrived and hack-y. I’m indifferent towards it, at best.

TL;DR: I’ve lived with both for a week. Addition is now habit, subtraction is still weird.

> Nice branch name! :)

I couldn’t help myself.

Brandt

INADA Naoki

Mar 5, 2019, 2:04:41 AM3/5/19
to Stefan Behnel, python-ideas
On Tue, Mar 5, 2019 at 12:02 AM Stefan Behnel <stef...@behnel.de> wrote:
>
> INADA Naoki schrieb am 04.03.19 um 11:15:
> > Why statement is not enough?
>
> I'm not sure I understand why you're asking this, but a statement is "not
> enough" because it's a statement and not an expression. It does not replace
> the convenience of an expression.
>
> Stefan
>

It seems a tautology and says nothing.
What is the "convenience of an expression"?
Is it needed to make Python a more readable language?

Anyway, if "there is an expression" is the main reason for this proposal, a symbolic
operator is not necessary.
`new = d1.updated(d2)` or `new = dict.merge(d1, d2)` are enough.
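
A minimal sketch of the named spelling, written as a free function because dict has no such method. The name `updated` is hypothetical and the last-seen-wins rule is assumed to match dict.update.

```python
def updated(d1, d2):
    """Hypothetical named spelling of copy-and-update; last seen wins."""
    new = d1.copy()
    new.update(d2)
    return new


d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
new = updated(d1, d2)   # usable inside a larger expression, unlike d1.update(d2)
```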

Python prefers names over symbols in general.
Symbols are readable and understandable only when they have a good
math metaphor.

Sets have symbolic operators because those are well known for sets in math,
not because sets are frequently used.

In the case of dict, there is no simple metaphor in math.
It is just cryptic and hard to Google.

--
INADA Naoki <songof...@gmail.com>

Jonathan Fine

Mar 5, 2019, 4:48:11 AM3/5/19
to python-ideas, Steven D'Aprano
This is mainly for Steve, as the author of PEP 584.

I'm grateful to Steve for preparing the current draft. Thank you.

It's strong on implementation, but I find it weak on motivation. I
hope that when time is available you (and the other contributors)
could transfer some motivating material into the PEP, from
python-ideas.

According to PEP 001, the PEP "should clearly explain why the existing
language specification is inadequate to address the problem that the
PEP solves". So it is important.

--
Jonathan

Steven D'Aprano

Mar 5, 2019, 5:27:38 AM3/5/19
to python...@python.org
On Sat, Mar 02, 2019 at 01:47:37AM +0900, INADA Naoki wrote:
> > If the keys are not strings, it currently works in CPython, but it may not work with other implementations, or future versions of CPython[2].
>
> I don't think so. https://bugs.python.org/issue35105 and
> https://mail.python.org/pipermail/python-dev/2018-October/155435.html
> are about kwargs. I think non string keys are allowed for {**d1,
> **d2} by language.

Is this documented somewhere?

Or is there a pronouncement somewhere that it is definitely expected to
work in any language calling itself Python?


Thanks,



--
Steven

Inada Naoki

Mar 5, 2019, 5:38:01 AM3/5/19
to Steven D'Aprano, python-ideas
On Tue, Mar 5, 2019 at 7:26 PM Steven D'Aprano <st...@pearwood.info> wrote:
>
> On Sat, Mar 02, 2019 at 01:47:37AM +0900, INADA Naoki wrote:
> > > If the keys are not strings, it currently works in CPython, but it may not work with other implementations, or future versions of CPython[2].
> >
> > I don't think so. https://bugs.python.org/issue35105 and
> > https://mail.python.org/pipermail/python-dev/2018-October/155435.html
> > are about kwargs. I think non string keys are allowed for {**d1,
> > **d2} by language.
>
> Is this documented somewhere?
>

It is not explicitly documented. But unlike keyword argument,
dict display supported non-string keys from very old.

I believe {3: 4} is supported by Python language, not CPython
implementation behavior.

https://docs.python.org/3/reference/expressions.html#grammar-token-dict-display

> Or is there a pronouncement somewhere that it is definitely expected to
> work in any language calling itself Python?
>
>
> Thanks,
>
>
>
> --
> Steven



--
Inada Naoki <songof...@gmail.com>

David Shawley

Mar 5, 2019, 8:12:37 AM3/5/19
to Stefan Behnel, python...@python.org
On Mar 4, 2019, at 4:51 AM, Stefan Behnel <stef...@behnel.de> wrote:

I think the main intention is to close a gap in the language.

   [1,2,3] + [4,5,6]

works for lists and tuples,

   {1,2,3} | {4,5,6}

works for sets, but joining two dicts isn't simply

   {1:2, 3:4} + {5:6}

but requires either some obscure syntax or a statement instead of a simple
expression.

The proposal is to enable the obvious syntax for something that should be
obvious.

I would challenge that this dictionary merging is something that is
obvious.  The existing sequences are simple collections of values
where a dictionary is a mapping of values.  The difference between
the two is akin to the difference between a mathematical array or
set and a unary mapping function.  There is a clear and obvious way to
combine arrays and sets -- concatenation for arrays and union for sets.
Combining mapping functions is less than obvious.

"Putting Metaclasses to Work" (ISBN-13 978-0201433050) presents a more
mathematical view of programming language types that includes two
distinct operations for combining dictionaries -- merge and recursive
merge.

For two input dictionaries D1 & D2 and the output dictionary O

D1 merge D2
    O is D1 with the of those keys of D2 that do not have keys in D1

D1 recursive-merge D2
    For all keys k, O[k] = D1[k] recursive merge D2[k] if both D1[k]
    and D2[k] are dictionaries, otherwise O[k] = (D1 merge D2)[k].

Note that neither of the cases is the same as:

>>> O = D1.copy()
>>> O.update(D2)

So that gives us three different ways to combine dictionaries that are
each sensible.  The following example uses dictionaries from "Putting
Metaclasses to Work":

>>> d1 = {
...     'title': 'Structured Programming',
...     'authors': 'Dahl, Dijkstra, and Hoare',
...     'locations': {
...         'Dahl': 'University of Oslo',
...         'Dijkstra': 'University of Texas',
...         'Hoare': 'Oxford University',
...     },
... }
>>>
>>> d2 = {
...     'publisher': 'Academic Press',
...     'locations': {
...         'North America': 'New York',
...         'Europe': 'London',
...     },
... }
>>>
>>> o = d1.copy()
>>> o.update(d2)
>>> o
{'publisher': 'Academic Press',
 'title': 'Structured Programming',
 'locations': {'North America': 'New York', 'Europe': 'London'},
 'authors': 'Dahl, Dijkstra, and Hoare'}
>>> 
>>> merge(d1, d2)
{'publisher': 'Academic Press',
 'title': 'Structured Programming',
 'locations': {'Dijkstra': 'University of Texas',
               'Hoare': 'Oxford University',
               'Dahl': 'University of Oslo'},
 'authors': 'Dahl, Dijkstra, and Hoare'}
>>>
>>> recursive_merge(d1, d2)
{'publisher': 'Academic Press',
 'title': 'Structured Programming',
 'locations': {'North America': 'New York',
               'Europe': 'London',
               'Dijkstra': 'University of Texas',
               'Hoare': 'Oxford University',
               'Dahl': 'University of Oslo'},
 'authors': 'Dahl, Dijkstra, and Hoare'}
>>>

IMO, having more than one obvious outcome means that we should refuse
the temptation to guess.  If we do, then the result is only obvious
to a subset of users and will be a surprise to the others.
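
The session above calls merge() and recursive_merge() without defining them. A minimal sketch that reproduces the outputs shown, assuming "merge" means first-seen-wins and that only values that are plain dicts on both sides are merged recursively. These are illustrative definitions, not the book's.

```python
def merge(d1, d2):
    """First seen wins: keep d1, add only the keys of d2 missing from d1."""
    out = dict(d1)
    for key, value in d2.items():
        out.setdefault(key, value)
    return out


def recursive_merge(d1, d2):
    """Like merge(), but dict values present on both sides are merged too."""
    out = dict(d1)
    for key, value in d2.items():
        if key in out and isinstance(out[key], dict) and isinstance(value, dict):
            out[key] = recursive_merge(out[key], value)
        else:
            out.setdefault(key, value)
    return out
```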

It's also useful to note that I am having trouble coming up with
another programming language that supports a "+" operator for map types.

Does anyone have an example of another programming language that
allows for addition of dictionaries/mappings?

If so, what is the behavior there?

- dave
--
Any linter or project that treats PEP 8 as mandatory has *already*
failed, as PEP 8 itself states that the rules can be broken as needed. - Paul Moore.

Jimmy Girardet

Mar 5, 2019, 9:31:25 AM3/5/19
to python...@python.org



Does anyone have an example of another programming language that
allows for addition of dictionaries/mappings?


kotlin does that (`to` means `:`)   :

fun main() {
    var a = mutableMapOf<String,Int>("a" to 1, "b" to 2)
    var b = mutableMapOf<String,Int>("c" to 1, "b" to 3)
    println(a)
    println(b)
    println(a + b)
    println(b + a)
}
   
{a=1, b=2}
{c=1, b=3}
{a=1, b=3, c=1}
{c=1, b=2, a=1}

Steven D'Aprano

Mar 5, 2019, 11:12:11 AM3/5/19
to python...@python.org
On Tue, Mar 05, 2019 at 08:11:29AM -0500, David Shawley wrote:

> "Putting Metaclasses to Work" (ISBN-13 978-0201433050) presents a more
> mathematical view of programming language types that includes two
> distinct operations for combining dictionaries -- merge and recursive
> merge.
>
> For two input dictionaries D1 & D2 and the output dictionary O
>
> D1 merge D2
> O is D1 with the of those keys of D2 that do not have keys in D1
>
> D1 recursive-merge D2
> For all keys k, O[k] = D1[k] recursive merge D2[k] if both D1[k]
> and D2[k] are dictionaries, otherwise O[k] = (D1 merge D2)[k].

I'm afraid I cannot understand either of those algorithms as written. I
suspect that you've left at least one word out of the first.

Fortunately your example below is extremely clear, thank you.


[...]
Yes, that's the classic "update with last seen wins". That's what the
PEP proposes as that seems to be the most frequently requested
behaviour. It is also the only behaviour which has been deemed useful
enough in nearly 30 years of Python's history to be added to dict as a
method.



> >>> merge(d1, d2)
> {'publisher': 'Academic Press',
> 'title': 'Structured Programming',
> 'locations': {'Dijkstra': 'University of Texas',
> 'Hoare': 'Oxford University',
> 'Dahl': 'University of Oslo'},
> 'authors': 'Dahl, Dijkstra, and Hoare'}

That seems to be "update with first seen wins", which is easily done
using ChainMap or the proposed dict difference operator:

dict( ChainMap(d1, d2) )
# or
d1 + (d2 - d1)

or simply by swapping the order of the operands:

d2 + d1

(These are not *identical* in effect, there are small differences with
respect to key:value identity, and order of keys. But they ought to give
*equal* results.)
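
A small sketch of those spellings that runs today, with {**x, **y} standing in for the proposed x + y and a comprehension standing in for the proposed d2 - d1. The sample dicts are invented.

```python
from collections import ChainMap

d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}

via_chainmap = dict(ChainMap(d1, d2))   # lookups try d1 first
via_swap = {**d2, **d1}                 # stands in for d2 + d1
via_filter = {**d1, **{k: v for k, v in d2.items() if k not in d1}}  # d1 + (d2 - d1)

# All three are equal as dicts; the order of their keys may differ.
print(via_chainmap == via_swap == via_filter)
```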

Personally, I don't think that behaviour is as useful as the first, but
it is certainly a legitimate kind of merge.

As far as I know, this has never been requested before. Perhaps it is
too niche?



> >>> recursive_merge(d1, d2)
> {'publisher': 'Academic Press',
> 'title': 'Structured Programming',
> 'locations': {'North America': 'New York',
> 'Europe': 'London',
> 'Dijkstra': 'University of Texas',
> 'Hoare': 'Oxford University',
> 'Dahl': 'University of Oslo'},
> 'authors': 'Dahl, Dijkstra, and Hoare'}

That's an interesting one. I'd write it something like this:


    def merge(a, b):
        new = a.copy()
        for key, value in b.items():
            if key not in a:
                # Add new keys.
                new[key] = value
            else:
                v = new[key]
                if isinstance(value, dict) and isinstance(v, dict):
                    # If both values are dicts, merge them.
                    new[key] = merge(v, value)
                else:
                    # What to do if only one is a dict?
                    # Or if neither is a dict?
                    pass  # left undecided; keeps the value from a
        return new

I've seen variants of this where duplicate keys are handled by building
a list of the values:

    def merge(a, b):
        new = a.copy()
        for key, value in b.items():
            if key in a:
                v = new[key]
                if isinstance(v, list):
                    v.append(value)
                else:
                    new[key] = [v, value]
            ...

or by concatenating values, or adding them (as Counter does), etc. We
have subclasses and operator overloading, so you can implement whatever
behaviour you like.
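
For instance, collections.Counter already overloads + to add the values of duplicate keys, whereas a plain copy-and-update keeps only the last value. A short illustration with made-up data:

```python
from collections import Counter

a = Counter(spam=1, eggs=2)
b = Counter(eggs=3, ham=4)

total = a + b          # Counter addition sums the counts for 'eggs'
last_wins = {**a, **b}  # plain merge keeps b's count for 'eggs'
```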

The question is, is this behaviour useful enough and common enough to be
built into dict itself?


> IMO, having more than one obvious outcome means that we should refuse
> the temptation to guess.

We're not *guessing*. We're *choosing* which behaviour we want.

Nobody says:

When I print some strings, I can separate them with spaces, or
dots, or newlines, and print a newline at the end, or suppress
the newline. Since all of these behaviours might be useful for
somebody, we should not "guess" what the user wants. Therefore
we should not have a print() function at all.

The behaviour of print() is not a guess as to what the user wants. We
offer a specific behaviour, and if the user is happy with that, then
they can use print(), and if not, they can write their own.

The same applies here: we're offering one specific behaviour that we
think is the most important, and anyone who wants another can write
their own.

If people don't like my choice of what I think is the most important
(copy-and-update, with last seen wins), they can argue for whichever
alternative they like. If they make a convincing enough case, the PEP
can change :-)

James Lu has already tried to argue that the "raise on non-unique keys"
is the best behaviour. I have disagreed with that, but if James makes a
strong enough case for his idea, and it gains sufficient support, I
could be persuaded to change my position.

Or he can write a competing PEP and the Steering Council can decide
between the two ideas.


> If we do, then the result is only obvious
> to a subset of users and will be a surprise to the others.

It's only a surprise to those users who don't read the docs and make
assumptions about behaviour based on their own wild guesses.

We should get away from the idea that the only behaviours we can provide
are those which are "obvious" (intuitive?) to people who guess what it
means without reading the docs. It's great when a function's meaning can
be guessed or inferred from a basic understanding of English:

len(string) # assuming len is an abbreviation for length

but that sets the bar impossibly high. We can't guess what these do, not
with any precision:

print(spam, eggs) # prints spaces between arguments or not?

spam is eggs # that's another way of spelling == right?

zip(spam, eggs) # what does it do if args aren't the same length?

and who can guess what these do without reading the docs?

property, classmethod, slice, enumerate, iter

I don't think that Python is a worse language for having specified a
meaning for these rather than leaving them out.
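
For the record, here is what the three "can't guess" examples actually do, as a short sketch with throwaway values:

```python
import io

buf = io.StringIO()
print('spam', 'eggs', file=buf)
printed = buf.getvalue()          # one space between arguments, newline at the end

a = [1, 2]
b = [1, 2]
same_value = (a == b)             # equality compares contents
same_object = (a is b)            # identity compares objects, not contents

pairs = list(zip('abc', [1, 2]))  # zip stops at the shortest input
```

None of these outcomes is forced by the name alone; each is simply the documented choice.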

The Zen's prohibition against guessing in the face of ambiguity does not
mean that we must not add a feature to the language that requires the
user to learn what it does first.


> It's also useful to note that I am having trouble coming up with
> another programming language that supports a "+" operator for map types.
>
> Does anyone have an example of another programming language that
> allows for addition of dictionaries/mappings?
>
> If so, what is the behavior there?

An excellent example, but my browser just crashed and it's after 3am
here so I'm going to take this opportunity to go to bed :-)



--
Steven

Guido van Rossum

Mar 5, 2019, 11:46:51 AM3/5/19
to Inada Naoki, python-ideas
On Tue, Mar 5, 2019 at 2:38 AM Inada Naoki <songof...@gmail.com> wrote:
On Tue, Mar 5, 2019 at 7:26 PM Steven D'Aprano <st...@pearwood.info> wrote:
>
> On Sat, Mar 02, 2019 at 01:47:37AM +0900, INADA Naoki wrote:
> > > If the keys are not strings, it currently works in CPython, but it may not work with other implementations, or future versions of CPython[2].
> >
> > I don't think so.  https://bugs.python.org/issue35105 and
> > https://mail.python.org/pipermail/python-dev/2018-October/155435.html
> > are about kwargs.  I think non string keys are allowed for {**d1,
> > **d2} by language.
>
> Is this documented somewhere?

It is not explicitly documented.  But unlike keyword argument,
dict display supported non-string keys from very old.

I believe {3: 4} is supported by Python language, not CPython
implementation behavior.

https://docs.python.org/3/reference/expressions.html#grammar-token-dict-display

I'd like to remove all doubt: {**d1} needs to work regardless of the key type, as long as it's hashable  (d1 could be some mapping implemented without hashing, e.g. using a balanced tree, so that it could support unhashable keys).

If there's doubt about this anywhere, we could add an example to the docs and to the PEP.
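
One possible shape for such an example, contrasting dict displays with keyword arguments. The function spam and the sample keys are illustrative only.

```python
d1 = {3: 4, (1, 2): 'tuple key'}
d2 = {None: 'none key', 3: 'overridden'}

# Dict display: any hashable key is fine, and the last value seen wins.
merged = {**d1, **d2}


def spam(**kwargs):
    return kwargs


# Keyword arguments are different: the keys must be strings.
try:
    spam(**d1)
except TypeError as exc:
    error = exc
else:
    error = None
```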
 

Anders Hovmöller

Mar 5, 2019, 12:04:52 PM3/5/19
to gu...@python.org, python-ideas

> I'd like to remove all doubt: {**d1} needs to work regardless of the key type, as long as it's hashable (d1 could be some mapping implemented without hashing, e.g. using a balanced tree, so that it could support unhashable keys).
>
> If there's doubt about this anywhere, we could add an example to the docs and to the PEP.


On a related note: **kwargs, should they support arbitrary strings as keys? I depend on this behavior in production code and all python implementations handle it.

/ Anders

Guido van Rossum

Mar 5, 2019, 12:14:38 PM3/5/19
to Anders Hovmöller, python-ideas
On Tue, Mar 5, 2019 at 9:02 AM Anders Hovmöller <bo...@killingar.net> wrote:
On a related note: **kwargs, should they support arbitrary strings as keys? I depend on this behavior in production code and all python implementations handle it.

The ice is much thinner there, but my position is that as long as they are *strings* such keys should be allowed.

Inada Naoki

Mar 5, 2019, 12:22:02 PM3/5/19
to Anders Hovmöller, python-ideas
There was a thread for the topic.


On Wed, Mar 6, 2019 at 2:02, Anders Hovmöller <bo...@killingar.net> wrote:

Del Gan

Mar 5, 2019, 1:31:49 PM3/5/19
to Inada Naoki, python-ideas
2019-03-05 0:34 UTC+01:00, Brandt Bucher <brandt...@gmail.com>:
>> Is there other built-in types which act differently if called with
>> the operator or augmented assignment version?
>
> list.__iadd__ and list.extend


2019-03-05 0:57 UTC+01:00, Guido van Rossum <gu...@python.org>:
> Yes. The same happens for lists. [1] + 'a' is a TypeError, but a += 'a'
> works:


Oh, I can't believe I'm only learning that today, after using
Python for years.

Thanks for the clarification. It makes perfect sense for += to
behave like .update() then.
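
The list behaviour in question, as a tiny sketch:

```python
a = [1]
a += 'a'   # in-place add acts like a.extend('a'): any iterable is accepted

try:
    [1] + 'a'   # the binary operator requires another list
except TypeError:
    plus_failed = True
else:
    plus_failed = False
```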

Greg Ewing

Mar 5, 2019, 5:41:45 PM3/5/19
to python...@python.org
Steven D'Aprano wrote:
> The question is, is [recursive merge] behaviour useful enough and
> common enough to be built into dict itself?

I think not. It seems like just one possible way of merging
values out of many. I think it would be better to provide
a merge function or method that lets you specify a function
for merging values.
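
A rough sketch of that idea, assuming a two-argument combining function that is called only for keys present in both dicts. The name merge_with and its signature are hypothetical.

```python
import operator


def merge_with(combine, d1, d2):
    """Return a new dict; combine(old, new) decides duplicate keys."""
    result = dict(d1)
    for key, value in d2.items():
        result[key] = combine(result[key], value) if key in result else value
    return result


d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}

last_wins = merge_with(lambda old, new: new, d1, d2)
first_wins = merge_with(lambda old, new: old, d1, d2)
summed = merge_with(operator.add, d1, d2)
```

With this shape, "last seen wins", "first seen wins" and Counter-style addition become three arguments to one function rather than three competing meanings for one operator.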

--
Greg

Brandt Bucher

Mar 5, 2019, 5:48:28 PM3/5/19
to Steven D'Aprano, Python-Ideas
 These semantics are intended to match those of update as closely as possible. For the dict built-in itself, calling keys is redundant as iteration over a dict iterates over its keys; but for subclasses or other mappings, update prefers to use the keys method.
 
The above paragraph may be inaccurate. Although the dict docstring states that keys will be called if it exists, this does not seem to be the case for dict subclasses. Bug or feature?

>>> print(dict.update.__doc__)
D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
If E is present and has a .keys() method, then does:  for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v
In either case, this is followed by: for k in F:  D[k] = F[k]

It's actually pretty interesting... and misleading/wrongish. It never says that keys is *called*... in reality, it just checks for the "keys" method before deciding whether to proceed with PyDict_Merge or PyDict_MergeFromSeq2. It should really read more like:

D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
If E is present, has a .keys() method, and is a subclass of dict, then does:  for k in E: D[k] = E[k]
If E is present, has a .keys() method, and is not a subclass of dict, then does:  for k in E.keys(): D[k] = E[k]
If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v
In either case, this is followed by: for k in F:  D[k] = F[k]

Should our __sub__ behavior be the same (i.e., iterate for dict subclasses and objects without "keys()", otherwise call "keys()" and iterate over that)? __iadd__ calls into this logic already. It seems to be the most "natural" solution here, if we desire behavior analogous to "update".
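
The visible effect of those branches can be checked from Python. A minimal sketch, where KeysOnly is a made-up class that offers nothing but keys() and __getitem__:

```python
class KeysOnly:
    """Hypothetical minimal mapping: just keys() and __getitem__."""

    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]


d = {}
d.update(KeysOnly({'x': 1}))   # has keys(): read through keys() and item lookup
d.update([('y', 2)])           # no keys(): treated as an iterable of pairs
d.update(z=3)                  # keyword arguments are applied last
```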

Brandt

On Fri, Mar 1, 2019 at 8:26 AM Steven D'Aprano <st...@pearwood.info> wrote:
Attached is a draft PEP on adding + and - operators to dict for
discussion.

This should probably go here:

https://github.com/python/peps

but due to technical difficulties at my end, I'm very limited in what I
can do on Github (at least for now). If there's anyone who would like to
co-author and/or help with the process, that will be appreciated.


--
Steven

Brandt Bucher

Mar 5, 2019, 5:58:49 PM3/5/19
to Steven D'Aprano, Python-Ideas
Should our __sub__ behavior be the same...

Sorry, our "__isub__" behavior. Long day...

Brandt Bucher

Mar 5, 2019, 6:15:55 PM3/5/19
to Steven D'Aprano, Python-Ideas
Actually, this was made even more condition-y in 3.8. Now we check __iter__ too:

D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
If E is present, has a .keys() method, is a subclass of dict, and hasn't overridden __iter__, then does:  for k in E: D[k] = E[k]
If E is present, has a .keys() method, and is not a subclass of dict or has overridden __iter__, then does:  for k in E.keys(): D[k] = E[k]
If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v
In either case, this is followed by: for k in F:  D[k] = F[k]

Bleh.

Steven D'Aprano

Mar 5, 2019, 6:17:12 PM3/5/19
to python...@python.org
On Sun, Mar 03, 2019 at 09:28:30PM -0500, James Lu wrote:

> I propose that the + sign merge two python dictionaries such that if
> there are conflicting keys, a KeyError is thrown.

This proposal is for a simple, operator-based equivalent to
dict.update() which returns a new dict. dict.update has existed since
Python 1.5 (something like a quarter of a century!) and never grown a
"unique keys" version.

I don't recall even seeing a request for such a feature. If such a
unique keys version is useful, I don't expect it will be useful often.


> This way, d1 + d2 isn’t just another obvious way to do {**d1, **d2}.

One of the reasons for preferring + is that it is an obvious way to do
something very common, while {**d1, **d2} is as far from obvious as you
can get without becoming APL or Perl :-)

If I needed such a unique key version of update, I'd use a subclass:


    class StrictDict(dict):
        def __add__(self, other):
            if isinstance(other, dict) and (self.keys() & other.keys()):
                raise KeyError('non-unique keys')
            return super().__add__(other)

# and similar for __radd__.


rather than burden the entire language, and every user of it, with
having to learn the subtle difference between the obvious + operator and
the error-prone and unobvious trick of {*d1, *d2}.

( Did you see what I did there? *wink* )
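
The subclass above relies on the proposed dict.__add__, so it cannot run without the PEP's change. A standalone sketch of the same check as a plain function, assuming that any shared key counts as a conflict; the name strict_merge is hypothetical.

```python
def strict_merge(d1, d2):
    """Return a new dict; raise KeyError if the two dicts share any key."""
    duplicates = d1.keys() & d2.keys()
    if duplicates:
        raise KeyError('non-unique keys: %r' % sorted(duplicates, key=repr))
    return {**d1, **d2}


ok = strict_merge({'a': 1}, {'b': 2})

try:
    strict_merge({'a': 1}, {'a': 2})
except KeyError:
    conflict_raised = True
else:
    conflict_raised = False
```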


> The second syntax makes it clear that a new dictionary is being
> constructed and that d2 overrides keys from d1.

Only because you have learned the rule that {**d, **e} means to
construct a new dict by merging, with the rule that in the event of
duplicate keys, the last key seen wins. If you hadn't learned that rule,
there is nothing in the syntax which would tell you the behaviour. We
could have chosen any rule we liked:

- raise an exception, like you get a SyntaxError if you pass the
same keyword argument to a function twice: spam(foo=1, foo=2);

- first value seen wins;

- last value seen wins;

- random value wins;

- anything else we liked!


There is nothing "clear" about the syntax which makes it obvious which
behaviour is implemented. We have to learn it.
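
As an illustration that the rule is a convention rather than something the syntax reveals, the same pair of unpackings is treated differently in a dict display and in a function call. The function spam is a placeholder.

```python
merged = {**{'foo': 1}, **{'foo': 2}}   # dict display: last value seen wins


def spam(**kwargs):
    return kwargs


try:
    spam(**{'foo': 1}, **{'foo': 2})     # call: a duplicate keyword is an error
except TypeError:
    call_raised = True
else:
    call_raised = False
```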

> One can reasonably expect or imagine a situation where a section of
> code that expects to merge two dictionaries with non-conflicting keys
> commits a semantic error if it merges two dictionaries with
> conflicting keys.

I can imagine it, but I don't think I've ever needed it, and I can't
imagine wanting it often enough to wish it was not just a built-in
function or method, but actual syntax.

Do you have some real examples of wanting an error when trying to update
a dict if keys match?


> To better explain, imagine a program where options is a global
> variable storing parsed values from the command line.
>
> def verbose_options():
>     if options.quiet
>         return {'verbose': True}
>
> def quiet_options():
>     if options.quiet:
>         return {'verbose': False}

That seems very artificial to me. Why not use a single function:

    def verbose_options():  # There's more than one?
        return {'verbose': not options.quiet}

The way you have written those functions seems weird to me. You already
have a nice options object, with named fields like "options.quiet", why
are you turning it into not one but *two* different dicts, both
reporting the same field?

And it's buggy: if options.quiet is True, then the key 'quiet'
should be True, not the 'verbose' key.

Do you have *two* functions for every preference setting that takes a
true/false flag?

What do you do for preference settings that take multiple values? Create
a vast number of specialised functions, one for each possible value?

    def A4_page_options():
        if options.page_size == 'A4':
            return {'page_size': 'A4'}

    def US_Letter_page_options():
        if options.page_size == 'US Letter':
            return {'page_size': 'US Letter'}

    page_size = (
        A4_page_options() + A3_page_options() + A5_page_options()
        + Foolscape_page_options() + Tabloid_page_options()
        + US_Letter_page_options() + US_Legal_page_options()
        # and about a dozen more...
        )

The point is, although I might be wrong, I don't think that this example
is a practical, realistic use-case for a unique keys version of update.

To me, your approach seems so complicated and artificial that it seems
like it was invented specifically to justify this "unique key" operator,
not something that we would want to write in real life.

But even if it is real code, the question is not whether it is EVER useful
for a dict update to raise an exception on matching keys. The question
is whether this is so often useful that this is the behaviour we want to
make the default for dicts.


[...]
> Again, I propose that the + sign merge two python dictionaries such
> that if there are conflicting keys, a KeyError is thrown, because such
> “non-conflicting merge” behavior would be useful in Python.

I don't think it would be, at least not often.

If it were common enough to justify a built-in operator to do this, we
would have had many requests for a dict.unique_update or similar by now,
and I don't think we have.


> It gives
> clarifying power to the + sign. The + and the {**, **} should serve
> different roles.


>
> In other words, explicit + is better than implicit {**, **}, unless
> explicitly suppressed. Here + is explicit whereas {**, **} is
> implicitly allowing inclusive keys,

If I had a cent for every time people misused "explicit" to mean "the
proposal that I like", I'd be rich.

In what way is the "+" operator *explicit* about raising an exception on
duplicate keys? These are both explicit:

merge_but_raise_exception_if_any_duplicates(d1, d2)

merge(d1, d2, raise_if_duplicates=True)

and these are both equally implicit:

d1 + d2

{**d1, **d2}

since the behaviour on duplicates is not explicitly stated in clear and
obvious language, but implied by the rules of the language.


[...]
> People expect the + operator to be commutative

They are wrong to expect that, because the + operator is already not
commutative for:

str
bytes
bytearray
list
tuple
array.array
collections.deque
collections.Counter

and possibly others.
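
A quick check for a few of the built-in sequence types, where swapping the operands changes the result:

```python
str_differs = ('ab' + 'cd') != ('cd' + 'ab')
bytes_differs = (b'a' + b'b') != (b'b' + b'a')
list_differs = ([1] + [2]) != ([2] + [1])
tuple_differs = ((1,) + (2,)) != ((2,) + (1,))
```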

Steven D'Aprano

Mar 5, 2019, 6:37:15 PM3/5/19
to python...@python.org
On Mon, Mar 04, 2019 at 08:01:38PM -0500, James Lu wrote:
>
> > On Mar 4, 2019, at 11:25 AM, Steven D'Aprano <st...@pearwood.info> wrote:

> > Another example would be when reading command line options, where the
> > most common convention is for "last option seen" to win:
> >
> > [steve@ando Lib]$ grep --color=always --color=never "zero" f*.py

[...]


> Indeed, in this case you would want to use {**, **} syntax.

No I would NOT want to use the {**, **} syntax, because it is ugly.
That's why people ask for + instead. (Or perhaps I should say "as well
as" since the double-star syntax is not going away.)


[...]


> > Unless someone can demonstrate that the design of dict.update() was a
> > mistake
>
> You’re making a logical mistake here. + isn’t supposed to have
> .update’s behavior and it never was supposed to.

James, I'm the author of the PEP, and for the purposes of the proposal,
the + operator is supposed to do what I say it is supposed to do.

You might be able to persuade me to change the PEP, if you have a
sufficiently good argument, or you can write your own counter PEP making
a different choice, but please don't tell me what I intended. I know
what I intended, and it is for + to have the same last-key-wins
behaviour as update. That's the behaviour which is most commonly
requested in the various times this comes up.


> > , and the "require unique keys" behaviour is more common,
>
> I just have.

No you haven't -- you have simply *declared* that it is more common,
without giving any evidence for it.


> 99% of the time you want to have keys from one dict override another,
> you’d be better off doing it in-place and so would be using .update()
> anyways.

I don't know if it is "99% of the time" or 50% of the time or 5%,
but this PEP is for the remaining times where we don't want in-place
updates but we want a new dict.

I use list.append or list.extend more often than list concatenation, but
when I want a new list, list concatenation is very useful. This proposal
is about those cases where we want a new dict.
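
To make the distinction concrete, here is a small sketch of the "new dict" case as it has to be written today. The variable names are illustrative only; elsewhere in this thread the proposed d1 + d2 is described as equivalent to a copy followed by an update.

```python
defaults = {"color": "auto", "verbose": False}
overrides = {"color": "never"}

# In-place: defaults.update(overrides) would modify defaults itself.
# New dict: copy first, then update, leaving both inputs untouched.
settings = defaults.copy()
settings.update(overrides)
```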


--
Steven

Josh Rosenberg

Mar 5, 2019, 6:49:56 PM
to python...@python.org
On Tue, Mar 5, 2019 at 11:16 PM Steven D'Aprano <st...@pearwood.info> wrote:
On Sun, Mar 03, 2019 at 09:28:30PM -0500, James Lu wrote:

> I propose that the + sign merge two python dictionaries such that if
> there are conflicting keys, a KeyError is thrown.

This proposal is for a simple, operator-based equivalent to
dict.update() which returns a new dict. dict.update has existed since
Python 1.5 (something like a quarter of a century!) and never grown a
"unique keys" version.

I don't recall even seeing a request for such a feature. If such a
unique keys version is useful, I don't expect it will be useful often.


I have one argument in favor of such a feature: It preserves concatenation semantics. + means one of two things in all code I've ever seen (Python or otherwise):

1. Numeric addition (including element-wise numeric addition as in Counter and numpy arrays)
2. Concatenation (where the result preserves all elements, in order, including, among other guarantees, that len(seq1) + len(seq2) == len(seq1 + seq2))

dict addition that didn't reject non-unique keys wouldn't fit *either* pattern; under the main proposal (making it equivalent to left.copy(), followed by .update(right)), the left hand side would win on ordering, the right hand side on values, and the length invariant of concatenation wouldn't be preserved. At least when repeated keys are rejected, most concatenation invariants are preserved; order is all of the left elements followed by all of the right, and no elements are lost.
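
The invariants in question can be checked directly. This sketch uses {**d1, **d2} as a stand-in for the proposed d1 + d2, on the assumption stated above that both match copy-then-update:

```python
seq1, seq2 = [1, 2], [2, 3]
d1, d2 = {"a": 1, "b": 2}, {"b": 3, "c": 4}

merged = {**d1, **d2}  # same result as d1.copy() followed by .update(d2)

# Concatenation keeps every element, so lengths add up.
concat_preserves_length = len(seq1 + seq2) == len(seq1) + len(seq2)
# The merge drops one of the two "b" entries, so lengths do not add up.
merge_preserves_length = len(merged) == len(d1) + len(d2)
# "b" keeps its position from d1 but takes its value from d2.
key_order = list(merged)
```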
 

> This way, d1 + d2 isn’t just another obvious way to do {**d1, **d2}.

One of the reasons for preferring + is that it is an obvious way to do
something very common, while {**d1, **d2} is as far from obvious as you
can get without becoming APL or Perl :-)


From the moment PEP 448 was published, I've been using unpacking as a more composable/efficient form of concatenation, merging, etc. I'm sorry you don't find it obvious, but a couple of e-mails back you said:

"The Zen's prohibition against guessing in the face of ambiguity does not
mean that we must not add a feature to the language that requires the
user to learn what it does first."

Learning to use the unpacking syntax in the case of function calls is necessary for tons of stuff (writing general function decorators, handling initialization in class hierarchies, etc.), and as PEP 448 is titled, this is just a generalization combining the features of unpacking arguments with collection literals.

> The second syntax makes it clear that a new dictionary is being
> constructed and that d2 overrides keys from d1.

Only because you have learned the rule that {**d, **e} means to
construct a new dict by merging, with the rule that in the event of
duplicate keys, the last key seen wins. If you hadn't learned that rule,
there is nothing in the syntax which would tell you the behaviour. We
could have chosen any rule we liked:


No, because we learned the general rule for dict literals that {'a': 1, 'a': 2} produces {'a': 2}; the unpacking generalizations were very good about adhering to the existing rules, so it was basically zero learning curve if you already knew dict literal rules and less general unpacking rules. The only part to "learn" is that when there is a conflict between dict literal rules and function call rules, dict literal rules win.
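
A short illustration of the two rule sets being compared. The function `collect` is a throwaway name used only for this sketch:

```python
d1, d2 = {"a": 1}, {"a": 2}

literal = {"a": 1, "a": 2}   # dict display: the later duplicate key wins
unpacked = {**d1, **d2}      # unpacking inside a display follows the same rule

def collect(**kwargs):
    return kwargs

try:
    collect(**d1, **d2)      # function-call rules reject the duplicate keyword
except TypeError as exc:
    call_error = exc
else:
    call_error = None
```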

To be clear: I'm not supporting + as raising error on non-unique keys. Even if it makes dict + dict adhere to the rules of concatenation, I don't think it's a common or useful functionality. My order of preferences is roughly:

1. Do nothing (even if you don't like {**d1, **d2}, .copy() followed by .update() is obvious, and we don't need more than one way to do it)
2. Add a new method to dict, e.g. dict.merge (whether it's a class method or an instance method is irrelevant to me)
3. Use | (because dicts are *far* more like sets than they are like sequences, and the semi-lossy rules of unioning make more sense there); it would also make - make sense, since + is only matched by - in numeric contexts; on collections, | and - are paired. And I consider the - functionality the most useful part of this whole proposal (because I *have* wanted to drop a collection of known blacklisted keys from a dict and while it's obvious you can do it by looping, I always wanted to be able to do something like d1.keys() -= badkeys, and remain disappointed nothing like it is available)
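
For reference, the looping spellings alluded to in item 3 might look like this today (a sketch; `badkeys` is just an example set):

```python
d1 = {"a": 1, "b": 2, "c": 3}
badkeys = {"b", "c", "z"}

# Non-mutating: build a new dict without the unwanted keys.
filtered = {k: v for k, v in d1.items() if k not in badkeys}

# Mutating: delete only the unwanted keys that are actually present.
for k in d1.keys() & badkeys:
    del d1[k]
```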

-Josh Rosenberg

Guido van Rossum

Mar 5, 2019, 7:10:26 PM
to Josh Rosenberg, Python-Ideas
On Tue, Mar 5, 2019 at 3:50 PM Josh Rosenberg <shadowranger...@gmail.com> wrote:

On Tue, Mar 5, 2019 at 11:16 PM Steven D'Aprano <st...@pearwood.info> wrote:
On Sun, Mar 03, 2019 at 09:28:30PM -0500, James Lu wrote:

> I propose that the + sign merge two python dictionaries such that if
> there are conflicting keys, a KeyError is thrown.

This proposal is for a simple, operator-based equivalent to
dict.update() which returns a new dict. dict.update has existed since
Python 1.5 (something like a quarter of a century!) and never grown a
"unique keys" version.

I don't recall even seeing a request for such a feature. If such a
unique keys version is useful, I don't expect it will be useful often.

I have one argument in favor of such a feature: It preserves concatenation semantics. + means one of two things in all code I've ever seen (Python or otherwise):

1. Numeric addition (including element-wise numeric addition as in Counter and numpy arrays)
2. Concatenation (where the result preserves all elements, in order, including, among other guarantees, that len(seq1) + len(seq2) == len(seq1 + seq2))

dict addition that didn't reject non-unique keys wouldn't fit *either* pattern; under the main proposal (making it equivalent to left.copy(), followed by .update(right)), the left hand side would win on ordering, the right hand side on values, and the length invariant of concatenation wouldn't be preserved. At least when repeated keys are rejected, most concatenation invariants are preserved; order is all of the left elements followed by all of the right, and no elements are lost.

I must by now have seen dozens of posts complaining about this aspect of the proposal. I think this is just making up rules (e.g. "+ never loses information") to deal with an aspect of the design where a *choice* must be made. This may reflect the Zen of Python's "In the face of ambiguity, refuse the temptation to guess." But really, that's a pretty silly rule (truly, they aren't all winners). Good interface design constantly makes choices in ambiguous situations, because the alternative is constantly asking, and that's just annoying.

We have a plethora of examples (in fact, almost all alternatives considered) of situations related to dict merging where a choice is made between conflicting values for a key, and it's always the value further to the right that wins: from d[k] = v (which overrides the value when k is already in the dict) to d1.update(d2) (which lets the values in d2 win), including the much lauded {**d1, **d2} and even plain {'a': 1, 'a': 2} has a well-defined meaning where the latter value wins.

As to why raising is worse: First, none of the other situations I listed above raises for conflicts. Second, there's the experience of str+unicode in Python 2, which raises if the str argument contains any non-ASCII bytes. In fact, we disliked it so much that we changed the language incompatibly to deal with it.
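
The four existing situations listed above can be lined up side by side; in this sketch (example values only) every spelling resolves the conflict on "a" the same way:

```python
d1, d2 = {"a": 1, "b": 1}, {"a": 2}

by_assignment = dict(d1)
by_assignment["a"] = 2           # d[k] = v overrides the existing value

by_update = dict(d1)
by_update.update(d2)             # values from d2 win

by_unpacking = {**d1, **d2}      # the rightmost mapping wins
by_literal = {"a": 1, "b": 1, "a": 2}  # the later duplicate wins

results = [by_assignment, by_update, by_unpacking, by_literal]
```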

Josh Rosenberg

Mar 5, 2019, 7:48:12 PM
to Python-Ideas
On Wed, Mar 6, 2019 at 12:08 AM Guido van Rossum <gu...@python.org> wrote:
On Tue, Mar 5, 2019 at 3:50 PM Josh Rosenberg <shadowranger...@gmail.com> wrote:

On Tue, Mar 5, 2019 at 11:16 PM Steven D'Aprano <st...@pearwood.info> wrote:
On Sun, Mar 03, 2019 at 09:28:30PM -0500, James Lu wrote:

> I propose that the + sign merge two python dictionaries such that if
> there are conflicting keys, a KeyError is thrown.

This proposal is for a simple, operator-based equivalent to
dict.update() which returns a new dict. dict.update has existed since
Python 1.5 (something like a quarter of a century!) and never grown a
"unique keys" version.

I don't recall even seeing a request for such a feature. If such a
unique keys version is useful, I don't expect it will be useful often.

I have one argument in favor of such a feature: It preserves concatenation semantics. + means one of two things in all code I've ever seen (Python or otherwise):

1. Numeric addition (including element-wise numeric addition as in Counter and numpy arrays)
2. Concatenation (where the result preserves all elements, in order, including, among other guarantees, that len(seq1) + len(seq2) == len(seq1 + seq2))

dict addition that didn't reject non-unique keys wouldn't fit *either* pattern; under the main proposal (making it equivalent to left.copy(), followed by .update(right)), the left hand side would win on ordering, the right hand side on values, and the length invariant of concatenation wouldn't be preserved. At least when repeated keys are rejected, most concatenation invariants are preserved; order is all of the left elements followed by all of the right, and no elements are lost.

I must by now have seen dozens of posts complaining about this aspect of the proposal. I think this is just making up rules (e.g. "+ never loses information") to deal with an aspect of the design where a *choice* must be made. This may reflect the Zen of Python's "In the face of ambiguity, refuse the temptation to guess." But really, that's a pretty silly rule (truly, they aren't all winners). Good interface design constantly makes choices in ambiguous situations, because the alternative is constantly asking, and that's just annoying.

We have a plethora of examples (in fact, almost all alternatives considered) of situations related to dict merging where a choice is made between conflicting values for a key, and it's always the value further to the right that wins: from d[k] = v (which overrides the value when k is already in the dict) to d1.update(d2) (which lets the values in d2 win), including the much lauded {**d1, **d2} and even plain {'a': 1, 'a': 2} has a well-defined meaning where the latter value wins.

Yeah. And I'm fine with the behavior for update because the name itself is descriptive; we're spelling out, in English, that we're update-ing the thing it's called on, so it makes sense to have the thing we're sourcing for updates take precedence.

Similarly, for dict literals (and by extension, unpacking), it's following an existing Python convention which doesn't contradict anything else.

Overloading + lacks the clear descriptive aspect of update that describes the goal of the operation, and contradicts conventions (in Python and elsewhere) about how + works (addition or concatenation, and a lot of people don't even like it doing the latter, though I'm not that pedantic).

A couple "rules" from C++ on overloading are "Whenever the meaning of an operator is not obviously clear and undisputed, it should not be overloaded. Instead, provide a function with a well-chosen name." and "Always stick to the operator’s well-known semantics". (Source: https://stackoverflow.com/a/4421708/364696 , though the principle is restated in many other places). Obviously the C++ community isn't perfect on this (see iostream and <</>> operators), but they're otherwise pretty consistent. + means addition, and in many languages including C++ strings, concatenation, but I don't know of any languages outside the "esoteric" category that use it for things that are neither addition nor concatenation. You've said you don't want the whole plethora of set-like behaviors on dicts, but dicts are syntactically and semantically much more like sets than sequences, and if you add + (with semantics differing from both sets and sequences), the language becomes less consistent.

I'm not against making it easier to merge dictionaries. But people seem to be arguing that {**d1, **d2} is bad because of magic punctuation that obscures meaning, when IMO:

     d3 = d1 + d2

is obscuring meaning by adding yet a third rule for what + means, inconsistent with both existing rules (from both Python and the majority of languages I've had cause to use). A named method (class or instance) or top-level function (a la sorted) is more explicit, easier to look up (after all, the major complaint about ** syntax is the difficulty of finding the documentation on it). It's also easier to make it do the right thing; d1 + d2 + d3 + ... dN is inefficient (makes many unnecessary temporaries), {**d1, **d2, **d3, ..., **dN} is efficient but obscure (and not subclass friendly), but a varargs method like dict.combine(d1, d2, d3, ..., dN) (or merge, or whatever; I'm not trying to bikeshed) is correct, efficient, and most importantly, easy to look up documentation for.
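
A rough sketch of what such a varargs helper could look like as a plain function. The name `combine` is only the placeholder used above, not an existing dict method:

```python
def combine(*dicts):
    """Merge any number of mappings into one new dict; later values win."""
    result = {}
    for d in dicts:
        # A single result dict is filled in place, so no intermediate
        # temporaries are created along the way.
        result.update(d)
    return result

merged = combine({"a": 1}, {"b": 2}, {"a": 3, "c": 4})
```

Because the function has a name, it can also be found with help() or a documentation search, which is the discoverability point made above.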

I occasionally find it frustrating that concatenation exists given the wealth of Schlemiel the Painter's algorithms it encourages, and the "correct" solution for combining sequences (itertools.chain for general cases, str.join/bytes.join for special cases) being less obvious means my students invariably use the "wrong" tool out of convenience (and it's not really wrong in 90% of code where the lengths are always short, but then they use it where lengths are often huge and suffer for it). If we're going to make dict merging more convenient, I'd prefer we make the obvious, convenient solution also the one that doesn't encourage non-scalable anti-patterns.
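
To illustrate the pattern being described, a small sketch with example data only:

```python
from itertools import chain

chunks = [[1, 2], [3], [4, 5, 6]]

# Schlemiel the Painter: each + copies everything accumulated so far,
# so the total work grows quadratically with the number of chunks.
slow = []
for chunk in chunks:
    slow = slow + chunk

# Linear alternatives: one pass over all the elements.
fast = list(chain.from_iterable(chunks))
joined = "".join(["ab", "cd", "ef"])
```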

As to why raising is worse: First, none of the other situations I listed above raises for conflicts. Second, there's the experience of str+unicode in Python 2, which raises if the str argument contains any non-ASCII bytes. In fact, we disliked it so much that we changed the language incompatibly to deal with it.

Agreed, I don't like raising. It's consistent with + (the only argument in favor of it really), but it's a bad idea, for all the reasons you mention.

- Josh Rosenberg

Stefan Behnel

Mar 6, 2019, 3:35:03 AM
to python...@python.org
INADA Naoki wrote on 05.03.19 at 08:03:
> On Tue, Mar 5, 2019 at 12:02 AM Stefan Behnel wrote:
>> INADA Naoki wrote on 04.03.19 at 11:15:
>>> Why statement is not enough?
>>
>> I'm not sure I understand why you're asking this, but a statement is
>> "not enough" because it's a statement and not an expression. It does
>> not replace the convenience of an expression.
>
> It seems tautology and say nothing.

That's close to what I thought when I read your question. :)


> What is "convenience of an expression"?

It's the convenience of being able to write an expression that generates
the thing you need, rather than having to split code into statements that
create it step by step before you can use it.

Think of comprehensions versus for-loops. Comprehensions are expressions
that don't add anything to the language that a for-loop cannot achieve.
Still, everyone uses them because they are extremely convenient.


> Is it needed to make Python more readable language?

No, just like comprehensions, it's not "needed". It's just convenient.


> Anyway, If "there is expression" is the main reason for this proposal,
> symbolic operator is not necessary.

As said, "needed" is not the right word. Being able to use an operator
closes a gap in the language. Just like list comprehensions fit generator
expressions and vice versa. There is no "need" for being able to write

[x**2 for x in seq]
{x**2 for x in seq}

when you can equally well write

list(x**2 for x in seq)
set(x**2 for x in seq)

But I certainly wouldn't complain about that redundancy in the language.


> `new = d1.updated(d2)` or `new = dict.merge(d1, d2)` are enough. Python
> preferred name over symbol in general. Symbols are readable and
> understandable only when it has good math metaphor.
>
> Sets has symbol operator because it is well known in set in math, not
> because set is frequently used.
>
> In case of dict, there is no simple metaphor in math.

So then, if "list+list" and "tuple+tuple" wasn't available through an
operator, would you also reject the idea of adding it, arguing that we
could use this:

L = L1.extended(L2)

I honestly do not see the math relation in concatenation via "+".

But, given that "+" and "|" already have the meaning of "merging two
containers into one" in Python, I think it makes sense to allow that also
for dicts.


> It just cryptic and hard to Google.

I honestly doubt that it's something people would have to search for any
more than they have to search for the list "+" operation. My guess is that
it's pretty much what most people would try first when they have the need
to merge two dicts, and only failing that, they would start a web search.

In comparison, very few users would be able to come up with "{**d1, **d2}"
on their own, or even "d1.updated(d2)".

My point is, given the current language, "dict+dict" is a gap that is worth
closing.

Stefan

Brice Parent

Mar 6, 2019, 4:26:02 AM
to python...@python.org

On 05/03/2019 at 23:40, Greg Ewing wrote:
> Steven D'Aprano wrote:
>> The question is, is [recursive merge] behaviour useful enough and
> > common enough to be built into dict itself?
>
> I think not. It seems like just one possible way of merging
> values out of many. I think it would be better to provide
> a merge function or method that lets you specify a function
> for merging values.
>
That's what this conversation led me to. I'm not against the addition
for the most general usage (and the current PEP describes the behaviour I
would expect before reading the doc), but for all other more specific
usages, where we intend any special or not-so-common behaviour, I'd go
with modifying dict.update like this:

foo.update(bar, on_collision=updator)  # Although I'm not a fan of the keyword I used

`updator` being a simple function like this one:

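from typing import Any

def updator(updated, updator, key) -> Any:
    if key == "related":
        # dict.update() returns None, so build the merged dict explicitly
        return {**updated[key], **updator[key]}

    if key == "tags":
        return updated[key] + updator[key]

    if key in ["a", "b", "c"]:  # Those keys keep their existing value
        return updated[key]

    return updator[key]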

There's nothing here that couldn't be done today with a custom
update function, but leaving the burden of checking for keys that are
in both dicts and actually inserting the new values to the language, and
keeping on our side only the parts that are specific to our use case,
makes the code, in my opinion, more readable, with fewer possible bugs and
possibly better optimization.
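
For comparison, the custom update function that can be written today might look roughly like this. The helper name `update_with` and the sample callback `keep_tags` are made up for this sketch; only the callback signature (updated, updator, key) comes from the example above.

```python
def update_with(target, other, on_collision):
    """Update target in place; on_collision(target, other, key) decides shared keys."""
    for key, value in other.items():
        if key in target:
            target[key] = on_collision(target, other, key)
        else:
            target[key] = value

def keep_tags(updated, updator, key):
    if key == "tags":
        return updated[key] + updator[key]   # concatenate the two tag lists
    return updator[key]                      # otherwise the new value wins

foo = {"tags": ["x"], "name": "old"}
bar = {"tags": ["y"], "name": "new", "extra": 1}
update_with(foo, bar, keep_tags)
```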

Inada Naoki

Mar 6, 2019, 4:27:38 AM
to Stefan Behnel, python-ideas
On Wed, Mar 6, 2019 at 5:34 PM Stefan Behnel <stef...@behnel.de> wrote:
>
> INADA Naoki wrote on 05.03.19 at 08:03:
> > On Tue, Mar 5, 2019 at 12:02 AM Stefan Behnel wrote:
> >> INADA Naoki wrote on 04.03.19 at 11:15:
> >>> Why statement is not enough?
> >>
> >> I'm not sure I understand why you're asking this, but a statement is
> >> "not enough" because it's a statement and not an expression. It does
> >> not replace the convenience of an expression.
> >
> > It seems tautology and say nothing.
>
> That's close to what I thought when I read your question. :)
>
>
> > What is "convenience of an expression"?
>
> It's the convenience of being able to write an expression that generates
> the thing you need, rather than having to split code into statements that
> create it step by step before you can use it.
>

I don't think that's a reasonable rationale for adding an operator.

First, Python sometimes forces people to use statements intentionally.

Strictly speaking, dict.update() is an expression. But it does not return `self`,
so you must split the code into statements. That's a design decision.

So "add an operator because I want an expression" is bad reasoning to me.
If it were valid reasoning, every mutating method should have an operator.
That's a crazy idea.
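
The design decision mentioned above is easy to observe (sketch with example values):

```python
d1, d2 = {"a": 1}, {"b": 2}

chained = d1.copy().update(d2)   # update() mutates the copy and returns None

merged = d1.copy()               # so the merge needs statements of its own
merged.update(d2)
```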

Second, an operator is not required for an expression. And adding an operator
must have a higher bar than adding a method, because it introduces more
complexity and it can seem cryptic, especially when the operator doesn't
have a good math metaphor.

So I proposed adding dict.merge() instead of adding dict + as a counter
proposal.

If "I want an expression" is really the main motivation, it must be enough.


> Think of comprehensions versus for-loops. Comprehensions are expressions
> that don't add anything to the language that a for-loop cannot achieve.
> Still, everyone uses them because they are extremely convenient.
>

I agree that comprehensions are extremely convenient.
But I think the main reason is that they are compact and readable.
If comprehensions were no more compact and readable than a for-loop, they
would not be extremely convenient.

>
> > Is it needed to make Python more readable language?
>
> No, just like comprehensions, it's not "needed". It's just convenient.
>

I think comprehensions are needed to make Python a more readable language,
not just for convenience.

>
> > Anyway, If "there is expression" is the main reason for this proposal,
> > symbolic operator is not necessary.
>
> As said, "needed" is not the right word.

Maybe I misunderstood the nuance of the word "needed". English and Japanese
are very different languages. Sorry.

> Being able to use an operator
> closes a gap in the language. Just like list comprehensions fit generator
> expressions and vice versa. There is no "need" for being able to write
>
> [x**2 for x in seq]
> {x**2 for x in seq}
>
> when you can equally well write
>
> list(x**2 for x in seq)
> set(x**2 for x in seq)
>
> But I certainly wouldn't complain about that redundancy in the language.
>

OK, I must agree on this point. [] and {} have good metaphors in math.
We use [1, 2, 3, ...] for series, and {1, 2, 3, ...} for sets.

>
> > `new = d1.updated(d2)` or `new = dict.merge(d1, d2)` are enough. Python
> > preferred name over symbol in general. Symbols are readable and
> > understandable only when it has good math metaphor.
> >
> > Sets has symbol operator because it is well known in set in math, not
> > because set is frequently used.
> >
> > In case of dict, there is no simple metaphor in math.
>
> So then, if "list+list" and "tuple+tuple" wasn't available through an
> operator, would you also reject the idea of adding it, arguing that we
> could use this:
>
> L = L1.extended(L2)
>
> I honestly do not see the math relation in concatenation via "+".
>

First of all, concatenating sequences (especially str) is far more
frequent than merging dicts. My point is that dict + dict is a bigger abuse
of + than seq + seq, and it is used less often than seq + seq.

Let me describe why I think dict+dict is a "major" abuse.

As I said before, it's common to assign an operator to concatenation in
regular languages, although the middle dot is the one commonly used.

When the commonly-used operator is not in ASCII, another symbol can
be used as an alternative. We use | instead of ∪.

In the case of dict, it's not common to assign an operator to merging in math,
as far as I know.

(Maybe "direct sum" ⊕ is similar to it. But it doesn't allow intersection.
So ValueError must be raised for duplicated keys if we use "direct sum"
as the metaphor.
But direct sum is higher-level math than "union" of sets. I don't think
it's a good idea to use it as a metaphor.)

That's one of the reasons I think seq + seq is a "little" abuse and dict +
dict is a "major" abuse.

Another reason is that "throw some values away" doesn't fit the mental model
of "sum", as I already said in an earlier mail.


> But, given that "+" and "|" already have the meaning of "merging two
> containers into one" in Python, I think it makes sense to allow that also
> for dicts.
>

+ is used for concatenation, which is stricter than just merging.

If + is allowed for dict, set should support it too for consistency.
Then the meaning of "+ for containers" becomes "sum up two containers in
some way, defined by the container type." That's consistent.

Kotlin uses + for this meaning. Scala uses ++ for this meaning.

But this is a large design change to the language. Is it really required?
I feel adding a method is enough.

--
Inada Naoki <songof...@gmail.com>

Rémi Lapeyre

Mar 6, 2019, 4:51:43 AM
to Brice Parent, python...@python.org
On 6 March 2019 at 10:26:15, Brice Parent
(con...@brice.xyz) wrote:

>
> On 05/03/2019 at 23:40, Greg Ewing wrote:
> > Steven D'Aprano wrote:
> >> The question is, is [recursive merge] behaviour useful enough and
> > > common enough to be built into dict itself?
> >
> > I think not. It seems like just one possible way of merging
> > values out of many. I think it would be better to provide
> > a merge function or method that lets you specify a function
> > for merging values.
> >
> That's what this conversation led me to. I'm not against the addition
> for the most general usage (and the current PEP describes the behaviour I
> would expect before reading the doc), but for all other more specific
> usages, where we intend any special or not-so-common behaviour, I'd go
> with modifying dict.update like this:
>
> foo.update(bar, on_collision=updator) # Although I'm not a fan of the keyword I used

This won’t be possible: update() already takes keyword arguments:

>>> foo = {}
>>> bar = {'a': 1}
>>> foo.update(bar, on_collision=lambda e: e)
>>> foo
{'a': 1, 'on_collision': <function <lambda> at 0x10b8df598>}

Ka-Ping Yee

Mar 6, 2019, 5:30:49 AM
to Rémi Lapeyre, python...@python.org
len(dict1 + dict2) does not equal len(dict1) + len(dict2), so using the + operator is nonsense.

len(dict1 + dict2) cannot even be computed by any expression involving +.  Using len() to test the semantics of the operation is not arbitrary; the fact that the sizes do not add is a defining quality of a merge.  This is a merge, not an addition.  The proper analogy is to sets, not lists.

The operators should be |, &, and -, exactly as for sets, and the behaviour defined with just three rules:

1. The keys of dict1 [op] dict2 are the elements of dict1.keys() [op] dict2.keys().

2. The values of dict2 take priority over the values of dict1.

3. When either operand is a set, it is treated as a dict whose values are None.

This yields many useful operations and, most importantly, is simple to explain.  "sets and dicts can |, &, -" takes up less space in your brain than "sets can |, &, - but dicts can only + and -, where dict + is like set |".

merge and update some items:

    {'a': 1, 'b': 2} | {'b': 3, 'c': 4} => {'a': 1, 'b': 3, 'c': 4}

pick some items:

    {'a': 1, 'b': 2} & {'b': 3, 'c': 4} => {'b': 3}

remove some items:

    {'a': 1, 'b': 2} - {'b': 3, 'c': 4} => {'a': 1}

reset values of some keys:

    {'a': 1, 'b': 2} | {'b', 'c'} => {'a': 1, 'b': None, 'c': None}

ensure certain keys are present:

    {'b', 'c'} | {'a': 1, 'b': 2} => {'a': 1, 'b': 2, 'c': None}

pick some items:

    {'b', 'c'} & {'a': 1, 'b': 2} => {'b': 2}

remove some items:

    {'a': 1, 'b': 2} - {'b', 'c'} => {'a': 1}
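
The three rules can be sketched as ordinary functions to check them against the examples above. The function names `as_dict`, `dict_or`, `dict_and` and `dict_sub` are invented for this sketch; the suggestion itself is for the operators |, & and - on dict.

```python
def as_dict(x):
    """Rule 3: treat a set as a dict whose values are all None."""
    return dict.fromkeys(x) if isinstance(x, (set, frozenset)) else x

def dict_or(a, b):
    a, b = as_dict(a), as_dict(b)
    return {**a, **b}                      # rule 2: right-hand values win

def dict_and(a, b):
    a, b = as_dict(a), as_dict(b)
    return {k: b[k] for k in a if k in b}  # rule 1: keys are a.keys() & b.keys()

def dict_sub(a, b):
    a, b = as_dict(a), as_dict(b)
    return {k: v for k, v in a.items() if k not in b}
```

For instance, dict_and({'b', 'c'}, {'a': 1, 'b': 2}) gives {'b': 2}, matching the second "pick some items" example.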

Rhodri James

Mar 6, 2019, 6:53:44 AM
to python...@python.org
On 06/03/2019 10:29, Ka-Ping Yee wrote:
> len(dict1 + dict2) does not equal len(dict1) + len(dict2), so using the +
> operator is nonsense.

I'm sorry, but you're going to have to justify why this identity is
important. Making assumptions about length where any dictionary
manipulations are concerned seems unwise to me, which makes a nonsense
of your claim that this is nonsense :-)

--
Rhodri James *-* Kynesim Ltd

Brice Parent

Mar 6, 2019, 7:19:02 AM
to Rémi Lapeyre, python...@python.org

On 06/03/2019 at 10:50, Rémi Lapeyre wrote:
>> On 05/03/2019 at 23:40, Greg Ewing wrote:
>>> Steven D'Aprano wrote:
>>>> The question is, is [recursive merge] behaviour useful enough and
>>>> common enough to be built into dict itself?
>>> I think not. It seems like just one possible way of merging
>>> values out of many. I think it would be better to provide
>>> a merge function or method that lets you specify a function
>>> for merging values.
>>>
>> That's what this conversation led me to. I'm not against the addition
>> for the most general usage (and the current PEP describes the behaviour I
>> would expect before reading the doc), but for all other more specific
>> usages, where we intend any special or not-so-common behaviour, I'd go
>> with modifying dict.update like this:
>>
>> foo.update(bar, on_collision=updator) # Although I'm not a fan of the keyword I used
> This won’t be possible: update() already takes keyword arguments:
>
>>>> foo = {}
>>>> bar = {'a': 1}
>>>> foo.update(bar, on_collision=lambda e: e)
>>>> foo
> {'a': 1, 'on_collision': <function <lambda> at 0x10b8df598>}

I don't see that as a problem at all.
Having **kwargs in a function's signature doesn't prevent it from
having explicit keyword arguments at the same time:
`def foo(bar="baz", **kwargs):` is perfectly valid, as is `def
spam(ham: Dict, eggs="blah", **kwargs):`, so `update(other,
on_collision=None, **added)` is too, no? The major implication of such a
modification of the Dict.update method is that when you're using it
with keyword arguments (as opposed to passing another dict/iterable
as positional), you're making a small non-backward-compatible change:
if some code was already using the keyword that would
be chosen (here "on_collision"), that code would be broken by the new
feature.
I had never tried to pass a dict and kw arguments together, as it seemed
to me that it wasn't supported (I would even have expected an exception
to be raised), but it's probably my level of English that isn't high
enough to get it right, or this part of the doc that doesn't describe
well the full possible usage of the method (see here:
https://docs.python.org/3/library/stdtypes.html#dict.update). Anyway, if
the keyword is selected wisely, the collision case will almost never
happen, and be quite easy to correct if it ever happened.
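
A minimal sketch of the signature described above, written as a free function because dict.update itself has no such parameter. The name update_with and the callback convention (old value, new value -> merged value) are assumptions made for illustration, not anything proposed in the PEP.

```python
def update_with(target, other=(), on_collision=None, **added):
    """Hypothetical helper: like dict.update, plus a collision callback."""
    # Gather the incoming pairs the same way dict.update would.
    incoming = dict(other)
    incoming.update(added)
    for key, new in incoming.items():
        if key in target and on_collision is not None:
            # Assumed convention: the callback receives (old, new).
            target[key] = on_collision(target[key], new)
        else:
            target[key] = new
    return target


totals = {"a": 1, "b": 2}
update_with(totals, {"b": 10}, on_collision=lambda old, new: old + new, c=3)
print(totals)  # {'a': 1, 'b': 12, 'c': 3}
```

With this signature a caller can no longer store a key literally named "on_collision" through keyword arguments, which is the compatibility cost mentioned above.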

Jonathan Fine

Mar 6, 2019, 7:21:57 AM
to python-ideas
Ka-Ping Yee wrote:
>
> len(dict1 + dict2) does not equal len(dict1) + len(dict2), so using the + operator is nonsense.
>
> len(dict1 + dict2) cannot even be computed by any expression involving +. Using len() to test the semantics of the operation is not arbitrary; the fact that the sizes do not add is a defining quality of a merge. This is a merge, not an addition. The proper analogy is to sets, not lists.

For me, this comment is excellent. It neatly expresses the central
concern about this proposal. I think most of us will agree that the
proposal is to use '+' to express a merge operation, namely update.
(There are other merge operations, when there are two values to
combine, such as taking the min or max of the two values.)

Certainly, many of the posts quite naturally use the word merge.
Indeed PEP 584 writes "This PEP suggests adding merge '+' and
difference '-' operators to the built-in dict class."

We would all agree that it would be obviously wrong to suggest adding
merge '-' and difference '+' operators. (Note: I've swapped '+' and
'-'.) And why? Because it is obviously wrong to use '-' to denote
merge, etc.

Some of us are also upset by the use of '+' to denote merge. By the
way, there is already a widespread symbol for merge. It appears on
many road signs. It looks like an upside down 'Y'. It even has merge
left and merge right versions.

Python already has operator symbols '+', '-', '*', '/' and so on. See
https://docs.python.org/3/reference/lexical_analysis.html#operators

Perhaps we should add a merge or update symbol to this list, so that
we don't overload to breaking point the humble '+' operator. Although
that would make Python a bit more like APL.

By the way, Pandas already has a merge operation, called merge, that
takes many parameters. I've only glanced at it.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

--
Jonathan

Josh Rosenberg

Mar 6, 2019, 7:22:04 AM
to Python-Ideas
On Wed, Mar 6, 2019 at 11:52 AM Rhodri James <rho...@kynesim.co.uk> wrote:
> On 06/03/2019 10:29, Ka-Ping Yee wrote:
>> len(dict1 + dict2) does not equal len(dict1) + len(dict2), so using the +
>> operator is nonsense.
>
> I'm sorry, but you're going to have to justify why this identity is
> important. Making assumptions about length where any dictionary
> manipulations are concerned seems unwise to me, which makes a nonsense
> of your claim that this is nonsense :-)

It's not "nonsense" per se. If we were inventing programming languages in a vacuum, you could say + can mean "arbitrary combination operator" and it would be fine. But we're not in a vacuum; every major language that uses + with general purpose containers uses it to mean element-wise addition or concatenation, not just "merge". Concatenation is what imposes that identity (and all the others people are defending, like no loss of input values); you're taking a sequence of things, and shoving another sequence of things on the end of it, preserving order and all values.

The argument here isn't that you *can't* make + do arbitrary merges that don't adhere to these semantics. It's that doing so adds yet a third meaning to +, for limited benefit. (And it is a third meaning: it has no precedent in any existing type in Python, nor in any other major language; even the minor languages that allow it use + for sets as well, so Python using + would make Python itself internally inconsistent with the operators used for set.)

- Josh Rosenberg

Jonathan Fine

Mar 6, 2019, 7:35:30 AM
to python-ideas
Rhodri James wrote:

> Making assumptions about length where any dictionary
> manipulations are concerned seems unwise to me

I think you're a bit hasty here. Some assumptions are sensible. Suppose

a = len(d1)
b = len(d2)
c = len(d1 + d2) # Using the suggested syntax.

Then we know
max(a, b) <= c <= a + b

And this is, in broad terms, characteristic of merge operations.
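
A quick check of that bound, as a sketch only. It uses {**d1, **d2}, the existing spelling of the merge that the PEP would write as d1 + d2; the helper name merged and the sample dicts are made up for illustration.

```python
def merged(d1, d2):
    # {**d1, **d2} is today's spelling of the merge the PEP calls d1 + d2.
    return {**d1, **d2}


cases = [
    ({}, {}),
    ({"a": 1}, {"a": 2}),                    # full overlap
    ({"a": 1, "b": 2}, {"b": 3, "c": 4}),    # partial overlap
    ({"a": 1}, {"b": 2}),                    # disjoint
]

for d1, d2 in cases:
    a, b, c = len(d1), len(d2), len(merged(d1, d2))
    # The lower bound is hit on full overlap, the upper bound when disjoint.
    assert max(a, b) <= c <= a + b
    print(a, b, c)
```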
--
Jonathan

David Mertz

Mar 6, 2019, 7:53:36 AM
to Ka-Ping Yee, python-ideas
I strongly agree with Ka-Ping. '+' is intuitively concatenation not merging. The behavior is overwhelmingly more similar to the '|' operator in sets (whether or not a user happens to know the historical implementation overlap).

I think growing the full collection of set operations would be a pleasant addition to dicts. I think shoe-horning in plus would always be jarring to me.

Chris Angelico

Mar 6, 2019, 7:55:02 AM
to python-ideas
On Wed, Mar 6, 2019 at 11:18 PM Brice Parent <con...@brice.xyz> wrote:
> The major implication of such a
> modification of the Dict.update method is that when you're using it
> with keyword arguments (as opposed to passing another dict/iterable
> as positional), you're making a small non-backward-compatible change:
> if some code was already using the keyword that would
> be chosen (here "on_collision"), that code would be broken by the new
> feature.
> Anyway, if
> the keyword is selected wisely, the collision case will almost never
> happen, and be quite easy to correct if it ever happened.

You can make it unlikely, yes, but I'd dispute "easy to correct".
Let's suppose that someone had indeed used the chosen keyword (and
remember, the more descriptive the argument name, the more likely that
it'll be useful elsewhere and therefore have a collision). How would
they discover this? If they're really lucky, there MIGHT be an
exception (if on_collision accepts only a handful of keywords, and the
collision isn't one of them), but if your new feature is sufficiently
flexible, that might not happen. There'll just be incorrect behaviour.

As APIs go, using specific keyword args at the same time as **kw is a
bit odd. Consider:

button_options.update(button_info, on_click=frobnicate, style="KDE",
on_collision="replace")

It's definitely not obvious which of those will end up in the
dictionary and which won't. Big -1 from me on that change.
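
To make the silent change concrete, here is a sketch comparing today's behaviour with a stand-in for the modified method. update_new is hypothetical and leaves out the collision policy itself; frobnicate and button_info are placeholders for the names in the call above.

```python
def frobnicate():
    """Stand-in click handler, for illustration only."""


button_info = {"label": "OK"}

# Behaviour today: every keyword, including on_collision, is stored as a key.
button_options = {}
button_options.update(button_info, on_click=frobnicate, style="KDE",
                      on_collision="replace")
print(sorted(button_options))
# ['label', 'on_click', 'on_collision', 'style']


def update_new(target, other=(), on_collision=None, **added):
    """Hypothetical future signature: on_collision is consumed, not stored."""
    target.update(other, **added)  # the collision policy itself is omitted


changed = {}
update_new(changed, button_info, on_click=frobnicate, style="KDE",
           on_collision="replace")
print(sorted(changed))  # ['label', 'on_click', 'style']
```

No exception is raised in either case; the "on_collision" entry simply stops appearing in the dict.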

ChrisA

Brice Parent

Mar 6, 2019, 8:41:09 AM
to python...@python.org

That's indeed a good point, even if the correction is quite easy to make
in most cases, with keyword-only changes:

button_options.update(dict(on_click=frobnicate, style="KDE", on_collision="replace"))
# or
button_options.update(dict(on_collision="replace"), on_click=frobnicate, style="KDE")

In the exact case you proposed, it could become a two-liner:

button_options.update(button_info)
button_options.update(dict(on_click=frobnicate, style="KDE", on_collision="replace"))

In my code, I would probably make it into 2 lines, to make clear that we
have 2 levels of data merging, one that is general (the first), and one
that is specific to this use-case (as it's hard-coded),
but some people do care about the number of lines.

But for the other part of your message, I 100% agree with you. The main
problem with such a change is not (to me) that it can break some edge
cases, but that it would potentially break them silently. And that, I
agree, is worth a big -1 I guess.

Michael Lee

Mar 6, 2019, 9:00:04 AM
to David Mertz, python-ideas
> I strongly agree with Ka-Ping. '+' is intuitively concatenation not merging. The behavior is overwhelmingly more similar to the '|' operator in sets (whether or not a user happens to know the historical implementation overlap).

I think the behavior proposed in the PEP makes sense whether you think of "+" as meaning "concatenation" or "merging".

If your instinct is to assume "+" means "concatenation", then it would be natural to assume that {"a": 1, "b": 2} + {"c": 3, "b": 4} would be identical to {"a": 1, "b": 2, "c": 3, "b": 4} -- literally concat the key-value pairs into a new dict.

But of course, you can't have duplicate keys in Python. So, you would either recall or look up how duplicate keys are handled when constructing a dict and learn that the rule is that the right-most key wins. So the natural conclusion is that "+" would follow this existing rule -- and you end up with exactly the behavior described in the PEP.

This also makes explaining the behavior of "d1 + d2" slightly easier than explaining "d1 | d2". For the former, you can just say "d1 + d2 means we concat the two dicts together" and stop there. You almost don't need to explain the merging/right-most key wins behavior at all, since that behavior is the only one consistent with the existing language rules.

In contrast, you *would* need to explain this with "d1 | d2": I would mentally translate this expression to mean "take the union of these two dicts" and there's no real way to deduce which key-value pair ends up in the final dict given that framing. Why is it that key-value pairs in d2 win over pairs in d1 here? That choice seems pretty arbitrary when you think of this operation in terms of unions, rather than either concat or merge.

Using "|" would also violate an important existing property of unions: the invariant "d1 | d2 == d2 | d1" is no longer true. As far as I'm aware, the union operation is always taken to be commutative in math, and so I think it's important that we preserve that property in Python. At the very least, I think it's far more important to preserve commutativity of unions than it is to preserve some of the invariants I've seen proposed above, like "len(d1 + d2) == len(d1) + len(d2)".

Personally, I don't really have a strong opinion on this PEP, or the other one I've seen proposed where we add a "d1.merge(d2, d3, ...)". But I do know that I'm a strong -1 on adding set operations to dicts: it's not possible to preserve the existing semantics of union (and intersection) with dict, and I think expressions like "d1 | d2" and "d1 & d2" would just be confusing and misleading to encounter in the wild.
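
The commutativity point can be checked with the spelling that already exists. This sketch uses {**d1, **d2} in place of the proposed operator, with the two dicts from the example above.

```python
d1 = {"a": 1, "b": 2}
d2 = {"c": 3, "b": 4}

# Existing spelling of the proposed merge: the right-most value for "b" wins.
print({**d1, **d2})  # {'a': 1, 'b': 4, 'c': 3}
print({**d2, **d1})  # {'c': 3, 'b': 2, 'a': 1}
print({**d1, **d2} == {**d2, **d1})  # False

# Set union, by contrast, gives equal results in either order.
s1, s2 = set(d1), set(d2)
print(s1 | s2 == s2 | s1)  # True
```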

-- Michael


Chris Angelico

Mar 6, 2019, 9:06:19 AM
to python-ideas
On Thu, Mar 7, 2019 at 12:59 AM Michael Lee <michael....@gmail.com> wrote:
> If your instinct is to assume "+" means "concatenation", then it would be natural to assume that {"a": 1, "b": 2} + {"c": 3, "b": 4} would be identical to {"a": 1, "b": 2, "c": 3, "b": 4} -- literally concat the key-value pairs into a new dict.
>
> But of course, you can't have duplicate keys in Python. So, you would either recall or look up how duplicate keys are handled when constructing a dict and learn that the rule is that the right-most key wins. So the natural conclusion is that "+" would follow this existing rule -- and you end up with exactly the behavior described in the PEP.
>

Which, by the way, is also consistent with assignment:

d = {}; d["a"] = 1; d["b"] = 2; d["c"] = 3; d["b"] = 4

Rightmost one wins. It's the most logical behaviour.
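
As a small illustration, the dict display, the sequence of assignments and update() all end up with the same dict. The variable names are arbitrary.

```python
literal = {"a": 1, "b": 2, "c": 3, "b": 4}

assigned = {}
assigned["a"] = 1
assigned["b"] = 2
assigned["c"] = 3
assigned["b"] = 4

updated = {"a": 1, "b": 2}
updated.update({"c": 3, "b": 4})

print(literal)  # {'a': 1, 'b': 4, 'c': 3}
print(literal == assigned == updated)  # True
```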

ChrisA

Jonathan Fine

Mar 6, 2019, 9:53:58 AM
to python-ideas
Michael Lee wrote:

> If your instinct is to assume "+" means "concatenation", then it would be natural to assume that {"a": 1, "b": 2} + {"c": 3, "b": 4} would be identical to {"a": 1, "b": 2, "c": 3, "b": 4} -- literally concat the key-value pairs into a new dict.

> But of course, you can't have duplicate keys in Python. So, you would either recall or look up how duplicate keys are handled when constructing a dict and learn that the rule is that the right-most key wins. So the natural conclusion is that "+" would follow this existing rule -- and you end up with exactly the behavior described in the PEP.

This is a nice argument. And well presented. And it gave me a surprise,
that taught me something. Here goes:

>>> {'a': 0}
{'a': 0}
>>> {'a': 0, 'a': 0}
{'a': 0}
>>> {'a': 0, 'a': 1}
{'a': 1}
>>> {'a': 1, 'a': 0}
{'a': 0}

This surprised me quite a bit. I was expecting to get an exception. However

>>> dict(a=0)
{'a': 0}
>>> dict(a=0, a=0)
SyntaxError: keyword argument repeated

does give an exception.

I wonder, is this behaviour of {'a': 0, 'a': 1} documented (or tested)
anywhere? I didn't find it in these URLs:
https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
https://docs.python.org/3/tutorial/datastructures.html#dictionaries

I think this behaviour might give rise to gotchas. For example, if we
define inverse_f by
>>> inverse_f = { f(a): a, f(b): b }
then is the next statement always true (assuming a != b)?
>>> inverse_f[ f(a) ] == a

Well, it's not true with these values
>>> a, b = 1, 2
>>> def f(n): pass # There's a bug here, f(n) should be a bijection.

A quick check that len(inverse_f) == 2 would provide a sanity check. Or
perhaps better, len(inverse_f) == len({a, b}). (I don't have an
example of this bug appearing 'in the wild'.)
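
Put together as a runnable sketch (the buggy f is the one above; everything else is just the check spelled out):

```python
def f(n):
    # Deliberately buggy: returns None for every input, so it is not injective.
    pass


a, b = 1, 2
inverse_f = {f(a): a, f(b): b}

print(inverse_f)             # {None: 2}
print(inverse_f[f(a)] == a)  # False: the entry for a was overwritten by b

# Sanity check: an injective f gives one entry per distinct input.
print(len(inverse_f) == len({a, b}))  # False, which exposes the bug
```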

Once again, I thank Michael for his nice, instructive and
well-presented example.
--
Jonathan

Inada Naoki

Mar 6, 2019, 11:30:16 AM
to Michael Lee, python-ideas
On Wed, Mar 6, 2019 at 10:59 PM Michael Lee <michael....@gmail.com> wrote:

>
> I think the behavior proposed in the PEP makes sense whether you think of "+" as meaning "concatenation" or "merging".
>
> If your instinct is to assume "+" means "concatenation", then it would be natural to assume that {"a": 1, "b": 2} + {"c": 3, "b": 4} would be identical to {"a": 1, "b": 2, "c": 3, "b": 4} -- literally concat the key-value pairs into a new dict.
>

Nice explanation. You reduced my opposition to `+` with "literally concat".
A better example: {"a": 1, "b": 2} + {"c": 4, "b": 3} == {"a": 1, "b":
2, "c": 4, "b": 3} == {"a": 1, "b": 3, "c": 4}

On the other hand, the union of sets is also "literally concat". If we use
this "literally concat" metaphor,
I still think set should have `+` as an alias to `|` for consistency.

>
> Using "|" would also violate an important existing property of unions: the invariant "d1 | d2 == d2 | d1" is no longer true. As far as I'm aware, the union operation is always taken to be commutative in math, and so I think it's important that we preserve that property in Python. At the very least, I think it's far more important to preserve commutativity of unions than it is to preserve some of the invariants I've seen proposed above, like "len(d1 + d2) == len(d1) + len(d2)".
>

I think both rules are "rather a coincidence than a conscious decision".

I think "|" keeps commutativity only because it is used less than `+`. A commonly
used operator is abused more easily than a minor one.

And I think every "coincidence" rule is important. They make
understanding Python easy.
Everyone "discovers" rules and consistency while learning a language.

This is a matter of balance. There is no right answer. Someone
*feels* rule A is more important than B.
Someone else feels the opposite.


> But I do know that I'm a strong -1 on adding set operations to dicts: it's not possible to preserve the existing semantics of union (and intersection) with dict, and I think expressions like "d1 | d2" and "d1 & d2" would just be confusing and misleading to encounter in the wild.

Hmm. The PEP proposed dict - dict, which is similar to set - set (difference).
To me, {"a": 1, "b": 2} - {"b": 3} = {"a": 1} is more confusing than {"a":
1, "b": 2} - {"b"} = {"a": 1}.

So I think borrowing some semantics from set is a good idea.
Both `dict - set` and `dict & set` make sense to me.

* `dict - set` can be used to remove private keys by "blacklist".
* `dict & set` can be used to choose public keys by "whitelist".
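
Neither operator exists on dict today, so the following is only a sketch of what the two bullets would mean, written with comprehensions. The function names without and only, and the sample record, are invented for illustration.

```python
def without(d, keys):
    """Hypothetical spelling of `dict - set`: drop the listed keys."""
    return {k: v for k, v in d.items() if k not in keys}


def only(d, keys):
    """Hypothetical spelling of `dict & set`: keep only the listed keys."""
    return {k: v for k, v in d.items() if k in keys}


user = {"name": "alice", "email": "a@example.org", "_password": "hunter2"}

print(without(user, {"_password"}))
# {'name': 'alice', 'email': 'a@example.org'}
print(only(user, {"name"}))
# {'name': 'alice'}
```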

--
Inada Naoki <songof...@gmail.com>

Jonathan Fine

Mar 6, 2019, 12:10:39 PM
to python-ideas
SUMMARY: The outcome of a search for: python dict literal duplicate
keys. No conclusions (so far).

BACKGROUND
In the thread "PEP: Dict addition and subtraction" I wrote

> >>> {'a': 0, 'a': 1}
> {'a': 1}

> I wonder, is this behaviour of {'a': 0, 'a': 1} documented (or tested)
> anywhere? I didn't find it in these URLs:
> https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
> https://docs.python.org/3/tutorial/datastructures.html#dictionaries

LINKS
I've since found some relevant URLs.

[1] https://stackoverflow.com/questions/34539772/is-a-dict-literal-containing-repeated-keys-well-defined
[2] https://help.semmle.com/wiki/display/PYTHON/Duplicate+key+in+dict+literal
[3] https://bugs.python.org/issue26910
[4] https://bugs.python.org/issue16385
[5] https://realpython.com/python-dicts/

ANALYSIS
[1] gives a reference to [6], which correctly states the behaviour of
{'a':0, 'a':1}, although without giving an example. (Aside: Sometimes
one example is worth 50 or more words.)

[2] is from Semmle, who provide an automated code review tool, called
LGTM. The page [2] appears to be part of the documentation for LGTM.
This page provides a useful link to [7].

[3] is a re-opening of [4]. It was rapidly closed by David Murray, who
recommended reopening the discussion on python-ideas.
[4] was raised by Albert Ferras, based on his real-world experience.
In particular, a configuration file that contains a long dict literal.
This was closed by Benjamin Peterson, who said that raising an error
was "out of the question for compatibility issues". Given few use
cases and little support on python-ideas, Terry Reedy supported the
closure. Raymond Hettinger supported the closure.

[5] is from RealPython, who provide online tutorials. This page
contains the statement "a given key can appear in a dictionary only
once. Duplicate keys are not allowed." Note that
{'a': 0, 'a': 1}
can reasonably be thought of as a dictionary with duplicate keys.

NOTE
As I recall SGML (this shows my age) allows multiple entity declarations, as in
<!ENTITY key "original">
<!ENTITY key "updated">

And as I recall, in SGML the first value "original" is the one that is
in effect. This is what happens with the LaTeX command
\providecommand.

FURTHER LINKS
[6] https://docs.python.org/3/reference/expressions.html#dictionary-displays
[7] https://cwe.mitre.org/data/definitions/561.html # CWE-561: Dead Code

Jonathan Fine

Mar 6, 2019, 12:14:02 PM
to python-ideas
I wrote:

> I wonder, is this behaviour of {'a': 0, 'a': 1} documented (or tested)
> anywhere?

I've answered my own question here:
[Python-ideas] dict literal allows duplicate keys
https://mail.python.org/pipermail/python-ideas/2019-March/055717.html

Finally, Christopher Barker wrote:
> Yes, and had already been brought up in this thread ( I think by Guido). (Maybe not well documented, but certainly well understood and deliberate)

Thank you for this, Christopher.

Inada Naoki

Mar 6, 2019, 12:20:13 PM
to Jonathan Fine, python-ideas
https://docs.python.org/3/reference/expressions.html#dictionary-displays

> If a comma-separated sequence of key/datum pairs is given, they are evaluated from left to right to define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum. This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given.

--
Inada Naoki <songof...@gmail.com>

Jonathan Fine

Mar 6, 2019, 12:48:02 PM
to python-ideas
SUMMARY: Off-thread-topic comment on examples and words in documentation.

Inada Naoki quoted (from doc.python ref [6] in my original post):

> > If a comma-separated sequence of key/datum pairs is given, they are evaluated from left to right to define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum. This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given.

Indeed. Although off-topic, I think

>>> {'a': 0, 'a': 1} == {'a': 1}
True

is much better than "This means that you can specify the same key
multiple times in the key/datum list, and the final dictionary’s value
for that key will be the last one given."

By the way, today I think we'd say key/value pairs. And I've read

https://www.theguardian.com/guardian-observer-style-guide-d
data takes a singular verb (like agenda), though strictly a plural;
you come across datum, the singular of data, about as often as you
hear about an agendum

Oh, and "the final dictionary's value" should I think be "the
dictionary's final value" or perhaps just "the dictionary's value"

But now we're far from the thread topic. I'm happy to join in on a
thread on improving documentation (by using simpler language and good
examples).

Rhodri James

Mar 6, 2019, 1:13:21 PM
to python...@python.org
On 06/03/2019 17:43, Jonathan Fine wrote:
> Indeed. Although off-topic, I think
>
>>>> {'a': 0, 'a': 1} == {'a': 1}
> True
>
> is much better than "This means that you can specify the same key
> multiple times in the key/datum list, and the final dictionary’s value
> for that key will be the last one given."

I disagree. An example is an excellent thing, but the words are
definitive and must be there.

--
Rhodri James *-* Kynesim Ltd

Rhodri James

Mar 6, 2019, 1:14:40 PM
to python...@python.org
On 06/03/2019 18:12, Rhodri James wrote:
> On 06/03/2019 17:43, Jonathan Fine wrote:
>> Indeed. Although off-topic, I think
>>
>>>>> {'a': 0, 'a': 1} == {'a': 1}
>> True
>>
>> is much better than "This means that you can specify the same key
>> multiple times in the key/datum list, and the final dictionary’s value
>> for that key will be the last one given."
>
> I disagree.  An example is an excellent thing, but the words are
> definitive and must be there.

Sigh. I hit SEND before I finished changing the title. Sorry, folks.

Michael Lee

Mar 6, 2019, 1:24:38 PM
to Inada Naoki, python-ideas
> If we use this "literally concat" metaphor, I still think set should have `+` as alias to `|` for consistency.

I agree.

> I think "|" keeps commutativity only because it is used less than `+`.

I suppose that's true, fair point.

I guess I would be ok with | no longer always implying commutativity if we were repurposing it for some radically different purpose. But dicts and sets are similar enough that I think having them both use similar but ultimately different definitions of "|" is going to have non-zero cost, especially when reading or modifying future code that makes heavy use of both data structures.

Maybe that cost is worth it. I'm personally not convinced, but I do think it should be taken into account.

> Hmm. The PEP proposed dict - dict, which is similar to set - set (difference).

Now that you point it out, I think I also dislike `d1 - d2` for the same reasons I listed earlier: it's not consistent with set semantics. One other objection I overlooked is that the PEP currently requires both operands to be dicts when doing "d1 - d2" . So doing {"a": 1, "b": 2, "c": 3} - ["a", "b"] is currently disallowed (though doing d1 -= ["a", "b"] is apparently ok).

I can sympathize: allowing "d1 - some_iter" feels a little too magical to me. But it's unfortunately restrictive -- I suspect removing keys stored within a list or something would be just as common a use case, if not more so, than removing keys stored in another dict.

I propose that we instead add methods like "d1.without_keys(...)" and "d1.remove_keys(...)" that can accept any iterable of keys. These two methods would replace "d1.__sub__(...)" and "d1.__isub__(...)" respectively. The exact method names and semantics could probably do with a little more bikeshedding, but I think this idea would remove a false symmetry between "d1 + d2" and "d1 - d2" that doesn't actually really exist while being more broadly useful.
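
A rough sketch of what those two methods might do, written as free functions since neither exists on dict. The names come from the paragraph above; treating absent keys as silently ignored is my assumption, and is exactly the kind of detail that would need bikeshedding.

```python
def without_keys(d, keys):
    """Return a copy of d lacking the given keys (stand-in for d1 - keys)."""
    drop = set(keys)  # accept any iterable of hashable keys
    return {k: v for k, v in d.items() if k not in drop}


def remove_keys(d, keys):
    """Delete the given keys from d in place (stand-in for d1 -= keys)."""
    for key in keys:
        d.pop(key, None)  # assumption: absent keys are ignored silently


d1 = {"a": 1, "b": 2, "c": 3}
print(without_keys(d1, ["a", "b"]))  # {'c': 3}
print(d1)                            # {'a': 1, 'b': 2, 'c': 3}

remove_keys(d1, ["a", "b", "zzz"])
print(d1)                            # {'c': 3}
```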

Or I guess we could just remove that restriction: "it feels too magical" isn't a great objection on my part. Either way, that part of the PEP could use some more refinement, I think.

-- Michael






Guido van Rossum

Mar 6, 2019, 2:00:10 PM
to Jonathan Fine, python-ideas
Would it shut down this particular subthread if (as the language's designer, if not its BDFL) I declared that this was an explicit design decision that I made nearly 30 years ago? I should perhaps blog about the background of this decision, but it was quite a conscious one. There really is no point in thinking that this is an accident of implementation or could be changed.
--
--Guido van Rossum (python.org/~guido)

Jonathan Fine

Mar 6, 2019, 3:52:33 PM
to Guido van Rossum, python-ideas
Hi Guido

You wrote:

> Would it shut down this particular subthread if (as the language's designer, if not its BDFL) I declared that this was an explicit design decision that I made nearly 30 years ago? I should perhaps blog about the background of this decision, but it was quite a conscious one. There really is no point in thinking that this is an accident of implementation or could be changed.

Thank you for sharing this with us.

I'd be fascinated to hear about the background to this conscious
decision, and I think it would help me and others understand better
what makes Python what it is. And it might help persuade me that my
surprise at {'a': 0, 'a': 1} is misplaced, or at least exaggerated and
one-sided.

Do you want menial help writing the blog? Perhaps if you share your
recollections, others will find the traces in the source code. For
example, I've found the first dictobject.c, dating back to 1994.
https://github.com/python/cpython/blob/956640880da20c20d5320477a0dcaf2026bd9426/Objects/dictobject.c

I'm a great fan of your Python conversation (with Biancuzzi and Warden) in
http://shop.oreilly.com/product/9780596515171.do # Masterminds of Programming

I've read this article several times, and have wished that it was more
widely available. My personal view is that putting a copy of this
article in docs.python.org would provide more benefit to the community
than you blogging on why dict literals allow duplicate keys. However,
it need not be either/or. Perhaps someone could ask the PSF to talk
with O'Reilly about getting copyright clearance to do this.

Finally, some personal remarks. I've got a long training as a pure
mathematician. For me, consistency and the application of simple basic
principles are important, as is the discovery of basic
principles.

In your interview, you say (paraphrased and without context) that most
Python code is written simply to get a job done. And that pragmatism,
rather than being hung up on theoretical concepts, is the
fundamental quality in being proficient in developing with Python.

Thank you for inventing Python, and designing the language. It's a
language popular both with pure mathematicians, and also pragmatic
people who want to get things done. That's quite an achievement, which
has drawn people like me into your community.

with best regards

Greg Ewing

Mar 6, 2019, 5:31:59 PM
to python...@python.org
Ka-Ping Yee wrote:
> len(dict1 + dict2) does not equal len(dict1) + len(dict2), so using the
> + operator is nonsense.

You might as well say that using the + operator on vectors is
nonsense, because len(v1 + v2) is not in general equal to
len(v1) + len(v2).

Yet mathematicians are quite happy to talk about "addition"
of vectors.

--
Greg

Josh Rosenberg

Mar 6, 2019, 6:52:28 PM
to Python-Ideas
On Wed, Mar 6, 2019 at 10:31 PM Greg Ewing <greg....@canterbury.ac.nz> wrote:

> You might as well say that using the + operator on vectors is
> nonsense, because len(v1 + v2) is not in general equal to
> len(v1) + len(v2).
>
> Yet mathematicians are quite happy to talk about "addition"
> of vectors.

Vector addition is *actual* addition, not concatenation. You're so busy loosening the definition of + to make it make sense for dicts that you've forgotten that + is, first and foremost, about addition in the mathematical sense, where vector addition is just one type of addition. Concatenation is already a minor abuse of +, but one commonly accepted by programmers, thanks to it having some similarities to addition and a single, unambiguous set of semantics to avoid confusion.

You're defending + on dicts because vector addition isn't concatenation already, which only shows how muddled things get when you try to use + to mean multiple concepts that are at best loosely related.

The closest I can come to a thorough definition of what + does in Python (and most languages) right now is that:

1. Returns a new thing of the same type (or a shared coerced type for number weirdness)
2. That combines the information of the input operands
3. Is associative ((a + b) + c produces the same thing as a + (b + c)) (modulo floating point weirdness)
4. Is "reversible": Knowing the end result and *one* of the inputs is sufficient to determine the value of the other input; that is, for c = a + b, knowing any two of a, b and c allows you to determine a single unambiguous value for the remaining value (numeric coercion and floating point weirdness make this not 100%, but you can at least know a value equal to other value; e.g. for c = a + b, knowing c is 5.0 and a is 1.0 is sufficient to say that b is equal to 4, even if it's not necessarily an int or float). For numbers, reversal is done with -; for sequences, it's done by slicing c using the length of a or b to "subtract" the elements that came from a/b.
5. (Actual addition only) Is commutative (modulo floating point weirdness); a + b == b + a
6. (Concatenation only) Is order preserving (really a natural consequence of #4, but a property that people expect)

Note that these rules are consistent across most major languages that allow + to mean combine collections (the few that disagree, like Pascal, don't support | as a union operator).

Concatenation is missing element #5, but otherwise aligns with actual addition. dict merges (and set unions for that matter) violate #4 and #6; for c = a + b, knowing c and either a or b still leaves a literally infinite set of possible inputs for the other input (it's not infinite for sets, where the options would be a subset of the result, but for dicts, there would be no such limitation; keys from b could exist with any possible value in a). dicts order preserving aspect *almost* satisfies #6, but not quite (if 'x' comes after 'y' in b, there is no guarantee that it will do so in c, because a gets first say on ordering, and b gets the final word on value).
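
To illustrate the difference on #4 and #6, a small sketch using {**a, **b} as the existing spelling of the proposed merge (the sample values are arbitrary):

```python
# Concatenation is reversible: slice off one operand to recover the other.
a, b = [1, 2], [3, 4, 5]
c = a + b
print(c[len(a):] == b, c[:len(c) - len(b)] == a)  # True True

# A dict merge is not: different left operands give the same result.
right = {"x": 1}
print({**{"x": 0}, **right} == {**{"x": 99}, **right} == {**{}, **right})  # True

# Ordering: the left operand decides position, the right decides value.
left = {"y": 0, "x": 0}
right = {"x": 1, "y": 2}
print(list({**left, **right}.items()))  # [('y', 2), ('x', 1)]
```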

Allowing dicts to get involved in + means:

1. Fewer consistent rules apply to +;
2. The particular idiosyncrasies of Python dict ordering and "which value wins" rules are now tied to +. (For concatenation, there is only one set of possible rules AFAICT, so every language naturally agrees on behavior, but dict merging obviously has many possible rules that would be unlikely to match the exact rules of any other language except by coincidence.) a winning on order and b winning on value is a historical artifact of how Python's dict developed; I doubt any other language would intentionally choose to split responsibility like that if they weren't handcuffed by history.

Again, there's nothing wrong with making dict merges easier. But it shouldn't be done by (further) abusing +.

-Josh Rosenberg

Chris Angelico

Mar 6, 2019, 7:01:26 PM
to Python-Ideas
On Thu, Mar 7, 2019 at 10:52 AM Josh Rosenberg
<shadowranger...@gmail.com> wrote:
> The closest I can come to a thorough definition of what + does in Python (and most languages) right now is that:
>
> 1. Returns a new thing of the same type (or a shared coerced type for number weirdness)
> 2. That combines the information of the input operands
> 3. Is associative ((a + b) + c produces the same thing as a + (b + c)) (modulo floating point weirdness)
> 4. Is "reversible": Knowing the end result and *one* of the inputs is sufficient to determine the value of the other input; that is, for c = a + b, knowing any two of a, b and c allows you to determine a single unambiguous value for the remaining value (numeric coercion and floating point weirdness make this not 100%, but you can at least know a value equal to other value; e.g. for c = a + b, knowing c is 5.0 and a is 1.0 is sufficient to say that b is equal to 4, even if it's not necessarily an int or float). For numbers, reversal is done with -; for sequences, it's done by slicing c using the length of a or b to "subtract" the elements that came from a/b.
> 5. (Actual addition only) Is commutative (modulo floating point weirdness); a + b == b + a
> 6. (Concatenation only) Is order preserving (really a natural consequence of #4, but a property that people expect)
>
> Allowing dicts to get involved in + means:
>
> 1. Fewer consistent rules apply to +;
> 2. The particular idiosyncrasies of Python dict ordering and "which value wins" rules are now tied to + (for concatenation, there is only one set of possible rules AFAICT, so every language naturally agrees on behavior, but dict merging obviously has many possible rules that would be unlikely to match the exact rules of any other language except by coincidence). That a wins on order while b wins on value is a historical artifact of how Python's dict developed; I doubt any other language would intentionally choose to split responsibility like that if they weren't handcuffed by history.
>
> Again, there's nothing wrong with making dict merges easier. But it shouldn't be done by (further) abusing +.

Lots of words that basically say: Stuff wouldn't be perfectly pure.

But adding dictionaries is fundamentally *useful*. It is expressive.
It will, in pretty much all situations, do exactly what someone would
expect, based on knowledge of how Python works in other areas. The
semantics for edge cases have to be clearly defined, but they'll only
come into play on rare occasions; most of the time, for instance, we
don't have to worry about identity vs equality in dictionary keys. If
you tell people "adding two dictionaries combines them, with the right
operand winning collisions", it won't matter that this isn't how lists
or floats work; it'll be incredibly useful as it is.

Practicality. Let's have some.

ChrisA

Ka-Ping Yee

Mar 7, 2019, 12:38:06 PM
to Chris Angelico, Python-Ideas
On Wed, Mar 6, 2019 at 4:01 PM Chris Angelico <ros...@gmail.com> wrote:
> On Thu, Mar 7, 2019 at 10:52 AM Josh Rosenberg
> <shadowranger...@gmail.com> wrote:
> >
> > Allowing dicts to get involved in + means:
>
> Lots of words that basically say: Stuff wouldn't be perfectly pure.
>
> But adding dictionaries is fundamentally *useful*. It is expressive.

It is useful.  It's just that + is the wrong name.

Filtering and subtracting from dictionaries are also useful!  Those are operations we do all the time.  It would be useful if & and - did these things too—and if we have & and -, it's going to be even more obvious that the merge operator should have been |.


Josh Rosenberg <shadowranger...@gmail.com> wrote:
> If we were inventing programming languages in a vacuum, you could say + can mean "arbitrary combination operator" and it would be fine. But we're not in a vacuum; every major language that uses + with general purpose containers uses it to mean element-wise addition or concatenation, not just "merge".

If we were inventing Python from scratch, we could have decided that we always use "+" to combine collections.  Sets would combine with + and then it would make sense that dictionaries also combine with + .

But that is not Python.  Lists combine with + and sets combine with |.  Why?  Because lists add (put both collections together and keep everything), but sets merge (put both collections together and keep some).

So, Python already has a merge operator.  The merge operator is "|".

For lists, += is shorthand for list.extend().
For sets, |= is shorthand for set.update().

Is dictionary merge more like extend() or more like update()?  Python already took a position on that when it was decided to name the dictionary method update().  That ship sailed a long time ago.
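
To make the extend/update distinction concrete, here is a short sketch with made-up data; nothing in it depends on the proposal itself:

```python
# list += behaves like extend(): everything from both sides is kept.
items = [1, 2]
items += [2, 3]

# set |= behaves like update(): duplicates collapse.
tags = {1, 2}
tags |= {2, 3}

# dict.update() also collapses duplicates (by key), with the argument winning.
prices = {"spam": 1, "eggs": 2}
prices.update({"eggs": 20, "ham": 3})
```

The list ends up with four elements, while the set and the dict each end up with three, which is the "keep everything" versus "keep some" difference.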


—Ping

James Lu

Mar 7, 2019, 7:37:47 PM
to python...@python.org
Now, this belongs as a separate PEP, and I probably will write one, but I propose:

d1 << d2 makes a copy of d1 and merges d2 into it, and when the keys conflict, d2 takes priority. (Works like copy/update.)

d1 + d2 makes a new dictionary, taking keys from d1 and d2. If d1 and d2 have a different value for same key, a KeyError is thrown.
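
As a rough sketch of those two behaviours, written as plain functions because the operators do not exist on dict (the function names are hypothetical, and treating equal values on a shared key as acceptable is my reading of "a different value"):

```python
def merge_right_wins(d1, d2):
    """Sketch of the proposed 'd1 << d2': copy d1, then update it with d2."""
    result = dict(d1)
    result.update(d2)
    return result

def merge_strict(d1, d2):
    """Sketch of the proposed 'd1 + d2': reject a shared key with different values."""
    result = dict(d1)
    for key, value in d2.items():
        if key in result and result[key] != value:
            raise KeyError(key)
        result[key] = value
    return result

settings = merge_right_wins({"debug": False, "port": 80}, {"debug": True})
combined = merge_strict({"a": 1, "b": 2}, {"b": 2, "c": 3})  # equal values are fine

try:
    merge_strict({"a": 1}, {"a": 2})
    conflict_detected = False
except KeyError:
    conflict_detected = True
```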

Stephen J. Turnbull

Mar 8, 2019, 12:02:28 AM
to Python-Ideas
Ka-Ping Yee writes:
> On Wed, Mar 6, 2019 at 4:01 PM Chris Angelico <ros...@gmail.com> wrote:

> > But adding dictionaries is fundamentally *useful*. It is expressive.
>
> It is useful. It's just that + is the wrong name.

First, let me say that I prefer ?!'s position here, so my bias is made
apparent. I'm also aware that I have biases so I'm sympathetic to
those who take a different position.

Rather than say it's "wrong", let me instead point out that I think
it's pragmatically troublesome to use "+". I can think of at least
four interpretations of "d1 + d2"

1. update
2. multiset (~= collections.Counter addition)
3. addition of functions into the same vector space (actually, a
semigroup will do ;-), and this is the implementation of
collections.Counter
4. "fiberwise" addition (ie, assembling functions into relations)

and I'm very jet-lagged so I may be missing some.

Since "|" (especially "|=") *is* suitable for "update", I think we
should reserve "+" for some alternative future commutative extension,
of which there are several possible (all of 2, 3, 4 are commutative).
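
A quick sketch of the contrast between interpretation 1 and the Counter-style addition, with {**d1, **d2} standing in for an update-style "d1 + d2" (the sample dicts are made up):

```python
from collections import Counter

d1 = {"a": 1, "b": 2}
d2 = {"b": 3, "c": 4}

# Interpretation 1 (update): the right operand wins, so operand order matters.
update_12 = {**d1, **d2}
update_21 = {**d2, **d1}

# Counter addition: the counts are summed, so operand order is irrelevant.
sum_12 = Counter(d1) + Counter(d2)
sum_21 = Counter(d2) + Counter(d1)
```

The two update results disagree on the value for "b", while the two Counter sums compare equal.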

Again in the spirit of full disclosure, of those above, 2 is already
implemented and widely used, so we don't need to use "+" for that.
I've never seen 4 except in the mathematical literature (union of
relations is not the same thing). 3, however, is very common both for
mappings with small domain and sparse representation of mappings with
a default value (possibly computed then cached), and "|" is not
suitable for expressing that sort of addition (I'm willing to say it's
"wrong" :-).

There's also the fact that the operations denoted by "|" and "||" are
often implemented as "short-circuiting", and therefore not
commutative, while "+" usually is (and that's reinforced for
mathematicians who are trained to think of "+" as the operator for
Abelian groups, while "*" is a (possibly) non-commutative operator). I
know commutativity of "+" has been mentioned before, but the
non-commutativity of "|" -- and so unsuitability for many kinds of
dict combination -- hasn't been emphasized before IIRC.

Steve

Stephen J. Turnbull

Mar 8, 2019, 12:12:17 AM
to Ka-Ping Yee, Python-Ideas
Ka-Ping Yee writes:
> On Wed, Mar 6, 2019 at 4:01 PM Chris Angelico <ros...@gmail.com> wrote:

> > But adding dictionaries is fundamentally *useful*. It is expressive.
>
> It is useful. It's just that + is the wrong name.

First, let me say that I prefer ?!'s position here, so my bias is made
apparent. I'm also aware that I have biases so I'm sympathetic to
those who take a different position.

Rather than say it's "wrong", let me instead point out that I think
it's pragmatically troublesome to use "+". I can think of at least
four interpretations of "d1 + d2"

1. update
2. multiset (~= collections.Counter addition)
3. addition of functions into the same vector space (actually, a
semigroup will do ;-), and this is the implementation of
collections.Counter
4. "fiberwise" set addition (ie, of functions into relations)

and I'm very jet-lagged so I may be missing some.

There's also the fact that the operations denoted by "|" and "||" are
often implemented as "short-circuiting", and therefore not
commutative, while "+" usually is (and that's reinforced for
mathematicians who are trained to think of "+" as the operator for
Abelian groups, while "*" is a (possibly) non-commutative operator). I
know commutativity of "+" has been mentioned before, but the
non-commutativity of "|" -- and so unsuitability for many kinds of
dict combination -- hasn't been emphasized before IIRC.

Since "|" (especially "|=") *is* suitable for "update", I think we
should reserve "+" for some future commutative extension.

In the spirit of full disclosure:
Of these, 2 is already implemented and widely used, so we don't need
to use dict.__add__ for that. I've never seen 4 in the mathematical
literature (union of relations is not the same thing). 3, however, is
very common both for mappings with small domain and sparse
representation of mappings with a default value (possibly computed
then cached), and "|" is not suitable for expressing that sort of
addition (I'm willing to say it's "wrong" :-).

Steve

João Matos

Mar 8, 2019, 11:25:28 AM
to Steven D'Aprano, python...@python.org
Hello,

I've just read your PEP 584 draft and have some questions.
When you say
"

Like the merge operator and list concatenation, the difference operator requires both operands to be dicts, while the augmented version allows any iterable of keys.

>>> d - {'spam', 'parrot'}
Traceback (most recent call last):
  ...
TypeError: cannot take the difference of dict and set
>>> d -= {'spam', 'parrot'}
>>> print(d)
{'eggs': 2, 'cheese': 'cheddar'}
>>> d -= [('spam', 999)]
>>> print(d)
{'spam': 999, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
 
"

With d -= {'spam', 'parrot'}, where 'parrot' does not exist in d, will it raise an exception (e.g. KeyError) or silently ignore the missing key?

d -= [('spam', 999)] should remove the pair from the dict, correct? But the print that follows still shows it there. Is that a mistake, or am I missing something?


Best regards,

João Matos

Guido van Rossum

Mar 8, 2019, 11:57:06 AM
to Stephen J. Turnbull, Python-Ideas
On Thu, Mar 7, 2019 at 9:12 PM Stephen J. Turnbull <turnbull....@u.tsukuba.ac.jp> wrote:
> Ka-Ping Yee writes:
> > On Wed, Mar 6, 2019 at 4:01 PM Chris Angelico <ros...@gmail.com> wrote:
>
> > > But adding dictionaries is fundamentally *useful*. It is expressive.
> >
> > It is useful.  It's just that + is the wrong name.
>
> First, let me say that I prefer ?!'s position here, so my bias is made
> apparent.  I'm also aware that I have biases so I'm sympathetic to
> those who take a different position.

TBH, I am warming up to "|" as well.
 
> Rather than say it's "wrong", let me instead point out that I think
> it's pragmatically troublesome to use "+".  I can think of at least
> four interpretations of "d1 + d2"
>
> 1.  update
> 2.  multiset (~= collections.Counter addition)

I guess this explains the behavior of removing results <= 0; it makes sense as multiset subtraction, since in a multiset a negative count makes little sense. (Though the name Counter certainly doesn't seem to imply multiset.)
 
> 3.  addition of functions into the same vector space (actually, a
>     semigroup will do ;-), and this is the implementation of
>     collections.Counter
> 4.  "fiberwise" set addition (ie, of functions into relations)
>
> and I'm very jet-lagged so I may be missing some.
>
> There's also the fact that the operations denoted by "|" and "||" are
> often implemented as "short-circuiting", and therefore not
> commutative, while "+" usually is (and that's reinforced for
> mathematicians who are trained to think of "+" as the operator for
> Abelian groups, while "*" is a (possibly) non-commutative operator).  I
> know commutativity of "+" has been mentioned before, but the
> non-commutativity of "|" -- and so unsuitability for many kinds of
> dict combination -- hasn't been emphasized before IIRC.

I've never heard of single "|" being short-circuiting. ("||" of course is infamous for being that in C and most languages derived from it.)

And "+" is of course used for many non-commutative operations in Python (e.g. adding two lists/strings/tuples together). It is only *associative*, a weaker requirement that just says (A + B) + C == A + (B + C). (This is why we write A + B + C, since the grouping doesn't matter for the result.)

Anyway, while we're discussing mathematical properties, and since SETL was briefly mentioned, I found an interesting thing in math. For sets, union and intersection are distributive over each other. I can't type the operators we learned in high school, so I'll use Python's set operations. We find that A | (B & C) == (A | B) & (A | C). We also find that A & (B | C) == (A & B) | (A & C).

Note that this is *not* the case for + and * when used with (mathematical) numbers: * distributes over +: a * (b + c) == (a * b) + (a * c), but + does not distribute over *: a + (b * c) != (a + b) * (a + c). So in a sense, SETL (which uses + and * for union and intersection) got the operators wrong.

Note that in Python, + and * for sequences are not distributive this way, since (A + B) * n is not the same as (A * n) + (B * n). OTOH A * (n + m) == A * n + A * m. (Assuming A and B are sequences of the same type, and n and m are positive integers.)

If we were to use "|" and "&" for dict "union" and "intersection", the mutual distributive properties will hold.
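
For anyone who wants to check the set, number and sequence identities above, here is a small sketch with arbitrary sample values (it only covers existing types, not the proposed dict operators):

```python
A, B, C = {1, 2}, {2, 3}, {3, 4}

# Union and intersection distribute over each other.
union_over_intersection = (A | (B & C)) == ((A | B) & (A | C))
intersection_over_union = (A & (B | C)) == ((A & B) | (A & C))

# For numbers, * distributes over +, but + does not distribute over *.
a, b, c = 2, 3, 4
times_over_plus = a * (b + c) == a * b + a * c
plus_over_times = a + (b * c) == (a + b) * (a + c)   # 14 vs 30

# For sequences, + is associative but not commutative ...
x, y, z = [1], [2], [3]
associative = (x + y) + z == x + (y + z)
commutative = x + y == y + x

# ... and repetition distributes only on one side.
n, m = 2, 3
left_ok = x * (n + m) == x * n + x * m
right_fails = (x + y) * n == x * n + y * n           # [1, 2, 1, 2] vs [1, 1, 2, 2]
```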
 
> Since "|" (especially "|=") *is* suitable for "update", I think we
> should reserve "+" for some future commutative extension.

One argument is that sets have an update() method aliased to "|=", so this makes it more reasonable to do the same for dicts, which also have an update() method, with similar behavior (not surprising, since sets were modeled after dicts).
 
> In the spirit of full disclosure:
> Of these, 2 is already implemented and widely used, so we don't need
> to use dict.__add__ for that.  I've never seen 4 in the mathematical
> literature (union of relations is not the same thing).  3, however, is
> very common both for mappings with small domain and sparse
> representation of mappings with a default value (possibly computed
> then cached), and "|" is not suitable for expressing that sort of
> addition (I'm willing to say it's "wrong" :-).

MRAB

Mar 8, 2019, 2:20:28 PM
to python...@python.org
On 2019-03-08 16:55, Guido van Rossum wrote:
[snip]
> If we were to use "|" and "&" for dict "union" and "intersection", the
> mutual distributive properties will hold.
>
> Since "|" (especially "|=") *is* suitable for "update", I think we
> should reserve "+" for some future commutative extension.
>
>
> One argument is that sets have an update() method aliased to "|=", so
> this makes it more reasonable to do the same for dicts, which also have
> an update() method, with similar behavior (not surprising, since sets
> were modeled after dicts).
>
[snip]
One way to think of it is that a dict is like a set, except that each of
its members has an additional associated value.
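
A sketch of that view, using made-up data: the key views of a dict already accept the set operators, and the associated value is what forces a "who wins" decision when merging.

```python
stock = {"spam": 3, "eggs": 12}
order = {"eggs": 6, "ham": 1}

# Key views already support the set operators and return plain sets.
either = stock.keys() | order.keys()
both = stock.keys() & order.keys()
only_stock = stock.keys() - order.keys()

# The extra value forces a choice that sets never face:
# on a shared key, one side's value has to win (here, the right-hand side).
merged = {**stock, **order}
```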

Greg Ewing

Mar 8, 2019, 6:33:21 PM
to Python-Ideas
Guido van Rossum wrote:
> I guess this explains the behavior of removing results <= 0; it makes
> sense as multiset subtraction, since in a multiset a negative count
> makes little sense. (Though the name Counter certainly doesn't seem to
> imply multiset.)

It doesn't even behave consistently as a multiset, since c[k] -= n
is happy to let the value go negative.
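
A short sketch of that inconsistency with arbitrary counts:

```python
from collections import Counter

c = Counter(a=1, b=3)

# Binary subtraction drops results that are zero or negative.
diff = c - Counter(a=2, b=1)

# Item assignment does no such clamping.
c["a"] -= 2
```

After this, diff has no entry for "a" at all, while c["a"] is a negative number.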

> For sets,
> union and intersection are distributive over each other.

> Note that this is *not* the case for + and * when used with
> (mathematical) numbers... So in a sense, SETL (which uses + and *
> for union and intersection) got the operators wrong.

But in another sense, it didn't. In Boolean algebra, "and" and "or"
(which also distribute over each other) are often written using the
same notations as multiplication and addition. There's no rule in
mathematics saying that these notations must be distributive in one
direction but not the other.
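
The mutual distributivity of "and" and "or" is easy to confirm by brute force; a sketch that checks all eight combinations of truth values:

```python
from itertools import product

# Check both distributive laws over every combination of truth values.
both_laws_hold = all(
    (a or (b and c)) == ((a or b) and (a or c))
    and (a and (b or c)) == ((a and b) or (a and c))
    for a, b, c in product([False, True], repeat=3)
)
```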

--
Greg

Guido van Rossum

Mar 8, 2019, 7:09:09 PM
to Greg Ewing, Python-Ideas
On Fri, Mar 8, 2019 at 3:33 PM Greg Ewing <greg....@canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
> > I guess this explains the behavior of removing results <= 0; it makes
> > sense as multiset subtraction, since in a multiset a negative count
> > makes little sense. (Though the name Counter certainly doesn't seem to
> > imply multiset.)
>
> It doesn't even behave consistently as a multiset, since c[k] -= n
> is happy to let the value go negative.
>
> > For sets,
> > union and intersection are distributive over each other.
>
> > Note that this is *not* the case for + and * when used with
> > (mathematical) numbers... So in a sense, SETL (which uses + and *
> > for union and intersection) got the operators wrong.
>
> But in another sense, it didn't. In Boolean algebra, "and" and "or"
> (which also distribute over each other) are often written using the
> same notations as multiplication and addition. There's no rule in
> mathematics saying that these notations must be distributive in one
> direction but not the other.

I guess everybody's high school math(s) class was different. I don't ever recall seeing + and * for boolean OR/AND; we used ∧ and ∨.

I learned | and & for set operations only after I learned programming; I think it was in PL/1. But of course it stuck because of C bitwise operators (which are also boolean OR/AND and set operations).

This table suggests there's a lot of variety in how these operators are spelled:

Greg Ewing

Mar 8, 2019, 8:23:32 PM
to Python-Ideas
Guido van Rossum wrote:
> I guess everybody's high school math(s) class was different. I don't
> ever recall seeing + and * for boolean OR/AND; we used ∧ and ∨.

Boolean algebra was only touched on briefly in my high school
years. I can't remember exactly what notation was used, but it
definitely wasn't ∧ and ∨ -- I didn't encounter those until
much later.

However, I've definitely seen texts on boolean algebra in
relation to logic circuits that write 'A and B' as 'AB',
and 'A or B' as 'A + B'. (And also use an overbar for
negation instead of the mathematical ¬).

Maybe it depends on whether you're a mathematician or an
engineer? The multiplication-addition notation seems a lot
more readable when you have a complicated boolean expression,
so I can imagine it being favoured by pragmatic engineering
type people.

James Edwards

Mar 8, 2019, 11:50:46 PM
to Python-Ideas
My understanding is that:

 - (Q1) Attempting to discard a key not in the target of the augmented assignment would not raise a KeyError (or any other exception, for that matter).  This is analogous to how the - operator works on sets and is consistent with the pure Python implementation towards the bottom of the PEP.
 - (Q2) This one got me as well while implementing the proposal in CPython, but there is a difference in what "part" of the RHS the operators "care about" if the RHS isn't a dict.  The += operator expects 2-tuples and will treat them as (key, value) pairs.  The -= operator doesn't attempt to unpack the RHS's elements as += does and expects keys.  So d -= [('spam', 999)] treated the tuple as a key and attempted to discard it.

IOW,

    d = {
        'spam': 999,
        ('spam', 999): True
    }
    d -= [('spam', 999)]

Would discard the key ('spam', 999) and corresponding value True.

Which highlights a possibly surprising incongruence between the operators:

    d = {}
    update = [(1,1), (2,2), (3,3)]
    d += update
    d -= update
    assert d == {}  # will raise, as d still has 3 items

Similarly,

    d = {}
    update = {1:1, 2:2, 3:3}
    d += update.items()
    d -= update.items()
    assert d == {}  # will raise, for the same reason
    d -= update.keys()
    assert d == {}  # would pass without issue
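
Since plain dicts don't have these operators, the snippets above can't be run as written. Here is a runnable sketch using a hypothetical dict subclass (DraftDict is my own name, and its two methods are only my reading of the draft's augmented operators, not the PEP's reference implementation):

```python
class DraftDict(dict):
    """Hypothetical dict subclass emulating the draft's augmented operators."""

    def __iadd__(self, other):
        # Accepts a mapping or an iterable of (key, value) pairs, like update().
        self.update(other)
        return self

    def __isub__(self, other):
        # Treats every element of `other` as a key; missing keys are ignored.
        for key in other:
            self.pop(key, None)
        return self

d = DraftDict()
update = [(1, 1), (2, 2), (3, 3)]
d += update
d -= update                      # looks for the keys (1, 1), (2, 2), (3, 3)
after_pairs = dict(d)            # nothing was removed

d -= [key for key, _ in update]  # passing the keys themselves empties it
after_keys = dict(d)
```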

That being said, I (personally) wouldn't consider it a deal-breaker and would still very much appreciate the added functionality (regardless of the choice of operator).

- Jim



Steven D'Aprano

Mar 9, 2019, 11:56:19 AM
to python...@python.org
Thanks to everyone who has contributed to the discussion, I have been
reading all the comments even if I haven't responded.

I'm currently working on an update to the PEP which will, I hope,
improve some of the failings of the current draft.


--
Steven

Stephan Hoyer

Mar 9, 2019, 2:40:53 PM
to Steven D'Aprano, python...@python.org
Would __iadd__ and __isub__ be added to collections.abc.MutableMapping?

This would be consistent with other infix operations on mutable ABCs, but could potentially break backwards compatibility for anyone who has defined a MutableMapping subclass that implements __add__ but not __iadd__.
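
To illustrate the compatibility concern, here is a sketch with a hypothetical MutableMapping subclass (Bag) that defines __add__ only. BagWithInplace stands in for what would happen if the ABC started supplying __iadd__; that method is an assumption for illustration, not something the ABC provides today.

```python
from collections.abc import MutableMapping

class Bag(MutableMapping):
    """Hypothetical user-defined mapping with __add__ but no __iadd__."""

    def __init__(self, data=()):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __add__(self, other):
        merged = Bag(self._data)
        merged.update(other)
        return merged

class BagWithInplace(Bag):
    # Stand-in for what an inherited MutableMapping.__iadd__ might do.
    def __iadd__(self, other):
        self.update(other)
        return self

# Without __iadd__: += falls back to __add__ and rebinds the name to a new object.
original = Bag({"a": 1})
alias = original
alias += {"b": 2}
rebinds = alias is not original and "b" not in original

# With an inherited __iadd__: the same statement mutates the shared object.
original2 = BagWithInplace({"a": 1})
alias2 = original2
alias2 += {"b": 2}
mutates = alias2 is original2 and "b" in original2
```

Code that relied on += producing a fresh object would see different behaviour in the second case.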

Steven D'Aprano

Mar 9, 2019, 6:25:37 PM
to python...@python.org
On Sat, Mar 09, 2019 at 11:39:39AM -0800, Stephan Hoyer wrote:
> Would __iadd__ and __isub__ be added to collections.abc.MutableMapping?

No, that will not be part of the PEP. The proposal is only to change
dict itself. If people want to add this to MutableMapping, that could be
considered separately.

Chris Barker via Python-ideas

Mar 12, 2019, 7:42:00 PM
to Python-Ideas
Just in case I'm not the only one that had a hard time finding the latest version of this PEP, here it is in the PEPS Repo:


-CHB



--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris....@noaa.gov

Antoine Pitrou

Mar 15, 2019, 7:21:29 AM
to python...@python.org
On Wed, 6 Mar 2019 00:46:57 +0000
Josh Rosenberg
<shadowranger...@gmail.com>
wrote:
>
> Overloading + lacks the clear descriptive aspect of update that describes
> the goal of the operation, and contradicts conventions (in Python and
> elsewhere) about how + works (addition or concatenation, and a lot of
> people don't even like it doing the latter, though I'm not that pedantic).
>
> A couple "rules" from C++ on overloading are "*Whenever the meaning of an
> operator is not obviously clear and undisputed, it should not be
> overloaded.* *Instead, provide a function with a well-chosen name.*"
> and "*Always
> stick to the operator’s well-known semantics".* (Source:
> https://stackoverflow.com/a/4421708/364696 , though the principle is
> restated in many other places).

Agreed with this. What is so useful exactly in this new dict operator
that it wasn't implemented, say, 20 years ago? I rarely find
myself merging dicts and, when I do, calling dict.update() is entirely
acceptable (I think the "{**d}" notation was already a mistake, making
a perfectly readable operation more cryptic simply for the sake of
saving a few keystrokes).

Built-in operations should be added with regard to actual user needs
(such as: a first-class notation for matrix multiplication, making
formulas easier to read and understand), not a mere "hmm this might
sometimes be useful".


Besides, if I have two dicts with e.g. lists as values, I *really*
dislike the fact that the + operator will clobber the values rather than
concatenate them. It's a recipe for confusion.
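
A sketch of the behaviour I mean, with {**a, **b} standing in for the proposed a + b; merge_concat is a hypothetical helper showing what one might have expected instead:

```python
a = {"fruits": ["apple"], "nuts": ["pecan"]}
b = {"fruits": ["pear"]}

# Merge semantics (shown with {**a, **b}): b's list replaces a's list.
clobbered = {**a, **b}

def merge_concat(left, right):
    """Hypothetical helper: concatenate list values on shared keys instead."""
    result = {key: list(value) for key, value in left.items()}
    for key, value in right.items():
        result.setdefault(key, []).extend(value)
    return result

concatenated = merge_concat(a, b)
```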

Regards

Antoine.

Antoine Pitrou

Mar 15, 2019, 7:26:51 AM
to python...@python.org
On Mon, 4 Mar 2019 16:02:06 +0100
Stefan Behnel <stef...@behnel.de> wrote:
> INADA Naoki schrieb am 04.03.19 um 11:15:
> > Why statement is not enough?
>
> I'm not sure I understand why you're asking this, but a statement is "not
> enough" because it's a statement and not an expression.

This is an argument for Perl 6, not for Python.