<dict_addition_pep.txt>
I’ve never been part of this process before, but I’m interested in learning and helping any way I can.
My addition implementation is attached to the bpo, and I’m working today on bringing it in line with the PEP in its current form (specifically, subtraction operations).
    def __add__(self, other):
        if isinstance(other, dict):
            new = dict.copy(self)
            dict.update(new, other)
            return new
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, dict):
            new = dict.copy(other)
            # Update the copy, not `other`, so the left operand is not mutated.
            dict.update(new, self)
            return new
        return NotImplemented

    def __iadd__(self, other):
        if isinstance(other, dict):
            dict.update(self, other)
            return self
        return NotImplemented
This is what my C looks like right now. We can choose to update these semantics to be "nicer" to subclasses, but I don't see any precedent for it (lists, sets, strings, etc.).
Brandt
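To make the semantics above concrete, here is a minimal pure-Python sketch. `AddableDict` is a hypothetical name used only for illustration; it is not the C implementation attached to the bpo, and the built-in dict itself has no `__add__`.

```python
class AddableDict(dict):
    """Hypothetical dict subclass mirroring the semantics sketched above."""

    def __add__(self, other):
        if isinstance(other, dict):
            new = dict.copy(self)      # dict.copy returns a plain dict
            dict.update(new, other)    # right operand wins on duplicate keys
            return new
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, dict):
            new = dict.copy(other)
            dict.update(new, self)     # self is the right operand here
            return new
        return NotImplemented

    def __iadd__(self, other):
        if isinstance(other, dict):
            dict.update(self, other)   # in place, keeps the subclass type
            return self
        return NotImplemented


left = AddableDict(a=1, b=2)
merged = left + {"b": 20, "c": 30}     # new plain dict; left is unchanged
reflected = {"a": 0} + left            # handled by AddableDict.__radd__
left += {"z": 26}                      # updates left in place
```

Note that, as written, both `+` and the reflected form return a plain dict even when one operand is a subclass, which is the "not nicer to subclasses" behaviour mentioned above.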
--
---
You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/jq5QVTt3CAI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to python-ideas...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
_______________________________________________
Python-ideas mailing list
Python...@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
This way, d1 + d2 isn’t just another obvious way to do {**d1, **d2}. The second syntax makes it clear that a new dictionary is being constructed and that d2 overrides keys from d1.
One can reasonably expect or imagine a situation where a section of code that expects to merge two dictionaries with non-conflicting keys commits a semantic error if it merges two dictionaries with conflicting keys.
To better explain, imagine a program where options is a global variable storing parsed values from the command line.
    def verbose_options():
        if options.quiet:
            return {'verbose': True}

    def quiet_options():
        if options.quiet:
            return {'verbose': False}
If we were to define an options() function, return {**quiet_options(), **verbose_options()} implies that verbose overrules quiet; whereas return quiet_options() + verbose_options() implies that verbose and quiet cannot be used simultaneously. I am not aware of another easy way in Python to merge dictionaries while checking for non-conflicting keys.
Compare:
    import sys

    def settings():
        return {**quiet_options(), **verbose_options()}

    def settings():
        try:
            return quiet_options() + verbose_options()
        except KeyError:
            print('conflicting options used', file=sys.stderr)
            sys.exit(1)
***
This is a simple scenario, but you can imagine more complex ones as well. Does --quiet-stage-1 loosen --verbose? Does --quiet-stage-1 conflict with --verbose-stage-1? Does --verbosity=5 override --verbosity=4 or cause an error?
Having {**, **} and + do different things provides a convenient and Pythonic way to model such relationships in code. Indeed, you can even combine the two syntaxes in the same expression to show a mix of overriding and exclusionary behavior.
Anyways, I think it’s a good idea to have this semantic difference in behavior so Python developers have a good way to communicate what is expected of the two dictionaries being merged inside the language. This is like an assertion without
Again, I propose that the + sign merge two python dictionaries such that if there are conflicting keys, a KeyError is thrown, because such “non-conflicting merge” behavior would be useful in Python. It gives clarifying power to the + sign. The + and the {**, **} should serve different roles.
In other words, explicit + is better than implicit {**, **}, unless explicitly suppressed. Here + is explicit whereas {**, **} is implicitly allowing inclusive keys, and the KeyError is expressly suppressed by virtue of not using the {**, **} syntax. People expect the + operator to be commutative, while the {**, **} syntax prompts further examination by virtue of its “weird” syntax.
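The non-conflicting merge proposed in this message can already be sketched as a plain helper function. `merge_unique` is a hypothetical name, and the wording of the error message is an assumption; nothing like it exists in the standard library.

```python
def merge_unique(d1, d2):
    """Return a new dict; raise KeyError if d1 and d2 share any key."""
    conflicts = d1.keys() & d2.keys()
    if conflicts:
        # key=repr keeps sorting safe when keys have mixed types.
        raise KeyError(f"conflicting keys: {sorted(conflicts, key=repr)}")
    return {**d1, **d2}


combined = merge_unique({"log_file": None}, {"verbose": True})

try:
    merge_unique({"verbose": False}, {"verbose": True})
except KeyError as exc:
    conflict_message = str(exc)
```

Whether this behaviour deserves the + operator rather than a named helper is exactly the point under discussion in the rest of the thread.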
Hi,
I'm not old on this list but every time there is a proposal, the answer is "what are you trying to solve?".
Since
z = {**x, **y} and z.update(y)
exist, I can't find the answer.
The discussions on this list show that the behavior of `+` operator with
dict will never be obvious (first wins or second wins or add results or
raise Exception). So the user will always have to look at the doc or
test it to know the intended behavior.
That said, [1,2] + [3] equals [1,2,3] but not [1,2,[3]], and that was
not obvious to me, and I survived.
On Mar 4, 2019, at 4:51 AM, Stefan Behnel <stef...@behnel.de> wrote:
Jimmy Girardet wrote on 04.03.19 at 10:12:
> I'm not old on this list but every time there is a proposal, the answer
> is "what are you trying to solve ?".
> Since
> z = {**x, **y} and z.update(y)
> Exists, I can't find the answer.
I think the main intention is to close a gap in the language.
[1,2,3] + [4,5,6]
works for lists and tuples,
{1,2,3} | {4,5,6}
works for sets, but joining two dicts isn't simply
{1:2, 3:4} + {5:6}
but requires either some obscure syntax or a statement instead of a simple
expression.
The proposal is to enable the obvious syntax for something that should be
obvious.
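The gap described above can be seen side by side in a short sketch. The variable names are illustrative only; the behaviour shown is that of Python without the proposed operator.

```python
joined_list = [1, 2, 3] + [4, 5, 6]
joined_tuple = (1, 2, 3) + (4, 5, 6)
joined_set = {1, 2, 3} | {4, 5, 6}

d1 = {1: 2, 3: 4}
d2 = {5: 6}

# The "obvious" spelling is not available for dicts.
try:
    d1 + d2
except TypeError:
    dict_plus_supported = False
else:
    dict_plus_supported = True

# Expression form using unpacking (Python 3.5+).
merged_expr = {**d1, **d2}

# Statement form: copy, then update in place.
merged_stmt = d1.copy()
merged_stmt.update(d2)
```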
> How many situations would you need to make a copy of a dictionary and
> then update that copy and override old keys from a new dictionary?
Very frequently.
That's why we have a dict.update method, which if I remember correctly,
was introduced in Python 1.5 because people were frequently re-inventing
the same wheel:
    def update(d1, d2):
        for key in d2.keys():
            d1[key] = d2[key]
You should have a look at how many times it is used in the standard
library:
[steve@ando cpython]$ cd Lib/
[steve@ando Lib]$ grep -U "\.update[(]" *.py */*.py | wc -l
373
Now some of those are false positives (docstrings, comments, non-dicts,
etc) but that still leaves a lot of examples of wanting to override old
keys. This is a very common need. Wanting an exception if the key
already exists is, as far as I can tell, very rare.
It is true that many of the examples in the std lib involve updating an
existing dict, not creating a new one. But that's only to be expected:
since Python didn't provide an obvious functional version of update,
only an in-place version, naturally people get used to writing
in-place code.
(Think about how long we made do without sorted(). I don't know about
other people, but I now find sorted indispensable, and probably use it
ten or twenty times more often than the in-place version.)
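The sorted() comparison suggests what a functional counterpart to update could look like. The helper name `updated` below is an assumption chosen by analogy with sort/sorted; it is only a sketch, not part of the PEP.

```python
def updated(d1, d2):
    """Return a new dict with d2's items applied on top of d1's (last value wins)."""
    new = dict(d1)
    new.update(d2)
    return new


numbers = [3, 1, 2]
ordered = sorted(numbers)          # new list; numbers is left untouched

defaults = {"colour": "auto", "lines": 10}
overrides = {"colour": "never"}
settings = updated(defaults, overrides)   # new dict; defaults is left untouched
```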
[...]
> The KeyError of my proposal is a feature, a sign that something is
> wrong, a sign an invariant is being violated.
Why is "keys are unique" an invariant?
The PEP gives a good example of when this "invariant" would be
unnecessarily restrictive:
For example, updating default configuration values with
user-supplied values would most often fail under the
requirement that keys are unique::
prefs = site_defaults + user_defaults + document_prefs
Another example would be when reading command line options, where the
most common convention is for "last option seen" to win:
[steve@ando Lib]$ grep --color=always --color=never "zero" f*.py
fileinput.py: numbers are zero; nextfile() has no effect.
fractions.py: # the same way for any finite a, so treat a as zero.
functools.py: # prevent their ref counts from going to zero during
and the output is printed without colour.
(I've slightly edited the above output so it will fit in the email
without wrapping.)
The very name "update" should tell us that the most useful behaviour is
the one the devs decided on back in 1.5: have the last seen value win.
How can you update values if the operation raises an error if the key
already exists? If this behaviour is ever useful, I would expect that it
will be very rare.
An update or merge is effectively just running through a loop setting
the value of a key. See the pre-Python 1.5 function above. Having update
raise an exception if the key already exists would be about as useful as
having ``d[key] = value`` raise an exception if the key already exists.
Unless someone can demonstrate that the design of dict.update() was a
mistake, and the "require unique keys" behaviour is more common, then
I maintain that for the very rare cases you want an exception, you can
subclass dict and overload the __add__ method:
    # Intentionally simplified version.
    def __add__(self, other):
        if self.keys() & other.keys():
            raise KeyError
        return super().__add__(other)
> The ugliness of the syntax makes one pause
> and think and ask: “Why is it important that the keys from this
> dictionary override the ones from another dictionary?”
Because that is the most common and useful behaviour. That's what it
means to *update* a dict or database, and this proposal is for an update
operator.
The ugliness of the existing syntax is not a feature, it is a barrier.
--
Steven
> the augmented assignment version allows anything the ``update`` method allows, such as iterables of key/value pairs
I am a little surprised by this choice.
First, this means that "a += b" would not be equivalent to "a = a +
b". Are there other built-in types which act differently when called with
the operator or the augmented assignment version?
Secondly, that would imply I would no longer be able to infer the type
of "a" while reading "a += [('foo', 'bar')]". Is it a list? A dict?
Those two points make me uncomfortable with "+=" strictly behaving
like ".update()".
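On the question of precedent, list is one built-in where the two forms already differ. The sketch below only demonstrates current list behaviour; it is offered as a data point, not as something taken from the PEP text.

```python
items = [1, 2]
items += (3, 4)        # augmented assignment accepts any iterable, like extend

try:
    items + (5, 6)     # the binary operator insists on another list
except TypeError:
    plus_rejects_tuple = True
else:
    plus_rejects_tuple = False
```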
On Mon, Mar 04, 2019 at 10:01:23AM -0500, James Lu wrote:
If you want to merge it without a KeyError, learn and use the more explicit {**d1, **d2} syntax.
In your previous email, you said the {**d ...} syntax was implicit:
In other words, explicit + is better than implicit {**, **}, unless
explicitly suppressed. Here + is explicit whereas {**, **} is
implicitly allowing inclusive keys, and the KeyError is expressly
suppressed by virtue of not using the {**, **} syntax.
It is difficult to take your "explicit/implicit" argument seriously when
you cannot even decide which is which.
How many situations would you need to make a copy of a dictionary and then update that copy and override old keys from a new dictionary?
Very frequently.
That's why we have a dict.update method, which if I remember correctly,
was introduced in Python 1.5 because people were frequently re-inventing
the same wheel:
    def update(d1, d2):
        for key in d2.keys():
            d1[key] = d2[key]
You should have a look at how many times it is used in the standard
library:
[steve@ando cpython]$ cd Lib/
[steve@ando Lib]$ grep -U "\.update[(]" *.py */*.py | wc -l
373
Now some of those are false positives (docstrings, comments, non-dicts,
etc) but that still leaves a lot of examples of wanting to override old
keys. This is a very common need. Wanting an exception if the key
already exists is, as far as I can tell, very rare.
It is true that many of the examples in the std lib involve updating an
existing dict, not creating a new one. But that's only to be expected:
since Python didn't provide an obvious functional version of update,
only an in-place version, naturally people get used to writing
in-place code.
> and the output is printed without colour.
>
> (I've slightly edited the above output so it will fit in the email
> without wrapping.)
>
> The very name "update" should tell us that the most useful behaviour is
> the one the devs decided on back in 1.5: have the last seen value win.
> How can you update values if the operation raises an error if the key
> already exists? If this behaviour is ever useful, I would expect that it
> will be very rare.
> An update or merge is effectively just running through a loop setting
> the value of a key. See the pre-Python 1.5 function above. Having update
> raise an exception if the key already exists would be about as useful as
> having ``d[key] = value`` raise an exception if the key already exists.
>
> Unless someone can demonstrate that the design of dict.update() was a
> mistake
You’re making a logical mistake here. + isn’t supposed to have .update’s behavior and it never was supposed to.
> , and the "require unique keys" behaviour is more common,
I just have. 99% of the time you want to have keys from one dict override another, you’d be better off doing it in-place and so would be using .update() anyways.
> then
> I maintain that for the very rare cases you want an exception, you can
> subclass dict and overload the __add__ method:
Well, yes, the whole point is to define the best default behavior.
It’s odd that RHS values are not used at all, period. Further, there’s no precedent for bulk sequence/mapping removals like this... except for sets, for which it is certainly justified.
I’ve had the opportunity to play around with my reference implementation over the last few days, and despite my initial doubts, I have *absolutely* fallen in love with dictionary addition — I even accidentally tried to += two dictionaries at work on Friday (a good, but frustrating, sign). For context, I was updating a module-level mapping with an imported one, a use case I hadn’t even previously considered.
I have tried to fall in love with dict subtraction the same way, but every code sketch/test I come up with feels contrived and hack-y. I’m indifferent towards it, at best.
TL;DR: I’ve lived with both for a week. Addition is now habit, subtraction is still weird.
> Nice branch name! :)
I couldn’t help myself.
Brandt
I think the main intention is to close a gap in the language.
[1,2,3] + [4,5,6]
works for lists and tuples,
{1,2,3} | {4,5,6}
works for sets, but joining two dicts isn't simply
{1:2, 3:4} + {5:6}
but requires either some obscure syntax or a statement instead of a simple
expression.
The proposal is to enable the obvious syntax for something that should be
obvious.
Does anyone have an example of another programming language that allows for addition of dictionaries/mappings?
On Tue, Mar 5, 2019 at 7:26 PM Steven D'Aprano <st...@pearwood.info> wrote:
>
> On Sat, Mar 02, 2019 at 01:47:37AM +0900, INADA Naoki wrote:
> > > If the keys are not strings, it currently works in CPython, but it may not work with other implementations, or future versions of CPython[2].
> >
> > I don't think so. https://bugs.python.org/issue35105 and
> > https://mail.python.org/pipermail/python-dev/2018-October/155435.html
> > are about kwargs. I think non string keys are allowed for {**d1,
> > **d2} by language.
>
> Is this documented somewhere?
It is not explicitly documented. But unlike keyword arguments,
dict displays have supported non-string keys for a very long time.
I believe {3: 4} is supported by the Python language itself, and is not
just CPython implementation behavior.
https://docs.python.org/3/reference/expressions.html#grammar-token-dict-display
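The distinction drawn here between dict displays and keyword arguments can be checked with a short sketch. The function `show` is hypothetical, and the TypeError is the behaviour observed on CPython 3; other implementations are not covered by this example.

```python
def show(**kwargs):
    return kwargs


# Unpacking inside a dict display accepts non-string keys.
merged = {**{3: 4}, **{"five": 6}}

# Unpacking into keyword arguments does not.
try:
    show(**{3: 4})
except TypeError:
    kwargs_reject_int_key = True
else:
    kwargs_reject_int_key = False
```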
On a related note: **kwargs, should they support arbitrary strings as keys? I depend on this behavior in production code and all python implementations handle it.
These semantics are intended to match those of update as closely as possible. For the dict built-in itself, calling keys is redundant as iteration over a dict iterates over its keys; but for subclasses or other mappings, update prefers to use the keys method.
The above paragraph may be inaccurate. Although the dict docstring states that keys will be called if it exists, this does not seem to be the case for dict subclasses. Bug or feature?
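As a small illustration of the keys-based path, the sketch below feeds update an object that is not a dict at all. `Pairs` is a hypothetical class written for this example; it deliberately says nothing about the dict-subclass case questioned above.

```python
class Pairs:
    """Hypothetical minimal mapping-like object: only keys() and __getitem__."""

    def __init__(self, **items):
        self._items = items

    def keys(self):
        return list(self._items)

    def __getitem__(self, key):
        return self._items[key]


via_keys = {"a": 1}
via_keys.update(Pairs(a=10, b=20))            # has keys(): uses keys() and []

via_pairs = {"a": 1}
via_pairs.update([("a", 10), ("b", 20)])      # no keys(): iterable of key/value pairs
```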
Attached is a draft PEP on adding + and - operators to dict for
discussion.
This should probably go here:
https://github.com/python/peps
but due to technical difficulties at my end, I'm very limited in what I
can do on Github (at least for now). If there's anyone who would like to
co-author and/or help with the process, that will be appreciated.
--
Steven
Should our __sub__ behavior be the same...
> I propose that the + sign merge two python dictionaries such that if
> there are conflicting keys, a KeyError is thrown.
This proposal is for a simple, operator-based equivalent to
dict.update() which returns a new dict. dict.update has existed since
Python 1.5 (something like a quarter of a century!) and never grown a
"unique keys" version.
I don't recall even seeing a request for such a feature. If such a
unique keys version is useful, I don't expect it will be useful often.
> This way, d1 + d2 isn’t just another obvious way to do {**d1, **d2}.
One of the reasons for preferring + is that it is an obvious way to do
something very common, while {**d1, **d2} is as far from obvious as you
can get without becoming APL or Perl :-)
If I needed such a unique key version of update, I'd use a subclass:
    class StrictDict(dict):
        def __add__(self, other):
            if isinstance(other, dict) and (self.keys() & other.keys()):
                raise KeyError('non-unique keys')
            return super().__add__(other)

        # and similar for __radd__.
rather than burden the entire language, and every user of it, with
having to learn the subtle difference between the obvious + operator and
the error-prone and unobvious trick of {*d1, *d2}.
( Did you see what I did there? *wink* )
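Since dict has no `__add__` outside this proposal, the subclass above cannot delegate to super() today. A runnable approximation, assuming the merge is spelled out by hand instead, might look like this:

```python
class StrictDict(dict):
    """Sketch of a unique-keys dict; the merge is written out because
    the built-in dict does not define __add__."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        if self.keys() & other.keys():
            raise KeyError('non-unique keys')
        new = type(self)(self)     # copy, preserving the subclass type
        new.update(other)
        return new


combined = StrictDict(a=1) + {'b': 2}

try:
    StrictDict(a=1) + {'a': 2}
except KeyError:
    duplicate_rejected = True
else:
    duplicate_rejected = False
```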
> The second syntax makes it clear that a new dictionary is being
> constructed and that d2 overrides keys from d1.
Only because you have learned the rule that {**d, **e} means to
construct a new dict by merging, with the rule that in the event of
duplicate keys, the last key seen wins. If you hadn't learned that rule,
there is nothing in the syntax which would tell you the behaviour. We
could have chosen any rule we liked:
- raise an exception, like you get a TypeError if you pass the
same keyword argument to a function twice: spam(foo=1, foo=2);
- first value seen wins;
- last value seen wins;
- random value wins;
- anything else we liked!
There is nothing "clear" about the syntax which makes it obvious which
behaviour is implemented. We have to learn it.
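Two of the rules listed above can be observed directly. In this sketch `spam` is a placeholder function; unpacking is used for the call because writing spam(foo=1, foo=2) literally is rejected at compile time in current CPython rather than at run time.

```python
d = {"foo": 1}
e = {"foo": 2}

# Dict displays: last value seen wins.
last_wins_unpacking = {**d, **e}
last_wins_literal = {"foo": 1, "foo": 2}


def spam(**kwargs):
    return kwargs


# Keyword arguments: the same duplicate key is an error.
try:
    spam(**d, **e)
except TypeError:
    duplicate_kwarg_raises = True
else:
    duplicate_kwarg_raises = False
```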
> One can reasonably expect or imagine a situation where a section of
> code that expects to merge two dictionaries with non-conflicting keys
> commits a semantic error if it merges two dictionaries with
> conflicting keys.
I can imagine it, but I don't think I've ever needed it, and I can't
imagine wanting it often enough to wish it was not just a built-in
function or method, but actual syntax.
Do you have some real examples of wanting an error when trying to update
a dict if keys match?
> To better explain, imagine a program where options is a global
> variable storing parsed values from the command line.
>
> def verbose_options():
>     if options.quiet:
>         return {'verbose': True}
>
> def quiet_options():
>     if options.quiet:
>         return {'verbose': False}
That seems very artificial to me. Why not use a single function:
    def verbose_options():  # There's more than one?
        return {'verbose': not options.quiet}
The way you have written those functions seems weird to me. You already
have a nice options object, with named fields like "options.quiet", why
are you turning it into not one but *two* different dicts, both
reporting the same field?
And it's buggy: if options.quiet is True, then the key 'quiet'
should be True, not the 'verbose' key.
Do you have *two* functions for every preference setting that takes a
true/false flag?
What do you do for preference settings that take multiple values? Create
a vast number of specialised functions, one for each possible value?
    def A4_page_options():
        if options.page_size == 'A4':
            return {'page_size': 'A4'}

    def US_Letter_page_options():
        if options.page_size == 'US Letter':
            return {'page_size': 'US Letter'}

    page_size = (
        A4_page_options() + A3_page_options() + A5_page_options()
        + Foolscap_page_options() + Tabloid_page_options()
        + US_Letter_page_options() + US_Legal_page_options()
        # and about a dozen more...
        )
The point is, although I might be wrong, I don't think that this example
is a practical, realistic use-case for a unique keys version of update.
To me, your approach seems so complicated and artificial that it seems
like it was invented specifically to justify this "unique key" operator,
not something that we would want to write in real life.
But even if it is real code, the question is not whether it is EVER useful
for a dict update to raise an exception on matching keys. The question
is whether this is so often useful that this is the behaviour we want to
make the default for dicts.
[...]
> Again, I propose that the + sign merge two python dictionaries such
> that if there are conflicting keys, a KeyError is thrown, because such
> “non-conflicting merge” behavior would be useful in Python.
I don't think it would be, at least not often.
If it were common enough to justify a built-in operator to do this, we
would have had many requests for a dict.unique_update or similar by now,
and I don't think we have.
> It gives
> clarifying power to the + sign. The + and the {**, **} should serve
> different roles.
>
> In other words, explicit + is better than implicit {**, **}, unless
> explicitly suppressed. Here + is explicit whereas {**, **} is
> implicitly allowing inclusive keys,
If I had a cent for every time people misused "explicit" to mean "the
proposal that I like", I'd be rich.
In what way is the "+" operator *explicit* about raising an exception on
duplicate keys? These are both explicit:
merge_but_raise_exception_if_any_duplicates(d1, d2)
merge(d1, d2, raise_if_duplicates=True)
and these are both equally implicit:
d1 + d2
{**d1, **d2}
since the behaviour on duplicates is not explicitly stated in clear and
obvious language, but implied by the rules of the language.
[...]
> People expect the + operator to be commutative
They are wrong to expect that, because the + operator is already not
commutative for:
str
bytes
bytearray
list
tuple
array.array
collections.deque
collections.Counter
and possibly others.
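A quick check of a few of the types listed above shows the non-commutativity. This is only a sketch covering five of the types, with arbitrary sample values.

```python
from collections import deque

pairs = [
    ("ab", "cd"),
    (b"ab", b"cd"),
    ([1, 2], [3]),
    ((1, 2), (3,)),
    (deque([1, 2]), deque([3])),
]

# Collect the type names for which a + b differs from b + a.
non_commutative = [type(a).__name__ for a, b in pairs if a + b != b + a]
```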
> > Another example would be when reading command line options, where the
> > most common convention is for "last option seen" to win:
> >
> > [steve@ando Lib]$ grep --color=always --color=never "zero" f*.py
[...]
> Indeed, in this case you would want to use {**, **} syntax.
No I would NOT want to use the {**, **} syntax, because it is ugly.
That's why people ask for + instead. (Or perhaps I should say "as well
as" since the double-star syntax is not going away.)
[...]
> > Unless someone can demonstrate that the design of dict.update() was a
> > mistake
>
> You’re making a logical mistake here. + isn’t supposed to have
> .update’s behavior and it never was supposed to.
James, I'm the author of the PEP, and for the purposes of the proposal,
the + operator is supposed to do what I say it is supposed to do.
You might be able to persuade me to change the PEP, if you have a
sufficiently good argument, or you can write your own counter PEP making
a different choice, but please don't tell me what I intended. I know
what I intended, and it is for + to have the same last-key-wins
behaviour as update. That's the behaviour which is most commonly
requested in the various times this comes up.
> > , and the "require unique keys" behaviour is more common,
>
> I just have.
No you haven't -- you have simply *declared* that it is more common,
without giving any evidence for it.
> 99% of the time you want to have keys from one dict override another,
> you’d be better off doing it in-place and so would be using .update()
> anyways.
I don't know if it is "99% of the time" or 50% of the time or 5%,
but this PEP is for the remaining times where we don't want in-place
updates but we want a new dict.
I use list.append or list.extend more often than list concatenation, but
when I want a new list, list concatenation is very useful. This proposal
is about those cases where we want a new dict.
--
Steven
On Tue, Mar 5, 2019 at 11:16 PM Steven D'Aprano <st...@pearwood.info> wrote:
> On Sun, Mar 03, 2019 at 09:28:30PM -0500, James Lu wrote:
> > I propose that the + sign merge two python dictionaries such that if
> > there are conflicting keys, a KeyError is thrown.
>
> This proposal is for a simple, operator-based equivalent to
> dict.update() which returns a new dict. dict.update has existed since
> Python 1.5 (something like a quarter of a century!) and never grown a
> "unique keys" version.
> I don't recall even seeing a request for such a feature. If such a
> unique keys version is useful, I don't expect it will be useful often.

I have one argument in favor of such a feature: It preserves concatenation semantics. + means one of two things in all code I've ever seen (Python or otherwise):

1. Numeric addition (including element-wise numeric addition as in Counter and numpy arrays)
2. Concatenation (where the result preserves all elements, in order, including, among other guarantees, that len(seq1) + len(seq2) == len(seq1 + seq2))

dict addition that didn't reject non-unique keys wouldn't fit *either* pattern; under the main proposal (making it equivalent to left.copy(), followed by .update(right)) the left hand side would win on ordering, the right hand side on values, and the length invariant of concatenation would not be preserved. At least when repeated keys are rejected, most concatenation invariants are preserved; order is all of the left elements followed by all of the right, and no elements are lost.
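The ordering and length points made in this message can be checked with a small sketch of the copy-then-update semantics. The sample values are arbitrary, and insertion-ordered dicts (Python 3.7+) are assumed.

```python
left = {"a": 1, "b": 2}
right = {"b": 20, "c": 30}

merged = left.copy()
merged.update(right)

keys_in_order = list(merged)      # 'b' keeps the position it had in left
value_of_b = merged["b"]          # but takes its value from right
length_preserved = len(merged) == len(left) + len(right)

# Sequence concatenation, by contrast, never drops elements.
seq1, seq2 = [1, 2], [2, 3]
concat_length_preserved = len(seq1 + seq2) == len(seq1) + len(seq2)
```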
On Tue, Mar 5, 2019 at 3:50 PM Josh Rosenberg <shadowranger...@gmail.com> wrote:
> On Tue, Mar 5, 2019 at 11:16 PM Steven D'Aprano <st...@pearwood.info> wrote:
> > On Sun, Mar 03, 2019 at 09:28:30PM -0500, James Lu wrote:
> > > I propose that the + sign merge two python dictionaries such that if
> > > there are conflicting keys, a KeyError is thrown.
> >
> > This proposal is for a simple, operator-based equivalent to
> > dict.update() which returns a new dict. dict.update has existed since
> > Python 1.5 (something like a quarter of a century!) and never grown a
> > "unique keys" version.
> > I don't recall even seeing a request for such a feature. If such a
> > unique keys version is useful, I don't expect it will be useful often.
>
> I have one argument in favor of such a feature: It preserves concatenation semantics. + means one of two things in all code I've ever seen (Python or otherwise):
>
> 1. Numeric addition (including element-wise numeric addition as in Counter and numpy arrays)
> 2. Concatenation (where the result preserves all elements, in order, including, among other guarantees, that len(seq1) + len(seq2) == len(seq1 + seq2))
>
> dict addition that didn't reject non-unique keys wouldn't fit *either* pattern; under the main proposal (making it equivalent to left.copy(), followed by .update(right)) the left hand side would win on ordering, the right hand side on values, and the length invariant of concatenation would not be preserved. At least when repeated keys are rejected, most concatenation invariants are preserved; order is all of the left elements followed by all of the right, and no elements are lost.

I must by now have seen dozens of posts complaining about this aspect of the proposal. I think this is just making up rules (e.g. "+ never loses information") to deal with an aspect of the design where a *choice* must be made. This may reflect the Zen of Python's "In the face of ambiguity, refuse the temptation to guess." But really, that's a pretty silly rule (truly, they aren't all winners). Good interface design constantly makes choices in ambiguous situations, because the alternative is constantly asking, and that's just annoying.

We have a plethora of examples (in fact, almost all alternatives considered) of situations related to dict merging where a choice is made between conflicting values for a key, and it's always the value further to the right that wins: from d[k] = v (which overrides the value when k is already in the dict) to d1.update(d2) (which lets the values in d2 win), including the much lauded {**d1, **d2} and even plain {'a': 1, 'a': 2} has a well-defined meaning where the latter value wins.
As to why raising is worse: First, none of the other situations I listed above raises for conflicts. Second, there's the experience of str+unicode in Python 2, which raises if the str argument contains any non-ASCII bytes. In fact, we disliked it so much that we changed the language incompatibly to deal with it.
foo.update(bar, on_collision=updator)  # Although I'm not a fan of the keyword I used
`updator` being a simple function like this one:
    from typing import Any

    def updator(updated, updator, key) -> Any:
        if key == "related":
            # dict.update() returns None, so build the merged dict instead.
            return {**updated[key], **updator[key]}
        if key == "tags":
            return updated[key] + updator[key]
        if key in ["a", "b", "c"]:  # Those
            return updated[key]
        return updator[key]
There's nothing here that couldn't be made today by using a custom
update function, but leaving the burden of checking for values that are
in both and actually inserting the new values to Python's language, and
keeping on our side only the parts that are specific to our use case,
makes in my opinion the code more readable, with fewer possible bugs and
possibly better optimization.
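The idea described above can be approximated today with a custom update function. In the sketch below, `update_with` and `resolve` are hypothetical names, and the callback signature (target, other, key) follows the example in this message rather than any existing API.

```python
def update_with(target, other, on_collision):
    """Update target in place; for keys present in both mappings,
    on_collision(target, other, key) decides the stored value."""
    for key, value in other.items():
        if key in target:
            target[key] = on_collision(target, other, key)
        else:
            target[key] = value
    return target


def resolve(updated, updator, key):
    if key == "tags":
        return updated[key] + updator[key]   # concatenate both lists
    if key in ("a", "b", "c"):
        return updated[key]                  # keep the existing value
    return updator[key]                      # default: the new value wins


foo = {"tags": ["x"], "a": 1, "name": "old"}
bar = {"tags": ["y"], "a": 2, "name": "new", "extra": True}
update_with(foo, bar, resolve)
```

The generic loop lives in one place and only the collision policy is specific to the use case, which is the separation argued for above.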
I don't think that's a reasonable rationale for adding an operator.
First, Python sometimes intentionally forces people to use a statement.
Strictly speaking, dict.update() is an expression. But it does not return `self`,
so you must split the code into statements. That's a design decision.
So "add an operator because I want an expression" is bad reasoning to me.
If it were valid reasoning, every mutating method should have an operator.
That's a crazy idea.
Second, an operator is not required for an expression. And adding an operator
must have a higher bar than adding a method, because it introduces more
complexity and it can seem cryptic, especially when the operator doesn't
have a good math metaphor.
So I proposed adding dict.merge() instead of adding dict + as a counter
proposal.
If "I want an expression" is really the main motivation, that must be enough.
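For comparison, a named-method alternative could be sketched as a plain function. The name `merge` and its variadic signature are assumptions made for illustration; the counter-proposal's exact API is not spelled out in this message.

```python
def merge(*mappings):
    """Hypothetical stand-in for a dict.merge(): return a new dict,
    with later arguments overriding earlier ones."""
    new = {}
    for mapping in mappings:
        new.update(mapping)
    return new


site_defaults = {"colour": "auto", "tabs": 8}
user_defaults = {"tabs": 4}

# Usable as an expression, with no new operator required.
prefs = merge(site_defaults, user_defaults)
```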
> Think of comprehensions versus for-loops. Comprehensions are expressions
> that don't add anything to the language that a for-loop cannot achieve.
> Still, everyone uses them because they are extremely convenient.
>
I agree that comprehensions are extremely convenient.
But I think the main reason is that they are compact and readable.
If comprehensions were no more compact and readable than a for-loop, they
would not be extremely convenient.
>
> > Is it needed to make Python more readable language?
>
> No, just like comprehensions, it's not "needed". It's just convenient.
>
I think comprehensions are needed to make Python a more readable language,
not just for convenience.
>
> > Anyway, If "there is expression" is the main reason for this proposal,
> > symbolic operator is not necessary.
>
> As said, "needed" is not the right word.
Maybe I misunderstood the nuance of the word "needed". English and Japanese
are very different languages. Sorry.
> Being able to use a decorator
> closes a gap in the language. Just like list comprehensions fit generator
> expressions and vice versa. There is no "need" for being able to write
>
> [x**2 for x in seq]
> {x**2 for x in seq}
>
> when you can equally well write
>
> list(x**2 for x in seq)
> set(x**2 for x in seq)
>
> But I certainly wouldn't complain about that redundancy in the language.
>
OK, I must agree on this point. [] and {} have a good metaphor in math.
We use [1, 2, 3, ...] for series, and {1, 2, 3, ...} for sets.
>
> > `new = d1.updated(d2)` or `new = dict.merge(d1, d2)` are enough. Python
> > preferred name over symbol in general. Symbols are readable and
> > understandable only when it has good math metaphor.
> >
> > Sets has symbol operator because it is well known in set in math, not
> > because set is frequently used.
> >
> > In case of dict, there is no simple metaphor in math.
>
> So then, if "list+list" and "tuple+tuple" wasn't available through an
> operator, would you also reject the idea of adding it, arguing that we
> could use this:
>
> L = L1.extended(L2)
>
> I honestly do not see the math relation in concatenation via "+".
>
First of all, concatenating sequences (especially str) is far more
frequent than merging dicts. My point is that dict + dict is a more serious
abuse of + than seq + seq, and it is used less often than seq + seq.
Let me describe why I think dict + dict is a "major" abuse.
As I said before, it's common to assign an operator to concatenation in
regular languages, although the middle dot is the common choice.
When the commonly used operator is not in ASCII, another symbol can
be used as an alternative. We used | instead of ∪.
In the case of dict, it's not common in math to assign an operator to
merging, as far as I know.
(Maybe the "direct sum" ⊕ is similar to it. But it doesn't allow intersection,
so ValueError would have to be raised for duplicate keys if we used "direct sum"
as the metaphor.
But direct sum is higher-level math than the "union" of sets. I don't think
it's a good idea to use it as a metaphor.)
That's one of the reasons I think seq + seq is a "little" abuse and dict +
dict is a "major" abuse.
Another reason is that "throwing some values away" doesn't fit the mental
model of "sum", as I already said in an earlier mail.
> But, given that "+" and "|" already have the meaning of "merging two
> containers into one" in Python, I think it makes sense to allow that also
> for dicts.
>
+ is used for concatenation, which is stricter than just merging.
If + is allowed for dict, set should support it too for consistency.
Then the meaning of "+ for containers" becomes "sum up two containers in
some way, defined by the container type." That is consistent.
Kotlin uses + with this meaning. Scala uses ++ with this meaning.
But this is a large design change to the language. Is it really required?
I feel adding a method is enough.
--
Inada Naoki <songof...@gmail.com>
On 6 March 2019 at 10:26:15, Brice Parent
(con...@brice.xyz) wrote:
>
> On 05/03/2019 at 23:40, Greg Ewing wrote:
> > Steven D'Aprano wrote:
> >> The question is, is [recursive merge] behaviour useful enough and
> > > common enough to be built into dict itself?
> >
> > I think not. It seems like just one possible way of merging
> > values out of many. I think it would be better to provide
> > a merge function or method that lets you specify a function
> > for merging values.
> >
> That's what this conversation led me to. I'm not against the addition
> for the most general usage (and current PEP's describes the behaviour I
> would expect before reading the doc), but for all other more specific
> usages, where we intend any special or not-so-common behaviour, I'd go
> with modifying Dict.update like this:
>
> foo.update(bar, on_collision=updator) # Although I'm not a fan of the
> keyword I used
This won’t be possible, because update() already takes keyword arguments:
>>> foo = {}
>>> bar = {'a': 1}
>>> foo.update(bar, on_collision=lambda e: e)
>>> foo
{'a': 1, 'on_collision': <function <lambda> at 0x10b8df598>}
On 06/03/2019 at 10:50, Rémi Lapeyre wrote:
>> On 05/03/2019 at 23:40, Greg Ewing wrote:
>>> Steven D'Aprano wrote:
>>>> The question is, is [recursive merge] behaviour useful enough and
>>>> common enough to be built into dict itself?
>>> I think not. It seems like just one possible way of merging
>>> values out of many. I think it would be better to provide
>>> a merge function or method that lets you specify a function
>>> for merging values.
>>>
>> That's what this conversation led me to. I'm not against the addition
>> for the most general usage (and current PEP's describes the behaviour I
>> would expect before reading the doc), but for all other more specific
>> usages, where we intend any special or not-so-common behaviour, I'd go
>> with modifying Dict.update like this:
>>
>> foo.update(bar, on_collision=updator) # Although I'm not a fan of the
>> keyword I used
> This won’t be possible update() already takes keyword arguments:
>
> >>> foo = {}
> >>> bar = {'a': 1}
> >>> foo.update(bar, on_collision=lambda e: e)
> >>> foo
> {'a': 1, 'on_collision': <function <lambda> at 0x10b8df598>}
I don't see that as a problem at all.
Having **kwargs in a function's signature doesn't prevent it from
having explicit keyword arguments at the same time:
`def foo(bar="baz", **kwargs):` is perfectly valid, as is `def
spam(ham: Dict, eggs="blah", **kwargs):`, so `update(other,
on_collision=None, **added)` is too, no? The major implication of such a
modification of the Dict.update method is that when you're using it
with keyword arguments (as opposed to passing another dict/iterable
as positional), you're making a small backward-incompatible change:
if some code was already using the keyword that would
be chosen (here "on_collision"), that code would be broken by the new
feature.
I had never tried to pass a dict and keyword arguments together, as it seemed
to me that it wasn't supported (I would even have expected an exception
to be raised), but it's probably my level of English that isn't high
enough to get it right, or this part of the doc that doesn't describe
the full possible usage of the method well (see here:
https://docs.python.org/3/library/stdtypes.html#dict.update). Anyway, if
the keyword is selected wisely, the collision case will almost never
happen, and will be quite easy to correct if it ever does.
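The compatibility concern can be seen in a short sketch. `CollisionDict` below is a hypothetical subclass written only for illustration, assuming the `update(other, on_collision=None, **added)` signature suggested above; it is not an existing or proposed class.

```python
class CollisionDict(dict):
    """Hypothetical dict whose update() reserves the `on_collision` keyword."""

    def update(self, other=(), on_collision=None, **added):
        incoming = dict(other, **added)
        for key, value in incoming.items():
            if key in self and on_collision is not None:
                # Delegate the decision for keys present on both sides.
                self[key] = on_collision(self, incoming, key)
            else:
                self[key] = value


# Today: every keyword argument, including on_collision, becomes a key.
plain = {}
plain.update({"a": 1}, on_collision="replace")
print(plain)

# With the reserved keyword, the same call no longer stores that key.
changed = CollisionDict()
changed.update({"a": 1}, on_collision=None)
print(changed)
```

Code that relied on storing an "on_collision" key through keyword arguments would change behaviour without any error being raised.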
On 06/03/2019 10:29, Ka-Ping Yee wrote:
> len(dict1 + dict2) does not equal len(dict1) + len(dict2), so using the +
> operator is nonsense.
I'm sorry, but you're going to have to justify why this identity is
important. Making assumptions about length where any dictionary
manipulations are concerned seems unwise to me, which makes a nonsense
of your claim that this is nonsense :-)
That's indeed a good point, even if the correction is quite easy to make
in most cases. With keyword-only changes:
button_options.update(dict(on_click=frobnicate, style="KDE", on_collision="replace"))
# or
button_options.update(dict(on_collision="replace"), on_click=frobnicate, style="KDE")
In the exact case you proposed, it could become a two-liner:
button_options.update(button_info)
button_options.update(dict(on_click=frobnicate, style="KDE", on_collision="replace"))
In my code, I would probably make it into 2 lines, to make clear that we
have 2 levels of data merging, one that is general (the first), and one
that is specific to this use case (as it's hard-coded),
but not everyone is indifferent to the number of lines.
But for the other part of your message, I 100% agree with you. The main
problem with such a change is not (to me) that it can break some edge
cases, but that it would potentially break them silently. And that, I
agree, is worth a big -1 I guess.
I strongly agree with Ka-Ping. '+' is intuitively concatenation not merging. The behavior is overwhelmingly more similar to the '|' operator in sets (whether or not a user happens to know the historical implementation overlap).
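A short sketch of the length argument, using only existing syntax ({**d1, **d2} stands in for the proposed dict operator):

```python
l1, l2 = [1, 2], [2, 3]
s1, s2 = {1, 2}, {2, 3}
d1, d2 = {"a": 1, "b": 2}, {"b": 3, "c": 4}

concatenated = l1 + l2     # list concatenation keeps every element
union = s1 | s2            # set union collapses the shared element
merged = {**d1, **d2}      # dict merge collapses the shared key

# Lengths add up for concatenation only.
print(len(concatenated), len(l1) + len(l2))
print(len(union), len(s1) + len(s2))
print(len(merged), len(d1) + len(d2))
```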
> If a comma-separated sequence of key/datum pairs is given, they are evaluated from left to right to define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum. This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given.
--
Inada Naoki <songof...@gmail.com>
Inada Naoki quoted (from doc.python ref [6] in my original post):
> > If a comma-separated sequence of key/datum pairs is given, they are evaluated from left to right to define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum. This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given.
Indeed. Although off-topic, I think
>>> {'a': 0, 'a': 1} == {'a': 1}
True
is much better than "This means that you can specify the same key
multiple times in the key/datum list, and the final dictionary’s value
for that key will be the last one given."
By the way, today I think we'd say key/value pairs. And I've read
https://www.theguardian.com/guardian-observer-style-guide-d
data takes a singular verb (like agenda), though strictly a plural;
you come across datum, the singular of data, about as often as you
hear about an agendum
Oh, and "the final dictionary's value" should I think be "the
dictionary's final value" or perhaps just "the dictionary's value"
But now we're far from the thread topic. I'm happy to join in on a
thread on improving documentation (by using simpler language and good
examples).
I disagree. An example is an excellent thing, but the words are
definitive and must be there.
--
Rhodri James *-* Kynesim Ltd
Sigh. I hit SEND before I finished changing the title. Sorry, folks.
If we use this "literal concat" metaphor, I still think set should have `+` as an alias for `|` for consistency.
I think "|" keeps commutativity only because it is less widely used than `+`.
Hmm. The PEP proposes dict - dict, which is similar to set - set (difference).
You might as well say that using the + operator on vectors is
nonsense, because len(v1 + v2) is not in general equal to
len(v1) + len(v2).
Yet mathematicians are quite happy to talk about "addition"
of vectors.
On Thu, Mar 7, 2019 at 10:52 AM Josh Rosenberg
<shadowranger...@gmail.com> wrote:
>
> Allowing dicts to get involved in + means:
Lots of words that basically say: Stuff wouldn't be perfectly pure.
But adding dictionaries is fundamentally *useful*. It is expressive.
If we were inventing programming languages in a vacuum, you could say + can mean "arbitrary combination operator" and it would be fine. But we're not in a vacuum; every major language that uses + with general purpose containers uses it to mean element-wise addition or concatenation, not just "merge".
Like the merge operator and list concatenation, the difference operator requires both operands to be dicts, while the augmented version allows any iterable of keys.
>>> d - {'spam', 'parrot'}
Traceback (most recent call last):
...
TypeError: cannot take the difference of dict and set
>>> d -= {'spam', 'parrot'}
>>> print(d)
{'eggs': 2, 'cheese': 'cheddar'}
>>> d -= [('spam', 999)]
>>> print(d)
{'spam': 999, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
João Matos
Ka-Ping Yee writes:
> On Wed, Mar 6, 2019 at 4:01 PM Chris Angelico <ros...@gmail.com> wrote:
> > But adding dictionaries is fundamentally *useful*. It is expressive.
>
> It is useful. It's just that + is the wrong name.
First, let me say that I prefer ?!'s position here, so my bias is made
apparent. I'm also aware that I have biases so I'm sympathetic to
those who take a different position.
Rather than say it's "wrong", let me instead point out that I think
it's pragmatically troublesome to use "+". I can think of at least
four interpretations of "d1 + d2"
1. update
2. multiset (~= collections.Counter addition)
3. addition of functions into the same vector space (actually, a
semigroup will do ;-), and this is the implementation of
collections.Counter
4. "fiberwise" set addition (ie, of functions into relations)
and I'm very jet-lagged so I may be missing some.
There's also the fact that the operations denoted by "|" and "||" are
often implemented as "short-circuiting", and therefore not
commutative, while "+" usually is (and that's reinforced for
mathematicians who are trained to think of "+" as the operator for
Abelian groups, while "*" is a (possibly) non-commutative operator). I
know commutativity of "+" has been mentioned before, but the
non-commutativity of "|" -- and so unsuitability for many kinds of
dict combination -- hasn't been emphasized before IIRC.
Since "|" (especially "|=") *is* suitable for "update", I think we
should reserve "+" for some future commutative extension.
In the spirit of full disclosure:
Of these, 2 is already implemented and widely used, so we don't need
to use dict.__add__ for that. I've never seen 4 in the mathematical
literature (union of relations is not the same thing). 3, however, is
very common both for mappings with small domain and sparse
representation of mappings with a default value (possibly computed
then cached), and "|" is not suitable for expressing that sort of
addition (I'm willing to say it's "wrong" :-).
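Interpretation 1 and the Counter-style addition can be compared directly. This is only a sketch with made-up data, using {**d1, **d2} for "update" and collections.Counter for value-wise addition:

```python
from collections import Counter

d1 = {"a": 1, "b": 2}
d2 = {"b": 3, "c": 4}

# Interpretation 1 (update): the right-hand operand wins, so order matters.
update_12 = {**d1, **d2}
update_21 = {**d2, **d1}
print(update_12 == update_21)

# Counter addition: values for shared keys are summed, so order does not matter.
sum_12 = Counter(d1) + Counter(d2)
sum_21 = Counter(d2) + Counter(d1)
print(sum_12 == sum_21)
print(dict(sum_12))
```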
Guido van Rossum wrote:
> I guess this explains the behavior of removing results <= 0; it makes
> sense as multiset subtraction, since in a multiset a negative count
> makes little sense. (Though the name Counter certainly doesn't seem to
> imply multiset.)
It doesn't even behave consistently as a multiset, since c[k] -= n
is happy to let the value go negative.
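The inconsistency mentioned here can be reproduced with a couple of lines (the key name is arbitrary):

```python
from collections import Counter

# Binary subtraction drops results that are zero or negative.
difference = Counter(a=1) - Counter(a=3)
print(difference)

# In-place arithmetic on a single entry happily goes negative.
c = Counter(a=1)
c["a"] -= 3
print(c["a"])
```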
> For sets,
> union and intersection are distributive over each other.
> Note that this is *not* the case for + and * when used with
> (mathematical) numbers... So in a sense, SETL (which uses + and *
> for union and intersection) got the operators wrong.
But in another sense, it didn't. In Boolean algebra, "and" and "or"
(which also distribute over each other) are often written using the
same notations as multiplication and addition. There's no rule in
mathematics saying that these notations must be distributive in one
direction but not the other.
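The distributivity claims are easy to check on small examples. The values below are arbitrary; the set and number checks are single instances, while the Boolean check covers all eight input combinations.

```python
from itertools import product

a, b, c = {1, 2}, {2, 3}, {3, 4}
# Sets: union and intersection each distribute over the other.
union_over_intersection = (a | (b & c)) == ((a | b) & (a | c))
intersection_over_union = (a & (b | c)) == ((a & b) | (a & c))

x, y, z = 2, 3, 4
# Numbers: * distributes over +, but + does not distribute over *.
times_over_plus = x * (y + z) == x * y + x * z
plus_over_times = x + (y * z) == (x + y) * (x + z)

# Booleans: "and" and "or" distribute over each other for every input.
bool_both_ways = all(
    (p or (q and r)) == ((p or q) and (p or r))
    and (p and (q or r)) == ((p and q) or (p and r))
    for p, q, r in product([False, True], repeat=3)
)

print(union_over_intersection, intersection_over_union)
print(times_over_plus, plus_over_times)
print(bool_both_ways)
```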
Boolean algebra was only touched on briefly in my high school
years. I can't remember exactly what notation was used, but it
definitely wasn't ∧ and ∨ -- I didn't encounter those until
much later.
However, I've definitely seen texts on boolean algebra in
relation to logic circuits that write 'A and B' as 'AB',
and 'A or B' as 'A + B'. (And also use an overbar for
negation instead of the mathematical ¬).
Maybe it depends on whether you're a mathematician or an
engineer? The multiplication-addition notation seems a lot
more readable when you have a complicated boolean expression,
so I can imagine it being favoured by pragmatic engineering
type people.
Agreed with this. What is so useful exactly in this new dict operator
that it hasn't been implemented, say, 20 years ago? I rarely find
myself merging dicts and, when I do, calling dict.update() is entirely
acceptable (I think the "{**d}" notation was already a mistake, making
a perfectly readable operation more cryptic simply for the sake of
saving a few keystrokes).
Built-in operations should be added with regard to actual user needs
(such as: a first-class notation for matrix multiplication, making
formulas easier to read and understand), not a mere "hmm this might
sometimes be useful".
Besides, if I have two dicts with e.g. lists as values, I *really*
dislike the fact that the + operator will clobber the values rather than
concatenate them. It's a recipe for confusion.
Regards
Antoine.
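The clobbering concern can be shown with existing syntax, using {**d1, **d2} in place of the proposed operator. The second expression is just one possible way to concatenate the values instead; it is an illustration, not something proposed in the thread.

```python
d1 = {"x": [1, 2], "y": [0]}
d2 = {"x": [3]}

# Merge semantics: the list under "x" from d1 is discarded.
clobbered = {**d1, **d2}
print(clobbered)

# Concatenating the values has to be spelled out explicitly.
combined = {
    key: d1.get(key, []) + d2.get(key, [])
    for key in d1.keys() | d2.keys()
}
print(combined)
```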