[Python-Dev] Guarantee ordered dict literals in v3.7?

154 views
Skip to first unread message

Stefan Krah

unread,
Nov 4, 2017, 2:29:32 PM11/4/17
to pytho...@python.org

Hello,

would it be possible to guarantee that dict literals are ordered in v3.7?


The issue is well-known and the workarounds are tedious, example:

https://mail.python.org/pipermail/python-ideas/2015-December/037423.html


If the feature is guaranteed now, people can rely on it around v3.9.



Stefan Krah



_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Guido van Rossum

unread,
Nov 4, 2017, 2:39:36 PM11/4/17
to Stefan Krah, Python-Dev
This sounds reasonable -- I think when we introduced this in 3.6 we were worried that other implementations (e.g. Jython) would have a problem with this, but AFAIK they've reported back that they can do this just fine. So let's just document this as a language guarantee.

Jim Baker

unread,
Nov 4, 2017, 5:09:39 PM11/4/17
to Guido van Rossum, Python-Dev
+1, as Guido correctly recalls, this language guarantee will work well with Jython when we get to the point of implementing 3.7+.

Nick Coghlan

unread,
Nov 4, 2017, 10:06:44 PM11/4/17
to Guido van Rossum, Python-Dev
On 5 November 2017 at 04:35, Guido van Rossum <gu...@python.org> wrote:
> This sounds reasonable -- I think when we introduced this in 3.6 we were
> worried that other implementations (e.g. Jython) would have a problem with
> this, but AFAIK they've reported back that they can do this just fine. So
> let's just document this as a language guarantee.

When I asked Damien George about this for MicroPython, he indicated
that they'd have to choose between guaranteed order and O(1) lookups
given their current dict implementation. That surprised me a bit
(since PyPy and CPython both *saved* memory by switching to their
guaranteed order implementations, hence the name "compact dict
representation"), but my (admittedly vague) understand is that the
presence of a space/speed trade-off in their case has something to do
with MicroPython deliberately running with a much higher chance of
hash collisions in general (since the data sets it deals with are
naturally smaller).

So if we make the change, MicroPython will likely go along with it,
but it may mean that dict lookups there become O(N), and folks will be
relying on "N" being consistently small due to memory constraints (but
some typically O(N) algorithms will still become O(N^2) when run on
MicroPython).

I don't think that situation should change the decision, but I do
think it would be helpful if folks that understand CPython's dict
implementation could take a look at MicroPython's dict implementation
and see if it might be possible for them to avoid having to make that
trade-off and instead be able to use a naturally insertion ordered
hashmap implementation.

Cheers,
Nick.

P.S. If anyone does want to explore MicroPython's dict implementation,
and see if there might be an alternate implementation strategy that
offers both O(1) lookup and guaranteed ordering without using
additional memory, the relevant files seem to be:

* https://github.com/micropython/micropython/blob/77a48e8cd493c0b0e0ca2d2ad58a110a23c6a232/py/obj.h#L339
(C level hashmap/ordered array structs)
* https://github.com/micropython/micropython/blob/master/py/map.c (C
level hashmap/ordered array implementation)
* https://github.com/micropython/micropython/blob/master/py/objdict.c
(Python dict wrapper around the mapping impl)

The current behaviour is that the builtin dict uses the hashmap
algorithms, while collections.OrderedDict uses the ordered array
algorithms.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Barry Warsaw

unread,
Nov 5, 2017, 12:39:34 PM11/5/17
to Python-Dev
On Nov 4, 2017, at 11:35, Guido van Rossum <gu...@python.org> wrote:
>
> This sounds reasonable -- I think when we introduced this in 3.6 we were worried that other implementations (e.g. Jython) would have a problem with this, but AFAIK they've reported back that they can do this just fine. So let's just document this as a language guarantee.

The other concern was backward compatibility issues. For example, if 3.7 makes this guarantee official, and folks write code with this assumption that has to work with older versions of Python, then that code could be buggy in previous versions and work in 3.7. This will probably be most evident in test suites, and such failures are often mysterious to debug (as we’ve seen in the past).

That doesn’t mean we shouldn’t do it, but it does mean we have to be careful and explicit to educate users about how to write safe multi-Python-version code.

Cheers,
-Barry

signature.asc

Guido van Rossum

unread,
Nov 5, 2017, 12:53:25 PM11/5/17
to Barry Warsaw, Python-Dev
Yup. At least such code will not break in 3.5.

In general if you write code using a newer version you should expect arbitrary breakage when trying to run it under older versions. There is no compatibility guarantee in that direction for anything anyways.

I don't see this as a reason to not put this into the language spec at 3.7.

Paul G

unread,
Nov 5, 2017, 1:19:55 PM11/5/17
to pytho...@python.org
I'm not entirely sure I understand the full set of reasoning for this - I couldn't really tell what the problem with OrderedDict is from the link Stefan provided. It seems to me like a kind of huge change for the language to move from arbitrary-ordered to guaranteed-ordered dict. The problem I see is that this introduces a huge backwards compatibility burden on all implementations of Python.

It's possible that no implementations of Python (either future CPython versions or current/future alternative interpreters) will find any reason to use anything but an insertion-order sorted dictionary, but given that we've done just fine with arbitrary-order semantics for the entire lifetime of the language /and/ there is a container (OrderedDict) which has guaranteed order semantics, it doesn't seem worth it to me.

I think I would mostly be concerned with (in terms of likeliness to occur):

1. An edge case we haven't thought of where arbitrary order dictionaries would allow some crticial optimization for a given platform (perhaps in someone writing a transpiler to another language where the convenient equivalent container has arbitrary order, e.g. if Brython wants to implement dicts in terms of objects - https://stackoverflow.com/questions/5525795/does-javascript-guarantee-object-property-order )

2. Someone invents a new arbitrary-ordered container that would improve on the memory and/or CPU performance of the current dict implementation

3. Some sort of bug or vulnerability is discovered that makes insertion-ordered dictionaries an unwise choice (similar to the hash collision vulnerability that necessitated hash randomization - https://stackoverflow.com/questions/14956313#14959001 ).

Perhaps these concerns are overblown, but if indeed guaranteed-order Mapping literals are critical in some or many cases, maybe it would be preferable to introduce new syntax for OrderedDict literals.

Best,
Paul
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/paul%40ganssle.io
>

signature.asc

Stefan Krah

unread,
Nov 5, 2017, 1:41:42 PM11/5/17
to pytho...@python.org
On Sun, Nov 05, 2017 at 01:14:54PM -0500, Paul G wrote:
> I'm not entirely sure I understand the full set of reasoning for this - I couldn't really tell what the problem with OrderedDict is from the link Stefan provided. It seems to me like a kind of huge change for the language to move from arbitrary-ordered to guaranteed-ordered dict. The problem I see is that this introduces a huge backwards compatibility burden on all implementations of Python.

Scientific applications want something like

{'a': 10, 'b': "foo", 'c': {'this': b'123'}}

as an ordered initializer for unboxed or typed (or both) data. In general,
if dicts are ordered, they can be used for example as initializers for
(nested) C structs.



> 2. Someone invents a new arbitrary-ordered container that would improve on the memory and/or CPU performance of the current dict implementation

I would think this is very unlikely, given that the previous dict implementation
has always been very fast. The new one is very fast, too.



Stefan Krah



_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Paul G

unread,
Nov 5, 2017, 1:59:27 PM11/5/17
to pytho...@python.org
> Scientific applications want something like
>
> {'a': 10, 'b': "foo", 'c': {'this': b'123'}}
>
> as an ordered initializer for unboxed or typed (or both) data. In general,
> if dicts are ordered, they can be used for example as initializers for
> (nested) C structs.

I can understand why you'd want an ordered container, I just don't see why it must be a dict. Why can't it be:

OrderedDict(a=10, b='foo', c=OrderedDict(this=b'123'))

Is it just that you don't want to type OrderedDict that many times? If it's so important to provide ordered dictionary literals, I would think it's a no-brainer to give them their own literal syntax (rather than re-defining dicts to have guaranteed order). e.g.:

o{'a': 10, 'b': 'foo', 'c': o{'this': b'123'}

Then there is no backwards incompatibility problem and users can express whether order does or does not matter to them when initializing a container.

On 11/05/2017 01:39 PM, Stefan Krah wrote:
> On Sun, Nov 05, 2017 at 01:14:54PM -0500, Paul G wrote:
>> I'm not entirely sure I understand the full set of reasoning for this - I couldn't really tell what the problem with OrderedDict is from the link Stefan provided. It seems to me like a kind of huge change for the language to move from arbitrary-ordered to guaranteed-ordered dict. The problem I see is that this introduces a huge backwards compatibility burden on all implementations of Python.
>

>
>
>
>> 2. Someone invents a new arbitrary-ordered container that would improve on the memory and/or CPU performance of the current dict implementation
>
> I would think this is very unlikely, given that the previous dict implementation
> has always been very fast. The new one is very fast, too.
>
>
>
> Stefan Krah
>
>
>
> _______________________________________________
> Python-Dev mailing list
> Pytho...@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/paul%40ganssle.io
>

signature.asc

Serhiy Storchaka

unread,
Nov 5, 2017, 2:03:42 PM11/5/17
to pytho...@python.org
04.11.17 19:30, Stefan Krah пише:

> would it be possible to guarantee that dict literals are ordered in v3.7?
>
>
> The issue is well-known and the workarounds are tedious, example:
>
> https://mail.python.org/pipermail/python-ideas/2015-December/037423.html
>
>
> If the feature is guaranteed now, people can rely on it around v3.9.

Do you suggest to make dictionary displays producing OrderedDict instead
of dict?

Serhiy Storchaka

unread,
Nov 5, 2017, 2:11:40 PM11/5/17
to pytho...@python.org
05.11.17 20:39, Stefan Krah пише:

> On Sun, Nov 05, 2017 at 01:14:54PM -0500, Paul G wrote:
>> 2. Someone invents a new arbitrary-ordered container that would improve on the memory and/or CPU performance of the current dict implementation
>
> I would think this is very unlikely, given that the previous dict implementation
> has always been very fast. The new one is very fast, too.

The modification of the current implementation that don't preserve the
initial order after deletion would be more compact and faster.

Stefan Krah

unread,
Nov 5, 2017, 2:25:12 PM11/5/17
to pytho...@python.org
On Sun, Nov 05, 2017 at 09:01:40PM +0200, Serhiy Storchaka wrote:
> Do you suggest to make dictionary displays producing OrderedDict
> instead of dict?

No, this is essentially a language spec doc issue that would guarantee
the ordering properties of the current dict implementation.


Stefan Krah

Stefan Krah

unread,
Nov 5, 2017, 2:32:31 PM11/5/17
to pytho...@python.org
On Sun, Nov 05, 2017 at 09:09:37PM +0200, Serhiy Storchaka wrote:
> 05.11.17 20:39, Stefan Krah пише:
> >On Sun, Nov 05, 2017 at 01:14:54PM -0500, Paul G wrote:
> >>2. Someone invents a new arbitrary-ordered container that would improve on the memory and/or CPU performance of the current dict implementation
> >
> >I would think this is very unlikely, given that the previous dict implementation
> >has always been very fast. The new one is very fast, too.
>
> The modification of the current implementation that don't preserve
> the initial order after deletion would be more compact and faster.

How much faster?


Stefan Krah

Serhiy Storchaka

unread,
Nov 5, 2017, 2:37:42 PM11/5/17
to pytho...@python.org
05.11.17 21:20, Stefan Krah пише:

> On Sun, Nov 05, 2017 at 09:01:40PM +0200, Serhiy Storchaka wrote:
>> Do you suggest to make dictionary displays producing OrderedDict
>> instead of dict?
>
> No, this is essentially a language spec doc issue that would guarantee
> the ordering properties of the current dict implementation.

Wouldn't be enough to guarantee just the ordering of dicts before first
deletion? Or before first resizing (the maximal size of dictionary
displays is known at compile time, so they can be presized)?

Serhiy Storchaka

unread,
Nov 5, 2017, 2:57:15 PM11/5/17
to pytho...@python.org
05.11.17 21:30, Stefan Krah пише:

> On Sun, Nov 05, 2017 at 09:09:37PM +0200, Serhiy Storchaka wrote:
>> 05.11.17 20:39, Stefan Krah пише:
>>> On Sun, Nov 05, 2017 at 01:14:54PM -0500, Paul G wrote:
>>>> 2. Someone invents a new arbitrary-ordered container that would improve on the memory and/or CPU performance of the current dict implementation
>>>
>>> I would think this is very unlikely, given that the previous dict implementation
>>> has always been very fast. The new one is very fast, too.
>>
>> The modification of the current implementation that don't preserve
>> the initial order after deletion would be more compact and faster.
>
> How much faster?

I didn't try to implement this. But the current implementation requires
periodical reallocating if add and remove items. The following loop
reallocates the dict every len(d) iterations, while the size of the dict
is not changed, and the half of its storage is empty.

while True:
v = d.pop(k)
...
d[k] = v

Stefan Krah

unread,
Nov 5, 2017, 3:08:12 PM11/5/17
to pytho...@python.org
On Sun, Nov 05, 2017 at 09:35:38PM +0200, Serhiy Storchaka wrote:
> 05.11.17 21:20, Stefan Krah пише:
> >On Sun, Nov 05, 2017 at 09:01:40PM +0200, Serhiy Storchaka wrote:
> >>Do you suggest to make dictionary displays producing OrderedDict
> >>instead of dict?
> >
> >No, this is essentially a language spec doc issue that would guarantee
> >the ordering properties of the current dict implementation.
>
> Wouldn't be enough to guarantee just the ordering of dicts before
> first deletion? Or before first resizing (the maximal size of
> dictionary displays is known at compile time, so they can be
> presized)?

Yes, for my use case that would be sufficient and that's what
I had in mind initially.


A luxury syntax addition like {a = 10, b = {c = "foo"}} that is read
as an OrderedDict (where the keys a, b, c are implicitly strings) would
of course also be sufficient for my use case.


But I suspect many users who have other use cases find it tantalizing
not to be able to use the properties of the current regular dict.

Stefan Krah

Antoine Pitrou

unread,
Nov 5, 2017, 3:20:23 PM11/5/17
to pytho...@python.org
On Sun, 5 Nov 2017 21:06:12 +0100
Stefan Krah <ste...@bytereef.org> wrote:

> On Sun, Nov 05, 2017 at 09:35:38PM +0200, Serhiy Storchaka wrote:
> > 05.11.17 21:20, Stefan Krah пише:
> > >On Sun, Nov 05, 2017 at 09:01:40PM +0200, Serhiy Storchaka wrote:
> > >>Do you suggest to make dictionary displays producing OrderedDict
> > >>instead of dict?
> > >
> > >No, this is essentially a language spec doc issue that would guarantee
> > >the ordering properties of the current dict implementation.
> >
> > Wouldn't be enough to guarantee just the ordering of dicts before
> > first deletion? Or before first resizing (the maximal size of
> > dictionary displays is known at compile time, so they can be
> > presized)?
>
> Yes, for my use case that would be sufficient and that's what
> I had in mind initially.
>
>
> A luxury syntax addition like {a = 10, b = {c = "foo"}} that is read
> as an OrderedDict (where the keys a, b, c are implicitly strings) would
> of course also be sufficient for my use case.

Or you are ok with:

OD = OrderedDict
# ...
OD(a=10, b=OD(c="foo"))

Regards

Antoine.

Paul Ganssle

unread,
Nov 5, 2017, 3:41:40 PM11/5/17
to pytho...@python.org
I think the question of whether any specific implementation of dict could be made faster for a given architecture or even that the trade-offs made by CPython are generally the right ones is kinda beside the point. It's certainly feasible that an implementation that does not preserve ordering could be better for some implementation of Python, and the question is really how much is gained by changing the language semantics in such a way as to cut off that possibility.
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/paul%40ganssle.io

signature.asc

Sven R. Kunze

unread,
Nov 5, 2017, 3:46:03 PM11/5/17
to pytho...@python.org

Peter Ludemann via Python-Dev

unread,
Nov 5, 2017, 3:52:38 PM11/5/17
to Sven R. Kunze, Python-Dev
Isn't ordered dict also useful for **kwargs? 

If it turns out that there's a dict implementation that's faster by not preserving order, collections.UnorderedDict could be added.
There could also be specialized implementations that pre-size the dict (cf: C++ unordered_map::reserve), etc., etc.
But these are all future things, which might not be necessary.

Paul Ganssle

unread,
Nov 5, 2017, 5:04:13 PM11/5/17
to pytho...@python.org
> If it turns out that there's a dict implementation that's faster by not
> preserving order, collections.UnorderedDict could be added.
> There could also be specialized implementations that pre-size the dict (cf:
> C++ unordered_map::reserve), etc., etc.
> But these are all future things, which might not be necessary.

I think that the problem with this is that for the most part, people will use `dict`, and for most uses of `dict`, order doesn't matter (and it never has before). Given that "arbitrary order" includes *any* fixed ordering (insertion ordered, reverse insertion ordered, etc), the common case should keep the existing "no order guarantee" specification. This gives interpreter authors maximum freedom with a fundamental, widely used data type.

> Isn't ordered dict also useful for **kwargs?

If this is useful (and it seems like it would be), I think again a syntax modification that allows users to indicate that they want a particular implementation of **kwargs would be better than modifying dict semantics. It could possibly be handled with a type-hinting like syntax:

def f(*args, **kwargs : OrderedKwargs):

Or a riff on the existing syntax:

def f(*args, ***kwargs):
def f(*args, ^^kwargs):
def f(*args, .**kwargs):

In this case, the only guarantee you'd need (which relatively minor compared to a change in the dict semantics) would be that keyword argument order passed to a function would be preserved as the order that it is passed into the `kwargs` constructor. The old **kwargs syntax would give you a `dict` as normal, and the new ^^kwargs would give you an OrderedDict or some other dict subclass with guaranteed order.
>>> --Guido van Rossum (python.org/~guido <http://python.org/%7Eguido>)
>>>
>>> _______________________________________________
>>> Python-Dev mailing list
>>> Pytho...@python.org
>>> https://mail.python.org/mailman/listinfo/python-dev
>>> Unsubscribe: https://mail.python.org/mailman/options/python-dev/jbaker%
>>> 40zyasoft.com
>>>
>>>
>>
>>
>> _______________________________________________
>> Python-Dev mailing listPython-Dev@python.orghttps://mail.python.org/mailman/listinfo/python-dev
>>
>> Unsubscribe: https://mail.python.org/mailman/options/python-dev/srkunze%40mail.de
>>
>>
>>
>> _______________________________________________
>> Python-Dev mailing list
>> Pytho...@python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
>> pludemann%40google.com
>>
>>
>
>
>
> _______________________________________________
> Python-Dev mailing list
> Pytho...@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/paul%40ganssle.io
>

signature.asc

Tim Delaney

unread,
Nov 5, 2017, 5:28:14 PM11/5/17
to Peter Ludemann, Python-Dev
On 6 November 2017 at 07:50, Peter Ludemann via Python-Dev <pytho...@python.org> wrote:
Isn't ordered dict also useful for **kwargs? 

**kwargs is already specified as insertion ordered as of Python  3.6.


Tim Delaney

Tim Delaney

unread,
Nov 5, 2017, 5:34:23 PM11/5/17
to Serhiy Storchaka, Python-Dev
On 6 November 2017 at 06:09, Serhiy Storchaka <stor...@gmail.com> wrote:
05.11.17 20:39, Stefan Krah пише:
On Sun, Nov 05, 2017 at 01:14:54PM -0500, Paul G wrote:
2. Someone invents a new arbitrary-ordered container that would improve on the memory and/or CPU performance of the current dict implementation

I would think this is very unlikely, given that the previous dict implementation
has always been very fast. The new one is very fast, too.

The modification of the current implementation that don't preserve the initial order after deletion would be more compact and faster.

I would personally be happy with this as the guarantee (it covers dict literals and handles PEP 468), but it might be more confusing. "dicts are in arbitrary order" and "dicts maintain insertion order" are fairly simple to explain, "dicts maintain insertion order up to the point that a key is deleted" is less so.

Tim Delaney 

Steve Dower

unread,
Nov 5, 2017, 6:35:12 PM11/5/17
to pytho...@python.org

Since there is voting going on, -1 on changing the guarantees of `dict`. If ordering is important, OrderedDict is more explicit and more obvious to the over reading the code, even if the underlying implementation is identical.

 

Good luck teaching people about why Python went from OrderedDict to UnorderedDict within a version. I remember learning about this same thing in C++ and just thinking “wut?”. And they just added a new type and said to stop using the old one – not changing something that people already understand.

 

Better to underspecify the default dict and offer explicitly named extensions (DefaultDict, OrderedDict, etc.) for those who want more guarantees.

 

Cheers,

Steve

 

Top-posted from my Windows phone

Raymond Hettinger

unread,
Nov 5, 2017, 6:45:40 PM11/5/17
to Nick Coghlan, Python-Dev@Python. Org

> On Nov 4, 2017, at 7:04 PM, Nick Coghlan <ncog...@gmail.com> wrote:
>
> When I asked Damien George about this for MicroPython, he indicated
> that they'd have to choose between guaranteed order and O(1) lookups
> given their current dict implementation. That surprised me a bit
> (since PyPy and CPython both *saved* memory by switching to their
> guaranteed order implementations, hence the name "compact dict
> representation"), but my (admittedly vague) understand is that the
> presence of a space/speed trade-off in their case has something to do
> with MicroPython deliberately running with a much higher chance of
> hash collisions in general (since the data sets it deals with are
> naturally smaller).
>
> So if we make the change, MicroPython will likely go along with it,
> but it may mean that dict lookups there become O(N), and folks will be
> relying on "N" being consistently small due to memory constraints (but
> some typically O(N) algorithms will still become O(N^2) when run on
> MicroPython).
>
> I don't think that situation should change the decision, but I do
> think it would be helpful if folks that understand CPython's dict
> implementation could take a look at MicroPython's dict implementation
> and see if it might be possible for them to avoid having to make that
> trade-off and instead be able to use a naturally insertion ordered
> hashmap implementation.

I've just looked at the MicroPython dictionary implementation and think they won't have a problem implementing O(1) compact dicts with ordering.

The likely reason for the confusion is that they are already have an option for an "ordered array" dict variant that does a brute-force linear search. However, their normal hashed lookup is very similar to ours and is easily amenable to being compact and ordered.

See: https://github.com/micropython/micropython/blob/77a48e8cd493c0b0e0ca2d2ad58a110a23c6a232/py/map.c#L139

Pretty much any implementation hashed lookup of keys and values is amenable to being compact and ordered. Whatever existing logic that looks up an entry becomes a lookup into a table of indices which in turn references a sequential array of keys and values. This logic is independent of hashing scheme or density, and it has no effect on the number of probes or collision rate.

The cost is an extra level of indirection and an extra array of indices (typically very small). The benefit is faster iteration over the smaller dense key/value array, overall memory savings resulting in improved cache utilization, and the side-effect of remembering insertion order.

Summary: I think MicroPython will be just fine and if needed I will help create the patch that implements compact-and-ordered behavior.


Raymond

Nathaniel Smith

unread,
Nov 5, 2017, 7:33:09 PM11/5/17
to Paul Ganssle, Python Dev
On Nov 5, 2017 2:41 PM, "Paul Ganssle" <pgan...@gmail.com> wrote:
I think the question of whether any specific implementation of dict could be made faster for a given architecture or even that the trade-offs made by CPython are generally the right ones is kinda beside the point. It's certainly feasible that an implementation that does not preserve ordering could be better for some implementation of Python, and the question is really how much is gained by changing the language semantics in such a way as to cut off that possibility.

The language definition is not nothing, but I think it's easy to overestimate its importance. CPython does in practice provide ordering guarantees for dicts, and this solves a whole bunch of pain points: it makes json roundtripping work better, it gives ordered kwargs, it makes it possible for metaclasses to see the order class items were defined, etc. And we got all these goodies for better-than-free: the new dict is faster and uses less memory. So it seems very unlikely that CPython is going to revert this change in the foreseeable future, and that means people will write code that depends on this, and that means in practice reverting it will become impossible due to backcompat and it will be important for other interpreters to implement, regardless of what the language definition says.

That said, there are real benefits to putting this in the spec. Given that we're not going to get rid of it, we might as well reward the minority of programmers who are conscientious about following the spec by letting them use it too. And there were multiple PEPs that went away when this was merged; no one wants to resurrect them just for hypothetical future implementations that may never exist. And putting it in the spec will mean that we can stop having this argument over and over with the same points rehashed for those who missed the last one. (This isn't aimed at you or anything; it's not your fault you don't know all these arguments off the top of your head, because how would you? But it is a reality of mailing list dynamics that rehashing this kind of thing sucks up energy without producing much.)

MicroPython deviates from the language spec in lots of ways. Hopefully this won't need to be another one, but it won't be the end of the world if it is.

-n

Raymond Hettinger

unread,
Nov 5, 2017, 9:46:20 PM11/5/17
to Nathaniel Smith, Paul Ganssle, Python-Dev@Python. Org

> On Nov 5, 2017, at 4:31 PM, Nathaniel Smith <n...@pobox.com> wrote:
>
> CPython does in practice provide ordering guarantees for dicts, and this solves a whole bunch of pain points: it makes json roundtripping work better, it gives ordered kwargs, it makes it possible for metaclasses to see the order class items were defined, etc. And we got all these goodies for better-than-free: the new dict is faster and uses less memory. So it seems very unlikely that CPython is going to revert this change in the foreseeable future, and that means people will write code that depends on this, and that means in practice reverting it will become impossible due to backcompat and it will be important for other interpreters to implement, regardless of what the language definition says.
>
> That said, there are real benefits to putting this in the spec. Given that we're not going to get rid of it, we might as well reward the minority of programmers who are conscientious about following the spec by letting them use it too.

Thanks. Your note resonated with me -- the crux of your argument seems to be that the proposal results in a net reduction in complexity for both users and implementers.

That makes sense. Even having read all the PEPs, read all the patches, and having participated in the discussions, I tend to forget where ordering is guaranteed and where it isn't.

This discussion reminds me of when Timsort was introduced many years ago. Sort stability wasn't guaranteed at first, but it was so darned convenient (and a pain to work around when not present) that it became guaranteed in the following release. The current proposal is different in many ways, but does share the virtue of being a nice-to-have for users.

> MicroPython deviates from the language spec in lots of ways. Hopefully this won't need to be another one, but it won't be the end of the world if it is.

I've looked at the MicroPython source and think this won't be a problem. It will be even easier for them than it was for us (the code is simpler because it doesn't have special cases for key-sharing, unicode optimizations, and whatnot).


Raymond
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

INADA Naoki

unread,
Nov 5, 2017, 10:04:10 PM11/5/17
to Serhiy Storchaka, Python-Dev
On Mon, Nov 6, 2017 at 4:54 AM, Serhiy Storchaka <stor...@gmail.com> wrote:
...
>
> I didn't try to implement this. But the current implementation requires
> periodical reallocating if add and remove items. The following loop
> reallocates the dict every len(d) iterations, while the size of the dict is
> not changed, and the half of its storage is empty.
>
> while True:
> v = d.pop(k)
> ...
> d[k] = v
>

FYI, Raymond's original compact dict (moving last item to slot used
for deleted item) will break OrderedDict. So it's not easy to implement
than it looks.

OrderedDict uses linked list to keep which slot is used for the key.
Moving last item will break it.
It means odict.__delitem__ can't use PyDict_DelItem anymore and
OrderedDict should touch internal structure of dict.

I think current OrderedDict implementation is fragile loose coupling.
While two object has different file (dictobject.c and odictobject.c),
OrderedDict depends on dict's internal behavior heavily.
It prevents optimizing dict. See comment here.

https://github.com/python/cpython/blob/a5293b4ff2c1b5446947b4986f98ecf5d52432d4/Objects/dictobject.c#L1082

I don't have strong opinion about what should we do about dict and OrderedDict.
But I feel PyPy's approach (using same implementation and just override
__eq__ and add move_to_end() method) is most simple.

Regards,

INADA Naoki <songof...@gmail.com>

Nick Coghlan

unread,
Nov 5, 2017, 10:09:49 PM11/5/17
to Peter Ludemann, Python-Dev
On 6 November 2017 at 06:50, Peter Ludemann via Python-Dev
<pytho...@python.org> wrote:
> Isn't ordered dict also useful for **kwargs?

3.6 already made this semantic change for **kwargs and class execution
namespaces - it just left the door open to having implementations meet
those requirements by way of substituting in collections.OrderedDict
for those use cases, rather than using a regular builtin dictionary.

So we're also already imposing the requirement on conformant Python
implementations to have a reasonably performant insertion-ordered
mapping implementation available.

The open questions around insertion ordering thus relate to what user
expectations should be for:

- dictionary displays
- explicit invocations of the builtin dict constructor
- mutating methods on builtin dicts
- serialisation and deserialisation operations that use regular dicts
(e.g. the JSON module, csv.DictReader/Writer)

It's that last category which is particularly problematic now, as in
*C*Python 3.6, the dicts used for these operations actually *are*
insertion ordered, so round trips from disk, through a CPython builtin
dict, and then back to disk *will* be order preserving, whereas in
previous versions there was no guarantee that that would be the case
(and hash randomisation meant the key reordering wasn't even
deterministic).

While such operations were already order preserving in PyPy (since
their switch to an insertion-ordered builtin dict implementation
served as prior art for the subsequent switch in CPython), we know
from experience that it's incredibly easy for even things that are
nominally implementation details of CPython to become expected
behaviour for Python users (e.g. there's still plenty of code out
there that relies on the use of an eager refcounting memory management
strategy to handle external resources properly).

That means our choices for 3.7 boil down to:

* make this a language level guarantee that Python devs can reasonably rely on
* deliberately perturb dict iteration in CPython the same way the
default Go runtime does [1]

When we did the "insertion ordered hash map" availability review, the
main implementations we were checking on behalf of were Jython & VOC
(JVM implementations), Batavia (JavaScript implementation), and
MicroPython (C implementation). Adding IronPython (C# implementation)
to the mix gives:

* for the JVM, the insertion ordered LinkedHashMap [2] has been
available since J2SE 1.4 was released in 2002
* for JavaScript, ECMA 6 defines the Map type [3] as an insertion
ordered key/value store
* for the .NET CLR, System.Collections.Specialized.OrderedDictionary
[4] seems to be the best builtin option, but the Java implementations
also map reasonably well to C# semantics
* for MicroPython, it's not yet clear whether or not the hash map
design used in CPython could also be adapted to their design
constraints (i.e. giving both insertion ordering and O(1) lookup
without requiring excessive amounts of memory), but it should at least
be feasible to make the semantics configurable based on a compile time
option (since CPython has a working C implementation of the desired
semantics already)

Since the round-trip behaviour that comes from guaranteed order
preservation is genuinely useful, and we're comfortable with folks
switching to more specialised containers when they need different
performance characteristics from what the builtins provide, elevating
insertion order preservation to a language level requirements makes
sense.

Cheers,
Nick.

[1] https://blog.golang.org/go-maps-in-action (search for "Iteration Order")
[2] https://docs.oracle.com/javase/8/docs/api/java/util/LinkedHashMap.html
[3] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map
[4] https://docs.microsoft.com/en-us/dotnet/api/system.collections.specialized.ordereddictionary?view=netframework-4.7.1

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Nick Coghlan

unread,
Nov 5, 2017, 10:15:20 PM11/5/17
to Raymond Hettinger, Python-Dev@Python. Org
On 6 November 2017 at 09:43, Raymond Hettinger
Nice! That's what I thought based on reading some of the design
discussions about CPython's dict implementation, but I didn't know the
details of either dict implementation well enough to be confident that
the CPython changes would map cleanly to MicroPython's variant.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Serhiy Storchaka

unread,
Nov 6, 2017, 2:46:20 AM11/6/17
to pytho...@python.org
06.11.17 05:01, INADA Naoki пише:

> FYI, Raymond's original compact dict (moving last item to slot used
> for deleted item) will break OrderedDict. So it's not easy to implement
> than it looks.
>
> OrderedDict uses linked list to keep which slot is used for the key.
> Moving last item will break it.
> It means odict.__delitem__ can't use PyDict_DelItem anymore and
> OrderedDict should touch internal structure of dict.

All this is implementation details. We did have little time for
coordinated changing dict and OrderedDict implementations before
releasing 3.6, and left existing coupling. This also prevents from
in-place compactization of dict items.

We could add a private flag to dict that denotes whether this dict is
ordered or no. The ordinal dict could be more compact, while OrderedDict
left ordered. And I like your issue31265.

The current dict implementation still is young. It takes several
optimizations in 3.7 (thank to you Inada) and AFAIK there are still not
merged patches. I would wait until it become more stable before making
change in language specification.

> I think current OrderedDict implementation is fragile loose coupling.
> While two object has different file (dictobject.c and odictobject.c),
> OrderedDict depends on dict's internal behavior heavily.
> It prevents optimizing dict. See comment here.
>
> https://github.com/python/cpython/blob/a5293b4ff2c1b5446947b4986f98ecf5d52432d4/Objects/dictobject.c#L1082
>
> I don't have strong opinion about what should we do about dict and OrderedDict.
> But I feel PyPy's approach (using same implementation and just override
> __eq__ and add move_to_end() method) is most simple.

I think that the coupling with dict and OrderedDict should be more
robust, but left private. Dict implementation should be aware of
OrderedDict and provide some private methods for using in OrderedDict,
but not restrict the optimization of unordered dicts.

Victor Stinner

unread,
Nov 6, 2017, 3:48:35 AM11/6/17
to Guido van Rossum, Barry Warsaw, Python-Dev
2017-11-05 18:50 GMT+01:00 Guido van Rossum <gu...@python.org>:
> I don't see this as a reason to not put this into the language spec at 3.7.

It can prevent some kinds of optimizations. Dictionaries are used
"everywhere" in Python, so they are very important for performance.

I would prefer to only keep the ordering warranty for function keyword
arguments and class members, and use explicitly an ordered dictionary
when needed.

Sorry, I don't have any example right now of a concrete optimization
which would not be possible with ordered dictionary. But Serhiy
mentioned the performance impact of ordering in Python 3.6 dictionary
on deletion.

Victor

Nick Coghlan

unread,
Nov 6, 2017, 4:06:32 AM11/6/17
to Victor Stinner, Barry Warsaw, Python-Dev
On 6 November 2017 at 18:46, Victor Stinner <victor....@gmail.com> wrote:
> 2017-11-05 18:50 GMT+01:00 Guido van Rossum <gu...@python.org>:
>> I don't see this as a reason to not put this into the language spec at 3.7.
>
> It can prevent some kinds of optimizations. Dictionaries are used
> "everywhere" in Python, so they are very important for performance.
>
> I would prefer to only keep the ordering warranty for function keyword
> arguments and class members, and use explicitly an ordered dictionary
> when needed.
>
> Sorry, I don't have any example right now of a concrete optimization
> which would not be possible with ordered dictionary. But Serhiy
> mentioned the performance impact of ordering in Python 3.6 dictionary
> on deletion.

Note that I *don't* think we should mandate that regular dictionaries
be synonymous with collections.OrderedDict - I think it's fine to say
that regular dicts are insertion ordered *until* a key is deleted, and
after that their ordering is arbitrary.
Insertion-ordered-until-the-first-delete is sufficient to guarantee
serialisation round trips, dict display ordering, and keyword order
preservation in the dict constructor (it's not sufficient to guarantee
ordering for class namespaces, as those may contain "del" statements,
but class bodies are also permitted to use a non-default mapping
type).

However, if we decide *not* to require that dictionaries be insertion
ordered, then I think we should do the same thing Go did, and have
dict iterators start at a random offset in the item sequence (that's
not actually my idea - Greg Smith suggested it at the core dev
sprints).

Otherwise we'll end up standardising the 3.6 behaviour by default, as
folks come to rely on it, and decline to support Python
implementations that don't provide insertion ordering semantics.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Steven D'Aprano

unread,
Nov 6, 2017, 4:16:22 AM11/6/17
to pytho...@python.org
On Mon, Nov 06, 2017 at 01:07:51PM +1000, Nick Coghlan wrote:

> That means our choices for 3.7 boil down to:
>
> * make this a language level guarantee that Python devs can reasonably rely on
> * deliberately perturb dict iteration in CPython the same way the
> default Go runtime does [1]

I agree with this choice.

My preference is for the first: having dicts be unordered has never been
a positive virtue in itself, but always the cost we paid for fast O(1)
access. Now what we have fast O(1) access *without* dicts being
unordered, we should make it a language guarantee.

Provided of course that we can be reasonable certain that other
implementations can do the same. And it looks like we can.

But if we wanted to still keep our options open, how about weakening the
requirement that globals() and object __dicts__ be specifically the same
type as builtin dict? That way if we discover a super-fast and compact
dict implementation (maybe one that allows only string keys?) that is
unordered, we can use it for object namespaces without affecting the
builtin dict.


> When we did the "insertion ordered hash map" availability review, the
> main implementations we were checking on behalf of were Jython & VOC
> (JVM implementations), Batavia (JavaScript implementation), and
> MicroPython (C implementation). Adding IronPython (C# implementation)
> to the mix gives:

Shouldn't we check with Nuitka (C++) and Cython as well?

I'd be surprised if this is a problem for either of them, but we should
ask.


> Since the round-trip behaviour that comes from guaranteed order
> preservation is genuinely useful, and we're comfortable with folks
> switching to more specialised containers when they need different
> performance characteristics from what the builtins provide, elevating
> insertion order preservation to a language level requirements makes
> sense.

+1


OrderedDict could then become a thin wrapper around regular dicts.

--
Steve

Paul Sokolovsky

unread,
Nov 6, 2017, 5:20:23 AM11/6/17
to Nick Coghlan, Python-Dev
Hello,

On Sun, 5 Nov 2017 12:04:41 +1000
Nick Coghlan <ncog...@gmail.com> wrote:

> On 5 November 2017 at 04:35, Guido van Rossum <gu...@python.org>
> wrote:
> > This sounds reasonable -- I think when we introduced this in 3.6 we
> > were worried that other implementations (e.g. Jython) would have a
> > problem with this, but AFAIK they've reported back that they can do
> > this just fine. So let's just document this as a language
> > guarantee.
>
> When I asked Damien George about this for MicroPython, he indicated
> that they'd have to choose between guaranteed order and O(1) lookups
> given their current dict implementation. That surprised me a bit

MicroPython hashmap implementation is effectively O(n) (average and
worst case) due to the algorithm parameters chosen (like the load factor
of 1). Of course, parameters could be tweaked, but the ones chosen are
so because the memory usage is far more important for MicroPython
systems than performance characteristics (all due to small amounts of
memory). Like, MicroPython was twice as fast than Python 3.3 on
average, and 1000 times more efficient in the memory usage.

> (since PyPy and CPython both *saved* memory by switching to their
> guaranteed order implementations, hence the name "compact dict

There's nothing to save in MicroPython's dict implementation, simply
because it doesn't waste anything in the first place - uPy's dict entry
is just (key, val) (2 machine words), and load factor of the dict can
reach 1.0 as mentioned.

[]

> I don't think that situation should change the decision,

Indeed, it shouldn't. What may change it is the simple and obvious fact
that there's no need to change anything, as proven by the 25-year
history of the language.

What happens now borders on technologic surrealism - the CPython, after
many years of persuasion, switched its dict algorithm, rather
inefficient in terms of memory, to something else, less inefficient
(still quite inefficient, taking "no overhead" as the baseline). That
algorithm randomly had another property. Now there's a seemingly
serious talk of letting that property leak into the *language spec*,
despite the fact that there can be unlimited number of dictionary
algorithms, most of them not having that property.

What it will lead to is further fragmentation of the community. Python2
vs Python3 split is far from being over, and now there're splits
between:

* people who use "yield from" vs "await"
* people who use f-strings vs who don't
* people who rely on sorted nature of dict's vs who don't

etc.

[]

> P.S. If anyone does want to explore MicroPython's dict implementation,
> and see if there might be an alternate implementation strategy that
> offers both O(1) lookup and guaranteed ordering without using
> additional memory

That would be the first programmer in the history to have a cake and
eat it too. Memory efficiency, runtime efficiency, sorted order: choose
2 of 3.


--
Best regards,
Paul mailto:pmi...@gmail.com

Nick Coghlan

unread,
Nov 6, 2017, 5:23:50 AM11/6/17
to Steven D'Aprano, pytho...@python.org
On 6 November 2017 at 19:14, Steven D'Aprano <st...@pearwood.info> wrote:
> On Mon, Nov 06, 2017 at 01:07:51PM +1000, Nick Coghlan wrote:
>> When we did the "insertion ordered hash map" availability review, the
>> main implementations we were checking on behalf of were Jython & VOC
>> (JVM implementations), Batavia (JavaScript implementation), and
>> MicroPython (C implementation). Adding IronPython (C# implementation)
>> to the mix gives:
>
> Shouldn't we check with Nuitka (C++) and Cython as well?

If you're using dicts as dicts, Cython just uses the CPython ones via
the C API, rather than defining its own.

I didn't do any specific research for C++ (since it's essentially the
king of fast low level data structure design, hence its popularity in
graphics programming), but did see a few references to C++ ordered
hash map implementations while looking into the available options for
other runtimes.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Stefan Krah

unread,
Nov 6, 2017, 5:39:09 AM11/6/17
to pytho...@python.org
On Mon, Nov 06, 2017 at 12:18:17PM +0200, Paul Sokolovsky wrote:
> MicroPython hashmap implementation is effectively O(n) (average and
> worst case) due to the algorithm parameters chosen (like the load factor
> of 1). Of course, parameters could be tweaked, but the ones chosen are
> so because the memory usage is far more important for MicroPython
> systems than performance characteristics (all due to small amounts of
> memory). Like, MicroPython was twice as fast than Python 3.3 on
> average, and 1000 times more efficient in the memory usage.

$ cat xxx.py

def pi_float():
"""native float"""
lasts, t, s, n, na, d, da = 0, 3.0, 3, 1, 0, 0, 24
while s != lasts:
lasts = s
n, na = n+na, na+8
d, da = d+da, da+32
t = (t * n) / d
s += t
return s

for i in range(100000):
x = pi_float()

$ time ./micropython xxx.py

real 0m4.424s
user 0m4.406s
sys 0m0.016s
$
$ time ../../cpython/python xxx.py

real 0m1.066s
user 0m1.056s
sys 0m0.010s



Congratulations ...



Stefan Krah

Chris Angelico

unread,
Nov 6, 2017, 5:50:40 AM11/6/17
to python-dev
Maybe I'm misreading your demo, but I fail to see what this has to do
with dict performance?

ChrisA

Paul Sokolovsky

unread,
Nov 6, 2017, 6:05:20 AM11/6/17
to Stefan Krah, pytho...@python.org
Hello,

On Mon, 6 Nov 2017 11:36:59 +0100
Stefan Krah <ste...@bytereef.org> wrote:

> On Mon, Nov 06, 2017 at 12:18:17PM +0200, Paul Sokolovsky wrote:
> > MicroPython hashmap implementation is effectively O(n) (average and
> > worst case) due to the algorithm parameters chosen (like the load
> > factor of 1). Of course, parameters could be tweaked, but the ones
> > chosen are so because the memory usage is far more important for
> > MicroPython systems than performance characteristics (all due to
> > small amounts of memory). Like, MicroPython was twice as fast than
> > Python 3.3 on average, and 1000 times more efficient in the memory
> > usage.
>

[]

>
> $ time ./micropython xxx.py
> $ time ../../cpython/python xxx.py

>
> Congratulations ...

That's why I wrote "Python 3.3", it's hard to argue CPython doesn't do
anything about the "Python is slow" proverb. It's still shouldn't be
hard to construct a testcase where MicroPython is faster (by not
implementing features not needed by 90% of Python programs of course,
not "for free").

Anyway, where're memory measurements?

(This is offtopic, so I shouldn't have replied.)

> Stefan Krah

--
Best regards,
Paul mailto:pmi...@gmail.com

Steve Holden

unread,
Nov 6, 2017, 6:20:59 AM11/6/17
to Paul Sokolovsky, Nick Coghlan, Python-Dev
On Mon, Nov 6, 2017 at 10:18 AM, Paul Sokolovsky <pmi...@gmail.com> wrote:
Hello,


What happens now borders on technologic surrealism - the CPython, after
many years of persuasion, switched its dict algorithm, rather
inefficient in terms of memory, to something else, less inefficient
(still quite inefficient, taking "no overhead" as the baseline). That
algorithm randomly had another property. Now there's a seemingly
serious talk of letting that property leak into the *language spec*,
despite the fact that there can be unlimited number of dictionary
algorithms, most of them not having that property.

I have to agree: I find the elevation of a CPython implementation detail to a language feature ​somewhat hard to comprehend. Maybe it's more to do with the way it's been presented, but this is hardly an enhancement the language has been screaming for for years.

​Presumably there is little concern that algorithms that rely on this behaviour will be perfectly syntactically conformant with earlier versions​ but will fail subtly and without explanation? It's a small concern, but a real one - particularly for learners.

regards
 Steve



Antoine Pitrou

unread,
Nov 6, 2017, 6:29:54 AM11/6/17
to pytho...@python.org

I think that Paul has a point. Interestingly, at the same time we're
talking about guaranteeing the order of dicts, we're talking about
using another, unordered, data structure (hash array mapped tries) to
improve the performance of something that resembles a namespace. It
seems the "unordered" part will be visible through
ExecutionContext.vars().

https://www.python.org/dev/peps/pep-0550/#enumerating-context-vars

The ordered-ness of dicts could instead become one of those stable
CPython implementation details, such as the fact that resources are
cleaned up timely by reference counting, that people nevertheless
should not rely on if they're writing portable code.

Regards

Antoine.


On Mon, 6 Nov 2017 12:18:17 +0200
Paul Sokolovsky <pmi...@gmail.com> wrote:
> []
>
> > I don't think that situation should change the decision,
>
> Indeed, it shouldn't. What may change it is the simple and obvious fact
> that there's no need to change anything, as proven by the 25-year
> history of the language.
>
> What happens now borders on technologic surrealism - the CPython, after
> many years of persuasion, switched its dict algorithm, rather
> inefficient in terms of memory, to something else, less inefficient
> (still quite inefficient, taking "no overhead" as the baseline). That
> algorithm randomly had another property. Now there's a seemingly
> serious talk of letting that property leak into the *language spec*,
> despite the fact that there can be unlimited number of dictionary
> algorithms, most of them not having that property.


Nick Coghlan

unread,
Nov 6, 2017, 7:11:18 AM11/6/17
to Steve Holden, Python-Dev
On 6 November 2017 at 21:18, Steve Holden <st...@holdenweb.com> wrote:
> I have to agree: I find the elevation of a CPython implementation detail to
> a language feature somewhat hard to comprehend. Maybe it's more to do with
> the way it's been presented, but this is hardly an enhancement the language
> has been screaming for for years.
>
> Presumably there is little concern that algorithms that rely on this
> behaviour will be perfectly syntactically conformant with earlier versions
> but will fail subtly and without explanation? It's a small concern, but a
> real one - particularly for learners.

A similar concern existed when we elevated sort stability to being a
language requirement - if you relied on that guarantee, your code was
technically buggy on versions prior to 2.3, but eventually 2.2 and
earlier aged out of general use, allowing such code to become correct
in general.

So the current discussion is mainly about deciding where we want the
compatibility burden to fall in relation to dict insertion ordering:

1. Do we deliberately revert CPython back to being harder to use
correctly for the sake of making Python easier to implement?
2. Do we make Python harder to implement for the sake of making it
easier to use?
3. Do we choose not to choose, thus implicitly choosing "2" by default
due to the fact that Python is defined by a language spec and a
reference implementation, rather than *just* a language spec?

Here's a more-complicated-than-a-doctest-for-a-dict-repo, but still
fairly straightforward, example regarding the "insertion ordering
dictionaries are easier to use correctly" argument:

import json
data = {"a":1, "b":2, "c":3}
rendered = json.dumps(data)
data2 = json.loads(rendered)
rendered2 = json.dumps(data2)
# JSON round trip
assert data == data2, "JSON round trip failed"
# Dict round trip
assert rendered == rendered2, "dict round trip failed"

Both of those assertions will always pass in CPython 3.6, as well as
in PyPy, because their dict implementations are insertion ordered,
which means the iteration order on the dictionaries is always "a",
"b", "c".

If you try it on 3.5 though, you should fairly consistently see that
last assertion fail, since there's nothing in 3.5 that ensures that
data and data2 will iterate over their keys in the same order.

You can make that code implementation independent (and sufficiently
version dependent to pass both assertions) by using OrderedDict:

from collections import OrderedDict
import json
data = OrderedDict(a=1, b=2, c=3)
rendered = json.dumps(data)
data2 = json.loads(rendered, object_pairs_hook=OrderedDict)
rendered2 = json.dumps(data2)
# JSON round trip
assert data == data2, "JSON round trip failed"
# Dict round trip
assert rendered == rendered2, "dict round trip failed"

However, despite the way this code looks, the serialised key order
*might not* be "a, b, c" on 3.5 and earlier (it will be on 3.6+, since
that already requires that kwarg order be preserved).

So the formally correct version independent code that reliably ensures
that the key order in the JSON file is always "a, b, c" looks like
this:

from collections import OrderedDict
import json
data = OrderedDict((("a",1), ("b",2), ("c",3)))
rendered = json.dumps(data)
data2 = json.loads(rendered, object_pairs_hook=OrderedDict)
rendered2 = json.dumps(data2)
# JSON round trip
assert data == data2, "JSON round trip failed"
# Dict round trip
assert rendered == rendered2, "dict round trip failed"
# Key order
assert "".join(data) == "".join(data2) == "abc", "key order failed"

Getting from the "Works on CPython 3.6+ but is technically
non-portable" state to a fully portable correct implementation that
ensures a particular key order in the JSON file thus currently
requires the following changes:

- don't use a dict display, use collections.OrderedDict
- make sure to set object_pairs_hook when using json.loads
- don't use kwargs to OrderedDict, use a sequence of 2-tuples

For 3.6, we've already said that we want the last constraint to age
out, such that the middle version of the code also ensures a
particular key order.

The proposal is that in 3.7 we retroactively declare that the first,
most obvious, version of this code should in fact reliably pass all
three assertions.

Failing that, the proposal is that we instead change the dict
iteration implementation such that the dict round trip will start
failing reasonably consistently again (the same as it did in 3.5), so
that folks realise almost immediately that they still need
collections.OrderedDict instead of the builtin dict.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Steven D'Aprano

unread,
Nov 6, 2017, 8:21:45 AM11/6/17
to pytho...@python.org
On Mon, Nov 06, 2017 at 12:27:54PM +0100, Antoine Pitrou wrote:

> The ordered-ness of dicts could instead become one of those stable
> CPython implementation details, such as the fact that resources are
> cleaned up timely by reference counting, that people nevertheless
> should not rely on if they're writing portable code.

Given that (according to others) none of IronPython, Jython, Batavia,
Nuitka, or even MicroPython, should have trouble implementing an
insertion-order preserving dict, and that PyPy already has, why should
we say it is a CPython implementation detail?


--
Steve

Stefan Krah

unread,
Nov 6, 2017, 8:31:36 AM11/6/17
to pytho...@python.org
On Mon, Nov 06, 2017 at 01:03:05PM +0200, Paul Sokolovsky wrote:
> > $ time ./micropython xxx.py
> > $ time ../../cpython/python xxx.py
>
> >
> > Congratulations ...
>
> That's why I wrote "Python 3.3", it's hard to argue CPython doesn't do
> anything about the "Python is slow" proverb. It's still shouldn't be
> hard to construct a testcase where MicroPython is faster (by not
> implementing features not needed by 90% of Python programs of course,
> not "for free").

Sorry, that was a slightly mischievous benchmark indeed. -- Whether the proposal
is surreal or not depends on the likelihood that a) a substantially faster dict
algorithm will emerge and b) CPython, PyPy and Jython will switch to it.


My proposal was based on the fact that for almost two release cycles the
ordering implementation detail hasn't changed.




Stefan Krah

Antoine Pitrou

unread,
Nov 6, 2017, 8:33:12 AM11/6/17
to pytho...@python.org
On Tue, 7 Nov 2017 00:14:35 +1100
Steven D'Aprano <st...@pearwood.info> wrote:
> On Mon, Nov 06, 2017 at 12:27:54PM +0100, Antoine Pitrou wrote:
>
> > The ordered-ness of dicts could instead become one of those stable
> > CPython implementation details, such as the fact that resources are
> > cleaned up timely by reference counting, that people nevertheless
> > should not rely on if they're writing portable code.
>
> Given that (according to others) none of IronPython, Jython, Batavia,
> Nuitka, or even MicroPython, should have trouble implementing an
> insertion-order preserving dict, and that PyPy already has, why should
> we say it is a CPython implementation detail?

That's not what I'm taking away from Paul Sokolovsky's message I was
responding to. If you think otherwise then please expand and/or contact
Paul so that things are made clearer one way or the other.

Wolfgang

unread,
Nov 6, 2017, 9:05:24 AM11/6/17
to pytho...@python.org
Hi,

Could be missed by me but the
guarantee that dict literals are ordered implies not
that dict must be ordered in all cases.

The dict literal:

d = {"a": 1, "b": 2}

will keep the order of "a" and "b" because it is specified
as a dict literal.

But

d["c"] = 3

can change this order and it is allowed by the specification
of guaranteed ordered dict literals.

Please correct me if I am wrong.

In Python 3.6 it does not because dict
is implemented ordered and the insertion order is preserved.

Also I think we should give the whole thing more time and wait
with this guarantee. There are valid concerns against this.

Personally I like the ordering but if I need an ordered dict
it is ok for me to write this explicitly.

The **kwargs are already guaranteed to be ordered and this was
useful because the OrderedDict constructor benefits and for
other places it is useful.
But making all dict's per default ordered is another level.


Regards,

Wolfgang

Paul Sokolovsky

unread,
Nov 6, 2017, 11:37:53 AM11/6/17
to Antoine Pitrou, pytho...@python.org
Hello,

On Mon, 6 Nov 2017 14:30:45 +0100
Antoine Pitrou <soli...@pitrou.net> wrote:

> On Tue, 7 Nov 2017 00:14:35 +1100
> Steven D'Aprano <st...@pearwood.info> wrote:
> > On Mon, Nov 06, 2017 at 12:27:54PM +0100, Antoine Pitrou wrote:
> >
> > > The ordered-ness of dicts could instead become one of those stable
> > > CPython implementation details, such as the fact that resources
> > > are cleaned up timely by reference counting, that people
> > > nevertheless should not rely on if they're writing portable
> > > code.
> >
> > Given that (according to others) none of IronPython, Jython,
> > Batavia, Nuitka, or even MicroPython, should have trouble
> > implementing an insertion-order preserving dict, and that PyPy
> > already has, why should we say it is a CPython implementation
> > detail?
>
> That's not what I'm taking away from Paul Sokolovsky's message I was
> responding to. If you think otherwise then please expand and/or
> contact Paul so that things are made clearer one way or the other.

For MicroPython, it would lead to quite an overhead to make
dictionary items be in insertion order. As I mentioned, MicroPython
optimizes for very low bookkeeping memory overhead, so lookups are
effectively O(n), but orderedness will increase constant factor
significantly, perhaps 5x.

Also, arguably any algorithm which would *maintain* insertion order
over mutating operations would be more complex and/or require more
memory that one which doesn't. (This is based on the idea that
this problem is equivalent to maintaining a sorted data structure, where
sorting key is "insertion order").

CPython 3.6 gives **kwargs in the key specification order? That's fine
if that's all that it says (note that it doesn't say what happens if
someone takes **kwargs and starts to "del []" or "[] =" it). That
allows different ways to implement it, per particular implementation's
choice. That's even implementable in MicroPython. Like, lookups will be
less efficient, but nobody realistically passes hundreds of kwargs.

It's pretty different to go from that to dictionaries in general.

Saying that a general dict always maintains insertion order is saying
that "in Python, you have to use complex, memory hungry algorithms for
a simple mapping type".

Saying something like "dict maintains insertion order until first
modification" is going down the rabbit hole, making it all confusing,
hard to remember, crazy-sounding to novices.


Why all that, if, since the beginning of times, Python offered a
structure with guaranteed ordering - list, and structure for efficient
mapping one values into other - dict. Then even marriage between the
two - OrderedDict.

Why suddenly once in 25 years there's a need to do something to dict's,
violating computer science background behind them (one of the reason
enough people loved Python comparing to other "practical hack"
languages)?

Alternatives were already presented on the thread - if people want more
and easier ordered dictionaries, it calls to add a special literal
initializer for them ("o{}" was proposed).


--
Best regards,
Paul mailto:pmi...@gmail.com

Chris Jerdonek

unread,
Nov 6, 2017, 12:30:08 PM11/6/17
to Nick Coghlan, Steve Holden, Python-Dev
Nick, it seems like this is more complicated than it needs to be. You can just pass sort_keys=True to json.dump() / json.dumps(). I use it for tests and human-readability all the time.

—Chris



- don't use a dict display, use collections.OrderedDict
- make sure to set object_pairs_hook when using json.loads
- don't use kwargs to OrderedDict, use a sequence of 2-tuples

For 3.6, we've already said that we want the last constraint to age
out, such that the middle version of the code also ensures a
particular key order.

The proposal is that in 3.7 we retroactively declare that the first,
most obvious, version of this code should in fact reliably pass all
three assertions.

Failing that, the proposal is that we instead change the dict
iteration implementation such that the dict round trip will start
failing reasonably consistently again (the same as it did in 3.5), so
that folks realise almost immediately that they still need
collections.OrderedDict instead of the builtin dict.

Cheers,
Nick.

--
Nick Coghlan   |   ncog...@gmail.com   |   Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Brett Cannon

unread,
Nov 6, 2017, 1:01:10 PM11/6/17
to Paul Sokolovsky, Antoine Pitrou, pytho...@python.org
But that's an implementation detail that most folks won't care about.
 

Saying something like "dict maintains insertion order until first
modification" is going down the rabbit hole, making it all confusing,
hard to remember, crazy-sounding to novices.


Why all that, if, since the beginning of times, Python offered a
structure with guaranteed ordering - list, and structure for efficient
mapping one values into other - dict. Then even marriage between the
two - OrderedDict.

Why suddenly once in 25 years there's a need to do something to dict's,
violating computer science background behind them (one of the reason
enough people loved Python comparing to other "practical hack"
languages)?

I don't understand what "computer science background" is being violated?
 

Alternatives were already presented on the thread - if people want more
and easier ordered dictionaries, it calls to add a special literal
initializer for them ("o{}" was proposed).

I don't see that happening. If OrderedDict didn't lead to syntax then I don't see why that's necessary now.

I think worrying about future optimizations that order preservation might prevent is a premature optimization/red herring. None of us can predict the future so worrying about some algorithmic breakthrough that will suddenly materialize on how to implement dictionaries seems unnecessary. That line of thought suggests we should guarantee as little as possible and just make most stuff undefined behaviour.

I also think Paul S. has made it clear that MicroPython won't be implementing order-preserving dicts due to their memory constraints. That's fine, but I think we also need to realize that MicroPython is already a slight hybrid by being based on Python 3.4 but with a backport of async/await and a subset of the stdlib.

From what I have read, this argument breaks down to this:

Standardizing order preserving and ...
  • MicroPython won't implement it, but all other major implementations will
  • Those that have order preservation will implement PEPs 468 and 520 for free
  • Some minor practical benefits in terms of avoiding accidental reliance on ordering

If we don't promise order preservation ...

  • CPython should probably add some perturbing to the iteration order
  • All implementations can handle this without issue or major implementation costs


So to me this sounds like a decision between the pragmatism of choosing semantics that all Python implementations can handle or the developer benefits of an order-preserving dict everywhere (all the other arguments I think are minor compared to these two).

Barry Warsaw

unread,
Nov 6, 2017, 1:22:22 PM11/6/17
to Python-Dev
On Nov 6, 2017, at 02:18, Paul Sokolovsky <pmi...@gmail.com> wrote:

> What it will lead to is further fragmentation of the community. Python2
> vs Python3 split is far from being over, and now there're splits
> between:
>
> * people who use "yield from" vs "await"
> * people who use f-strings vs who don't
> * people who rely on sorted nature of dict's vs who don't

This is the classic argument of, do you proceed conservatively and use the lowest-common denominator that makes your code work with the widest range of versions, or do you ride the wave and adopt the latest and greatest features as soon as they’re available?

Neither answer is wrong or right… for everyone. It’s also a debate as old as the software industry. Every package, project, company, developer, community will have to decide for themselves. Once you realize you can’t win, you’ve won! :)

-Barry

signature.asc

Paul Sokolovsky

unread,
Nov 6, 2017, 2:09:59 PM11/6/17
to Brett Cannon, Antoine Pitrou, pytho...@python.org
Hello,

On Mon, 06 Nov 2017 17:58:47 +0000
Brett Cannon <br...@python.org> wrote:

[]

> > Why suddenly once in 25 years there's a need to do something to
> > dict's, violating computer science background behind them (one of
> > the reason enough people loved Python comparing to other "practical
> > hack" languages)?
>
> I don't understand what "computer science background" is being
> violated?

I tried to explain that in the previous mail, can try a different
angle. So, please open you favorite CS book (better few) and look up
"abstract data types", then "mapping/associative array" and "list". We
can use Wikipedia too: https://en.wikipedia.org/wiki/Associative_array.
So, please look up: "Operations associated with this data type allow".
And you'll see, that there're no "ordering" related operations are
defined. Vice versa, looking at "sequence" operations, there will be
"prev/next", maybe "get n'th" element operations, implying ordering.

Python used to be a perfect application of these principles. Its dict
was a perfect CS implementation of an abstract associative array, and
list - of "sequence" abstract type (with additional guarantee of O(1)
random element access).

People knew and rejoiced that Python is built on solid science
principles, or could *learn* them from it. That no longer will be true,
with a sound concept being replaced with on-the-spot practical hack,
choosing properties of a random associative array algorithm
implementation over properties of a superset of such algorithms (many
of which are again don't offer any orderness guarantees).



I know though what will be replied (based on the replies below): "all
these are implementation details" - no, orderness vs non-orderness of a
mapping algorithm is an implementation detail; "users shouldn't know all
that" - they should, that's the real knowledge, and up until now, they
could learn that from *Python docs*, "we can't predict future" - we
don't need, we just need to know the past (25 years in our case), and
understand why it was done like that, I don't think Guido couldn't code
it ordered in 1991, it's just not natural for a mapping type to be so,
and in 2017, it's not more natural than it was in 1991.


MicroPython in particular appeared because Python offered all the
CS-sound properties and freedom and alternative choices for
implementation (more so than any other scripting language). It's losing
it, and not just for MicroPython's surprise.


[]

Paul G

unread,
Nov 6, 2017, 2:25:41 PM11/6/17
to pytho...@python.org
Is there a major objection to just adding in explicit syntax for order-preserving dictionaries? To some extent that seems like a reasonable compromise position in an "explicit is better than implicit" sense. A whole lot of code is out there that doesn't require or expect order-preserving dictionaries - it would be nice to be able to differentiate out the parts where order actually *does* matter.

(Of course, given that CPython's implementation is order-preserving, a bunch of code is probably now being written that implicitly requires on this detail, but at least having syntax that makes that clear would give people the *option* to make the assumption explicit).
> _______________________________________________
> Python-Dev mailing list
> Pytho...@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/paul%40ganssle.io
>

signature.asc

Barry Warsaw

unread,
Nov 6, 2017, 2:35:13 PM11/6/17
to Python Dev
On Nov 6, 2017, at 11:23, Paul G <pa...@ganssle.io> wrote:
>
> Is there a major objection to just adding in explicit syntax for order-preserving dictionaries?

I don’t think new syntax is necessary. We already have OrderedDict, which to me is a perfectly sensible way to spell “I need a mapping that preserves insertion order”, and the extra import doesn’t bother me.

I’m not saying whether or not to make the language guarantee that built-in dict preserves order. I’m just saying that if we don’t make that language change, we already have everything we need to support both use cases.

If we did make the change, it’s possible we would need a way to explicit say that order is not preserved. That seems a little weird to me, but I suppose it could be useful. I like the idea previously brought up that iteration order be deliberately randomized in that case, but we’d still need a good way to spell that.

Cheers,
-Barry

signature.asc

Paul Sokolovsky

unread,
Nov 6, 2017, 2:58:48 PM11/6/17
to Barry Warsaw, Python Dev
Hello,

On Mon, 6 Nov 2017 11:33:10 -0800
Barry Warsaw <ba...@python.org> wrote:

[]

> If we did make the change, it’s possible we would need a way to
> explicit say that order is not preserved. That seems a little weird

I recently was working on a more or less complex dataflow propagation
problem. It should converge to a fixed point, and it did, but on
different runs, to different ones. So, I know that I'm a bad
programmer, need to do more of my homework and grow. I know, that if I
rewrite it in C++ or C, it'll work unstable the same way, because it's
buggy. (Heck, over these years, I learned that I don't need to rewrite
things in C/C++, because Python is the *real* language, which works the
way computers do, without sugaring that up).

I need to remember that, because with Python 3.7, I may become a
good-programmer-in-a-ponyland-of-ordered-dicts.


Btw, in all this discussion, I don't remember anyone mentioning sets. I
don't recall the way they're implemented in CPython, but they have
strong conceptual and semantic resemblance to dict's. So, what about
them, do they become ordered too?


>
> Cheers,
> -Barry
>

--
Best regards,
Paul mailto:pmi...@gmail.com

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Serhiy Storchaka

unread,
Nov 6, 2017, 3:30:54 PM11/6/17
to pytho...@python.org
06.11.17 21:56, Paul Sokolovsky пише:

> Btw, in all this discussion, I don't remember anyone mentioning sets. I
> don't recall the way they're implemented in CPython, but they have
> strong conceptual and semantic resemblance to dict's. So, what about
> them, do they become ordered too?

No, sets are not ordered, and seems they will not be ordered at least in
several next years.

Chris Barker

unread,
Nov 6, 2017, 6:25:52 PM11/6/17
to Paul G, Python Dev
On Mon, Nov 6, 2017 at 11:23 AM, Paul G <pa...@ganssle.io> wrote:
(Of course, given that CPython's implementation is order-preserving, a bunch of code is probably now being written that implicitly requires on this detail, but at least having syntax that makes that clear would give people the *option* to make the assumption explicit).

This is a really key point -- a heck of a lot more people use cPython than read the language spec. And a great deal of code is developed with a certain level of ignorance -- try something, if it works, and your test pass (if there are any), then you are done. 

So right now, there is more an more code out there that relies on a regular old dcit being ordered.

I've been struggling with teaching this right now -- my written-a-couple-years ago materials talk about dicts being arbitrary order, and then have a little demo of that fact. Now I'm teaching with Python 3.6, and I had to add in something like:

   cPython does, in fact, preserve order with dicts, but it should be considered an implementation detail, and not counted on ... (and by the say, so does PyPy, and ....)"

I don't know, but I'm going to guess about 0% of my class is going to remember that...

And if we added o{,,,} syntax it would be even worse, 'cause folks would forget to use it, as their code wouldn't behave differently (kind of like the 'b' flag on unix text files, or the u"string" where everything is ascii in that string...)

in short -- we don't have a choice (unless we add an explicit randomization as some suggested -- but that just seems perverse...)

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris....@noaa.gov

Joao S. O. Bueno

unread,
Nov 6, 2017, 7:19:26 PM11/6/17
to Paul G, Python-Dev
On 6 November 2017 at 17:23, Paul G <pa...@ganssle.io> wrote:
> Is there a major objection to just adding in explicit syntax for order-preserving dictionaries? To some extent that seems like a reasonable compromise position in an "explicit is better than implicit" sense. A whole lot of code is out there that doesn't require or expect order-preserving dictionaries - it would be nice to be able to differentiate out the parts where order actually *does* matter.
>
> (Of course, given that CPython's implementation is order-preserving, a bunch of code is probably now being written that implicitly requires on this detail, but at least having syntax that makes that clear would give people the *option* to make the assumption explicit).

I think the additional syntax have the added benefit of preventing
code that relies on the ordering of dict literals to be run on older
versions, therefore triggering subtle bugs that had already being
mentioned.

And also, forgot along the discussion, is the big disadvantage that
other Python implementations would have a quite
significant overhead on mandatory ordered dicts. One that was
mentioned along the way is transpilers, with
Brython as an example - but there might be others. MircoPython is far
from being the only implementation affected.

js
-><-

>
> On 11/06/2017 01:19 PM, Barry Warsaw wrote:
>> On Nov 6, 2017, at 02:18, Paul Sokolovsky <pmi...@gmail.com> wrote:
>>
>>> What it will lead to is further fragmentation of the community. Python2
>>> vs Python3 split is far from being over, and now there're splits
>>> between:
>>>
>>> * people who use "yield from" vs "await"
>>> * people who use f-strings vs who don't
>>> * people who rely on sorted nature of dict's vs who don't
>>
>> This is the classic argument of, do you proceed conservatively and use the lowest-common denominator that makes your code work with the widest range of versions, or do you ride the wave and adopt the latest and greatest features as soon as they’re available?
>>
>> Neither answer is wrong or right… for everyone. It’s also a debate as old as the software industry. Every package, project, company, developer, community will have to decide for themselves. Once you realize you can’t win, you’ve won! :)
>>
>> -Barry
>>
>>
>>
>> _______________________________________________
>> Python-Dev mailing list
>> Pytho...@python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe: https://mail.python.org/mailman/options/python-dev/paul%40ganssle.io
>>
>
>
> _______________________________________________
> Python-Dev mailing list
> Pytho...@python.org
> https://mail.python.org/mailman/listinfo/python-dev

> Unsubscribe: https://mail.python.org/mailman/options/python-dev/jsbueno%40python.org.br


>
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Brett Cannon

unread,
Nov 6, 2017, 10:40:24 PM11/6/17
to Paul Sokolovsky, Antoine Pitrou, pytho...@python.org
On Mon, 6 Nov 2017 at 11:08 Paul Sokolovsky <pmi...@gmail.com> wrote:
Hello,

On Mon, 06 Nov 2017 17:58:47 +0000
Brett Cannon <br...@python.org> wrote:

[]

> > Why suddenly once in 25 years there's a need to do something to
> > dict's, violating computer science background behind them (one of
> > the reason enough people loved Python comparing to other "practical
> > hack" languages)?
>
> I don't understand what "computer science background" is being
> violated?

I tried to explain that in the previous mail, can try a different
angle. So, please open you favorite CS book (better few) and look up
"abstract data types", then "mapping/associative array" and "list". We
can use Wikipedia too: https://en.wikipedia.org/wiki/Associative_array.
So, please look up: "Operations associated with this data type allow".
And you'll see, that there're no "ordering" related operations are
defined. Vice versa, looking at "sequence" operations, there will be
"prev/next", maybe "get n'th" element operations, implying ordering.

I don't think you meant for this to come off as insulting, but telling me how to look up the definition of an associative array or map feels like you're putting me down. I also have a Ph.D. in computer science so I'm aware of the academic definitions of these data structures.
 

Python used to be a perfect application of these principles. Its dict
was a perfect CS implementation of an abstract associative array, and
list - of "sequence" abstract type (with additional guarantee of O(1)
random element access). 

People knew and rejoiced that Python is built on solid science
principles, or could *learn* them from it.
That no longer will be true,
with a sound concept being replaced with on-the-spot practical hack,
choosing properties of a random associative array algorithm
implementation over properties of a superset of such algorithms (many
of which are again don't offer any orderness guarantees).


I don't think it's fair to call the current dict implementation a hack. It's a sound design that has a certain property that we are discussing the masking of. As I said previously, I think this discussion comes down to whether we think there are pragmatic benefits to exposing the ordered aspects to the general developer versus not.

-Brett

Steven D'Aprano

unread,
Nov 6, 2017, 10:54:00 PM11/6/17
to pytho...@python.org
On Mon, Nov 06, 2017 at 12:18:17PM +0200, Paul Sokolovsky wrote:

> > I don't think that situation should change the decision,
>
> Indeed, it shouldn't. What may change it is the simple and obvious fact
> that there's no need to change anything, as proven by the 25-year
> history of the language.

I disagree -- the history of Python shows that having dicts be unordered
is a PITA for many Python programmers. Python eventually gained an
ordered dict because it provides useful functionality that developers
demand.

Every new generation of Python programmers comes along and gets confused
by why dicts mysteriously change their order from how they were entered,
why doctests involving dicts break, why keyword arguments lose their
order, why they have to import a module to get ordered dicts instead of
having it be built-in, etc. Historically, we had things like
ConfigParser reordering ini files when you write them.

Having dicts be unordered is not a positive virtue, it is a limitation.
Up until now, it was the price we've paid for having fast, O(1) dicts.
Now we have a dict implementation which is fast, O(1) and ordered. Why
pretend that we don't? This is a long-requested feature, and the cost
appears to be small: by specifying this, all we do is rule out some, but
not all, hypothetical future optimizations.

Unordered dicts served CPython well for 20+ years, but I doubt many
people will miss them.


> What happens now borders on technologic surrealism - the CPython, after
> many years of persuasion, switched its dict algorithm, rather
> inefficient in terms of memory, to something else, less inefficient
> (still quite inefficient, taking "no overhead" as the baseline).

Trading off space for time is a very common practice. You said that
lookups on MicroPython's dicts are O(N). How efficient is µPy when doing
a lookup of a dict with ten million keys?

µPy has chosen to optimize for space, rather than time. That's great.
But I don't think you should sneer at CPython's choice to optimize for
time instead.

And given that µPy's dicts already fail to meet the expected O(1) dict
behviour, and the already large number of functional differences (not
just performance differences) between µPy and Python:

http://docs.micropython.org/en/latest/pyboard/genrst/index.html

I don't think that this will make much practical difference. MicroPython
users already cannot expect to run arbitrary Python code that works in
other implementations: the Python community is fragmented between µPy
code written for tiny machines, and Python code for machines with lots
of memory.


> That
> algorithm randomly had another property. Now there's a seemingly
> serious talk of letting that property leak into the *language spec*,

It will no more be a "leak" than any other deliberate design choice.


> despite the fact that there can be unlimited number of dictionary
> algorithms, most of them not having that property.

Sure. So what? There's an unlimited number of algorithms that don't
provide the functionality that we want. There are an unlimited number of
sort algorithms, but Python guarantees that we're only going to use
those that are stable. Similar applies for method resolution (which µPy
already violates), strings, etc.


> What it will lead to is further fragmentation of the community.

Aren't you concerned about fragmenting the community because of the
functional differences between MicroPython and the specs?

Sometimes a small amount of fragmentation is unavoidable, and not
necessarily a bad thing.


> > P.S. If anyone does want to explore MicroPython's dict implementation,
> > and see if there might be an alternate implementation strategy that
> > offers both O(1) lookup and guaranteed ordering without using
> > additional memory
>
> That would be the first programmer in the history to have a cake and
> eat it too. Memory efficiency, runtime efficiency, sorted order: choose
> 2 of 3.

Given that you state that µPy dicts are O(N) and unordered, does that
mean you picked only 1 out of 3?


--
Steve

David Mertz

unread,
Nov 6, 2017, 11:07:17 PM11/6/17
to Brett Cannon, Antoine Pitrou, Python-Dev
I strongly opposed adding an ordered guarantee to regular dicts. If the implementation happens to keep that, great. Maybe OrderedDict can be rewritten to use the dict implementation. But the evidence that all implementations will always be fine with this restraint feels poor, and we have a perfectly good explicit OrderedDict for those who want that.

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Steven D'Aprano

unread,
Nov 6, 2017, 11:16:28 PM11/6/17
to pytho...@python.org
On Mon, Nov 06, 2017 at 11:33:10AM -0800, Barry Warsaw wrote:

> If we did make the change, it’s possible we would need a way to
> explicit say that order is not preserved. That seems a little weird
> to me, but I suppose it could be useful.

Useful for what?

Given that we will hypothetically have order-preserving dicts that
perform no worse than unordered dicts, I'm struggling to think of a
reason (apart from performance) why somebody would intentionally use a
non-ordered dict. If performance was an issue, sure, it makes sense to
have a non-ordered dict for when you don't want to pay the cost of
keeping insertion order. But performance seems to be a non-issue.

I can see people wanting a SortedDict which automatically sorts the keys
into some specified order. If I really work at it, I can imagine that
there might even be a use-case for randomizing the key order (like
calling random.shuffle on the keys). But if you are willing to use a
dict with arbitrary order, that means that *you don't care* what order
the keys are in. If you don't care, then insertion order should be no
better or worse than any other implementation-defined arbitrary order.


> I like the idea previously
> brought up that iteration order be deliberately randomized in that
> case, but we’d still need a good way to spell that.

That would only be in the scenario that we decide *not* to guarantee
insertion-order preserving semantics for dicts, in order to prevent
users from relying on an implementation feature that isn't a language
guarantee.


--
Steve


_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Raymond Hettinger

unread,
Nov 7, 2017, 12:13:10 AM11/7/17
to David Mertz, Antoine Pitrou, Python-Dev@Python. Org

> On Nov 6, 2017, at 8:05 PM, David Mertz <me...@gnosis.cx> wrote:
>
> I strongly opposed adding an ordered guarantee to regular dicts. If the implementation happens to keep that, great. Maybe OrderedDict can be rewritten to use the dict implementation. But the evidence that all implementations will always be fine with this restraint feels poor, and we have a perfectly good explicit OrderedDict for those who want that.

I think this post is dismissive of the value that users would get from having reliable ordering by default.

Having worked with Python 3.6 for a while, it is repeatedly delightful to encounter the effects of ordering. When debugging, it is a pleasure to be able to easily see what has changed in a dictionary. When creating XML, it is joy to see the attribs show in the same order you added them. When reading a configuration, modifying it, and writing it back out, it is a godsend to have it written out in about the same order you originally typed it in. The same applies to reading and writing JSON. When adding a VIA header in a HTTP proxy, it is nice to not permute the order of the other headers. When generating url query strings for REST APIs, it is nice have the parameter order match documented examples.

We've lived without order for so long that it seems that some of us now think data scrambling is a virtue. But it isn't. Scrambled data is the opposite of human friendly.


Raymond


P.S. Especially during debugging, it is often inconvenient, difficult, or impossible to bring in an OrderedDict after the fact or to inject one into third-party code that is returning regular dicts. Just because we have OrderedDict in collections doesn't mean that we always get to take advantage of it. Plain dicts get served to us whether we want them or not.
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

INADA Naoki

unread,
Nov 7, 2017, 12:42:24 AM11/7/17
to Raymond Hettinger, Antoine Pitrou, Python-Dev@Python. Org
I agree with Raymond. dict ordered by default makes better developer
experience.

So, my concern is how "language spec" is important for minor (sorry about my
bad vocabulary) implementation?
What's difference between "MicroPython is 100% compatible with
language spec" and
"MicroPython is almost compatible with Python language spec, but has
some restriction"?

If it's very important, how about "strong recommendation for
implementations" instead of
"language spec"?
Users who don't care implementations other than CPython and PyPy can rely on
it's usability.

Regards,
INADA Naoki <songof...@gmail.com>
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com

Nick Coghlan

unread,
Nov 7, 2017, 1:25:20 AM11/7/17
to Chris Jerdonek, Python-Dev
On 7 November 2017 at 03:27, Chris Jerdonek <chris.j...@gmail.com> wrote:
> On Mon, Nov 6, 2017 at 4:11 AM Nick Coghlan <ncog...@gmail.com> wrote:
>> Getting from the "Works on CPython 3.6+ but is technically
>> non-portable" state to a fully portable correct implementation that
>> ensures a particular key order in the JSON file thus currently
>> requires the following changes:
>
> Nick, it seems like this is more complicated than it needs to be. You can
> just pass sort_keys=True to json.dump() / json.dumps(). I use it for tests
> and human-readability all the time.

sort_keys is only equivalent to order preservation if the key order
you want *is* alphabetical order. While that's typically a good enough
assumption for JSON, it's not the case for things like CSV column
order, TOML or ini-file setting order, etc.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Steven D'Aprano

unread,
Nov 7, 2017, 1:28:21 AM11/7/17
to pytho...@python.org
On Mon, Nov 06, 2017 at 08:05:07PM -0800, David Mertz wrote:
> I strongly opposed adding an ordered guarantee to regular dicts. If the
> implementation happens to keep that, great.

That's the worst of both worlds. The status quo is that unless we
deliberately perturb the dictionary order, developers will come to rely
on implementation order (because that's what the CPython reference
implementation actually offers, regardless of what the docs say).
Consequently:

- people will be writing non-portable code, whether they know it or not;

- CPython won't be able to change the implementation, because it will
break too much code;

- other implementations will be pressured to match CPython's
implementation.

The only difference is that on the one hand we are honest and up-front
about requiring order-preserving dicts, and on the other we still
require it, but pretend that we don't.

And frankly, it seems rather perverse to intentionally perturb
dictionary order just to keep our options open that someday there might
be some algorithm which offers sufficiently better performance but
doesn't preserve order. Preserving order is useful, desirable, often
requested functionality, and now that we have it, it would have to be
one hell of an optimization to justify dropping it again.

(It is like Timsort and stability. How much faster sorting would it
have taken to justify giving up sort stability? 50% faster? 100%? We
wouldn't have done it for a 1% speedup.)

It would be better to relax the requirement that builtin dict is used
for those things that would benefit from improved performance. Is there
any need for globals() to be the same mapping type as dict? Probably
not. If somebody comes up with a much more efficient, non-order-
preserving map ideal for globals, it would be better to change globals
than dict. In my opinion.


> Maybe OrderedDict can be
> rewritten to use the dict implementation. But the evidence that all
> implementations will always be fine with this restraint feels poor,

I think you have a different definition of "poor" to me :-)

Nick has already done a survey of PyPy (which already has insertion-
order preserving dicts), Jython, VOC, and Batavia, and they don't have
any problem with this. IronPython is built on C#, which has order-
preserving mappings. Nuitka is built on C++, and if C++ can't implement
an order-preserving mapping, there is something terribly wrong with the
world. Cython (I believe) uses CPython's implementation, as does
Stackless.

The only well-known implementation that may have trouble with this is
MicroPython, but it already changes the functionality of a lot of
builtins and core language features, e.g. it uses a different method
resolution order (so multiple inheritence won't work right), some
builtins don't support slicing with three arguments, etc.

I think the evidence is excellent that other implementations shouldn't
have a problem with this, unless (like MicroPython) they are targetting
machines with tiny memory resources. µPy runs on the PyBoard, which I
believe has under 200K of memory. I think we can all forgive µPy if it
only *approximately* matches Python semantics.


--
Steve
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Steven D'Aprano

unread,
Nov 7, 2017, 1:35:03 AM11/7/17
to pytho...@python.org
On Mon, Nov 06, 2017 at 06:35:48PM +0200, Paul Sokolovsky wrote:

> For MicroPython, it would lead to quite an overhead to make
> dictionary items be in insertion order. As I mentioned, MicroPython
> optimizes for very low bookkeeping memory overhead, so lookups are
> effectively O(n), but orderedness will increase constant factor
> significantly, perhaps 5x.

Paul, it would be good if you could respond to Raymond's earlier
comments where he wrote:

I've just looked at the MicroPython dictionary implementation and
think they won't have a problem implementing O(1) compact dicts with
ordering.

The likely reason for the confusion is that they are already have an
option for an "ordered array" dict variant that does a brute-force
linear search. However, their normal hashed lookup is very similar
to ours and is easily amenable to being compact and ordered.

See: https://github.com/micropython/micropython/blob/77a48e8cd493c0b0e0ca2d2ad58a110a23c6a232/py/map.c#L139

Raymond has also volunteered to assist with this.


> Also, arguably any algorithm which would *maintain* insertion order
> over mutating operations would be more complex and/or require more
> memory that one which doesn't.

I think it would be reasonable to say that builtin dicts only maintain
insertion order for insertions, lookups, and changing the value. Any
mutation which deletes keys may arbitrarily re-order the dict.

If the user wants a stronger guarantee, then they should use
OrderedDict.




--
Steve

Steven D'Aprano

unread,
Nov 7, 2017, 1:41:14 AM11/7/17
to pytho...@python.org
On Mon, Nov 06, 2017 at 10:17:23PM -0200, Joao S. O. Bueno wrote:

> And also, forgot along the discussion, is the big disadvantage that
> other Python implementations would have a quite
> significant overhead on mandatory ordered dicts.

I don't think that is correct. Nick already did a survey, and found that
C# (IronPython), Java (Jython and VOC) and Javascript (Batavia) all have
acceptable insertion-order preserving mappings. C++ (Nuitka) surely
won't have any problem with this (if C++ cannot implement an efficient
order-preserving map, there is something terribly wrong with the world).

As for other languages that somebody might choose to build Python on
(the Parrot VM, Haskell, D, Rust, etc) surely we shouldn't be limiting
what Python does for the sake of hypothetical implementations in
"underpowered" languages?

I don't mean to imply that any of those examples are necessarily
underpowered, but if language Foo is incapable of supporting an
efficient ordered map, then language Foo is simply not good enough for a
serious Python implementation. We shouldn't allow Python's evolution to
be hamstrung by the requirement to support arbitrarily weak
implementation languages.


> One that was mentioned along the way is transpilers, with
> Brython as an example - but there might be others.

Since Brython transpiles to Javascript, couldn't it use the standard
Map object, which preserves insertion order?

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map

Quote:

Description
A Map object iterates its elements in insertion order

The EMCAScript 6 standard specifies that Map.prototype.forEach operates
over the key/value pairs in insertion order:

https://tc39.github.io/ecma262/#sec-map-objects



--
Steve

Nick Coghlan

unread,
Nov 7, 2017, 1:41:55 AM11/7/17
to Chris Barker, Python Dev
On 7 November 2017 at 09:23, Chris Barker <chris....@noaa.gov> wrote:
> in short -- we don't have a choice (unless we add an explicit randomization
> as some suggested -- but that just seems perverse...)

And this is the key point for me: "choosing not to choose" is
effectively the same as standardising the feature, as enough Python
code will come to rely on CPython's behaviour that most alternative
implementations will feel obliged to start behaving the same way
CPython does (with MicroPython being the potential exception due to
memory usage constraints always winning over algorithmic efficiency
concerns in that context).

We added ResourceWarning a while back to help discourage reliance on
CPython promptly calling __del__ methods when dropping the last
reference to an object.

An equivalent for this case would be for dict objects to randomize
iteration (ala Go), once again requiring folks to opt-in via
collections.OrderedDict to get guaranteed ordering (perhaps with a
"o{}" dict display as new syntactic sugar).

But unless someone actually writes a PEP and implementation for that
in the next 12 weeks (and Guido accepts it), then we'll have 2
releases and 3 years of CPython working a particular way increasing
the inertia against making such a change in 3.8 (and beyond that, I'd
say we'd be well and truly into de facto standardisation territory,
and the chances of ever introducing deliberate perturbation of dict
iteration order would drop to nil).

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Nick Coghlan

unread,
Nov 7, 2017, 2:30:33 AM11/7/17
to Steven D'Aprano, pytho...@python.org
On 7 November 2017 at 16:21, Steven D'Aprano <st...@pearwood.info> wrote:
> On Mon, Nov 06, 2017 at 08:05:07PM -0800, David Mertz wrote:
>> Maybe OrderedDict can be
>> rewritten to use the dict implementation. But the evidence that all
>> implementations will always be fine with this restraint feels poor,
>
> I think you have a different definition of "poor" to me :-)

While I think "poor" is understating the case, I think "excellent"
(which you use later on) is overstating it. My own characterisation
would be "at least arguably good enough".

> Nick has already done a survey of PyPy (which already has insertion-
> order preserving dicts), Jython, VOC, and Batavia, and they don't have
> any problem with this.

For these, my research only showed that their respective platforms
have an order-preserving hashmap implementation available.

What's entirely unclear at this point is how switching wholesale to
that may impact the *performance* of these implementations (both in
terms of speed and memory usage), and how much code churn would be
involved in actually making the change.

Making builtin dict() order-preserving may also still impose an
ongoing complexity cost on these implementations if they end up having
to split "the mapping we use for code execution namespaces" away from
"the mapping we provide as the builtin dict".

(That said, I think at least Jython already makes that distinction - I
believe their string-only namespace dict is a separate type, whereas
CPython plays dynamic optimisation games inside the regular dict
type).

So Barry's suggestion of providing an explicit
"collections.UnorderedDict" as a consistent spelling for "an
associative mapping without any ordering guarantees" is a reasonable
one, even if it's primary usage in CPython ends up being to ensure
algorithms are compatible with collections that don't provide an
inherently stable iteration order, and any associated performance
benefits are mostly seen on other implementations. (As Paul S notes,
such a data type would also serve a pedagogical purpose when teaching
computer science principles)

> IronPython is built on C#, which has order-
> preserving mappings.

I couldn't actually find a clearly suitable existing collection type
in the .NET CLR - the one I linked was just the one commonly
referenced as "good enough for most purposes". It had some constraints
that meant it may not be suitable as a builtin dict type in a Python
implementation (e.g. it looked to have a 32-bit length limit).

> Nuitka is built on C++, and if C++ can't implement
> an order-preserving mapping, there is something terribly wrong with the
> world. Cython (I believe) uses CPython's implementation, as does
> Stackless.

Right, the other C/C++ implementations that also target environments
with at least 128 MiB+ RAM (and typically more) can reasonably be
expected to tolerate similar space/speed trade-offs to those that
CPython itself makes (and that's assuming they aren't just using
CPython's data type implementations in the first place).

> The only well-known implementation that may have trouble with this is
> MicroPython, but it already changes the functionality of a lot of
> builtins and core language features, e.g. it uses a different method
> resolution order (so multiple inheritence won't work right), some
> builtins don't support slicing with three arguments, etc.
>
> I think the evidence is excellent that other implementations shouldn't
> have a problem with this, unless (like MicroPython) they are targetting
> machines with tiny memory resources. µPy runs on the PyBoard, which I
> believe has under 200K of memory. I think we can all forgive µPy if it
> only *approximately* matches Python semantics.

It runs on the ESP8266 as well, and that only has 96 kiB data memory.
This means we're already talking 3-4 orders of magnitude difference in
memory capacity and typical data set sizes between CPython and
MicroPython use cases, and that's only accounting for the *low* end of
CPython's use cases - once you expand out to multi-terabyte data sets
(the very upper end of what a single x86-64 server can handle if you
can afford the RAM for it), then we're talking 9-10 orders of
magnitude between CPython's high end and MicroPython's low end.

So for CPython's target use cases algorithmic efficiency dominates
performance, and we afford to invest extra memory usage and startup
overhead in service to more efficient data access algorithms.
MicroPython's the opposite - you're going to run out of memory for
data storage long before algorithmic inefficiency becomes your biggest
problem, but wasted bookkeeping memory and startup overhead can cause
problems even with small data sets.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Steven D'Aprano

unread,
Nov 7, 2017, 2:41:29 AM11/7/17
to pytho...@python.org
On Tue, Nov 07, 2017 at 05:28:24PM +1000, Nick Coghlan wrote:
> On 7 November 2017 at 16:21, Steven D'Aprano <st...@pearwood.info> wrote:
> > On Mon, Nov 06, 2017 at 08:05:07PM -0800, David Mertz wrote:
> >> Maybe OrderedDict can be
> >> rewritten to use the dict implementation. But the evidence that all
> >> implementations will always be fine with this restraint feels poor,
> >
> > I think you have a different definition of "poor" to me :-)
>
> While I think "poor" is understating the case, I think "excellent"
> (which you use later on) is overstating it. My own characterisation
> would be "at least arguably good enough".

Fair enough, and thanks for elaborating.



--
Steve

Wolfgang

unread,
Nov 7, 2017, 3:02:03 AM11/7/17
to pytho...@python.org
Hi,

I have to admit sometimes I don't understand why small things produce so much
mail traffic. :-)

If I use a mapping like dict most of the time I don't care if it is ordered.
If I need an ordering I use OrderedDict. In a library if I need a dict to be
ordered most of the time there is a parameter allowing me to do this or to
specify the used class as dict.
If this is not the case it can be fixed. In rare cases a workaround is needed.

As of now I have dict and OrderedDict, it is clear and explicit.
No need to change.

Yes it is useful for debugging to have things ordered.
Yes in other places the implicit ordering is also fine.
For pypy and CPython good and take it, be happy.

Also it is fine to teach people that dict (Mapping) is not ordered
but CPython has an implementation detail and it is ordered.
But if you want the guarantee use OrderedDict.

To be really practical even if this is guaranteed in 3.9
I cannot rely on it because of Python 2.7, 3.5, 3.6, ... compatibility.
If this versions are out of order in 10 years, even then if I want to
produce a small library running on another implementation I have to
care, because of the list of differences to the CPython implementation
or because the project is not yet up to date with the implementation (Jython).
To be save then I will still use OrderedDict guaranteeing me this
what I want.

Finally even when dict literals will be guaranteed to be ordered
it is good to teach and use OrderedDict because it is explicit.

For implementations (algorithm) I cannot foresee the
future so I cannot tell if it will be a burden or not.

Finally someone have to decide it.
As long as OrderedDict is available for me to specify it explicit
it will be fine. ;-)

Regards,

Wolfgang

On 04.11.2017 18:30, Stefan Krah wrote:
>
> Hello,
>
> would it be possible to guarantee that dict literals are ordered in v3.7?
>
>
> The issue is well-known and the workarounds are tedious, example:
>
> https://mail.python.org/pipermail/python-ideas/2015-December/037423.html
>
>
> If the feature is guaranteed now, people can rely on it around v3.9.
>
>
>
> Stefan Krah
>
>
>
> _______________________________________________
> Python-Dev mailing list
> Pytho...@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/tds333%40mailbox.org

Petr Viktorin

unread,
Nov 7, 2017, 4:53:44 AM11/7/17
to pytho...@python.org
On 11/07/2017 09:00 AM, Wolfgang wrote:
[...]
> Also it is fine to teach people that dict (Mapping) is not ordered
> but CPython has an implementation detail and it is ordered.
> But if you want the guarantee use OrderedDict.

I don't think that is fine.
When I explained this in 3.5, dicts rearranging themselves seemed quite
weird to the newcomers.
This year, I'm not looking forward to saying that dicts behave
"intuitively", but you shouldn't rely on that, because they're
theoretically allowed to rearrange themselves.
The concept of "implementation detail" and language spec vs. multiple
interpreter implementations isn't easy to explain to someone in a "basic
coding literacy" course.

Today I can still show an example on Python 3.5. But most people I teach
today won't run their code on 3.5, or on MicroPython or Brython, and
quite soon they'll forget that there's no dict ordering guarantee.

Also: I happen to read python-dev and the language docs. I suspect not
all teachers do, and when they see that dict order randomization was
"fixed", they might just remove the explanation from the lesson and
teach something practical instead.

Paul Moore

unread,
Nov 7, 2017, 5:01:07 AM11/7/17
to Nick Coghlan, Chris Barker, Python Dev
On 7 November 2017 at 06:39, Nick Coghlan <ncog...@gmail.com> wrote:
> On 7 November 2017 at 09:23, Chris Barker <chris....@noaa.gov> wrote:
>> in short -- we don't have a choice (unless we add an explicit randomization
>> as some suggested -- but that just seems perverse...)
>
> And this is the key point for me: "choosing not to choose" is
> effectively the same as standardising the feature, as enough Python
> code will come to rely on CPython's behaviour that most alternative
> implementations will feel obliged to start behaving the same way
> CPython does (with MicroPython being the potential exception due to
> memory usage constraints always winning over algorithmic efficiency
> concerns in that context).

Personally, I think that having an ordered implementation and then
deliberately breaking that ordering is pretty silly. Not least because
we chose not to do that for 3.6. So we're left with the simple
question of whether we make the behaviour required in the
documentation (which is basically where this thread started). I see 3
options. First, we maintain the status quo, treat it as a CPython/PyPy
implementation detail, and accept that this means that people will
expect it - resulting in pressure on alternative implementations to
conform, but without a language spec to support them. Second, we
document the requirement in the language spec, requiring alternative
implementations to either implement it or document it as a way in
which they don't confirm to the language spec (which is unattractive
for them, as "implements the official Python language spec" is a
selling point). Or third, we could document it as an optional
behaviour, that language implementations are expected to implement if
possible, and if they can't they should document the variance. That is
to some extent the best of both worlds, in that it allows
implementations to claim conformance with the spec while still not
implementing the feature if it causes them problems to do so - they
just have to document that they don't implement this optional but
recommended behaviour. The downside of this option is that there's no
precedent for it in the Python spec (it's basically C's
"implementation defined behaviour", which is something Python hasn't
needed this far).

Paul

Jan Claeys

unread,
Nov 7, 2017, 7:43:39 AM11/7/17
to pytho...@python.org
On Tue, 2017-11-07 at 16:39 +1000, Nick Coghlan wrote:
> And this is the key point for me: "choosing not to choose" is
> effectively the same as standardising the feature, as enough Python
> code will come to rely on CPython's behaviour that most alternative
> implementations will feel obliged to start behaving the same way
> CPython does (with MicroPython being the potential exception due to
> memory usage constraints always winning over algorithmic efficiency
> concerns in that context).

Maybe an UnorderedDict could be added which Python implementations
_can_ implement as an optimized (less memory use, faster, ...) version
without ordering guarantees if they have a need for it. In other
implementations it could just be a synonym for a regular dict.

That way it would be explicit that the programmer doesn't care about
the ordered behaviour. It would also avoid current mistakes some
(especially those new to the language and occasional users) make,
because of assumptions from default dict behaviour.

(Maybe a commandline switch or other mechanisms to explicitly use that
UnorderedDict as the default could also be useful. It would be a no-op
in implementations which don't have differing implementations.)



--
Jan Claeys

Evpok Padding

unread,
Nov 7, 2017, 10:22:12 AM11/7/17
to pytho...@python.org
Hello,

I agree with Wolfgang here. From what I gathered of the discussion, the argument started from « It would be nice if dict litterals returned ordered dicts instead of an unordered ones », which is mostly a good thing, as it allows e.g. `OrderedDict({'spam': 'ham', 'sausages': 'eggs'})` instead of having to rely on lists of couples to create an OrderedDict. It is not of utmost utility, but it would be nice to have and not dissimilar to what we already have with kwargs being ordered dicts, also a matter of slightly better usability. Possibly, in order to avoid implying that all dicts guarantee ordering at all time, a syntax such as `o{}` might be used, mostly to help newcomers. So far, so good.

Where it started to go all haywire is when it became conflated that with the fact that CPython and Pypy dicts are actually ordered (up to a point at least) and it suddenly became « Let's guarantee the ordering of all dicts » which apparently is an issue for at least one implementation of Python, and still have to be implemented in several others (said implementation would be trivial, or so it is said, but it still has to be written, along with appropriate tests, regression checks…). So far, the arguments I have seen for that are

  1. It is useful in context where one has to guarantee the ordering of some mapping (such as in json)
  2. It is useful in contexts where ordering is facultative, but nice to have (debugging has been mentionned)
  3. It is already this way in CPython, so people are going to use that anyway

I take issue with all of those arguments.

  1. If the ordered should be guaranteed, why would it be so hard to use OrderedDict ?
    - Just write `d = OrderedDict(((key, val) for key, value in …))` instead of `{key: value for key, value in …}`. It is not that hard and at least it is explicit that the order is important. And if it is really so hard, we could have dict comprehensions be ordered too in addition to litterals, it still doesn't mean that dicts should all be ordered
    - It is even easier if you fill your dict value-per-value, just initialise it as `d = OrderedDict` instead of `d = {}` and voilà !
  2. I would like to see some examples of cases where this is really much more convenient than any other soution, but even then I suspect that these cases are not sufficently relevant to wed all Python backends to ordered dicts forever.
  3. This is just a pure fallacy. The language has a documented API that says that if order of insertion is important, you should explicitely use an OrderedDict. If people stray away from it and use implementation details such as the ordering of dict in CPython, they are on their own and shouldn't expect it to be portable to any other version. Again, it's not as if OrderedDict did not exist or was much more inconvenient to use than dict.

Also, since the proposed implementation does not keep ordering on deletion, those relying implicitely on the ordering of dicts without reading the docs might get bitten by it later in much more subtle ways.
Note that I don't sugest mandatory shuffling of dicts to advertise their non-guaranteed ordering, either. Just that reading the docs (or having your instructor tell you that dict does not guarantee the order) is the reponsibility of the end user.

To sum it up

  - Ordered dict litterals are a nice idea, but probably not that important. If it happens, it would be nice if it could be extended to dict comprehensions, though.
  - Guaranteeing the ordering of all `dicts` does not make a lot of sense
  - Changing the API to guarantee the order of dicts **is** an API change, which still means work

Am I missing something ?

Cheers,

E

Thomas Nyberg

unread,
Nov 7, 2017, 10:24:00 AM11/7/17
to pytho...@python.org
On 11/07/2017 09:00 AM, Wolfgang wrote:
> Hi,
>
> I have to admit sometimes I don't understand why small things produce so much
> mail traffic. :-)
>
> If I use a mapping like dict most of the time I don't care if it is ordered.
> If I need an ordering I use OrderedDict. In a library if I need a dict to be
> ordered most of the time there is a parameter allowing me to do this or to
> specify the used class as dict.
> If this is not the case it can be fixed. In rare cases a workaround is needed.
>
> As of now I have dict and OrderedDict, it is clear and explicit.
> No need to change.
>
> [...]
>
> Regards,
>
> Wolfgang

I feel a bit out of place on this list since I'm a lurker and not a core
developer, but I just wanted to add my agreement with this as well. So
maybe take my opinion with a grain of salt...

I agree with Wolfgang. I just don't understand why this change is
needed. We have dict and we have OrderedDict. Why does dict need to
provide the extra ordering constraints? I've read many posts in this
discussion and find none of them convincing. Python already guarantees
things like ordering of keyword arguments. I've seen some people point
out confusion of newcomers (e.g. they are surprised when order is not
surprised), but that just seems to me natural confusion that comes about
when learning. I would argue that a better solution to that problem is
exactly the go solution: i.e. purposely perturbing the ordering in a way
that shows up immediately so that users realize the problems in their
thinking earlier. The dict provides a mapping from key to value. I
personally think that that is mentally much simpler object than a
mapping from key to value with certain ordering guarantees. If I want to
extra guarantees I import OrderedDict and read what the guarantees are.
This seems totally fine to me. I don't really see any advantages to this
change but a lack of implementation flexibility and a more complicated
core object in Python.

Cheers,
Thomas

David Mertz

unread,
Nov 7, 2017, 10:28:54 AM11/7/17
to Raymond Hettinger, Antoine Pitrou, Python-Dev
On Nov 6, 2017 9:11 PM, "Raymond Hettinger" <raymond....@gmail.com> wrote:
> On Nov 6, 2017, at 8:05 PM, David Mertz <me...@gnosis.cx> wrote:
> I strongly opposed adding an ordered guarantee to regular dicts. If the implementation happens to keep that, great. Maybe OrderedDict can be rewritten to use the dict implementation. But the evidence that all implementations will always be fine with this restraint feels poor, and we have a perfectly good explicit OrderedDict for those who want that.

I think this post is dismissive of the value that users would get from having reliable ordering by default.

Dismissive seems like an overly strong word. I recognize I disagree with Raymond on best official semantics. Someone else points out that if someday an "even more efficient unordered dict" is discovered, user-facing "dict" doesn't strictly have to be the same data structure as "internal dict". The fact they are is admittedly an implementation detail also.

I've had all those same uses about round-tripping serialization that Raymond mentions. I know the standard work arounds (which are not difficult, but DO require a little extra code if we don't have order).

But like Raymond, I make most of my living TEACHING Python. I feel like the extra order guarantee would make teaching slightly harder. I'm sure he feels contrarily. It is true that with 3.6 I can no longer show an example where the dict display is oddly changed when printed. But then, unordered sets also wind up sorting small integers on printing, even though that's not a guarantee.

Ordering by insertion order (possibly "only until first deletion") is simply not obvious to beginners. If we had, hypothetically, a dict that "always alphabetized keys" that would be more intuitive to them, for example. Insertion order feels obvious to us experts, but it really is an extra cognitive burden to learners beyond understanding "key/Val association".

Barry Warsaw

unread,
Nov 7, 2017, 11:43:31 AM11/7/17
to Python Dev
On Nov 6, 2017, at 22:33, Steven D'Aprano <st...@pearwood.info> wrote:

> I think it would be reasonable to say that builtin dicts only maintain
> insertion order for insertions, lookups, and changing the value. Any
> mutation which deletes keys may arbitrarily re-order the dict.
>
> If the user wants a stronger guarantee, then they should use
> OrderedDict.

In fact, that *is* leaking CPython’s implementation into the language specification. If by chance CPython’s implementation preserved order even after key deletion, either now or in the future, would that be defined as the ordering guarantees for built-in dict in the Python Language Reference?

Cheers,
-Barry

signature.asc

Barry Warsaw

unread,
Nov 7, 2017, 11:52:38 AM11/7/17
to Python Dev
On Nov 7, 2017, at 01:51, Petr Viktorin <enc...@gmail.com> wrote:
>
> When I explained this in 3.5, dicts rearranging themselves seemed quite weird to the newcomers.
> This year, I'm not looking forward to saying that dicts behave "intuitively", but you shouldn't rely on that, because they're theoretically allowed to rearrange themselves.
> The concept of "implementation detail" and language spec vs. multiple interpreter implementations isn't easy to explain to someone in a "basic coding literacy" course.

Perhaps, but IME, it’s not hard to teach someone that in a code review. Today, if I see someone submit a change that includes an implicit assumption about ordering, I’ll call that out. I can say “you can’t rely on dicts being ordered, so if that’s what you want, use an OrderedDict or sort your test data”. That’s usually a localized observation, meaning, I can look at the diff and see that they are assuming dict iteration ordering.

What happens when built-in dict’s implementation behavior becomes a language guarantee? Now the review is much more difficult because I probably won’t be able to tell just from a diff whether the ordering guarantees are preserved. Do they delete a key somewhere? Who knows? I’m not even sure that would be statically determinable since I’d have to trace the use of that dictionary at run time to see if some “del d[key]” is deleting the key in the dict under review or not. I can probably only tell that at run time.

So how to I accurately review that code? Is the order presumption valid or invalid?

Cheers,
-Barry

signature.asc

Paul Sokolovsky

unread,
Nov 7, 2017, 12:12:00 PM11/7/17
to Steven D'Aprano, Raymond Hettinger, pytho...@python.org
Hello,

On Tue, 7 Nov 2017 17:33:03 +1100
Steven D'Aprano <st...@pearwood.info> wrote:

> On Mon, Nov 06, 2017 at 06:35:48PM +0200, Paul Sokolovsky wrote:
>
> > For MicroPython, it would lead to quite an overhead to make
> > dictionary items be in insertion order. As I mentioned, MicroPython
> > optimizes for very low bookkeeping memory overhead, so lookups are
> > effectively O(n), but orderedness will increase constant factor
> > significantly, perhaps 5x.
>
> Paul, it would be good if you could respond to Raymond's earlier
> comments where he wrote:
>
> I've just looked at the MicroPython dictionary implementation and
> think they won't have a problem implementing O(1) compact dicts
> with ordering.
>
> The likely reason for the confusion is that they are already have
> an option for an "ordered array" dict variant that does a brute-force
> linear search. However, their normal hashed lookup is very
> similar to ours and is easily amenable to being compact and ordered.
>
> See:
> https://github.com/micropython/micropython/blob/77a48e8cd493c0b0e0ca2d2ad58a110a23c6a232/py/map.c#L139
>
> Raymond has also volunteered to assist with this.

I tried to do that, let me summarize previous point and give IMHO re:
contributing an alternative:

MicroPython's dict implementation is optimized for the least
bookkeeping overhead, not performance on overlarge datasets. For the
heap sizes we target (64KB on average), that's a good choice (put it
differently, MicroPython's motto is "system's memory (all bunch
kilobytes of it) belongs to user, not to MicroPython").

Adding insertion order would either:

1. Lead to significant (several times) slowdown, or

2. To noticeable memory overhead. Note that MicroPython uses the
absolute minimum for a dictionary entry - 2 words (key and value).
Adding even one extra word (e.g. some indirection pointer) means
increasing overhead by 50%, cutting useful user memory size by third.


With all that in mind, MicroPython is a very configurable project (215
config options as of this moment). We can have a config option for dict
implementation too.


But, the points above still hold - MicroPython targets low-memory
systems and doesn't really target plenty-of-memory systems (there was
never an aim to compete with CPython, the aim was to bring Python
(the language) where CPython could never go). Put it another way,
the alternative dict implementation is not expected to be used by
default.

If, with all the above in mind, someone, especially a CPython developer,
wants to contribute an alternative dict implementation, it would be
gladly accepted. (Note that if CPython followed the same policy, i.e.
allowed compile-time selection of old vs new dict algorithm, we
wouldn't have this thread.)

(Disclaimer: all the above is just my IMHO as a long-time contributor,
I'm not a MicroPython BDFL).

And I really appreciate all the attention to MicroPython - that's the
biggest case on my memory on python-dev.


[]

--
Best regards,
Paul mailto:pmi...@gmail.com

Paul Sokolovsky

unread,
Nov 7, 2017, 12:41:47 PM11/7/17
to INADA Naoki, Antoine Pitrou, Raymond Hettinger, Python-Dev@Python. Org
Hello,

On Tue, 7 Nov 2017 14:40:07 +0900
INADA Naoki <songof...@gmail.com> wrote:

> I agree with Raymond. dict ordered by default makes better developer
> experience.
>
> So, my concern is how "language spec" is important for minor (sorry
> about my bad vocabulary) implementation?
> What's difference between "MicroPython is 100% compatible with
> language spec" and
> "MicroPython is almost compatible with Python language spec, but has
> some restriction"?

So, the problem is that there's no "Python language spec". And over
time, that becomes a problem for alternative implementations, especially
not mainstream ("we have infinite amount of memory to burn") ones. What
we have is just CPython documentation, which mixes Python language spec
and CPython implementation details. And is being changed (including
"language spec" part) on the fiat of CPython developers, apparently
without any guarantees of platform stability and backward
compatibility.

Over time, this really becomes a visible drawback, comparing to the
close competitors. For example, year goes by year, but in JavaScript,

[] + []

is still:

''

That's stability!

[]

--
Best regards,
Paul mailto:pmi...@gmail.com

Mark Lawrence

unread,
Nov 7, 2017, 12:47:05 PM11/7/17
to pytho...@python.org
On 07/11/17 04:05, David Mertz wrote:
> I strongly opposed adding an ordered guarantee to regular dicts. If the
> implementation happens to keep that, great. Maybe OrderedDict can be
> rewritten to use the dict implementation. But the evidence that all
> implementations will always be fine with this restraint feels poor, and
> we have a perfectly good explicit OrderedDict for those who want that.
>
>
If there is an ordered guarantee for regular dicts but not for dict
literals, which is the subject of this thread, then haven't we got a
recipe for the kind of confusion that will lead to the number of
questions from newbies going off of the Richter scale?

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Barry Warsaw

unread,
Nov 7, 2017, 1:02:22 PM11/7/17
to Python-Dev@Python. Org
On Nov 7, 2017, at 09:39, Paul Sokolovsky <pmi...@gmail.com> wrote:

> So, the problem is that there's no "Python language spec”.

There is a language specification: https://docs.python.org/3/reference/index.html

But there are still corners that are undocumented, or topics that are deliberately left as implementation details.

Cheers,
-Barry

signature.asc

Nathaniel Smith

unread,
Nov 7, 2017, 3:11:00 PM11/7/17
to Barry Warsaw, Python Dev
Also, specs don't mean that much unless there are multiple implementations in widespread use. In JS the spec matters because it describes the common subset of the language you can expect to see across browsers, and lets the browser vendors coordinate on future changes. Since users actually target and test against multiple implementations, this is useful. In python, CPython's dominance means that most libraries are written against CPython's behavior instead of the spec, and alternative implementations generally don't care about the spec, they care about whether they can run the code their users want to run. So PyPy has found that for their purposes, the python spec includes all kinds of obscure internal implementation details like CPython's static type/heap type distinction, the exact tricks CPython uses to optimize local variable access, the CPython C API, etc. The Pyston devs found that for their purposes, refcounting actually was a mandatory part of the python language. Jython, MicroPython, etc make a different set of compatibility tradeoffs again.

I'm not saying the spec is useless, but it's not magic either. It only matters to the extent that it solves some problem for people.

-n

Paul G

unread,
Nov 7, 2017, 3:40:22 PM11/7/17
to pytho...@python.org
If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic.

I could either be something like an environment variable SCRAMBLE_DICT_ORDER or a flag like --scramble-dict-order. That would probably help somewhat with the very real problem of "everyone's going to start counting on this ordered property".
> _______________________________________________
> Python-Dev mailing list
> Pytho...@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/paul%40ganssle.io
>

signature.asc

Chris Barker

unread,
Nov 7, 2017, 4:16:39 PM11/7/17
to David Mertz, Antoine Pitrou, Raymond Hettinger, Python-Dev
On Tue, Nov 7, 2017 at 7:21 AM, David Mertz <me...@gnosis.cx> wrote:
But like Raymond, I make most of my living TEACHING Python.

and I make least of my living TEACHING Python ( :-) ),and:
 
I feel like the extra order guarantee would make teaching slightly harder.

I can't understand how this possibly makes python (or dicts) harder to teach -- you can simply say: "dicts insertion order is preserved" or not mention it at all -- I think most people kind of expect it to be preserved, which is why I (used to )always bring up the lack-of-ordering of dicts early on -- but I suspect I simply won't bother mentioning it if it's decided as a language feature.

I'm sure he feels contrarily. It is true that with 3.6 I can no longer show an example where the dict display is oddly changed when printed.

Exactly! I have a really hard time deciding how to handle this -- explaining that ordering is not guaranteed, but not being able to demonstrate it! And frankly, my students are all going to forget what I "explained" soon enough, and replace it with their experience -- which will be that dicts retain their order.

But then, unordered sets also wind up sorting small integers on printing, even though that's not a guarantee.

but it's not hard to make an easy example with order not preserved -- jsut start with a non order example:

In [6]: s = {3,7,4}

In [7]: s

Out[7]: {3, 4, 7}

or use other types...

And  "set" is a mathematical concept that has no oder, whereas the "dictionary" metaphor DOES have order...

Ordering by insertion order (possibly "only until first deletion") is simply not obvious to beginners.

the "only until first deletion" part is really hard -- I hope we don't go that route. But I don't think insertion-order is non-obvious -- particularly with literals.
 
If we had, hypothetically, a dict that "always alphabetized keys" that would be more intuitive to them, for example. Insertion order feels obvious to us experts, but it really is an extra cognitive burden to learners beyond understanding "key/Val association".

again, I don't think so -- I kind of agree if dicts did not preserve order in practice -- demonstrating that right out of the gate does help make the "key/Val association" clear -- but if you can't demonstrate it, I think we're looking at more confusion...

Maybe I'll ask my students this evening -- this is the first class I'm teaching with py3.6 ....

We've lived without order for so long that it seems that some of us now think data scrambling is a virtue.  But it isn't.  Scrambled data is the opposite of human friendly.

exactly!

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris....@noaa.gov

Paul Moore

unread,
Nov 7, 2017, 4:17:47 PM11/7/17
to Paul G, Python Dev
On 7 November 2017 at 20:35, Paul G <pa...@ganssle.io> wrote:
> If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic.
>
> I could either be something like an environment variable SCRAMBLE_DICT_ORDER or a flag like --scramble-dict-order. That would probably help somewhat with the very real problem of "everyone's going to start counting on this ordered property".

This seems like overkill to me. By the same logic, we should add a
"delay garbage collection" mode, that allows people to test that their
code doesn't make unwarranted assumptions that a reference-counting
garbage collector is in use.

Most public projects (which are the only ones that really need to
worry about this sort of detail) will probably be supporting Python
3.5 and likely even Python 2.7 for some time yet. So they test under
non-order-preserving dictionary implementations anyway. And if code
that's only targeted for Python 3.7 assumes order preserving
dictionaries, it's likely not got a huge user base anyway, so what's
the problem?

Paul
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Barry Warsaw

unread,
Nov 7, 2017, 4:21:33 PM11/7/17
to Python-Dev@Python. Org
On Nov 7, 2017, at 12:35, Paul G <pa...@ganssle.io> wrote:
>
> If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic.

As has been suggested elsewhere, if we decide not to make that guarantee, then we should probably deliberately randomize iteration order.

-Barry

signature.asc

Paul Moore

unread,
Nov 7, 2017, 4:26:30 PM11/7/17
to Chris Barker, Antoine Pitrou, Raymond Hettinger, Python-Dev
On 7 November 2017 at 21:13, Chris Barker <chris....@noaa.gov> wrote:
> the "only until first deletion" part is really hard -- I hope we don't go
> that route. But I don't think insertion-order is non-obvious -- particularly
> with literals.

But I thought we *had* gone that route. Actually, there's no "route"
to go here. We're only talking about documenting the existing
semantics that cPython has, and I thought that included no longer
guaranteeing insertion order after a delete. Although I can't prove
that by experiment at the moment.

I don't know whether my confusion above is an argument for or against
documenting the behaviour :-)

Paul G

unread,
Nov 7, 2017, 4:35:51 PM11/7/17
to Python Dev
@ Paul Moore
> This seems like overkill to me. By the same logic, we should add a
> "delay garbage collection" mode, that allows people to test that their
> code doesn't make unwarranted assumptions that a reference-counting
> garbage collector is in use.

But you can get pretty fine-grained control of garbage collection with judicious use of gc.disable(). If there were a similar mechanism for changing the ordering properties of dictionaries in code, I wouldn't suggest it as an interpreter flag / option.

And you're right - it's not pressing - the people likely to test with dictionaries scrambled are exactly the people likely to be supporting 2.7 and 3.5, but that won't be true forever, and it would be nice to have *some* mechanism to test that you're not relying on this property.

@Barry Warsaw
> As has been suggested elsewhere, if we decide not to make that guarantee, then we should probably deliberately randomize iteration order.

This was my suggestion of a middle way, since deliberate randomization seems like a step too far just to avoid people relying on implementation details. I've seen plenty of code that assumes that `assert` statements will always throw `AssertionError`, but that's not guaranteed, and some people run their test suites with -O just to check that they aren't making that assumption.

On 11/07/2017 04:15 PM, Paul Moore wrote:
> On 7 November 2017 at 20:35, Paul G <pa...@ganssle.io> wrote:
>> If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic.
>>
>> I could either be something like an environment variable SCRAMBLE_DICT_ORDER or a flag like --scramble-dict-order. That would probably help somewhat with the very real problem of "everyone's going to start counting on this ordered property".
>

>
signature.asc

Chris Barker - NOAA Federal

unread,
Nov 7, 2017, 7:18:01 PM11/7/17
to Paul G, Python Dev

This seems like overkill to me. By the same logic, we should add a
"delay garbage collection" mode, that allows people to test that their
code doesn't make unwarranted assumptions that a reference-counting
garbage collector is in use.

Actually, there is a LOT of code out there that expects reference counting. I know a lot of my code does. So if cPython does abandon it some day, there will be issues.

Another thought:

Let’s say Python declares that dict literals are order preserving. 

Then, one day, someone invents a massively more efficient non-order preserving implementation, and we want to use it for Python 4.

So the Python 4 language spec says that dicts are not order preserving. And this is flagged as an INCOMPATIBLE CHANGE.

Now everyone knows to go and check their code, and the 3to4 tool adds an “o” to all dict literals.

People will complain, but it won’t be unexpected breakage.

Compare to leaving it as an implementation detail — now we will have a lot of code in the wild that expects order-preservation (because people don’t read the language spec) that will break with such a change without any real warning.

I think we really do need to accept that cPython is a reference implementation.

Because it is.

By the way, I only just realized I can delete a key to demonstrate non-order-preservation on py 3.6. So at least I know what to tell students now.

-CHB







But you can get pretty fine-grained control of garbage collection with judicious use of gc.disable(). If there were a similar mechanism for changing the ordering properties of dictionaries in code, I wouldn't suggest it as an interpreter flag / option.

And you're right - it's not pressing - the people likely to test with dictionaries scrambled are exactly the people likely to be supporting 2.7 and 3.5, but that won't be true forever, and it would be nice to have *some* mechanism to test that you're not relying on this property.

@Barry Warsaw
As has been suggested elsewhere, if we decide not to make that guarantee, then we should probably deliberately randomize iteration order.

This was my suggestion of a middle way, since deliberate randomization seems like a step too far just to avoid people relying on implementation details. I've seen plenty of code that assumes that `assert` statements will always throw `AssertionError`, but that's not guaranteed, and some people run their test suites with -O just to check that they aren't making that assumption.

On 11/07/2017 04:15 PM, Paul Moore wrote:
On 7 November 2017 at 20:35, Paul G <pa...@ganssle.io> wrote:
If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic.

I could either be something like an environment variable SCRAMBLE_DICT_ORDER or a flag like --scramble-dict-order. That would probably help somewhat with the very real problem of "everyone's going to start counting on this ordered property".



Most public projects (which are the only ones that really need to
worry about this sort of detail) will probably be supporting Python
3.5 and likely even Python 2.7 for some time yet. So they test under
non-order-preserving dictionary implementations anyway. And if code
that's only targeted for Python 3.7 assumes order preserving
dictionaries, it's likely not got a huge user base anyway, so what's
the problem?

Paul


_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

INADA Naoki

unread,
Nov 7, 2017, 8:04:22 PM11/7/17
to Chris Barker - NOAA Federal, Python Dev
> By the way, I only just realized I can delete a key to demonstrate
> non-order-preservation on py 3.6. So at least I know what to tell students
> now.
>

You can't. dict in Python 3.6 preserves insertion order even after
key deletion.
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Nick Coghlan

unread,
Nov 7, 2017, 8:46:09 PM11/7/17
to Paul Moore, Python Dev
On 8 November 2017 at 07:15, Paul Moore <p.f....@gmail.com> wrote:
> On 7 November 2017 at 20:35, Paul G <pa...@ganssle.io> wrote:
>> If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic.
>>
>> I could either be something like an environment variable SCRAMBLE_DICT_ORDER or a flag like --scramble-dict-order. That would probably help somewhat with the very real problem of "everyone's going to start counting on this ordered property".
>
> This seems like overkill to me. By the same logic, we should add a
> "delay garbage collection" mode, that allows people to test that their
> code doesn't make unwarranted assumptions that a reference-counting
> garbage collector is in use.

Quite a few projects these days include PyPy in their CI test matrix,
and one of the things that does is test whether or not you're relying
on refcounting semantics.

We also offer ResourceWarning natively in CPython, which means if you
run under "python3 -Wd", you'll get a warning when external resources
like files are cleaned up implicitly:

$ python3 -Wd
Python 3.6.2 (default, Oct 2 2017, 16:51:32)
[GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open(".bashrc")
>>> del f
__main__:1: ResourceWarning: unclosed file <_io.TextIOWrapper
name='.bashrc' mode='r' encoding='UTF-8'>
>>>

> Most public projects (which are the only ones that really need to
> worry about this sort of detail) will probably be supporting Python
> 3.5 and likely even Python 2.7 for some time yet. So they test under
> non-order-preserving dictionary implementations anyway. And if code
> that's only targeted for Python 3.7 assumes order preserving
> dictionaries, it's likely not got a huge user base anyway, so what's
> the problem?

The concern is that if we don't explicitly perturb dict iteration
order (or offer a way to opt-in to that), then insertion ordering will
end up becoming a *de facto* compatibility requirement for Python
implementations as CPython 2.7 and other releases prior to 3.6 start
dropping out of typical test matrices. With both 2.7 and 3.5 going
end-of-life in 2020, that means 3.7 (mid 2018) and 3.8 (late 2019 or
early 2020) are our available opportunities to make that decision -
beyond that, it starts getting a lot harder to change course away from
implicit standardisation, as there'll be a lot more 3.6+-only code in
the world by then.

The way golang handled this problem is in their dict iterator: they
added an extra field to hold a randomly generated hash, and used that
hash as the starting point in their iteration sequence.

We wouldn't be able to implement per-iterator order randomisation in
CPython due to the way the PyDict_Next API works: that uses a
caller-provided Py_ssize_t entry to track the current position in the
iteration sequence.

This means the simplest change we can make is to adjust the code in
_PyDict_Next that reads the "current iteration index" from the user
supplied pointer to instead interpret that as having a constant offset
(e.g. starting with the last item in the "natural" iteration order,
and then looping back around to the first one).

"Simplest" isn't the same as "simple" though, as:

1. We can't change this globally for all dicts, as we actually *do*
need keyword argument dicts and class body execution namespaces to be
insertion ordered. That makes it either a per-instance setting, or
else a subtly different dict type.
2. So far, I haven't actually come up with a perturbed iteration
implementation that doesn't segfault the interpreter. The dict
internals are nicely laid out to be iteration friendly, but they
really do assume that you're going to start at index zero, and then
iterate through to the end of the array. The bounds checking and
pointer validity testing becomes relatively fiddly if you try to push
against that and instead start iteration from a point partway through
the storage array.

That second point also becomes a concern from a performance
perspective because this is code that runs on *each* iteration of a
loop, rather than purely as part of the loop setup.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Barry Warsaw

unread,
Nov 7, 2017, 8:52:56 PM11/7/17
to Python Dev
On Nov 7, 2017, at 16:15, Chris Barker - NOAA Federal <chris....@noaa.gov> wrote:

> Actually, there is a LOT of code out there that expects reference counting. I know a lot of my code does. So if cPython does abandon it some day, there will be issues.

I see this all the time in code reviews:

content = open(some file).read()

I never let that go uncommented.

So while you’re right that CPython is the reference implementation, and few people read the language spec, it’s still encombunt on us to point out broken code, code with implicit assumptions, and code that is not portable between implementations. Having the reference manual to point to chapter and verse is critical to avoid Python just devolving into an ad-hoc language ruled by its most popular implementation. This is something I believe Guido realized early on when JPython was first invented, and it was an important distinction that Python held, e.g. versus Perl. I still believe it’s an important principle to maintain.

Cheers,
-Barry


signature.asc

Nick Coghlan

unread,
Nov 7, 2017, 9:35:40 PM11/7/17
to Paul Moore, Python Dev
On 8 November 2017 at 11:44, Nick Coghlan <ncog...@gmail.com> wrote:
> 2. So far, I haven't actually come up with a perturbed iteration
> implementation that doesn't segfault the interpreter. The dict
> internals are nicely laid out to be iteration friendly, but they
> really do assume that you're going to start at index zero, and then
> iterate through to the end of the array. The bounds checking and
> pointer validity testing becomes relatively fiddly if you try to push
> against that and instead start iteration from a point partway through
> the storage array.

In case anyone else wants to experiment with a proof of concept:
https://github.com/ncoghlan/cpython/commit/6a8a6fa32f0a9cd71d9078fbb2b5ea44d5c5c14d

I think we've probably exhausted the utility of discussing this as a
purely hypothetical change, and so the only way to move the discussion
forward will be for someone to draft a patch that:

1. Perturbs iteration for regular dicts (it's OK for our purposes if
it's still deterministic - it just shouldn't match insertion order the
way odict does)
2. Switches keyword args and class body execution namespaces over to
odict so the test suite passes again
3. Measures the impact such a change would have on the benchmark suite

My experiment is a starting point, but it will still be a fair bit of
work to get it from there to a viable proof of concept that can be
assessed against the status quo.

INADA Naoki

unread,
Nov 7, 2017, 9:42:59 PM11/7/17
to Nick Coghlan, Python Dev
> 2. Switches keyword args and class body execution namespaces over to
> odict so the test suite passes again
> 3. Measures the impact such a change would have on the benchmark suite

For now, odict use twice memory and 2x slower on iteration.
https://bugs.python.org/issue31265#msg301942

INADA Naoki <songof...@gmail.com>
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com

INADA Naoki

unread,
Nov 7, 2017, 9:53:04 PM11/7/17
to Paul G, Python-Dev
On Wed, Nov 8, 2017 at 5:35 AM, Paul G <pa...@ganssle.io> wrote:
> If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic.
>
> I could either be something like an environment variable SCRAMBLE_DICT_ORDER or a flag like --scramble-dict-order. That would probably help somewhat with the very real problem of "everyone's going to start counting on this ordered property".

Namespace is ordered by language spec.
What does SCRAMBLE_DICT_ORDER in this code?

class A:
def __init__(self):
self.a, self.b, self.c = 1, 2, 3

a = A()
print(a.__dict__)
a.__dict__.pop('a')
print(a.__dict__)


Anyway, I'm -1 on adding such option to dict. dict in CPython is complicated
already for performance and compatibility reason.
I don't want to add more complexity to dict for such reason.

Regards,

INADA Naoki <songof...@gmail.com>
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Guido van Rossum

unread,
Nov 8, 2017, 12:47:41 AM11/8/17
to Python-Dev
It seems there must be at least two threads for each topic worth discussing at all. Therefore I feel compelled to point to https://mail.python.org/pipermail/python-dev/2017-November/150381.html, where I state my own conclusion about dict order.

I know Paul Sokolovsky does not claim to speak for MicroPython, but I think he had better shut up or he's nevertheless going to damage its reputation beyond repair. I note that micropython.org claims "MicroPython aims to be as compatible with normal Python as possible to allow you to transfer code with ease from the desktop to a microcontroller or embedded system." To me this implies that it is entirely up to the MicroPython project to decide what they'll do about the order of dict elements -- they can keep doing what they are doing, or choose a new dict implementation that satisfies their space and performance needs while also preserving order, or give developer a compile-time choice, or give users a choice at startup time, or something else I haven't thought of yet. Any of those options is better than continuing the flamewar that Paul is waging.

Finally: the dict type should not be endowed with other parts of the OrderedDict API, not should other API changes to dict be considered.

--
--Guido van Rossum (python.org/~guido)

Serhiy Storchaka

unread,
Nov 8, 2017, 3:35:26 AM11/8/17
to pytho...@python.org
08.11.17 04:33, Nick Coghlan пише:

> On 8 November 2017 at 11:44, Nick Coghlan <ncog...@gmail.com> wrote:
>> 2. So far, I haven't actually come up with a perturbed iteration
>> implementation that doesn't segfault the interpreter. The dict
>> internals are nicely laid out to be iteration friendly, but they
>> really do assume that you're going to start at index zero, and then
>> iterate through to the end of the array. The bounds checking and
>> pointer validity testing becomes relatively fiddly if you try to push
>> against that and instead start iteration from a point partway through
>> the storage array.
>
> In case anyone else wants to experiment with a proof of concept:
> https://github.com/ncoghlan/cpython/commit/6a8a6fa32f0a9cd71d9078fbb2b5ea44d5c5c14d
>
> I think we've probably exhausted the utility of discussing this as a
> purely hypothetical change, and so the only way to move the discussion
> forward will be for someone to draft a patch that:
>
> 1. Perturbs iteration for regular dicts (it's OK for our purposes if
> it's still deterministic - it just shouldn't match insertion order the
> way odict does)
> 2. Switches keyword args and class body execution namespaces over to
> odict so the test suite passes again
> 3. Measures the impact such a change would have on the benchmark suite
>
> My experiment is a starting point, but it will still be a fair bit of
> work to get it from there to a viable proof of concept that can be
> assessed against the status quo.

It may be easy and more efficient to break the order at insertion.

1. Insert in the reversed order.

2. Add at the end or at the begin, changing the order on every insertion.

2. Swap with an arbitrary item.

INADA Naoki

unread,
Dec 14, 2017, 9:06:01 PM12/14/17
to Guido van Rossum, Stefan Krah, Python-Dev
Hi, folks.

TLDR, was the final decision made already?

If "dict keeps insertion order" is not language spec and we
continue to recommend people to use OrderedDict to keep
order, I want to optimize OrderedDict for creation/iteration
and memory usage. (See https://bugs.python.org/issue31265#msg301942 )

If dict ordering is language spec, I'll stop the effort and
use remaining time to another optimizations.

My thought is, +1 to make it language spec.

* PHP (PHP 7.2 interpreter is faster than Python) keeps insertion order.
So even we make it language spec, I think we have enough room
to optimize.

* It can make stop discussion like "Does X keeps insertion order?
It's language spec?", "What about Y? Z?". Everything on top of dict
keeps insertion order. It's simple to learn and explain.

Regards,
INADA Naoki <songof...@gmail.com>


On Sun, Nov 5, 2017 at 3:35 AM, Guido van Rossum <gu...@python.org> wrote:
> This sounds reasonable -- I think when we introduced this in 3.6 we were
> worried that other implementations (e.g. Jython) would have a problem with
> this, but AFAIK they've reported back that they can do this just fine. So
> let's just document this as a language guarantee.
>
> On Sat, Nov 4, 2017 at 10:30 AM, Stefan Krah <ste...@bytereef.org> wrote:
>>
>>
>> Hello,
>>
>> would it be possible to guarantee that dict literals are ordered in v3.7?
>>
>>
>> The issue is well-known and the workarounds are tedious, example:
>>
>>
>> https://mail.python.org/pipermail/python-ideas/2015-December/037423.html
>>
>>
>> If the feature is guaranteed now, people can rely on it around v3.9.
>>
>>
>>
>> Stefan Krah
>>
>>
>>
>> _______________________________________________
>> Python-Dev mailing list
>> Pytho...@python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe:
>> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>
> _______________________________________________
> Python-Dev mailing list
> Pytho...@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com
It is loading more messages.
0 new messages