_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org
Do you suggest to make dictionary displays producing OrderedDict instead
of dict?
The modification of the current implementation that don't preserve the
initial order after deletion would be more compact and faster.
How much faster?
Stefan Krah
Wouldn't be enough to guarantee just the ordering of dicts before first
deletion? Or before first resizing (the maximal size of dictionary
displays is known at compile time, so they can be presized)?
I didn't try to implement this. But the current implementation requires
periodical reallocating if add and remove items. The following loop
reallocates the dict every len(d) iterations, while the size of the dict
is not changed, and the half of its storage is empty.
while True:
v = d.pop(k)
...
d[k] = v
Yes, for my use case that would be sufficient and that's what
I had in mind initially.
A luxury syntax addition like {a = 10, b = {c = "foo"}} that is read
as an OrderedDict (where the keys a, b, c are implicitly strings) would
of course also be sufficient for my use case.
But I suspect many users who have other use cases find it tantalizing
not to be able to use the properties of the current regular dict.
Stefan Krah
> On Sun, Nov 05, 2017 at 09:35:38PM +0200, Serhiy Storchaka wrote:
> > 05.11.17 21:20, Stefan Krah пише:
> > >On Sun, Nov 05, 2017 at 09:01:40PM +0200, Serhiy Storchaka wrote:
> > >>Do you suggest to make dictionary displays producing OrderedDict
> > >>instead of dict?
> > >
> > >No, this is essentially a language spec doc issue that would guarantee
> > >the ordering properties of the current dict implementation.
> >
> > Wouldn't be enough to guarantee just the ordering of dicts before
> > first deletion? Or before first resizing (the maximal size of
> > dictionary displays is known at compile time, so they can be
> > presized)?
>
> Yes, for my use case that would be sufficient and that's what
> I had in mind initially.
>
>
> A luxury syntax addition like {a = 10, b = {c = "foo"}} that is read
> as an OrderedDict (where the keys a, b, c are implicitly strings) would
> of course also be sufficient for my use case.
Or you are ok with:
OD = OrderedDict
# ...
OD(a=10, b=OD(c="foo"))
Regards
Antoine.
Isn't ordered dict also useful for **kwargs?
05.11.17 20:39, Stefan Krah пише:
On Sun, Nov 05, 2017 at 01:14:54PM -0500, Paul G wrote:
2. Someone invents a new arbitrary-ordered container that would improve on the memory and/or CPU performance of the current dict implementation
I would think this is very unlikely, given that the previous dict implementation
has always been very fast. The new one is very fast, too.
The modification of the current implementation that don't preserve the initial order after deletion would be more compact and faster.
Since there is voting going on, -1 on changing the guarantees of `dict`. If ordering is important, OrderedDict is more explicit and more obvious to the over reading the code, even if the underlying implementation is identical.
Good luck teaching people about why Python went from OrderedDict to UnorderedDict within a version. I remember learning about this same thing in C++ and just thinking “wut?”. And they just added a new type and said to stop using the old one – not changing something that people already understand.
Better to underspecify the default dict and offer explicitly named extensions (DefaultDict, OrderedDict, etc.) for those who want more guarantees.
Cheers,
Steve
Top-posted from my Windows phone
I think the question of whether any specific implementation of dict could be made faster for a given architecture or even that the trade-offs made by CPython are generally the right ones is kinda beside the point. It's certainly feasible that an implementation that does not preserve ordering could be better for some implementation of Python, and the question is really how much is gained by changing the language semantics in such a way as to cut off that possibility.
All this is implementation details. We did have little time for
coordinated changing dict and OrderedDict implementations before
releasing 3.6, and left existing coupling. This also prevents from
in-place compactization of dict items.
We could add a private flag to dict that denotes whether this dict is
ordered or no. The ordinal dict could be more compact, while OrderedDict
left ordered. And I like your issue31265.
The current dict implementation still is young. It takes several
optimizations in 3.7 (thank to you Inada) and AFAIK there are still not
merged patches. I would wait until it become more stable before making
change in language specification.
> I think current OrderedDict implementation is fragile loose coupling.
> While two object has different file (dictobject.c and odictobject.c),
> OrderedDict depends on dict's internal behavior heavily.
> It prevents optimizing dict. See comment here.
>
> https://github.com/python/cpython/blob/a5293b4ff2c1b5446947b4986f98ecf5d52432d4/Objects/dictobject.c#L1082
>
> I don't have strong opinion about what should we do about dict and OrderedDict.
> But I feel PyPy's approach (using same implementation and just override
> __eq__ and add move_to_end() method) is most simple.
I think that the coupling with dict and OrderedDict should be more
robust, but left private. Dict implementation should be aware of
OrderedDict and provide some private methods for using in OrderedDict,
but not restrict the optimization of unordered dicts.
Hello,
What happens now borders on technologic surrealism - the CPython, after
many years of persuasion, switched its dict algorithm, rather
inefficient in terms of memory, to something else, less inefficient
(still quite inefficient, taking "no overhead" as the baseline). That
algorithm randomly had another property. Now there's a seemingly
serious talk of letting that property leak into the *language spec*,
despite the fact that there can be unlimited number of dictionary
algorithms, most of them not having that property.
- don't use a dict display, use collections.OrderedDict
- make sure to set object_pairs_hook when using json.loads
- don't use kwargs to OrderedDict, use a sequence of 2-tuples
For 3.6, we've already said that we want the last constraint to age
out, such that the middle version of the code also ensures a
particular key order.
The proposal is that in 3.7 we retroactively declare that the first,
most obvious, version of this code should in fact reliably pass all
three assertions.
Failing that, the proposal is that we instead change the dict
iteration implementation such that the dict round trip will start
failing reasonably consistently again (the same as it did in 3.5), so
that folks realise almost immediately that they still need
collections.OrderedDict instead of the builtin dict.
Cheers,
Nick.
--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Saying something like "dict maintains insertion order until first
modification" is going down the rabbit hole, making it all confusing,
hard to remember, crazy-sounding to novices.
Why all that, if, since the beginning of times, Python offered a
structure with guaranteed ordering - list, and structure for efficient
mapping one values into other - dict. Then even marriage between the
two - OrderedDict.
Why suddenly once in 25 years there's a need to do something to dict's,
violating computer science background behind them (one of the reason
enough people loved Python comparing to other "practical hack"
languages)?
Alternatives were already presented on the thread - if people want more
and easier ordered dictionaries, it calls to add a special literal
initializer for them ("o{}" was proposed).
If we don't promise order preservation ...
So to me this sounds like a decision between the pragmatism of choosing semantics that all Python implementations can handle or the developer benefits of an order-preserving dict everywhere (all the other arguments I think are minor compared to these two).
On Mon, 6 Nov 2017 11:33:10 -0800
Barry Warsaw <ba...@python.org> wrote:
[]
> If we did make the change, it’s possible we would need a way to
> explicit say that order is not preserved. That seems a little weird
I recently was working on a more or less complex dataflow propagation
problem. It should converge to a fixed point, and it did, but on
different runs, to different ones. So, I know that I'm a bad
programmer, need to do more of my homework and grow. I know, that if I
rewrite it in C++ or C, it'll work unstable the same way, because it's
buggy. (Heck, over these years, I learned that I don't need to rewrite
things in C/C++, because Python is the *real* language, which works the
way computers do, without sugaring that up).
I need to remember that, because with Python 3.7, I may become a
good-programmer-in-a-ponyland-of-ordered-dicts.
Btw, in all this discussion, I don't remember anyone mentioning sets. I
don't recall the way they're implemented in CPython, but they have
strong conceptual and semantic resemblance to dict's. So, what about
them, do they become ordered too?
>
> Cheers,
> -Barry
>
--
Best regards,
Paul mailto:pmi...@gmail.com
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com
No, sets are not ordered, and seems they will not be ordered at least in
several next years.
(Of course, given that CPython's implementation is order-preserving, a bunch of code is probably now being written that implicitly requires on this detail, but at least having syntax that makes that clear would give people the *option* to make the assumption explicit).
I think the additional syntax have the added benefit of preventing
code that relies on the ordering of dict literals to be run on older
versions, therefore triggering subtle bugs that had already being
mentioned.
And also, forgot along the discussion, is the big disadvantage that
other Python implementations would have a quite
significant overhead on mandatory ordered dicts. One that was
mentioned along the way is transpilers, with
Brython as an example - but there might be others. MircoPython is far
from being the only implementation affected.
js
-><-
>
> On 11/06/2017 01:19 PM, Barry Warsaw wrote:
>> On Nov 6, 2017, at 02:18, Paul Sokolovsky <pmi...@gmail.com> wrote:
>>
>>> What it will lead to is further fragmentation of the community. Python2
>>> vs Python3 split is far from being over, and now there're splits
>>> between:
>>>
>>> * people who use "yield from" vs "await"
>>> * people who use f-strings vs who don't
>>> * people who rely on sorted nature of dict's vs who don't
>>
>> This is the classic argument of, do you proceed conservatively and use the lowest-common denominator that makes your code work with the widest range of versions, or do you ride the wave and adopt the latest and greatest features as soon as they’re available?
>>
>> Neither answer is wrong or right… for everyone. It’s also a debate as old as the software industry. Every package, project, company, developer, community will have to decide for themselves. Once you realize you can’t win, you’ve won! :)
>>
>> -Barry
>>
>>
>>
>> _______________________________________________
>> Python-Dev mailing list
>> Pytho...@python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe: https://mail.python.org/mailman/options/python-dev/paul%40ganssle.io
>>
>
>
> _______________________________________________
> Python-Dev mailing list
> Pytho...@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/jsbueno%40python.org.br
>
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com
Hello,
On Mon, 06 Nov 2017 17:58:47 +0000
Brett Cannon <br...@python.org> wrote:
[]
> > Why suddenly once in 25 years there's a need to do something to
> > dict's, violating computer science background behind them (one of
> > the reason enough people loved Python comparing to other "practical
> > hack" languages)?
>
> I don't understand what "computer science background" is being
> violated?
I tried to explain that in the previous mail, can try a different
angle. So, please open you favorite CS book (better few) and look up
"abstract data types", then "mapping/associative array" and "list". We
can use Wikipedia too: https://en.wikipedia.org/wiki/Associative_array.
So, please look up: "Operations associated with this data type allow".
And you'll see, that there're no "ordering" related operations are
defined. Vice versa, looking at "sequence" operations, there will be
"prev/next", maybe "get n'th" element operations, implying ordering.
Python used to be a perfect application of these principles. Its dict
was a perfect CS implementation of an abstract associative array, and
list - of "sequence" abstract type (with additional guarantee of O(1)
random element access).
People knew and rejoiced that Python is built on solid science
principles, or could *learn* them from it.
That no longer will be true,
with a sound concept being replaced with on-the-spot practical hack,
choosing properties of a random associative array algorithm
implementation over properties of a superset of such algorithms (many
of which are again don't offer any orderness guarantees).
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
> If we did make the change, it’s possible we would need a way to
> explicit say that order is not preserved. That seems a little weird
> to me, but I suppose it could be useful.
Useful for what?
Given that we will hypothetically have order-preserving dicts that
perform no worse than unordered dicts, I'm struggling to think of a
reason (apart from performance) why somebody would intentionally use a
non-ordered dict. If performance was an issue, sure, it makes sense to
have a non-ordered dict for when you don't want to pay the cost of
keeping insertion order. But performance seems to be a non-issue.
I can see people wanting a SortedDict which automatically sorts the keys
into some specified order. If I really work at it, I can imagine that
there might even be a use-case for randomizing the key order (like
calling random.shuffle on the keys). But if you are willing to use a
dict with arbitrary order, that means that *you don't care* what order
the keys are in. If you don't care, then insertion order should be no
better or worse than any other implementation-defined arbitrary order.
> I like the idea previously
> brought up that iteration order be deliberately randomized in that
> case, but we’d still need a good way to spell that.
That would only be in the scenario that we decide *not* to guarantee
insertion-order preserving semantics for dicts, in order to prevent
users from relying on an implementation feature that isn't a language
guarantee.
--
Steve
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com
While I think "poor" is understating the case, I think "excellent"
(which you use later on) is overstating it. My own characterisation
would be "at least arguably good enough".
> Nick has already done a survey of PyPy (which already has insertion-
> order preserving dicts), Jython, VOC, and Batavia, and they don't have
> any problem with this.
For these, my research only showed that their respective platforms
have an order-preserving hashmap implementation available.
What's entirely unclear at this point is how switching wholesale to
that may impact the *performance* of these implementations (both in
terms of speed and memory usage), and how much code churn would be
involved in actually making the change.
Making builtin dict() order-preserving may also still impose an
ongoing complexity cost on these implementations if they end up having
to split "the mapping we use for code execution namespaces" away from
"the mapping we provide as the builtin dict".
(That said, I think at least Jython already makes that distinction - I
believe their string-only namespace dict is a separate type, whereas
CPython plays dynamic optimisation games inside the regular dict
type).
So Barry's suggestion of providing an explicit
"collections.UnorderedDict" as a consistent spelling for "an
associative mapping without any ordering guarantees" is a reasonable
one, even if it's primary usage in CPython ends up being to ensure
algorithms are compatible with collections that don't provide an
inherently stable iteration order, and any associated performance
benefits are mostly seen on other implementations. (As Paul S notes,
such a data type would also serve a pedagogical purpose when teaching
computer science principles)
> IronPython is built on C#, which has order-
> preserving mappings.
I couldn't actually find a clearly suitable existing collection type
in the .NET CLR - the one I linked was just the one commonly
referenced as "good enough for most purposes". It had some constraints
that meant it may not be suitable as a builtin dict type in a Python
implementation (e.g. it looked to have a 32-bit length limit).
> Nuitka is built on C++, and if C++ can't implement
> an order-preserving mapping, there is something terribly wrong with the
> world. Cython (I believe) uses CPython's implementation, as does
> Stackless.
Right, the other C/C++ implementations that also target environments
with at least 128 MiB+ RAM (and typically more) can reasonably be
expected to tolerate similar space/speed trade-offs to those that
CPython itself makes (and that's assuming they aren't just using
CPython's data type implementations in the first place).
> The only well-known implementation that may have trouble with this is
> MicroPython, but it already changes the functionality of a lot of
> builtins and core language features, e.g. it uses a different method
> resolution order (so multiple inheritence won't work right), some
> builtins don't support slicing with three arguments, etc.
>
> I think the evidence is excellent that other implementations shouldn't
> have a problem with this, unless (like MicroPython) they are targetting
> machines with tiny memory resources. µPy runs on the PyBoard, which I
> believe has under 200K of memory. I think we can all forgive µPy if it
> only *approximately* matches Python semantics.
It runs on the ESP8266 as well, and that only has 96 kiB data memory.
This means we're already talking 3-4 orders of magnitude difference in
memory capacity and typical data set sizes between CPython and
MicroPython use cases, and that's only accounting for the *low* end of
CPython's use cases - once you expand out to multi-terabyte data sets
(the very upper end of what a single x86-64 server can handle if you
can afford the RAM for it), then we're talking 9-10 orders of
magnitude between CPython's high end and MicroPython's low end.
So for CPython's target use cases algorithmic efficiency dominates
performance, and we afford to invest extra memory usage and startup
overhead in service to more efficient data access algorithms.
MicroPython's the opposite - you're going to run out of memory for
data storage long before algorithmic inefficiency becomes your biggest
problem, but wasted bookkeeping memory and startup overhead can cause
problems even with small data sets.
Cheers,
Nick.
--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
> On Nov 6, 2017, at 8:05 PM, David Mertz <me...@gnosis.cx> wrote:I think this post is dismissive of the value that users would get from having reliable ordering by default.
> I strongly opposed adding an ordered guarantee to regular dicts. If the implementation happens to keep that, great. Maybe OrderedDict can be rewritten to use the dict implementation. But the evidence that all implementations will always be fine with this restraint feels poor, and we have a perfectly good explicit OrderedDict for those who want that.
But like Raymond, I make most of my living TEACHING Python.
I feel like the extra order guarantee would make teaching slightly harder.
I'm sure he feels contrarily. It is true that with 3.6 I can no longer show an example where the dict display is oddly changed when printed.
But then, unordered sets also wind up sorting small integers on printing, even though that's not a guarantee.
In [6]: s = {3,7,4}
In [7]: s
Out[7]: {3, 4, 7}
or use other types...
And "set" is a mathematical concept that has no oder, whereas the "dictionary" metaphor DOES have order...
Ordering by insertion order (possibly "only until first deletion") is simply not obvious to beginners.
If we had, hypothetically, a dict that "always alphabetized keys" that would be more intuitive to them, for example. Insertion order feels obvious to us experts, but it really is an extra cognitive burden to learners beyond understanding "key/Val association".
We've lived without order for so long that it seems that some of us now think data scrambling is a virtue. But it isn't. Scrambled data is the opposite of human friendly.
This seems like overkill to me. By the same logic, we should add a"delay garbage collection" mode, that allows people to test that theircode doesn't make unwarranted assumptions that a reference-countinggarbage collector is in use.
But you can get pretty fine-grained control of garbage collection with judicious use of gc.disable(). If there were a similar mechanism for changing the ordering properties of dictionaries in code, I wouldn't suggest it as an interpreter flag / option.
And you're right - it's not pressing - the people likely to test with dictionaries scrambled are exactly the people likely to be supporting 2.7 and 3.5, but that won't be true forever, and it would be nice to have *some* mechanism to test that you're not relying on this property.
@Barry WarsawAs has been suggested elsewhere, if we decide not to make that guarantee, then we should probably deliberately randomize iteration order.
This was my suggestion of a middle way, since deliberate randomization seems like a step too far just to avoid people relying on implementation details. I've seen plenty of code that assumes that `assert` statements will always throw `AssertionError`, but that's not guaranteed, and some people run their test suites with -O just to check that they aren't making that assumption.
On 11/07/2017 04:15 PM, Paul Moore wrote:On 7 November 2017 at 20:35, Paul G <pa...@ganssle.io> wrote:If dictionary order is *not* guaranteed in the spec and the dictionary order isn't randomized (which I think everyone agrees is a bit messed up), it would probably be useful if you could enable "random order mode" in CPython, so you can stress-test that your code isn't making any assumptions about dictionary ordering without having to use an implementation where order isn't deterministic.I could either be something like an environment variable SCRAMBLE_DICT_ORDER or a flag like --scramble-dict-order. That would probably help somewhat with the very real problem of "everyone's going to start counting on this ordered property".Most public projects (which are the only ones that really need toworry about this sort of detail) will probably be supporting Python3.5 and likely even Python 2.7 for some time yet. So they test undernon-order-preserving dictionary implementations anyway. And if codethat's only targeted for Python 3.7 assumes order preservingdictionaries, it's likely not got a huge user base anyway, so what'sthe problem?Paul
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
It may be easy and more efficient to break the order at insertion.
1. Insert in the reversed order.
2. Add at the end or at the begin, changing the order on every insertion.
2. Swap with an arbitrary item.