On 08.01.16 23:27, Victor Stinner wrote:
Add a new read-only ``__version__`` property to ``dict`` and
``collections.UserDict`` types, incremented at each change.
This may not be the best name for a property. Many modules already have a __version__ attribute, so this may cause confusion.
The C code uses ``version++``. The behaviour on integer overflow of the
version is undefined. The minimum guarantee is that the version always
changes when the dictionary is modified.
To clarify, this code has defined behavior in C (we should avoid introducing new undefined behavior). Maybe you mean that the behavior is not specified on the Python side (since it is platform- and implementation-defined).
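As a rough illustration of the overflow point, the sketch below simulates `version++` in Python. It assumes the field is an unsigned 64-bit integer; the quoted text does not state the C type, and the `bump` helper is a made-up name. Under that assumption the increment wraps modulo 2**64, so a single modification always produces a different value, even at the maximum.

```python
# Simulate `version++` on an unsigned 64-bit C integer (the width is an assumption).
UINT64_MASK = (1 << 64) - 1


def bump(version):
    """Return the next version, wrapping around like unsigned C arithmetic."""
    return (version + 1) & UINT64_MASK


largest = UINT64_MASK        # largest representable value
wrapped = bump(largest)      # wraps to 0 instead of overflowing
print(wrapped, wrapped != largest)
```

With a signed C type the same increment at the maximum value would be undefined behavior, which is presumably the case the PEP wording is guarding against.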
Usage of dict.__version__
=========================
This can also be used to better detect dict mutation during iteration: https://bugs.python.org/issue19332.
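A minimal sketch of that idea, using a hypothetical `VersionedDict` subclass with a plain `version` attribute as a stand-in (built-in dicts do not expose such a counter, and only `__setitem__` and `__delitem__` are tracked here). The `checked_keys` helper is also an illustrative name. It catches a mutation that leaves the size unchanged, which a size check alone cannot see.

```python
class VersionedDict(dict):
    """Hypothetical dict subclass that counts mutations (illustration only)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


def checked_keys(d):
    """Yield the keys of d, raising if d is mutated between steps."""
    start = d.version
    for key in list(d):  # iterate over a snapshot; the version check is the guard
        if d.version != start:
            raise RuntimeError("dict mutated during iteration")
        yield key


d = VersionedDict(a=1, b=2)
try:
    for k in checked_keys(d):
        del d[k]
        d[k + "_new"] = 0  # same size afterwards, but different keys
except RuntimeError as exc:
    print(exc)
```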
> If what you want is optimization, it would be much better to devote time to a solution
> that can potentially yield orders of magnitude worth of speedup like PyPy
> rather than increasing language complexity for a minor payoff.
I disagree that my proposed changes increase the "language
complexity". According to early benchmarks, my changes have a
negligible impact on performance. I don't see how adding a read-only
__version__ property to dict makes the Python *language* more complex?
My whole design is based on the idea that my optimizer will be
optional. You will be free to not use it ;-)
And sorry, I'm not interested in contributing to PyPy.
Victor
On Sat, Jan 9, 2016 at 8:42 AM, Victor Stinner <victor....@gmail.com> wrote:
> I wrote a whole website to explain the status of the Python optimizers
> and why I want to write my own optimizer:
> https://faster-cpython.readthedocs.org/index.html

I think this is admirable. I also dream of faster Python. However, we have a fundamental disagreement about how to get there. You can spend your whole life adding one or two optimizations a year and Python may only end up twice as fast as it is now, which would still be dog slow. A meaningful speedup requires a JIT. So, I question the value of this kind of change.
What is the point of making __version__ an exposed property?
That's fine, but I think you are probably wasting your time then :) The "hole between CPython and PyPy" disappears as soon as PyPy catches up to CPython 3.5 with numpy, and then all of this work goes with it.
On Saturday, 9 January 2016, Neil Girdhar <miste...@gmail.com> wrote:
> On Sat, Jan 9, 2016 at 8:42 AM, Victor Stinner <victor....@gmail.com> wrote:
>> I wrote a whole website to explain the status of the Python optimizers
>> and why I want to write my own optimizer:
>> https://faster-cpython.readthedocs.org/index.html
>
> I think this is admirable. I also dream of faster Python. However, we have a fundamental disagreement about how to get there. You can spend your whole life adding one or two optimizations a year and Python may only end up twice as fast as it is now, which would still be dog slow. A meaningful speedup requires a JIT. So, I question the value of this kind of change.

There are multiple JIT compilers for Python under active development: PyPy, Pyston, Pyjion, Numba (numerical computation), etc.

I don't think that my work will slow down these projects. I hope that it will create more competition and that we will cooperate. For example, I am in contact with a Pythran developer who told me that my PEPs will help his project. As I wrote in the dict.__version__ PEP, the dictionary version will also be useful for Pyjion, according to Brett Cannon.

But Antoine Pitrou told me that the dictionary version will not help Numba. Numba doesn't use dictionaries and already has its own efficient implementation of guards.

> What is the point of making __version__ an exposed property?

Hm, technically I don't need it at the Python level. Guards are implemented in C and access the field directly from the structure.

Having the property in Python helps to write unit tests, to write prototypes (experimenting with new things), etc.
>> If what you want is optimization, it would be much better to devote time to a solution
>> that can potentially yield orders of magnitude worth of speedup like PyPy
>> rather than increasing language complexity for a minor payoff.
>
> I disagree that my proposed changes increase the "language
> complexity". According to early benchmarks, my changes have a
> negligible impact on performance. I don't see how adding a read-only
> __version__ property to dict makes the Python *language* more complex?

It makes it more complex because you're adding a user-facing property. Every little property adds to the cognitive load of a language. It also means that all of the other Python implementations need to follow suit even if their optimizations work differently.

What is the point of making __version__ an exposed property? Why can't it be a hidden variable in CPython's underlying implementation of dict? If some code needs to query __version__ to see if it's changed then CPython should be the one trying to discover this pattern and automatically generate the right code. Ultimately, this is just a piece of a JIT, which is the way this is going to end up.

> My whole design is based on the idea that my optimizer will be
> optional. You will be free to not use it ;-)
> And sorry, I'm not interested in contributing to PyPy.

That's fine, but I think you are probably wasting your time then :) The "hole between CPython and PyPy" disappears as soon as PyPy catches up to CPython 3.5 with numpy, and then all of this work goes with it.
How is this not just a poorer version of PyPy's optimizations? If what you want is optimization, it would be much better to devote time to a solution that can potentially yield orders of magnitude worth of speedup like PyPy rather than increasing language complexity for a minor payoff.
On 10 Jan 2016 at 4:39 PM, "Nicholas Chammas" <nicholas...@gmail.com> wrote:
> To extend this analogy a bit, I think Neil's objection was more along the lines of "Why work an extra 5 hours a week for only a 5% raise?"
Your analogy is wrong. I am working and you get the salary.
Victor
> I agree with Neil Girdhar that this looks to me like a CPython-specific
> implementation detail that should not be imposed on other
> implementations. For testing, perhaps we could add a dict_version
> function in test.support that uses ctypes to access the internals.
>
> Another reason to hide __version__ from the Python level is that its use
> seems to me rather tricky and bug-prone.
What makes you say that? Isn't it a simple matter of:
    v = mydict.__version__
    maybe_modify(mydict)
    if v != mydict.__version__:
        print("dict has changed")
Obviously a JIT can help, but even they can benefit from this. For instance, Pyjion could rely on this instead of creating our own guards for built-in and global namespaces if we wanted to inline calls to certain built-ins.
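To make the guard idea concrete, here is a small Python sketch. It is only an illustration of the concept: the `Namespace` and `Guard` classes and the `fast_len_of_abc` function are hypothetical names, the counter is a plain attribute on a dict subclass that tracks `__setitem__` only, and real guards of the kind discussed here would be implemented in C.

```python
class Namespace(dict):
    """Hypothetical namespace dict that counts item assignments."""

    version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1


class Guard:
    """Remember a namespace version and report whether it is unchanged."""

    def __init__(self, namespace):
        self.namespace = namespace
        self.version = namespace.version

    def check(self):
        return self.namespace.version == self.version


namespace = Namespace(len=len)
guard = Guard(namespace)


def fast_len_of_abc():
    # Specialized path: len("abc") was folded to a constant ahead of time.
    if guard.check():
        return 3
    # Fallback path: the namespace changed, so look the name up again.
    return namespace["len"]("abc")


print(fast_len_of_abc())                 # guard holds, constant is used
namespace["len"] = lambda s: "mock"      # rebinding invalidates the guard
print(fast_len_of_abc())                 # falls back to the real lookup
```

The point of the design is that the check is a single integer comparison, regardless of how many names the specialized code depends on in that namespace.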
C compilers often have optimization levels that can potentially alter the program's operation.
On Sun, Jan 10, 2016 at 11:48:35AM -0500, Neil Girdhar wrote:
[...]
> >     v = mydict.__version__
> >     maybe_modify(mydict)
> >     if v != mydict.__version__:
> >         print("dict has changed")
>
>
> This is exactly what I want to avoid. If you want to do something like
> this, I think you should do it in regular Python by subclassing dict and
> overriding the mutating methods.
That doesn't help Victor, because exec needs an actual dict, not
subclasses. Victor's PEP says this is a blocker.
I can already subclass dict to do that now. But if Victor's suggestion
is accepted, then I don't need to. The functionality will already exist.
Why shouldn't I use it?
> What happens if someone uses a custom Mapping?
If they inherit from dict or UserDict, they get this functionality for
free. If they don't, they're responsible for implementing it if they
want it.
> Do all custom Mappings need to implement __version__?
I believe the answer to that is No, but the PEP probably should clarify
that.
--
Steve
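As a sketch of what "implementing it if they want it" could look like for a mapping that does not inherit from dict or UserDict: the `VersionedMapping` class and its `version` attribute below are hypothetical names, not anything the PEP defines. Because the `MutableMapping` mixin methods (`update`, `pop`, `setdefault`, `clear`, `popitem`) are built on `__setitem__` and `__delitem__`, bumping the counter in those two methods covers them as well.

```python
from collections.abc import MutableMapping


class VersionedMapping(MutableMapping):
    """Hypothetical custom mapping that opts in to a version counter."""

    def __init__(self):
        self._data = {}
        self.version = 0

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value
        self.version += 1

    def __delitem__(self, key):
        del self._data[key]
        self.version += 1

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


m = VersionedMapping()
m.update(a=1, b=2)   # two assignments through __setitem__
m.pop("a")           # one deletion through __delitem__
print(m.version, dict(m))
```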
On Sun, Jan 10, 2016 at 12:57 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> [...]
> That doesn't help Victor, because exec needs an actual dict, not
> subclasses. Victor's PEP says this is a blocker.

No, he can still do what he wants transparently in the interpreter. What I want to avoid is Python users using __version__ in their own code.
On Jan 10, 2016, at 10:35, Neil Girdhar <miste...@gmail.com> wrote:
> On Sun, Jan 10, 2016 at 12:57 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>> [...]
>> That doesn't help Victor, because exec needs an actual dict, not
>> subclasses. Victor's PEP says this is a blocker.
>
> No, he can still do what he wants transparently in the interpreter. What I want to avoid is Python users using __version__ in their own code.

Well, he could change exec so it can use arbitrary mappings (or at least dict subclasses), but I assume that's much harder and more disruptive than his proposed change.

Anyway, if I understand your point, it's this: __version__ should either be a private implementation-specific property of dicts, or it should be a property of all mappings; anything in between gets all the disadvantages of both.
If so, I agree with you. Encouraging people to use __version__ for other purposes besides namespace guards, but not doing anything to guarantee it actually exists anywhere besides namespaces, seems like a bad idea.

But there is still something in between public and totally internal to FAT Python. Making it a documented property of PyDict objects at the C API level is a different story: there are already plenty of ways that C code can use those objects that won't work with arbitrary mappings, so adding another doesn't seem like a problem.

And even making it public but implementation-specific at the Python level may be useful for other CPython-specific optimizers (even if partially written in Python). If so, the best way to deal with the danger that someone could abuse it in code that should work with arbitrary mappings, or with another Python implementation, is to clearly document its non-portability and discourage its abuse in the docs, not to hide it.
On 9 January 2016 at 16:03, Serhiy Storchaka <stor...@gmail.com> wrote:
> On 08.01.16 23:27, Victor Stinner wrote:
>>
>> Add a new read-only ``__version__`` property to ``dict`` and
>> ``collections.UserDict`` types, incremented at each change.
>
>
> This may not be the best name for a property. Many modules already have a
> __version__ attribute, so this may cause confusion.
The equivalent API for the global ABC object graph is
abc.get_cache_token:
https://docs.python.org/3/library/abc.html#abc.get_cache_token
One of the reasons we chose that name is that even though it's a
number, the only operation with semantic significance is equality
testing, with the intended use case being cache invalidation when the
token changes value.
If we followed the same reasoning for Victor's proposal, then a
suitable attribute name would be "__cache_token__".
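For reference, this is how the existing `abc.get_cache_token()` API behaves. The `Shape` and `Square` classes are made up for the example. The token is opaque, so the only meaningful operation is comparing it for equality, and it changes whenever `register()` adds a new virtual subclass to any ABC.

```python
import abc


class Shape(abc.ABC):
    """Example ABC used only for this illustration."""


class Square:
    """Plain class that will be registered as a virtual subclass."""


before = abc.get_cache_token()
Shape.register(Square)             # modifies the global ABC object graph
after = abc.get_cache_token()

# Equality is the only meaningful comparison on the token.
cache_is_stale = before != after
print(cache_is_stale)
```

A cache keyed on ABC membership would store `before` alongside its entries and discard them once the comparison reports a change.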
On 09.01.2016 10:58, Victor Stinner wrote:
> 2016-01-09 9:57 GMT+01:00 Serhiy Storchaka <stor...@gmail.com>:
>>>> This also can be used for better detecting dict mutating during
>>>> iterating:
>>>> https://bugs.python.org/issue19332.
>> (...)
>>
>> This makes Raymond's objections even more strong.
>
> Raymond has two major objections: memory footprint and performance. I
> opened an issue with a patch implementing dict.__version__ and I ran
> pybench:
> https://bugs.python.org/issue26058#msg257810
>
> pybench doesn't seem reliable: microbenchmarks on dict seem faster
> with the patch, which doesn't make sense. I expect worse or the same
> performance.
>
> With my own timeit microbenchmarks, I don't see any slowdown with the
> patch. For an unknown reason (it's really strange), dict operations
> seem even faster with the patch.
This can well be caused by a better memory alignment, which
depends on the CPU you're using.
> For the memory footprint, it's clearly stated in the PEP that it adds
> 8 bytes per dict (4 bytes on 32-bit platforms). See the "dict subtype"
> section which explains why I proposed to modify directly the dict
> type.
Some questions:
* How would the implementation deal with wrap around of the
version number for fast changing dicts (esp. on 32-bit platforms) ?
* Given that this is an optimization and not meant to be exact
science, why would we need 64 bits worth of version information ?
AFAIK, you only need the version information to be able to
answer the question "did anything change compared to last time
I looked ?".
For an optimization it's good enough to get an answer "yes"
for slow changing dicts and "no" for all other cases. False
negatives don't really hurt. False positives are not allowed.
What you'd need to answer the question is a way for the
code in need of the information to remember the dict
state and then later compare its remembered state
with the now current state of the dict.
dicts could do this with a 16-bit index into an array
of state object slots which are set by the code tracking
the dict.
When it's time to check, the code would simply ask for the
current index value and compare the state object in the
array with the one it had set.
* Wouldn't it be possible to use the hash array itself to
store the state index ?
We could store the state object as regular key in the
dict and filter this out when accessing the dict.
Alternatively, we could try to use the free slots for
storing these state objects by e.g. declaring a free
slot as being NULL or a pointer to a state object.
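The wrap-around question in the first bullet can be made concrete with a little arithmetic. The sketch below uses a 16-bit counter only to keep the loop short, and the rate of one modification every 10 ns is a made-up figure, not a measurement. It shows that after exactly 2**N modifications an N-bit version equals the remembered value again, so a check based on equality would miss the changes, and it estimates how long that would take for 32-bit and 64-bit counters at the assumed rate.

```python
# Aliasing with a narrow counter (16 bits chosen only to keep the loop short).
BITS = 16
MASK = (1 << BITS) - 1

version = 0
remembered = version
for _ in range(1 << BITS):            # 65536 modifications
    version = (version + 1) & MASK
aliased = version == remembered        # the counter is back where it started

# Rough time to wrap, assuming one modification every 10 ns (made-up rate).
SECONDS_PER_CHANGE = 10e-9
SECONDS_PER_YEAR = 365.25 * 24 * 3600
seconds_32 = (1 << 32) * SECONDS_PER_CHANGE
years_64 = (1 << 64) * SECONDS_PER_CHANGE / SECONDS_PER_YEAR
print(aliased, round(seconds_32), round(years_64))
```

Under these assumptions a 32-bit counter could wrap in well under a minute of continuous modification, while a 64-bit counter would take thousands of years, which is one way to read the trade-off between counter width and memory footprint.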
--
Marc-Andre Lemburg
eGenix.com
On Mon, Jan 11, 2016 at 05:18:59AM -0500, Neil Girdhar wrote:
> Here is where I have to disagree. I hate it when experts say "we'll just
> document it and then it's the user's fault for misusing it". Yeah, you're
> right, but as a user, it is very frustrating to have to read other people's
> documentation. You know that some elite Python programmer is going to
> optimize his code using this and someone years later is going to scratch
> his head wondering where __version__ is coming from. Is it provided by
> the caller? Was it added to the object at some earlier point?
Neil, don't you think you're being overly dramatic here? "Programmer
needs to look up API feature, news at 11!" The same could be said about
class.__name__, instance.__class__, obj.__doc__, module.__dict__ and
indeed every single Python feature. Sufficiently inexperienced or naive
programmers could be scratching their head over literally *anything*.
All those words for such a simple, and minor, point: every new API
feature is one more thing for programmers to learn. We get that.