* lacks tests
* should raise exceptions when provided with illegal arguments (`func`
must be a function, `cache` a dict, and `num_args` a positive integer)
* should raise an exception if any of the arguments of `func`, which are
used as dictionary keys, is unhashable
--
Ticket URL: <https://code.djangoproject.com/ticket/21351>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Comment (by timo):
What's the rationale for raising exceptions with illegal arguments? Maybe
it's the right thing to do, but I'm skeptical. Here's
[http://stackoverflow.com/questions/1950386/is-it-pythonic-to-check-
function-argument-types my research on the topic].
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:1>
Comment (by EvilDMP):
I think you're right, it looks over-zealous. Do you think the third
point's unnecessary as well?
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:2>
Comment (by EvilDMP):
There's actually a worse problem with `memoize`, that Shai Berger has
pointed out: it chokes on keyword arguments...
His example:
{{{
>>> from django.utils import functional as F
>>> def f(a,b): return a+b
...
>>> cache = {}
>>> f = F.memoize(f, cache, 2)
>>> f(3,4)
7
>>> f(3,b=4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: wrapper() got an unexpected keyword argument 'b'
>>>
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:3>
* stage: Unreviewed => Accepted
Comment:
Accepting this on the basis of missing tests alone (I changed `memoize` to
be a no-op and the full test suite runs without problems).
As for the other issues described, maybe we could build `memoize` on top
of `functools.lru_cache` [1].
It's only available on python 3, but there is a backport available [2].
[1]
http://docs.python.org/3.2/library/functools.html?highlight=lru_cache#functools.lru_cache
[2] http://code.activestate.com/recipes/578078-py26-and-py30-backport-of-
python-33s-lru-cache/
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:4>
* cc: bmispelon@… (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:5>
Comment (by shai):
I don't know all the places that memoize is used, but I suspect making it
more robust will also make it slower, and for a function whose entire
reason for existence is performance, that would be a problem.
Currently, `memoize()` is undocumented, and I think we should consider
just making it explicitly private. Users don't really need it in this form
(and when they do, they should probably base it on the cache framework).
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:6>
* cc: bouke (added)
Comment:
Replacing `memoize` by `lru_cache` will probably take some work to get
right, as `memoize` allows one to provide a dict for caching. `lru_cache`
hides the cache from the user, only allowing the instrumented
`clear_cache()` for clearing the cache altogether.
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:7>
Comment (by bouke):
As for the usage of `memoize`:
{{{
$ git grep -c "memoize"
django/contrib/staticfiles/finders.py:2
django/core/management/commands/loaddata.py:2
django/core/urlresolvers.py:4
django/utils/functional.py:2
django/utils/translation/trans_real.py:2
tests/decorators/tests.py:2
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:8>
* status: new => assigned
* owner: nobody => bouke
Comment:
I've submitted a PR: https://github.com/django/django/pull/1842
However, I have some questions about the deprecation and backport:
* Can the backport be included, as it is MIT licensed?
* `memoize` is an internal API, should it follow the deprecation policy?
* Is this the correct way to include a backport?
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:9>
Comment (by bouke):
As discussed on IRC, a few benchmarks comparing `memoize` with `lru_cache`
and `lru_cache`'s backport. Also, another column with the statistics
bookkeeping in the backport disabled for extra speed. These results are
achieved with `maxsize=None`.
{{{
+-------+---------+-----------------+----------+----------+
| | memoize | lru_cache (py3) | backport | no-stats |
+-------+---------+-----------------+----------+----------+
| write | 0.970 | 2.373 | 2.714 | 2.413 |
| read | 0.342 | 0.830 | 1.053 | 0.950 |
+-------+---------+-----------------+----------+----------+
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:10>
Comment (by aaugustin):
The pull request looks good, I left a comment, you can mark the ticket as
RFC once you've addressed it.
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:11>
* has_patch: 0 => 1
* stage: Accepted => Ready for checkin
Comment:
For ticket completeness bmispelon also left a comment about `memoize(...,
num_args=1)` whilst `get_callable` takes 2 arguments:
> bmispelon 16 hours ago:
> The num_args argument was introduced in
b6812f2d9d1bea3bdc9c5f9a747c5c71c764fa0f specifically for get_callable
(note how it can take two arguments but the num_args is 1).
> I suspect there might be something weird going on but I haven't had time
to look into it and unfortunately, we cannot ask the original committer of
that change anymore :(
Say a function has two parameters and one result, then `A,B -> X` and `A,C
-> Y`. However as the function's cache strategy only accounts for the
first argument, it will always do the same, namely `A,? -> X` UNLESS in
the case of an exception. Now when the exception happened for `A,D -> ?`,
then no cache is written and `A,C -> Y`. However if `A,C -> Y` is executed
FIRST and cached, then `A,? -> Y` and thus `A,D -> Y` -- no exception
being raised.
I have added a test to the PR demonstrate this problem.
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:12>
* stage: Ready for checkin => Accepted
Comment:
There's still some issues with the pull request:
* shai raised concerns about performance. Your benchmark has no units but
I assume it's a "smaller is better" type which means that the `lru_cache`
is less performant than the old `memoize`. Can you address that (and maybe
also show the code of your benchmark)?
* The current pull request is not python3 compatible (because of the use
of `xrange` in one of the tests)
* I don't think the test you added to show the issue of `num_args` belongs
in this pull request. While I agree that there's probably a bug there, it
should either be opened as a separate ticket or just documented since
we're deprecating `memoize` altogether.
Thanks
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:13>
Comment (by bouke):
I have attached a rewritten speedtest, which should show the performance
impact on 1M reads and writes. `lru_cache` is around 3 times slower on
both, which should come as no surprise. `memoize` has a very simplistic
cache key generator as it copies positional arguments into a new list,
while `lru_cache` creates a hash of both positional and keyword arguments.
This overhead comes at a reduced performance. Nevertheless looking at the
performance difference of a single read (0.0000006s) and write
(.0000014s), the impact seems rather limited.
{{{
Python: 2.7.5 (default, Sep 5 2013, 23:32:22) [GCC 4.2.1 Compatible Apple
LLVM 4.2 (clang-425.0.28)]
+-----------+----------+-----------+
| | memoize | lru_cache |
+-----------+----------+-----------+
| 1M reads | 0.34065s | 1.07396s |
| 1M writes | 1.01192s | 2.71383s |
+-----------+----------+-----------+
Python: 3.3.2 (default, Nov 6 2013, 08:59:42) [GCC 4.2.1 Compatible Apple
LLVM 5.0 (clang-500.2.79)]
+-----------+----------+-----------+
| | memoize | lru_cache |
+-----------+----------+-----------+
| 1M reads | 0.30876s | 0.87845s |
| 1M writes | 0.99767s | 2.39997s |
+-----------+----------+-----------+
}}}
The PR has been updated with the redundant tests removed and PY3
compatibility.
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:14>
Comment (by akaariai):
This results in around 6-8% slowdown in url_reverse djangobench benchmark.
While not critical at all, avoiding that would be nice. URL reversing is
already bottleneck when producing a large json list where each element
contains a reversed url.
Similar slowdown is in url_resolve benchmark, too.
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:15>
Comment (by akaariai):
I am OK for making this change, using standard library is usually a good
idea. The lru_cache definitely handles correctly some cases Django
doesn't. If the slowdown is too big for any specific use case, maybe it
would be better to use a custom tailored cache instead there.
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:16>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"9b7455e918a437c3db91e88dcbf6d9c93fef96f8"]:
{{{
#!CommitTicketReference repository=""
revision="9b7455e918a437c3db91e88dcbf6d9c93fef96f8"
Fixed #21351 -- Replaced memoize with Python's lru_cache.
Replaced the custom, untested memoize with a similar decorator from
Python's
3.2 stdlib. Although some minor performance degradation (see ticket), it
is
expected that in the long run lru_cache will outperform memoize once it is
implemented in C.
Thanks to EvilDMP for the report and Baptiste Mispelon for the idea of
replacing memoize with lru_cache.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:17>
Comment (by Baptiste Mispelon <bmispelon@…>):
In [changeset:"8ed96464e99c73150ab6eff5df70b6274a871a4a"]:
{{{
#!CommitTicketReference repository=""
revision="8ed96464e99c73150ab6eff5df70b6274a871a4a"
Fixed typo in lru_cache.py; refs #21351.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:18>
Comment (by Tim Graham <timograham@…>):
In [changeset:"61ad1ea92b1f4df992b0ef1dcc7c781da2413815"]:
{{{
#!CommitTicketReference repository=""
revision="61ad1ea92b1f4df992b0ef1dcc7c781da2413815"
Removed django.utils.functional.memoize per deprecation timeline.
refs #21351.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:19>