[Django] #21351: memoize function needs tests and improvements

61 views
Skip to first unread message

Django

unread,
Oct 29, 2013, 5:35:49 PM10/29/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
------------------------------------------------+------------------------
Reporter: EvilDMP | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Utilities | Version: master
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
------------------------------------------------+------------------------
`django.utils.functional.memoize(func, cache, num_args)`:

* lacks tests
* should raise exceptions when provided with illegal arguments (`func`
must be a function, `cache` a dict, and `num_args` a positive integer)
* should raise an exception if any of the arguments of `func`, which are
used as dictionary keys, is unhashable

--
Ticket URL: <https://code.djangoproject.com/ticket/21351>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Oct 29, 2013, 6:06:29 PM10/29/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
-------------------------------------+-------------------------------------
Reporter: EvilDMP | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: master
Component: Utilities | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by timo):

What's the rationale for raising exceptions with illegal arguments? Maybe
it's the right thing to do, but I'm skeptical. Here's
[http://stackoverflow.com/questions/1950386/is-it-pythonic-to-check-
function-argument-types my research on the topic].

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:1>

Django

unread,
Oct 29, 2013, 6:36:33 PM10/29/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
-------------------------------------+-------------------------------------
Reporter: EvilDMP | Owner: nobody

Type: | Status: new
Cleanup/optimization | Version: master
Component: Utilities | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by EvilDMP):

I think you're right, it looks over-zealous. Do you think the third
point's unnecessary as well?

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:2>

Django

unread,
Oct 29, 2013, 6:41:17 PM10/29/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
-------------------------------------+-------------------------------------
Reporter: EvilDMP | Owner: nobody

Type: | Status: new
Cleanup/optimization | Version: master
Component: Utilities | Resolution:
Severity: Normal | Triage Stage:
Keywords: | Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by EvilDMP):

There's actually a worse problem with `memoize`, that Shai Berger has
pointed out: it chokes on keyword arguments...

His example:

{{{
>>> from django.utils import functional as F
>>> def f(a,b): return a+b
...
>>> cache = {}
>>> f = F.memoize(f, cache, 2)
>>> f(3,4)
7
>>> f(3,b=4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: wrapper() got an unexpected keyword argument 'b'
>>>
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:3>

Django

unread,
Oct 30, 2013, 3:10:57 AM10/30/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
--------------------------------------+------------------------------------

Reporter: EvilDMP | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Utilities | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by bmispelon):

* stage: Unreviewed => Accepted


Comment:

Accepting this on the basis of missing tests alone (I changed `memoize` to
be a no-op and the full test suite runs without problems).

As for the other issues described, maybe we could build `memoize` on top
of `functools.lru_cache` [1].
It's only available on python 3, but there is a backport available [2].


[1]
http://docs.python.org/3.2/library/functools.html?highlight=lru_cache#functools.lru_cache
[2] http://code.activestate.com/recipes/578078-py26-and-py30-backport-of-
python-33s-lru-cache/

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:4>

Django

unread,
Oct 30, 2013, 3:31:40 AM10/30/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
--------------------------------------+------------------------------------

Reporter: EvilDMP | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Utilities | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by bmispelon):

* cc: bmispelon@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:5>

Django

unread,
Oct 30, 2013, 2:41:37 PM10/30/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
--------------------------------------+------------------------------------

Reporter: EvilDMP | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Utilities | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by shai):

I don't know all the places that memoize is used, but I suspect making it
more robust will also make it slower, and for a function whose entire
reason for existence is performance, that would be a problem.

Currently, `memoize()` is undocumented, and I think we should consider
just making it explicitly private. Users don't really need it in this form
(and when they do, they should probably base it on the cache framework).

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:6>

Django

unread,
Nov 1, 2013, 3:15:47 PM11/1/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
--------------------------------------+------------------------------------

Reporter: EvilDMP | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Utilities | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by bouke):

* cc: bouke (added)


Comment:

Replacing `memoize` by `lru_cache` will probably take some work to get
right, as `memoize` allows one to provide a dict for caching. `lru_cache`
hides the cache from the user, only allowing the instrumented
`clear_cache()` for clearing the cache altogether.

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:7>

Django

unread,
Nov 1, 2013, 3:18:20 PM11/1/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
--------------------------------------+------------------------------------

Reporter: EvilDMP | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Utilities | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by bouke):

As for the usage of `memoize`:
{{{
$ git grep -c "memoize"
django/contrib/staticfiles/finders.py:2
django/core/management/commands/loaddata.py:2
django/core/urlresolvers.py:4
django/utils/functional.py:2
django/utils/translation/trans_real.py:2
tests/decorators/tests.py:2
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:8>

Django

unread,
Nov 1, 2013, 4:21:53 PM11/1/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
--------------------------------------+------------------------------------
Reporter: EvilDMP | Owner: bouke
Type: Cleanup/optimization | Status: assigned
Component: Utilities | Version: master

Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by bouke):

* status: new => assigned
* owner: nobody => bouke


Comment:

I've submitted a PR: https://github.com/django/django/pull/1842

However, I have some questions about the deprecation and backport:

* Can the backport be included, as it is MIT licensed?
* `memoize` is an internal API, should it follow the deprecation policy?
* Is this the correct way to include a backport?

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:9>

Django

unread,
Nov 4, 2013, 1:46:57 PM11/4/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
--------------------------------------+------------------------------------
Reporter: EvilDMP | Owner: bouke
Type: Cleanup/optimization | Status: assigned
Component: Utilities | Version: master

Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by bouke):

As discussed on IRC, a few benchmarks comparing `memoize` with `lru_cache`
and `lru_cache`'s backport. Also, another column with the statistics
bookkeeping in the backport disabled for extra speed. These results are
achieved with `maxsize=None`.

{{{
+-------+---------+-----------------+----------+----------+
| | memoize | lru_cache (py3) | backport | no-stats |
+-------+---------+-----------------+----------+----------+
| write | 0.970 | 2.373 | 2.714 | 2.413 |
| read | 0.342 | 0.830 | 1.053 | 0.950 |
+-------+---------+-----------------+----------+----------+
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:10>

Django

unread,
Nov 4, 2013, 4:52:51 PM11/4/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
--------------------------------------+------------------------------------
Reporter: EvilDMP | Owner: bouke
Type: Cleanup/optimization | Status: assigned
Component: Utilities | Version: master

Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by aaugustin):

The pull request looks good, I left a comment, you can mark the ticket as
RFC once you've addressed it.

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:11>

Django

unread,
Nov 5, 2013, 9:41:25 AM11/5/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
-------------------------------------+-------------------------------------
Reporter: EvilDMP | Owner: bouke
Type: | Status: assigned

Cleanup/optimization | Version: master
Component: Utilities | Resolution:
Severity: Normal | Triage Stage: Ready for
Keywords: | checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by bouke):

* has_patch: 0 => 1
* stage: Accepted => Ready for checkin


Comment:

For ticket completeness bmispelon also left a comment about `memoize(...,
num_args=1)` whilst `get_callable` takes 2 arguments:

> bmispelon 16 hours ago:

> The num_args argument was introduced in
b6812f2d9d1bea3bdc9c5f9a747c5c71c764fa0f specifically for get_callable
(note how it can take two arguments but the num_args is 1).

> I suspect there might be something weird going on but I haven't had time
to look into it and unfortunately, we cannot ask the original committer of
that change anymore :(


Say a function has two parameters and one result, then `A,B -> X` and `A,C
-> Y`. However as the function's cache strategy only accounts for the
first argument, it will always do the same, namely `A,? -> X` UNLESS in
the case of an exception. Now when the exception happened for `A,D -> ?`,
then no cache is written and `A,C -> Y`. However if `A,C -> Y` is executed
FIRST and cached, then `A,? -> Y` and thus `A,D -> Y` -- no exception
being raised.

I have added a test to the PR demonstrate this problem.

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:12>

Django

unread,
Nov 5, 2013, 12:58:16 PM11/5/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
--------------------------------------+------------------------------------
Reporter: EvilDMP | Owner: bouke
Type: Cleanup/optimization | Status: assigned
Component: Utilities | Version: master

Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by bmispelon):

* stage: Ready for checkin => Accepted


Comment:

There's still some issues with the pull request:

* shai raised concerns about performance. Your benchmark has no units but
I assume it's a "smaller is better" type which means that the `lru_cache`
is less performant than the old `memoize`. Can you address that (and maybe
also show the code of your benchmark)?
* The current pull request is not python3 compatible (because of the use
of `xrange` in one of the tests)
* I don't think the test you added to show the issue of `num_args` belongs
in this pull request. While I agree that there's probably a bug there, it
should either be opened as a separate ticket or just documented since
we're deprecating `memoize` altogether.

Thanks

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:13>

Django

unread,
Nov 6, 2013, 4:12:11 AM11/6/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
--------------------------------------+------------------------------------
Reporter: EvilDMP | Owner: bouke
Type: Cleanup/optimization | Status: assigned
Component: Utilities | Version: master

Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by bouke):

I have attached a rewritten speedtest, which should show the performance
impact on 1M reads and writes. `lru_cache` is around 3 times slower on
both, which should come as no surprise. `memoize` has a very simplistic
cache key generator as it copies positional arguments into a new list,
while `lru_cache` creates a hash of both positional and keyword arguments.
This overhead comes at a reduced performance. Nevertheless looking at the
performance difference of a single read (0.0000006s) and write
(.0000014s), the impact seems rather limited.

{{{
Python: 2.7.5 (default, Sep 5 2013, 23:32:22) [GCC 4.2.1 Compatible Apple
LLVM 4.2 (clang-425.0.28)]
+-----------+----------+-----------+
| | memoize | lru_cache |
+-----------+----------+-----------+
| 1M reads | 0.34065s | 1.07396s |
| 1M writes | 1.01192s | 2.71383s |
+-----------+----------+-----------+

Python: 3.3.2 (default, Nov 6 2013, 08:59:42) [GCC 4.2.1 Compatible Apple
LLVM 5.0 (clang-500.2.79)]
+-----------+----------+-----------+
| | memoize | lru_cache |
+-----------+----------+-----------+
| 1M reads | 0.30876s | 0.87845s |
| 1M writes | 0.99767s | 2.39997s |
+-----------+----------+-----------+
}}}

The PR has been updated with the redundant tests removed and PY3
compatibility.

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:14>

Django

unread,
Nov 6, 2013, 5:13:25 AM11/6/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
--------------------------------------+------------------------------------
Reporter: EvilDMP | Owner: bouke
Type: Cleanup/optimization | Status: assigned
Component: Utilities | Version: master

Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by akaariai):

This results in around 6-8% slowdown in url_reverse djangobench benchmark.
While not critical at all, avoiding that would be nice. URL reversing is
already bottleneck when producing a large json list where each element
contains a reversed url.

Similar slowdown is in url_resolve benchmark, too.

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:15>

Django

unread,
Nov 7, 2013, 7:45:23 AM11/7/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
--------------------------------------+------------------------------------
Reporter: EvilDMP | Owner: bouke
Type: Cleanup/optimization | Status: assigned
Component: Utilities | Version: master

Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by akaariai):

I am OK for making this change, using standard library is usually a good
idea. The lru_cache definitely handles correctly some cases Django
doesn't. If the slowdown is too big for any specific use case, maybe it
would be better to use a custom tailored cache instead there.

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:16>

Django

unread,
Nov 11, 2013, 2:57:06 AM11/11/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
--------------------------------------+------------------------------------
Reporter: EvilDMP | Owner: bouke
Type: Cleanup/optimization | Status: closed
Component: Utilities | Version: master
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by Baptiste Mispelon <bmispelon@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"9b7455e918a437c3db91e88dcbf6d9c93fef96f8"]:
{{{
#!CommitTicketReference repository=""
revision="9b7455e918a437c3db91e88dcbf6d9c93fef96f8"
Fixed #21351 -- Replaced memoize with Python's lru_cache.

Replaced the custom, untested memoize with a similar decorator from
Python's
3.2 stdlib. Although some minor performance degradation (see ticket), it
is
expected that in the long run lru_cache will outperform memoize once it is
implemented in C.

Thanks to EvilDMP for the report and Baptiste Mispelon for the idea of
replacing memoize with lru_cache.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:17>

Django

unread,
Nov 12, 2013, 12:46:12 PM11/12/13
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
--------------------------------------+------------------------------------
Reporter: EvilDMP | Owner: bouke
Type: Cleanup/optimization | Status: closed
Component: Utilities | Version: master

Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by Baptiste Mispelon <bmispelon@…>):

In [changeset:"8ed96464e99c73150ab6eff5df70b6274a871a4a"]:
{{{
#!CommitTicketReference repository=""
revision="8ed96464e99c73150ab6eff5df70b6274a871a4a"
Fixed typo in lru_cache.py; refs #21351.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:18>

Django

unread,
Jan 17, 2015, 8:03:48 AM1/17/15
to django-...@googlegroups.com
#21351: memoize function needs tests and improvements
--------------------------------------+------------------------------------
Reporter: EvilDMP | Owner: bouke
Type: Cleanup/optimization | Status: closed
Component: Utilities | Version: master

Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by Tim Graham <timograham@…>):

In [changeset:"61ad1ea92b1f4df992b0ef1dcc7c781da2413815"]:
{{{
#!CommitTicketReference repository=""
revision="61ad1ea92b1f4df992b0ef1dcc7c781da2413815"
Removed django.utils.functional.memoize per deprecation timeline.

refs #21351.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/21351#comment:19>

Reply all
Reply to author
Forward
0 new messages