
@lru_cache on functions with no arguments


t...@tomforb.es

Jul 31, 2017, 7:31:52 PM

As part of the Python 3 cleanup in Django there are a fair few uses of @functools.lru_cache on functions that take no arguments. An lru_cache isn't strictly needed here, but it's convenient to just cache the result. Some examples are here: https://github.com/django/django/pull/8825/files

I did some profiling and I found that using `@lru_cache(maxsize=None)` on such functions is twice as fast as a standard `@lru_cache()`, apparently because with a `maxsize` the lru_cache code requires a lock acquisition and a fair bit more state to track.

Am I right in thinking that using `maxsize=None` is best for functions that accept no arguments? Should we even be using a `lru_cache` in such situations, or write our own simple cache decorator instead?
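For what it's worth, the difference is easy to measure with a small `timeit` sketch (my own minimal benchmark, not code from the Django PR; absolute numbers will vary by machine and Python version):

```python
import timeit
from functools import lru_cache

@lru_cache()              # bounded (default maxsize=128): LRU bookkeeping on every hit
def bounded():
    return object()

@lru_cache(maxsize=None)  # unbounded: effectively just a dict lookup per hit
def unbounded():
    return object()

# Warm both caches, then time repeated cache hits.
bounded(); unbounded()
for fn in (bounded, unbounded):
    per_call = min(timeit.repeat(fn, number=100_000, repeat=3)) / 100_000
    print(f"{fn.__name__}: {per_call * 1e9:.0f} ns/call")
```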

codew...@gmail.com

Jul 31, 2017, 8:00:51 PM

If the performance savings are real, another choice would be to improve the implementation of lru_cache to special-case no-argument functions to avoid locks, etc.

Terry Reedy

Jul 31, 2017, 9:32:51 PM

On 7/31/2017 7:31 PM, t...@tomforb.es wrote:
> As part of the Python 3 cleanup in Django there are a fair few uses of @functools.lru_cache on functions that take no arguments.

This makes no sense to me. If the function is being called for
side-effects, then it should not be cached. If the function is being
called for a result, different for each call, calculated from a changing
environment, then it should not be cached. (Input from disk is an
example.) If the function returns a random number, or a non-constant
value from an oracle (such as a person), it should not be cached. If
the function returns a constant (possibly calculated once), then the
constant should just be bound to a name (which is a form of caching)
rather than using the overkill of an lru cache. What possibility am I
missing?

--
Terry Jan Reedy

Matt Wheeler

Aug 1, 2017, 7:06:49 AM

On Tue, 1 Aug 2017 at 02:32 Terry Reedy <tjr...@udel.edu> wrote:

> On 7/31/2017 7:31 PM, t...@tomforb.es wrote:
> > As part of the Python 3 cleanup in Django there are a fair few uses of
> @functools.lru_cache on functions that take no arguments.
>
> This makes no sense to me. If the function is being called for
> side-effects, then it should not be cached. If the function is being
> called for a result, different for each call, calculated from a changing
> environment, then it should not be cached. (Input from disk is an
> example.) If the function returns a random number, or a non-constant
> value from an oracle (such as a person), it should not be cached. If
> the function returns a constant (possibly calculated once), then the
> constant should just be bound to a name (which is a form of caching)
> rather than using the overkill of an lru cache. What possibility am I
> missing?
>

A function which is moderately expensive to run, that will always return
the same result if run again in the same process, and which will not be
needed in every session.

Also, particularly in the context of a large framework, it may be neater and
easier to expose a constant-like thing as a function to be called rather
than binding it directly to a name. If there are several implementations of
a particular function, chosen based on configuration, and *some* of them
are pure functions while others may change depending on the environment
during a single run, maintaining a consistent interface would trump
simplicity for the simple case.

I've not investigated the Django codebase, so I don't know if my guesses
line up with it exactly, but it should be quite easy to construct a grep or
sed script to scan the source to find out :)
--
Matt Wheeler
http://funkyh.at

Thomas Nyberg

Aug 1, 2017, 8:04:32 AM

On 08/01/2017 01:06 PM, Matt Wheeler wrote:
> A function which is moderately expensive to run, that will always return
> the same result if run again in the same process, and which will not be
> needed in every session.
>

What about just using a lazy getter property? E.g.:

https://pypi.python.org/pypi/lazy-property

Cheers,
Thomas

t...@tomforb.es

Aug 1, 2017, 8:51:26 AM

Hello all,
Thank you for the replies! So, here is the context:

1. The functions Django wants to cache require Django to be initialized and the settings loaded. This means the return values are not available at definition time. (Matt Wheeler hits it on the head).

2. Django has a long-standing no-dependencies rule, which may change in the near future but for now it is stdlib only. We can't add a dependency on `lazy-property`.

This is all a bit off-topic anyway. I was merely asking about the tradeoffs for using maxsize=None vs maxsize=default in `lru_cache`.

On my Ubuntu machine I am getting some interesting benchmark results. If you disable the lru_cache C extension, so that it falls back to one of the following pure-Python implementations:

1. `maxsize` is not None: https://github.com/python/cpython/blob/master/Lib/functools.py#L526-L581

2. `maxsize` is None: https://github.com/python/cpython/blob/master/Lib/functools.py#L509-L522

And you have a simple function:

def test():
    return object()

I get the following numbers without much variance:

1. lru_cache(maxsize=None) - 870ns

2. lru_cache() - 1300ns

3. no cache - 100ns

So, in the best case, without the C extension lru_cache is 8x as slow as the function itself. I get that it's not a great comparison as few functions are as simple as that, but a 700ns overhead is quite a bit, and if you're putting it around simple functions and expecting them to be faster, that may not be true.

With the C extension I get 50ns for both. So obviously a lot faster. Maybe it doesn't matter...

Chris Angelico

Aug 1, 2017, 9:01:43 AM

On Tue, Aug 1, 2017 at 10:50 PM, <t...@tomforb.es> wrote:
> And you have a simple function:
>
> def test():
>     return object()
>
> I get the following numbers without much variance:
>
> 1. lru_cache(maxsize=None) - 870ns
>
> 2. lru_cache() - 1300ns
>
> 3. no cache - 100ns
>
> So, in the best case, without the C extension lru_cache is 8x as slow as the function itself. I get that it's not a great comparison as few functions are as simple as that, but a 700ns overhead is quite a bit, and if you're putting it around simple functions and expecting them to be faster, that may not be true.
>
> With the C extension I get 50ns for both. So obviously a lot faster. Maybe it doesn't matter...

There's a drastic semantic difference here, though. Without a cache,
you're constructing a new, unique object every time you call it; with
a cache, you return the same one every time. A better comparison would
be:

_sentinel = object()
def test():
    return _sentinel

Not that it changes your point about timings, but if you're slapping a
cache onto functions to try to make them faster, you want to be sure
you're not breaking their semantics.

ChrisA

Thomas Nyberg

Aug 1, 2017, 9:07:53 AM

On 08/01/2017 02:50 PM, t...@tomforb.es wrote:
> 2. Django has a long-standing no-dependencies rule, which may change in the near future but for now it is stdlib only. We can't add a dependency on `lazy-property`.
Apologies for continuing going off-topic, but the actual code in that
package I linked appears to be about 40 lines. It's MIT-licensed, so you
could just embed it in your code. In fact, I only linked that because it
was easily accessible by pip. I personally have used similar versions in
the past that were much shorter (presumably missing some features in the
package I linked though). For example, I know at some point I used some
variation of this (~11 lines):

https://github.com/columbia-applied-data-science/rosetta/blob/master/rosetta/common.py#L24-L35

But I do respect the distaste for adding a dependency on something so
small. :)

Cheers,
Thomas

Matt Wheeler

Aug 1, 2017, 9:15:08 AM

That doesn't necessarily solve any problems (Tom has already answered why
this won't work for Django, but also...)

- the methods then can't be passed around before being called, which will
make your lazy evaluation greedy again if that's why you're using the
lru_cache.
- this requires the functions to be methods of an instance of some class
(no, you can't use properties on a class; see below). That may be much more
overhead than the current solution (the object+class lookups plus @property
method call might add up to more overhead than a single object lookup &
lru_cache call as a rule anyway; I haven't investigated):

Python 3.6.0 (default, Mar 13 2017, 00:53:03)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class A:
...     @property
...     @classmethod
...     def test(cls):
...         return 'a'
...
>>> A.test
<property object at 0x1094afa98>

Paul Moore

Aug 1, 2017, 9:49:06 AM

On Tuesday, 1 August 2017 00:31:52 UTC+1, t...@tomforb.es wrote:
> Am I right in thinking that using `maxsize=None` is best for functions that accept no arguments? Should we even be using a `lru_cache` in such situations, or write our own simple cache decorator instead?

It seems like overkill to use lru_cache for this case - it's designed to map arguments to results, so it makes little sense when there are no arguments. It also carries LRU argument-list management and thread-safety overheads, none of which are needed for this usage.

_sentinel = object()
_val = _sentinel
def val():
    global _val
    if _val is _sentinel:
        _val = ...  # calculate _val here
    return _val

seems entirely sufficient for this case. Write a custom decorator if you use the idiom often enough to make it worth the effort.

Paul

Christian Heimes

Aug 1, 2017, 9:58:32 AM

There is a more elegant and faster approach: use a non-data descriptor
instead of a property or an lru_cache. With a non-data descriptor you can
even get rid of the function-call overhead on subsequent lookups. I dumped
some example code in a gist:
https://gist.github.com/tiran/da7d4c43d493c5aa7d7abc4d8a0678a6

Christian
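The gist isn't reproduced in the thread, but the pattern Christian describes can be sketched roughly like this (my own reconstruction of the non-data-descriptor idea, not the gist's exact code; `cached_attribute`, `Settings`, and `database_url` are made-up names). On first access the descriptor computes the value and writes it into the instance `__dict__`, so every later lookup is a plain attribute fetch with no function call at all:

```python
class cached_attribute:
    """Non-data descriptor (defines __get__ only): once the computed
    value is stored in the instance __dict__, the instance attribute
    shadows the descriptor and __get__ is never called again."""
    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        obj.__dict__[self.name] = value  # shadows this descriptor
        return value

class Settings:
    computed = 0

    @cached_attribute
    def database_url(self):  # stand-in for an expensive, settings-dependent lookup
        Settings.computed += 1
        return "postgres://localhost/app"

s = Settings()
s.database_url  # first access: runs the function, stores the result
s.database_url  # later accesses: plain instance-dict lookup, no call
```

This works precisely because it is a *non-data* descriptor (no `__set__`), so the instance dict takes priority on subsequent lookups.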

t...@tomforb.es

Aug 1, 2017, 10:54:42 AM

> _sentinel = object()
> _val = _sentinel
> def val():
>     global _val
>     if _val is _sentinel:
>         _val = ...  # calculate _val here
>     return _val
>
> seems entirely sufficient for this case. Write a custom decorator if you use the idiom often enough to make it worth the effort.

I did some timings with this as part of my timings above and found it to be significantly slower than lru_cache with the C extension. I had to add `nonlocal` to get `_val` to resolve, which I think kills performance a bit.

I agree with the premise though, it might be worth exploring.

> There is a more elegant and faster approach if you use a non-data
> descriptor instead of a property or a lru_cache. If you use a non-data
> descriptor property, you can even get rid of subsequent function calls.
> I dumped some example code in a gist,
> https://gist.github.com/tiran/da7d4c43d493c5aa7d7abc4d8a0678a6

Thank you! That's an interesting and great idea. It's faster, but still not as fast as lru_cache with the C extension. I'll investigate further.

Terry Reedy

Aug 1, 2017, 11:06:33 AM

On 8/1/2017 7:06 AM, Matt Wheeler wrote:
> On Tue, 1 Aug 2017 at 02:32 Terry Reedy <tjr...@udel.edu> wrote:
>
>> On 7/31/2017 7:31 PM, t...@tomforb.es wrote:
>>> As part of the Python 3 cleanup in Django there are a fair few uses of
>> @functools.lru_cache on functions that take no arguments.
>>
>> This makes no sense to me. If the function is being called for
>> side-effects, then it should not be cached. If the function is being
>> called for a result, different for each call, calculated from a changing
>> environment, then it should not be cached. (Input from disk is an
>> example.) If the function returns a random number, or a non-constant
>> value from an oracle (such as a person), it should not be cached. If
>> the function returns a constant (possibly calculated once), then the
>> constant should just be bound to a name (which is a form of caching)
>> rather than using the overkill of an lru cache. What possibility am I
>> missing?
>>
>
> A function which is moderately expensive to run, that will always return
> the same result if run again in the same process, and which will not be
> needed in every session.

In initialization section:
answer = None

Somewhere else:
answer = expensive_calculation() if answer is None else answer

The conditional has to be cheaper than accessing lru cache. There can
be more than one of these.

--
Terry Jan Reedy

Steven D'Aprano

Aug 2, 2017, 5:01:22 AM

One conditional has to be cheaper than accessing lru cache.

Debugging the errors from forgetting to wrap every single reference to
"answer" in a "if answer is None" test is not so cheap.

Maintaining twice as much state (one function plus one global variable,
instead of just one function) is not so cheap.

Sometimes paying a small cost to avoid having to occasionally pay a large
cost is worth it.

This is effectively the "Only Once" decorator pattern, which guarantees a
function only executes once no matter how often you call it.
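A minimal sketch of such a decorator (my own illustration of the pattern; `call_once` is a made-up name, not a stdlib facility):

```python
import functools

def call_once(func):
    """Run func at most once; every later call replays the stored result."""
    sentinel = object()  # unique marker: distinguishes "not yet run" from any result
    result = sentinel

    @functools.wraps(func)
    def wrapper():
        nonlocal result
        if result is sentinel:
            result = func()
        return result
    return wrapper

@call_once
def expensive_setup():
    print("computing...")  # executes a single time, however often it's called
    return {"ready": True}
```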

--
“You are deluded if you think software engineers who can't write
operating systems or applications without security holes, can write
virtualization layers without security holes.” —Theo de Raadt

Paul Moore

Aug 3, 2017, 10:35:56 AM

On Tuesday, 1 August 2017 15:54:42 UTC+1, t...@tomforb.es wrote:
> > _sentinel = object()
> > _val = _sentinel
> > def val():
> >     global _val
> >     if _val is _sentinel:
> >         _val = ...  # calculate _val here
> >     return _val
> >
> > seems entirely sufficient for this case. Write a custom decorator if you use the idiom often enough to make it worth the effort.
>
> I did some timings with this as part of my timings above and found it to be significantly slower than lru_cache with the C extension. I had to add `nonlocal` to get `_val` to resolve, which I think kills performance a bit.
>
> I agree with the premise though, it might be worth exploring.

It's worth pointing out that there's nothing *wrong* with using lru_cache with maxsize=None. You're going to find it hard to get a pure-Python equivalent that's faster (after all, even maintaining a single variable is still a dict lookup, which is all the cache does when LRU functionality is disabled).

Certainly if you only ever call with no arguments there's little point in an LRU cache, so maxsize=None ("the LRU feature is disabled") is logical.

Whether that approach clearly expresses your intent is maybe a little less obvious, but you could always write a custom decorator to make your intent clear:

>>> from functools import lru_cache
>>> def only_call_once(fn):
...     return lru_cache(maxsize=None)(fn)
...
>>> @only_call_once
... def f():
...     print("Called!")
...     return 1
...
>>> f()
Called!
1
>>> f()
1

Paul

Ian Kelly

Aug 3, 2017, 11:37:22 AM

On Thu, Aug 3, 2017 at 8:35 AM, Paul Moore <p.f....@gmail.com> wrote:
> On Tuesday, 1 August 2017 15:54:42 UTC+1, t...@tomforb.es wrote:
>> > _sentinel = object()
>> > _val = _sentinel
>> > def val():
>> >     global _val
>> >     if _val is _sentinel:
>> >         _val = ...  # calculate _val here
>> >     return _val
>> >
>> > seems entirely sufficient for this case. Write a custom decorator if you use the idiom often enough to make it worth the effort.
>>
>> I did some timings with this as part of my timings above and found it to be significantly slower than lru_cache with the C extension. I had to add `nonlocal` to get `_val` to resolve, which I think kills performance a bit.
>>
>> I agree with the premise though, it might be worth exploring.
>
> It's worth pointing out that there's nothing *wrong* with using lru_cache with maxsize=None. You're going to find it hard to get a pure-Python equivalent that's faster (after all, even maintaining a single variable is still a dict lookup, which is all the cache does when LRU functionality is disabled).

The single variable is only a dict lookup if it's a global. Locals and
closures are faster.

import functools

def simple_cache(function):
    sentinel = object()
    cached = sentinel

    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        nonlocal cached
        if args or kwargs:
            return function(*args, **kwargs)  # No caching with args
        if cached is sentinel:
            cached = function()
        return cached
    return wrapper

*Zero* dict lookups at call-time. If that's not (marginally) faster
than lru_cache with maxsize=None I'll eat my socks.

Ian Kelly

Aug 3, 2017, 11:43:02 AM

On Thu, Aug 3, 2017 at 9:36 AM, Ian Kelly <ian.g...@gmail.com> wrote:
> On Thu, Aug 3, 2017 at 8:35 AM, Paul Moore <p.f....@gmail.com> wrote:
>> On Tuesday, 1 August 2017 15:54:42 UTC+1, t...@tomforb.es wrote:
>>> > _sentinel = object()
>>> > _val = _sentinel
>>> > def val():
>>> >     global _val
>>> >     if _val is _sentinel:
>>> >         _val = ...  # calculate _val here
>>> >     return _val
>>> >
>>> > seems entirely sufficient for this case. Write a custom decorator if you use the idiom often enough to make it worth the effort.
>>>
>>> I did some timings with this as part of my timings above and found it to be significantly slower than lru_cache with the C extension. I had to add `nonlocal` to get `_val` to resolve, which I think kills performance a bit.
>>>
>>> I agree with the premise though, it might be worth exploring.
>>
>> It's worth pointing out that there's nothing *wrong* with using lru_cache with maxsize=None. You're going to find it hard to get a pure-Python equivalent that's faster (after all, even maintaining a single variable is still a dict lookup, which is all the cache does when LRU functionality is disabled).
>
> The single variable is only a dict lookup if it's a global. Locals and
> closures are faster.
>
> def simple_cache(function):
>     sentinel = object()
>     cached = sentinel
>
>     @functools.wraps(function)
>     def wrapper(*args, **kwargs):
>         nonlocal cached
>         if args or kwargs:
>             return function(*args, **kwargs) # No caching with args
>         if cached is sentinel:
>             cached = function()
>         return cached
>     return wrapper
>
> *Zero* dict lookups at call-time. If that's not (marginally) faster
> than lru_cache with maxsize=None I'll eat my socks.

Reading above, I now realize you were referring to the C extension
version of lru_cache. Yes, I'm sure that's faster. I still maintain
that this has to be faster than the pure-Python version however.

t...@tomforb.es

Aug 3, 2017, 11:44:45 AM

On Thursday, 3 August 2017 16:37:22 UTC+1, Ian wrote:
I hope you washed them!

def test_func():
    return 1

%timeit test_simple_cache()
The slowest run took 11.06 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 183 ns per loop

%timeit test_lru_cache()
The slowest run took 42.92 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 46.9 ns per loop

%timeit test_func()
10000000 loops, best of 3: 55.3 ns per loop

Serhiy Storchaka

Aug 3, 2017, 11:56:08 AM

On 03.08.17 18:36, Ian Kelly wrote:
> The single variable is only a dict lookup if it's a global. Locals and
> closures are faster.
>
> def simple_cache(function):
>     sentinel = object()
>     cached = sentinel
>
>     @functools.wraps(function)
>     def wrapper(*args, **kwargs):
>         nonlocal cached
>         if args or kwargs:
>             return function(*args, **kwargs) # No caching with args
>         if cached is sentinel:
>             cached = function()
>         return cached
>     return wrapper
>
> *Zero* dict lookups at call-time. If that's not (marginally) faster
> than lru_cache with maxsize=None I'll eat my socks.

With salt?

$ ./python -m timeit -s 'from simple_cache import simple_cache; f =
simple_cache(int)' -- 'f()'
500000 loops, best of 5: 885 nsec per loop
$ ./python -m timeit -s 'from functools import lru_cache; f =
lru_cache(maxsize=None)(int)' -- 'f()'
1000000 loops, best of 5: 220 nsec per loop

Ian Kelly

Aug 3, 2017, 11:59:03 AM

On Thu, Aug 3, 2017 at 9:44 AM, <t...@tomforb.es> wrote:
> I hope you washed them!

Yes, well as noted in my followup I was comparing pure-Python
implementations, not the C implementation.

Ian Kelly

Aug 3, 2017, 12:03:51 PM

Fixed:

$ python3 -m timeit -s 'from simple_cache import simple_cache; f =
simple_cache(int)' -- 'f()'
1000000 loops, best of 3: 0.167 usec per loop
$ python3 -m timeit -s 'import sys; sys.modules["_functools"] = None;
from functools import lru_cache; f = lru_cache(maxsize=None)(int)' --
'f()'
1000000 loops, best of 3: 0.783 usec per loop

Ian Kelly

Aug 3, 2017, 12:09:46 PM

On Thu, Aug 3, 2017 at 10:02 AM, Ian Kelly <ian.g...@gmail.com> wrote:
> Fixed:
>
> $ python3 -m timeit -s 'from simple_cache import simple_cache; f =
> simple_cache(int)' -- 'f()'
> 1000000 loops, best of 3: 0.167 usec per loop
> $ python3 -m timeit -s 'import sys; sys.modules["_functools"] = None;
> from functools import lru_cache; f = lru_cache(maxsize=None)(int)' --
> 'f()'
> 1000000 loops, best of 3: 0.783 usec per loop

This turns out to be because I was running 3.4 which doesn't have the
C implementation to begin with. In 3.6 this trick doesn't seem to work
as expected for disabling it:

$ python3.6 -m timeit -s 'from simple_cache import simple_cache; f =
simple_cache(int)' -- 'f()'
10000000 loops, best of 3: 0.152 usec per loop
$ python3.6 -m timeit -s 'from functools import lru_cache; f =
lru_cache(maxsize=None)(int)' -- 'f()'
10000000 loops, best of 3: 0.0422 usec per loop
$ python3.6 -m timeit -s 'import sys; sys.modules["_functools"] =
None; from functools import lru_cache; f =
lru_cache(maxsize=None)(int)' -- 'f()'
10000000 loops, best of 3: 0.0419 usec per loop

Serhiy Storchaka

Aug 3, 2017, 1:19:41 PM

On 03.08.17 19:08, Ian Kelly wrote:
> This turns out to be because I was running 3.4 which doesn't have the
> C implementation to begin with. In 3.6 this trick doesn't seem to work
> as expected for disabling it:

It didn't work because the functools module was already imported at startup.

$ ./python -m timeit -s 'from simple_cache import simple_cache; f =
simple_cache(int)' -- 'f()'
500000 loops, best of 5: 501 nsec per loop

$ ./python -m timeit -s 'from functools import lru_cache; f =
lru_cache(maxsize=None)(int)' -- 'f()'
2000000 loops, best of 5: 139 nsec per loop

$ ./python -m timeit -s 'import sys; sys.modules["_functools"] = None;
del sys.modules["functools"]; from functools import lru_cache; f =
lru_cache(maxsize=None)(int)' -- 'f()'
100000 loops, best of 5: 3.39 usec per loop

Ian Kelly

Aug 3, 2017, 2:19:03 PM

On Thu, Aug 3, 2017 at 11:18 AM, Serhiy Storchaka <stor...@gmail.com> wrote:
> $ ./python -m timeit -s 'import sys; sys.modules["_functools"] = None; del
> sys.modules["functools"]; from functools import lru_cache; f =
> lru_cache(maxsize=None)(int)' -- 'f()'
> 100000 loops, best of 5: 3.39 usec per loop

Interesting, I had tried that and it didn't work for me:

$ python3.6 -m timeit -s 'import sys; sys.modules["_functools"] =
None; del sys.modules["functools"]; from functools import lru_cache; f
= lru_cache(maxsize=None)(int)' -- 'f()'
Could not import runpy module
Traceback (most recent call last):
  File "/usr/lib/python3.4/runpy.py", line 14, in <module>
    import importlib.machinery # importlib first so we can test #15386 via -m
  File "/usr/lib/python3.4/importlib/__init__.py", line 34, in <module>
    _w_long = _bootstrap._w_long
AttributeError: module 'importlib._bootstrap' has no attribute '_w_long'

Ian Kelly

Aug 3, 2017, 2:21:56 PM

Never mind, it's fine now. In the future I should remember not to run
python3.6 from the /usr/lib/python3.4 directory. =)