
Memoization and encapsulation


Steven D'Aprano

Dec 30, 2005, 11:23:26 PM
I was playing around with simple memoization and came up with something
like this:

_cache = {}
def func(x):
    global _cache
    if _cache.has_key(x):
        return _cache[x]
    else:
        result = x+1 # or a time consuming calculation...
        _cache[x] = result
        return result

when it hit me that if I could somehow bind the cache to the function, I
could get rid of that pesky global variable.

I tried this:

>>> def func(x):
...     try:
...         func.cache
...     except AttributeError:
...         func.cache = {}
...     if func.cache.has_key(x):
...         return func.cache[x]
...     else:
...         result = x + 1
...         func.cache[x] = result
...         return result

and it works as expected, but it lacks elegance.

Instead of using a function, I can also use a new-style class as if it
were a function:

>>> class Func(object):
...     cache = {}
...     def __new__(self, x):
...         if self.cache.has_key(x):
...             return self.cache[x]
...         else:
...             result = x+1
...             self.cache[x] = result
...             return result

and again it works, but I can't help feeling it is an abuse of the class
mechanism to do this.

What do folks think? Is there a better way?

--
Steven.

bon...@gmail.com

Dec 31, 2005, 12:08:09 AM

Steven D'Aprano wrote:
> I was playing around with simple memoization and came up with something
> like this:
>
> _cache = {}
> def func(x):
>     global _cache
>     if _cache.has_key(x):
>         return _cache[x]
>     else:
>         result = x+1 # or a time consuming calculation...
>         _cache[x] = result
>         return result
>
> when it hit me if I could somehow bind the cache to the function, I could
> get rid of that pesky global variable.
>
what would be the problem with the following, other than exposing the
cache, which the caller may override?

def func(x,_cache={}):
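Spelled out in full (a sketch only; the `x + 1` body stands in for the slow computation, as in the original post), that would be:

```python
def func(x, _cache={}):
    # The mutable default dict is created once, at def time,
    # so it persists across calls and serves as the cache.
    if x in _cache:
        return _cache[x]
    result = x + 1  # or a time consuming calculation...
    _cache[x] = result
    return result
```

One caveat, as noted above: the cache is visible in the signature, so a caller can (accidentally or deliberately) pass in their own dict as _cache.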

Raymond Hettinger

Dec 31, 2005, 12:08:29 AM
Steven D'Aprano wrote:
> I was playing around with simple memoization and came up with something
> like this:
>
> _cache = {}
> def func(x):
>     global _cache
>     if _cache.has_key(x):
>         return _cache[x]
>     else:
>         result = x+1 # or a time consuming calculation...
>         _cache[x] = result
>         return result
>
> when it hit me if I could somehow bind the cache to the function, I could
> get rid of that pesky global variable.

Try something like this:

def func(x, _cache={}):
    if x in _cache:
        return _cache[x]
    result = x + 1 # or a time consuming function
    _cache[x] = result
    return result

* If cache hits are the norm, then a try/except version would be a bit
faster.

* In your original code, the global declaration wasn't needed because
_cache itself was never rebound; only its contents were changed.
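The try/except version alluded to above might read like this (a sketch, not Raymond's code; EAFP pays off when hits dominate because the common path does a single dict lookup):

```python
def func(x, _cache={}):
    # EAFP: assume a cache hit; a miss raises KeyError once per key.
    try:
        return _cache[x]
    except KeyError:
        result = x + 1  # or a time consuming calculation...
        _cache[x] = result
        return result
```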

Raymond

Just

Dec 31, 2005, 3:23:05 AM
In article <pan.2005.12.31....@REMOVETHIScyber.com.au>,

Steven D'Aprano <st...@REMOVETHIScyber.com.au> wrote:

> I was playing around with simple memoization and came up with something
> like this:
>
> _cache = {}
> def func(x):
>     global _cache

There's no need to declare _cache as global, since you're not assigning
to it. So this global isn't all that pesky after all...

>     if _cache.has_key(x):
>         return _cache[x]
>     else:
>         result = x+1 # or a time consuming calculation...
>         _cache[x] = result
>         return result
>
> when it hit me if I could somehow bind the cache to the function, I could
> get rid of that pesky global variable.

[ ... ]


> What do folks think? Is there a better way?

I actually prefer such a global variable to the default arg trick. The
idiom I generally use is:

_cache = {}
def func(x):
    result = _cache.get(x)
    if result is None:
        result = x + 1 # or a time consuming calculation...
        _cache[x] = result
    return result

Just

sk...@pobox.com

Dec 31, 2005, 5:44:10 AM
to Just, pytho...@python.org

just> I actually prefer such a global variable to the default arg
just> trick. The idiom I generally use is:

just> _cache = {}
just> def func(x):
just>     result = _cache.get(x)
just>     if result is None:
just>         result = x + 1 # or a time consuming calculation...
just>         _cache[x] = result
just>     return result

None of the responses I've seen mention the use of decorators such as the
one shown here:

http://wiki.python.org/moin/PythonDecoratorLibrary

While wrapping one function in another is obviously a bit slower, you can
memoize any function without tweaking its source.

Skip

Steven D'Aprano

Dec 31, 2005, 7:19:41 AM
On Fri, 30 Dec 2005 21:08:29 -0800, Raymond Hettinger wrote:

> Steven D'Aprano wrote:
>> I was playing around with simple memoization and came up with something
>> like this:

[snip]

> Try something like this:
>
> def func(x, _cache={}):
>     if x in _cache:
>         return _cache[x]
>     result = x + 1 # or a time consuming function
>     _cache[x] = result
>     return result

Of course. The most obvious thing is the least obvious :-)


> * If cache hits are the norm, then a try/except version would be a bit
> faster.
>
> * In your original code, the global declaration wasn't needed because
> _cache itself was never rebound; only its contents were changed.

Sure, it isn't needed, but I still like to make it explicit. Call it a
foible of mine :-)

--
Steven

Steven D'Aprano

Dec 31, 2005, 7:21:43 AM
On Sat, 31 Dec 2005 09:23:05 +0100, Just wrote:

> There's no need to declare _cache as global, since you're not assigning
> to it. So this global isn't all that pesky after all...

It is still a global variable, with all the potential Badness thereof,
even if you don't have to declare it.

--
Steven.

Tom Anderson

Jan 1, 2006, 4:57:24 PM

I'd definitely say this is the way to go.

def memoised(fn):
    cache = {}
    def memoised_fn(*args):
        if args in cache:
            return cache[args]
        else:
            rtn = fn(*args)
            cache[args] = rtn
            return rtn
    return memoised_fn

@memoised
def func(x):
    return x + 1 # or a time-consuming calculation

tom

--
Exceptions say, there was a problem. Someone must deal with it. If you
won't deal with it, I'll find someone who will.

dr...@bigasterisk.com

Jan 4, 2006, 12:16:36 PM
But while we're at it, how about that unhashable object problem?

@memoised
def func(x, i):
    return x[i]

L = [1,2,3]
print func(L, 1) # TypeError: list objects are unhashable

What's the deal? My func can certainly be memoized (but possibly with a
slower lookup depending on how many args are "unhashable"). In fact,
list objects *could* be hashed:

class HashableList(list):
    def __hash__(self):
        return 0

L = HashableList([1,2,3])
print func(L, 1)

That's both correct and useful. I've written fancier memoize()
implementations in the past that added a __hash__ to incoming objects
that didn't have any, although handling lists would be a tricky case.
In my cases, most arguments could be hashed and the values that
couldn't were part of very small sets, so I actually had very few hash
table collisions.

I think python is broken here-- why aren't lists hashable, or why isn't
there a straightforward way to make memoised() work?
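Assembled into one self-contained sketch (the memoised decorator here is modelled on Tom's from earlier in the thread; the constant hash is deliberately degenerate):

```python
def memoised(fn):
    cache = {}
    def memoised_fn(*args):
        if args in cache:
            return cache[args]
        rtn = fn(*args)
        cache[args] = rtn
        return rtn
    return memoised_fn

class HashableList(list):
    # Every instance hashes to the same bucket; equal lists compare
    # equal, so lookups stay correct, just linear in the worst case.
    def __hash__(self):
        return 0

@memoised
def func(x, i):
    return x[i]

L = HashableList([1, 2, 3])
result = func(L, 1)  # -> 2; a plain list here raises TypeError
```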

Tom Anderson

Jan 4, 2006, 3:29:42 PM
On Wed, 4 Jan 2006 dr...@bigasterisk.com wrote:

> I think python is broken here-- why aren't lists hashable, or why isn't
> there a straightforward way to make memoised() work?

a = [1, 2, 3]
d = {a: "foo"}
a[0] = 0
print d[a]

I feel your pain, but i don't think lists (and mutable objects generally)
being unhashable is brokenness. I do think there's room for a range of
opinion, though, and i'm not sure what i think is right.

tom

--
Rapid oxidation is the new black. -- some Mike

Mike Meyer

Jan 4, 2006, 4:43:03 PM
dr...@bigasterisk.com writes:
> class HashableList(list):
>     def __hash__(self):
>         return 0

This will suck for performance if you put a lot of lists in the same
dictionary. Faster is:

class FastHashableList(list):
    def __hash__(self):
        return id(self)

> I think python is broken here-- why aren't lists hashable, or why isn't
> there a straightforward way to make memoised() work?

There isn't a straightforward way to make memoised work because -
well, it's not a straightforward thing to make work.

For instance, consider the two hashable list implementations
above. Consider this:

@memoised
def broken(l):
    return len(l) + 1

x = HL([1, 2, 3]) # Or FHL
print broken(x)
x.append(4)
print broken(x)

Will your memoised handle this case correctly?

Which is also why lists aren't hashable - there's no good definition
of the hash of a list that isn't broken for some common use case.

<mike
--
Mike Meyer <m...@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

dr...@bigasterisk.com

Jan 5, 2006, 12:05:44 AM
> Faster is:
>
> class FastHashableList(list):
>     def __hash__(self):
>         return id(self)

But that's wrong. Two equal objects must hash to the same value, and
you're not guaranteeing that at all. My degenerate __hash__ does
guarantee it, and I still claim the performance impact from collisions
will be very low as long as each unhashable argument takes on only a few
distinct values, which was the common scenario for me. That is, my Foo
objects were unhashable, but there were only about two Foo objects in
the program that might be passed to the memoized function (though with
many possible combinations of additional float-type arguments).

> There isn't a straightforward way to make memoised work because -
> well, it's not a straightforward thing to make work.

Seems straightforward to me, as long as equality is defined for the
argument types. A hash that spreads the argument values across a hash
table is a fine optimization, but I don't think a memoizer needs to
have optimal performance in that area to be useful. In many of the
cases I have come across, even the dumbest list search (i.e. what you
get if all args always hash to 0) will be much faster than actually
rerunning the original function.

> x = HL([1, 2, 3]) # Or FHL
> print broken(x)
> x.append(4)
> print broken(x)
>
> Will your memoised handle this case correctly?

For that one to work my fancy memoizer would have to make a copy of the
arguments. The comparisons will work fine, though. [1,2,3] !=
[1,2,3,4].

> Which is also why lists aren't hashable - there's no good definition
> of the hash of a list that isn't broken for some common use case.

Only if broken means 'slow'. My hunch is that lists weren't made
hashable because people would naively use them heavily as dict keys and
suffer poorer performance than if they used tuples.

I would still like to see a well-done memoizer that doesn't suffer from
the unhashable bug and maybe has a way to avoid the mutable object bug
that you presented in your last post.

The version I keep mentioning as "my fancy memoizer" is actually just a
hacked one that adds a __hash__ to the first argument, since that's all
I needed at the time.

Mike Meyer

Jan 5, 2006, 1:59:16 AM
dr...@bigasterisk.com writes:
>> Faster is:
>> class FastHashableList(list):
>>     def __hash__(self):
>>         return id(self)
> But that's wrong.

You're right - I should have fixed equality while I was at it. The
results may not be what you had in mind, but it's a perfectly
reasonable use case.

>> There isn't a straightforward way to make memoised work because -
>> well, it's not a straightforward thing to make work.
> Seems straightforward to me, as long as equality is defined for the
> argument types. A hash that spreads the argument values across a hash
> table is a fine optimization, but I don't think a memoizer needs to
> have optimal performance in that area to be useful. In many of the
> cases I have come across, even the dumbest list search (i.e. what you
> get if all args always hash to 0) will be much faster than actually
> rerunning the original function.
>
>> x = HL([1, 2, 3]) # Or FHL
>> print broken(x)
>> x.append(4)
>> print broken(x)
>>
>> Will your memoised handle this case correctly?
>
> For that one to work my fancy memoizer would have to make a copy of the
> arguments. The comparisons will work fine, though. [1,2,3] !=
> [1,2,3,4].

Right - you need to make a copy. There are three things to consider
here:

1) the copy doesn't have to be a list. You can create a tuple of the
list and use that, and get the same effect without having to create a
hashable list type.
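That tuple-snapshot idea might be sketched as follows (a hypothetical memoised variant that only handles list arguments; per point 2 below, every other mutable type would need the same treatment):

```python
def memoised(fn):
    cache = {}
    def memoised_fn(*args):
        # Snapshot list arguments as tuples: the key is hashable and
        # immune to later mutation of the caller's list.
        key = tuple(tuple(a) if isinstance(a, list) else a
                    for a in args)
        if key in cache:
            return cache[key]
        rtn = fn(*args)
        cache[key] = rtn
        return rtn
    return memoised_fn

@memoised
def func(x, i):
    return x[i]
```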

2) You need to apply this hack for *every* mutable type that you want
to handle as an argument.

3) There will still be cases where it's broken:

@memoised
def still_broken_after_all_these_years(x):
    return id(x) + 1

x = [1, 2, 3]
y = [1, 2, 3]

assert still_broken_after_all_these_years(x) != still_broken_after_all_these_years(y)

>> Which is also why lists aren't hashable - there's no good definition
>> of the hash of a list that isn't broken for some common use case.
> Only if broken means 'slow'. My hunch is that lists weren't made
> hashable because people would naively use them heavily as dict keys and
> suffer poorer performance than if they used tuples.

You're only looking at *your* use case. The definition you used isn't
broken for that, just slow. For *other* use cases, it's broken. There
are cases where you want a list to be considered equal to itself as it
changes over time.

> I would still like to see a well-done memoizer that doesn't suffer from
> the unhashable bug and maybe has a way to avoid the mutable object bug
> that you presented in your last post.

Like I said, I don't think it's possible to do it in a way that
"works" for all possible functions. For instance, you could solve the
two bugs you mention in one fell swoop, by declaring that you don't
memoize calls that have mutable arguments, but call the function every
time for them. That's really the only solution that's going to work
for all functions. Except that "hashable" and "immutable" aren't
synonyms; there are hashable builtins that are mutable, so there are
arguments that will be memoized incorrectly.
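That call-through fallback could be sketched like this (a hypothetical variant; note that a TypeError raised by fn itself would also trigger the fallback, so a real version would need to separate the two cases):

```python
def memoised(fn):
    cache = {}
    def memoised_fn(*args):
        try:
            if args in cache:      # raises TypeError if args unhashable
                return cache[args]
            rtn = fn(*args)
            cache[args] = rtn
            return rtn
        except TypeError:
            # Unhashable argument: skip the cache, just call through.
            return fn(*args)
    return memoised_fn

@memoised
def func(x, i):
    return x[i]
```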
