
proposal for slice hashing


Will Bradshaw

May 11, 2020, 3:59:02 PM
I recently ran into a situation where I needed to hash a slice and found this to be unsupported. This seems silly: there is little benefit to slices not supporting hashing, and real downsides. Coupled with the fact that one cannot subclass slice, this is a significant and needless annoyance.

It seems to me the hash of a slice would simply be (in Python):

    def __hash__(self):
        return hash((self.start, self.stop, self.step))

I can see no reason why this is not the case.


Context:

I have an object that presents itself as a multidimensional array in the NumPy style but computes its values on __getitem__ calls, as it represents an infinite field of numbers. This operation is expensive. In many cases the same slice of the field is needed repeatedly, which lends itself to using an LRU cache on __getitem__(). However, this does not work, as slices cannot appear in the dictionary key used by lru_cache.*


The only options as of now are:
1. Use three layers of wrappers to pack the slices into a custom type that supports hashing, pass this mangled version of the arguments through the lru_cache wrapper into a function that reverses the packing, and pass through to the underlying implementation (see "Code for Workaround" below).
   - This is hacky and arguably slow, though in my case the cost of the computation dwarfs this overhead by several orders of magnitude.
   - The mapping may be unreliable, and it is a rather long and impenetrable mess.
2. Implement my own custom caching for this situation, which does not scale well and is a lot of work.
3. Implement a special case for slices in the lru_cache function. However, this just moves the problem into the functools library.


* I realize that further efficiency gains could be had by caching results sparsely and tracking which cells of the field are filled in, keeping what is essentially a per-cell cache. That would be significantly more work, is unlikely to yield much better results in my case, and still doesn't solve the general use case.


Code for Workaround:

from functools import lru_cache


class _hashable_slice(object):
    __slots__ = ['slice']

    def __init__(self, s: slice):
        self.slice = s

    def __hash__(self):
        return hash((self.slice.start, self.slice.stop, self.slice.step))

    def __eq__(self, other):
        # Compare the underlying slices; unwrap the other side first so
        # two wrappers around equal slices compare equal.
        if isinstance(other, _hashable_slice):
            other = other.slice
        return other == self.slice


def slice_lru_cache(*lru_args, **lru_kwargs):
    lru_wrapper = lru_cache(*lru_args, **lru_kwargs)

    def decorator(f):
        @lru_wrapper
        def inner(*args, **kwargs):
            def unpack(x):
                # Reverse the packing done in wrapper() below.
                if isinstance(x, _hashable_slice):
                    return x.slice
                if isinstance(x, (tuple, list)):
                    return type(x)(unpack(v) for v in x)
                return x

            return f(*(unpack(v) for v in args),
                     **{k: unpack(v) for k, v in kwargs.items()})

        def wrapper(*args, **kwargs):
            def pack(x):
                # Swap raw slices for hashable stand-ins, recursing into containers.
                if isinstance(x, slice):
                    return _hashable_slice(x)
                if isinstance(x, (tuple, list)):
                    return type(x)(pack(v) for v in x)
                return x
            return inner(*(pack(v) for v in args),
                         **{k: pack(v) for k, v in kwargs.items()})

        # Note: assigning wrapper.__getattr__ has no effect on attribute
        # lookup for a function object, so expose the cache API explicitly.
        wrapper.cache_info = inner.cache_info
        wrapper.cache_clear = inner.cache_clear
        return wrapper
    return decorator
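For reference, here is a condensed, self-contained variant of the same trick. Names like _HashableSlice, compute, and calls are illustrative, not from the code above:

```python
from functools import lru_cache


class _HashableSlice:
    """Hashable stand-in for a slice (illustrative helper)."""
    __slots__ = ('slice',)

    def __init__(self, s):
        self.slice = s

    def _key(self):
        return (self.slice.start, self.slice.stop, self.slice.step)

    def __hash__(self):
        return hash(self._key())

    def __eq__(self, other):
        return isinstance(other, _HashableSlice) and self._key() == other._key()


calls = 0


@lru_cache(maxsize=None)
def _cached(key):
    # The cached layer sees only the hashable wrapper; unwrap and "compute".
    global calls
    calls += 1
    s = key.slice
    return list(range(s.start, s.stop, s.step or 1))


def compute(s):
    # Public entry point: pack the slice before it reaches lru_cache.
    return _cached(_HashableSlice(s))


assert compute(slice(0, 10, 2)) == [0, 2, 4, 6, 8]
assert compute(slice(0, 10, 2)) == [0, 2, 4, 6, 8]  # served from the cache
assert calls == 1
```

The essential point is that __hash__ and __eq__ agree on the (start, stop, step) triple, which is exactly what hashable slices would do natively.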

Chris Angelico

May 11, 2020, 4:10:56 PM
On Tue, May 12, 2020 at 6:01 AM Will Bradshaw <defena...@gmail.com> wrote:
> The only options as of now are:
> 1. use 3 layers of wrappers to pack the slices into a custom type that supports hashing pass this mangled version of the arguments through lru_cache wrapper into a function that reverses the process of the first wrapper and passes through to the underlying implementation. (see below "code for workaround" as example)
> - This is kinda jank and arguably slow. Though in my case the cost of the calculation operation dwarfs this cost by an several orders of magnitude.
> - mapping may be unreliable and is a rather long and impenetrable mess.
> 2. implementing my own custom caching for this situation which does not scale well and is a heck of a lot of work.
> 3. implement a special case for slices in the lru_cache function. However, this is just moving the problem into the functools library.
>

4. Implement __getitem__ as a wrapper around a caching lookup function
that simply takes the three arguments.

def __getitem__(self, slice):
    return generate_values(slice.start, slice.stop, slice.step)

@lru_cache
def generate_values(start, stop, step):
    ...

Not sure if this makes it easy enough to not worry about the hashability.
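To make option 4 concrete for the one-dimensional case, a minimal runnable sketch (Field, _generate_values, and computations are hypothetical names, and the "expensive" computation is simulated with range):

```python
from functools import lru_cache

computations = 0


class Field:
    """Toy stand-in for the expensive infinite field."""
    def __getitem__(self, item):
        # Unpack the slice so the cached function receives only hashable args.
        return _generate_values(item.start, item.stop, item.step)


@lru_cache(maxsize=None)
def _generate_values(start, stop, step):
    global computations
    computations += 1  # counts real computations, not cache hits
    return tuple(range(start, stop, step or 1))


f = Field()
assert f[0:6:2] == (0, 2, 4)
assert f[0:6:2] == (0, 2, 4)  # second lookup is served from the cache
assert computations == 1
```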

ChrisA

Will Bradshaw

May 11, 2020, 4:27:04 PM
That does not work in the case of multi-dimensional __getitem__ calls in the NumPy style, which happens to be my case: the number of dimensions in the indexes varies by call, and all of the following are supported for each axis: slice, array of indexes, int index, and other custom index object types which I am hesitant to disclose.

Chris Angelico

May 11, 2020, 4:45:55 PM
Ah, fair enough. Was just looking for a solution that would work on
existing Python versions rather than demanding an upgrade to 3.10 :)

Your use-case isn't common, but on the other hand, making slice
objects hashable doesn't seem like it'd break anything. Obviously
they'd only be hashable if start/stop/step are all hashable, but
that's no different from tuples. +0.5.

BTW, your third option (moving the problem to functools) might
actually be easier than you think. You'd ultimately just be changing
the behaviour of _make_key. Unfortunately there's no easy way to
replace that function for a specific lru_cache, but that might itself
be a useful feature. Imagine this:

@lru_cache(make_key=hashable_slices)
def __getitem__(self, item):
    ...

Current implementation has a single global _make_key function that
then gets snapshotted into the closure, so you could do something
hacky as a proof of concept:

old = functools._make_key
functools._make_key = hashable_slices
@lru_cache
def __getitem__(self, item):
    ...
functools._make_key = old

Obviously that's bad code, but it's a great low-effort proof of concept :)

ChrisA

Will Bradshaw

May 11, 2020, 6:09:31 PM
On Monday, May 11, 2020 at 4:45:55 PM UTC-4, Chris Angelico wrote:
> [snip]
>
> Ah, fair enough. Was just looking for a solution that would work on
> existing Python versions rather than demanding an upgrade to 3.10 :)

I have a solution to the issue in current Python; it was given at the bottom of my first post, and I implemented a variant of it for my work. But just because there is a hack does not mean a fix should not be implemented.

> Your use-case isn't common, but on the other hand, making slice
> objects hashable doesn't seem like it'd break anything. Obviously
> they'd only be hashable if start/stop/step are all hashable, but
> that's no different from tuples. +0.5.


I'm certain that my use case is not common. However, there are surely many uncommon use cases in which having a slice as a dictionary key, or otherwise hashing a slice, could be useful.

>
> BTW, your third option (moving the problem to functools) might
> actually be easier than you think. You'd ultimately just be changing
> the behaviour of _make_key. Unfortunately there's no easy way to
> replace that function for a specific lru_cache, but that might itself
> be a useful feature. Imagine this:
>
> @lru_cache(make_key=hashable_slices)
> def __getitem__(self, item):
> ...

This is a good idea, as it would allow quite a bit of tuning by the user and allow trade-offs between cache-lookup complexity and potentially unnecessary recomputes.

Terry Reedy

May 12, 2020, 10:58:04 AM
On 5/11/2020 3:58 PM, Will Bradshaw wrote:
> I recently ran into a situation where I needed to hash a slice and found this to be unsupported.

Slice objects, as opposed to slices of objects, have no explicit use in
Python itself. They were added for use by "Numerical Python and other
3rd party extensions".
https://docs.python.org/3/library/functions.html#slice
They have the functionality needed for that use. If this does not meet
your needs, don't use them.

Slice objects are similar to named tuples in having a fixed number of
immutable fields accessed by name.
https://docs.python.org/3/glossary.html#term-named-tuple
They were added before collections.namedtuple, but if added today *for
general use*, they would probably be namedtuples, with hashing and
everything else that comes from subclassing tuple.

So either use the namedtuple factory or roll your own to make something
that works for you.
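For example, a namedtuple-based key along those lines (SliceKey and to_key are hypothetical names):

```python
from collections import namedtuple

# A hashable record with the same three fields as a slice.
SliceKey = namedtuple('SliceKey', ['start', 'stop', 'step'])


def to_key(s):
    return SliceKey(s.start, s.stop, s.step)


cache = {}
s = slice(2, 12, 3)
cache[to_key(s)] = 'computed once'

assert to_key(slice(2, 12, 3)) in cache      # equal fields give an equal key
assert hash(to_key(s)) == hash((2, 12, 3))   # namedtuple hashes like a tuple
```

Since namedtuple subclasses tuple, hashing and equality come for free, which is exactly the behaviour being requested for slice itself.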


--
Terry Jan Reedy

Paul Rubin

May 12, 2020, 5:51:37 PM
Terry Reedy <tjr...@udel.edu> writes:
> Slice objects are similar to named tuples

In that case making them hashable sounds like good sense.

Will Bradshaw

May 16, 2020, 6:12:06 PM
I mean, I've already made and tested a patch; it wasn't hard, and I did it the same day as I made the post. I've also done a bit of work on the proposal to make the key function changeable in functools.lru_cache, though that is more complex and will take a bit of refactoring to do right.