Any way to use a range as a key in a dictionary?

Mudcat

unread,

Mar 26, 2009, 5:35:42 PM3/26/09

to

I would like to use a dictionary to store byte table information to
decode some binary data. The actual number of entries won't be that
large, at most 10. That leaves the other 65525 entries as 'reserved'
or 'other' but still need to be somehow accounted for when
referenced.

So there are a couple of ways to do this that I've seen. I can loop
that many times and create a huge dictionary. This isn't a good idea.
I can just assume if a key isn't there that it's not relevant. That's
a better idea.

However I wondered if there was a way to simply use a range as a key
reference somehow. I played around with some options with no success.
Or maybe there was another way to do this with another data type that
I haven't thought about.

Thanks

Daniel Fetchinson

unread,

Mar 26, 2009, 5:51:04 PM3/26/09

to pytho...@python.org

Ranges are lists and lists are unhashable. But tuples are hashable so
if you convert your lists into tuples, that will work:

>>> x = dict( )
>>> x[range(10)]='hello'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable
>>> x[tuple(range(10))]='hello'
>>> x[tuple(range(10))]
'hello'
>>>

--
Psss, psss, put it down! - http://www.cafepress.com/putitdown

andrew cooke

unread,

Mar 26, 2009, 6:10:18 PM3/26/09

to Mudcat, pytho...@python.org

Mudcat wrote:
> I would like to use a dictionary to store byte table information to
> decode some binary data. The actual number of entries won't be that
> large, at most 10. That leaves the other 65525 entries as 'reserved'
> or 'other' but still need to be somehow accounted for when
> referenced.

[...]

i don't completely follow what you are doing, but i currently use the
following to find a transition in a finite automaton for a regular
expression, and i suspect it's similar to what you want.

my problem is that i want to associate certain ranges of characters with a
particular value. for example, a-c might be 1, p-x might be 2. i use
intervals that are (a, b) tuples, where a and b are the inclusive bounds
(so (a, b) means a <= x <= b).

i have a class that is a collection of these intervals, called Character.

from bisect import bisect_left

class Character:
def __init__(self, intervals):
'''
Note that intervals must already be sorted.
'''
self.__intervals = intervals
self.__index = [interval[1] for interval in intervals]
def __contains__(self, c):
'''
Does the value lie within the intervals?
'''
if self.__index:
index = bisect_left(self.__index, c)
if index < len(self.__intervals):
(a, b) = self.__intervals[index]
return a <= c <= b
return False

this lets me test for whether things are in range:

>>> c = Character([('a','c'),('p','x')])
>>> 'a' in c
True
>>> 'm' in c
False

and i just realised that i deleted the code that lets me also associate
values with intervals! but hopefully that gives you the idea. bisect can
be used to find values via a sorted index. so once you find "index"
above, you can use that to return a value from an array.

i doubt this is super-fast (i think the bisect library is pure python),
but it was good enough for my initial attempt.

andrew

Paul Rubin

unread,

Mar 26, 2009, 6:13:53 PM3/26/09

to

Mudcat <mnat...@gmail.com> writes:
> However I wondered if there was a way to simply use a range as a key
> reference somehow.

Dictionaries aren't really the right structure for that. You want a
tree or binary search structure of some sort. The bisect module might
be helpful, but you will have to write some code by hand if you use
it.

bearoph...@lycos.com

unread,

Mar 26, 2009, 9:43:47 PM3/26/09

to

An interval map maybe?
http://code.activestate.com/recipes/457411/

A programmer has to know the name of many data structures :-)

Bye,
bearophile

Aaron Brady

unread,

Mar 26, 2009, 10:18:57 PM3/26/09

to

On Mar 26, 5:10 pm, "andrew cooke" <and...@acooke.org> wrote:
> Mudcat wrote:
> > I would like to use a dictionary to store byte table information to
> > decode some binary data. The actual number of entries won't be that
> > large, at most 10. That leaves the other 65525 entries as 'reserved'
> > or 'other' but still need to be somehow accounted for when
> > referenced.
>
> [...]
>
> i don't completely follow what you are doing, but i currently use the
> following to find a transition in a finite automaton for a regular
> expression, and i suspect it's similar to what you want.
>
> my problem is that i want to associate certain ranges of characters with a
> particular value. for example, a-c might be 1, p-x might be 2. i use
> intervals that are (a, b) tuples, where a and b are the inclusive bounds
> (so (a, b) means a <= x <= b).
>

snip

>
> and i just realised that i deleted the code that lets me also associate
> values with intervals! but hopefully that gives you the idea. bisect can
> be used to find values via a sorted index. so once you find "index"
> above, you can use that to return a value from an array.
>
> i doubt this is super-fast (i think the bisect library is pure python),
> but it was good enough for my initial attempt.
>
> andrew

I wonder if there is a faster method than a binary search, or what the
proof that there isn't is.

Carl Banks

unread,

Mar 26, 2009, 10:45:51 PM3/26/09

to

On Mar 26, 2:35 pm, Mudcat <mnati...@gmail.com> wrote:
> I would like to use a dictionary to store byte table information to
> decode some binary data. The actual number of entries won't be that
> large, at most 10. That leaves the other 65525 entries as 'reserved'
> or 'other' but still need to be somehow accounted for when
> referenced.

I seems like all you really need it to use the get method. If the
item doesn't exist in the dictionary it returns None (or whatever you
pass in as the optional second argument). For instance:

d = { 1: "s", 56: "w", 4363: "n", 8953: "k" }

d.get(1) -> "s"
d.get(56) -> "w"
d.get(10) -> None
d.get(10,"x") -> "x"

Carl Banks

Message has been deleted

Paul Rubin

unread,

Mar 27, 2009, 2:02:21 AM3/27/09

to

Dennis Lee Bieber <wlf...@ix.netcom.com> writes:
> ... v = theDict.get(x, NOT_RELEVANT)
> ... if v is not NOT_RELEVANT:
> ... print x, v

I think you'd normally do this with

if x in theDict:
print x, v

but the OP was asking about a different problem, involving looking up
numeric ranges, which could conceivably be very large, thus the
suggestions about interval trees and so forth.

Carl Banks

unread,

Mar 27, 2009, 11:50:02 AM3/27/09

to

On Mar 26, 11:02 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:

> Dennis Lee Bieber <wlfr...@ix.netcom.com> writes:
>
> > ... v = theDict.get(x, NOT_RELEVANT)
> > ... if v is not NOT_RELEVANT:
> > ... print x, v
>
> I think you'd normally do this with
>
> if x in theDict:
> print x, v

Where does v come from?

> but the OP was asking about a different problem, involving looking up
> numeric ranges, which could conceivably be very large, thus the
> suggestions about interval trees and so forth.

The fact that the OP asked about what to do with the the other 65526
values suggests that he only wanted to use intervals to fill in the
gaps between meaningful values (thus having complete coverage over the
16-bit integer range). An interval tree would not be a good approach
for that situation.

Carl Banks

Paul Rubin

unread,

Mar 27, 2009, 2:20:23 PM3/27/09

to

Carl Banks <pavlove...@gmail.com> writes:
> > if x in theDict:
> > print x, v
>
> Where does v come from?

Oops, pasted from original. Meant of course "print x, theDict[x]".

Carl Banks

unread,

Mar 27, 2009, 3:18:55 PM3/27/09

to

On Mar 27, 11:20 am, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:

You have look up x twice with that code, whereas you wouldn't have to
with this:

v = theDict.get(x)
if v is not None:
print x, v

or with this:

try:
v = theDict[x]
except KeyError:
pass
else:
print x,v

Also, these minimize the number of times you have to type x. Thus I
recommend either of these two, or a defaultdict, for this.

Carl Banks

Peter Otten

unread,

Mar 27, 2009, 3:37:55 PM3/27/09

to

Mudcat wrote:

You can use a defaultdict

>>> from collections import defaultdict
>>> d = defaultdict(lambda: "reserved", [(1,2), (2,3)])
>>> d[1]
2
>>> d[2]
3
>>> d[42]
'reserved'

If the keys are successive integers starting at 0 a list is also an option.
It makes setting ranges to a particular value easy:

>>> d = ["reserved"]*2**16
>>> d[10:20] = [42]*10
>>> d[5], d[10], d[15], d[20]
('reserved', 42, 42, 'reserved')

Peter

Duncan Booth

unread,

Mar 27, 2009, 4:06:51 PM3/27/09

to

Carl Banks <pavlove...@gmail.com> wrote:

> On Mar 27, 11:20 am, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>> Carl Banks <pavlovevide...@gmail.com> writes:
>> > > if x in theDict:
>> > > print x, v
>>
>> > Where does v come from?
>>
>> Oops, pasted from original. Meant of course "print x, theDict[x]".
>
> You have look up x twice with that code, whereas you wouldn't have to
> with this:
>
> v = theDict.get(x)
> if v is not None:
> print x, v
>

Note that while you only lookup x in the dict once your code does still
involve two dict lookups: once to lookup the get method and once to lookup
x. It also involves creating a new bound method object so if performance is
a concern you'll find that either the 'if x in theDict: ...' or the
try...except are likely to be faster.

Carl Banks

unread,

Mar 27, 2009, 9:11:56 PM3/27/09

to

On Mar 27, 1:06 pm, Duncan Booth <duncan.bo...@invalid.invalid> wrote:

Not necessarily: if the hash calculation for x is expensive enough the
get version would still be faster. If all you're doing is looking up
strings or ints (as the OP is doing) it's hardly going to matter
either way.

Carl Banks

andrew cooke

unread,

Mar 27, 2009, 11:44:28 PM3/27/09

to pytho...@python.org

andrew cooke wrote:
> i don't completely follow what you are doing, but i currently use the
> following to find a transition in a finite automaton for a regular
> expression, and i suspect it's similar to what you want.

i get the impression the original poster went away, and maybe they just
wanted dict.get(x, default) anyway, but i ended up needing this, so here
it is. it makes a vague attempt to trade memory for lookup speed (which
will be in a tight loop) and is for open intervals (eg integer intervals,
not floats).

class IntervalMap(dict):
'''
Note - this is for open intervals! This means it will not work as
expected for continuous variables (which will overlap when two intervals
share a single boundary value). In other words, you cannot store
(1,2) and (2,3) together because both contain 2.
'''

def __init__(self):
# None is used as a flag to indicate that a new index is needed
self.__index = None

def index(self):
'''
Build the internal indices. Called automatically when necessary.
'''
second = lambda x: x[1]
self.__intervals = list(sorted(self.keys(), key=second))
self.__index = list(map(second, self.__intervals))

def __setitem__(self, interval, value):
# these are rather inefficient, but perhaps useful during development
assert None == self[interval[0]], 'Overlap'
assert None == self[interval[1]], 'Overlap'
self.__index = None
super(IntervalMap, self).__setitem__(interval, value)

def __getitem__(self, point):
'''
The argument here is a single value, not an interval.
'''
if self.__index is None:
self.index()
if self.__index:
index = bisect_left(self.__index, point)
if index < len(self.__index):
# keep interval for identity on retrieval, just in case
(a, b) = interval = self.__intervals[index]
if a <= point <= b:
return super(IntervalMap, self).__getitem__(interval)
return None

def __delitem__(self, interval):
self.__index = None
super(IntervalMap, self).__delitem__(interval)

andrew

Paul Rubin

unread,

Mar 28, 2009, 12:08:04 AM3/28/09

to

Carl Banks <pavlove...@gmail.com> writes:
> Not necessarily: if the hash calculation for x is expensive enough the
> get version would still be faster.

Yeah, the get version with the special marker value is just ugly IMO,
as is the version with exceptions. Maybe there should be a two-value
version:

found, value = d.xget(key [,default])

xget returns a 2-tuple, the first element of which is a boolean
indicating whether the key was present. The second is the value (if
present) or the default (if supplied), or None.

Another possible interface:

values = d.vget(key)

This returns a list containing the values found for the key, or an
empty list if none are found. In the case where d is a dict,
the list is therefore always either 1-element or empty, but one
could imagine other mapping types where longer lists could come back.

Message has been deleted

Paul Rubin

unread,

Mar 28, 2009, 2:56:40 PM3/28/09

to

Dennis Lee Bieber <wlf...@ix.netcom.com> writes:

> And what real difference is that going to gain versus the existing
> .get() method where the default is a sentinel?

It's just less ugly. I don't know a way to get a unique sentinel
other than "sentinel = object()" or something like that, creating more
clutter. Maybe in a specific application, I can figure out a value
that won't occur in the dictionary and use that, but it's cleaner to
use a generic construction.