So there are a couple of ways to do this that I've seen. I can loop
that many times and create a huge dictionary. This isn't a good idea.
I can just assume if a key isn't there that it's not relevant. That's
a better idea.
However I wondered if there was a way to simply use a range as a key
reference somehow. I played around with some options with no success.
Or maybe there was another way to do this with another data type that
I haven't thought about.
Thanks
Ranges are lists and lists are unhashable. But tuples are hashable so
if you convert your lists into tuples, that will work:
>>> x = dict( )
>>> x[range(10)]='hello'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable
>>> x[tuple(range(10))]='hello'
>>> x[tuple(range(10))]
'hello'
>>>
--
Psss, psss, put it down! - http://www.cafepress.com/putitdown
i don't completely follow what you are doing, but i currently use the
following to find a transition in a finite automaton for a regular
expression, and i suspect it's similar to what you want.
my problem is that i want to associate certain ranges of characters with a
particular value. for example, a-c might be 1, p-x might be 2. i use
intervals that are (a, b) tuples, where a and b are the inclusive bounds
(so (a, b) means a <= x <= b).
i have a class that is a collection of these intervals, called Character.
from bisect import bisect_left
class Character:
def __init__(self, intervals):
'''
Note that intervals must already be sorted.
'''
self.__intervals = intervals
self.__index = [interval[1] for interval in intervals]
def __contains__(self, c):
'''
Does the value lie within the intervals?
'''
if self.__index:
index = bisect_left(self.__index, c)
if index < len(self.__intervals):
(a, b) = self.__intervals[index]
return a <= c <= b
return False
this lets me test for whether things are in range:
>>> c = Character([('a','c'),('p','x')])
>>> 'a' in c
True
>>> 'm' in c
False
and i just realised that i deleted the code that lets me also associate
values with intervals! but hopefully that gives you the idea. bisect can
be used to find values via a sorted index. so once you find "index"
above, you can use that to return a value from an array.
i doubt this is super-fast (i think the bisect library is pure python),
but it was good enough for my initial attempt.
andrew
Dictionaries aren't really the right structure for that. You want a
tree or binary search structure of some sort. The bisect module might
be helpful, but you will have to write some code by hand if you use
it.
A programmer has to know the name of many data structures :-)
Bye,
bearophile
I wonder if there is a faster method than a binary search, or what the
proof that there isn't is.
I seems like all you really need it to use the get method. If the
item doesn't exist in the dictionary it returns None (or whatever you
pass in as the optional second argument). For instance:
d = { 1: "s", 56: "w", 4363: "n", 8953: "k" }
d.get(1) -> "s"
d.get(56) -> "w"
d.get(10) -> None
d.get(10,"x") -> "x"
Carl Banks
I think you'd normally do this with
if x in theDict:
print x, v
but the OP was asking about a different problem, involving looking up
numeric ranges, which could conceivably be very large, thus the
suggestions about interval trees and so forth.
Where does v come from?
> but the OP was asking about a different problem, involving looking up
> numeric ranges, which could conceivably be very large, thus the
> suggestions about interval trees and so forth.
The fact that the OP asked about what to do with the the other 65526
values suggests that he only wanted to use intervals to fill in the
gaps between meaningful values (thus having complete coverage over the
16-bit integer range). An interval tree would not be a good approach
for that situation.
Carl Banks
Oops, pasted from original. Meant of course "print x, theDict[x]".
You have look up x twice with that code, whereas you wouldn't have to
with this:
v = theDict.get(x)
if v is not None:
print x, v
or with this:
try:
v = theDict[x]
except KeyError:
pass
else:
print x,v
Also, these minimize the number of times you have to type x. Thus I
recommend either of these two, or a defaultdict, for this.
Carl Banks
You can use a defaultdict
>>> from collections import defaultdict
>>> d = defaultdict(lambda: "reserved", [(1,2), (2,3)])
>>> d[1]
2
>>> d[2]
3
>>> d[42]
'reserved'
If the keys are successive integers starting at 0 a list is also an option.
It makes setting ranges to a particular value easy:
>>> d = ["reserved"]*2**16
>>> d[10:20] = [42]*10
>>> d[5], d[10], d[15], d[20]
('reserved', 42, 42, 'reserved')
Peter
> On Mar 27, 11:20 am, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>> Carl Banks <pavlovevide...@gmail.com> writes:
>> > > if x in theDict:
>> > > print x, v
>>
>> > Where does v come from?
>>
>> Oops, pasted from original. Meant of course "print x, theDict[x]".
>
> You have look up x twice with that code, whereas you wouldn't have to
> with this:
>
> v = theDict.get(x)
> if v is not None:
> print x, v
>
Note that while you only lookup x in the dict once your code does still
involve two dict lookups: once to lookup the get method and once to lookup
x. It also involves creating a new bound method object so if performance is
a concern you'll find that either the 'if x in theDict: ...' or the
try...except are likely to be faster.
Not necessarily: if the hash calculation for x is expensive enough the
get version would still be faster. If all you're doing is looking up
strings or ints (as the OP is doing) it's hardly going to matter
either way.
Carl Banks
i get the impression the original poster went away, and maybe they just
wanted dict.get(x, default) anyway, but i ended up needing this, so here
it is. it makes a vague attempt to trade memory for lookup speed (which
will be in a tight loop) and is for open intervals (eg integer intervals,
not floats).
class IntervalMap(dict):
'''
Note - this is for open intervals! This means it will not work as
expected for continuous variables (which will overlap when two intervals
share a single boundary value). In other words, you cannot store
(1,2) and (2,3) together because both contain 2.
'''
def __init__(self):
# None is used as a flag to indicate that a new index is needed
self.__index = None
def index(self):
'''
Build the internal indices. Called automatically when necessary.
'''
second = lambda x: x[1]
self.__intervals = list(sorted(self.keys(), key=second))
self.__index = list(map(second, self.__intervals))
def __setitem__(self, interval, value):
# these are rather inefficient, but perhaps useful during development
assert None == self[interval[0]], 'Overlap'
assert None == self[interval[1]], 'Overlap'
self.__index = None
super(IntervalMap, self).__setitem__(interval, value)
def __getitem__(self, point):
'''
The argument here is a single value, not an interval.
'''
if self.__index is None:
self.index()
if self.__index:
index = bisect_left(self.__index, point)
if index < len(self.__index):
# keep interval for identity on retrieval, just in case
(a, b) = interval = self.__intervals[index]
if a <= point <= b:
return super(IntervalMap, self).__getitem__(interval)
return None
def __delitem__(self, interval):
self.__index = None
super(IntervalMap, self).__delitem__(interval)
andrew
Yeah, the get version with the special marker value is just ugly IMO,
as is the version with exceptions. Maybe there should be a two-value
version:
found, value = d.xget(key [,default])
xget returns a 2-tuple, the first element of which is a boolean
indicating whether the key was present. The second is the value (if
present) or the default (if supplied), or None.
Another possible interface:
values = d.vget(key)
This returns a list containing the values found for the key, or an
empty list if none are found. In the case where d is a dict,
the list is therefore always either 1-element or empty, but one
could imagine other mapping types where longer lists could come back.
It's just less ugly. I don't know a way to get a unique sentinel
other than "sentinel = object()" or something like that, creating more
clutter. Maybe in a specific application, I can figure out a value
that won't occur in the dictionary and use that, but it's cleaner to
use a generic construction.