I assume it would be something like...
list.index(something,mycmp)
Thanks!
Or can I just say...
list.index.__cmp__ = mycmp
and do it that way? I just want to make sure I'm not doing anything
evil.
Wouldn't it be enough to get the items that are "within a couple of
seconds" out of the list and into another list. Then you can process
the other list however you want. Like this:
def isNew(x):
    return x < 5

data = range(20)
print data
out, data = filter(isNew, data), filter(lambda x: not isNew(x), data)
print out, data
Why do you want to use 'index'?
Your suggestion "list.index.__cmp__ = mycmp" certainly doesn't do
anything good. In fact, it just fails because the assignment is
illegal. I don't think any documentation suggests doing that, so why
are you even trying to do that? It's just not a good idea to invent
semantics and hope that they work, in general.
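For what it's worth, here is a quick way to see the failure for yourself. This is only a minimal sketch: mycmp is a placeholder, try_patch is a made-up helper name, and since the exact exception type may differ between Python versions, both likely candidates are caught.

```python
def mycmp(a, b):
    # Placeholder comparison function, only used to attempt the assignment.
    return (a > b) - (a < b)


def try_patch():
    """Attempt the assignment from the question and report what happens."""
    try:
        # list.index is a built-in method descriptor without writable attributes.
        list.index.__cmp__ = mycmp
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__
    return "assigned"


result = try_patch()
print(result)
```

Either way the interpreter refuses, so list.index itself is left untouched.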
Slightly off topic here, but these uses of filter will be slower than
the list comprehension equivalents::
    out = [x for x in data if x < 5]
    data = [x for x in data if x >= 5]
Here are sample timings::
    $ python -m timeit -s "data = range(20)" -s "def is_new(x): return x < 5" "filter(is_new, data)"
    100000 loops, best of 3: 5.05 usec per loop
    $ python -m timeit -s "data = range(20)" "[x for x in data if x < 5]"
    100000 loops, best of 3: 2.15 usec per loop
Functions like filter() and map() are really only more efficient when
you have an existing C-coded function, like ``map(str, items)``. Of
course, if the filter() code is clearer to you, feel free to use it, but
I find that most folks find list comprehensions easier to read than
map() and filter() code.
STeVe
This sounds like you want itertools.groupby. What is the exact
requirement?
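Until the requirement is spelled out, here is one possible reading, sketched with groupby: sort the timestamps and put neighbours that are no more than some tolerance apart into the same cluster. The function name cluster_by_gap, the 10-second tolerance and the sample datetimes are all assumptions for illustration, not something the original poster specified.

```python
import datetime
from itertools import groupby


def cluster_by_gap(timestamps, tolerance):
    """Group sorted timestamps; a new cluster starts when the gap to the
    previous timestamp is larger than `tolerance`."""
    ordered = sorted(timestamps)
    labelled = []
    label = 0
    for i, ts in enumerate(ordered):
        if i > 0 and ts - ordered[i - 1] > tolerance:
            label += 1  # gap too large: start a new cluster
        labelled.append((label, ts))
    # groupby collects consecutive items that share the same label
    return [[ts for _, ts in group]
            for _, group in groupby(labelled, key=lambda pair: pair[0])]


stamps = [
    datetime.datetime(2007, 7, 18, 9, 20, 0),
    datetime.datetime(2007, 7, 18, 9, 30, 25),
    datetime.datetime(2007, 7, 18, 9, 30, 30),
    datetime.datetime(2007, 7, 18, 9, 30, 35),
    datetime.datetime(2007, 7, 18, 9, 40, 0),
]
clusters = cluster_by_gap(stamps, datetime.timedelta(seconds=10))
print([len(c) for c in clusters])
```

Note that this chains neighbours together, so the first and last item of a cluster can be further apart than the tolerance. Whether that is acceptable depends on the actual requirement.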
The obvious option is reimplementing the functionality of index as an
explicit loop, such as:
def myindex(lst, something, mycmp):
    for i, el in enumerate(lst):
        if mycmp(el, something) == 0:
            return i
    raise ValueError("element not in list")
Looping in Python is slower than looping in C, but since you're
calling a Python function per element anyway, the loop overhead might
be negligible.
A more imaginative way is to take advantage of the fact that index
uses the '==' operator to look for the item. You can create an object
whose == operator calls your comparison function and use that object
as the argument to list.index:
class Cmp(object):
    def __init__(self, item, cmpfun):
        self.item = item
        self.cmpfun = cmpfun
    def __eq__(self, other):
        return self.cmpfun(self.item, other) == 0

# list.index(Cmp(something, mycmp))
For example:
>>> def mycmp(s1, s2):
...     return cmp(s1.lower(), s2.lower())
...
>>> ['foo', 'bar', 'baz'].index(Cmp('bar', mycmp))
1
>>> ['foo', 'bar', 'baz'].index(Cmp('Bar', mycmp))
1
>>> ['foo', 'bar', 'baz'].index(Cmp('nosuchelement', mycmp))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: list.index(x): x not in list
The timeit module shows, somewhat surprisingly, that the first method
is ~1.5 times faster, even for larger lists.
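If you want to repeat the measurement, something along these lines should do. It is only a sketch: the list size, the repeat count and the local cmp helper (a stand-in so the code also runs on Python 3, which has no built-in cmp) are my own choices, and the numbers will differ from machine to machine.

```python
import timeit


def cmp(a, b):
    # Stand-in for the Python 2 built-in, which Python 3 removed.
    return (a > b) - (a < b)


def mycmp(s1, s2):
    return cmp(s1.lower(), s2.lower())


def myindex(lst, something, mycmp):
    # First method: explicit loop.
    for i, el in enumerate(lst):
        if mycmp(el, something) == 0:
            return i
    raise ValueError("element not in list")


class Cmp(object):
    # Second method: wrapper whose == calls the comparison function.
    def __init__(self, item, cmpfun):
        self.item = item
        self.cmpfun = cmpfun

    def __eq__(self, other):
        return self.cmpfun(self.item, other) == 0


words = ["w%d" % i for i in range(1000)]
target = "W999"  # matches the last element, so the whole list is scanned

loop_time = timeit.timeit(lambda: myindex(words, target, mycmp), number=200)
wrap_time = timeit.timeit(lambda: words.index(Cmp(target, mycmp)), number=200)
print("explicit loop: %.4fs  Cmp wrapper: %.4fs" % (loop_time, wrap_time))
```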
> On Sep 28, 12:30 pm, xkenneth <xkenn...@gmail.com> wrote:
>> Looking to do something similar. I'm working with a lot of timestamps
>> and if they're within a couple of seconds I need them to be indexed and
>> removed from a list.
>> Is there any possible way to index with a custom cmp() function?
The comparison is made by the list elements themselves (using their __eq__
or __cmp__), not by the index method nor the list object.
So you should modify __cmp__ for all your timestamps (datetime.datetime, I
presume?), but that's not very convenient. A workaround is to wrap the
object you are searching for in a new, different class. Since the list
items won't know how to compare to it, Python will try reversing the
operands.
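A small sketch of that reversal, using plain ints as list items. Recorder is a made-up class for this demonstration; it just notes which items it was asked to compare against.

```python
class Recorder(object):
    """Wrapper whose __eq__ records every object it is compared with."""

    def __init__(self, value):
        self.value = value
        self.seen = []

    def __eq__(self, other):
        # Reached because the int items do not know how to compare
        # themselves to a Recorder, so Python tries the reversed operands.
        self.seen.append(other)
        return other == self.value


needle = Recorder(3)
position = [1, 2, 3, 4].index(needle)
print(position)
print(needle.seen)
```

The search stops at the first item for which the wrapper reports equality, so later items are never looked at.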
datetime objects are a bit special in this behavior: they refuse to
compare to anything else unless the other object has a `timetuple`
attribute (see <http://docs.python.org/lib/datetime-date.html> note (4)):
<code>
import datetime

class datetime_tol(object):
    # unused, just to trigger the reverse comparison to datetime objects
    timetuple = None
    default_tolerance = datetime.timedelta(0, 10)

    def __init__(self, dt, tolerance=None):
        if tolerance is None:
            tolerance = self.default_tolerance
        self.dt = dt
        self.tolerance = tolerance

    def __cmp__(self, other):
        tolerance = self.tolerance
        if isinstance(other, datetime_tol):
            tolerance = min(tolerance, other.tolerance)
            other = other.dt
        if not isinstance(other, datetime.datetime):
            return cmp(self.dt, other)
        delta = self.dt-other
        return -1 if delta<-tolerance else 1 if delta>tolerance else 0

def index_tol(dtlist, dt, tolerance=None):
    return dtlist.index(datetime_tol(dt, tolerance))

d1 = datetime.datetime(2007, 7, 18, 9, 20, 0)
d2 = datetime.datetime(2007, 7, 18, 9, 30, 25)
d3 = datetime.datetime(2007, 7, 18, 9, 30, 30)
d4 = datetime.datetime(2007, 7, 18, 9, 30, 35)
d5 = datetime.datetime(2007, 7, 18, 9, 40, 0)
L = [d1,d2,d3,d4,d5]

assert d3 in L
assert L.index(d3)==2
assert L.index(datetime_tol(d3))==1 # using 10sec tolerance
assert index_tol(L, d3)==1
assert index_tol(L, datetime.datetime(2007, 7, 18, 9, 43, 20),
                 datetime.timedelta(0, 5*60))==4 # 5 minutes tolerance
</code>
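For readers on Python 3, where __cmp__ is no longer used, the same idea can be sketched with __eq__ alone. This is not the code above, just an adaptation under the assumption that a datetime returns NotImplemented when compared for equality with an unrelated object, so the wrapper's __eq__ gets its turn. The names WithinTolerance and index_within are made up for this sketch.

```python
import datetime


class WithinTolerance(object):
    """Compares equal to any datetime no further than `tolerance` away."""

    def __init__(self, dt, tolerance=datetime.timedelta(seconds=10)):
        self.dt = dt
        self.tolerance = tolerance

    def __eq__(self, other):
        if not isinstance(other, datetime.datetime):
            return NotImplemented
        return abs(self.dt - other) <= self.tolerance


def index_within(dtlist, dt, tolerance=datetime.timedelta(seconds=10)):
    # list.index does the scanning; the wrapper decides what counts as equal.
    return dtlist.index(WithinTolerance(dt, tolerance))


d1 = datetime.datetime(2007, 7, 18, 9, 20, 0)
d2 = datetime.datetime(2007, 7, 18, 9, 30, 25)
d3 = datetime.datetime(2007, 7, 18, 9, 30, 30)
d4 = datetime.datetime(2007, 7, 18, 9, 30, 35)
d5 = datetime.datetime(2007, 7, 18, 9, 40, 0)
stamps = [d1, d2, d3, d4, d5]

first_near_d3 = index_within(stamps, d3)
near_late = index_within(stamps,
                         datetime.datetime(2007, 7, 18, 9, 43, 20),
                         datetime.timedelta(minutes=5))
print(first_near_d3, near_late)
```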
--
Gabriel Genellina
Hrvoje,
That's fun! thx.
--Alan
The cut-and-paste version, using str.lower() in mycmp:
# ----------------------------------------------
class Cmp(object):
    def __init__(self, item, cmpfun):
        self.item = item
        self.cmpfun = cmpfun
    def __eq__(self, other):
        return self.cmpfun(self.item, other) == 0

def mycmp(s1, s2):
    return cmp(s1.lower(), s2.lower())

print ['foo', 'bar', 'baz'].index(Cmp('bar', mycmp))
print ['foo', 'bar', 'baz'].index(Cmp('Bar', mycmp))
try:
    print ['foo', 'bar', 'baz'].index(Cmp('nosuchelement', mycmp))
except ValueError:
    print "Search String not found!"
# end example