Can I overload the compare (cmp()) function for a list's ([]) index function?

xkenneth

Sep 28, 2007, 1:30:52 PM

Looking to do something similar. I'm working with a lot of timestamps,
and if they're within a couple of seconds of each other I need them to
be indexed and removed from a list.
Is there any possible way to index with a custom cmp() function?

I assume it would be something like...

list.index(something,mycmp)

Thanks!

xkenneth

Sep 28, 2007, 1:36:54 PM

Or can I just say...

list.index.__cmp__ = mycmp

and do it that way? I just want to make sure I'm not doing anything
evil.

irs...@gmail.com

Sep 28, 2007, 2:06:07 PM

Wouldn't it be enough to get the items that are "within a couple of
seconds" out of the list and into another list? Then you can process
the other list however you want. Like this:

def isNew(x):
    return x < 5

data = range(20)
print data
out, data = filter(isNew, data), filter(lambda x: not isNew(x), data)
print out, data

Why do you want to use 'index'?

Your suggestion "list.index.__cmp__ = mycmp" certainly doesn't do
anything good. In fact, it just fails because the assignment is
illegal. I don't think any documentation suggests doing that, so why
are you even trying it? In general, it's just not a good idea to
invent semantics and hope that they work.
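
The failure is easy to confirm. Here is a minimal sketch, written for Python 3 (an assumption, since this thread predates it); `mycmp` is only a placeholder for the poster's comparison function:

```python
def mycmp(a, b):
    # Placeholder three-way comparison standing in for the poster's function.
    return (a > b) - (a < b)

try:
    # list.index is a built-in method descriptor and does not accept
    # new attributes, so this assignment raises instead of taking effect.
    list.index.__cmp__ = mycmp
    assignment_failed = False
except (AttributeError, TypeError):
    assignment_failed = True

print(assignment_failed)
```

Even if the assignment were allowed, list.index() would have no reason to look at such an attribute.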

Steven Bethard

Sep 28, 2007, 7:42:49 PM

irs...@gmail.com wrote:
> On Sep 28, 8:30 pm, xkenneth <xkenn...@gmail.com> wrote:
>> Looking to do something similar. I'm working with a lot of timestamps,
>> and if they're within a couple of seconds of each other I need them to
>> be indexed and removed from a list.
>> Is there any possible way to index with a custom cmp() function?
>>
>> I assume it would be something like...
>>
>> list.index(something,mycmp)
>>
>> Thanks!
>
> Wouldn't it be enough to get the items that are "within a couple of
> seconds" out of the list and into another list. Then you can process
> the other list however you want. Like this:
>
> def isNew(x):
>     return x < 5
>
> data = range(20)
> print data
> out, data = filter(isNew, data), filter(lambda x: not isNew(x), data)
> print out, data

Slightly off topic here, but these uses of filter will be slower than
the list comprehension equivalents::

    out = [x for x in data if x < 5]
    data = [x for x in data if x >= 5]

Here are sample timings::

    $ python -m timeit -s "data = range(20)" -s "def is_new(x): return x < 5" "filter(is_new, data)"
    100000 loops, best of 3: 5.05 usec per loop
    $ python -m timeit -s "data = range(20)" "[x for x in data if x < 5]"
    100000 loops, best of 3: 2.15 usec per loop

Functions like filter() and map() are really only more efficient when
you have an existing C-coded function, like ``map(str, items)``. Of
course, if the filter() code is clearer to you, feel free to use it, but
I find that most folks find list comprehensions easier to read than
map() and filter() code.
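
Applied to the original question, the same list-comprehension split might look like the sketch below. It is written for Python 3 and assumes the timestamps are plain numbers of seconds; `split_near`, `reference` and `tolerance` are made-up names, not anything from the thread:

```python
def split_near(timestamps, reference, tolerance=2.0):
    """Return (near, far): values within `tolerance` seconds of
    `reference`, and everything else, each in original order."""
    near = [t for t in timestamps if abs(t - reference) <= tolerance]
    far = [t for t in timestamps if abs(t - reference) > tolerance]
    return near, far

near, far = split_near([10.0, 11.5, 12.0, 20.0, 31.0], reference=11.0)
print(near)  # [10.0, 11.5, 12.0]
print(far)   # [20.0, 31.0]
```

This walks the list twice, as the two comprehensions above do; for short lists that is unlikely to matter.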

STeVe

Paul Rubin

Sep 28, 2007, 8:07:54 PM

xkenneth <xken...@gmail.com> writes:
> Looking to do something similar. I'm working with a lot of timestamps,
> and if they're within a couple of seconds of each other I need them to
> be indexed and removed from a list.
> Is there any possible way to index with a custom cmp() function?

This sounds like you want itertools.groupby. What is the exact
requirement?
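
Since the exact requirement isn't stated, here is only one possible reading, sketched for Python 3: sort the timestamps and start a new group whenever the gap to the previous one exceeds the tolerance. The stateful key function `cluster_key` is a hypothetical helper, and numeric timestamps in seconds are assumed:

```python
from itertools import groupby

def cluster_key(tolerance):
    """Build a stateful key function for groupby: the returned key
    increases by one each time the gap to the previous value
    exceeds `tolerance`. Input must already be sorted."""
    state = {"prev": None, "group": 0}

    def key(value):
        if state["prev"] is not None and value - state["prev"] > tolerance:
            state["group"] += 1
        state["prev"] = value
        return state["group"]

    return key

timestamps = sorted([130.0, 100.0, 101.5, 102.0, 110.0, 111.0])
clusters = [list(g) for _, g in groupby(timestamps, key=cluster_key(2.0))]
print(clusters)  # [[100.0, 101.5, 102.0], [110.0, 111.0], [130.0]]
```

Note that this chains neighbours together, so the first and last element of one cluster can be further apart than the tolerance.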

Hrvoje Niksic

Sep 28, 2007, 8:12:24 PM

xkenneth <xken...@gmail.com> writes:

The obvious option is reimplementing the functionality of index as an
explicit loop, such as:

def myindex(lst, something, mycmp):
    for i, el in enumerate(lst):
        if mycmp(el, something) == 0:
            return i
    raise ValueError("element not in list")

Looping in Python is slower than looping in C, but since you're
calling a Python function per element anyway, the loop overhead might
be negligible.
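
For the timestamp case, the same loop can take a one-argument predicate instead of a cmp-style function. This is a Python 3 sketch; `index_where` is a hypothetical name, and the timestamps are assumed to be numbers of seconds:

```python
def index_where(lst, predicate):
    """Return the index of the first element for which predicate(el)
    is true; raise ValueError if there is none, like list.index()."""
    for i, el in enumerate(lst):
        if predicate(el):
            return i
    raise ValueError("no matching element in list")

stamps = [100.0, 250.0, 251.5, 400.0]
# Find the first timestamp within 2 seconds of 252.0, then remove it.
i = index_where(stamps, lambda t: abs(t - 252.0) <= 2.0)
removed = stamps.pop(i)
print(i, removed, stamps)  # 1 250.0 [100.0, 251.5, 400.0]
```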

A more imaginative way is to take advantage of the fact that index
uses the '==' operator to look for the item. You can create an object
whose == operator calls your comparison function and use that object
as the argument to list.index:

class Cmp(object):
    def __init__(self, item, cmpfun):
        self.item = item
        self.cmpfun = cmpfun
    def __eq__(self, other):
        return self.cmpfun(self.item, other) == 0

# list.index(Cmp(something, mycmp))

For example:

>>> def mycmp(s1, s2):
...     return cmp(s1.lower(), s2.lower())
...
>>> ['foo', 'bar', 'baz'].index(Cmp('bar', mycmp))
1
>>> ['foo', 'bar', 'baz'].index(Cmp('Bar', mycmp))
1
>>> ['foo', 'bar', 'baz'].index(Cmp('nosuchelement', mycmp))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list.index(x): x not in list

The timeit module shows, somewhat surprisingly, that the first method
is ~1.5 times faster, even for larger lists.
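
The wrapper trick carries over to Python 3, where cmp() no longer exists, if the wrapper holds a predicate instead. The sketch below makes that assumption; `Matches` is a made-up name. It relies on the list element's own == declining the comparison, so that Python falls back to the wrapper's __eq__:

```python
class Matches:
    """Compares equal to any object for which predicate(obj) is true."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __eq__(self, other):
        return bool(self.predicate(other))

words = ['foo', 'Bar', 'baz']
# str.__eq__ returns NotImplemented for a Matches instance, so the
# reflected Matches.__eq__ decides the outcome for each element.
pos = words.index(Matches(lambda s: s.lower() == 'bar'))
print(pos)  # 1
```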

Gabriel Genellina

Sep 28, 2007, 10:52:24 PM

to pytho...@python.org

On Fri, 28 Sep 2007 14:36:54 -0300, xkenneth <xken...@gmail.com> wrote:

> On Sep 28, 12:30 pm, xkenneth <xkenn...@gmail.com> wrote:
>> Looking to do something similar. I'm working with a lot of timestamps,
>> and if they're within a couple of seconds of each other I need them to
>> be indexed and removed from a list.
>> Is there any possible way to index with a custom cmp() function?

The comparison is made by the list elements themselves (using their __eq__
or __cmp__), not by the index method or the list object.
So you would have to modify __cmp__ for all your timestamps
(datetime.datetime, I presume?), which is not very convenient. A workaround
is to wrap the object you are searching for in a new, different class: since
the list items won't know how to compare themselves to it, Python will try
reversing the operands.
datetime objects are a bit special in this respect: they refuse to
compare to anything else unless the other object has a `timetuple`
attribute (see <http://docs.python.org/lib/datetime-date.html>, note (4)).
<code>
import datetime

class datetime_tol(object):
    # unused, just to trigger the reverse comparison to datetime objects
    timetuple = None
    default_tolerance = datetime.timedelta(0, 10)

    def __init__(self, dt, tolerance=None):
        if tolerance is None:
            tolerance = self.default_tolerance
        self.dt = dt
        self.tolerance = tolerance

    def __cmp__(self, other):
        tolerance = self.tolerance
        if isinstance(other, datetime_tol):
            tolerance = min(tolerance, other.tolerance)
            other = other.dt
        if not isinstance(other, datetime.datetime):
            return cmp(self.dt, other)
        delta = self.dt - other
        return -1 if delta < -tolerance else 1 if delta > tolerance else 0

def index_tol(dtlist, dt, tolerance=None):
    return dtlist.index(datetime_tol(dt, tolerance))

d1 = datetime.datetime(2007, 7, 18, 9, 20, 0)
d2 = datetime.datetime(2007, 7, 18, 9, 30, 25)
d3 = datetime.datetime(2007, 7, 18, 9, 30, 30)
d4 = datetime.datetime(2007, 7, 18, 9, 30, 35)
d5 = datetime.datetime(2007, 7, 18, 9, 40, 0)
L = [d1, d2, d3, d4, d5]

assert d3 in L
assert L.index(d3) == 2
assert L.index(datetime_tol(d3)) == 1  # using 10sec tolerance
assert index_tol(L, d3) == 1
assert index_tol(L, datetime.datetime(2007, 7, 18, 9, 43, 20),
                 datetime.timedelta(0, 5*60)) == 4  # 5 minutes tolerance
</code>
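
The code above depends on __cmp__ and cmp(), which Python 3 no longer uses. A rough Python 3 counterpart, covering equality only, might look like the sketch below. `DatetimeTol` is a made-up name, and the sketch assumes that a datetime's == hands an unknown operand over to that operand's __eq__ rather than raising:

```python
import datetime

class DatetimeTol:
    """Compares equal to any datetime within `tolerance` of `dt`."""

    def __init__(self, dt, tolerance=datetime.timedelta(seconds=10)):
        self.dt = dt
        self.tolerance = tolerance

    def __eq__(self, other):
        if isinstance(other, DatetimeTol):
            # As in the original, use the tighter of the two tolerances.
            limit = min(self.tolerance, other.tolerance)
            return abs(self.dt - other.dt) <= limit
        if isinstance(other, datetime.datetime):
            return abs(self.dt - other) <= self.tolerance
        return NotImplemented

stamps = [
    datetime.datetime(2007, 7, 18, 9, 20, 0),
    datetime.datetime(2007, 7, 18, 9, 30, 25),
    datetime.datetime(2007, 7, 18, 9, 30, 30),
    datetime.datetime(2007, 7, 18, 9, 40, 0),
]
target = datetime.datetime(2007, 7, 18, 9, 30, 30)
exact = stamps.index(target)                    # exact match
first_close = stamps.index(DatetimeTol(target)) # first within 10 seconds
print(exact, first_close)  # 2 1
```

Ordering comparisons (<, >) are left out here; they would need __lt__ and friends rather than a single __cmp__.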

--
Gabriel Genellina

alan.h...@gmail.com

Oct 11, 2007, 1:10:11 AM

On Sep 28, 5:12 pm, Hrvoje Niksic <hnik...@xemacs.org> wrote:

Hrvoje,

That's fun! thx.

--Alan

The cut-n-paste version, w/ minor fix to use str.lower().
# ----------------------------------------------

class Cmp(object):
    def __init__(self, item, cmpfun):
        self.item = item
        self.cmpfun = cmpfun
    def __eq__(self, other):
        return self.cmpfun(self.item, other) == 0

def mycmp(s1, s2):
    return cmp(s1.lower(), s2.lower())

print ['foo', 'bar', 'baz'].index(Cmp('bar', mycmp))
print ['foo', 'bar', 'baz'].index(Cmp('Bar', mycmp))
try:
    print ['foo', 'bar', 'baz'].index(Cmp('nosuchelement', mycmp))
except ValueError:
    print "Search String not found!"

# end example