can I sort a multidimensional array in Python by multiple sort keys? A
litte code sample would be nice!
Thx,
Rehceb
You can pass a function as argument to the sort method of a list.
The function should take two arguments and return -1, 0 or 1 as
comparison result. Just like the cmp function.
This will objects in list obj_lst by their id attributes:
def sorter(a, b):
return cmp(a.id, b.id)
obj_lst.sort(sorter)
Thomas
If you want a good answer you have to give me/us more details, and an
example too.
Bye,
bearophile
OK, here is some example data:
reaction is BUT by the
sodium , BUT it is
sea , BUT it is
this manner BUT the dissolved
pattern , BUT it is
rapid , BUT it is
As each line consists of 5 words, I would break up the data into an array
of five-field-arrays (Would you use lists or tuples or a combination in
Python?). The word "BUT" would be in the middle, with two fields/words
left and two fields/words right of it. I then want to sort this list by
- field 3
- field 4
- field 1
- field 0
in this hierarchy. This is the desired result:
pattern , BUT it is
rapid , BUT it is
sea , BUT it is
sodium , BUT it is
reaction is BUT by the
this manner BUT the dissolved
The first 4 lines all could not be sorted by fields 3 & 4, as they are
identical ("it", "is"), so they have been sorted first by field 1 (which
is also identical: ",") and then by field 0:
pattern
rapid
sea
sodium
I hope I have explained this in an understandable way. It would be cool
if you could show me how this can be done in Python!
Regards,
Rehceb
reaction is BUT by the
pattern , BUT it is
rapid , BUT it is
sea , BUT it is
sodium , BUT it is
this manner BUT the dissolved
because "by the" comes before and "the dissolved" after "it is". Sorry
for the confusion.
You're probably looking for the key= argument to list.sort(). If your
function simply returns the fields in the order above, I believe you get
the right thing::
>>> s = '''\
... reaction is BUT by the
... sodium , BUT it is
... sea , BUT it is
... this manner BUT the dissolved
... pattern , BUT it is
... rapid , BUT it is
... '''
>>> word_lists = [line.split() for line in s.splitlines()]
>>> def key(word_list):
... return word_list[3], word_list[4], word_list[1], word_list[0]
...
>>> word_lists.sort(key=key)
>>> word_lists
[['reaction', 'is', 'BUT', 'by', 'the'],
['pattern', ',', 'BUT', 'it', 'is'],
['rapid', ',', 'BUT', 'it', 'is'],
['sea', ',', 'BUT', 'it', 'is'],
['sodium', ',', 'BUT', 'it', 'is'],
['this', 'manner', 'BUT', 'the', 'dissolved']]
STeVe
>>> data = [
"reaction is BUT by the",
"sodium , BUT it is",
"sea , BUT it is",
"this manner BUT the dissolved",
"pattern , BUT it is",
"rapid , BUT it is",
]
>>> data = [ s.split() for s in data]
>>> from pprint import pprint
>>> pprint(data)
[['reaction', 'is', 'BUT', 'by', 'the'],
['sodium', ',', 'BUT', 'it', 'is'],
['sea', ',', 'BUT', 'it', 'is'],
['this', 'manner', 'BUT', 'the', 'dissolved'],
['pattern', ',', 'BUT', 'it', 'is'],
['rapid', ',', 'BUT', 'it', 'is']]
>>> from operator import itemgetter
>>> data.sort(key=itemgetter(0))
>>> data.sort(key=itemgetter(1))
>>> data.sort(key=itemgetter(4))
>>> data.sort(key=itemgetter(3))
>>> pprint(data)
(snipped)
> As each line consists of 5 words, I would break up the data into an array
> of five-field-arrays (Would you use lists or tuples or a combination in
> Python?). The word "BUT" would be in the middle, with two fields/words
> left and two fields/words right of it. I then want to sort this list by
>
> - field 3
> - field 4
> - field 1
> - field 0
import StringIO
buf = """
reaction is BUT by the
sodium , BUT it is
sea , BUT it is
this manner BUT the dissolved
pattern , BUT it is
rapid , BUT it is
""".lstrip()
mockfile = StringIO.StringIO(buf)
tokens = [ line.split() + [ line ] for line in mockfile ]
tokens.sort(key=lambda l: (l[3], l[4], l[1], l[0]))
for l in tokens:
print l[-1],
--
Hope this helps,
Steven
>>>> from operator import itemgetter
>>>> data.sort(key=itemgetter(0))
>>>> data.sort(key=itemgetter(1))
>>>> data.sort(key=itemgetter(4))
>>>> data.sort(key=itemgetter(3))
Or, in Python 2.5:
>>> data.sort(key=itemgetter(3, 4, 1, 0))
Peter
Regards,
Rehceb
Thanks, I'd forgotten itemgetter had that strangley assymmetric behaviour
of returning either a single value or a tuple.
class MyList(list):
# This is the index of the element to be compared
CmpIndex=0
# Comparision methods
@staticmethod
def __cmp_pars(x,y):
if isinstance(x,MyList):
x=x[MyList.CmpIndex]
if isinstance(y,MyList):
y=y[MyList.CmpIndex]
return x,y
def __cmp__(self,other):
s,o=MyList.__cmp_pars(self,other)
return cmp(s,o)
def __lt__(self,other):
s,o=MyList.__cmp_pars(self,other)
return s<o
def __le__(self,other):
s,o=MyList.__cmp_pars(self,other)
return s<=o
def __gt__(self,other):
s,o=MyList.__cmp_pars(self,other)
return s>o
def __ge__(self,other):
s,o=MyList.__cmp_pars(self,other)
return s>=o
def __eq__(self,other):
s,o=MyList.__cmp_pars(self,other)
return s==o
def __ne__(self,other):
s,o=MyList.__cmp_pars(self,other)
return s!=o
Use:
x=MyList(<list of lists>)
MyList.CmpIndex=2 # Compare by index 2
x.sort()
May be there is a better solution ...
HTH
Paulo
A MUCH better way to obtain exactly the same semantics would be:
def getid(a):
return a.id
obj_list.sort(key=getid)
Alex
Frankly speaking the purpose of the example was to show how to pass a
function as argument for the sort method.
Your code may be more efficient but it explains something different.
Thomas
Yes, but there's almost never a reason to use the cmp= argument to
sort() anymore. It's almost always better to use the key= argument.
STeVe
I always use key now, but maybe cmp uses less memory. There can be few
situations where cmp is better still.
Bye,
bearophile
Exactly. Passing a "comparison function" rather than a key-extraction
function is rarely a good idea -- except for some extremely complicated
cases involving e.g. certain string (or more complicated) fields needing
to be sorted in the reverse direction from others. In practice, the
existence of that argument and its prominent position as the first
positional argument is due entirely to backwards compatibility issues.
Alex