Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Sorting a multidimensional array by multiple keys

608 views
Skip to first unread message

Rehceb Rotkiv

unread,
Mar 31, 2007, 8:43:08 AM3/31/07
to
Hello everyone,

can I sort a multidimensional array in Python by multiple sort keys? A
litte code sample would be nice!

Thx,
Rehceb

Thomas Krüger

unread,
Mar 31, 2007, 8:53:11 AM3/31/07
to
Rehceb Rotkiv schrieb:

> can I sort a multidimensional array in Python by multiple sort keys? A
> litte code sample would be nice!

You can pass a function as argument to the sort method of a list.
The function should take two arguments and return -1, 0 or 1 as
comparison result. Just like the cmp function.

This will objects in list obj_lst by their id attributes:

def sorter(a, b):
return cmp(a.id, b.id)

obj_lst.sort(sorter)

Thomas

bearoph...@lycos.com

unread,
Mar 31, 2007, 9:13:46 AM3/31/07
to
Rehceb Rotkiv:

> can I sort a multidimensional array in Python by multiple sort keys? A
> litte code sample would be nice!

If you want a good answer you have to give me/us more details, and an
example too.

Bye,
bearophile

Rehceb Rotkiv

unread,
Mar 31, 2007, 9:42:12 AM3/31/07
to
> If you want a good answer you have to give me/us more details, and an
> example too.

OK, here is some example data:

reaction is BUT by the
sodium , BUT it is
sea , BUT it is
this manner BUT the dissolved
pattern , BUT it is
rapid , BUT it is

As each line consists of 5 words, I would break up the data into an array
of five-field-arrays (Would you use lists or tuples or a combination in
Python?). The word "BUT" would be in the middle, with two fields/words
left and two fields/words right of it. I then want to sort this list by

- field 3
- field 4
- field 1
- field 0

in this hierarchy. This is the desired result:

pattern , BUT it is
rapid , BUT it is
sea , BUT it is
sodium , BUT it is
reaction is BUT by the
this manner BUT the dissolved

The first 4 lines all could not be sorted by fields 3 & 4, as they are
identical ("it", "is"), so they have been sorted first by field 1 (which
is also identical: ",") and then by field 0:

pattern
rapid
sea
sodium

I hope I have explained this in an understandable way. It would be cool
if you could show me how this can be done in Python!

Regards,
Rehceb

Rehceb Rotkiv

unread,
Mar 31, 2007, 9:45:25 AM3/31/07
to
Wait, I made a mistake. The correct result would be

reaction is BUT by the

pattern , BUT it is
rapid , BUT it is
sea , BUT it is
sodium , BUT it is

this manner BUT the dissolved

because "by the" comes before and "the dissolved" after "it is". Sorry
for the confusion.

Steven Bethard

unread,
Mar 31, 2007, 11:40:02 AM3/31/07
to
Rehceb Rotkiv wrote:
>> If you want a good answer you have to give me/us more details, and an
>> example too.
>
> OK, here is some example data:
>
> reaction is BUT by the
> sodium , BUT it is
> sea , BUT it is
> this manner BUT the dissolved
> pattern , BUT it is
> rapid , BUT it is
>
> As each line consists of 5 words, I would break up the data into an array
> of five-field-arrays (Would you use lists or tuples or a combination in
> Python?). The word "BUT" would be in the middle, with two fields/words
> left and two fields/words right of it. I then want to sort this list by
>
> - field 3
> - field 4
> - field 1
> - field 0

You're probably looking for the key= argument to list.sort(). If your
function simply returns the fields in the order above, I believe you get
the right thing::

>>> s = '''\
... reaction is BUT by the
... sodium , BUT it is
... sea , BUT it is
... this manner BUT the dissolved
... pattern , BUT it is
... rapid , BUT it is
... '''
>>> word_lists = [line.split() for line in s.splitlines()]
>>> def key(word_list):
... return word_list[3], word_list[4], word_list[1], word_list[0]
...
>>> word_lists.sort(key=key)
>>> word_lists
[['reaction', 'is', 'BUT', 'by', 'the'],
['pattern', ',', 'BUT', 'it', 'is'],
['rapid', ',', 'BUT', 'it', 'is'],
['sea', ',', 'BUT', 'it', 'is'],
['sodium', ',', 'BUT', 'it', 'is'],
['this', 'manner', 'BUT', 'the', 'dissolved']]

STeVe

Duncan Booth

unread,
Mar 31, 2007, 11:44:34 AM3/31/07
to
Rehceb Rotkiv <reh...@no.spam.plz> wrote:

>>> data = [
"reaction is BUT by the",
"sodium , BUT it is",
"sea , BUT it is",
"this manner BUT the dissolved",
"pattern , BUT it is",
"rapid , BUT it is",
]
>>> data = [ s.split() for s in data]
>>> from pprint import pprint
>>> pprint(data)


[['reaction', 'is', 'BUT', 'by', 'the'],

['sodium', ',', 'BUT', 'it', 'is'],
['sea', ',', 'BUT', 'it', 'is'],

['this', 'manner', 'BUT', 'the', 'dissolved'],


['pattern', ',', 'BUT', 'it', 'is'],

['rapid', ',', 'BUT', 'it', 'is']]
>>> from operator import itemgetter
>>> data.sort(key=itemgetter(0))
>>> data.sort(key=itemgetter(1))
>>> data.sort(key=itemgetter(4))
>>> data.sort(key=itemgetter(3))
>>> pprint(data)

attn.st...@gmail.com

unread,
Mar 31, 2007, 12:14:13 PM3/31/07
to
On Mar 31, 6:42 am, Rehceb Rotkiv <reh...@no.spam.plz> wrote:

(snipped)

> As each line consists of 5 words, I would break up the data into an array
> of five-field-arrays (Would you use lists or tuples or a combination in
> Python?). The word "BUT" would be in the middle, with two fields/words
> left and two fields/words right of it. I then want to sort this list by
>
> - field 3
> - field 4
> - field 1
> - field 0

import StringIO

buf = """


reaction is BUT by the
sodium , BUT it is
sea , BUT it is
this manner BUT the dissolved
pattern , BUT it is
rapid , BUT it is

""".lstrip()

mockfile = StringIO.StringIO(buf)

tokens = [ line.split() + [ line ] for line in mockfile ]
tokens.sort(key=lambda l: (l[3], l[4], l[1], l[0]))
for l in tokens:
print l[-1],


--
Hope this helps,
Steven

Peter Otten

unread,
Mar 31, 2007, 12:39:48 PM3/31/07
to
Duncan Booth wrote:

>>>> from operator import itemgetter
>>>> data.sort(key=itemgetter(0))
>>>> data.sort(key=itemgetter(1))
>>>> data.sort(key=itemgetter(4))
>>>> data.sort(key=itemgetter(3))

Or, in Python 2.5:

>>> data.sort(key=itemgetter(3, 4, 1, 0))

Peter

Rehceb Rotkiv

unread,
Mar 31, 2007, 2:19:52 PM3/31/07
to
Thank you all for your helpful solutions!

Regards,
Rehceb

Duncan Booth

unread,
Mar 31, 2007, 3:57:10 PM3/31/07
to
Peter Otten <__pet...@web.de> wrote:

Thanks, I'd forgotten itemgetter had that strangley assymmetric behaviour
of returning either a single value or a tuple.

Paulo da Silva

unread,
Apr 1, 2007, 10:03:23 PM4/1/07
to
Rehceb Rotkiv escreveu:

> Hello everyone,
>
> can I sort a multidimensional array in Python by multiple sort keys? A
> litte code sample would be nice!

class MyList(list):
# This is the index of the element to be compared
CmpIndex=0

# Comparision methods
@staticmethod
def __cmp_pars(x,y):
if isinstance(x,MyList):
x=x[MyList.CmpIndex]
if isinstance(y,MyList):
y=y[MyList.CmpIndex]
return x,y
def __cmp__(self,other):
s,o=MyList.__cmp_pars(self,other)
return cmp(s,o)
def __lt__(self,other):
s,o=MyList.__cmp_pars(self,other)
return s<o
def __le__(self,other):
s,o=MyList.__cmp_pars(self,other)
return s<=o
def __gt__(self,other):
s,o=MyList.__cmp_pars(self,other)
return s>o
def __ge__(self,other):
s,o=MyList.__cmp_pars(self,other)
return s>=o
def __eq__(self,other):
s,o=MyList.__cmp_pars(self,other)
return s==o
def __ne__(self,other):
s,o=MyList.__cmp_pars(self,other)
return s!=o

Use:

x=MyList(<list of lists>)
MyList.CmpIndex=2 # Compare by index 2
x.sort()

May be there is a better solution ...
HTH
Paulo

Alex Martelli

unread,
Apr 2, 2007, 1:07:52 AM4/2/07
to
Thomas Krüger <newsg...@nospam.nowire.org> wrote:

A MUCH better way to obtain exactly the same semantics would be:

def getid(a):
return a.id

obj_list.sort(key=getid)


Alex

Thomas Krüger

unread,
Apr 2, 2007, 2:28:58 AM4/2/07
to
Alex Martelli schrieb:

> Thomas Krüger <newsg...@nospam.nowire.org> wrote:
>> def sorter(a, b):
>> return cmp(a.id, b.id)
>>
>> obj_lst.sort(sorter)
>
> A MUCH better way to obtain exactly the same semantics would be:
>
> def getid(a):
> return a.id
>
> obj_list.sort(key=getid)

Frankly speaking the purpose of the example was to show how to pass a
function as argument for the sort method.
Your code may be more efficient but it explains something different.

Thomas

Steven Bethard

unread,
Apr 2, 2007, 10:06:47 AM4/2/07
to

Yes, but there's almost never a reason to use the cmp= argument to
sort() anymore. It's almost always better to use the key= argument.

STeVe

bearoph...@lycos.com

unread,
Apr 2, 2007, 10:21:48 AM4/2/07
to
Steven Bethard:

> there's almost never a reason to use the cmp= argument to
> sort() anymore. It's almost always better to use the key= argument.

I always use key now, but maybe cmp uses less memory. There can be few
situations where cmp is better still.

Bye,
bearophile

Alex Martelli

unread,
Apr 2, 2007, 11:02:51 AM4/2/07
to
Steven Bethard <steven....@gmail.com> wrote:

Exactly. Passing a "comparison function" rather than a key-extraction
function is rarely a good idea -- except for some extremely complicated
cases involving e.g. certain string (or more complicated) fields needing
to be sorted in the reverse direction from others. In practice, the
existence of that argument and its prominent position as the first
positional argument is due entirely to backwards compatibility issues.


Alex

0 new messages