sort array, apply rearrangement to second

Victor Eijkhout

Mar 30, 2010, 7:25:38 PM
I have two arrays, made with numpy. The first one has values that I want
to use as sorting keys; the second one needs to be sorted by those keys.
Obviously I could turn them into a dictionary of pairs and sort by the
first member, but I think that's not very efficient, at least in space,
and this needs to be done as efficiently as possible.

I could use a hand.

Victor.
--
Victor Eijkhout -- eijkhout at tacc utexas edu

Alf P. Steinbach

Mar 30, 2010, 7:45:35 PM
* Victor Eijkhout:

> I have two arrays, made with numpy. The first one has values that I want
> to use as sorting keys; the second one needs to be sorted by those keys.
> Obviously I could turn them into a dictionary of pairs and sort by the
> first member, but I think that's not very efficient, at least in space,
> and this needs to be done as efficiently as possible.
>
> I could use a hand.

Just do the pairing, but in a 'list', not a dictionary (a dictionary is
unordered and can't be sorted). You need to keep track of which keys belong to
which values anyway. And anything in Python is a reference: you're not copying
the data by creating the pairs. That is, the space overhead is proportional to
the number of items but is independent of the data size of each item.
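
A minimal sketch of that pairing approach, assuming two plain sequences named keys and values (hypothetical names and sample data, chosen only for illustration):

```python
from operator import itemgetter

keys = [50, 20, 40, 10, 30]
values = ['A', 'B', 'C', 'D', 'E']

# Pair each key with its value; the tuples hold references, not copies.
pairs = list(zip(keys, values))

# Sort on the key only, so the values never need to be comparable.
pairs.sort(key=itemgetter(0))

sorted_values = [value for _, value in pairs]
print(sorted_values)
```

Sorting with key=itemgetter(0) rather than sorting the bare tuples keeps ties between equal keys from falling through to a comparison of the values.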


Cheers & hth.,

- Alf

Steve Holden

Mar 30, 2010, 7:56:25 PM
to pytho...@python.org
Victor Eijkhout wrote:
> I have two arrays, made with numpy. The first one has values that I want
> to use as sorting keys; the second one needs to be sorted by those keys.
> Obviously I could turn them into a dictionary of pairs and sort by the
> first member, but I think that's not very efficient, at least in space,
> and this needs to be done as efficiently as possible.
>
> I could use a hand.
>
Well, my first approach would be to do it as inefficiently as I can (or
at least no more efficiently than I can with a simple-minded approach)
and then take it from there.

If I believe this is not a premature optimization (a question about
which I am currently skeptical) I'd suggest conversion to a list of
pairs rather than a dict.

Can you use zip() on numpy arrays? That would be the easiest way to
create the list of pairs.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
See PyCon Talks from Atlanta 2010 http://pycon.blip.tv/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/

MRAB

Mar 30, 2010, 8:13:16 PM
to pytho...@python.org
Victor Eijkhout wrote:
> I have two arrays, made with numpy. The first one has values that I want
> to use as sorting keys; the second one needs to be sorted by those keys.
> Obviously I could turn them into a dictionary of pairs and sort by the
> first member, but I think that's not very efficient, at least in space,
> and this needs to be done as efficiently as possible.
>
> I could use a hand.
>
You could sort a list of the indices, using the first array to provide
the keys.

Robert Kern

Mar 30, 2010, 11:00:38 PM
to pytho...@python.org
On 2010-03-30 18:25 , Victor Eijkhout wrote:
> I have two arrays, made with numpy. The first one has values that I want
> to use as sorting keys; the second one needs to be sorted by those keys.
> Obviously I could turn them into a dictionary of pairs and sort by the
> first member, but I think that's not very efficient, at least in space,
> and this needs to be done as efficiently as possible.

second[first.argsort()]
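
Spelled out as a small sketch, assuming two equal-length one-dimensional arrays (the sample data and the names order, sorted_first and sorted_second are illustrative only):

```python
import numpy as np

first = np.array([50, 20, 40, 10, 30])        # sorting keys
second = np.array(['A', 'B', 'C', 'D', 'E'])  # data to rearrange

# argsort returns the indices that would put `first` in ascending order.
order = first.argsort()

# Indexing with that index array applies the same permutation to each array.
sorted_first = first[order]
sorted_second = second[order]
print(order, sorted_first, sorted_second)
```

If equal keys must keep their original relative order, argsort accepts kind='stable'; the default sort kind does not promise that.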

Ask numpy questions on the numpy mailing list.

http://www.scipy.org/Mailing_Lists

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Victor Eijkhout

Mar 31, 2010, 2:58:03 PM
Robert Kern <rober...@gmail.com> wrote:

> second[first.argsort()]

Really cool. Thanks.

> Ask numpy questions on the numpy mailing list.

I will. I thought that this question would have an answer in a generic
python idiom.

Robert Kern

Mar 31, 2010, 3:13:52 PM
to pytho...@python.org
On 2010-03-31 13:58 PM, Victor Eijkhout wrote:
> Robert Kern<rober...@gmail.com> wrote:
>
>> second[first.argsort()]
>
> Really cool. Thanks.
>
>> Ask numpy questions on the numpy mailing list.
>
> I will. I thought that this question would have an answer in a generic
> python idiom.

When dealing with numpy arrays, the generic Python idiom is often much slower.

Steve Holden

Mar 31, 2010, 3:22:26 PM
to pytho...@python.org
Victor Eijkhout wrote:
> Robert Kern <rober...@gmail.com> wrote:
>
>> second[first.argsort()]
>
> Really cool. Thanks.
>
>> Ask numpy questions on the numpy mailing list.
>
> I will. I thought that this question would have an answer in a generic
> python idiom.
>
> Victor.

Not an unreasonable assumption, but it turns out that for most Python
users (estimate PFTA: 97%) numpy/scipy is esoteric knowledge.

Raymond Hettinger

Mar 31, 2010, 4:09:54 PM
On Mar 30, 4:25 pm, s...@sig.for.address (Victor Eijkhout) wrote:
> I have two arrays, made with numpy. The first one has values that I want
> to use as sorting keys; the second one needs to be sorted by those keys.
> Obviously I could turn them into a dictionary of pairs and sort by the
> first member, but I think that's not very efficient, at least in space,
> and this needs to be done as efficiently as possible.

Alf's recommendation is clean and correct. Just make a list of
tuples.

FWIW, here's a little hack that does the work for you:

>>> values = ['A', 'B', 'C', 'D', 'E']
>>> keys = [50, 20, 40, 10, 30]
>>> keyiter = iter(keys)
>>> sorted(values, key=lambda k: next(keyiter))
['D', 'B', 'E', 'C', 'A']
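
The hack leans on sorted() calling the key function exactly once per item, in input order, before any comparisons happen. The once-per-item part is documented; the call order is how CPython behaves rather than a stated guarantee, so treat it as an assumption. A small sketch that records the calls (the calls list and the key helper are illustrative names only):

```python
values = ['A', 'B', 'C', 'D', 'E']
keys = [50, 20, 40, 10, 30]

calls = []            # records the order in which key() sees the items
keyiter = iter(keys)

def key(item):
    # Ignore the item itself; hand back the next key from the parallel list.
    calls.append(item)
    return next(keyiter)

result = sorted(values, key=key)
print(calls)
print(result)
```

Note that keyiter is used up after one sort, so a fresh iterator is needed for each call.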


Raymond

Steve Howell

Apr 1, 2010, 10:59:41 AM

Another option:

[values[i] for i in sorted(range(len(keys)), key=lambda i: keys[i])]

Sort the indexes according to keys values, then use indexes to get the
values.

It might read more clearly when broken out into two lines:

>>> sorted_indexes = sorted(range(len(keys)), key=lambda i: keys[i])
>>> sorted_indexes
[3, 1, 4, 2, 0]
>>> [values[i] for i in sorted_indexes]
['D', 'B', 'E', 'C', 'A']

The advantage of Raymond's solution is that he only creates one new
Python list, whereas my solutions create an intermediate Python list
of integers. I don't think my solution really is that space-wasteful,
though, since by the time the second list gets created, any internal
intermediate lists from CPython's sorted() implementation will
probably have been cleaned up.
