[Numpy-discussion] Inverting argsort(a, axis=0) to obtain column-wise ranks

1,455 views
Skip to first unread message

Alexander Michael

unread,
Sep 7, 2010, 4:01:31 PM9/7/10
to Discussion of Numerical Python
Calculating ranks by inverting the results of an argsort is
straightforward and fast for 1D arrays:

indices = argsort(a1)
ranks = zeros_like(indices)
ranks[indices] = arange(len(indices))

I was wondering if there was an equally pithy way to do this for
multiple data samples stored column-wise in a 2D array. That is, is
there a trick to invert the results of argsort(a2, axis=0) without
iterating (in python) over the columns?

Thanks,
Alex
_______________________________________________
NumPy-Discussion mailing list
NumPy-Di...@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Zachary Pincus

unread,
Sep 7, 2010, 4:06:38 PM9/7/10
to Discussion of Numerical Python
> indices = argsort(a1)
> ranks = zeros_like(indices)
> ranks[indices] = arange(len(indices))

Doesn't answer your original question directly, but I only recently
learned from this list that the following does the same as the above:
ranks = a1.argsort().argsort()
Will wonders never cease...

So does ranks=a2.argsort(axis=0).argsort(axis=0) then do the trick?

Zach

Alexander Michael

unread,
Sep 9, 2010, 4:59:42 PM9/9/10
to Discussion of Numerical Python
Clever and concise (and expect that it works), but isn't this less
efficient? Sorting is O(n*log(n)), while the code I gave is O(n).
Using argsort has the potential to use less memory, though.

Alexander Michael

unread,
Sep 9, 2010, 5:05:49 PM9/9/10
to Discussion of Numerical Python
Clever and concise (and expect that it works), but isn't this less
efficient? Sorting is O(n*log(n)), while the code I gave is O(n).
Using argsort has the potential to use less memory, though.

On Tuesday, September 7, 2010, Zachary Pincus <zachary...@yale.edu> wrote:

Robert Kern

unread,
Sep 9, 2010, 5:25:54 PM9/9/10
to Discussion of Numerical Python
On Thu, Sep 9, 2010 at 15:59, Alexander Michael <lxan...@gmail.com> wrote:
> Clever and concise (and expect that it works), but isn't this less
> efficient? Sorting is O(n*log(n)), while the code I gave is O(n).
> Using argsort has the potential to use less memory, though.

No, the code you gave is also O(n*log(n)) because it uses an
argsort(). Using two argsort()s won't change the O() complexity. O()
often tells you very little. Time it with typical values and find out.
Memory use is often the bottleneck. And sometimes, the real
performance differences just don't matter.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

Alexander Michael

unread,
Sep 9, 2010, 11:53:07 PM9/9/10
to Discussion of Numerical Python
I was, of course, just thinking of the incremental work of inverting
the initial argsort, but you are completely correct in pointing out
that the overall complexity is O(n*log(n)) either way. As it turns
out, both approaches run in the same amount of time for my problem.

Thanks,
Alex

John Schulman

unread,
Sep 20, 2010, 1:17:22 AM9/20/10
to Discussion of Numerical Python
Argsort twice and you get the rank.

a1.argsort(axis=0).argsort(axis=0)

That's because argsort is it's own inverse when applied to the ranks.

John Schulman

unread,
Sep 20, 2010, 1:19:13 AM9/20/10
to Discussion of Numerical Python
D'oh, Zachary already gave that answer.
Reply all
Reply to author
Forward
0 new messages