Possible performance bug in radixSort(long[][])

31 views
Skip to first unread message

Johan Nystrom-Persson

unread,
Jun 21, 2021, 12:49:03 AM6/21/21
to fastutil
Hi,

Thank you for your work on fastutil, which is clearly a great library.

I wanted to report a possible performance bug in LongArrays.radixSort(long[][] a).

This method falls back on selectionSort when the data to be sorted is below a certain size, which is hardcoded as LongArrays.RADIXSORT_NO_REC = 1024, originally in Arrays.drv.
When profiling my application, I found that radixSort(long[][]) spent more than 90% of its time in selectionSort. This didn't seem right, and overall the speed was no faster than java.util.Arrays.sort with a custom comparator.

I experimentally tried lowering the RADIXSORT_NO_REC cutoff, and got performance improvements with smaller values. I have tried various powers of 2, currently using RADIXSORT_NO_REC=16, and the smaller the value, the better the performance, as far as I can tell. Currently 16 is the best value I have found (I have yet to try 8 or smaller). With this change, radixSort(long[][]) is overall dramatically faster than before, and much faster than java.util.Arrays.sort.

I would submit a pull request, except that I have no idea how the original value of 1024 was determined, and I have only tested this for long[][], not for other primitives such as int[][], so I am not confident that my change is universally valid. (I also see that this constant is used for slightly different purposes in other methods in Arrays.drv, which I haven't tested either.) About my test application: I am sorting genetic sequence fragments, where each item is stored as a short long[] (of length 1-3 typically). The data is mostly random-looking and I sort 10,000-100,000 items at a time. I am using OpenJDK 8 on Linux 5.10; my CPU is a 4-core Core i5 3570. I get the same result when I run this code on AWS Graviton-based VMs.

Many thanks,
Johan Nystrom-Persson



Sebastiano Vigna

unread,
Jun 21, 2021, 3:55:31 AM6/21/21
to fastutil


Il giorno lunedì 21 giugno 2021 alle 06:49:03 UTC+2 Johan Nystrom-Persson ha scritto:
Hi,

This method falls back on selectionSort when the data to be sorted is below a certain size, which is hardcoded as LongArrays.RADIXSORT_NO_REC = 1024, originally in Arrays.drv.
When profiling my application, I found that radixSort(long[][]) spent more than 90% of its time in selectionSort. This didn't seem right, and overall the speed was no faster than java.util.Arrays.sort with a custom comparator.

I experimentally tried lowering the RADIXSORT_NO_REC cutoff, and got performance improvements with smaller values. I have tried various powers of 2, currently using RADIXSORT_NO_REC=16, and the smaller the value, the better the performance, as far as I can tell. Currently 16 is the best value I have found (I have yet to try 8 or smaller). With this change, radixSort(long[][]) is overall dramatically faster than before, and much faster than java.util.Arrays.sort.

Can you quantify "dramatically"? I can measure a 10% improvement in speed going down to 256. It then stabilizes up to 16, and at 8 it slows down again.
 
I would submit a pull request, except that I have no idea how the original value of 1024 was determined, and I have only tested this for long[][], not for other primitives such as int[][], so I am not confident that my change is universally valid. (I also see that this constant is used for slightly different purposes in other methods in Arrays.drv, which I haven't tested either.) About my test application: I am sorting genetic sequence fragments, where each item is stored as a short long[] (of length 1-3 typically). The data is mostly random-looking and I sort 10,000-100,000 items at a time. I am using OpenJDK 8 on Linux 5.10; my CPU is a 4-core Core i5 3570. I get the same result when I run this code on AWS Graviton-based VMs.


 Oh wait. You're not sorting big arrays. You're sorting lexicographically arrays of arrays, right? That might explain the difference.

The values have been determined experimentally... quickly. I always wanted to do a more thorough job, but, you know, billions of things.

Please open an issue on github about this, I don't wanna lose track of the problem!

Johan Nyström-Persson

unread,
Jun 21, 2021, 11:27:22 PM6/21/21
to fast...@googlegroups.com
Hi Sebastiano,

Thanks for your quick reply.
I made a minimal test program, which you can find in this gist: https://gist.github.com/jtnystrom/c161628d7eea1377684ef21b93722304
For 10 million items (of size 2 longs each), the time for sorting converges to 1500 ms when the constant is 1024, and to about 600 ms when it is 64. So it is nearly 3x faster in the latter case.

Oh wait. You're not sorting big arrays. You're sorting lexicographically arrays of arrays, right? That might explain the difference.

Yes, that's right.

The values have been determined experimentally... quickly. I always wanted to do a more thorough job, but, you know, billions of things.

Of course I'm just grateful to be able to use this library. This sorting method has already made my application much faster than it was before.

I will open a github issue today.

Many thanks,
Johan

 



--
You received this message because you are subscribed to a topic in the Google Groups "fastutil" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/fastutil/9FfvdfuPfjU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to fastutil+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/fastutil/61a15e36-6ea5-41c7-a950-cfa42f7a23bdn%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages