Oh, sorry. There are two different sort examples discussed here:
1) A modification of Go’s internal sort library that changes how interfaces are used internally in the package. The standard sort library provides built in meta functions (swap, less, etc.) for a few built in types. When you call sort.Ints(), for example, that function calls the meta-sort by providing these meta function to swap two ints, compare two ints, and so on. The generic sort itself code uses interfaces all the way down to dispatch to these functions, which means discovering the underlying type ~ n log n times.
My modification (from way back) was to have those top level convenience functions, which are substantially all of the sorting done by Go’s library and compiler/build tools, detect the type of the sort at the top-level call and then call a type-specific version of the general sort function. This means that there is 3x the code (one for Ints, one for Strings, on for Float64) but that is a tiny cost. This also means that the sort is ~ 3x faster. That’s great to me because I value performance. This is the sort I use. It requires no external change to get the speedup. That was posted earlier as “sort.go”, and you can update your source tree with this to get the speedup. (Note, this was from a year ago so I don’t have any recent changes in the standard sort.go since then.)
2. A parallel sort that I wrote when exploring the 3SUM problem a few weeks ago. The original poster got irked at my efforts on his behalf so I never posted the full result showing just how fast one could do 3SUM in Go. That one used a speed-oriented but standard Quicksort and an optional parallel version of it that uses concurrency to manage the higher levels of split and merge. My goal was to do enough parallelism to exploit the actual CPUs but no more because there is an overhead involved. This code extends the typical tuned quicksort decision tree to be from array segment size small to large: insertion sort, 3 element partition selection qsort, 9 element partition selection qsort, 9 element partition selection qsort in parallel. The thresholds are optimal for my computer(s) and the qsort itself uses tail-recursion simulation to only make half the recursive calls.
This is the one I sent to Roberto privately. I did not want to upset the person who was so touchy before.
Michael