Hi, I have seen this behavior before too. My solution was to make the interface methods operate directly on a pointer to my type, getting rid of the need for the generated method. However, the thing that I am unclear on is why there is a need for main.(*InvertedSlice).Less in the first place. I had assumed that, because the interface is implemented by the type itself, the sort package would call main.(InvertedSlice).Less (the non-pointer version) directly.
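For reference, here is a minimal sketch of the value-receiver implementation being discussed; the Entry element type and its field types are my guess, inferred from the Less method quoted further down:

// import "sort"

// Entry is hypothetical; the thread doesn't show the element type.
type Entry struct {
	Hash     uint64
	Position int
}

type InvertedSlice []Entry

// Value receivers: each call copies only the slice header (pointer,
// length, capacity); the backing array is shared, so Swap still
// mutates the caller's data.
func (s InvertedSlice) Len() int      { return len(s) }
func (s InvertedSlice) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
func (s InvertedSlice) Less(i, j int) bool {
	return s[i].Hash < s[j].Hash ||
		(s[i].Hash == s[j].Hash && s[i].Position < s[j].Position)
}

// Compile-time check that the non-pointer type satisfies the interface.
var _ sort.Interface = InvertedSlice(nil)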
I don't think that Go inlines interface methods at all. The sort package will need to call the sort.Interface methods each and every time they are needed. For this reason, I think it is very likely that implementing your own sort function, specific to each slice type, will always be faster than the sort package (assuming you use a reasonable sorting algorithm). By the way, the qsort function in C has the same "problem," because it cannot inline the comparison function; C++ has templates, so it does not have this "problem."

Anyway, I am not too worried about the performance of the sort package, as it is sufficiently fast for most cases, and that is its intention: being fast enough and convenient enough for general use. The thing that I still wonder about, however, is why such an apparently expensive (*InvertedSlice).Less method needs to be generated when sort.Interface is implemented by the non-pointer InvertedSlice type.
func (s *InvertedSlice) Less(i, j int) bool {
	return (*s)[i].Hash < (*s)[j].Hash ||
		((*s)[i].Hash == (*s)[j].Hash && (*s)[i].Position < (*s)[j].Position)
}
Expanded Sort()  9.6615526s
sort.Sort()     12.9767423s
Interesting to see the effect of inlining the default sort.Sort() code.
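For illustration, "expanding" the sort means writing type-specific code in which the comparison is an ordinary expression the compiler can inline, rather than a dynamic sort.Interface call. A minimal sketch, using a plain insertion sort for brevity (the "Expanded Sort()" measured above presumably specialized the full stdlib algorithm instead):

// sortInverted sorts an InvertedSlice with no interface dispatch at
// all; the comparison below is inlinable.
func sortInverted(s InvertedSlice) {
	for i := 1; i < len(s); i++ {
		for j := i; j > 0; j-- {
			left, cur := s[j-1], s[j]
			if cur.Hash < left.Hash ||
				(cur.Hash == left.Hash && cur.Position < left.Position) {
				s[j-1], s[j] = cur, left
			} else {
				break
			}
		}
	}
}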
It seems unfair to compare these two algorithms.
I think I expressed myself poorly. The standard sort function is a stable sort. If you don't require stability, then rolling your own algorithm could give significant improvements, even if it were implemented in the same interface-based manner as package sort. But to then compare the non-stable sort to the stable one is unfair, as package sort is doing extra work that you don't really need. The "most obvious usage" gives you stability, something that people may take for granted.
I will test the performance of both the obvious (semantically correct and simplest) way and the messy pointerized way with optimizations disabled.
It seems to me that if explicit pointer receivers for slice types, which serve no semantic purpose when the slice's length and capacity are not touched, perform better than non-pointer receivers, then either the pointer methods are being heavily optimized or the non-pointer methods generate far less performant code than I would have thought.
--
The timsort implementation could, in theory, have been designed as a drop-in replacement for the stdlib sort, accepting a sort.Interface instead of the data and a less function.
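To illustrate, a small adapter could have bridged the old (data, less) signature onto a sort.Interface-based timsort; the names below are mine, not from the actual timsort package:

// import "sort"

// lessAdapter exposes a slice plus a comparison function as a
// sort.Interface, so the old-style API could be kept as a thin wrapper.
type lessAdapter struct {
	data []interface{}
	less func(a, b interface{}) bool
}

func (s lessAdapter) Len() int           { return len(s.data) }
func (s lessAdapter) Swap(i, j int)      { s.data[i], s.data[j] = s.data[j], s.data[i] }
func (s lessAdapter) Less(i, j int) bool { return s.less(s.data[i], s.data[j]) }

var _ sort.Interface = lessAdapter{}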
@Niklas: I had missed the "not in-place" notice in the documentation, and glossed over lines 120-126 in timsort.go. Merge sorts can indeed be done in-place, however.
@Donovan: I'm not aware of any comparison sort where reallocation of the actual data is needed. Besides which, a traditional recursively copying merge sort could still be done without copying the actual data (and thus could still use sort.Interface alone): you just initialize an []int (I'll call it an 'index list') to the length of the input, such that each element is initially its own index, e.g. []int{0, 1, 2, 3} when Len() returns 4. Then you do your series of Less calls, but without any calls to Swap, until you've sorted your internal []int. Finally, you scan over the index list, calling Swap for each index out of place, following each cycle of the permutation. For example, if sorting your index list resulted in []int{3, 1, 0, 2}, the following calls would occur: Swap(0, 3); Swap(3, 2). It's essentially sorting a known, type-safe surrogate for the initial data, involving as many copies as needed, followed by asking the actual data to rearrange itself based on the surrogate; a sketch follows below.
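A minimal sketch of that index-list approach (my own illustration of the idea, not measured for performance; Len, Less, and, only at the very end, Swap are all it asks of the data):

// import "sort"

// indirectSort sorts data through sort.Interface alone. The merge
// phase copies only []int index lists, never the elements; the data
// is rearranged at the end by following the permutation's cycles.
func indirectSort(data sort.Interface) {
	n := data.Len()
	perm := make([]int, n)
	for i := range perm {
		perm[i] = i
	}
	perm = sortIndices(perm, data)

	// perm[i] is now the original position of the element that belongs
	// at position i; each Swap puts one element into its final slot.
	done := make([]bool, n)
	for start := 0; start < n; start++ {
		if done[start] {
			continue
		}
		prev := start
		for j := perm[start]; j != start; j = perm[j] {
			data.Swap(prev, j)
			done[j] = true
			prev = j
		}
		done[start] = true
	}
}

// sortIndices is a plain top-down, stable merge sort over the index
// list; Less is the only operation it needs from the data.
func sortIndices(idx []int, data sort.Interface) []int {
	if len(idx) <= 1 {
		return idx
	}
	mid := len(idx) / 2
	left := sortIndices(append([]int(nil), idx[:mid]...), data)
	right := sortIndices(append([]int(nil), idx[mid:]...), data)
	out := idx[:0]
	for len(left) > 0 && len(right) > 0 {
		if data.Less(right[0], left[0]) { // strict: ties keep left first
			out = append(out, right[0])
			right = right[1:]
		} else {
			out = append(out, left[0])
			left = left[1:]
		}
	}
	out = append(out, left...)
	return append(out, right...)
}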
--
Built-in sort
PASS
BenchmarkSortString1K     5000     803246 ns/op
BenchmarkSortInt1K        5000     314817 ns/op
BenchmarkSortInt64K         50   31921820 ns/op
BenchmarkSortInt1M           5  615235200 ns/op
BenchmarkSortRandInt1M       5  720041180 ns/op

mergeSort
PASS
BenchmarkSortString1K     5000     420223 ns/op
BenchmarkSortInt1K       10000     205411 ns/op
BenchmarkSortInt64K        100   20051150 ns/op
BenchmarkSortInt1M           5  397422720 ns/op
BenchmarkSortRandInt1M       1 1285073500 ns/op

inplaceMergesort
PASS
BenchmarkSortString1K     5000     522830 ns/op
BenchmarkSortInt1K       20000      91005 ns/op
BenchmarkSortInt64K        500    7144407 ns/op
BenchmarkSortInt1M          10  134307680 ns/op
BenchmarkSortRandInt1M       1 2711155100 ns/op
If you're working with strings only, you might want to have a look at radix sort, or even tries (https://en.wikipedia.org/wiki/Trie), which are specifically designed for string searches and things like autocomplete.
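For the curious, an MSD radix sort over byte strings can be quite short; a minimal, untuned sketch (bucket 0 collects strings that end at the current byte position, which by construction are all equal at that point):

// msdRadixSort sorts a in place by distributing strings into buckets
// on the byte at position d and recursing per bucket.
func msdRadixSort(a []string) { msdSort(a, 0) }

func msdSort(a []string, d int) {
	if len(a) < 2 {
		return
	}
	var buckets [257][]string // 256 byte values + 1 for "string ended"
	for _, s := range a {
		if d >= len(s) {
			buckets[0] = append(buckets[0], s)
		} else {
			buckets[int(s[d])+1] = append(buckets[int(s[d])+1], s)
		}
	}
	a = a[:0] // refill the original backing array in sorted order
	for b, bucket := range buckets {
		if b > 0 {
			msdSort(bucket, d+1)
		}
		a = append(a, bucket...)
	}
}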
Also, if you want to try different sorting algorithms, quicksort is really nice to implement and should be quite a bit faster than mergesort for most data.
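A quicksort over sort.Interface is indeed short; a minimal Lomuto-partition sketch (illustrative only: no randomized pivot or worst-case fallback):

// import "sort"

// quickSort sorts data[lo..hi] in place, using the last element as
// the pivot. Call as quickSort(data, 0, data.Len()-1).
func quickSort(data sort.Interface, lo, hi int) {
	if lo >= hi {
		return
	}
	p := lo
	for i := lo; i < hi; i++ {
		if data.Less(i, hi) { // element i sorts before the pivot at hi
			data.Swap(i, p)
			p++
		}
	}
	data.Swap(p, hi) // move the pivot into its final position
	quickSort(data, lo, p-1)
	quickSort(data, p+1, hi)
}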
On another note, in the algorithm research community, comparison-based sorts are measured only by the number of swaps, which is independent of the architecture.
You can definitely see the Java influences; it's not very idiomatic Go.
Actually, I started working on making this code more idiomatic Go 1 code, but it is quite a bit of work -- at least if you don't want to sacrifice the current performance. In fact, some of the idioms used in the code are from Go versions that predate append.
Sorry, I got a mix-up in my head there; of course one only counts the comparisons, not the swaps. I guess I should only post stuff like that after coffee :-)