Is sort.Interface meant to be goroutine-safe?

Danver Braganza

Aug 18, 2015, 12:48:47 PM
to golang-nuts
I've been thinking a lot about sorting, and I saw a thread from around two years ago about implementing parallel mergesort: https://groups.google.com/forum/#!topic/golang-nuts/nvkBtiJU5aE

It got me thinking that most implementations of Less() and Swap() are not goroutine-safe. Nor does the definition of sort.Interface require them to be. However, you could get undefined behaviour if you ended up calling Less(i, j) and Swap(i, j) from different goroutines, for the same i and j. 

The standard library sort (to my knowledge) arranges its calls to Less and Swap sequentially, so this isn't really a problem there. Once we start building exotic sorting algorithms, we run the risk of breaking that guarantee. My concern is that there could be implementations of sort.Interface that will not work correctly with some sorting functions that accept a sort.Interface.
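
To make that concrete, here is a made-up example (not from any particular codebase): a perfectly ordinary implementation that is only safe while Less and Swap are called from one goroutine at a time.

type Person struct {
    Name string
    Age  int
}

// byAge is a typical sort.Interface implementation; nothing about it is
// goroutine-safe, because Swap writes to the shared backing slice.
type byAge []Person

func (p byAge) Len() int           { return len(p) }
func (p byAge) Less(i, j int) bool { return p[i].Age < p[j].Age }
func (p byAge) Swap(i, j int)      { p[i], p[j] = p[j], p[i] }

// If one goroutine calls Swap(1, 2) while another calls Less(2, 3), the
// unsynchronized read and write of element 2 are a data race.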

Is there a standard for marking interface methods as necessarily goroutine safe? Should there be? 

Dustin

Aug 18, 2015, 1:08:41 PM
to golang-nuts
The common convention in Go is to document methods that are safe for concurrent use, and to assume that anything which doesn't say so is not safe for concurrent use.

For interfaces, I've done the same thing. If I design an interface where I expect certain methods to be safe for concurrent use, I document this on the interface.
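
For example (all names made up), the expectation can live right in the doc comments:

// Visitor must be safe for concurrent use: Visit may be called from
// multiple goroutines simultaneously. The type system can't express
// this, so the interface documentation is where it has to live.
type Visitor interface {
    // Visit is called once per item, possibly from several goroutines
    // at once, and must not assume exclusive access to shared state.
    Visit(item string)
}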

Michael Jones

Aug 18, 2015, 1:35:03 PM
to Roberto Zanotto, golang-nuts
I have a parallel sort that is 2.5x-3x faster on 4 CPUs.

I have a "do the interface matching at the top level" version of sort that is always 3.2x faster.

These multiply, so 10x faster than sort is easily doable in a day’s work if that matters to you.

Michael

— 
Michael Jones, CEO  •  mic...@wearality.com  •  +1 650 656-6989 
Wearality Corporation  •  289 S. San Antonio Road  •  Los Altos, CA 94022

On Aug 18, 2015, at 10:15 AM, Roberto Zanotto <roby...@gmail.com> wrote:

I saw some library that implemented parallel sorting (don't remember the link) and with great effort it achieved something like a 2x speedup on big datasets (not in-place, if I remember correctly). Go is not really meant for high-performance computing, and sorting is not really the best parallelizable task. Also, quoting the Commander, "Fancy algorithms are slow when n is small, and n is usually small." So definitely not worth the added complexity, I'd say.

Roberto Zanotto

Aug 18, 2015, 2:01:52 PM
to golang-nuts, roby...@gmail.com
Forgive me for my dumb reply then :)

Roberto Zanotto

Aug 18, 2015, 2:11:34 PM
to golang-nuts
I don't think that's actually an issue with the Less and Swap functions. Quicksort splits the array in two, and then the two separate sub-arrays can be sorted in parallel. Mergesort sorts two separate sub-arrays in parallel and then merges them. It seems to me that a parallel algorithm that calls Swap on the same items from different goroutines is just plain wrong.
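
Something with this shape, for instance (a rough, untuned sketch I just typed up, not any particular library): each recursive call gets a disjoint index range, so no two goroutines ever call Less or Swap on the same elements.

package parsort

import (
    "sort"
    "sync"
)

// Sort sorts data with a naive parallel quicksort.
func Sort(data sort.Interface) { quickSort(data, 0, data.Len()-1) }

// quickSort sorts data in the index range [lo, hi]. The two recursive
// calls work on disjoint ranges, so they never touch the same elements
// concurrently. The cutoff is arbitrary; a real version would stop
// spawning goroutines well before the segments get this small.
func quickSort(data sort.Interface, lo, hi int) {
    if hi-lo < 12 {
        insertionSort(data, lo, hi)
        return
    }
    p := partition(data, lo, hi)
    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        defer wg.Done()
        quickSort(data, lo, p-1) // left half in its own goroutine
    }()
    quickSort(data, p+1, hi) // right half in the current goroutine
    wg.Wait()
}

// partition is a plain Lomuto partition around the last element.
func partition(data sort.Interface, lo, hi int) int {
    i := lo
    for j := lo; j < hi; j++ {
        if data.Less(j, hi) {
            data.Swap(i, j)
            i++
        }
    }
    data.Swap(i, hi)
    return i
}

func insertionSort(data sort.Interface, lo, hi int) {
    for i := lo + 1; i <= hi; i++ {
        for j := i; j > lo && data.Less(j, j-1); j-- {
            data.Swap(j, j-1)
        }
    }
}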


On Tuesday, August 18, 2015 at 6:48:47 PM UTC+2, Danver Braganza wrote:
It got me thinking that most implementations of Less() and Swap() are not goroutine-safe. Nor does the definition of sort.Interface require them to be. However, you could get undefined behaviour if you ended up calling Less(i, j) and Swap(i, j) from different goroutines, for the same i and j. 

Michael Jones

Aug 18, 2015, 5:01:59 PM
to Roberto Zanotto, golang-nuts
Precisely so. 

Just ran the benchmarks again…not quite 3x on the parallel test at these sizes, but here’s an example:

# sort.Ints(v)

BenchmarkSSort250-8       100000      18790 ns/op
BenchmarkSSort500-8        30000      49454 ns/op
BenchmarkSSort1000-8       10000     115537 ns/op
BenchmarkSSort2000-8        5000     258600 ns/op
BenchmarkSSort4000-8        3000     569069 ns/op
BenchmarkSSort8000-8        1000    1222521 ns/op
BenchmarkSSort16000-8        500    2531189 ns/op
BenchmarkSSort32000-8        300    5555401 ns/op
BenchmarkSSort64000-8        100   11658911 ns/op
BenchmarkSSort128000-8        50   24931452 ns/op
BenchmarkSSort256000-8        30   52089587 ns/op
BenchmarkSSort512000-8        10  110441634 ns/op
BenchmarkSSort1024000-8        5  232445127 ns/op

# my QSort, same speed as my modified sort.Ints() that did not 
# make 1.4/1.5 because of code bloat concerns

BenchmarkQSort250-8       500000       3139 ns/op
BenchmarkQSort500-8       200000       9769 ns/op
BenchmarkQSort1000-8       50000      33830 ns/op
BenchmarkQSort2000-8       20000      86849 ns/op
BenchmarkQSort4000-8       10000     200239 ns/op
BenchmarkQSort8000-8        3000     433158 ns/op
BenchmarkQSort16000-8       2000     911839 ns/op
BenchmarkQSort32000-8       1000    1908656 ns/op
BenchmarkQSort64000-8        300    3964553 ns/op
BenchmarkQSort128000-8       200    8548046 ns/op
BenchmarkQSort256000-8       100   17770204 ns/op
BenchmarkQSort512000-8        30   37913451 ns/op
BenchmarkQSort1024000-8       20   79800748 ns/op 2.91282x

# my parallel QSort

BenchmarkPSort250-8       500000       3158 ns/op
BenchmarkPSort500-8       200000       9782 ns/op
BenchmarkPSort1000-8       50000      34033 ns/op
BenchmarkPSort2000-8       30000      46263 ns/op
BenchmarkPSort4000-8       20000      91221 ns/op
BenchmarkPSort8000-8       10000     237990 ns/op
BenchmarkPSort16000-8       5000     437962 ns/op
BenchmarkPSort32000-8       1000    1061331 ns/op
BenchmarkPSort64000-8       1000    2087604 ns/op
BenchmarkPSort128000-8       500    3382563 ns/op
BenchmarkPSort256000-8       200    6760996 ns/op
BenchmarkPSort512000-8       100   15559801 ns/op
BenchmarkPSort1024000-8       50   31135715 ns/op 7.46555x

7.4x is lower than the 9-10x I was claiming, but that was on a system with more CPUs. (I have 4 here on my notebook.)

Roberto Zanotto

Aug 18, 2015, 5:21:31 PM
to golang-nuts, roby...@gmail.com
Just some clarifications:
sort.Ints is the one in the standard library, right? What's the difference between the second and the third? Are they both your implementations, one serial and the other parallel? Do they use the same interface as the standard library? Do you mind sharing the code? I'd be interested in taking a look.

Michael Jones

Aug 18, 2015, 6:49:21 PM
to Roberto Zanotto, golang-nuts
Yes, just so. I sent you the code.


Roberto Zanotto

Aug 18, 2015, 7:12:29 PM
to golang-nuts, roby...@gmail.com
Thanks :)
It looks like a pretty standard, fine-tuned quicksort where you spawn goroutines for the recursive calls. But it's a bit unfair to compare it to the standard library's, since it is specialized for integers: you are not paying the overhead of the Less and Swap calls, nor the extra space and indirection that interfaces introduce. Do you by chance have a version that works on the same interface? This is interesting; I might even try to adapt the standard library code to use concurrency, it should be pretty simple.

Roberto Zanotto

Aug 18, 2015, 7:29:12 PM
to golang-nuts, roby...@gmail.com
By the way, it's worth noting that your version doesn't really use only log(n) extra memory. It takes some additional space because, in the parallel sort, you launch both recursive calls with "go"; from a memory point of view it would be better to make only one call asynchronous and handle the other with "tail recursion" (of course there's a parallelism vs. memory tradeoff here).

I'm looking at the code from the standard library, and it would take something like 5 lines of code to make it parallel (roughly the shape sketched below). Why aren't we doing this? I'm assuming it's not really a problem of added complexity; it's 5 lines out of 500. Maybe we don't want a library call spawning goroutines without the user knowing? Maybe we don't want package sort to depend on sync? Now that Go is getting its default GOMAXPROCS raised, it could make a noticeable difference for some applications (well, maybe I should do some profiling before asserting that :)
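
Roughly what I mean, sketched against a generic quicksort rather than the real sort.go (the names and the cutoff are made up, and partition/insertionSort are the helpers from my sketch earlier in the thread, along with the sort and sync imports). It also assumes, as discussed above, that the Interface tolerates concurrent Less/Swap calls on disjoint indices.

// Below this size, goroutine overhead dominates (an untuned guess).
const parallelCutoff = 1 << 13

// quickSortConc recurses on the smaller half and loops on the larger one,
// so the stack stays O(log n); the recursive call only gets its own
// goroutine when the segment is big enough to be worth it.
func quickSortConc(data sort.Interface, lo, hi int, wg *sync.WaitGroup) {
    for hi-lo >= 12 {
        p := partition(data, lo, hi)
        if p-lo < hi-p {
            sortHalf(data, lo, p-1, wg)
            lo = p + 1
        } else {
            sortHalf(data, p+1, hi, wg)
            hi = p - 1
        }
    }
    insertionSort(data, lo, hi)
}

// sortHalf sorts one half, in a new goroutine only if it is large.
func sortHalf(data sort.Interface, lo, hi int, wg *sync.WaitGroup) {
    if hi-lo < parallelCutoff {
        quickSortConc(data, lo, hi, wg)
        return
    }
    wg.Add(1)
    go func() {
        defer wg.Done()
        quickSortConc(data, lo, hi, wg)
    }()
}

// SortConc is the entry point; it waits for every spawned goroutine.
func SortConc(data sort.Interface) {
    var wg sync.WaitGroup
    quickSortConc(data, 0, data.Len()-1, &wg)
    wg.Wait()
}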

Michael Jones

Aug 18, 2015, 9:29:37 PM
to Roberto Zanotto, golang-nuts
It looks like a pretty standard, fine-tuned quicksort where you spawn goroutines for the recursive calls.

Yes. As advertised.

But it's a bit unfair to compare it to the standard library's, since it is specialized for integers: you are not paying the overhead of the Less and Swap calls, nor the extra space and indirection that interfaces introduce. Do you by chance have a version that works on the same interface?

Ah! That’s the whole point. 

The standard one is “dumb” in that it does the interface dereferencing all the way down to the leaf rather than once at the top, where it could immediately dispatch to dedicated versions of each internal sort function, one for Ints, one for Strings, one for Floats, and so on. That's the change I made: it makes the one-time cost of the interface mechanism immaterial, provides the full generality, and makes everything better. It was not chosen for 1.4 (or 1.5) because the 3x speedup was not considered worth the expansion of the code. I recommended it to Rob as the poster child for go generate.
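
Very roughly, the idea looks like this (not the actual patch; sortInts, sortStrings and sortFloat64s are stand-ins for the duplicated, type-specialized copies of the algorithm, written here as stubs just so the sketch compiles):

package fastsort

import "sort"

// fastSort looks at the concrete type once at the top and dispatches to
// a copy of the sort specialized to that element type, so the inner loop
// never goes back through the interface.
func fastSort(data sort.Interface) {
    switch d := data.(type) {
    case sort.IntSlice:
        sortInts(d)
    case sort.StringSlice:
        sortStrings(d)
    case sort.Float64Slice:
        sortFloat64s(d)
    default:
        sort.Sort(data) // generic fallback: the old, interface-based path
    }
}

// In the real version these are full duplicates of the sort algorithm
// with Less and Swap effectively inlined for the concrete type.
func sortInts(v sort.IntSlice)         { sort.Sort(v) }
func sortStrings(v sort.StringSlice)   { sort.Sort(v) }
func sortFloat64s(v sort.Float64Slice) { sort.Sort(v) }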

This fast version is attached. Replace your “sort.go” and rebuild. Enjoy.

Michael
sort.go

Michael Jones

Aug 18, 2015, 9:41:02 PM
to Roberto Zanotto, golang-nuts
Try it that way and see how it benchmarks for you. Maybe you can make it better!


Roberto Zanotto

Aug 18, 2015, 9:57:22 PM
to golang-nuts, roby...@gmail.com
Ok, now I get your intentions (communication on the internet surely is difficult :)
You meant to do specialized versions, like the image package does, in addition to adding parallelism. It makes sense; it would be tempting to be able to say: look! the Go parallel sort beats the sh*t out of Java and Python :D
But the Go team is also there to keep us safe from such frivolous temptations.
It could be interesting to gather some data about how the sort package is used in real applications, and whether it is enough or whether people demand (and would actually use) specialized versions.

Giulio Iotti

Aug 19, 2015, 6:02:13 AM
to golang-nuts, roby...@gmail.com
On Wednesday, August 19, 2015 at 12:49:21 AM UTC+2, Michael Jones wrote:
Yes, just so. I sent you the code.

Err... don't forget about the rest of us, please. I'd also like to have a look.

-- 
Giulio Iotti

Michael Jones

Aug 19, 2015, 9:13:36 AM
to Giulio Iotti, golang-nuts, roby...@gmail.com
Oh, sorry. There are two different sort examples discussed here:

1) A modification of Go's internal sort library that changes how interfaces are used internally in the package. The standard sort library provides built-in meta functions (swap, less, etc.) for a few built-in types. When you call sort.Ints(), for example, that function calls the meta-sort by providing these meta functions to swap two ints, compare two ints, and so on. The generic sort code itself uses interfaces all the way down to dispatch to these functions, which means discovering the underlying type ~ n log n times.

My modification (from way back) was to have those top-level convenience functions, which account for substantially all of the sorting done by Go's library and compiler/build tools, detect the type of the sort at the top-level call and then call a type-specific version of the general sort function. This means that there is 3x the code (one for Ints, one for Strings, one for Float64s) but that is a tiny cost. It also means that the sort is ~3x faster. That's great to me because I value performance. This is the sort I use. It requires no external change to get the speedup. That was posted earlier as "sort.go", and you can update your source tree with it. (Note, this was from a year ago, so I don't have any recent changes in the standard sort.go since then.)

2) A parallel sort that I wrote when exploring the 3SUM problem a few weeks ago. The original poster got irked at my efforts on his behalf, so I never posted the full result showing just how fast one could do 3SUM in Go. That one used a speed-oriented but standard quicksort and an optional parallel version of it that uses concurrency to manage the higher levels of the split and merge. My goal was to do enough parallelism to exploit the actual CPUs but no more, because there is an overhead involved. This code extends the typical tuned quicksort decision tree, from small array segments to large: insertion sort, quicksort with 3-element pivot selection, quicksort with 9-element pivot selection, and the 9-element version run in parallel. The thresholds are optimal for my computer(s), and the qsort itself uses tail-recursion simulation to make only half the recursive calls.
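
For reference, the "9 element partition selection" is a median-of-nine ("ninther") pivot pick, something like this untuned sketch for an int slice (not the exact code I sent):

// median3 returns the index of the median of v[a], v[b], v[c].
func median3(v []int, a, b, c int) int {
    if v[a] < v[b] {
        switch {
        case v[b] < v[c]:
            return b
        case v[a] < v[c]:
            return c
        default:
            return a
        }
    }
    switch {
    case v[a] < v[c]:
        return a
    case v[b] < v[c]:
        return c
    default:
        return b
    }
}

// ninther picks a pivot index for a large segment v[lo..hi] (at least 9
// elements) as the median of the medians of three spaced triples.
func ninther(v []int, lo, hi int) int {
    s := (hi - lo) / 8
    mid := lo + (hi-lo)/2
    m1 := median3(v, lo, lo+s, lo+2*s)
    m2 := median3(v, mid-s, mid, mid+s)
    m3 := median3(v, hi-2*s, hi-s, hi)
    return median3(v, m1, m2, m3)
}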

This is the one I sent to Roberto privately. I did not want to upset the person who was so touchy before.

Michael

On Aug 19, 2015, at 3:56 AM, Giulio Iotti <dullg...@gmail.com> wrote:

On Wednesday, August 19, 2015 at 12:02:13 PM UTC+2, Giulio Iotti wrote:
On Wednesday, August 19, 2015 at 12:49:21 AM UTC+2, Michael Jones wrote:
Yes, just so. I sent you the code.

Err... don't forget about the rest of us, please. I'd also like to have a look.

Oops, pardon me, I see you attached it to the other message.