Interface Performance Questions


Joshua Marsh

Dec 4, 2013, 7:15:55 PM
to golan...@googlegroups.com
I'd be interested in people's take on the cost-benefit analysis of using Interfaces. It seems like they can cause a performance hit. Here is the code for the tests I've used: https://gist.github.com/icub3d/7794453. The results are: 

$ go test -bench .
PASS
BenchmarkSortInt1K-8            10000    150809 ns/op
BenchmarkMyIntSort1K-8          50000     34777 ns/op
BenchmarkMyIntSortFunc1K-8      50000     51170 ns/op
BenchmarkMyIntSortStruct1k-8    10000    143204 ns/op
BenchmarkMyIntSortPtr1k-8       50000     56585 ns/op
ok      _/home/jmarsh/sort_test/mysort    13.993s

They all use the sort algorithm from the sort package: 
  • The first uses the code unaltered, with interfaces. 
  • The second uses an []int with all the Swap/Less parts inlined.
  • The third uses an []int but makes calls to local Swap/Less functions.
  • The fourth uses a struct with an []int in it and uses the struct's Swap/Less functions.
  • The last is the same as the fourth but uses pointers to structs.
I tried it the different ways because I was hoping to get an idea of how the interface works internally. I read a couple of papers/blogs (like http://research.swtch.com/interfaces) that describe the internals, but I was curious what the performance penalty is when using them. I was hoping that it would fall under the category of "you should almost always just use interfaces," but the performance seems to suggest otherwise. Interfaces and structs were on par with each other and performed 3-4 times worse. For the struct, it's clearly because of the copying involved (the struct pointers are substantially faster). I'm guessing the Interface is slow for the same reason. Is that right? Is there an easy way to solve that problem? 
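For reference, here is a rough sketch of what the first two benchmarks look like (reconstructed from the description above, not copied from the gist; myIntSort stands in for the []int-specific copy of the sort package's algorithm, with a plain quicksort to keep the sketch short):

package mysort

import (
	"math/rand"
	"sort"
	"testing"
)

// myIntSort is a hand-specialized sort for []int: every compare and
// swap is a direct slice operation the compiler can see.
func myIntSort(a []int) {
	if len(a) < 2 {
		return
	}
	p := a[len(a)/2]
	left, right := 0, len(a)-1
	for left <= right {
		for a[left] < p {
			left++
		}
		for a[right] > p {
			right--
		}
		if left <= right {
			a[left], a[right] = a[right], a[left]
			left++
			right--
		}
	}
	myIntSort(a[:right+1])
	myIntSort(a[left:])
}

// Interface version: every Less and Swap is dispatched through
// sort.Interface.
func BenchmarkSortInt1K(b *testing.B) {
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		data := make([]int, 1<<10)
		for j := 0; j < len(data); j++ {
			data[j] = rand.Int()
		}
		b.StartTimer()
		sort.Sort(sort.IntSlice(data))
	}
}

// Specialized version: the same work as direct operations on an []int.
func BenchmarkMyIntSort1K(b *testing.B) {
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		data := make([]int, 1<<10)
		for j := 0; j < len(data); j++ {
			data[j] = rand.Int()
		}
		b.StartTimer()
		myIntSort(data)
	}
}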

It seems from this test, albeit contrived, that the general rule will be to never use interfaces in performance-critical code. That suggests I'm missing something, though, because a lot of the standard library uses interfaces, even parts I would consider performance-critical to an application. Any help with what I'm not understanding would be appreciated.

Volker Dobler

Dec 4, 2013, 7:39:21 PM
to golan...@googlegroups.com


On Thursday, December 5, 2013 1:15:55 AM UTC+1, Joshua Marsh wrote:
...
It seems from this test, albeit contrived, that the general rule will be to never use interfaces in performance-critical code.
I think this is far too broad a statement.
First, performance is not everything; otherwise we would all be
coding assembler all day long, or even FPGAs. The only "general
rule" I would come up with is "don't run performance critical code".
I don't think you would state the general rule of "never use strings in
performance critical code" just because ints are faster.
Second, your example is really a borderline case. Sorting doesn't do
much but compare and swap, so some slight delay caused
by interfaces has a huge effect, as nothing else is going on during
sorting.
 
That suggests I'm missing something, though, because a lot of the standard library uses interfaces, even parts I would consider performance-critical to an application.
As usual, there is no free lunch. Yes, interfaces come with performance
costs, but their flexibility during writing, testing and maintaining code
outweighs their cost (IMHO by far).

V.   

Jesse McNelis

Dec 4, 2013, 7:51:02 PM
to Joshua Marsh, golang-nuts
On Thu, Dec 5, 2013 at 11:15 AM, Joshua Marsh <jos...@themarshians.com> wrote:
I tried it the different ways because I was hoping to get an idea of how the interface works internally. I read a couple of papers/blogs (like http://research.swtch.com/interfaces) that describe the internals, but I was curious what the performance penalty is when using them. I was hoping that it would fall under the category of "you should almost always just use interfaces," but the performance seems to suggest otherwise.

They have the same cost that other kinds of dynamic dispatch have: the indirection of looking up the method in a table at runtime, and the inability to inline any of those method calls at compile time.
The sort.Interface is a particularly costly use of interfaces. If you have a large amount of sorting to do in a very short amount of time, then it's a good idea to specialise the sorting code for your specific type.
But most sorting is of arrays with lengths in the hundreds, not the billions, and computers are fast.
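A minimal illustration of the mechanism (hypothetical types, just to sketch the idea):

package mysort

// lesser plays the role of sort.Interface's Less here.
type lesser interface{ Less(i, j int) bool }

type ints []int

func (s ints) Less(i, j int) bool { return s[i] < s[j] }

// Concrete: the compiler knows exactly which Less this is and can
// inline the comparison.
func concreteCall(s ints) bool { return s.Less(0, 1) }

// Abstract: the method is looked up in the interface's method table at
// runtime, so the call cannot be inlined.
func abstractCall(l lesser) bool { return l.Less(0, 1) }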

Interfaces and structs were on par with each other and performed 3-4 times worse. For the struct, it's clearly because of the copying involved (the struct pointers are substantially faster). I'm guessing the Interface is slow for the same reason. Is that right? Is there an easy way to solve that problem? 

It seems from this test, albeit contrived, that the general rule will be to never use interfaces in performance-critical code. That suggests I'm missing something, though, because a lot of the standard library uses interfaces, even parts I would consider performance-critical to an application. Any help with what I'm not understanding would be appreciated.

Interfaces allow for decoupling and flexibility. Making code flexible always has some additional computational cost, but it reduces the programmer cost. 
When you need the performance more than the flexibility, you can remove the flexibility.

Joshua Marsh

Dec 5, 2013, 12:29:48 AM
to golan...@googlegroups.com, Joshua Marsh, jes...@jessta.id.au
On Wednesday, December 4, 2013 5:39:21 PM UTC-7, Volker Dobler wrote:
On Thursday, December 5, 2013 1:15:55 AM UTC+1, Joshua Marsh wrote:
...
It seems from this test, albeit contrived, that the general rule will be to never use interfaces in performance-critical code.
I think this is far too broad a statement.
First, performance is not everything; otherwise we would all be
coding assembler all day long, or even FPGAs.

I agree performance isn't everything. I love Interfaces. They make testing substantially easier for me. The idea of duck typing also seems much cleaner to me than similar features in other languages. In my experience, though, you start to find critical sections of code where a performance improvement can save lots of money. 
 
Second, your example is really a borderline case. Sorting doesn't do
much but compare and swap, so some slight delay caused
by interfaces has a huge effect, as nothing else is going on during
sorting.

I'm not sure I agree that the example is some obscure corner case. Sure, not everyone is using sort, but the notion of a CPU-bound critical section of code seems fairly common in my experience. It never occurred to me that stripping out interfaces would cause an improvement (and in the example it isn't a mild improvement, it's drastic).
 
On Wednesday, December 4, 2013 5:39:21 PM UTC-7, Volker Dobler wrote:
 
That suggests I'm missing something, though, because a lot of the standard library uses interfaces, even parts I would consider performance-critical to an application.
As usual, there is no free lunch. Yes, interfaces come with performance
costs, but their flexibility during writing, testing and maintaining code
outweighs their cost (IMHO by far).

On Wednesday, December 4, 2013 5:51:02 PM UTC-7, Jesse McNelis wrote:
Interfaces allow for decoupling and flexibility. Making code flexible always has some additional computational cost, but it reduces the programmer cost. 
When you need the performance more than the flexibility, you can remove the flexibility.

Here I think we can all agree. I was just startled at the cost and was hoping there was a relatively easy solution or that I was missing something. In the test of the struct, you can almost entirely solve the problem just by switching to pointers. 
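For example, sketching the two struct variants from the test (reconstructed from the description, not copied from the gist):

package mysort

// Value receivers: the whole struct (including its slice header) is
// copied on every one of the many Less/Swap calls a sort makes.
type byValue struct{ data []int }

func (s byValue) Len() int           { return len(s.data) }
func (s byValue) Less(i, j int) bool { return s.data[i] < s.data[j] }
func (s byValue) Swap(i, j int)      { s.data[i], s.data[j] = s.data[j], s.data[i] }

// Pointer receivers: only a single pointer is passed per call.
type byPointer struct{ data []int }

func (s *byPointer) Len() int           { return len(s.data) }
func (s *byPointer) Less(i, j int) bool { return s.data[i] < s.data[j] }
func (s *byPointer) Swap(i, j int)      { s.data[i], s.data[j] = s.data[j], s.data[i] }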


Jian Zhen

Dec 5, 2013, 1:41:01 AM
to Joshua Marsh, golang-nuts
Joshua,

This is actually quite interesting. I decided to modify one of my projects to test using an interface vs. no interface and saw a 2.4x performance difference. The first benchmark uses an interface; the second does not. 

Benchmark1ProducerAnd1Consumer-3         5000000               353 ns/op
Benchmark1ProducerAnd1ConsumerInBytes-3 10000000               147 ns/op



You can find the code at github.com/reducedb/ringbuffer. The tests are in the bytebuffer directory.

Jian

Kyle Lemons

Dec 5, 2013, 2:05:57 AM
to Jian Zhen, Joshua Marsh, golang-nuts
On Wed, Dec 4, 2013 at 10:41 PM, Jian Zhen <zhe...@gmail.com> wrote:
Joshua,

This is actually quite interesting. I decided to modify one of my projects to test using an interface vs. no interface and saw a 2.4x performance difference. The first benchmark uses an interface; the second does not. 

Don't confuse performance on a microbenchmark with performance in practice. In real code, you may never notice the 200ns difference (especially if there's any network or disk I/O going on).

Kevin Gillette

Dec 5, 2013, 3:58:40 AM
to golan...@googlegroups.com, Joshua Marsh, jes...@jessta.id.au
On Wednesday, December 4, 2013 10:29:48 PM UTC-7, Joshua Marsh wrote:
I agree performance isn't everything. I love Interfaces. They make testing substantially easier for me. The idea of duck typing also seems much cleaner to me than similar features in other languages. In my experience, though, you start to find critical sections of code where a performance improvement can save lots of money.

I've found that constant-time-reduction optimizations (which this is) have only fixed hardware cost savings, but that deprioritizing work spent towards such optimizations has closer to linear savings in time and money. In other words, at typical (or even sub-typical) programmer wages, it generally makes more financial sense to buy 2x faster hardware than it does to pay the programmer to make the program run only 2x faster. If it were a quadratic time savings (which this isn't) or even a linear savings (which this also isn't), then it would start to make financial sense to have programmers begin to rewrite code.
 
I'm not sure I agree that the example is some obscure corner case. Sure, not everyone is using sort, but the notion of a CPU-bound critical section of code seems fairly common in my experience. It never occurred to me that stripping out interfaces would cause an improvement (and in the example it isn't a mild improvement, it's drastic).

The downside of this is that to avoid use of interfaces in sort, you have to fork, and then maintain, one or more type-specific versions of sort. For many types of input, there are far better sort algorithms than quicksort, thus forking the stdlib sort to have a slightly faster integer quicksort seems pointless when there are, e.g., bucket sorts you could be using.

As far as the benchmarks go, 1 << 10 seems a bit small, and may or may not be biased more toward testing the overhead (or lack thereof) of algorithm heuristics than toward actually testing the sort behavior. Last time I looked, the stdlib sort is a hybrid sort that is predominantly quicksort based -- but any hybrid sort has some overhead, either during an initial probe, or in periodic checks, to determine which sorting behavior to employ. I'd consider 16 << 20 or more to produce timings that are more likely to be representative of the core algorithms involved, though of course, the timings may come out to be much the same.

Additionally, you're creating a lot of unnecessary garbage in your b.N loops, since the data slice is reallocated and reinitialized each iteration irrespective of its prior contents -- that allocation can be moved to the function scope. Though it's not being timed, in the spirit of discussing performance-critical code, you could turn your initialization loops (based on len(data)) into range loops to potentially avoid bounds checks on each iteration -- and it will look cleaner too.
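Something like this (reusing the hypothetical myIntSort stand-in and imports from the sketch near the top of the thread, not your gist's exact code):

func BenchmarkMyIntSort1K(b *testing.B) {
	data := make([]int, 1<<10) // allocated once, at function scope
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		for j := range data { // range loop instead of indexing up to len(data)
			data[j] = rand.Int()
		}
		b.StartTimer()
		myIntSort(data)
	}
}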

Joshua Marsh

Dec 5, 2013, 9:37:39 AM
to golan...@googlegroups.com
On Thu, Dec 5, 2013 at 1:58 AM, Kevin Gillette <extempor...@gmail.com> wrote:
I've found that constant-time-reduction optimizations (which this is) have only fixed hardware cost savings, but that deprioritizing work spent towards such optimizations has closer to linear savings in time and money. In other words, at typical (or even sub-typical) programmer wages, it generally makes more financial sense to buy 2x faster hardware than it does to pay the programmer to make the program run only 2x faster. If it were a quadratic time savings (which this isn't) or even a linear savings (which this also isn't), then it would start to make financial sense to have programmers begin to rewrite code.

I guess everyone's mileage may vary. I can see where you are coming from and have felt that way with some projects. On the other hand, though, early this year I did some similar optimizing over a couple of weeks that saved my company about $20,000/month in hardware costs. 
 
The downside of this is that to avoid use of interfaces in sort, you have to fork, and then maintain, one or more type-specific versions of sort. 

I agree here. This feature of the language is very compelling to me and addresses some people's complaint about the lack of generics. Based on what I've read, it probably performs better than how generics are implemented in the more common languages today.

As far as the benchmarks go, 1 << 10 seems a bit small, and may or may not be biased more toward testing the overhead (or lack thereof) of algorithm heuristics than toward actually testing the sort behavior. Last time I looked, the stdlib sort is a hybrid sort that is predominantly quicksort based -- but any hybrid sort has some overhead, either during an initial probe, or in periodic checks, to determine which sorting behavior to employ. I'd consider 16 << 20 or more to produce timings that are more likely to be representative of the core algorithms involved, though of course, the timings may come out to be much the same.

This did help to some degree. Instead of being 4x slower, it's now only about 3x slower. 
 
Additionally, you're creating a lot of unnecessary garbage in your b.N loops, since the data slice is reallocated and reinitialized each iteration irrespective of its prior contents -- that allocation can be moved to the function scope. Though it's not being timed, in the spirit of discussing performance-critical code, you could turn your initialization loops (based on len(data)) into range loops to potentially avoid bounds checks on each iteration -- and it will look cleaner too.

I just pulled the code from the sort package and hadn't really paid much attention to it. While it doesn't change the results of the test, the tests did run a couple of seconds faster. Perhaps I can provide a patch and submit a new issue.

Michael Jones

Dec 5, 2013, 10:14:45 AM
to Kevin Gillette, golang-nuts, Joshua Marsh, Jesse McNelis
Abstract is slower than concrete. You are asking the CPU to repeatedly discover at run-time what you were unwilling or unable to specify at compile time. Maybe you're being lazy or maybe there is a good reason, but the key notion is crucial: the actual instructions that the computer performs are type-specific, so any coding that is type-unspecific forces an analysis and branch table somewhere (preprocessor expansion, compiler concretization, run-time dispatch among equals, or in hardware).

Differences are often insignificant. Common CPUs are fast compared to common data sizes (so the difference is not important to the user) and most real programs do more in their inner-loops than the exchanges in these sort functions (so the amortized extra time may barely be measurable).

Point of view is important. Programmers coming from the dynamic typing world expect abstraction and find system design without it difficult. They've always been paying the overhead, even for "i = i + 1." From this perspective, it is not that interfaces are slow, but that not using interfaces is unusually fast. (The same argument applies to writing critical sections in assembler vs a higher-level language, making use of special SIMD or AES-assist instructions vs assembler, GPU/MIC coprocessors vs the main CPU, or Bitcoin and security ASICs.)

Usage trumps mechanism. Examine the sort package's source code (go/src/pkg/sort/sort.go) and look at Ints() in line 270. What happens is that the API that knows you want to sort an int slice in non-decreasing order uses the abstract sorting engine by specifying the use of default int-specific Len(), Less(), and Swap() functions. That's easy, but it forces the default sorting engine to be abstract "all the way down." Instead, it could have said, "Oh, I have special code for this" and simply been the int-specific version of sorting into non-decreasing order with custom code for this case (and similarly for other built-in types). If it were coded this way, then the abstract-to-concrete mechanism would happen once per sort call and be immeasurable in terms of performance penalty. That's an example of style of usage being more important than cost of usage.
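Sketching that (the first function is paraphrased from the sort package's source; intQuickSort is a hypothetical int-specific engine):

// As the sort package does it today: the concrete API immediately
// re-enters the abstract engine, so every comparison and swap below
// this call is an interface method call.
func Ints(a []int) { Sort(IntSlice(a)) }

// The alternative described above: resolve the concrete type once at
// the API boundary, then run custom int-specific code all the way down.
func IntsSpecialized(a []int) { intQuickSort(a) }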

Solutions are well known. The situation of wanting to perform a fixed algorithm on various types of data is common. The desire to do this in a "factored" way, where the logic is written once with data types as parameters, is as old as algorithms. I can easily imagine a student asking Euclid, "but, master, what if we need the GCD of a 64-bit number? You only showed us what to do for 16-bit numbers. Is the algorithm for the larger type related in any way to the smaller?" Both Euclid and I agree that the algorithms are the same and the difference in types is trivial. I use m4 or mcpp (http://mcpp.sourceforge.net/) to make such trivialities, well, trivial. I do this because Go does not have a native macro, parameterized, or other abstract-to-specific type management facility. Maybe this could be a good research project for the Go 1.3 interval -- not to adopt an answer but to explore alternatives with vigor in search of consensus.

Michael





--
Michael T. Jones | Chief Technology Advocate  | m...@google.com |  +1 650-335-5765

Chris Hines

Dec 5, 2013, 10:32:26 AM
to golan...@googlegroups.com, Jian Zhen, Joshua Marsh
It predates Go 1.0, but here is a post in which I describe my encounter with the performance overhead of interfaces. But I agree with the other responses: the significance of this overhead (like so much else that we do) is contextual.

Ian Lance Taylor

Dec 5, 2013, 11:38:22 AM
to Joshua Marsh, golang-nuts
On Wed, Dec 4, 2013 at 4:15 PM, Joshua Marsh <jos...@themarshians.com> wrote:
>
> Interfaces and structs were on par with each other and performed 3-4 times
> worse. For the struct, it's clearly because of the copying involved (the
> struct pointers are substantially faster). I'm guessing the Interface is
> slow for the same reason. Is that right? Is there an easy way to solve
> that problem?

I would guess that the slowdown when using interfaces is more from
calling methods than from copying the interface values.
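For instance, a pair of benchmarks making the identical method call, differing only in whether it is resolved through an interface (illustrative code, not from the gist):

package mysort

import "testing"

type adder interface{ add(int) int }

type intAdder int

func (a intAdder) add(x int) int { return int(a) + x }

var sink int // keeps the compiler from optimizing the loops away

func BenchmarkDirectCall(b *testing.B) {
	a := intAdder(1)
	s := 0
	for i := 0; i < b.N; i++ {
		s = a.add(s) // static call; eligible for inlining
	}
	sink = s
}

func BenchmarkInterfaceCall(b *testing.B) {
	var a adder = intAdder(1)
	s := 0
	for i := 0; i < b.N; i++ {
		s = a.add(s) // dynamic dispatch through the method table
	}
	sink = s
}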

Ian

Jian Zhen

Dec 5, 2013, 12:21:15 PM
to Michael Jones, Kevin Gillette, golang-nuts, Joshua Marsh, Jesse McNelis
I agree with everything that's being said, so thank you to all. And thank you, Michael, for the great write-up below.

To me, it is a matter of knowledge and learning. Knowing the difference gives you choices. You can choose NOT to use interfaces and suffer the downside, but gain significant performance in the right context; or you can choose to continue with interfaces and reap the benefits of that, but with lower performance, and hopefully still meet whatever performance requirements you have. 

On the other hand, you won't even know that you have these options if you do NOT know the difference. 

What Joshua did here just made what everyone knows is obvious (abstract is slower than concrete) a bit more concrete. I think it's a great piece of knowledge to gain (at least to me; maybe everyone else already knows the exact performance differences), as long as we all keep things in perspective.

Jian