is it possible to speed up type assertion?


ChrisLu

unread,
Feb 2, 2017, 1:04:23 AM2/2/17
to golang-nuts
Go's type assertion seems quite slow. The added cost is too much if it has to be in a tight loop. Here are the times taken on my laptop for the following code.

https://play.golang.org/p/cA96miTkx_
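For readers who can't load the playground link, here is a rough sketch of the kind of comparison being timed (a reconstruction, not the original code; the loop bound matches the printed count of 1073741824):

package main

import (
    "fmt"
    "time"
)

func main() {
    const N = 1024 * 1024 * 1024 // 2^30, matching count=1073741824
    var d interface{} = 1

    count := 0
    start := time.Now()
    for i := 0; i < N; i++ {
        count += d.(int) // type assertion inside the tight loop
    }
    fmt.Printf("count=%d time taken=%v\n", count, time.Since(start))

    count = 0
    n := 1
    start = time.Now()
    for i := 0; i < N; i++ {
        count += n // the same loop without the assertion
    }
    fmt.Printf("count=%d time taken=%v\n", count, time.Since(start))
}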


chris$ time ./p
count=1073741824 time taken=7.899207181s
count=1073741824 time taken=300.601453ms

real 0m8.205s
user 0m8.163s
sys 0m0.029s

chris$ time luajit -e "count = 0
> for i=1, 1024*1024*1024, 1 do count = count + 1 end
> print(count)"
1073741824

real 0m0.900s
user 0m0.891s
sys 0m0.005s



T L

unread,
Feb 2, 2017, 3:25:05 AM2/2/17
to golang-nuts

Type assertion is even slower than calling a dynamic method: https://play.golang.org/p/jUrazcbB9h
Somewhat surprising.
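A rough sketch of the three cases being compared there (a reconstruction with guessed type names, not the code behind the link):

package main

import (
    "fmt"
    "time"
)

type I interface{ Add(c *int) }

type T int

func (t T) Add(c *int) { *c += int(t) }

func main() {
    const N = 5 * 1024 * 1024 // matches count=5242880 in the results below

    count := 0
    var e interface{} = T(1)
    start := time.Now()
    for i := 0; i < N; i++ {
        count += int(e.(T)) // assert: a type assertion on every iteration
    }
    fmt.Printf("assert: count=%d time taken=%v\n", count, time.Since(start))

    count = 0
    t := T(1)
    start = time.Now()
    for i := 0; i < N; i++ {
        count += int(t) // direct: no interface involved
    }
    fmt.Printf("direct: count=%d time taken=%v\n", count, time.Since(start))

    count = 0
    var d I = T(1)
    start = time.Now()
    for i := 0; i < N; i++ {
        d.Add(&count) // method: dynamic dispatch through the interface
    }
    fmt.Printf("method: count=%d time taken=%v\n", count, time.Since(start))
}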
 

T L

unread,
Feb 2, 2017, 3:25:43 AM2/2/17
to golang-nuts

The result:
assert: count=5242880 time taken=79.262474ms
direct: count=5242880 time taken=2.291717ms
method: count=5242880 time taken=27.456853ms
 
 

T L

unread,
Feb 2, 2017, 3:40:31 AM2/2/17
to golang-nuts
Type assertion is even slower than reflect: https://play.golang.org/p/zvUTEKDfiL

assert: count=33554432 time taken=499.061188ms
direct: count=33554432 time taken=14.981847ms
method: count=33554432 time taken=176.977503ms
reflect: count=33554432 time taken=383.905004ms


On Thursday, February 2, 2017 at 4:25:05 PM UTC+8, T L wrote:

Axel Wagner

unread,
Feb 2, 2017, 3:58:32 AM2/2/17
to T L, golang-nuts
Hi,

I cannot really reproduce your results. I rewrote your code to use the built-in benchmarking: http://sprunge.us/IfQc
Giving, on my laptop:

BenchmarkAssertion-4     1000000000         2.89 ns/op
BenchmarkAssertionOK-4   500000000         2.66 ns/op
BenchmarkBare-4           1000000000         2.22 ns/op
BenchmarkIface-4         50000000        30.0 ns/op
BenchmarkReflect-4       200000000         9.74 ns/op

Note that a) yes, there is an overhead to the type assertion, but b) it's pretty small, especially compared to the other things you're trying, and c) it can be further reduced by using the two-value form (so that there is never a need to consider stack unwinding).
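For illustration, the two forms look roughly like this (a minimal sketch, not taken from the benchmark file):

func describe(x interface{}) {
    n := x.(int) // one-value form: panics if x does not hold an int
    _ = n

    if m, ok := x.(int); ok { // two-value form: never panics; ok reports success
        _ = m
    }
}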

Overall, this smells like a micro-benchmark. I wouldn't worry too much about it until you have specific evidence that it's slowing down a real program.


T L

unread,
Feb 2, 2017, 4:20:01 AM2/2/17
to golang-nuts, tapi...@gmail.com


On Thursday, February 2, 2017 at 4:58:32 PM UTC+8, Axel Wagner wrote:

The result on my machine for your test:
BenchmarkAssertion-4         500000000            17.9 ns/op
BenchmarkAssertionOK-4       500000000            17.9 ns/op
BenchmarkBare-4              2000000000             3.93 ns/op
BenchmarkIface-4             100000000            86.7 ns/op
BenchmarkReflect-4           500000000            15.6 ns/op

 


T L

unread,
Feb 2, 2017, 4:26:11 AM2/2/17
to golang-nuts, tapi...@gmail.com


On Thursday, February 2, 2017 at 5:20:01 PM UTC+8, T L wrote:


The result on my machine for your test:
BenchmarkAssertion-4         500000000            17.9 ns/op
BenchmarkAssertionOK-4       500000000            17.9 ns/op
BenchmarkBare-4              2000000000             3.93 ns/op
BenchmarkIface-4             100000000            86.7 ns/op
BenchmarkReflect-4           500000000            15.6 ns/op


The difference after inlining the functions for iface is quite large:
https://play.golang.org/p/9PGknpwxPj

BenchmarkAssertion-4         500000000            14.8 ns/op
BenchmarkAssertionOK-4       500000000            14.9 ns/op
BenchmarkBare-4              2000000000             0.44 ns/op
BenchmarkIface-4             2000000000             0.44 ns/op
BenchmarkReflect-4           1000000000            10.6 ns/op

I guess the CPU does some caching.


Ian Davis

unread,
Feb 2, 2017, 4:28:26 AM2/2/17
to golan...@googlegroups.com

On Thu, 2 Feb 2017, at 09:20 AM, T L wrote:



The result on my machine for your test:
BenchmarkAssertion-4         500000000            17.9 ns/op
BenchmarkAssertionOK-4       500000000            17.9 ns/op
BenchmarkBare-4              2000000000             3.93 ns/op
BenchmarkIface-4             100000000            86.7 ns/op
BenchmarkReflect-4           500000000            15.6 ns/op

What version of Go and what OS/hardware?


T L

unread,
Feb 2, 2017, 4:32:20 AM2/2/17
to golang-nuts

$ go version
go version go1.7.5 linux/amd64
$ cat /proc/cpuinfo | grep 'model name' | uniq
model name    : Intel(R) Core(TM) i3-2350M CPU @ 2.30GHz
 

Steven Hartland

unread,
Feb 2, 2017, 4:34:24 AM2/2/17
to golan...@googlegroups.com
Similar results here:
BenchmarkAssertion-24           100000000               14.4 ns/op
BenchmarkAssertionOK-24         100000000               14.0 ns/op
BenchmarkBare-24                1000000000               2.81 ns/op
BenchmarkIface-24               30000000                40.2 ns/op
BenchmarkReflect-24             100000000               13.2 ns/op

go version go1.7.5 freebsd/amd64

CPU: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz (2500.06-MHz K8-class CPU)

    Regards
    Steve

Dave Cheney

unread,
Feb 2, 2017, 4:51:18 AM2/2/17
to golang-nuts
0.44 ns/op is about 2.2 GHz, i.e. one operation per clock cycle; the compiler has optimised away your microbenchmark.
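One common guard against that (a sketch, not something posted in the thread) is to accumulate into a package-level sink so the compiler cannot prove the loop body unused:

package main

import "testing"

var sink int // package-level, so the store cannot be eliminated

func BenchmarkAssertion(b *testing.B) {
    var d interface{} = 1
    for i := 0; i < b.N; i++ {
        sink += d.(int) // the assertion result is actually consumed
    }
}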

T L

unread,
Feb 2, 2017, 5:01:19 AM2/2/17
to golang-nuts
I found that it is much faster if the dynamic values are pointers instead of non-pointer values.

package main

import (
    "testing"
)

func AssertInt(c *int, d interface{}) {
    *c += d.(int)
}

func BenchmarkAssertionInt(b *testing.B) {
    count := 0
    var t int = 1
    d := (interface{})(t)
    for i := 0; i < b.N; i++ {
        AssertInt(&count, d)
    }
}

func AssertPtr(c *int, d interface{}) {
    *c += *d.(*int)
}

func BenchmarkAssertionPtr(b *testing.B) {
    count := 0
    var t int = 1
    d := (interface{})(&t)
    for i := 0; i < b.N; i++ {
        AssertPtr(&count, d)
    }
}

BenchmarkAssertionInt-4       500000000            15.3 ns/op
BenchmarkAssertionPtr-4       2000000000             3.09 ns/op




On Thursday, February 2, 2017 at 2:04:23 PM UTC+8, ChrisLu wrote:

T L

unread,
Feb 2, 2017, 5:41:14 AM2/2/17
to golang-nuts


On Thursday, February 2, 2017 at 6:01:19 PM UTC+8, T L wrote:
I found that it is much faster if the dynamic values are pointers instead of non-pointer values.

Looking at the code of the function assertE2T in runtime/iface.go,
it looks like memmove(to, from unsafe.Pointer, n uintptr) is slow even for values with size <= one word.
Currently, the official gc compiler doesn't make optimizations for types with size <= one word.
Pointers are treated as a special value in interfaces.
 

Axel Wagner

unread,
Feb 2, 2017, 5:41:25 AM2/2/17
to T L, golang-nuts
I want to re-emphasize that all of these are micro-benchmarks. They say nothing useful at all, as proven by this thread; in some circumstances the added cost may be significant, in others it isn't. The same rule that has been repeated lots of times on this list still applies: write your program to be simple and readable; if it is too slow, benchmark and improve. Basing your code on what any of these benchmarks says is just ridiculous; base it on the bottlenecks you measure in your actual, real-world program running on the actual production hardware with actual production data.

mhh...@gmail.com

unread,
Feb 2, 2017, 7:13:27 AM2/2/17
to golang-nuts, tapi...@gmail.com
Is there a paper that introduces and explains benchmarking pitfalls and best practices for naive developers?

There are papers out there that talk about go bench, but the material is sparse, and often they use obviously bad code as a pretext
to introduce the reader to Go's benchmarking tooling.
Which is good, don't take me wrong, but that did not help me avoid some pitfalls I faced IRL.

I suspect the topic is hairy to explain to beginners in a way that makes them autonomous and confident.
In my experience and feeling, if you don't have a benchmarking background in other languages,
you are left to take the wrong decision because you can't build the right understanding of the situation.

Marvin Renich

unread,
Feb 2, 2017, 9:22:00 AM2/2/17
to golang-nuts
* T L <tapi...@gmail.com> [170202 04:20]:
>
>
> The result on my machine for your test:
> BenchmarkAssertion-4 500000000 17.9 ns/op
> BenchmarkAssertionOK-4 500000000 17.9 ns/op
> BenchmarkBare-4 2000000000 3.93 ns/op
> BenchmarkIface-4 100000000 86.7 ns/op
> BenchmarkReflect-4 500000000 15.6 ns/op

BenchmarkIface is testing the wrong thing; the value is swamped by the
implicit conversion of d (type T) to the function argument of type I.
Try:

func BenchmarkIface(b *testing.B) {
    count := 0
    var t T = 1
    d := I(t)
    for i := 0; i < b.N; i++ {
        Iface(&count, d)
    }
}

...Marvin

Marvin Renich

unread,
Feb 2, 2017, 9:32:16 AM2/2/17
to golang-nuts
* Marvin Renich <mr...@renich.org> [170202 09:22]:
>
> BenchmarkIface is testing the wrong thing; the value is swamped by the
> implicit conversion of d (type T) to the function argument of type I.

Or, maybe it is testing the correct thing. That is the problem with
microbenchmarking. Try benchmarking your actual code instead. :-P

...Marvin

Rene Kaufmann

unread,
Feb 2, 2017, 9:55:01 AM2/2/17
to Dave Cheney, golang-nuts
BenchmarkAssertion-4     300000000         4.08 ns/op
BenchmarkAssertionOK-4   500000000         3.03 ns/op
BenchmarkBare-4           500000000         3.01 ns/op
BenchmarkIface-4         30000000        55.1 ns/op
BenchmarkReflect-4       100000000        12.8 ns/op

CPU: Intel(R) Core(TM) i5 CPU       M 540  @ 2.53GHz

On Thu, Feb 2, 2017 at 10:51 AM Dave Cheney <da...@cheney.net> wrote:
0.44 ns/op is about 2.2 GHz, i.e. one operation per clock cycle; the compiler has optimised away your microbenchmark.

Jan Ziak

unread,
Feb 2, 2017, 11:16:43 AM2/2/17
to golang-nuts, mr...@renich.org
A large difference between the achievable, highly optimized run-time performance of a feature X and its actual, physically implemented run-time performance causes programmers to avoid X.

Axel Wagner

unread,
Feb 2, 2017, 12:56:09 PM2/2/17
to Jan Ziak, golang-nuts, mr...@renich.org
The point is that (as repeatedly pointed out) microbenchmarks, in general, just don't tell you anything really useful, as any effects you measure could be due to anything ranging from random noise over compiler optimizations to code alignment. Avoiding type assertions based on anything contained in this thread is pretty unreasonable; I see zero evidence here that they are a problem at all.

Go gives you pretty amazing tools to find the actual bottlenecks in your code. Again: Write your code as it's most convenient, then run it with production data on production hardware and if it isn't fast enough, use pprof and similar tools to find the actual bottlenecks. They very likely won't be type-assertions.


Jan Ziak

unread,
Feb 2, 2017, 1:46:32 PM2/2/17
to golang-nuts, 0xe2.0x...@gmail.com, mr...@renich.org
The compiler field hasn't yet reached a state of sophistication where it handles microbenchmarks the way the human mind handles them. Until compiler technology reaches that state, it is natural for people to keep using microbenchmarks as one of the primary performance indicators.

The observed difference is between [how a human mind would execute the microbenchmark in order to execute it in the shortest amount of time possible] and [how the microbenchmark is actually executed on a real CPU running instructions generated by a real compiler].

The rational path for compiler technology is to make the observed gap gradually smaller.

Axel Wagner

unread,
Feb 2, 2017, 1:54:00 PM2/2/17
to Jan Ziak, golang-nuts, mr...@renich.org
Making the gap smaller means making the compiler dumber. And I disagree with you about microbenchmarks. There is a difference between benchmarks and microbenchmarks, and using the latter as the primary means of evaluating performance is plain wrong. There is nothing wrong with artificial benchmarks in themselves; benchmarking how quickly a JSON decoder decodes some artificial set of data is fine, as it's reasonably close to the real world and gives you some indication of performance. Benchmarking some code that will compile to only a handful of instructions will give you simply zero information.


Chris Lu

unread,
Feb 2, 2017, 4:04:33 PM2/2/17
to Axel Wagner, Jan Ziak, golang-nuts, mr...@renich.org
Original poster here. Please do not call it micro-benchmarking. My real-life use case is this.

I am trying to build a generic distributed map reduce system similar to Spark. Without generics, the APIs pass data via interface{}. For example, a reducer is written this way:

func sum(x, y interface{}) (interface{}, error) {
    return x.(uint64) + y.(uint64), nil
}

To be more generic, this framework also supports LuaJIT.

There is a noticeable performance difference: LuaJIT is faster than pure Go. Profiling the pure Go version showed that assertE2T and assertI2T cost a non-trivial amount of time.

In the original post, you can see that pure Go with type assertions took 8 seconds, while LuaJIT used only 900 milliseconds to get the same amount of work done.

btw: Java took 30 ms to do the same. https://play.golang.org/p/FSnvLb2uxA
That is 10% of the time of pure Go without any type assertions.

Chris
See more about the framework on https://github.com/chrislusf/gleam



Ian Lance Taylor

unread,
Feb 2, 2017, 4:32:03 PM2/2/17
to Chris Lu, Axel Wagner, Jan Ziak, golang-nuts, Marvin Renich
On Thu, Feb 2, 2017 at 1:04 PM, Chris Lu <chri...@gmail.com> wrote:
>
> I am trying to build a generic distributed map reduce system similar to
> Spark. Without generics, the APIs pass data via interface{}. For example, a
> reducer is written this way:
>
> func sum(x, y interface{}) (interface{}, error) {
>
> return x.(uint64) + y.(uint64), nil
>
> }
>
>
> To be more generic, this framework also support LuaJIT.
>
> There is a noticeable difference in terms of performance difference. LuaJIT
> is faster than pure Go. The profiling of pure Go showed the assertE2T and
> assertI2T cost a non-trivial amount of time.

Note that in the 1.8 release assertE2T and assertI2T no longer exist.
They were removed by https://golang.org/cl/32313. That should speed
up these cases; assertE2T and assertI2T were trivial, but in some
cases they did call typedmemmove. When your interface values store
non-pointers, and the code is inlined as it is in 1.8, the calls to
typedmemmove disappear.

You might want to retry your benchmarks with the 1.8 release candidate
to see if you can observe any real difference.

Ian

Chris Lu

unread,
Feb 2, 2017, 4:55:15 PM2/2/17
to Ian Lance Taylor, Axel Wagner, Jan Ziak, golang-nuts, Marvin Renich
Cool! Upgrading to 1.8rc3 shows a great improvement! I wish all problems could be resolved by upgrading. :) Here are the before and after results.

chris$ go test -bench=.
testing: warning: no tests to run
BenchmarkAssertion-8     200000000         9.57 ns/op
BenchmarkAssertionOK-8   200000000         9.02 ns/op
BenchmarkBare-8           1000000000         2.09 ns/op
BenchmarkIface-8         50000000        27.0 ns/op
BenchmarkReflect-8       200000000         9.00 ns/op
PASS
ok   _/Users/chris/tmp/test 11.980s

chris$ go version
go version go1.8rc3 darwin/amd64

chris$ go test -bench=.
BenchmarkAssertion-8     1000000000         2.67 ns/op
BenchmarkAssertionOK-8   1000000000         2.32 ns/op
BenchmarkBare-8           1000000000         2.10 ns/op
BenchmarkIface-8         50000000        26.0 ns/op
BenchmarkReflect-8       200000000         8.80 ns/op
PASS
ok   _/Users/chris/tmp/test 11.761s


The time taken by the original type-assertion code shrinks from 8 seconds to 474ms!

count=1073741824 time taken=474.843325ms
count=1073741824 time taken=308.870452ms

Michael Jones

unread,
Feb 2, 2017, 6:26:45 PM2/2/17
to Chris Lu, Ian Lance Taylor, Axel Wagner, Jan Ziak, golang-nuts, Marvin Renich
Insight here feels tenuous. 

It is rare that a well-written "real" program would be measurably influenced by something this small. As it happens, I often write such rare programs (where a 2x faster math.Log() really might make the whole program run 1.9x faster). Even as a poster child for it, I think it is very uncommon. To get here you can never read or write data, transcode data, communicate with devices, processes, or users, or do anything else so typical of real programs.

Benchmarks that measure one single thing in the absence of everything else are often misleading. They hide the effect of the "rest of the time" as outlined above and worse, they are immune to the good and bad of the reality of real computers. For example, integer divide is slow, but is also often infrequent. If you do such a divide every 100 cycles, the nature of modern CPUs is that the delay will be hidden in overlapped execution of the instruction stream and at worst, other instruction streams. A nothing but integer divide micro benchmark will block on the 19-cycle or whatever completion rate. That's the good. On the other hand, a "write over and over to the same memory" micro benchmark will run fast  thanks to multiple layers of caching while the same thing on a real-program scale will have cache contention effects that could be 1/20th the throughput. (This is the same as measuring highway drive time at 3am vs 8am.)

The most meaningful benchmark measures a real use case. In this situation the resulting measurements directly interpret application performance. In any synthetic benchmark, however, you can often be unsure how to apply the result to the real world. The smaller and more focused the benchmark, the less easy it is to learn from the result.

Even simple statistical inference is subtly difficult; consider the fact that the average person has less than two legs. If this is hard for people, then how much harder to properly understand the whole-program meaning of a 10x slowdown in type assertions?

Michael




--
Michael T. Jones
michae...@gmail.com

Chris Lu

unread,
Feb 2, 2017, 7:36:14 PM2/2/17
to Michael Jones, Ian Lance Taylor, Axel Wagner, Jan Ziak, golang-nuts, Marvin Renich
Thanks for the detailed thoughtful answer!

Statistically you are correct. 95% of Go features are reasonably fast and will get 99% of the job done very fast. But can we call benchmarking the remaining slow 5% micro-benchmarking?

This can easily become a subjective discussion based on different experiences. We could get 99% of people to agree that this is micro-benchmarking, but that would not help anything. Indeed, this seems like an ostrich approach.

We are talking about the Go language here, not any specific application. A fair benchmark can measure differences against other languages, showing the gap for Go to catch up.

Anything we feel is slow should be tracked, so that we can see performance improvements over time and newcomers can have a correct expectation of the cost.

Chris

Michael Jones

unread,
Feb 3, 2017, 10:03:28 AM2/3/17
to Chris Lu, Ian Lance Taylor, Axel Wagner, Jan Ziak, golang-nuts, Marvin Renich
Nothing wrong with micro benchmarks. Sorry if I gave that impression. The problem is in trying to use them to anticipate the behavior of a larger system. If you have a "micro question", then a "micro benchmark" is a great one. Most people have macro questions though, and yet they often start with micro benchmarks as a logical-seeming computer science version of Descartes' method of reasoning. The problem is that in the computer case, it can be difficult or impossible to extrapolate and deduce behavior at larger scales.

Performance is always a good thing. But other things are good too, and even performance comes in flavors. Note benchmarks that report number of allocations and size of allocations as well as elapsed time. There are macro cases where slower-with-less-garbage might lead to a faster overall run time than faster-with-more-GC-pressure. A macro benchmark answers this; a micro one does not.

This is also a subtle strategic issue in performance tuning. If you make type assertions faster but defers slower, is that better for the language? There is no answer for this. No real answer. Because it would be better for some people and worse for others. Even the Go Gods don't know answers to such questions with confidence.

Agree about benchmarking everything. Agree about tracking things seen as slow. Absolutely agree about exposing the cost of actions, and personally, like designs where most things have uniform cost--where the textual complexity of the program makes the runtime complexity evident--except for well marked danger zones.

Where I'm less in agreement is about "...showing the gap for Go to catch up." This can be good, but not always. If language X is 10x faster at some small thing, but that is at the expense of memory usage, GC pressure, complicated dynamic library settings, absence of concurrency architecture, etc., then it is important to understand that the micro decision is made with appreciation of macro decisions. When all can be better then it must. But often engineering is about choosing wise tradeoffs and that's the risk of inspecting too closely. 

lar...@gmail.com

unread,
Nov 25, 2018, 11:10:09 AM11/25/18
to golang-nuts
The code below consumes ~40% of the total execution time. According to the profiler, i := uint64(arg.(uint32)) is a major contributor.

// Cast the integer argument to uint64 and call a "writer"
// The "writer" knows how many bytes to add to the binary stream
// Type casts from interface{} to integer consume 40% of the overall
// time. Can I do better? What is interface{} in Golang?
func (b *Binlog) writeArgumentToOutput(writer writer, arg interface{}, argKind reflect.Kind) error {
    // unsafe pointer to the data depends on the data type
    var err error
    switch argKind {
    case reflect.Int8:
        i := uint64(arg.(int8))
        err = writer.write(b.ioWriter, unsafe.Pointer(&i))
    case reflect.Int16:
        i := uint64(arg.(int16))
        err = writer.write(b.ioWriter, unsafe.Pointer(&i))
    case reflect.Int32:
        i := uint64(arg.(int32))
        err = writer.write(b.ioWriter, unsafe.Pointer(&i))
    case reflect.Int64:
        i := uint64(arg.(int64))
        err = writer.write(b.ioWriter, unsafe.Pointer(&i))
    case reflect.Uint8:
        i := uint64(arg.(uint8))
        err = writer.write(b.ioWriter, unsafe.Pointer(&i))
    case reflect.Uint16:
        i := uint64(arg.(uint16))
        err = writer.write(b.ioWriter, unsafe.Pointer(&i))
    case reflect.Uint32:
        i := uint64(arg.(uint32))
        err = writer.write(b.ioWriter, unsafe.Pointer(&i))
    case reflect.Uint64:
        i := uint64(arg.(uint64))
        err = writer.write(b.ioWriter, unsafe.Pointer(&i))
    case reflect.Int:
        i := uint64(arg.(int))
        err = writer.write(b.ioWriter, unsafe.Pointer(&i))
    case reflect.Uint:
        i := uint64(arg.(uint))
        err = writer.write(b.ioWriter, unsafe.Pointer(&i))
    default:
        return fmt.Errorf("Unsupported type: %T\n", reflect.TypeOf(arg))
    }
    return err
}


Jan Mercl

unread,
Nov 25, 2018, 11:17:31 AM11/25/18
to lar...@gmail.com, golang-nuts
No offense intended, but that code is wrong on so many levels...

--

-j

Michel Levieux

unread,
Nov 25, 2018, 11:32:52 AM11/25/18
to 0xj...@gmail.com, lar...@gmail.com, golan...@googlegroups.com
No offense intended, but that code is wrong on so many levels...

I fully believe that you are not trying to offend anyone, but this question is interesting to me too, and I'd like more details. Could you please explain, from worst to "least worst", what's wrong with this code? Reflection is sometimes really misleading for people who don't have a deep understanding of the compiler, and such pieces of code exist all around the internet. Would you give us more precise feedback?

Thanks in advance!

Axel Wagner

unread,
Nov 25, 2018, 11:54:32 AM11/25/18
to m.le...@capitaldata.fr, Jan Mercl, lar...@gmail.com, golan...@googlegroups.com
I'd suggest simply
func (b *Binlog) writeArgumentToOutput(writer writer, arg uint64) error { /* do the writing */ }
and doing the actual conversions at the call-site. It's type-safe, shorter, faster and more idiomatic - with the tiny downside of a `uint64()` here and there.
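For illustration, a call site under that signature might look like this (the names w and n and the int32 source type are assumptions, not from the original code):

var n int32 = 42 // whatever concrete integer type the caller has
if err := b.writeArgumentToOutput(w, uint64(n)); err != nil {
    return err
}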

Alternatively, use reflect as it's intended to - something like
func writeArgumentToOutput(writer writer, arg interface{}) error {
    rv := reflect.ValueOf(arg)
    var v uint64
    if k := rv.Kind(); k >= reflect.Int && k < reflect.Uint {
        v = uint64(rv.Int())
    } else if k <= reflect.Uintptr {
        v = rv.Uint()
    } else {
        return fmt.Errorf("Unsupported type: %T\n", reflect.TypeOf(arg))
    }
    /* write v */
    return nil
}

this is probably a bit slower than the above, type-safe version, but probably not noticeably slower than the unsafe large type-switch above and has the benefit of also not needing conversion at the call-site (it also works correctly with all integer types, which the type-switch doesn't).

But in the end, any slowness here comes from passing things as an interface{}, instead of just as a value. You can't really optimize that away, because it's inherent in the approach.

roger peppe

unread,
Nov 25, 2018, 11:56:13 AM11/25/18
to lar...@gmail.com, golang-nuts
You don't need to use reflect here. You may well find that something like this is faster:

        func (b *Binlog) writeArgumentToOutput(writer writer, arg interface{}) error {
                var i uint64
                switch arg := arg.(type) {
                case uintptr:
                        i = uint64(arg)
                case int:
                        i = uint64(arg)
                case int8:
                        i = uint64(arg)
                case int16:
                        i = uint64(arg)
                case int32:
                        i = uint64(arg)
                case int64:
                        i = uint64(arg)
                case uint8:
                        i = uint64(arg)
                case uint16:
                        i = uint64(arg)
                case uint32:
                        i = uint64(arg)
                case uint64:
                        i = arg
                default:
                        return fmt.Errorf("Unsupported type: %T\n", reflect.TypeOf(arg))
                }
                return writer.write(b.ioWriter, &i)
        }

There are a couple of issues I'd point out with the original code:
- just because a value has a given Kind (e.g. reflect.Int16) does not mean it can be converted to that type, because it may be some other named type with an int16 underlying type (see the sketch after this list). A better (and shorter, but probably no faster) way to do it would be something like: https://play.golang.org/p/jhpbaaDoY57
- it seems odd that you're passing an unsafe.Pointer to the write method. How is that method supposed to know how many bytes to write?
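To illustrate the first point (an example of mine, not the code behind the playground link):

package main

import (
    "fmt"
    "reflect"
)

type MyInt16 int16 // a named type whose Kind is still reflect.Int16

func main() {
    var arg interface{} = MyInt16(7)

    fmt.Println(reflect.ValueOf(arg).Kind()) // int16

    // The unchecked assertion arg.(int16) would panic here, because the
    // dynamic type is MyInt16, not int16.
    if _, ok := arg.(int16); !ok {
        fmt.Println("assertion to int16 fails for MyInt16")
    }

    // Going through reflect works for any value of integer kind.
    fmt.Println(uint64(reflect.ValueOf(arg).Int())) // 7
}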



roger peppe

unread,
Nov 25, 2018, 12:01:42 PM11/25/18
to Axel Wagner, m.le...@capitaldata.fr, Jan Mercl, lar...@gmail.com, golang-nuts
On Sun, 25 Nov 2018 at 16:54, 'Axel Wagner' via golang-nuts <golan...@googlegroups.com> wrote:
I'd suggest simply
func (b *Binlog) writeArgumentToOutput(writer writer, arg uint64) error { /* do the writing */ }
and doing the actual conversions at the call-site. It's type-safe, shorter, faster and more idiomatic - with the tiny downside of a `uint64()` here and there.

Alternatively, use reflect as it's intended to - something like
func writeArgumentToOutput(writer writer, arg interface{}) error {
    rv := reflect.ValueOf(arg)
    var v uint64
    if k := rv.Kind(); k >= reflect.Int && k < reflect.Uint {

I know the constants are protected by the Go compatibility guarantee, but doing range comparisons on reflect.Kind constants seems a bit dubious to me. Without going to the definitions, it's not clear to the reader which exact kinds are included here. I'd suggest enumerating all the expected kinds directly in a switch statement (the compiler may well optimize that into a range comparison anyway).

Ian Denhardt

unread,
Nov 25, 2018, 1:25:35 PM11/25/18
to Axel Wagner, roger peppe, m.le...@capitaldata.fr, Jan Mercl, lar...@gmail.com, golang-nuts
Quoting roger peppe (2018-11-25 12:01:08)
I agree; had to stare at this a bit. Something like:

switch rv.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
    v = uint64(rv.Int())
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
    v = rv.Uint()
default:
    return fmt.Errorf("Unsupported type: %T\n", reflect.TypeOf(arg))
}

Only slightly more verbose, but much easier to understand. But ultimately
Axel's first instinct is, I think, the right one -- just do the cast at
the call site.

roger peppe

unread,
Nov 25, 2018, 1:56:50 PM11/25/18
to i...@zenhack.net, Axel Wagner, m.le...@capitaldata.fr, Jan Mercl, lar...@gmail.com, golang-nuts
Yes - I linked to some very similar play.golang.org code in my first reply in this thread.

I wonder whether reflect.Type should have "CanInt", and "CanUint" methods that can be called to determine whether it's OK to call Value.Int and Value.Uint respectively, without needing to explicitly enumerate all the int-like types.
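As a sketch, user-level helpers along those lines could look like this (hypothetical names, not part of the reflect package; they just hide the enumeration in one place):

// canInt reports whether Value.Int may be called on a value of this kind.
func canInt(k reflect.Kind) bool {
    switch k {
    case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
        return true
    }
    return false
}

// canUint reports whether Value.Uint may be called on a value of this kind.
func canUint(k reflect.Kind) bool {
    switch k {
    case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
        return true
    }
    return false
}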

Only slightly more verbose, but much easier to understand. But ultimately
Axel's first instinct is, I think, the right one -- just do the cast at
the call site.

Agreed, but this might not be possible if the value has come from within some other type using reflect.

lar...@gmail.com

unread,
Nov 25, 2018, 2:18:00 PM11/25/18
to golang-nuts
These are great tips! Thank you!
The following switch has exactly the same performance

    switch arg := arg.(type) {
    case int:
        i := uint64(arg)
        err = writer.write(b.ioWriter, unsafe.Pointer(&i))
    case uint:



lar...@gmail.com

unread,
Nov 25, 2018, 2:26:02 PM11/25/18
to golang-nuts
This code fails my tests with "panic: interface conversion: interface {} is int32, not int"

    switch argKind {
    case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
        i := uint64(arg.(int))
        err = writer.write(b.ioWriter, unsafe.Pointer(&i))
    case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
        i := arg.(uint)
        err = writer.write(b.ioWriter, unsafe.Pointer(&i))

lar...@gmail.com

unread,
Nov 25, 2018, 2:57:05 PM11/25/18
to golang-nuts
This code is ~20% slower (probably)
func (b *Binlog) writeArgumentToOutput(writer writer, arg interface{}) error {
    var err error
    rv := reflect.ValueOf(arg)

    var v uint64
    if k := rv.Kind(); k >= reflect.Int && k < reflect.Uint {
        v = uint64(rv.Int())
        err = writer.write(b.ioWriter, unsafe.Pointer(&v))
    } else if k <= reflect.Uintptr {
        v = rv.Uint()
        err = writer.write(b.ioWriter, unsafe.Pointer(&v))
    } else {
        return fmt.Errorf("Unsupported type: %T\n", reflect.TypeOf(arg))
    }

    /* write v */
    return err
}


Space A.

unread,
Nov 25, 2018, 7:53:41 PM11/25/18
to golang-nuts
+1000
The most valuable comment in this thread IMO. Thank you.


On Thursday, February 2, 2017 at 13:41:25 UTC+3, Axel Wagner wrote:

lar...@gmail.com

unread,
Nov 26, 2018, 3:22:05 AM11/26/18
to golang-nuts
The ugly hack below shaves 20% from the switch-case
type iface struct {
    tab  *unsafe.Pointer
    data *unsafe.Pointer
}

func getInterfaceData(arg interface{}) unsafe.Pointer {
    return unsafe.Pointer((((*iface)(unsafe.Pointer(&arg))).data))
}

// Cast the integer argument to uint64 and call a "writer"
// The "writer" knows how many bytes to add to the binary stream
//
// Type casts from interface{} to integer consume 40% of the overall
// time. Can I do better? What is interface{} in Golang?
// Switching to args *[]interface makes the performance 2x worse
// Before you jump to conclusions see
// https://groups.google.com/forum/#!topic/golang-nuts/Og8s9Y-Kif4
func (b *Binlog) writeArgumentToOutput(writer writer, arg interface{}) error {
    // unsafe pointer to the data depends on the data type
    var err error
    err = writer.write(b.ioWriter, getInterfaceData(arg))
    return err
}


Arkady

unread,
Nov 26, 2018, 4:56:35 AM11/26/18
to golan...@googlegroups.com
P.S. shaves 20% from the total execution time.

Jan Mercl

unread,
Nov 26, 2018, 6:38:11 AM11/26/18
to lar...@gmail.com, golang-nuts
What roger peppe said.

- Performing an unchecked type assertion is incorrect in the general case.


- Unsafe seems to be used for no good reason.

- Reflect ditto.


        50000000                35.3 ns/op


        100000000               18.7 ns/op

Using Intel® Xeon(R) CPU E5-1650 v2 @ 3.50GHz × 12.

PS: The last two links point to code that cannot run in the playground.

--

-j

Arkady

unread,
Nov 26, 2018, 8:40:00 AM11/26/18
to golang-nuts
Jan,

I have just measured overall performance of my code - call to Log(string, args []interface{}).
This code makes the difference

type iface struct {
    tab  *unsafe.Pointer
    data *unsafe.Pointer
}

func getInterfaceData(arg interface{}) unsafe.Pointer {
    return unsafe.Pointer((((*iface)(unsafe.Pointer(&arg))).data))
}

func (b *Binlog) writeArgumentToOutput(writer writer, arg interface{}) error {
    // unsafe pointer to the data depends on the data type
    var err error
    err = writer.write(b.ioWriter, getInterfaceData(arg))
    return err
}



roger peppe

unread,
Nov 26, 2018, 9:10:11 AM11/26/18
to Arkady M, golang-nuts
On Sun, 25 Nov 2018 at 19:17, <lar...@gmail.com> wrote:
These are great tips! Thank you!

I took a look at this code. It seems that you have a deep understanding of how the runtime works, but to me it really seems like you're prematurely optimising here, and running a serious risk of broken code. The code is full of unsafe and non-portable operations that will almost certainly break in the future. For example, reading /proc to determine the base offset for static strings is... inadvisable.

Just because you know what's going on under the covers doesn't mean that you should write code that relies on that information.

If you want to see a highly performant logging implementation that does not seriously rely on unsafe practices, I'd encourage you to take a look at the zap package (https://godoc.org/go.uber.org/zap).

By the way, your SingleInt benchmark is misleading. You're using a constant argument to Log, which means that the runtime can use a single interface value for every call, with no allocation required. If you change the loop so that it passes a different number each time:

    for i := 0; i < b.N; i++ {
        binlog.Log(fmtString, i)
    }

you are likely to find that the performance gap is considerably smaller.

  cheers,
    rog.
 

Robert Johnstone

unread,
Nov 26, 2018, 9:39:39 AM11/26/18
to golang-nuts
Hello,

Separate question: why are you passing an unsafe pointer to writer? You are (probably) forcing that int to escape to the heap. If you want to write a uint64, pass in a uint64.

1) I suspect that you are using pointer arithmetic inside writer; don't. The code will not be portable. Instead, you should use shift and mask to extract the bytes (e.g., uint8(i>>16)); see the sketch after these points. I expect the resulting code will be faster.

2) Even if you keep the pointer arithmetic, you should defer taking the address until necessary.
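A small sketch of the shift-and-mask idea (a hypothetical helper, not the original writer):

// appendUint64LE appends the 8 bytes of v in little-endian order using
// shifts only, with no pointer arithmetic, so it is portable.
func appendUint64LE(dst []byte, v uint64) []byte {
    for shift := uint(0); shift < 64; shift += 8 {
        dst = append(dst, byte(v>>shift))
    }
    return dst
}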


Good luck.

Robert

roger peppe

unread,
Nov 26, 2018, 11:27:54 AM11/26/18
to Arkady M, golang-nuts
On Mon, 26 Nov 2018 at 14:16, Arkady <lar...@gmail.com> wrote:
ZAP is significantly slower than what I do. The binary log has an inherent edge.

AFAIK there is nothing about zap which precludes it from writing binary files.

Unlike zap, your scheme will inherently incur more memory allocations, as it takes a memory allocation to put a non-pointer value into an interface (that's the reason why zap is designed the way it is), so I suspect it will end up slower in more realistic scenarios.
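As a quick illustration of that boxing allocation (my own sketch using testing.AllocsPerRun, not code from this thread):

package main

import (
    "fmt"
    "testing"
)

var sink interface{} // package-level sink so the store cannot be elided

func main() {
    n := int64(1 << 40) // large enough to avoid any small-value caching
    allocs := testing.AllocsPerRun(1000, func() {
        sink = n // putting a non-pointer value into an interface
        n++
    })
    fmt.Println("allocations per boxing:", allocs) // typically 1
}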

For comparison call to Log() is 3x faster than fmt.Fprintf()

At least fmt.Fprintf works predictably and reliably on all Go architectures. Correctness is more important than speed. I'd suggest starting with a more complete solution and making it work before trying to optimise the heck out of it.

  cheers,
    rog.

I have added BenchmarkSingleIntRogerPeppe() - 13ns more

lar...@gmail.com

unread,
Nov 26, 2018, 11:45:06 AM11/26/18
to golang-nuts
Do you mean this
// This is straight from the https://github.com/uber-go/zap playbook
func (b *Binlog) LogStructured(msg string, fields ...Field) error

My first goal was to allow a drop-in replacement for the log package.
I have to handle 50K queries/s on a mediocre machine. My time budget is about ~200 microseconds per query.

roger peppe

unread,
Nov 26, 2018, 12:20:04 PM11/26/18
to Arkady M, golang-nuts
How many log messages are being produced per query? What are your queries actually doing? What does the profiler tell you?
If logging is really a bottleneck, you could consider other approaches such as sampling or just logging less often, or differently.

BTW if you're handling 50K queries per second, that gives you 20µs/query, not 200 AFAICS.

lar...@gmail.com

unread,
Nov 26, 2018, 2:47:36 PM11/26/18
to golang-nuts
Talking about loggers and ZAP (an interesting idea to accommodate the API to JSON).
The following code gets 40 ns/op:
type FieldType uint8

type Field struct {
    Key       string
    Type      FieldType
    Integer   int64
    String    string
    Interface interface{}
}

const (
    // UnknownType is the default field type. Attempting to add it to an encoder will panic.
    UnknownType FieldType = iota
    // Uint64Type indicates that the field carries a uint64.
    Uint64Type
)

func Uint64(key string, val uint64) Field {
    return Field{Key: key, Type: Uint64Type, Integer: int64(val)}
}

func handleFields(s string, fields ...Field) {
}

func BenchmarkZapApi(b *testing.B) {
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        handleFields("Hello ",
            Uint64("world", 0),
            Uint64("world", 1),
            Uint64("world", 2),
        )
    }
    b.StopTimer()
}




robert engels

unread,
Nov 26, 2018, 2:52:55 PM11/26/18
to lar...@gmail.com, golang-nuts
Why are you putting the ResetTimer() and StopTimer() calls in there? They're unnecessary AFAIK.

Also, all this test is benchmarking is the time it takes to construct a Field struct. What is the point?


lar...@gmail.com

unread,
Nov 26, 2018, 2:56:00 PM11/26/18
to golang-nuts
This logger https://github.com/ScottMansfield/nanolog is roughly 2x faster than ZAP and will work on all Go platforms.

I want my logs to be collected and stored. I do not want to think about where and when I can or cannot log. A blocking call of 300 ns and up (and not deterministic in my tests) is something I would like to avoid. I can run a separate log in every thread and reorder and merge the logs offline. When I need to actually read the log, I can spend quite a lot of time decoding the data.
