Possible compiler optimization

214 views
Skip to first unread message

tux21b

unread,
Nov 11, 2011, 11:18:56 AM11/11/11
to golan...@googlegroups.com
Hi gophers!

First I want apologize for the bad thread title, but I do not know the
right term. Feel free to correct me :)

I was wondering lately why binary.Write(buffer, ...) is so much slower
than BigEndian.PutUint32() for example. Sure, the first one is more general,
it does dispatching based on the type and the byte order is dynamic.

However, since Go is a statically typed language and the byte order
is also constant in most cases, there is probably no reason why the
following snipped can not be (nearly) as fast as BigEndian.PutUint32(
buf, uint32(x))

x := 4
binary.Write(buffer, binary.BigEndian, x)

In my opinion this optimization might be quite important (besides inlining
which is probably a requirement for this to work). The reason why
I think it's that important is, that even the standard library (except exp/spdy)
has rewritten all uses of binary.Write() to things like:

binary.BigEndian.PutUint32(buffer, uint32(h))

which makes the binary.Write() API a bit useless. But there are also
many other cases where such an optimization might be useful. For
example:

atomic.LoadInt32(&x, MemoryOrder.Acquire)

I would really like such a flag similar to C++0x, but the switch(order) is
probably more expensive than the load itself.

or even:

fmt.Println("Hello") // vs. os.Stdout.WriteString("Hello")


So, how is this optimization technique called and is there any chance we
might get it after inlining, since it might heavily influence the design of APIs?

Regards,
christoph

SteveD

unread,
Nov 12, 2011, 3:56:47 AM11/12/11
to golang-nuts
On Nov 11, 6:18 pm, tux21b <tux...@gmail.com> wrote:
> I was wondering lately why binary.Write(buffer, ...) is so much slower
> than BigEndian.PutUint32() for example. Sure, the first one is more general,
> it does dispatching based on the type and the byte order is dynamic.

Yes, I imagine because binary.Write works by reflection.

Was also pondering this problem since I have about a TB of bigendian
data and reading it is rather slow !

steve d.

Bobby Powers

unread,
Nov 13, 2011, 1:05:13 AM11/13/11
to SteveD, golang-nuts

Interesting. I would think that with a dataset that large, the
bottleneck would be the speed of your rotating media. What speeds are
you seeing?

yours,
Bobby

SteveD

unread,
Nov 14, 2011, 7:06:35 AM11/14/11
to golang-nuts
On Nov 13, 8:05 am, Bobby Powers <bobbypow...@gmail.com> wrote:
> Interesting.  I would think that with a dataset that large, the
> bottleneck would be the speed of your rotating media.  What speeds are
> you seeing?

It's not only with BigEndian. It was taking me about 9 seconds (!) to
read two-byte integers from a 25Meg file.

Once I used an unclean method, the read took practically no time, as
expected.

type Trace []int16

var TraceType = unsafe.Typeof(Trace{})

func FastRead16 (in io.Reader, size int) Trace {
bytes := make([]byte,2*size)
in.Read(bytes)
_,addr := unsafe.Reflect(bytes)
return unsafe.Unreflect(TraceType,addr).(Trace)
}

This does feel like a hack, and I'm nervous about what the GC thinks
of these shenanigans.

This is r59 on 32-bit Windows, I'll try a later version with Linux for
comparison.

steve d.

SteveD

unread,
Nov 14, 2011, 8:17:33 AM11/14/11
to golang-nuts
On Nov 14, 2:06 pm, SteveD <steve.j.dono...@gmail.com> wrote:
> Once I used an unclean method, the read took practically no time, as
> expected.

Unclean, and also incorrect (because the slice length is not changed)

For that, we need

https://github.com/feyeleanor/raw

and simply say

func FastRead16 (in io.Reader, size int) []int16 {
bytes := make([]byte,2*size)
in.Read(bytes)
return raw.Int16Slice(bytes)
}

Before I prematurely rejoice, are there any GC pitfalls with this?

steve d.

roger peppe

unread,
Nov 14, 2011, 8:47:54 AM11/14/11
to SteveD, golang-nuts

binary/encoding could definitely be more efficient. i think
i did some work on making reading and writing slices a bit
better a while back but i never submitted the code.

BTW if you must use unsafe, here's a perhaps
more efficient way of doing it, which avoids the
allocation, although it's a little more verbose.
i *think* it's GC-safe.

func FastRead16(in io.Reader, buf []uint16) (int, error) {
var bytes []byte
h := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
h.Len = len(buf) * 2
h.Cap = cap(buf) * 2
if len(buf) > 0 {
h.Data = uintptr(unsafe.Pointer(&buf[0]))
}
n, err := io.ReadFull(in, bytes)
return n / 2, err
}

steve donovan

unread,
Nov 14, 2011, 8:54:36 AM11/14/11
to golang-nuts
On Mon, Nov 14, 2011 at 3:47 PM, roger peppe <rogp...@gmail.com> wrote:
> BTW if you must use unsafe, here's a perhaps
> more efficient way of doing it, which avoids the
> allocation, although it's a little more verbose.

Great, I saw some similar code in the RAW ('Random Access Woes')
package and now it makes sense.

Naturally, I would prefer not to use unsafe for something as routine
as reading binary data. But the speed benefits are too good to be
ignored (more than an order of magnitude).

steve d.

Andy Balholm

unread,
Nov 14, 2011, 11:33:58 AM11/14/11
to golan...@googlegroups.com
What about something like:

// b is a byte slice containing some data.

w := make([]uint16, len(b)/2)

j := 0
for i:=0; i < len(w); i++ {
    w[i] = uint16(b[j]) + (uint16(b[j+1]) << 8)
    j += 2
}

This should be faster than using the binary package, but slower than using unsafe. (This example is little-endian; for big-endian, move the shift to b[j].)

SteveD

unread,
Nov 14, 2011, 12:31:22 PM11/14/11
to golang-nuts
On Nov 14, 6:33 pm, Andy Balholm <andybalh...@gmail.com> wrote:
> This should be faster than using the binary package, but slower than using
> unsafe. (This example is little-endian; for big-endian, move the shift to
> b[j].)

It's probably not much slower than using unsafe, and has the advantage
of easily swinging both ways.

I have a lot of big-endian 32-bit float data, and this kind of
approach would work (although I am still enough of a C person to find
it hard to resist fooling with the bytes)

It would be a good thing if the standard binary package had some
special case optimizations for slice i/o.

steve d.

Kyle Lemons

unread,
Nov 14, 2011, 1:22:50 PM11/14/11
to SteveD, golang-nuts
I wouldn't worry about GC safety in this case.  Someone correct me if I'm wrong, but I believe the biggest way to screw up the garbage collector is to take something that it thinks can't be a pointer ([]int, I believe, is one example) and using it to store pointers, and then converting them back later after it's garbage collected the things to which they point.  In general, it thinks everything can be a pointer, so even this is pretty rare.
Reply all
Reply to author
Forward
0 new messages