Bytconv where art thou

427 views
Skip to first unread message

Sebastien Binet

unread,
Sep 1, 2017, 3:50:29 AM9/1/17
to Michael Jones, Nilsocket, lutz...@posteo.de, Patrick Smith, peterGo, golang-nuts
Hi,

I'd also be very interested in looking at 'bytconv'. And most probably should use it in anger :)

-s

sent from my droid

On Aug 31, 2017 8:28 PM, "Michael Jones" <michae...@gmail.com> wrote:
Nice! Is "bytconv" shared somewhere?

On Thu, Aug 31, 2017 at 10:53 AM, peterGo <go.pe...@gmail.com> wrote:
Michael,

n your code, you have :

const n = 1000 * 1000

for i := 0; i < n && scan.Scan(); i++ {
    d, _ := strconv.Atoi(scan.Text())
    array[i] = int64(d)
}


https://play.golang.org/p/SgpAXyvsGs

Here's a benchmark that demonstrates the fundamental issue, unnecessary string conversions.
,
BenchmarkAtoiBytconv-4       50000000     30.4 ns/op    0 B/op    0 allocs/op
BenchmarkAtoiStrconv-4       20000000    102 ns/op      8 B/op    1 allocs/op


https://play.golang.org/p/oSQ8RZeGL7

Peter


On Thursday, August 31, 2017 at 12:24:20 PM UTC-4, peterGo wrote:
Michael,

Your read times look slow to me. I used bytconv instead of strconv.

bytconv:
read 1000000 98.517584ms
sort 1000000 481.994354ms

strconv:
read 1000000 174.720883ms
sort 1000000 479.437831ms

bytconv is the missing Go standard library package. bytconv is the byte input analog of string input strconv.

Peter

On Wednesday, August 30, 2017 at 7:43:49 PM UTC-4, Michael Jones wrote:
good point. I was trying to show that the buffered stdin was "just like" normal scanning but the performance was less compared to the updated scanning code.

here is another version, this time with a data generator and since the input is line oriented, the default split function is fine.

read 1000000 65.362993ms
sort 1000000 187.092493ms


On Wed, Aug 30, 2017 at 2:56 PM, Patrick Smith <pat42...@gmail.com> wrote:
That is simpler, but slower. Not nearly as slow as using unbuffered io though. Timings on my machine, reading 1e6 integers chosen randomly from the range [0,1e18):

5.626974435s
155.367779ms

Original poster's optimized code https://play.golang.org/p/1Aoxwwv-zo
168.638597ms
150.923225ms

Michael's simpler code https://play.golang.org/p/tMyipz6sYU
954.543351ms
166.710399ms

So this is about 6 times slower. My guess is this is due to the use of reflection in fmt.Fscanf. But that is just a guess; I don't really have any evidence to back it up.

On Wed, Aug 30, 2017 at 1:33 PM, Michael Jones <michae...@gmail.com> wrote:
This can be much simpler...

On Wed, Aug 30, 2017 at 7:55 AM, Nilsocket <nils...@gmail.com> wrote:

Can you provide example code for how you read the input?

Both of them were given same input size is:1000000, (i.e., 1 million)

// Time taken to read input : 9.840256889s
// Time taken to sort: 731.469604ms

I have implemented the same using bufio:

// Time taken to read input : 377.038152ms
// Time taken to sort: 688.20638ms

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Michael T. Jones
michae...@gmail.com

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




--
Michael T. Jones
michae...@gmail.com

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Michael T. Jones
michae...@gmail.com

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Peter Waller

unread,
Sep 1, 2017, 4:28:48 AM9/1/17
to Sebastien Binet, Michael Jones, Nilsocket, lutz...@posteo.de, Patrick Smith, peterGo, golang-nuts
On 1 September 2017 at 08:49, Sebastien Binet <seb....@gmail.com> wrote:
I'd also be very interested in looking at 'bytconv'. And most probably should use it in anger :)

I've written my own bytconv.Atoi in two projects where it gave a nice speedup, so +1 to this.

Nilsocket

unread,
Sep 1, 2017, 10:22:40 AM9/1/17
to golang-nuts, michae...@gmail.com, nils...@gmail.com, lutz...@posteo.de, pat42...@gmail.com, go.pe...@gmail.com
Recently I have written a program for palindrome, I thought it would be really simple and efficient to check whether a number is palindrome or not, using []byte.

I thought that go had bytconv package just like strconv, but it doesn't.
I searched in internet for a very long time, trying to convert an integer to []byte, only solution I found by trolling over internet is to convert it to string and then typeCast it to []byte.

But my program is being timed out, so I thought there should be some better way to convert things to []byte.

[]byte seems to be awesome choice for those who care about performance.

I love stdlib but some times it does way too much work than I want it to.
Ex:- I want to split my input if there is any " ". but scanWords take all the pain of considering \t,... and several other things.
obviously I do have a choice to write my own split function.

Anyhow later I found there is some pattern in how palindrome numbers were generated.

It would be cool if bytconv package exists.

Thank you.

peterGo

unread,
Sep 1, 2017, 2:01:18 PM9/1/17
to golang-nuts, michae...@gmail.com, nils...@gmail.com, lutz...@posteo.de, pat42...@gmail.com, go.pe...@gmail.com
Michael and Sebastien,

Since my bytconv package is currently private, I'm fixing it up for publication. I hope to make bytconv available in go get'able form within a week. I'll let you know when it' is avalable.

Peter

Michael Jones

unread,
Sep 1, 2017, 6:47:54 PM9/1/17
to peterGo, golang-nuts, Nilsocket, lutz...@posteo.de, Patrick Smith
Great! Very kind of you

as

unread,
Sep 15, 2017, 1:26:27 AM9/15/17
to golang-nuts
Strings are immutable, slices arent. Doesnt this make a string the optimal read only data structure for storing and passing around a run of bytes?

If there is performance data, please share.

Peter Waller

unread,
Sep 15, 2017, 5:38:30 AM9/15/17
to as, golang-nuts
Because of the immutability of the string, constructing a string on a byte slice requires allocating and copying the bytes. Once you've got the string, it's "free" to pass around.

If you've already got strings, everything is fine. But if you're reading the data from IO, you're probably starting with bytes, and if you can avoid the allocs, it means less GC pressure. YMMV, but I have seen performance wins of 2-5x by avoiding string allocations in a data parse heavy loop (Atoi being one place where strings are allocated).

On 15 September 2017 at 06:26, as <as....@gmail.com> wrote:
Strings are immutable, slices arent. Doesnt this make a string the optimal read only data structure for storing and passing around a run of bytes?

If there is performance data, please share.

roger peppe

unread,
Sep 15, 2017, 6:55:23 AM9/15/17
to Peter Waller, as, golang-nuts
I suspect that this kind of performance issue may be fixed by better
compiler optimisation in the future (I think should be possible for
the compiler to notice that the string doesn't escape and pass a
pointer to the byte slice as the string argument).
>> email to golang-nuts...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts...@googlegroups.com.

Peter Waller

unread,
Sep 24, 2017, 4:13:54 PM9/24/17
to roger peppe, as, golang-nuts
On 15 September 2017 at 11:54, roger peppe <rogp...@gmail.com> wrote:
I suspect that this kind of performance issue may be fixed by better
compiler optimisation in the future (I think should be possible for
the compiler to notice that the string doesn't escape and pass a
pointer to the byte slice as the string argument).

Agreed in principle. Thinking aloud follows.

What if the byte slice changes after string construction? At least once you've got a string you know that (in principle) it shouldn't be modified, but if the compiler elides the copy, the consumer of the string could be in for a surprise.

func foo(s string)

...

bs := []byte{0, 1, 2}
s := string(bs)
bs[0] = 42
foo(s)

To cope with this the compiler would have to be able to prove that the backing array of the slice isn't modified between the creation of the string and its use, which maybe it can perhaps do in trivial cases such as foo(string(bs)), where the passed string does not escape.

Amnon Baron Cohen

unread,
Sep 25, 2017, 1:43:43 PM9/25/17
to golang-nuts

Peter Waller

unread,
Sep 26, 2017, 8:17:36 AM9/26/17
to Amnon Baron Cohen, golang-nuts
On 25 September 2017 at 17:43, Amnon Baron Cohen <amn...@gmail.com> wrote:
 
Nice link, fun to see that all this discussion has happened before (and no doubt, will happen again!).

Reading that thread and related threads, I get the impression that there is not an open issue covering string([]byte) where the []byte can be proven to not be mutated. At least not one that is cross referenced (or I may have missed it with my quick scanning).

Russ did comment (https://github.com/golang/go/issues/2632#issuecomment-252725945) that he would prefer to see the compiler improved, but I didn't find an open issue which covers that. Can anyone else find one? I think there are some valuable insights in the linked issue, though it is closed because it proposes API duplication rather than an underlying compiler fix.

For fun I did an AST search of string(thing of type []byte) in the Go corpus. It comes a fair amount. Not sure though how to evaluate the performance impact of fixing it. Would be nice if it were a simple matter of running everyone's code to find out ;-). Anecdotally it appears in some hot loops for me, and maybe it's interesting that other people have discussed it quite a bit (quoting Bradfitz Oct 2015: "4 years and still a problem": https://github.com/golang/go/issues/2632#issuecomment-144591243)
Reply all
Reply to author
Forward
0 new messages