Unsafe bytes to string and the reverse


tacod...@gmail.com

Sep 30, 2015, 4:37:05 AM
to golang-nuts
What is the condition for using the following piece of code:

func UnsafeToString(b []byte) string {
   return *(*string)(unsafe.Pointer(&b))
}

I have some parts of my code that could benefit from a reduced use of allocations, but this requires that I will not change the underlying bytes while the string-view of the data is used?
What problems could arise with this unsafe conversion?

I understand that this works because the first two words of the header are the same (the pointer to the bytes and the length), and that capacity is the third word in a slice header. So when converting back I need to use the reflect package to set the capacity for the slice, right?
Could any additional problems arise with a string to bytes conversion? And once the string-view isn't used anymore, can I safely change the bytes' content?

Thanks for clearing this up. I know I can shoot myself in the foot with this, but it might just improve my code's performance quite significantly!
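
To make the aliasing concern concrete, here is a minimal sketch (not part of the original post) of what the conversion does to the relationship between the bytes and the string. The helper name aliasDemo is made up for illustration, and the behaviour shown is that of the current gc implementation, not something the language guarantees.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafeToString is the conversion from the post: it reinterprets the slice
// header as a string header, so the result shares b's backing array.
func unsafeToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// aliasDemo (hypothetical helper) returns copies of the string view taken
// before and after the underlying bytes are modified.
func aliasDemo() (before, after string) {
	b := []byte("hello")
	s := unsafeToString(b) // no allocation, no copy

	before = string([]byte(s)) // ordinary conversions copy the contents
	b[0] = 'j'                 // this write is visible through s as well
	after = string([]byte(s))
	return before, after
}

func main() {
	before, after := aliasDemo()
	fmt.Println(before, after)
}
```

The "immutable" string changes under the caller's feet, which is why the bytes must never be written again while any copy of the string value is still reachable. Newer Go releases (1.20 and later) also provide unsafe.String and unsafe.SliceData for building such views, but the aliasing hazard is the same.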

Giulio Iotti

Sep 30, 2015, 4:48:22 AM
to golang-nuts, tacod...@gmail.com
On Wednesday, September 30, 2015 at 10:37:05 AM UTC+2, tacod...@gmail.com wrote:
> I have some parts of my code that could benefit from a reduced use of allocations, but this requires that I will not change the underlying bytes while the string-view of the data is used?
> What problems could arise with this unsafe conversion?

What does your code do that requires the conversions in the first place? Can we avoid them?

-- 
Giulio Iotti

tacod...@gmail.com

Sep 30, 2015, 5:08:04 AM
to golang-nuts, tacod...@gmail.com
It's mainly calling functions in `net/url` and `strconv`. These require strings to be passed, but everything in my code is a byte-slice. (I'm building parsers)

I understand that conversion to string merely for map-keys is optimized away?

On Wednesday, September 30, 2015 at 10:48:22 UTC+2, Giulio Iotti wrote:

Giulio Iotti

Sep 30, 2015, 5:45:19 AM
to golang-nuts, tacod...@gmail.com
On Wednesday, September 30, 2015 at 11:08:04 AM UTC+2, tacod...@gmail.com wrote:
> It's mainly calling functions in `net/url` and `strconv`. These require strings to be passed, but everything in my code is a byte-slice. (I'm building parsers)

Understood. As an alternative solution, you could see if you can reimplement the functions you need to work on byte slices. The reason the conversion to string copies is indeed what you said before (strings need to be immutable).

> I understand that conversion to string merely for map-keys is optimized away?

Never checked, sorry.

-- 
Giulio Iotti

James Bardin

Sep 30, 2015, 9:29:06 AM
to golang-nuts, tacod...@gmail.com


On Wednesday, September 30, 2015 at 4:37:05 AM UTC-4, tacod...@gmail.com wrote:
> What is the condition for using the following piece of code:
>
> func UnsafeToString(b []byte) string {
>    return *(*string)(unsafe.Pointer(&b))
> }
>
> I have some parts of my code that could benefit from a reduced use of allocations, but this requires that I will not change the underlying bytes while the string-view of the data is used?
> What problems could arise with this unsafe conversion?


A problem here is that a string struct is one int smaller than a slice struct. I'm pretty certain this could leak an int every time it's called (and presumably you're calling this a lot if this optimization means anything to you). You definitely can't convert the other way safely, because the cap in the slice is coming from some other memory value.

> I understand that conversion to string merely for map-keys is optimized away?

AFAIK, converting to a string in a map index operation is currently optimized to not copy.
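
As a small illustration of the map case (a sketch, not from the thread; the function names lookup and store are invented here): the optimization applies when the conversion appears directly in the index expression of a lookup. Storing a new key is different, because the map has to keep the key alive.

```go
package main

import "fmt"

// lookup reads counts[string(key)] without keeping the converted string.
// Because the conversion sits directly in the index expression of a lookup,
// the gc compiler can use key's bytes in place instead of copying them.
func lookup(counts map[string]int, key []byte) int {
	return counts[string(key)]
}

// store may insert a new key that the map retains, so here the conversion
// has to produce a real, independent string.
func store(counts map[string]int, key []byte) {
	counts[string(key)]++
}

func main() {
	counts := map[string]int{}
	token := []byte("GET")
	store(counts, token)
	store(counts, token)
	fmt.Println(lookup(counts, token), lookup(counts, []byte("PUT")))
}
```

This is an implementation detail of the compiler rather than a language guarantee, so it is worth confirming with a benchmark before relying on it.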
 

Ian Lance Taylor

Sep 30, 2015, 10:24:39 AM
to Taco de Wolff, golang-nuts
On Wed, Sep 30, 2015 at 1:36 AM, <tacod...@gmail.com> wrote:
> What is the condition for using the following piece of code:
>
> func UnsafeToString(b []byte) string {
>    return *(*string)(unsafe.Pointer(&b))
> }
>
> I have some parts of my code that could benefit from a reduced use of allocations, but this requires that I will not change the underlying bytes while the string-view of the data is used?
> What problems could arise with this unsafe conversion?
>
> I understand that this works because the first two words of the header are the same (the pointer to the bytes and the length), and that capacity is the third word in a slice header. So when converting back I need to use the reflect package to set the capacity for the slice, right?
> Could any additional problems arise with a string to bytes conversion? And once the string-view isn't used anymore, can I safely change the bytes' content?

This is of course unsafe and may break in future implementations.  In the current implementation it should work well enough provided you never change the bytes again.  There is no way for you to know when the string view is no longer used.  There are many operations that will retain a pointer to the string, and that will only work correctly if the bytes never change.

Ian 
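
One way the "retained pointer" problem can show up, sketched under the assumption of the current gc runtime (the helper staleKeyDemo is hypothetical): a map keeps the string it was given as a key, so reusing the buffer silently corrupts the map.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafeToString is the zero-copy conversion discussed in this thread.
func unsafeToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// staleKeyDemo (hypothetical helper) stores an unsafe string view as a map
// key and then reuses the buffer. It reports whether the original key can
// still be found and how many entries the map holds.
func staleKeyDemo() (found bool, entries int) {
	buf := []byte("alpha")
	seen := map[string]bool{}
	seen[unsafeToString(buf)] = true // the stored key points into buf

	copy(buf, "omega") // reuse the buffer for the next token

	_, found = seen["alpha"]
	return found, len(seen)
}

func main() {
	found, entries := staleKeyDemo()
	fmt.Println(found, entries)
}
```

The map still has one entry, but the key it was inserted under can no longer be found, because the stored key's bytes changed after it was hashed. Nothing in the program signals that the string view was still in use.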

Giulio Iotti

Sep 30, 2015, 1:04:58 PM
to golang-nuts, tacod...@gmail.com
On Wednesday, September 30, 2015 at 4:29:06 PM UTC+3, James Bardin wrote:
> A problem here is that a string struct is one int smaller than a slice struct. I'm pretty certain this could leak an int every time it's called (and presumably you're calling this a lot if this optimization means anything to you). You definitely can't convert the other way safely, because the cap in the slice is coming from some other memory value.

Why leak an int? When the []byte is freed, the cap int is gone together with the len and ptr, or am I wrong?

-- 
Giulio Iotti 

Dave Cheney

Sep 30, 2015, 1:07:22 PM
to golang-nuts
The cap int is part of the slice header value, not the underlying array. Unless you are passing around *string, there is no issue.

James Bardin

Sep 30, 2015, 1:11:28 PM
to Dave Cheney, golang-nuts
On Wed, Sep 30, 2015 at 1:07 PM, Dave Cheney <da...@cheney.net> wrote:
> The cap int is part of the slice header value, not the underlying array. Unless you are passing around *string, there is no issue.

Yes, this is correct. I hadn't had my coffee and forgot about the
final dereference, which copies the string header to a new location.
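
The header sizes discussed in the last few messages can be checked directly. This is a small sketch (the function name headerWords is made up) that reports sizes for whatever implementation compiles it; the two-word and three-word layouts are properties of the current compilers, not of the language specification.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords (hypothetical helper) reports how many machine words the
// string and []byte header values occupy in this implementation.
func headerWords() (str, slice uintptr) {
	word := unsafe.Sizeof(uintptr(0))
	var s string
	var b []byte
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word
}

func main() {
	str, slice := headerWords()
	// The string header holds (pointer, length); the slice header adds capacity.
	fmt.Println(str, slice)
}
```

Because the conversion dereferences the *string, only the first two words are copied into the result. The third word (the capacity) stays in the slice header and is simply not read, so nothing is leaked.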

Brad Fitzpatrick

Sep 30, 2015, 1:32:20 PM
to tacod...@gmail.com, golang-nuts
I strongly encourage you to not do this.

You'll waste way more time debugging random failures in the future than any savings you get.



Dave Cheney

Sep 30, 2015, 1:36:22 PM
to golang-nuts
I agree with Brad. The way to avoid the cost of string/byte/string conversions is to avoid doing them, not sleight of hand.

Dan Kortschak

Sep 30, 2015, 5:28:28 PM
to Brad Fitzpatrick, tacod...@gmail.com, golang-nuts
On 01/10/2015, at 3:02 AM, "Brad Fitzpatrick" <brad...@golang.org> wrote:

> I strongly encourage you to not do this.

For the kinds of data sets I work with, this does actually make a significant performance difference - often TSV numeric data gigabytes in size.

It would be nice if it were not necessary to do, but strconv is what it is. Are there plans for improving the compiler so that it recognises when, e.g., ParseInt is given a string([]byte) and does not allocate a new string when the value is not retained?

Caleb Spare

Sep 30, 2015, 5:35:22 PM
to Dan Kortschak, Brad Fitzpatrick, tacod...@gmail.com, golang-nuts
There's previous discussion about adding new APIs to strconv that take
[]byte arguments. See https://github.com/golang/go/issues/2632

I've also needed this unsafe optimization in the past. And in fact,
Brad mentions it on that bug as well:

"As a medium-term hack, I showed they could do:

[code the OP posted, pretty much]

But I felt bad even pointing that out."

-Caleb

Brad Fitzpatrick

Sep 30, 2015, 5:38:05 PM
to Caleb Spare, golang-nuts, tacod...@gmail.com, Dan Kortschak

I ended up just forking parts of strconv to add my own byte slice variants. It made amazing differences in Camlistore start-up time (where it slurps the index off storage into RAM).
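
For readers wondering what such a byte-slice variant might look like, here is a minimal sketch. It is not the Camlistore code and not a strconv API; the name parseUintBytes and its (value, ok) result shape are assumptions made for illustration. It handles only unsigned base-10 input.

```go
package main

import (
	"fmt"
	"math"
)

// parseUintBytes (hypothetical) parses an unsigned base-10 integer directly
// from b, without converting to string. It reports false on empty input,
// non-digit bytes, or overflow of uint64.
func parseUintBytes(b []byte) (uint64, bool) {
	if len(b) == 0 {
		return 0, false
	}
	var n uint64
	for _, c := range b {
		if c < '0' || c > '9' {
			return 0, false
		}
		d := uint64(c - '0')
		// n*10 + d fits in uint64 exactly when n <= (max - d) / 10.
		if n > (math.MaxUint64-d)/10 {
			return 0, false
		}
		n = n*10 + d
	}
	return n, true
}

func main() {
	for _, in := range []string{"12345", "12a", ""} {
		v, ok := parseUintBytes([]byte(in))
		fmt.Printf("%q -> %d %v\n", in, v, ok)
	}
}
```

The cost of this approach is the one raised elsewhere in the thread: the parsing logic is duplicated and has to be kept correct separately from strconv.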

Dan Kortschak

Sep 30, 2015, 6:14:55 PM
to Brad Fitzpatrick, Caleb Spare, golang-nuts, tacod...@gmail.com
What was the reason for putting all these potentially tight loop
functions behind the higher allocating path (via string instead of via
[]byte)?

Brad Fitzpatrick

Sep 30, 2015, 6:50:36 PM
to Dan Kortschak, Caleb Spare, golang-nuts, tacod...@gmail.com
That was like 7 years ago. I don't think anybody had put much thought into string vs []byte back then. Hell, slices had just emerged a month or two prior it looks like.

Dan Kortschak

Sep 30, 2015, 7:10:11 PM
to Brad Fitzpatrick, Caleb Spare, golang-nuts, tacod...@gmail.com
Unfortunate then. It would be nice to not have to do the silly and
dangerous dance or code duplication necessary to avoid this cost.

Brad Fitzpatrick

Sep 30, 2015, 7:20:51 PM
to Dan Kortschak, Caleb Spare, golang-nuts, Taco de Wolff
What is your proposal?

If you're just sad, that's okay. I often am too.

Dan Kortschak

Sep 30, 2015, 7:27:20 PM
to Brad Fitzpatrick, Caleb Spare, golang-nuts, Taco de Wolff
On Wed, 2015-09-30 at 16:20 -0700, Brad Fitzpatrick wrote:
> What is your proposal?

I don't think the proposal would be accepted, but it would be a similar
mirroring of strconv as exists between bytes and strings - or more
likely since strconv already includes []byte destinations just a
mirroring of the string parsing functions as []byte parsing functions.

> If you're just sad, that's okay. I often am too.

More this.

Brad Fitzpatrick

Sep 30, 2015, 7:28:54 PM
to Dan Kortschak, Caleb Spare, golang-nuts, Taco de Wolff
On Wed, Sep 30, 2015 at 4:26 PM, Dan Kortschak <dan.ko...@adelaide.edu.au> wrote:
> On Wed, 2015-09-30 at 16:20 -0700, Brad Fitzpatrick wrote:
> > What is your proposal?
>
> I don't think the proposal would be accepted, but it would be a similar
> mirroring of strconv as exists between bytes and strings - or more
> likely since strconv already includes []byte destinations just a
> mirroring of the string parsing functions as []byte parsing functions.

But you said you didn't want code duplication?

Code duplication in the standard library doesn't count? :)

Dan Kortschak

Sep 30, 2015, 7:47:18 PM
to Brad Fitzpatrick, Caleb Spare, golang-nuts, Taco de Wolff
On Wed, 2015-09-30 at 16:28 -0700, Brad Fitzpatrick wrote:
> But you said you didn't want code duplication?
>
> Code duplication in the standard library doesn't count? :)

Yes, I understand - the difference is that it is easier to autogenerate
code from a single source if that single source is written with that
intention in mind. The other option for this kind of change would be to
have a single set of []byte parsers that are called with string parser
shims - they are already not the fast path.

fatdo...@gmail.com

Sep 30, 2015, 8:59:14 PM
to golang-nuts, dan.ko...@adelaide.edu.au, ces...@gmail.com, tacod...@gmail.com
Then put in the work and fix it, stop hiding behind the farce you call go 1 compatibility.

Brad Fitzpatrick

Sep 30, 2015, 9:03:06 PM
to fatdo...@gmail.com, Dan Kortschak, Taco de Wolff, golang-nuts, ces...@gmail.com

Please be civil.

fatdo...@gmail.com

Sep 30, 2015, 9:04:01 PM
to golang-nuts, dan.ko...@adelaide.edu.au, ces...@gmail.com, tacod...@gmail.com
It doesn't count when you hide behind the farce that is the go 1 compatibility. put the work in, this ecosystem suffers from your ineptitude.

fatdo...@gmail.com

Sep 30, 2015, 9:09:52 PM
to golang-nuts, tacod...@gmail.com
proposal: put the work in. create a second standard library with all these little fixes and see where the herd grazes. gcc camp did this awhile back when there was a disagreement on how to proceed. They eventually dropped the old codebase in favor of the new. There was so much rhetorical bs leading up to that decision but it ended up to be the best decision ever made for the project.

fatdo...@gmail.com

Sep 30, 2015, 9:11:31 PM
to golang-nuts, fatdo...@gmail.com, dan.ko...@adelaide.edu.au, tacod...@gmail.com, ces...@gmail.com
I will be civil when you are humble.

Sugu Sougoumarane

Sep 30, 2015, 9:17:55 PM
to golang-nuts, tacod...@gmail.com
We still have legacy code in Vitess that performs a similar trick. We 'should' get rid of it, but it's a high-QPS code path. So, we have to do the benchmark due diligence, etc. before switching.
However, if you really really want to use this function, I'd recommend starting the function with this check:
if len(b) == 0 {
  return ""
}
Then you won't get bitten if an empty string has a unique canonical representation.
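
Put together with the function from the start of the thread, the guarded version would look roughly like this. It is a sketch that combines the two snippets as posted; the lower-case function name is only to keep the example self-contained.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafeToString returns a string sharing b's backing array. Empty and nil
// slices take the safe path, so the result is always the ordinary "" value
// and never a header built from an empty slice's pointer.
func unsafeToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return *(*string)(unsafe.Pointer(&b))
}

func main() {
	fmt.Printf("%q %q %q\n",
		unsafeToString(nil),
		unsafeToString([]byte{}),
		unsafeToString([]byte("abc")))
}
```

The guard costs one length comparison and removes any dependence on how the implementation represents empty strings.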

Ian Lance Taylor

Sep 30, 2015, 9:26:24 PM
to AK Willis, golang-nuts, Dan Kortschak, Taco de Wolff, Caleb Spare
On Wed, Sep 30, 2015 at 6:11 PM, <fatdo...@gmail.com> wrote:
> I will be civil when you are humble.

I'm sorry, but civility really does have to be a requirement on this
mailing list. Humility is nice, but it is not a requirement.

Ian

Ian Lance Taylor

Sep 30, 2015, 9:28:34 PM
to AK Willis, golang-nuts, Taco de Wolff
That is not really an accurate description of what happened in the GCC project.

I also don't really see it as necessary for the Go project. I think
that, where appropriate, we should simply fix the existing library.

Ian

Andrew Gerrand

Sep 30, 2015, 9:28:34 PM
to AK Willis, golang-nuts
On 1 October 2015 at 11:11, <fatdo...@gmail.com> wrote:
> I will be civil when you are humble.

Adam "AK" Willis,

You have been told many times that your conduct in this forum is inappropriate, yet you continue to behave as you do.
This leaves me no choice but to ban you.

I'm sorry that it came to this.

Sincerely,
Andrew

Shawn Milochik

Sep 30, 2015, 9:36:55 PM
to golan...@googlegroups.com
And for all those who chafe at the idea of a code of conduct, this is why we need one. To keep this the kind of place where curious, creative people want to be we have to disallow some behavior. The fact that some of it doesn't offend you personally is irrelevant because it hurts and excludes some others who have as much to contribute as you do. From their perspective, the rules enforce nothing but civility.

Taco de Wolff

Oct 1, 2015, 3:52:13 AM
to Dan Kortschak, Brad Fitzpatrick, Caleb Spare, golang-nuts
I think duplicated code is not the right way. I believe we ultimately need a smarter compiler and the trick is merely temporary.

I would outline such optimizations as follows, but correct me if I'm wrong:
Take for example the `ParseFloat` function. It requires a `string` because that is a stronger guarantee than `[]byte`, namely `string` enforces non-mutability within the function. Because the atof.go functions are pure it is safe to say that they have no lingering state or external dependencies. Therefore, except perhaps under concurrent use, it is safe to call such functions without a copy. The compiler would recognize that the bytes to string conversion is just a cast and not a copy. This would mean it is preferable to write functions accepting strings (for that is a stronger guarantee) and there is no need to write duplicated code.

Anyway, I see that this trick is unsavory; I will refrain from using it.

Nico

Oct 1, 2015, 3:57:19 AM
to golan...@googlegroups.com
No matter how you look at it, this is a loss for everyone.

I'm sure we all can do better.

Andrew Bursavich

Oct 1, 2015, 5:03:45 AM
to golang-nuts, dan.ko...@adelaide.edu.au, ces...@gmail.com, tacod...@gmail.com
Whatever happened with your read-only slice proposal? I haven't seen any discussion around it. I'm assuming that even if it was deemed worth the added complexity to the language, it'd probably be too disruptive for go 1. It seemed like it could solve a lot of these issues, but maybe it was just too late.

Egon

Oct 1, 2015, 5:14:38 AM
to golang-nuts, dan.ko...@adelaide.edu.au, ces...@gmail.com, tacod...@gmail.com
On Thursday, 1 October 2015 12:03:45 UTC+3, Andrew Bursavich wrote:
> Whatever happened with your read-only slice proposal? I haven't seen any discussion around it. I'm assuming that even if it was deemed worth the added complexity to the language, it'd probably be too disruptive for go 1. It seemed like it could solve a lot of these issues, but maybe it was just too late.

Andrew Bursavich

Oct 1, 2015, 5:31:24 AM
to golang-nuts, dan.ko...@adelaide.edu.au, ces...@gmail.com, tacod...@gmail.com
Aha. Thanks for the link. Great analysis.

Ian Lance Taylor

Oct 1, 2015, 11:09:24 AM
to Taco de Wolff, Dan Kortschak, Brad Fitzpatrick, Caleb Spare, golang-nuts
On Thu, Oct 1, 2015 at 12:51 AM, Taco de Wolff <tacod...@gmail.com> wrote:
>
> I would outline such optimizations as follows, but correct me if I'm wrong:
> Take for example the `ParseFloat` function. It requires a `string` because
> that is a stronger guarantee than `[]byte`, namely `string` enforces
> non-mutability within the function. Because the atof.go functions are pure
> it is safe to say that they have no state lingering or external
> dependencies. Therefore, except perhaps under concurrent use, it is safe to
> call such functions without a copy. The compiler would recognize that the
> bytes to string conversion is just a cast and not a copy. This would mean it
> is preferable to write functions accepting strings (for that is a stronger
> guarantee) and there is no need to write duplicated code.

The interesting case is, of course, parallel use. When can we safely
treat a []byte as a string? Only when we know that the contents of
the []byte do not change. If we change the memory model to say that a
program with a race condition is undefined, which is approximately how
we treat race conditions anyhow, then given f(string(b)) where b is a
[]byte the compiler can assume that the contents of b will not change
if there are no possible happens-before relationships in f--that is,
if f does not have any synchronization points before or after which b
might change.

This will be true if f does not call any sync functions, does not have
any channel operations, does not start any goroutines, etc., and if
the same is true for all functions called by f. This can be determined
statically if f is a function or non-interface method call, and if
everything it calls are functions or non-interface methods. I think
we can determine this for the special case of strconv.ParseFloat and
friends.

A similar static analysis can apply for f([]byte(s)). That case is
different--we know that s will not change, so we need to look for any
case where f might try to change the slice contents.

How general this would be, I don't know. It doesn't help for the
important case of writer.Write([]byte(s)), when writer is an
io.Writer. Still, it seems worth investigating.

Ian

Joe Taber

Oct 1, 2015, 3:28:03 PM
to golang-nuts, tacod...@gmail.com, dan.ko...@adelaide.edu.au, brad...@golang.org, ces...@gmail.com
On Thursday, October 1, 2015 at 10:09:24 AM UTC-5, Ian Lance Taylor wrote:
> If we change the memory model to say that a program with a race
> condition is undefined...

That seems like a reasonable starting point for such an optimization.

So we can determine at compile time whether any call with a statically
known call graph can have these optimizations applied. Specifically:

func(string) with no synchronization points can accept a 'casted' (as opposed to copied) []byte
func([]byte) that doesn't change the slice can accept a 'casted' string

As you mention, the problem is calls to interface methods cannot be 
statically known to have these properties. My first thought was to have an
additional flag on all interface methods to say whether it's safe to apply this 
optimization, but I doubt that would fly.

Here's another idea; I haven't thought this through all the way:

1. For structs with methods that accept a string or []byte and can safely
have this optimization applied to them (i.e. it's statically known), the compiler
inserts an additional hidden method on the struct with a mangled name. {e.g. a safe
Write([]byte) would add a Write-Const([]byte) to the struct}

2. At call sites to interface methods that could potentially have this
optimization applied, first attempt to type assert it to an interface with
the mangled name, and apply this optimization only if the assert
succeeds. Otherwise, do the allocation/copy as usual.

For example, something like this example would be translated to something

Some thoughts:
* I have no clue whether this would be worth it as an optimization.
* This only directly solves the problem one call deep: e.g. if you wrap
  another interface then you can't guarantee to your caller that the 
  interface you wrap is safe. (There may be a way around this with -- 
  wait for it -- more code generation.)
* This doesn't touch on methods that take multiple such arguments.
  To solve this with an arbitrary number of arguments would require
  2^n hidden methods. Perhaps it could be capped or methods 
  with multiple arguments just skipped.
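
The two-step idea above resembles a pattern that can already be written by hand: an optional interface that is checked with a type assertion at the call site (io.WriteString does something similar with io.StringWriter). The sketch below is a hand-written analogue, not the proposed compiler feature. The interface readOnlyWriter, its method WriteReadOnly, and the types counter and plain are all hypothetical, and unsafe.StringData requires Go 1.20 or later.

```go
package main

import (
	"fmt"
	"io"
	"unsafe"
)

// readOnlyWriter is a hypothetical stand-in for the hidden, mangled method:
// implementing it promises that p is neither modified nor retained.
type readOnlyWriter interface {
	WriteReadOnly(p []byte) (int, error)
}

// writeString plays the role of the call site. It reports whether the
// copy-free path was taken.
func writeString(w io.Writer, s string) (n int, zeroCopy bool, err error) {
	if rw, ok := w.(readOnlyWriter); ok && len(s) > 0 {
		// View s as bytes without copying; safe only because of the promise.
		p := unsafe.Slice(unsafe.StringData(s), len(s))
		n, err = rw.WriteReadOnly(p)
		return n, true, err
	}
	n, err = w.Write([]byte(s)) // ordinary path: allocate and copy
	return n, false, err
}

// counter only inspects its input, so it can make the promise.
type counter struct{ total int }

func (c *counter) Write(p []byte) (int, error) {
	c.total += len(p)
	return len(p), nil
}

func (c *counter) WriteReadOnly(p []byte) (int, error) { return c.Write(p) }

// plain makes no promise and therefore gets the copying path.
type plain struct{ total int }

func (c *plain) Write(p []byte) (int, error) {
	c.total += len(p)
	return len(p), nil
}

func main() {
	_, fast, _ := writeString(&counter{}, "hello")
	_, slow, _ := writeString(&plain{}, "hello")
	fmt.Println(fast, slow)
}
```

In the hand-written form the promise is made by the programmer and nothing verifies it, which is exactly the part the proposal would hand to the compiler.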

David Chase

Oct 6, 2015, 11:39:39 AM
to golang-nuts, tacod...@gmail.com, dan.ko...@adelaide.edu.au, brad...@golang.org, ces...@gmail.com


On Thursday, October 1, 2015 at 3:28:03 PM UTC-4, Joe Taber wrote:
> On Thursday, October 1, 2015 at 10:09:24 AM UTC-5, Ian Lance Taylor wrote:
> > If we change the memory model to say that a program with a race
> > condition is undefined...
>
> That seems like a reasonable starting point for such an optimization.
>
> ...
>
> Some thoughts:
> * I have no clue whether this would be worth it as an optimization.

Declaring race conditions as truly undefining your program would make
some concurrency theorists much happier, and this is indeed an example
of the sort of optimization that it might enable.

One thing you can try to see if this optimization is likely to be worthwhile
is to play with "-gcflags -m" and look at what escape analysis is figuring out.
If any of the functions to which you always [might] pass your cast-string leaks it to
the heap, then the optimization can't [probably can't] happen.  [If the string leaks
only down a rarely-executed error path, perhaps the costly conversion can be
sunk down to that code.]
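
A small program to try this on, as a sketch (the functions countDigits and remember are invented for the experiment). Building it with `go build -gcflags=-m` prints the escape-analysis decisions; with the gc compiler one would expect a line such as "s does not escape" for the first function and "leaking param: s" for the second, though the exact wording can vary between releases.

```go
package main

import "fmt"

// last is package-level state that outlives any single call.
var last string

// countDigits only reads s, so escape analysis can conclude that s does not
// escape: a candidate for passing a []byte without copying.
func countDigits(s string) int {
	n := 0
	for i := 0; i < len(s); i++ {
		if s[i] >= '0' && s[i] <= '9' {
			n++
		}
	}
	return n
}

// remember stores s in a global, so s leaks to the heap and a copy-free
// conversion at the call site would be unsafe.
func remember(s string) int {
	last = s
	return len(s)
}

func main() {
	b := []byte("a1b22")
	fmt.Println(countDigits(string(b)), remember(string(b)))
}
```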

It's tempting to consider what might be learned from whole-program analysis,
but it's first necessary to think hard about what tricks we can play with reflection.

Sokolov Yura

Oct 7, 2015, 2:13:59 AM
to golang-nuts
There is no need for a code of conduct.
A wise moderator with the ability to ban is enough.

Matt Harden

Oct 7, 2015, 10:23:40 AM
to Sokolov Yura, golang-nuts
Our wise moderator says we do need a code of conduct and I respect that.

On Wed, Oct 7, 2015 at 1:14 AM Sokolov Yura <funny....@gmail.com> wrote:
> There is no need for a code of conduct.
> A wise moderator with the ability to ban is enough.
