Understanding how []byte("astring") works


Amit Saha

Jun 13, 2021, 8:24:57 PM
to golang-nuts
Hi - My main motivation for understanding this is that I always have to
google how to convert a string to a byte slice.

Is []byte a type that has been defined in the compiler?

Or, is that an internal level detail that an earlier stage (parsing)
takes care of when the compiler sees that statement?

Thanks,
Amit

peterGo

Jun 13, 2021, 9:41:54 PM
to golang-nuts
Amit,

Compilers implement a specification:

The Go Programming Language Specification
https://golang.org/ref/spec

Conversions

A conversion changes the type of an expression to the type specified by the conversion. A conversion may appear literally in the source, or it may be implied by the context in which an expression appears.

Conversions to and from a string type

4. Converting a value of a string type to a slice of bytes type yields a slice whose successive elements are the bytes of the string.

[]byte("hellø")   // []byte{'h', 'e', 'l', 'l', '\xc3', '\xb8'}
[]byte("")        // []byte{}

MyBytes("hellø")  // []byte{'h', 'e', 'l', 'l', '\xc3', '\xb8'}
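The spec excerpt above can be checked with a small program. This is only a sketch: it assumes MyBytes is declared as `type MyBytes []byte`, which the excerpt implies but does not show, and stringBytes is a hypothetical helper name.

```go
package main

import "fmt"

// MyBytes is assumed to be a defined type with underlying type []byte,
// matching the name used in the spec excerpt.
type MyBytes []byte

// stringBytes (hypothetical helper) converts s to a byte slice.
func stringBytes(s string) []byte {
	return []byte(s)
}

func main() {
	b := stringBytes("hellø")
	fmt.Println(len(b))    // 6: 'ø' takes two bytes in UTF-8
	fmt.Printf("% x\n", b) // 68 65 6c 6c c3 b8

	fmt.Println(len(stringBytes(""))) // 0

	fmt.Printf("% x\n", MyBytes("hellø")) // 68 65 6c 6c c3 b8
}
```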

Peter

Roland Müller

Jun 14, 2021, 1:24:27 AM
to Amit Saha, golang-nuts
Hello,

A []byte is a sequence of octets, and strings in Go consist of bytes. These bytes usually represent a sequence of Unicode characters encoded as UTF-8. One such character occupies between one and four bytes. ASCII-only strings therefore have as many bytes as characters.

In the example I wrote two loop functions, loopStringByBytes(s) and loopStringByChars(s), and ran them against an ASCII string and a Cyrillic string. You can see that in the second string every character occupies 2 bytes.

https://play.golang.org/p/DDSpiFuR8Lp
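The playground code itself is not reproduced in the thread. A rough sketch of what two such loops might look like follows; the function names are taken from the message, while the bodies, return values, and sample strings are assumptions.

```go
package main

import "fmt"

// loopStringByBytes indexes the string, visiting one byte per iteration,
// and returns how many bytes it saw.
func loopStringByBytes(s string) int {
	n := 0
	for i := 0; i < len(s); i++ {
		fmt.Printf("byte %d: %#x\n", i, s[i])
		n++
	}
	return n
}

// loopStringByChars ranges over the string, which decodes one UTF-8
// encoded code point (rune) per iteration, and returns how many it saw.
func loopStringByChars(s string) int {
	n := 0
	for i, r := range s {
		fmt.Printf("char at byte offset %d: %c\n", i, r)
		n++
	}
	return n
}

func main() {
	for _, s := range []string{"hello", "привет"} {
		bytes := loopStringByBytes(s)
		chars := loopStringByChars(s)
		fmt.Printf("%q: %d bytes, %d characters\n", s, bytes, chars)
	}
}
```

For "hello" both counts are 5; for "привет" the loops report 12 bytes and 6 characters.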


BR,
Roland

 
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CANODV3nBYmshDLFwUUdnnyVuvpjhWnwBJOb%3DwrZKEHXmtBgbSg%40mail.gmail.com.

Axel Wagner

Jun 14, 2021, 2:45:19 AM
to Amit Saha, golang-nuts
Hi,

On Mon, Jun 14, 2021 at 2:24 AM Amit Saha <amits...@gmail.com> wrote:
Hi - My main motivation to understand this is i always had to google
this - how to convert a string to a byte slice.

Is []byte a type that has been defined in the compiler?

No, but `byte` is. It is a predeclared identifier, referring to a builtin integer type (it's an alias for `uint8`).
So the compiler knows what a `byte` is and what a slice is, so it knows what a slice of byte is.

The conversion between a slice of `byte`s and a `string` is then defined in the spec:

A non-constant value x can be converted to type T in any of these cases:
  • […]
  • x is an integer or a slice of bytes or runes and T is a string type.
  • x is a string and T is a slice of bytes or runes.
This means the compiler, when seeing a conversion, explicitly tests for these two cases.

You can see how the conversion actually works in the compiler explorer. The compiler emits a call to a function called `runtime.stringtoslicebyte`.
You can find that function in the source code of the runtime (the same file also contains other functions implementing similar conversions).
Really, it just allocates a new `[]byte` of the appropriate size and then copies over the bytes from the string.
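The allocate-and-copy behaviour can be observed from ordinary Go code. The sketch below is not the runtime's implementation; toBytes is a hypothetical helper that spells out the same idea with make and copy.

```go
package main

import "fmt"

// toBytes (hypothetical) mimics the described behaviour: allocate a slice
// of len(s) bytes and copy the string's bytes into it.
func toBytes(s string) []byte {
	b := make([]byte, len(s))
	copy(b, s) // copy accepts a string source when the destination is []byte
	return b
}

func main() {
	s := "hello"
	b := []byte(s)
	b[0] = 'j' // modifies the copy only; strings are immutable

	fmt.Println(s)                  // hello
	fmt.Println(string(b))          // jello
	fmt.Println(string(toBytes(s))) // hello
}
```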

Or, is that an internal level detail that an earlier stage (parsing) takes care of when the compiler sees that statement?

You can actually mess with this a little bit to show that it's not done in the parsing stage, but that the compiler actually knows if a type is a slice of bytes or not.
Because `byte` is a predeclared identifier, it can be shadowed, like every other identifier. That is, you can declare your *own* type called `byte`:
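The linked example is not preserved in the thread. A minimal sketch of the idea, assuming the shadowing type is an empty struct (any type whose underlying type is not `uint8` would do) and using a hypothetical helper viaUint8:

```go
package main

import "fmt"

// byte shadows the predeclared identifier within this package. Its
// underlying type is a struct, chosen only for illustration, so a slice
// of it is not a slice of bytes.
type byte struct{}

// viaUint8 (hypothetical) converts through the predeclared uint8 instead.
func viaUint8(s string) []uint8 {
	return []uint8(s)
}

func main() {
	// The next line does not compile if uncommented, because []byte now
	// refers to a slice of the struct type declared above:
	// _ = []byte("foo")

	// The predeclared type is still reachable under its other name.
	fmt.Println(viaUint8("foo")) // [102 111 111]
}
```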
You can also define your own alias for `uint8` and then convert that:
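Again the original link is missing; a sketch under the assumption that the alias is called octet (a made-up name) could look like this:

```go
package main

import "fmt"

// octet is a hypothetical name. Like the predeclared byte, it is an alias
// for uint8, so []octet and []byte are the same type.
type octet = uint8

// toOctets converts a string to a slice of the aliased type.
func toOctets(s string) []octet {
	return []octet(s)
}

func main() {
	o := toOctets("foo")
	fmt.Println(o) // [102 111 111]

	// No conversion is needed here because the two slice types are identical.
	var b []byte = o
	fmt.Println(string(b)) // foo
}
```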
Or you can do both - first shadow the builtin `byte` alias and *then* create your own:
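One possible reading of this case, sketched with assumed names (the struct type and the alias octet are not from the thread):

```go
package main

import "fmt"

// byte shadows the predeclared alias with an unrelated type.
type byte struct{}

// octet (hypothetical name) is a fresh alias for the predeclared uint8.
type octet = uint8

// toOctets converts a string using the package's own alias.
func toOctets(s string) []octet {
	return []octet(s)
}

func main() {
	// []byte("foo") would not compile here: byte is the struct type above.
	_ = byte{}

	// The conversion through the new alias still works, because the
	// compiler looks at the type, not at the name used in the source.
	fmt.Println(toOctets("foo")) // [102 111 111]
}
```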
You can also define your own type called `uint8` and then define `byte` as an alias to that and see that you can no longer convert:
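A sketch of this last variation, assuming the package-local `uint8` has underlying type uint16 (an arbitrary choice for illustration) and using a hypothetical helper widen:

```go
package main

import "fmt"

// uint8 shadows the predeclared type with a 16-bit type.
type uint8 uint16

// byte is now an alias for the package-local uint8, not the predeclared one.
type byte = uint8

// widen (hypothetical) converts the local byte type to a plain uint16.
func widen(v byte) uint16 {
	return uint16(v)
}

func main() {
	// The next line does not compile if uncommented: the element type of
	// []byte is no longer the predeclared 8-bit type.
	// _ = []byte("foo")

	// 300 would overflow the predeclared byte, but fits the local type.
	var b byte = 300
	fmt.Println(widen(b)) // 300
}
```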

All of this shows that the compiler really can't just rely on parsing and the name. It really needs to have a notion of whether something is a slice of a specific pre-declared type `uint8`, no matter what it is called in the source code.

It does that by creating a "virtual" package builtin inside the compiler and then synthesizing type-definitions in that package. The code for that is here.
But it should be noted that this package doesn't really exist and behaves significantly differently from "normal" packages - not just because it is implemented entirely in the compiler/runtime, but also because its identifiers are defined "outside" any package, in the universe block.

I assume this covers all questions :) Let us know if you have follow-ups :)

Axel

Amit Saha

Jun 14, 2021, 8:03:41 AM
to Axel Wagner, golang-nuts
Thank you! That's enough for me to start digging in for a bit.

Amit Saha

Jun 14, 2021, 8:04:15 AM
to Roland Müller, golang-nuts
On Mon, Jun 14, 2021 at 3:23 PM Roland Müller <rol...@gmail.com> wrote:
> https://play.golang.org/p/DDSpiFuR8Lp

Thank you.

Shulhan

Jun 14, 2021, 8:41:02 AM
to Amit Saha, golang-nuts
Maybe this blog post can help: https://blog.golang.org/strings

Amit Saha

Jun 14, 2021, 9:00:52 PM
to peterGo, golang-nuts
Thank you.
