Best way to truncate a ridiculously long string in Go


Travis Keep

Aug 21, 2017, 8:04:42 PM
to golang-nuts
Suppose you have code like this:


    // verylongstring may be several thousand bytes long
    verylongstring := GetSomeVeryLongString()

    prefix := verylongstring[:3]

    prefixStore.StorePrefixForDurationOfApplication(prefix)



If I understand correctly, verylongstring cannot be garbage collected because prefix points to its first 3 bytes and is stored forever.

What I want is to hold onto the first 3 bytes while allowing the rest of verylongstring to be GCed. In Java, you can do

    String prefix = new String(verylongstring.substring(0, 3));

which allows verylongstring to be GCed even if the application holds onto prefix. This is because the above Java code makes prefix reference a copy of the first 3 characters of verylongstring instead of referencing verylongstring itself.

What is the best way to do this in Go?





dja...@gmail.com

Aug 21, 2017, 8:26:03 PM
to golang-nuts
https://play.golang.org/p/YXXlJlsNGa
Hi,
I'm not sure whether the compiler is smart enough to optimize v1, but v2 likely ends up in separate memory.

Tamás Gulácsi

Aug 22, 2017, 12:15:44 AM
to golang-nuts
prefix := string([]byte(verylongstring[:3]))

peterGo

Aug 22, 2017, 7:01:36 AM
to golang-nuts
A benchmark:

https://play.golang.org/p/oUyeldDG5Q

$ cat strslice_test.go
package main

import (
    "strings"
    "testing"
)

var (
    s      = strings.Repeat("a very, very long string", 4096)
    prefix string
)

func BenchmarkNil(b *testing.B) {
    for i := 0; i < b.N; i++ {
        prefix = string(append([]byte(nil), s[:3]...))
    }
}

func BenchmarkLiteral(b *testing.B) {
    for i := 0; i < b.N; i++ {
        prefix = string(append([]byte{}, s[:3]...))
    }
}

func BenchmarkConvert(b *testing.B) {
    for i := 0; i < b.N; i++ {
        prefix = string([]byte(s[:3]))
    }
}

$ go test -run=! -bench=. -benchmem strslice_test.go
goos: linux
goarch: amd64
BenchmarkNil-4           10000000           174 ns/op          16 B/op           2 allocs/op
BenchmarkLiteral-4       10000000           174 ns/op          16 B/op           2 allocs/op
BenchmarkConvert-4       20000000            77.0 ns/op         3 B/op           1 allocs/op
PASS
ok      command-line-arguments    5.512s
$


Peter

Val

Aug 22, 2017, 8:07:01 AM
to golang-nuts
FWIW, append usually carries a small performance penalty when the number of elements is known ahead of time.
And for some reason, using the built-in func copy, or an explicit loop, is slightly faster on my workstation than BenchmarkConvert.
Also, "small benchmarks are hard", so I suspect the small allocations in tight benchmark loops may not be representative of their real cost in a real program.


func BenchmarkCopy(b *testing.B) {
    for i := 0; i < b.N; i++ {
        buffer := make([]byte, 3)
        copy(buffer, s[:3])
        prefix = string(buffer)
    }
}

func BenchmarkLoop(b *testing.B) {
    for i := 0; i < b.N; i++ {
        buffer := make([]byte, 3)
        for i := 0; i < 3; i++ {
            buffer[i] = s[i]
        }
        prefix = string(buffer)
    }
}



BenchmarkNil-4           30000000            61.9 ns/op          16 B/op           2 allocs/op
BenchmarkLiteral-4       30000000            59.6 ns/op          16 B/op           2 allocs/op
BenchmarkConvert-4       50000000            37.4 ns/op           3 B/op           1 allocs/op
BenchmarkCopy-4          50000000            30.8 ns/op           3 B/op           1 allocs/op
BenchmarkLoop-4          50000000            29.3 ns/op           3 B/op           1 allocs/op


Cheers
Val

peterGo

Aug 22, 2017, 1:22:42 PM
to golang-nuts
Val,

That's a lot of speculation!

The original benchmark applies to the original question:

    prefix := verylongstring[:3]

If we change the parameters of the benchmark, then we should expect different results. For example, read the Go gc compiler code: there is a stack/heap optimization for buffers of 32 bytes or less.

cmd/gc: allocate buffers for non-escaped strings on stack
commit    e6fac08146df323eb95f46508bef937cdfb802fd
https://go.googlesource.com/go/+/e6fac08146df323eb95f46508bef937cdfb802fd
https://go-review.googlesource.com/c/go/+/3120
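
One way to observe that 32-byte threshold without a full benchmark run is to count allocations directly. This is only a sketch under assumptions: allocsConvert is a hypothetical helper name, testing.AllocsPerRun is called from an ordinary main program, and the exact counts depend on the compiler version, so no expected output is given.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

var (
	s    = strings.Repeat("a very, very long string", 4096)
	sink string // package-level, so the resulting string must live on the heap
)

// allocsConvert (hypothetical helper) reports the average number of heap
// allocations made by string([]byte(s[:n])) for a prefix of n bytes.
func allocsConvert(n int) float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = string([]byte(s[:n]))
	})
}

func main() {
	// The same prefix lengths as in the results below.
	for _, n := range []int{3, 32, 33, 256} {
		fmt.Printf("prefix %3d bytes: %.0f allocs/op\n", n, allocsConvert(n))
	}
}
```

If the intermediate []byte fits in the compiler's small stack buffer, only the final string should be heap-allocated; beyond that size the intermediate slice is expected to cost a second allocation.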

3 byte prefix:

BenchmarkNil-4            10000000           173 ns/op          16 B/op           2 allocs/op
BenchmarkLiteral-4        10000000           171 ns/op          16 B/op           2 allocs/op
BenchmarkConvert-4        20000000            76.9 ns/op         3 B/op           1 allocs/op
BenchmarkCopy-4           20000000            67.1 ns/op         3 B/op           1 allocs/op
BenchmarkLoop-4           20000000            65.7 ns/op         3 B/op           1 allocs/op

32 byte prefix:

BenchmarkNil-4             5000000           235 ns/op          64 B/op           2 allocs/op
BenchmarkLiteral-4        10000000           232 ns/op          64 B/op           2 allocs/op
BenchmarkConvert-4        10000000           129 ns/op          32 B/op           1 allocs/op
BenchmarkCopy-4           10000000           118 ns/op          32 B/op           1 allocs/op
BenchmarkLoop-4           10000000           183 ns/op          32 B/op           1 allocs/op

33 byte prefix:

BenchmarkNil-4             5000000           295 ns/op          96 B/op           2 allocs/op
BenchmarkLiteral-4         5000000           285 ns/op          96 B/op           2 allocs/op
BenchmarkConvert-4         5000000           251 ns/op          96 B/op           2 allocs/op
BenchmarkCopy-4           10000000           143 ns/op          48 B/op           1 allocs/op
BenchmarkLoop-4           10000000           188 ns/op          48 B/op           1 allocs/op

256 byte prefix:

BenchmarkNil-4             2000000           661 ns/op         512 B/op           2 allocs/op
BenchmarkLiteral-4         2000000           665 ns/op         512 B/op           2 allocs/op
BenchmarkConvert-4         2000000           659 ns/op         512 B/op           2 allocs/op
BenchmarkCopy-4            5000000           369 ns/op         256 B/op           1 allocs/op
BenchmarkLoop-4            2000000           883 ns/op         256 B/op           1 allocs/op

$ go version
go version devel +33484a6 Tue Aug 22 08:09:42 2017 +0000 linux/amd64


$ go test -run=! -bench=. -benchmem strslice_test.go
goos: linux
goarch: amd64



$ cat  strslice_test.go
package main

import (
    "strings"
    "testing"
)

const pfxLen = 3 // 3, 32, 33, 256


var (
    s      = strings.Repeat("a very, very long string", 4096)
    prefix string
)

func BenchmarkNil(b *testing.B) {
    for i := 0; i < b.N; i++ {
        prefix = string(append([]byte(nil), s[:pfxLen]...))
    }
}

func BenchmarkLiteral(b *testing.B) {
    for i := 0; i < b.N; i++ {
        prefix = string(append([]byte{}, s[:pfxLen]...))
    }
}

func BenchmarkConvert(b *testing.B) {
    for i := 0; i < b.N; i++ {
        prefix = string([]byte(s[:pfxLen]))
    }
}

func BenchmarkCopy(b *testing.B) {
    for i := 0; i < b.N; i++ {
        buffer := make([]byte, pfxLen)
        copy(buffer, s)
        prefix = string(buffer)
    }
}

func BenchmarkLoop(b *testing.B) {
    for i := 0; i < b.N; i++ {
        buffer := make([]byte, pfxLen)
        for i := 0; i < len(buffer); i++ {
            buffer[i] = s[i]
        }
        prefix = string(buffer)
    }
}
$

Peter

Val

Aug 22, 2017, 1:57:29 PM
to golang-nuts
Thanks Peter, this is very interesting. Indeed, I would not have anticipated (though it makes some sense) that calling append would outperform the manual loop when the prefix is big, despite the overhead of a builtin func call.