Best way to store large numbers of string constants


Jamie Hall

May 12, 2016, 4:24:45 AM
to golang-nuts
Hi all,

I've got an application that needs to store a large number (~900,000) of small (~16 bytes) string constants. From a little bit of poking around, I found storing them as byte slices made builds unmanageable (several minutes, CPU maxed out), while strings were much more approachable. However, it's still taking about 30s to build on a fairly hefty MacBook Pro. Is this the best I'm likely to see, or is there a clever trick I could use to make builds quicker? For example, would I see better performance if I merged them into a single string and split at runtime?

If 30s builds are the best I'm going to get under these conditions, that's ok, I was just wondering whether I was missing a trick.

Thanks,

Jamie

Dave Cheney

May 12, 2016, 4:43:40 AM
to golang-nuts
The compile-time blowout is strongly influenced by the number of OLITERAL (string) nodes in the parse tree. If you can combine the strings into a single string with a slice of offsets, hopefully pregenerated, you should be able to cut down compile time.

Jamie Hall

May 12, 2016, 5:58:58 AM
to golang-nuts
I tried concatenating the strings, but the compile time exploded. It looks like the reason was that the strings got so large that they started having to be swapped in and out. I'm sure there's a sweet spot of string size to string count that gives optimal performance, but I'm not sure I have the time to find it.

Thanks for your help :)

Matthew Zimmerman

May 12, 2016, 6:04:57 AM
to Jamie Hall, golang-nuts
With something similar (a large map literal) I had build times in the minutes. Exporting that data to JSON and building the map at runtime brought the build down to seconds.
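A minimal sketch of that approach, assuming the data is a flat JSON object of string keys and values. The function name loadTable and the inline sample data are made up for illustration; a real program would read the bytes from a file shipped next to the binary.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// loadTable decodes a JSON object such as {"k": "v", ...} into a map.
// The compiler never sees the individual entries, only this function.
func loadTable(raw []byte) (map[string]string, error) {
	table := make(map[string]string)
	if err := json.Unmarshal(raw, &table); err != nil {
		return nil, err
	}
	return table, nil
}

func main() {
	// Stand-in for the contents of a data file (hypothetical sample).
	raw := []byte(`{"a": "alpha", "b": "bravo"}`)

	table, err := loadTable(raw)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(table), table["b"]) // prints "2 bravo"
}
```

The trade-off is that the cost moves from build time to program start-up, and the data file has to be distributed with the binary.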


Dave Cheney

May 12, 2016, 6:25:45 AM
to golang-nuts
A large map literal turns into a heroically large init function during compilation.

Jamie Hall

May 12, 2016, 6:45:23 AM
to golang-nuts
Thanks both. I've now got a small script building large literals from a text file, and then an init function splitting into smaller strings, which has brought compile time down to just under 5 seconds.

Thanks again for your help.
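A rough sketch of the "one big literal, split in init" idea described in this message. It assumes a single newline-separated literal; the names blob and entries are placeholders, and the actual script may well emit several literals rather than one.

```go
package main

import (
	"fmt"
	"strings"
)

// blob stands in for the generated literal: one entry per line.
const blob = "alpha\nbravo\ncharlie\n"

// entries is filled once at start-up rather than at compile time.
var entries []string

func init() {
	// Drop the trailing newline so Split does not yield an empty last entry.
	entries = strings.Split(strings.TrimSuffix(blob, "\n"), "\n")
}

func main() {
	fmt.Println(len(entries), entries[1]) // prints "3 bravo"
}
```

The substrings returned by strings.Split share memory with the original string, so the split adds a slice of string headers but does not copy the character data.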

Dave Cheney

May 12, 2016, 7:13:09 AM
to golang-nuts
You could try moving this into a separate package to avoid compiling it over and over again. 

I think with some code generation you can replace splitting the file with a static array (not a slice, that requires runtime initialisation) of offsets and do the lookup on the fly.
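As a sketch of what the generated code might look like (the names data, offsets, count and lookup are illustrative, and the three entries are toy data): the entries are concatenated with no separator, and a fixed-size array records where each one starts.

```go
package main

import "fmt"

// data holds every entry back to back with no separators.
const data = "alphabravocharlie"

// offsets[i] is the start of entry i; the final element is len(data).
// It is an array ([...]T), not a slice, as suggested above.
var offsets = [...]uint32{0, 5, 10, 17}

// count reports how many entries are stored.
func count() int { return len(offsets) - 1 }

// lookup returns entry i by slicing data; nothing is split at start-up.
func lookup(i int) string {
	return data[offsets[i]:offsets[i+1]]
}

func main() {
	for i := 0; i < count(); i++ {
		fmt.Println(lookup(i))
	}
}
```

Running this prints alpha, bravo and charlie on separate lines. Using uint32 offsets assumes the concatenated string stays under 4 GiB, which is comfortably true for ~900,000 entries of ~16 bytes.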

Alex Bligh

May 12, 2016, 8:18:57 AM
to Jamie Hall, Alex Bligh, golang-nuts

On 12 May 2016, at 10:58, 'Jamie Hall' via golang-nuts <golan...@googlegroups.com> wrote:

> I tried concatenating the strings, but the compile time exploded.

This surprises me.

I have one test file which consists in the main of several megabytes of string literal
(which is about the size you want) defined as

var const foo = `
stuff here
`

and compile time is pretty quick. In my instance it's base64 encoded binary data, but it
could be anything.

--
Alex Bligh




Alex Bligh

May 12, 2016, 8:20:10 AM
to Jamie Hall, Alex Bligh, golang-nuts

On 12 May 2016, at 13:18, Alex Bligh <al...@alex.org.uk> wrote:

> var const foo = `
> stuff here
> `

should be

> const foo = `
> stuff here
> `

obviously

--
Alex Bligh




Val

May 12, 2016, 8:48:05 AM
to golang-nuts
This looks like a job for go:generate.
A simple design would have the program create a new source file "thestrings.go", read a big raw file of 900K lines, put it in a string literal "thestringsconcat", and put the boundaries in an array "limits" of size 900K + 1.
Then in your real program, you would call  getstring( i )  which returns  thestringsconcat[ limits[i] : limits[i+1] ]

Jamie Hall

May 12, 2016, 10:21:26 AM
to golang-nuts, the.sl...@googlemail.com, al...@alex.org.uk
Yes, this ended up being the fastest way. My mistake was that I'd stored it as a var, which had other impacts.

Manlio Perillo

May 12, 2016, 12:08:35 PM
to golang-nuts
On Thursday, 12 May 2016 at 10:24:45 UTC+2, Jamie Hall wrote:
> Hi all,
>
> I've got an application that needs to store a large number (~900,000) of small (~16 bytes) string constants. From a little bit of poking around, I found storing them as byte slices made builds unmanageable (several minutes, CPU maxed out), while strings were much more approachable. However, it's still taking about 30s to build on a fairly hefty MacBook Pro. Is this the best I'm likely to see, or is there a clever trick I could use to make builds quicker?

IMHO the best method for storing resource data is to store it in a custom section of the object file.

You can use objdump (http://www.linuxjournal.com/content/embedding-file-executable-aka-hello-world-version-5967).
However, the data is then stored in the .data section, which is loaded by the kernel when the program is executed.
Another solution is to store the data in a private section (that will be ignored by the kernel), or to simply append the data to the executable.

Manlio


Konstantin Khomoutov

May 12, 2016, 12:44:53 PM
to Manlio Perillo, golang-nuts
On Thu, 12 May 2016 09:08:35 -0700 (PDT)
Manlio Perillo <manlio....@gmail.com> wrote:

[...]
> IMHO the best method for storing resources data, is to store them in
> a custom section of the object file.
>
> You can use objdump
> (http://www.linuxjournal.com/content/embedding-file-executable-aka-hello-world-version-5967)
> However the data is stored in the .data section, and this is loaded
> by the kernel when the program is executed.
> Another solution is to store the data in a private section (that will
> ignored by the kernel), or to simply append data to the executable.

IIRC go.rice [1] implements both of these approaches.

1. https://github.com/GeertJohan/go.rice