I'm aware of Artem Krylysov's idea for string interning published on
If I understand it correctly, each string is stored twice in a map, once as
a key and once as a value. That means that in his example, which interns the
words of the novel 1984, words that appear only once are 100% inefficient in
terms of storage, and words that appear twice are neutral. His approach only
pays off for words that appear three or more times.
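For reference, here is a minimal sketch of the map-based interning described
above. It assumes the approach boils down to a `map[string]string` in which
each new string is used as both key and value. The `intern` function is an
illustrative name, not taken from Krylysov's code.

```go
package main

import "fmt"

// intern returns the canonical string equal to s from m, adding s the first
// time it is seen. Illustrative sketch: s is used as both the key and the value.
func intern(m map[string]string, s string) string {
	if canonical, ok := m[s]; ok {
		return canonical
	}
	m[s] = s
	return s
}

func main() {
	m := make(map[string]string)
	for _, w := range []string{"war", "is", "peace", "war"} {
		intern(m, w)
	}
	fmt.Println(len(m)) // 3 distinct words stored
}
```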
I'm wondering if there's a map-like data structure that would store a string
as the key and the address of the key as the value. I'm aware that a standard
Go map can't be used for this: its entries can be moved around while a program
is running (for example, when the map grows), so taking the address of a key
would be dangerous, which is why Go doesn't allow it.
This came up because I profiled a run of an app I'm working on that
processed 12518 strings, of which 9810 appeared two or fewer times, so
Krylysov's approach would only be marginally useful for this app.
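Numbers like these can be gathered with a simple frequency count. The sketch
below assumes the strings are available as a slice; `rareCount` and its
`maxOccurrences` parameter are hypothetical names, and this is not
necessarily how the profile above was produced.

```go
package main

import "fmt"

// rareCount returns the number of distinct strings in words and how many
// of those occur at most maxOccurrences times.
func rareCount(words []string, maxOccurrences int) (distinct, rare int) {
	counts := make(map[string]int)
	for _, w := range words {
		counts[w]++
	}
	for _, n := range counts {
		if n <= maxOccurrences {
			rare++
		}
	}
	return len(counts), rare
}

func main() {
	distinct, rare := rareCount([]string{"a", "b", "b", "c", "c", "c"}, 2)
	fmt.Println(distinct, rare) // 3 2
}
```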
The data structure I'm looking for would never be worse than not interning,
and would start showing a benefit as soon as a word appears twice.
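The break-even points above follow from counting stored copies in the post's
own model: without interning, a word that occurs n times costs n copies; the
map-based approach keeps 2 copies per distinct word, and the hoped-for
structure would keep 1. The hypothetical `netSavings` function below just
encodes that accounting, in units of copies rather than bytes.

```go
package main

import "fmt"

// netSavings sums, over all distinct words, the copies saved by an interning
// structure that keeps copiesKept copies per distinct word, compared with
// storing every occurrence separately. counts maps each word to its
// number of occurrences. A negative result means interning costs more.
func netSavings(counts map[string]int, copiesKept int) int {
	total := 0
	for _, n := range counts {
		total += n - copiesKept
	}
	return total
}

func main() {
	counts := map[string]int{"once": 1, "twice": 2, "thrice": 3}
	fmt.Println(netSavings(counts, 2), netSavings(counts, 1)) // 0 3
}
```

With 2 copies kept, a once-only word costs one extra copy and a twice-seen
word breaks even; with 1 copy kept, no word ever costs extra.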
Does anybody know of such a data structure?
Cordially,
Jon Forrest