Short destination error with unicode.Transform

Joe Schafer

Jul 29, 2022, 10:01:39 PM
to golang-nuts
I had a curious bug appear in my server logs when using a unicode Transformer:

    transform unicode "wind-Pa\x00\x00\x00" to ascii: transform: short destination buffer

I've put simplified code that caused the error in a Gist and on the Go Playground. I assumed that converting from Unicode to ASCII would always produce output of equal or smaller length, hence the panic. Here are the essential bits:

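    cs := []byte("wind-Pa\x00\x00\x00")
    // Destination buffer is the same length as the source.
    chars := make([]byte, len(cs))
    // Decompose (NFD), drop nonspacing marks (Mn), then recompose (NFC).
    t := transform.Chain(norm.NFD, runes.Remove(runes.In(unicode.Mn)), norm.NFC)
    nDst, _, err := t.Transform(chars, cs, true)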

I suspect the error is returned by text/runes.go:149 (or perhaps line 165) in the remove transformer. It looks like the form transformers (norm.NFD and norm.NFC) never return transform.ErrShortDst.

I haven't been able to reproduce the error on my dev Mac or on the playground, and the error has occurred only once in my server logs. The server binary was compiled with Bazel for the @io_bazel_rules_go//go/toolchain:linux_amd64 toolchain using Go version 1.18.4.

I'd like to understand when the Transformer returns transform.ErrShortDst. My code allocates a destination buffer the same length as the input, so I expected to avoid the short destination error.

Brian Candler

Jul 30, 2022, 3:13:27 AM
to golang-nuts
If this is non-repeatable then perhaps it is some sort of race? Have you tried running your code with the race detector enabled?

Joe Schafer

Jan 12, 2023, 11:16:02 PM
to golang-nuts
Update after seeing the error again.

I inlined everything, creating the transformer on every invocation, to avoid any races, but the error persists. https://gist.github.com/jschaf/bd600ce71ad3798af6c160d74904ac9c

I'm unable to reproduce the error locally. My current plan is to work around the issue by using transform.String instead of calling Transform directly.

My best guess is that I need to allocate a destination buffer larger than the input, because normalization form D (and C?) may expand characters above unicode.MaxASCII into multiple bytes. The error occurs with the following strings (note the typographic apostrophe ’ rather than a regular ASCII apostrophe):

    wind-Andy’s
    wind-Carr’s

I tried to confirm that hypothesis at https://go.dev/play/p/6FT6KHzeBwM, but without success.
