golang.org/x/text/transform.Chain is not thread safe

433 views
Skip to first unread message

Bryan Reynaert

unread,
Jun 20, 2016, 9:43:33 PM6/20/16
to golang-nuts
Dear all,

the following code panics with: runtime error: slice bounds out of range


var testTransform = transform.Chain(norm.NFD, norm.NFC)

func main() {
for i := 0; i < 200; i++ {
go func() {
transform.String(testTransform, "nonemptystring")
}()
}
time.Sleep(time.Second)
}

The reason is transform.Chain returns a Transformer which keeps an internal state
while sequentially applying each transformation. When called by two goroutines,
this state gets corrupted.

For me, this was unexpected, as each transform by itself seems thread-safe. Errors
caused by this behaviour might be hard to track. I suggest changing the implementation
of transform.Chain.Transform to generate a new state for each call or make a note about
this in the docs.

Or maybe I was expecting something I should not?

I can write a thread-safe version of chain.Transform if this is the way to go. Please comment.

Regards

Bryan

Ian Lance Taylor

unread,
Jun 20, 2016, 11:56:20 PM6/20/16
to Bryan Reynaert, golang-nuts
The general rule for the standard library is that if a type does not
explicitly say that it is safe to call methods from multiple
goroutines, then it is not safe to do so. That said, it may be
reasonable to explicitly state that in this case.

I don't think it should be made thread-safe unless the result is just
as efficient. Efficiency matters here.

Ian

alk...@gmail.com

unread,
Jun 21, 2016, 12:15:38 AM6/21/16
to golang-nuts, alk...@gmail.com
I am guessing the same efficiency should be possible. Except, each call to a chained Transform
would involve a couple of allocations (to create the state). On the good side, this simplifies concurrent
code sharing the same object (no locks), avoids the original problem (concurrency corruption) and avoid
having to create multiple instances of the same chain if one is to avoid locks (saves allocations here).

 
The general rule for the standard library is that if a type does not
explicitly say that it is safe to call methods from multiple
goroutines, then it is not safe to do so.  That said, it may be
reasonable to explicitly state that in this case.

I agree, but consider the Forms used in my example above, they are thread-safe as they are package level
values.
 

I don't think it should be made thread-safe unless the result is just
as efficient.  Efficiency matters here.

Ian


Then I guess it is worth the try? I will have a proposal this weekend.

Regards

Bryan 

Ian Lance Taylor

unread,
Jun 21, 2016, 12:33:39 AM6/21/16
to Bryan Reynaert, golang-nuts
On Mon, Jun 20, 2016 at 9:15 PM, <alk...@gmail.com> wrote:
> I am guessing the same efficiency should be possible. Except, each call to a
> chained Transform
> would involve a couple of allocations (to create the state). On the good
> side, this simplifies concurrent
> code sharing the same object (no locks), avoids the original problem
> (concurrency corruption) and avoid
> having to create multiple instances of the same chain if one is to avoid
> locks (saves allocations here).

Unfortunately, a couple of allocations is not the same efficiency.

That said, this is mpvl's code and he would make the final decision.

Ian

Nigel Tao

unread,
Jun 22, 2016, 11:13:36 PM6/22/16
to Ian Lance Taylor, Marcel van Lohuizen, Bryan Reynaert, golang-nuts
+mpvl although my instinct is that this is all working as intended,
and types are not goroutine-safe unless they explicitly say so.

mp...@golang.org

unread,
Jun 23, 2016, 6:16:03 AM6/23/16
to Ian Lance Taylor, Bryan Reynaert, golang-nuts
Indeed. The reason this is not thread-safe is especially for performance reasons. Also, Transformers are, in general, not thread-safe (which is already implied by the Reset method).

Making Chain thread-safe even wouldn't even work for a sequence of thread-safe Transformers without sacrificing performance, as you would have to allocate largish buffers on every call to Transform. 

The solution is really not to share Transformer returned by Chain. On the bright side, sort of, the creation of the buffers (the state you want to replicate on each call to transform) is biggest cost of creating a new Transformer using Chain. So if allocating the buffers is not an issue, you can do

var testTransform = func() Transformer { return transform.Chain(norm.NFD, norm.NFC) }

and 

transform.String(testTransform(), "nonemptystring")

Alternatively, if you are dealing with small strings, you could avoid using Chain altogether:

norm.NFC.String(norm.NFD.String("nonemptystring")).



 

Ian

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages