I've been thinking about a semi-official Go package to convert between
UTF-8 and other encodings (e.g. UTF-16, Windows-1252). It would live
in the go.text sub-repo, as
code.google.com/p/go.text/unicode/encoding.
The key ideas are that an Encoding is an interface (e.g. "package
big5; var Encoding encoding.Encoding = etc"), that it primarily operates
on []byte (although it can also operate on individual runes), and that it
is stateless:
type Encoding interface {
    // Decode converts encoded runes in src to UTF-8 bytes in dst. It returns
    // the number of dst bytes written, the number of src bytes read, and the
    // next encoding to use to decode the rest of the byte stream.
    Decode(dst, src []byte) (nDst, nSrc int, enc Encoding)
    DecodeRune(p []byte) (r rune, n int, enc Encoding)
    Encode(dst, src []byte) (nDst, nSrc int, enc Encoding)
    EncodeRune(p []byte, r rune) (n int, enc Encoding)
}
func NewReader(e Encoding, r io.Reader) *Reader
func NewWriter(e Encoding, w io.Writer) *Writer
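
To make the intended call pattern concrete, here is a rough usage sketch
that converts a Windows-1252 stream to UTF-8 via NewReader. The
windows1252 package name and the import paths are placeholders of mine,
not settled API:

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "os"

    "code.google.com/p/go.text/unicode/encoding"    // placeholder path
    "code.google.com/p/go.text/unicode/windows1252" // placeholder path
)

func main() {
    f, err := os.Open("legacy-cp1252.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    // Reads from r yield UTF-8, regardless of the encoding of f's contents.
    r := encoding.NewReader(windows1252.Encoding, f)
    utf8Bytes, err := ioutil.ReadAll(r)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s", utf8Bytes)
}

Any per-conversion buffering lives in the Reader; the Encoding value
itself stays stateless and can be shared freely.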
The act of decoding (or encoding) can change the Encoding in use. For
example, you could start with an endian-agnostic UTF-16 Encoding, and
Decode could return a UTF-16 (Little Endian) Encoding on encountering
a byte order mark.
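
A rough sketch of that mechanism, assuming concrete little- and big-endian
Encodings exist elsewhere (utf16LE and utf16BE below are placeholders, and
everything other than Decode is stubbed out):

// utf16Agnostic is a hypothetical endian-agnostic UTF-16 Encoding. Its
// Decode looks for a byte order mark and returns the concrete Encoding to
// use for the rest of the stream.
type utf16Agnostic struct{}

var utf16LE, utf16BE Encoding // concrete implementations, defined elsewhere

func (utf16Agnostic) Decode(dst, src []byte) (nDst, nSrc int, enc Encoding) {
    if len(src) < 2 {
        // Not enough input to decide yet; report no progress.
        return 0, 0, utf16Agnostic{}
    }
    switch {
    case src[0] == 0xFF && src[1] == 0xFE:
        // Little-endian BOM: consume it and switch encodings.
        return 0, 2, utf16LE
    case src[0] == 0xFE && src[1] == 0xFF:
        // Big-endian BOM: consume it and switch encodings.
        return 0, 2, utf16BE
    default:
        // No BOM: default to big-endian, per the Unicode standard.
        nDst, nSrc, _ = utf16BE.Decode(dst, src)
        return nDst, nSrc, utf16BE
    }
}

// Stubs so that utf16Agnostic satisfies Encoding in this sketch.
func (utf16Agnostic) DecodeRune(p []byte) (rune, int, Encoding)   { panic("not in this sketch") }
func (utf16Agnostic) Encode(dst, src []byte) (int, int, Encoding) { panic("not in this sketch") }
func (utf16Agnostic) EncodeRune(p []byte, r rune) (int, Encoding) { panic("not in this sketch") }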
I am aware of two existing similar packages:
code.google.com/p/go-charset/charset/
code.google.com/p/mahonia/
My proposal differs in a number of ways:
1. "package encoding" only provides a minimal number of encodings:
UTF-8 and UTF-16. Other encodings like Big-5 or GBK, which can require
large data tables, would be in separate packages. If your program
needs the Big-5 encoding, it can import big5 and refer to
big5.Encoding as a variable, instead of having to look up by string.
If your program does not need Big-5 or GBK, then the compiler and
linker do not need to see those data tables. Data tables are generated
before compile time; no data files are read at run time.
2. There is no central registry of encodings, keyed by strings such as
"windows-1252" or "cp1252". If you want the Big-5 encoding, import the
big5 package and refer to big5.Encoding. If you want to implement the
equivalent of iconv, provide your own map[string]Encoding (see the
sketch after this list).
3. There's only an Encoding. There is no separation of (stateless)
Factory and (stateful) Translator, or (stateless) Charset and
(stateful) Decoder.
4. The primary interface is batch-oriented: I anticipate that most users
will want to use NewReader or NewWriter, which boil down to Decode or
Encode. Decode takes []byte; DecodeRune is provided mostly as a
convenience. Decode takes a destination buffer as an argument, like
io.Reader does, instead of returning a buffer, like go-charset does.
Unlike mahonia, conversion does not require a Decoder call per decoded
rune.
5. The Decode method does not take an explicit eof argument, unlike
go-charset's. Decode will return nSrc == 0 if it cannot decode a rune
from src. It is up to the caller to decide whether to behave differently
depending on whether or not it is at EOF.
6. The Writer implementation is buffered; it has an explicit Flush method.
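
As promised above, here is what an iconv-style lookup by name could look
like, built by the application rather than by package encoding. The big5
and windows1252 import paths are placeholders, and only the encodings that
a program actually imports end up in its binary:

package myiconv // hypothetical application package

import (
    "strings"

    "code.google.com/p/go.text/unicode/big5"        // placeholder paths
    "code.google.com/p/go.text/unicode/encoding"
    "code.google.com/p/go.text/unicode/windows1252"
)

// encodings is an application-level registry; package encoding itself
// provides no such table.
var encodings = map[string]encoding.Encoding{
    "big5":         big5.Encoding,
    "windows-1252": windows1252.Encoding,
    "cp1252":       windows1252.Encoding, // aliases are the caller's concern
}

func lookup(name string) (encoding.Encoding, bool) {
    e, ok := encodings[strings.ToLower(name)]
    return e, ok
}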
https://codereview.appspot.com/10085049 has a proof of concept for the
Windows-1252 encoding, a lot more wordage about the Encoding
interface, and a discussion of how e.g. DecodeRune differs from
utf8.DecodeRune in the standard unicode/utf8 package.
Some benchmark numbers comparing "this package" against go-charset and
mahonia on 26K of Windows-1252 data:
$ go test -bench=. -benchmem
PASS
BenchmarkReaderGoCharset         10000    131647 ns/op   41157 B/op   6 allocs/op
BenchmarkReaderMahonia           10000    259387 ns/op   12565 B/op   7 allocs/op
BenchmarkReaderThisPackage       50000     64742 ns/op    8337 B/op   3 allocs/op
BenchmarkWriterGoCharset8K       --- FAIL: BenchmarkWriterGoCharset8K
    encoding_test.go:190: written 25879 bytes, want 25877
BenchmarkWriterGoCharset64K      10000    150204 ns/op   28805 B/op   4 allocs/op
BenchmarkWriterMahonia8K          5000    370096 ns/op   17651 B/op   7 allocs/op
BenchmarkWriterMahonia64K         5000    374755 ns/op   28907 B/op   5 allocs/op
BenchmarkWriterThisPackage8K     10000    135038 ns/op    8261 B/op   2 allocs/op
BenchmarkWriterThisPackage64K    10000    134980 ns/op    8261 B/op   2 allocs/op
ok      code.google.com/p/go.text/unicode/encoding      15.918s
I'm not sure why the GoCharset benchmark fails for the
many-smaller-writes case but passes the one-big-write case. I might be
doing something dumb with go-charset (and/or mahonia).
WDYT? Am I missing any subtleties?