Charset conversion in Go language

Kirill A. Shutemov

Jul 1, 2012, 11:51:04 PM
to golan...@googlegroups.com, roger peppe
Hi,

Currently, there's no standard way to convert between charsets in Go.
This limits the development of packages that are already part of the
language: for example, package net/mail can decode RFC 2047-encoded text
only if it uses the UTF-8 or ISO-8859-1 charset.

There are a few out-of-tree packages (go-charset, etc.), but they are of
little use to in-tree packages.

Is there any plan to implement charset conversion in the Go language?

--
Kirill A. Shutemov

qis hao

Jul 2, 2012, 11:59:25 AM
to golang-nuts
I don't know about the Go team's plans, but here is a character-set
conversion library implemented in pure Go. Try:
go get github.com/axgle/mahonia

Brad Fitzpatrick

Jul 2, 2012, 1:21:16 PM
to Kirill A. Shutemov, golan...@googlegroups.com, roger peppe
Note encoding/xml's Decoder.CharsetReader hook:

http://golang.org/pkg/encoding/xml/#Decoder

Then you could plug go-charset into it.  Something similar could be done in net/mail or wherever else it's needed.

Brad Fitzpatrick

Jul 2, 2012, 2:26:58 PM
to Kirill A. Shutemov, golan...@googlegroups.com, roger peppe


On Mon, Jul 2, 2012 at 11:15 AM, Kirill A. Shutemov <kir...@shutemov.name> wrote:
> On Mon, Jul 02, 2012 at 10:21:16AM -0700, Brad Fitzpatrick wrote:
> > Note encoding/xml's Decoder.CharsetReader hook:
> >
> > http://golang.org/pkg/encoding/xml/#Decoder
> >
> > Then you could plug go-charset into it.  Something similar could be done in
> > net/mail or wherever else it's needed.
>
> Yeah. But it looks like a hack due to a missing feature in the language,
> not like an elegant solution.

In the language?  I think you mean standard library.

Surely you don't mean that the language itself should support every weird encoding.

But once you support one non-UTF-8 encoding in the standard library, you then invite requests to support all non-UTF-8 encodings, and that gets crazy.  It's also increasingly unnecessary as most things use Unicode encodings.  Having go-charset or other libraries provide io.Readers doesn't seem like a hack to me any more than the crypto/tls package providing an io.Reader speaking TLS seems like a hack.  It actually seems quite nice.

Kirill A. Shutemov

Jul 2, 2012, 2:15:54 PM
to Brad Fitzpatrick, golan...@googlegroups.com, roger peppe
On Mon, Jul 02, 2012 at 10:21:16AM -0700, Brad Fitzpatrick wrote:
> Note encoding/xml's Decoder.CharsetReader hook:
>
> http://golang.org/pkg/encoding/xml/#Decoder
>
> Then you could plug go-charset into it. Something similar could be done in
> net/mail or wherever else it's needed.

Yeah. But it looks like a hack due to a missing feature in the language,
not like an elegant solution.

--
Kirill A. Shutemov

Kirill A. Shutemov

Jul 2, 2012, 3:21:39 PM
to Brad Fitzpatrick, golan...@googlegroups.com, roger peppe
On Mon, Jul 02, 2012 at 11:26:58AM -0700, Brad Fitzpatrick wrote:
> On Mon, Jul 2, 2012 at 11:15 AM, Kirill A. Shutemov <kir...@shutemov.name> wrote:
>
> > On Mon, Jul 02, 2012 at 10:21:16AM -0700, Brad Fitzpatrick wrote:
> > > Note encoding/xml's Decoder.CharsetReader hook:
> > >
> > > http://golang.org/pkg/encoding/xml/#Decoder
> > >
> > > Then you could plug go-charset into it. Something similar could be
> > > done in net/mail or wherever else it's needed.
> >
> > Yeah. But it looks like a hack due to a missing feature in the language,
> > not like an elegant solution.
>
>
> In the language? I think you mean standard library.

Sure, I mean the standard library. The standard library is part of Go 1, isn't it?
>
> Surely you don't mean that the language itself should support every weird
> encoding.
>
> But once you support one non-UTF-8 encoding in the standard library, you
> then invite requests to support all non-UTF-8 encodings, and that gets
> crazy. It's also increasingly unnecessary as most things use Unicode
> encodings. Having go-charset or other libraries provide io.Readers doesn't
> seem like a hack to me any more than the crypto/tls package providing an
> io.Reader speaking TLS seems like a hack. It actually seems quite nice.

go-charset is not a hack, but hooks like CharsetReader in the standard library
are a hack to work around missing functionality in the stdlib.

--
Kirill A. Shutemov