code review 4437061: xml: Parser hook for non-UTF-8 charset converters (issue4437061)

117 views
Skip to first unread message

brad...@golang.org

unread,
Apr 20, 2011, 9:22:20 PM4/20/11
to golan...@googlegroups.com, golan...@googlegroups.com, re...@codereview.appspotmail.com
Reviewers: golang-dev_googlegroups.com,

Message:
Hello golan...@googlegroups.com (cc: golan...@googlegroups.com),

I'd like you to review this change to
https://go.googlecode.com/hg/


Description:
xml: Parser hook for non-UTF-8 charset converters

Adds an optional hook to Parser to let charset
converters step in when a processing directive
with a non-UTF-8 encoding is specified.

(Open to alternative proposals too...)

Please review this at http://codereview.appspot.com/4437061/

Affected files:
M src/pkg/xml/xml.go
M src/pkg/xml/xml_test.go


Russ Cox

unread,
Apr 21, 2011, 7:52:59 AM4/21/11
to brad...@golang.org, golan...@googlegroups.com, re...@codereview.appspotmail.com
Seems reasonable.

The ByteReader function signature doesn't seem
quite right. I think you want a general reader instead.
For now it is fine to double-buffer in this case, so

CharsetReader func(charset string, r io.Reader) (io.Reader, os.Error)

and then pass the current bufio.Reader in and wrap
a new bufio.Reader on top of the returned result.
Or if you are worried about double buffering,
use b.Available(), b.Peek(), bytes.NewBuffer, and
io.MultiReader to get back to an unbuffered input.

Using the signature above means that someone could
use Roger's charset package directly too (possibly after
changing the name on the way in).
http://code.google.com/p/go-charset/source/browse/charset/charset.go

Russ

brad...@golang.org

unread,
Apr 21, 2011, 1:36:03 PM4/21/11
to golan...@googlegroups.com, r...@golang.org, golan...@googlegroups.com, re...@codereview.appspotmail.com

Brad Fitzpatrick

unread,
Apr 21, 2011, 1:38:10 PM4/21/11
to r...@golang.org, golan...@googlegroups.com, re...@codereview.appspotmail.com
Done.  For some reason I thought NewParser took a ByteReader which led me down that path.

Now that I see I can safely do parser.r.(io.Reader), this works great.

PTAL.  Added another test too.

r...@golang.org

unread,
Apr 21, 2011, 3:10:40 PM4/21/11
to brad...@golang.org, golan...@googlegroups.com, re...@codereview.appspotmail.com

http://codereview.appspot.com/4437061/diff/9001/src/pkg/xml/xml.go
File src/pkg/xml/xml.go (right):

http://codereview.appspot.com/4437061/diff/9001/src/pkg/xml/xml.go#newcode171
src/pkg/xml/xml.go:171: CharsetReader func(charset string, input
io.Reader) io.Reader
CharsetReader should return io.Reader, os.Error like the one in
go-charset does.
Then the error below should be

"xml: using charset %q: %s", name, err

Obviously one of the most common errors is going to be
unknown character set, but there are other possible ones.
For example, if the character set implementation needs
to read a data file and cannot, it's much nicer to get

xml: using charset latin1: cannot open /usr/local/data/latin1:
permission denied

than

xml: unknown character set latin1

http://codereview.appspot.com/4437061/

brad...@golang.org

unread,
Apr 21, 2011, 3:56:18 PM4/21/11
to r...@golang.org, golan...@googlegroups.com, re...@codereview.appspotmail.com
Hello rsc (cc: golan...@googlegroups.com),

r...@golang.org

unread,
Apr 21, 2011, 4:01:18 PM4/21/11
to brad...@golang.org, golan...@googlegroups.com, re...@codereview.appspotmail.com
LGTM

http://codereview.appspot.com/4437061/diff/2002/src/pkg/xml/xml.go
File src/pkg/xml/xml.go (right):

http://codereview.appspot.com/4437061/diff/2002/src/pkg/xml/xml.go#newcode509
src/pkg/xml/xml.go:509: p.err = fmt.Errorf("xml: getting CharsetReader
for encoding %q: %v", enc, err)
s/getting/opening/

http://codereview.appspot.com/4437061/diff/2002/src/pkg/xml/xml.go#newcode512
src/pkg/xml/xml.go:512: if newr == nil {
You can skip this one. We don't check that os.Open returns f == nil
after checking err != nil either.

http://codereview.appspot.com/4437061/diff/2002/src/pkg/xml/xml_test.go
File src/pkg/xml/xml_test.go (right):

http://codereview.appspot.com/4437061/diff/2002/src/pkg/xml/xml_test.go#newcode101
src/pkg/xml/xml_test.go:101: <?xml version="1.0"
encoding="x-testing-uppercase"?>
nice test

http://codereview.appspot.com/4437061/diff/2002/src/pkg/xml/xml_test.go#newcode200
src/pkg/xml/xml_test.go:200: if err == nil && c >= 'A' && c <= 'Z' {
c is still valid, so this can drop the err == nil.
Also in other places in the Go library this is usually written

if 'A' <= c && c <= 'Z' {
c += 'a' - 'A'
}

0x20 is a bit opaque (though it's better than c |= ' ')

http://codereview.appspot.com/4437061/

brad...@golang.org

unread,
Apr 21, 2011, 4:07:36 PM4/21/11
to r...@golang.org, golan...@googlegroups.com, re...@codereview.appspotmail.com

brad...@golang.org

unread,
Apr 21, 2011, 4:07:44 PM4/21/11
to r...@golang.org, golan...@googlegroups.com, re...@codereview.appspotmail.com

http://codereview.appspot.com/4437061/diff/2002/src/pkg/xml/xml.go#newcode509
src/pkg/xml/xml.go:509: p.err = fmt.Errorf("xml: getting CharsetReader
for encoding %q: %v", enc, err)

On 2011/04/21 20:01:18, rsc wrote:
> s/getting/opening/

I went with that plus your original text:

p.err = fmt.Errorf("xml: opening charset %q: %v", enc, err)

On 2011/04/21 20:01:18, rsc wrote:
> You can skip this one. We don't check that os.Open returns f == nil
after
> checking err != nil either.

I'd rather see this than a nil pointer crash, though.

If we don't allow nil, nil to be returned that means we need more docs
above. I'd rather have less docs and just support nil, nil without
crashing.

http://codereview.appspot.com/4437061/

Russ Cox

unread,
Apr 21, 2011, 4:11:59 PM4/21/11
to brad...@golang.org, r...@golang.org, golan...@googlegroups.com, re...@codereview.appspotmail.com
> I'd rather see this than a nil pointer crash, though.
>
> If we don't allow nil, nil to be returned that means we need more docs
> above.  I'd rather have less docs and just support nil, nil without
> crashing.

Allowing nil, nil is very non-idiomatic Go.
A func CharsetReader that returns that pair is broken.
The contract for all these routines is that if you
don't do what you were asked, you must return
an error.

Russ

brad...@golang.org

unread,
Apr 21, 2011, 4:18:16 PM4/21/11
to r...@golang.org, golan...@googlegroups.com, re...@codereview.appspotmail.com

Brad Fitzpatrick

unread,
Apr 21, 2011, 4:18:57 PM4/21/11
to r...@golang.org, golan...@googlegroups.com, re...@codereview.appspotmail.com
I'll keep that in mind. code/docs updated. PTAL.

r...@golang.org

unread,
Apr 21, 2011, 4:40:55 PM4/21/11
to brad...@golang.org, golan...@googlegroups.com, re...@codereview.appspotmail.com

Brad Fitzpatrick

unread,
Apr 21, 2011, 5:00:36 PM4/21/11
to brad...@golang.org, r...@golang.org, golan...@googlegroups.com, re...@codereview.appspotmail.com
On Thu, Apr 21, 2011 at 1:40 PM, <r...@golang.org> wrote:

http://codereview.appspot.com/4437061/diff/17001/src/pkg/xml/xml.go

File src/pkg/xml/xml.go (right):

http://codereview.appspot.com/4437061/diff/17001/src/pkg/xml/xml.go#newcode512

src/pkg/xml/xml.go:512: if newr == nil {
still here ?
 
But with a panic now.  Not really accepted like it was before.  You'd prefer an obscure nil deference crash in the future?

Russ Cox

unread,
Apr 21, 2011, 5:09:14 PM4/21/11
to Brad Fitzpatrick, golan...@googlegroups.com, re...@codereview.appspotmail.com
> But with a panic now.  Not really accepted like it was before.  You'd prefer
> an obscure nil deference crash in the future?

It's the same thing io.ReadFile is going
to do if os.Open returns nil, nil. Or the same
thing that xml will do if bufio.NewReader returns nil.
Broken functions cause crashes.

If you think this one is particularly likely and
merits the different error, it's fine. But it's not
typical unless there's a reason.

Brad Fitzpatrick

unread,
Apr 21, 2011, 5:21:13 PM4/21/11
to r...@golang.org, golan...@googlegroups.com, re...@codereview.appspotmail.com
I treat this one differently because it's something a user passes in. If it's an internal detail I'm fine exploding whenever, but I don't want to see a bug report against the xml package with a stack trace looking like it's coming from the core xml code that's actually the result of somebody elsewhere passing in something not obeying the contract. I'd rather assign blame early and avoid the confusing bug report.

Russ Cox

unread,
Apr 21, 2011, 5:34:04 PM4/21/11
to Brad Fitzpatrick, re...@codereview.appspotmail.com, r...@golang.org, golan...@googlegroups.com

LGTM

brad...@golang.org

unread,
Apr 21, 2011, 5:37:35 PM4/21/11
to brad...@golang.org, r...@golang.org, golan...@googlegroups.com, re...@codereview.appspotmail.com
*** Submitted as
http://code.google.com/p/go/source/detail?r=9f771f92ac93 ***

xml: Parser hook for non-UTF-8 charset converters

Adds an optional hook to Parser to let charset
converters step in when a processing directive
with a non-UTF-8 encoding is specified.

(Open to alternative proposals too...)

R=rsc
CC=golang-dev
http://codereview.appspot.com/4437061


http://codereview.appspot.com/4437061/

Reply all
Reply to author
Forward
0 new messages