Message:
Hello golan...@googlegroups.com (cc: golan...@googlegroups.com),
I'd like you to review this change to
https://go.googlecode.com/hg/
Description:
xml: Parser hook for non-UTF-8 charset converters
Adds an optional hook to Parser to let charset
converters step in when a processing directive
with a non-UTF-8 encoding is specified.
(Open to alternative proposals too...)
Please review this at http://codereview.appspot.com/4437061/
Affected files:
M src/pkg/xml/xml.go
M src/pkg/xml/xml_test.go
The ByteReader function signature doesn't seem
quite right. I think you want a general reader instead.
For now it is fine to double-buffer in this case, so
CharsetReader func(charset string, r io.Reader) (io.Reader, os.Error)
and then pass the current bufio.Reader in and wrap
a new bufio.Reader on top of the returned result.
Or if you are worried about double buffering,
use b.Available(), b.Peek(), bytes.NewBuffer, and
io.MultiReader to get back to an unbuffered input.
Using the signature above means that someone could
use Roger's charset package directly too (possibly after
changing the name on the way in).
http://code.google.com/p/go-charset/source/browse/charset/charset.go
Russ
http://codereview.appspot.com/4437061/diff/9001/src/pkg/xml/xml.go#newcode171
src/pkg/xml/xml.go:171: CharsetReader func(charset string, input
io.Reader) io.Reader
CharsetReader should return io.Reader, os.Error like the one in
go-charset does.
Then the error below should be
"xml: using charset %q: %s", name, err
Obviously one of the most common errors is going to be
unknown character set, but there are other possible ones.
For example, if the character set implementation needs
to read a data file and cannot, it's much nicer to get
xml: using charset latin1: cannot open /usr/local/data/latin1:
permission denied
than
xml: unknown character set latin1
http://codereview.appspot.com/4437061/diff/2002/src/pkg/xml/xml.go
File src/pkg/xml/xml.go (right):
http://codereview.appspot.com/4437061/diff/2002/src/pkg/xml/xml.go#newcode509
src/pkg/xml/xml.go:509: p.err = fmt.Errorf("xml: getting CharsetReader
for encoding %q: %v", enc, err)
s/getting/opening/
http://codereview.appspot.com/4437061/diff/2002/src/pkg/xml/xml.go#newcode512
src/pkg/xml/xml.go:512: if newr == nil {
You can skip this one. We don't check that os.Open returns f == nil
after checking err != nil either.
http://codereview.appspot.com/4437061/diff/2002/src/pkg/xml/xml_test.go
File src/pkg/xml/xml_test.go (right):
http://codereview.appspot.com/4437061/diff/2002/src/pkg/xml/xml_test.go#newcode101
src/pkg/xml/xml_test.go:101: <?xml version="1.0"
encoding="x-testing-uppercase"?>
nice test
http://codereview.appspot.com/4437061/diff/2002/src/pkg/xml/xml_test.go#newcode200
src/pkg/xml/xml_test.go:200: if err == nil && c >= 'A' && c <= 'Z' {
c is still valid, so this can drop the err == nil.
Also in other places in the Go library this is usually written
if 'A' <= c && c <= 'Z' {
c += 'a' - 'A'
}
0x20 is a bit opaque (though it's better than c |= ' ')
http://codereview.appspot.com/4437061/diff/2002/src/pkg/xml/xml.go#newcode509
src/pkg/xml/xml.go:509: p.err = fmt.Errorf("xml: getting CharsetReader
for encoding %q: %v", enc, err)
On 2011/04/21 20:01:18, rsc wrote:
> s/getting/opening/
I went with that plus your original text:
p.err = fmt.Errorf("xml: opening charset %q: %v", enc, err)
http://codereview.appspot.com/4437061/diff/2002/src/pkg/xml/xml.go#newcode512
src/pkg/xml/xml.go:512: if newr == nil {
On 2011/04/21 20:01:18, rsc wrote:
> You can skip this one. We don't check that os.Open returns f == nil
after
> checking err != nil either.
I'd rather see this than a nil pointer crash, though.
If we don't allow nil, nil to be returned that means we need more docs
above. I'd rather have less docs and just support nil, nil without
crashing.
Allowing nil, nil is very non-idiomatic Go.
A func CharsetReader that returns that pair is broken.
The contract for all these routines is that if you
don't do what you were asked, you must return
an error.
Russ
http://codereview.appspot.com/4437061/diff/17001/src/pkg/xml/xml.go#newcode512
src/pkg/xml/xml.go:512: if newr == nil {
still here ?
http://codereview.appspot.com/4437061/diff/17001/src/pkg/xml/xml.gohttp://codereview.appspot.com/4437061/diff/17001/src/pkg/xml/xml.go#newcode512
File src/pkg/xml/xml.go (right):
still here ?
src/pkg/xml/xml.go:512: if newr == nil {
It's the same thing io.ReadFile is going
to do if os.Open returns nil, nil. Or the same
thing that xml will do if bufio.NewReader returns nil.
Broken functions cause crashes.
If you think this one is particularly likely and
merits the different error, it's fine. But it's not
typical unless there's a reason.
LGTM
xml: Parser hook for non-UTF-8 charset converters
Adds an optional hook to Parser to let charset
converters step in when a processing directive
with a non-UTF-8 encoding is specified.
(Open to alternative proposals too...)
R=rsc
CC=golang-dev
http://codereview.appspot.com/4437061