XML newline escaping


patrick...@gmail.com

Aug 15, 2017, 8:38:32 AM
to golang-nuts
Hi all!

I simply want the XML files my program outputs to not have newline characters escaped into &#xA;. My input XML files contain newlines and I want to preserve them.

This snippet shows my issue / inconsistency: https://play.golang.org/p/etosjWbkLn

I've researched and found that encoding/xml's EncodeToken on a CharData Token treats the newline properly. But Encoder.Encode(), which is my standard use case of just writing to a file, escapes newlines.
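
A minimal sketch of the difference described above. The Content struct and the helper names encodeStruct and encodeTokens are made up for illustration and are not taken from the playground link.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// Content is a hypothetical struct used only for this illustration.
type Content struct {
	Elt string
}

// encodeStruct marshals the whole struct with Encoder.Encode,
// which escapes a newline in character data as &#xA;.
func encodeStruct(text string) (string, error) {
	var buf bytes.Buffer
	if err := xml.NewEncoder(&buf).Encode(Content{Elt: text}); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// encodeTokens writes a single element token by token with
// Encoder.EncodeToken, which leaves a newline in CharData as a literal LF.
func encodeTokens(text string) (string, error) {
	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)
	start := xml.StartElement{Name: xml.Name{Local: "Elt"}}
	tokens := []xml.Token{start, xml.CharData(text), start.End()}
	for _, tok := range tokens {
		if err := enc.EncodeToken(tok); err != nil {
			return "", err
		}
	}
	// EncodeToken buffers its output, so Flush is required.
	if err := enc.Flush(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	a, _ := encodeStruct("line 1\nline 2")
	b, _ := encodeTokens("line 1\nline 2")
	// %q makes the literal newline visible as \n.
	fmt.Printf("%q\n%q\n", a, b)
}
```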

I've dug through the documentation, google searches, and the encoding/xml source code and tests but can't find out how I'm supposed to preserve newlines for just encoding the data in my struct. This feels like a bug / inconsistency, but I'm also not super familiar with the "go" way of doing things.

Help is much appreciated, thanks!

Lutz Horn

Aug 15, 2017, 2:53:24 PM
to golan...@googlegroups.com
Hi,

On 15.08.17 at 08:31, patrick...@gmail.com wrote:
> I simply want the XML files my program outputs to not have newline
> characters escaped into &#xA;. My input XML files contain newlines and I
> want to preserve them.

The &#xA; is the XML character reference for LF. This is correct XML which
contains exactly what you put into it:

<Content><Elt>line 1&#xA;line 2</Elt></Content>

Any XML parser will be able to handle the LF. For example, xmllint does:

> $ xmllint -format content.xml
> <?xml version="1.0"?>
> <Content>
> <Elt>line 1
> line 2</Elt>
> </Content>

So there is no bug that needs to be fixed.
xml.NewEncoder(os.Stdout).Encode() works as expected and produces output
that is valid.
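
The same check can be sketched in Go itself: decoding the escaped and the literal form should give the same string. The Content struct and the decodeElt helper are hypothetical names used only for this illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Content is a hypothetical struct matching the example document.
type Content struct {
	Elt string
}

// decodeElt parses a document and returns the text of its Elt element.
func decodeElt(doc string) (string, error) {
	var c Content
	if err := xml.Unmarshal([]byte(doc), &c); err != nil {
		return "", err
	}
	return c.Elt, nil
}

func main() {
	escaped, _ := decodeElt("<Content><Elt>line 1&#xA;line 2</Elt></Content>")
	literal, _ := decodeElt("<Content><Elt>line 1\nline 2</Elt></Content>")
	// Both forms decode to the same character data.
	fmt.Println(escaped == literal)
}
```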

Lutz

patrick...@gmail.com

Aug 16, 2017, 4:06:58 AM
to golang-nuts, lutz...@posteo.de

That makes sense, and in fact my XML output with &#xA;s gets parsed later correctly. My personal use case has input XML files with literal LF characters, and it's frustrating that when my Go program modifies that file, they're all converted, which makes my git commit including that XML file messy. After your explanation though I've decided that's just me being petty.

I've also found a solution though! Implement a writer with the same methods as io.PipeWriter (xml.NewEncoder only needs the Write method of io.Writer) such that all instances of &#xA; are replaced with LF.

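// Needs imports: bytes, encoding/xml, os.

// MyWriter wraps a file and turns the escaped &#xA; back into a literal LF.
type MyWriter struct {
    File *os.File
}

func (w MyWriter) Close() error { return w.File.Close() }

// CloseWithError mirrors io.PipeWriter; the error is ignored here.
func (w MyWriter) CloseWithError(err error) error { return nil }

// Write returns len(data) on success, since io.Writer must report how many
// input bytes it consumed even though the replaced output is shorter.
// Caveat: a "&#xA;" split across two Write calls is not replaced.
func (w MyWriter) Write(data []byte) (n int, err error) {
    out := bytes.Replace(data, []byte("&#xA;"), []byte("\n"), -1)
    if _, err = w.File.Write(out); err != nil {
        return 0, err
    }
    return len(data), nil
}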

then later...
f, _ := os.Create(...) // error handling omitted for brevity
w := MyWriter{File: f}
defer w.Close()
encoder := xml.NewEncoder(w)
_ = encoder.Encode(...)

Thanks for the response!

Patrick