Good Folk,
To learn about Go, I thought I'd start with a topic I know something
about from elsewhere: XML parsing. With a copy of the XML Conformance
Test Suites (the 20080827 suite from http://www.w3.org/XML/Test/) and
golang.org in hand, I have produced the following program,
parsefiles.go. It reads a list of XML files and, for each one, compares
whether the file should parse successfully with whether it actually did.
I chose the Conformance Suite because it presents a lot of different
ways XML can be valid and invalid, and I wanted to use that to see how
xml.Parser behaves. I'm a little unclear as to whether I'm using
xml.Parser correctly for this exact purpose: RawToken() may not be the
right call for "parse this file to see if it violates any
constraints". The uses of the xml package I see elsewhere usually call
Unmarshal() to populate a data structure (src/cmd/godoc/codewalk.go,
for instance). The xml package's own tests are also of a different
flavour than what I've written here. So I'm wary of both what I've
written and its results.
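On the RawToken() question, here is a minimal sketch of the distinction, assuming the present-day encoding/xml package rather than the xml package used in this post (there, xml.NewParser has become xml.NewDecoder). That package documents RawToken as being like Token except that it does not verify that start and end elements match and does not translate namespace prefixes. The helper firstError and the sample document are made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// firstError reads every token from doc and returns the first error other
// than io.EOF, or nil if the whole input was consumed without complaint.
// When raw is true it reads with RawToken, otherwise with Token.
// (Hypothetical helper, not part of the original program.)
func firstError(doc string, raw bool) error {
	d := xml.NewDecoder(strings.NewReader(doc))
	for {
		var err error
		if raw {
			_, err = d.RawToken()
		} else {
			_, err = d.Token()
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	// <b> is never closed, so this document is not well-formed.
	const mismatched = "<a><b></a>"
	fmt.Println("RawToken:", firstError(mismatched, true))
	fmt.Println("Token:   ", firstError(mismatched, false))
}
```

If the same holds for the version at hand, a RawToken() loop would let mismatched tags through, which may account for some not-wf files parsing without complaint.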
Is there any interest in the full results (off list, say, since
there's quite a lot...)?
I'm pretty sure I've not gotten Go idioms yet. I solicit commentary
on the code itself.
cheers,
Nigel Kerr
nigel...@gmail.com
package main

// parsefiles.go, attempt to run files past xml.Parser, checking
// expectations as to whether the file should parse or not.

import (
	"bufio"
	"flag"
	"fmt"
	"os"
	"strings"
	"xml"
)

// parserPassesTest reports whether parsing fname should produce an error
// (any resultType other than "valid"), whether it did, and the error text.
func parserPassesTest(fname string, resultType string) (should bool, did bool, mesg string) {
	var errorstring string = ""
	var shoulderror bool = false
	if resultType != "valid" {
		shoulderror = true
	}
	var sawerror bool = false

	// A panic (including a failed open below) is reported as an error.
	defer func() {
		if r := recover(); r != nil {
			mesg = fmt.Sprintf("recovered from a panic: %v", r)
			should = shoulderror
			did = true
		}
	}()

	f, err := os.Open(fname, 0, 0)
	if err != nil {
		panic(err)
	}
	// Defer the Close only once the open is known to have succeeded.
	defer f.Close()

	p := xml.NewParser(f)
	for {
		pt, perr := p.RawToken()
		if perr != nil {
			// os.EOF is the normal end of input, not a parse error.
			if perr != os.EOF {
				errorstring = fmt.Sprintf("%v", perr)
				sawerror = true
			}
			break
		}
		if pt == nil {
			break
		}
	}
	return shoulderror, sawerror, errorstring
}

func main() {
	flag.Parse()
	if flag.NArg() != 1 {
		fmt.Fprintf(os.Stderr, "parsefile: need one file on command line.\n")
		os.Exit(1)
	}
	dat, derr := os.Open(flag.Arg(0), 0, 0)
	if derr != nil {
		panic(derr)
	}
	defer dat.Close()
	br := bufio.NewReader(dat)
	fmt.Fprintf(os.Stdout, "FILE\tSHOULD_ERROR\tDID_ERROR\tMESG\n")
	for {
		line, err := br.ReadString('\n')
		if err != nil {
			if err != os.EOF {
				fmt.Fprintf(os.Stderr, "reading %s: %s\n", flag.Arg(0), err)
			}
			break
		}
		pieces := strings.Split(strings.TrimSpace(line), "\t", -1)
		// Skip blank or malformed lines rather than indexing out of range.
		if len(pieces) < 2 {
			fmt.Fprintf(os.Stderr, "skipping malformed line: %q\n", line)
			continue
		}
		shouldError, didError, errMesg := parserPassesTest(pieces[0], pieces[1])
		// Only report files whose outcome differs from the expectation.
		if shouldError != didError {
			fmt.Fprintf(os.Stdout, "%s\t%v\t%v\t%s\n", pieces[0], shouldError, didError, errMesg)
		}
	}
}

// end of parsefiles.go
The program takes as its sole argument the name of a file that
contains lines like this:
eduni/errata-2e/E57.xml error
eduni/errata-2e/E60.xml valid
eduni/errata-2e/E61.xml not-wf
eduni/errata-3e/E05a.xml valid
eduni/errata-3e/E05b.xml valid
Each line is the relative path to a test file in the conformance suite,
a TAB, and the test-type value for that file (valid, invalid, error, or
not-wf). The test suite has 2,411 such test files by my count, so I
don't include my whole file here. Here are the results for the five
lines above: three behaved as expected, and the other two were expected
to fail but did not:
FILE SHOULD_ERROR DID_ERROR MESG
eduni/errata-2e/E57.xml true false
eduni/errata-2e/E61.xml true false
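
The list-file format described above (relative path, TAB, test type) can be read with a small helper like the one sketched here. The names testCase, parseLine and mustReject are hypothetical, and the code is written against current Go (strings.Split without the count argument). The mustReject rule rests on an assumption about the suite that differs from the program above: that only not-wf files are ones a non-validating parser has to reject, while invalid files are still well-formed and error files carry errors a processor may or may not report. The program above instead treats every type other than valid as should-error.

```go
package main

import (
	"fmt"
	"strings"
)

// testCase is a hypothetical record for one line of the list file.
type testCase struct {
	Path string // relative path to the test file
	Type string // valid, invalid, error, or not-wf
}

// parseLine splits "path<TAB>type" into a testCase and rejects lines
// that do not have exactly two non-empty fields.
func parseLine(line string) (testCase, error) {
	fields := strings.Split(strings.TrimSpace(line), "\t")
	if len(fields) != 2 || fields[0] == "" || fields[1] == "" {
		return testCase{}, fmt.Errorf("malformed line %q: want path<TAB>type", line)
	}
	return testCase{Path: fields[0], Type: fields[1]}, nil
}

// mustReject reports whether a non-validating parser is expected to fail.
// Assumption: only not-wf files fall in that category.
func (tc testCase) mustReject() bool {
	return tc.Type == "not-wf"
}

func main() {
	lines := []string{
		"eduni/errata-2e/E57.xml\terror\n",
		"eduni/errata-2e/E60.xml\tvalid\n",
		"eduni/errata-2e/E61.xml\tnot-wf\n",
		"no tab here\n",
	}
	for _, line := range lines {
		tc, err := parseLine(line)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Printf("%s\t%s\tmustReject=%v\n", tc.Path, tc.Type, tc.mustReject())
	}
}
```

Under that reading, E57.xml (type error) parsing cleanly would not by itself be a failure, whereas E61.xml (type not-wf) still would be.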