I am trying to write an XML parser in Go for JMdict_e, which can be downloaded here:
http://ftp.monash.edu.au/pub/nihongo/JMdict_e.gz (5.7 megabytes)
I already have an Expat-based parser for this file in C. As an
experiment, I tried to make a Go version of it.
When I run the program below, I get this error message:
error occurred XML syntax error on line 381: invalid character entity &n;
The entity is defined in the file itself, in the DTD inside its DOCTYPE declaration.
Here is the offending input:
----------
<entry>
<ent_seq>1000000</ent_seq>
<r_ele>
<reb>ヽ</reb>
</r_ele>
<r_ele>
<reb>くりかえし</reb>
</r_ele>
<sense>
<pos>&n;</pos> -------------------- line 381
<gloss>repetition mark in katakana</gloss>
</sense>
</entry>
----------
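As far as I can tell, Go's encoding/xml package does not read <!ENTITY> declarations from a DTD. In its default strict mode it recognizes only the five predefined XML entities (&lt; &gt; &amp; &apos; &quot;) plus whatever is supplied in the Decoder.Entity map, so the declaration in the file does not help. The sketch below reproduces this with a small inline document modeled on the entry above. The replacement text for &n; is an assumption standing in for whatever the real DTD declares, and the type names (JMdict, Entry, Sense) and the decode helper are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// sample is a trimmed-down JMdict-style document. Its DTD declares &n;,
// but encoding/xml ignores <!ENTITY> declarations inside the DOCTYPE.
// The replacement text below is an assumed placeholder.
const sample = `<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE JMdict [
<!ENTITY n "noun (common) (futsuumeishi)">
]>
<JMdict>
<entry>
<ent_seq>1000000</ent_seq>
<sense>
<pos>&n;</pos>
<gloss>repetition mark in katakana</gloss>
</sense>
</entry>
</JMdict>`

// Sense holds the part-of-speech and gloss text of one <sense> element.
type Sense struct {
	Pos   []string `xml:"pos"`
	Gloss []string `xml:"gloss"`
}

// Entry holds the sequence number and senses of one <entry> element.
type Entry struct {
	EntSeq string  `xml:"ent_seq"`
	Senses []Sense `xml:"sense"`
}

// JMdict is the root element (hypothetical type for this sketch).
type JMdict struct {
	Entries []Entry `xml:"entry"`
}

// decode parses doc. entities, if non-nil, maps entity names (without
// the surrounding & and ;) to their replacement text.
func decode(doc string, entities map[string]string) (JMdict, error) {
	var d JMdict
	dec := xml.NewDecoder(strings.NewReader(doc))
	dec.Entity = entities
	err := dec.Decode(&d)
	return d, err
}

func main() {
	// Without an entity map the decoder rejects &n;.
	if _, err := decode(sample, nil); err != nil {
		fmt.Println("without Entity map:", err)
	}

	// With &n; supplied by hand, the entry decodes and &n; is expanded.
	d, err := decode(sample, map[string]string{"n": "noun (common) (futsuumeishi)"})
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println(d.Entries[0].EntSeq, d.Entries[0].Senses[0].Pos[0])
}
```

Setting dec.Strict = false instead makes the decoder pass unknown entities through as literal text (the field would then contain "&n;"). That avoids the error but expands nothing.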
Here is the program:
-----------
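package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"os"
)

// Entry holds the fields of one <entry> element.
// Fields must be exported for encoding/xml to fill them.
type Entry struct {
	EntSeq string `xml:"ent_seq"`
}

func main() {
	jmdictFile := "/share/projects/j2e/dict/JMdict_e"
	src, err := os.Open(jmdictFile)
	if err != nil {
		fmt.Printf("error occurred %s\n", err)
		return
	}
	// Close only after a successful Open.
	defer src.Close()

	// Stream through the file and decode one <entry> at a time.
	dec := xml.NewDecoder(src)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			fmt.Printf("error occurred %s\n", err)
			break
		}
		if se, ok := tok.(xml.StartElement); ok && se.Name.Local == "entry" {
			var entry Entry
			if err := dec.DecodeElement(&entry, &se); err != nil {
				fmt.Printf("error occurred %s\n", err)
				break
			}
			fmt.Printf("%s\n", entry.EntSeq)
		}
	}
}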
------------
Any suggestions about how to go about this?