go.net/html parsing html tags and getting text


t0

Jan 8, 2014, 4:00:27 PM
to golan...@googlegroups.com
I can't seem to figure out how to get the text between tags. I understand how to get values from attributes, but what about the text inside an element that has no attributes?


 https://godoc.org/code.google.com/p/go.net/html


s := `<p>Links:</p><ul><li><a href="foo">Foo</a><li><a href="/bar/baz">BarBaz</a></ul><span>TEXT I WANT</span>`
doc, err := html.Parse(strings.NewReader(s))
if err != nil {
    log.Fatal(err)
}
var f func(*html.Node)
f = func(n *html.Node) {
    if n.Type == html.ElementNode && n.Data == "span" {
        // ????? (how do I get the text inside the span here?)
    }
    // Recurse into every child node.
    for c := n.FirstChild; c != nil; c = c.NextSibling {
        f(c)
    }
}
f(doc)

Nigel Tao

Jan 8, 2014, 6:32:42 PM
to t0, golang-nuts
On Thu, Jan 9, 2014 at 8:00 AM, t0 <cod...@gmail.com> wrote:
> I can't seem to figure out how to get the text data between tags.

Text is the Data of TextNodes, not a property of ElementNodes. An HTML
element can contain more than one text node. Note that I put a <b>
element in your <span> element in the code below.

s := `<p>Links:</p><ul><li><a href="foo">Foo</a><li><a href="/bar/baz">BarBaz</a></ul><span>TEXT <b>I</b> WANT</span>`
doc, err := html.Parse(strings.NewReader(s))
if err != nil {
    log.Fatal(err)
}
var f func(*html.Node, bool)
f = func(n *html.Node, printText bool) {
    // Print text nodes only once we are somewhere inside a <span>.
    if printText && n.Type == html.TextNode {
        fmt.Printf("%q\n", n.Data)
    }
    // The flag stays set for all descendants of a <span>.
    printText = printText || (n.Type == html.ElementNode && n.Data == "span")
    for c := n.FirstChild; c != nil; c = c.NextSibling {
        f(c, printText)
    }
}
f(doc, false)

t0

Jan 9, 2014, 1:51:27 AM
to golan...@googlegroups.com, t0
OK, now it makes sense, thanks. I was thinking it treated child nodes as plain text until the next one was reached.
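
To make the point about child nodes concrete, here is a hand-built sketch of the subtree for <span>TEXT <b>I</b> WANT</span>. The Node and NodeType types are hypothetical, trimmed-down stand-ins for html.Node and html.NodeType (same field names, far fewer fields), so that the sketch runs with only the standard library. dump and spanTree are made-up names as well.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeType and Node are hypothetical stand-ins for html.NodeType and
// html.Node; only the fields used by the walk in this thread are kept.
type NodeType int

const (
	TextNode NodeType = iota
	ElementNode
)

type Node struct {
	Type        NodeType
	Data        string
	FirstChild  *Node
	NextSibling *Node
}

// dump returns one line per node, indented two spaces per level of depth.
func dump(n *Node, depth int) string {
	kind := "Element"
	if n.Type == TextNode {
		kind = "Text"
	}
	out := fmt.Sprintf("%s%s %q\n", strings.Repeat("  ", depth), kind, n.Data)
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		out += dump(c, depth+1)
	}
	return out
}

// spanTree builds by hand the subtree for <span>TEXT <b>I</b> WANT</span>:
// the span has three children, and the <b> element has its own text child.
func spanTree() *Node {
	want := &Node{Type: TextNode, Data: " WANT"}
	b := &Node{
		Type:        ElementNode,
		Data:        "b",
		FirstChild:  &Node{Type: TextNode, Data: "I"},
		NextSibling: want,
	}
	text := &Node{Type: TextNode, Data: "TEXT ", NextSibling: b}
	return &Node{Type: ElementNode, Data: "span", FirstChild: text}
}

func main() {
	fmt.Print(dump(spanTree(), 0))
}
```

The element itself carries only its tag name in Data. The text lives in separate TextNode children, and a nested element such as <b> sits between them as a sibling with a text child of its own.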

cuban...@gmail.com

Aug 11, 2016, 11:16:01 PM
to golang-nuts, cod...@gmail.com
Awesome. Thanks. This is the only thread I have seen that answers this kind of question.

