Yet they do not have, or are not, what I need to learn about it. They
are probably enough for most of you, but not for me. Any help will be
appreciated.
Thanks!
Fernando
I don't know of a tutorial for htmlparse either, but the package is
basically quite easy to use if you understand the basic options:
1. SAX like parsing
You can define callbacks that are called whenever an opening or closing
tag is encountered and act accordingly. This works well if you either
have large pages and need to keep memory use low, or have very simple
pages, like huge lists or tables without much structure, where you just
need to skip some intro and then parse the rows or items. It's also the
way to go for parsing things like HTML-based chatrooms.
Usually you use something like a state machine, driven by the callbacks,
to handle the parsing.
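A rough sketch of the state machine idea, assuming the callback
receives the four arguments tag, slash, param and text (the text that
follows the tag). The sample HTML, the state names and the proc name
onTag are made up for illustration:

```tcl
package require htmlparse

# Hypothetical input: some intro to skip, then a simple table.
set html {<p>Intro to skip</p><table><tr><td>one</td><td>two</td></tr><tr><td>three</td><td>four</td></tr></table>}

set state outside   ;# one of: outside, row, cell
set rows  {}        ;# finished rows
set row   {}        ;# cells of the row currently being read

proc onTag {tag slash param text} {
    global state rows row
    set tag [string tolower $tag]
    # Dispatch on the current state plus the tag just seen;
    # every other combination is silently ignored.
    switch -- "$state,$slash$tag" {
        outside,tr { set state row;  set row {} }
        row,td     { set state cell; lappend row [string trim $text] }
        cell,/td   { set state row }
        row,/tr    { set state outside; lappend rows $row }
    }
}

htmlparse::parse -cmd onTag $html
puts $rows
```

Only the current row is kept while parsing, which is why this style
stays cheap on memory even for very long tables.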
2. DOM like parsing
This transforms the document into a Tcllib struct::tree object, which
you can then dissect as you like with the usual struct::tree methods or
the treeql module.
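A minimal sketch of the tree approach, assuming struct::tree 2.x (where
attributes are read with "get node key") and assuming that
htmlparse::2tree stores the tag name under the key "type" and the
attributes or text under "data", with text nodes typed PCDATA. Check
the htmlparse manpage for your Tcllib version; the sample HTML and the
variable names are made up:

```tcl
package require struct::tree
package require htmlparse

set html {<html><body><h1>Heading</h1><p>Hello, <b>world</b></p></body></html>}

::struct::tree t
htmlparse::2tree $html t

set seen {}
# Visit every node, parents before children.
t walk root -order pre node {
    # Guard against nodes without a type key, just in case.
    if {[t keyexists $node type]} {
        set type [t get $node type]
        lappend seen $type
        if {$type eq "PCDATA"} {
            puts "text: [t get $node data]"
        } else {
            puts "tag:  $type"
        }
    }
}

t destroy
```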
Depending on your needs and the HTML involved, you can also use other
Tcl-based HTML parsers: tdom (www.tdom.org) with the -html option;
tclwebtest (tclwebtest.sourceforge.net), which has more support for
interacting with forms but a less robust parser and is slow; or tkhtml3
(tkhtml.tcl.tk), an HTML display widget that provides access to its
parse tree.
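For comparison, a short sketch of the tdom route, assuming the tdom
package is installed. The sample HTML and the variable names are only
for illustration:

```tcl
package require tdom

set html {<html><body><p>Hello, <b>world</b></p><img src="example.gif"></body></html>}

# -html switches tdom to its forgiving HTML parser.
set doc  [dom parse -html $html]
set root [$doc documentElement]

# Collect the text of every paragraph via XPath.
set paragraphs {}
foreach p [$root selectNodes {//p}] {
    lappend paragraphs [$p asText]
}

# Collect the src attribute of every image.
set images {}
foreach img [$root selectNodes {//img}] {
    lappend images [$img getAttribute src]
}

puts $paragraphs
puts $images
$doc delete
```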
Michael
One way htmlparse works is that it scans the HTML and calls a procedure
for every tag that it finds. You can then customize that procedure to do
whatever you want with the data.
Here's a simple example that just prints out the arguments the callback
receives on each call: the tag name, a slash for closing tags, the raw
attribute string, and the text that follows the tag.
set html {
<html>
<head><title>Sample HTML</title></head>
<body>
<H1>Heading 1</H1>
<p>Hello, world</p>
<img src="example.gif"></img>
<p><b>bold</b>, <i>italics</i>, <u>underline</u></p>
</body>
</html>
}
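# Callback: htmlparse calls this once per tag with four arguments.
proc parseCommand {args} {
    # Unpack the argument list into named variables.
    foreach {tag slash param text} $args {break}
    puts "=> tag='$tag' slash='$slash' param='$param' text='$text'"
}
package require htmlparse
htmlparse::parse -cmd parseCommand $html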
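The param argument arrives as one raw string such as
src="example.gif", so you usually want to split it up yourself. Here is
a rough sketch using plain regexp. The proc name attrsToDict is made
up, and it deliberately ignores attributes without a value as well as
empty values:

```tcl
# Turn a raw attribute string into a dict keyed by lowercase name.
proc attrsToDict {param} {
    set result [dict create]
    # name = "double quoted" | 'single quoted' | bareword
    set re {([A-Za-z_:][-A-Za-z0-9_:.]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))}
    foreach {all name dq sq bare} [regexp -all -inline -- $re $param] {
        # Only one of the three value groups is non-empty per match.
        dict set result [string tolower $name] "$dq$sq$bare"
    }
    return $result
}

set attrs [attrsToDict {src="example.gif" ALT='logo' width=10}]
puts [dict get $attrs src]
```

Inside a callback you would call it as attrsToDict $param and then pick
out whatever attributes you need with dict get.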
Ok, thanks! I will look into it!
Fernando