HTMLParse - Tutorials and Practical Examples

Fernando Quinones

Sep 14, 2006, 10:25:42 AM
Group,
Hello, I'm looking for a tutorial and some practical examples using
htmlparse. The module has been recommended to me a couple of times
already, but I can't find many examples out there. The only meaningful
pages I have found are:
http://wiki.tcl.tk/2204
http://tcllib.sourceforge.net/doc/htmlparse.html

Yet neither of them has what I need to learn it. They are probably
enough for most of you, but not for me. Any help will be appreciated.

Thanks!
Fernando

Michael Schlenker

Sep 14, 2006, 10:39:31 AM

I don't know of a tutorial for htmlparse either, but the package is
basically quite easy to use if you understand the basic options:

1. SAX-like parsing
You can define callbacks that are called whenever an opening tag or
closing tag is encountered and act accordingly. This works well if you
either have large pages and need to keep memory use low, or have very
simple pages, like huge lists or tables with not much structure, where
you just need to skip some intro and then parse the rows or items.
It's also the way to go for parsing things like HTML-based chatrooms.

Usually you keep a small state machine (or something similar) in the
callback to track where in the page you are.
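
The state machine idea above can be sketched roughly as follows. This is only a minimal illustration, not a recommended structure: the sample page, the proc name onTag, and the globals inList and items are made up for the example. The callback skips everything until it sees the list, then collects the text that follows each opening li tag.

```tcl
package require htmlparse

# Hypothetical input: some intro markup followed by a flat list.
set html {
<html><body>
<h1>Intro to skip</h1>
<ul>
<li>first</li>
<li>second</li>
<li>third</li>
</ul>
</body></html>
}

# Parser state: whether we are inside the <ul>, and the items seen so far.
set inList 0
set items {}

# Called once per tag with: tag name, "/" for a closing tag (else empty),
# the tag's parameter string, and the text that follows the tag.
proc onTag {tag slash param text} {
    global inList items
    set tag [string tolower $tag]
    if {$tag eq "ul"} {
        # An opening <ul> switches collection on, </ul> switches it off.
        set inList [expr {$slash eq ""}]
    } elseif {$inList && $tag eq "li" && $slash eq ""} {
        # The item text is whatever follows the opening <li>.
        lappend items [string trim $text]
    }
}

htmlparse::parse -cmd onTag $html
puts $items
```

Because the callback only keeps one flag and the collected items, memory use stays small no matter how long the list is.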

2. DOM-like parsing
This transforms the document into a Tcllib struct::tree object, which
you can then dissect as you like with the usual struct::tree methods or
the treeql module.
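
A rough sketch of the tree approach, assuming a Tcllib recent enough that htmlparse works with struct::tree version 2 (older releases use a different get syntax). The sample HTML and the variable names are invented for the example. As far as I know, htmlparse::2tree stores the tag name under the key "type" and the parameters (or the text, for PCDATA nodes) under the key "data".

```tcl
package require struct::tree
package require htmlparse

# Hypothetical sample page.
set html {<html><body><h1>Heading 1</h1><p>Hello, world</p></body></html>}

# 2tree fills an existing, empty tree; it does not create one itself.
set t [struct::tree]
htmlparse::2tree $html $t

# Visit every node below the root and collect the text nodes.
# Text is stored in nodes of type PCDATA, with the text under "data".
set texts {}
foreach node [$t children -all [$t rootname]] {
    if {[$t get $node type] eq "PCDATA"} {
        lappend texts [$t get $node data]
    }
}
puts $texts

$t destroy
```

Compared with the callback style, this holds the whole document in memory, but you can revisit any part of it as often as you like.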

Depending on your needs and the HTML involved, you can also use other
Tcl-based HTML parsers: tdom (www.tdom.org) with the -html option;
tclwebtest (tclwebtest.sourceforge.net), which has more support for
interacting with forms but a less robust parser and is slow; or tkhtml3
(tkhtml.tcl.tk), an HTML display widget that provides access to its
parse tree.
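
For comparison, here is a small sketch of the tdom route, assuming the tdom package is installed. The sample HTML and variable names are made up, and the lowercase XPath names rely on my understanding that tdom's HTML parser lowercases element names.

```tcl
package require tdom

# Hypothetical sample page.
set html {<html><body><p>Hello, world</p><img src="example.gif"></body></html>}

# -html selects tdom's forgiving HTML parser instead of the strict XML one.
set doc  [dom parse -html $html]
set root [$doc documentElement]

# XPath queries return lists of node commands.
set images {}
foreach img [$root selectNodes {//img}] {
    lappend images [$img getAttribute src]
}
set paragraphs {}
foreach p [$root selectNodes {//p}] {
    lappend paragraphs [$p asText]
}
puts "images: $images"
puts "paragraphs: $paragraphs"

# DOM trees are not garbage collected; free the document explicitly.
$doc delete
```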

Michael

Bryan Oakley

Sep 14, 2006, 10:54:54 AM

One way htmlparse works is that it scans the HTML and calls a procedure
for every tag that it finds. You can then customize that procedure to do
whatever you want with the data.

Here's a simple example that just prints out the arguments of each call
to the parse command:

set html {
<html>
<head><title>Sample HTML</title></head>
<body>
<H1>Heading 1</H1>
<p>Hello, world</p>
<img src="example.gif"></img>
<p><b>bold</b>, <i>italics</i>, <u>underline</u></p>
</body>
</html>
}

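# Callback invoked by htmlparse::parse once per tag. It receives four
# arguments: the tag name, "/" for a closing tag (empty otherwise),
# the tag's parameter string, and the text that follows the tag.
proc parseCommand {args} {
    foreach {tag slash param text} $args {break}
    puts "=> tag='$tag' slash='$slash' param='$param' text='$text'"
}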

package require htmlparse

htmlparse::parse -cmd parseCommand $html
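
Once you can see the arguments, the next step is usually to act on some of them. The following is only a sketch with made-up names (collectImages, sources) and a made-up snippet of HTML: it picks the src value out of the parameter string of each img tag. The regexp is an assumption that attribute values are double-quoted; it is not a full attribute parser.

```tcl
package require htmlparse

# Hypothetical input with tags in mixed case.
set html {<p><img src="a.gif"> and <IMG SRC="b.gif" alt="B"></p>}

set sources {}

proc collectImages {tag slash param text} {
    global sources
    # Compare case-insensitively so <img> and <IMG> are both matched.
    if {$slash eq "" && [string equal -nocase $tag img]} {
        # param is the raw attribute string, e.g. src="a.gif"
        if {[regexp -nocase {src\s*=\s*"([^"]*)"} $param -> src]} {
            lappend sources $src
        }
    }
}

htmlparse::parse -cmd collectImages $html
puts $sources
```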

Fernando Quinones

Sep 15, 2006, 12:00:30 PM

Ok, thanks! I will look into it!

Fernando
