
Parse an HTML file as an XML file


Stan SR

Jan 19, 2008, 3:35:20 AM
Hi,

I need to read an HTML file and parse it as an XML file.

All my HTML files have this structure:
<html>
<head>
<title>
</title>
<script language="javascript">
</script>
</head>
<body>
</body>
</html>

My code has to read some sections (title, script, body).
Everything works when the script section (the JavaScript code) is empty or
contains only a little code, but sometimes parsing fails when the code
contains characters like ; (especially in a "for" statement).
To make it work, I had to "decorate" the script section with
<![CDATA[ ]]>, so that it looks like this:

<script language="javascript">
<![CDATA[

]]>
</script>

Is there a way to parse the file without using the <![CDATA[ ]]> section?

Stan
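
A possible explanation, offered as an assumption rather than a diagnosis of the poster's files: a semicolon is legal in XML text, but `<` and `&` are not, and a "for" condition such as `i < 3` usually contains one of them. The Python sketch below illustrates this with the standard library's xml.etree.ElementTree. It is not the poster's .NET code, and the sample page and the `try_parse` helper are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page following the structure in the post, with a typical "for" loop.
RAW = """<html><head><title>Demo</title>
<script language="javascript">
for (var i = 0; i < 3; i++) { total += i; }
</script></head><body></body></html>"""

# Same page, but the loop condition avoids "<", so only semicolons remain.
SEMICOLONS_ONLY = RAW.replace("i < 3", "i != 3")

# Same page with the script body wrapped in a CDATA section, as in the post.
WRAPPED = (
    RAW.replace('<script language="javascript">',
                '<script language="javascript"><![CDATA[')
       .replace("</script>", "]]></script>")
)


def try_parse(text):
    """Return the root element, or None if the text is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


raw_result = try_parse(RAW)                    # fails: bare "<" in text
semicolon_result = try_parse(SEMICOLONS_ONLY)  # parses: ";" is harmless
wrapped_result = try_parse(WRAPPED)            # parses: CDATA shields "<"
script_text = wrapped_result.find("head/script").text
```

If this assumption holds, the alternative to CDATA is to escape the offending characters in the file itself (`&lt;` and `&amp;`), which an XML parser accepts but which changes how the script must be written.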


Cowboy (Gregory A. Beamer)

Jan 19, 2008, 11:56:57 AM
Try <!-- and -->, which is standard practice. I imagine some parsers will
still puke on this approach, but it should solve the major issue.

Can you solve this without doing anything? Probably not. That is the nature
of free-form sections: XML parsers do not treat them the way HTML parsers
do, because XML's rules are stricter.
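
To make the trade-off of the comment approach concrete, here is a small Python sketch using the standard library's xml.etree.ElementTree (an illustration only, not .NET code; the template and the `try_parse` helper are hypothetical). It shows two things that are true of XML in general: a comment hides `<` from the parser, but the script text is then comment content rather than element text, and a `--` inside the comment (for example `i--`) is itself not well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with the script body wrapped in an XML comment.
TEMPLATE = """<html><head><title>Demo</title>
<script language="javascript">
<!--
{code}
-->
</script></head><body></body></html>"""


def try_parse(text):
    """Return the root element, or None if the text is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


# "<" inside a comment is fine, so this page parses.
ok_tree = try_parse(
    TEMPLATE.format(code="for (var i = 0; i < 3; i++) { total += i; }")
)

# "--" is not allowed inside an XML comment, so a decrement breaks the page.
bad_tree = try_parse(
    TEMPLATE.format(code="for (var i = 3; i > 0; i--) { total += i; }")
)

# ElementTree drops comments by default, so the script code is not in .text.
script_text = ok_tree.find("head/script").text
```

So the comment wrapper can get a file past the parser, but code that needs to read the script section back out would have to work with comment nodes, and scripts containing `--` would still fail.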

--
Gregory A. Beamer
MVP, MCP: +I, SE, SD, DBA

*************************************************
| Think outside the box!
|
*************************************************
"Stan SR" <st...@pasdepam.netsunset.com> wrote in message
news:eQnv7YnW...@TK2MSFTNGP05.phx.gbl...

Peter Bromberg [C# MVP]

Jan 19, 2008, 12:56:00 PM
You could try Simon Mourier's "HtmlAgilityPack", which can be found on
codeplex.com.
Its HtmlDocument class parses the page's HTML into an XPath-compatible
document object that works "just like" XmlDocument.
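
The general idea, using an HTML-aware parser instead of an XML one, can be sketched in Python with the standard library's html.parser module. This is an analogy only: it is not HtmlAgilityPack and does not show its API, and the `SectionCollector` class and sample page are hypothetical. An HTML parser treats the contents of a script element as raw text, so no CDATA or comment wrapper is needed.

```python
from html.parser import HTMLParser


class SectionCollector(HTMLParser):
    """Collect the text found inside <title>, <script> and <body>."""

    WANTED = ("title", "script", "body")

    def __init__(self):
        super().__init__()
        self.sections = {name: "" for name in self.WANTED}
        self._open = []  # wanted tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            self._open.remove(tag)

    def handle_data(self, data):
        # Text belongs to every wanted section that is open right now.
        for tag in self._open:
            self.sections[tag] += data


# Hypothetical page: the script is not wrapped in CDATA or a comment.
PAGE = """<html><head><title>Demo</title>
<script language="javascript">
for (var i = 0; i < 3; i++) { total += i; }
</script></head><body><p>Hello</p></body></html>"""

collector = SectionCollector()
collector.feed(PAGE)
collector.close()
sections = {name: text.strip() for name, text in collector.sections.items()}
```

Here `sections` maps each of the three section names to its text, with the script code intact even though it contains `<`.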
-- Peter
Site: http://www.eggheadcafe.com
UnBlog: http://petesbloggerama.blogspot.com
MetaFinder: http://www.blogmetafinder.com