I know the topic of parsing markup in PHP came up recently, and while
hacking on something else I found something that might be of interest
to some of you.
Parsing tag soup is a pain, so hkit uses various tricks (including
tidy) to try and make sure it's dealing with XML, specifically in the
loadUrl method.
The PHP function DOMDocument->loadHTMLFile() appears to do this pretty
well (from a small amount of testing so far).
http://uk.php.net/manual/en/function.dom-domdocument-loadhtmlfile.php
http://uk.php.net/manual/en/function.dom-domdocument-loadhtml.php
All I'm using (for admittedly a pretty simple case) is:
$url = "http://google.com";
$xml = new DOMDocument();
$xml->loadHTMLFile($url);
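
For anyone who wants to try this without hitting the network, here is a
rough sketch using loadHTML() on a string instead of loadHTMLFile() on a
URL. The markup in $soup is made up for illustration, and the
libxml_use_internal_errors() call is my assumption about how you'd want
to keep parser warnings quiet; none of this is taken from hkit.

```php
<?php
// Hypothetical tag soup: unclosed <li> and <p> elements, no <html> or <body>.
$soup = '<ul><li class="fn">Gareth<li>two</ul><p>unclosed paragraph';

// Collect libxml parse problems internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($soup);

// Keep whatever the parser complained about, then restore the old setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Once loaded, the document can be queried like any other DOM tree.
$xpath = new DOMXPath($doc);
$items = $xpath->query('//li');

// Serialise as XML, which is what an XML-only consumer would want to see.
$asXml = $doc->saveXML();
```

After this, $items should hold the list items the parser recovered, and
$asXml is a well-formed serialisation that could in principle be handed
to code expecting XML. Whether that is good enough to replace the tidy
step is exactly the bit I haven't tested yet.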
If I get a chance I'll try making the change to hkit and give it a
going over. If anyone knows a reason this won't work, please let me
know.
Thanks
Gareth
--
Gareth Rushgrove
garethrushgrove.com
morethanseven.net
getjobsin.com
isitbirthday.com