What's wrong with my regex and how is it done?
> I thought this should work:
Don't use regular expressions to parse HTML, use the DOM[1].
Also, you didn't use delimiters in your regular expression. See
below.
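
As a rough sketch of the DOM approach, assuming the goal is simply to pull the text out of certain elements (the sample markup and the choice of <p> are made up for illustration, not taken from your code):

```php
<?php
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);

// Hypothetical sample input; note the newlines inside the first <p>.
$html = "<div><p>\nfoo bar\n</p><p>baz</p></div>";

$dom = new DOMDocument();
$dom->loadHTML($html);

// Gather the trimmed text content of every <p> element.
$texts = [];
foreach ($dom->getElementsByTagName('p') as $p) {
    $texts[] = trim($p->textContent);
}

print_r($texts);
```

Newlines inside an element need no special handling here, since the parser deals with the markup rather than a pattern.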
> '<([a-zA-Z][a-zA-Z0-9]*)[^>]*>
This won't even match all valid (X)HTML tag names, as far as I can
tell (e.g., namespaces).
> (.*)
The dot metacharacter doesn't match newlines by default, so
<p>
foo bar
</p>
wouldn't match, even if your pattern parsed. Read the PCRE manual
in PHP's documentation; it lists and explains the available
modifiers for handling situations such as this (the s modifier,
PCRE_DOTALL, is the relevant one here). Again, though, you should
use the DOM to parse HTML, not regular expressions.
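
To illustrate the newline behaviour only (not as a recommendation to parse HTML this way), here is a small sketch using a simplified pattern of my own with "#" as the delimiter:

```php
<?php
$subject = "<p>\nfoo bar\n</p>";

// Without the s modifier the dot does not match "\n", so this fails.
$without = preg_match('#<p>(.*)</p>#', $subject, $m1);

// With the s (PCRE_DOTALL) modifier the dot matches newlines too.
$with = preg_match('#<p>(.*)</p>#s', $subject, $m2);

var_dump($without, $with);
```

The first call should find nothing and the second should capture the text between the tags, newlines included.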
> [^(</\1>)]'
You can't use backreferences within character classes, so \1 and
the parentheses have no special meaning there; the class just
matches a single character that isn't one of those listed. But I
can see no reason to use a negated character class here in the
first place.
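
A quick way to see this for yourself is to anchor the class on its own and feed it single characters. This is only a sketch; the test characters are arbitrary:

```php
<?php
// Inside [...] the characters "(", "<", "/", ">" and ")" are plain
// literals, and \1 is not treated as a backreference.
$class = '#^[^(</\1>)]$#';

$results = [];
foreach (['a', '<', '/', '(', ')'] as $ch) {
    // 1 if the single character is accepted by the class, 0 otherwise.
    $results[$ch] = preg_match($class, $ch);
}

print_r($results);
```

Only 'a' should be accepted; each of the other characters is in the excluded set.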
> but I just get an error: Warning: preg_match_all()
> [function.preg-match-all]: Unknown modifier ']' ....
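PHP's PCRE functions require the pattern to be enclosed in
delimiters[2]. Since yours has none, the leading "<" was taken as
the opening delimiter and the first ">" as the closing one, so
whatever followed was read as modifiers. That is where "Unknown
modifier ']'" comes from.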
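
Purely to show the delimiter syntax, here is a sketch of a delimited pattern along the lines of what you seem to be after. The "#" delimiter, the non-greedy (.*?) and the sample string are my own choices, and it will still break on nested or malformed markup:

```php
<?php
$subject = '<b>bold</b> and <i>italic</i>';

// "#" delimits the pattern, so "<" and ">" stay part of the regex itself.
// \1 is used outside any character class, where it is a real backreference.
$pattern = '#<([a-zA-Z][a-zA-Z0-9]*)[^>]*>(.*?)</\1>#s';

$count = preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] is the tag name, $m[2] the text between the tags.
    printf("%s: %s\n", $m[1], $m[2]);
}
```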
____
[1] = <http://php.net/manual/en/domdocument.loadhtml.php>
[2] = <http://php.net/manual/en/regexp.reference.delimiters.php>
--
Curtis Dyer
<?$x='<?$x=%c%s%c;printf($x,39,$x,39);?>';printf($x,39,$x,39);?>
Use the Tidy extension
Rgds
Denis McMahon
You can use the DOM functions to *fix* invalid HTML but... can you
actually use it to *report* what's wrong?
(Of course, the OP probably doesn't really want to do the latter.)
--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://borrame.com
-- My satin-finish humor site: http://www.demogracia.com
--
Thank you for your advice, but I can't use the Tidy extension.
I managed to solve the problem by using DOMDocument.
Triar.
Thanks for your advice. I managed to solve my problem by using the
DOMDocument class.
Triar.
> On 19/06/2010 6:51, Curtis Dyer wrote:
>> Triar<tr...@spam.la> wrote:
>>
>>> I thought this should work:
>>
>> Don't use regular expressions to parse HTML, use the DOM[1].
>
> You can use the DOM functions to *fix* invalid HTML but... can
> you actually use it to *report* what's wrong?
Yes, with the help of libxml.
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$html = '<p>Learn 2 <em>close tags</p>';
$dom = new DOMDocument();
$msg = "HTML parse error: Line %d: %s\n";

if ($dom->loadHTML($html)) {
    // loadHTML() still succeeds on recoverable markup errors, so ask libxml.
    if ($err = libxml_get_last_error()) {
        printf($msg, $err->line, trim($err->message));
    } else {
        echo "No parse errors.\n";
    }
} else {
    echo "Unable to parse HTML.\n";
}
You can also get an array of LibXMLError objects with
libxml_get_errors(), so this example's a bit contrived.
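
For completeness, a sketch of the libxml_get_errors() variant using the same sample markup. Which messages you get, and how many, depends on the libxml version installed:

```php
<?php
libxml_use_internal_errors(true);

$html = '<p>Learn 2 <em>close tags</p>';
$dom  = new DOMDocument();
$dom->loadHTML($html);

// Every error libxml recorded while parsing, as LibXMLError objects.
$errors = libxml_get_errors();

foreach ($errors as $err) {
    printf("Line %d, column %d: %s\n",
           $err->line, $err->column, trim($err->message));
}

// Empty the internal error buffer so later parses start clean.
libxml_clear_errors();
```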
Nice! Thanks for the code snippet.
--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://borrame.com
For starters, you need a delimiter...