The issue with empty elements


Billy Dale

Jun 5, 2013, 10:45:27 PM
to support-...@googlegroups.com
Hi there, it seems qp() and htmlqp() do not play nicely with empty tags. I currently use this workaround so that QueryPath parses them properly:

$response = str_replace('><', '> <', $response);

Otherwise, tags like <div></div> get turned into <div/>, which in a perfect world should be OK, but evidently is not in many cases :(

I don't like this kind of greedy find-and-replace, though, as it can bring other problems with it. For now the above seems to be working, but I am wondering if there are plans for a fix.

Thanks, QP is great!

TechnoSophos

Jun 5, 2013, 11:43:36 PM
to support-...@googlegroups.com
Here are the painstaking details:

- In XML, collapsing is allowed on all empty elements.
- In HTML4, collapsing is not really allowed.
- In XHTML1, collapsing is supposed to follow the XML rules, but (as you noticed) actually doesn't.
- In HTML5, collapsing is allowed on only certain elements.
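
To make the XML rule concrete, here is a minimal sketch using PHP's built-in DOMDocument (which sits on the same libxml). It is an illustration only, not QueryPath's own code: the default XML serializer collapses every empty element, and the `LIBXML_NOEMPTYTAG` option expands every one of them, including elements like <br> that HTML wants left without an end tag.

```php
<?php
// Illustration only: plain DOMDocument, not QueryPath itself.
$doc = new DOMDocument();
$doc->loadXML('<root><div></div><br/></root>');

// Default XML serialization collapses every empty element.
$collapsed = $doc->saveXML($doc->documentElement);
echo $collapsed, "\n";
// <root><div/><br/></root>

// LIBXML_NOEMPTYTAG expands every empty element, <br> included.
$expanded = $doc->saveXML($doc->documentElement, LIBXML_NOEMPTYTAG);
echo $expanded, "\n";
// <root><div></div><br></br></root>
```

Neither all-or-nothing mode matches HTML5, where only certain elements may be written without an end tag, which is why a parser and serializer that know the per-element rules are needed.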

In theory, htmlqp() should work, but the libxml HTML parser is so buggy that this isn't always the case. (QueryPath's parser is actually the libxml one that comes with PHP.)

Today I checked in the branch `feature/html5`, which will allow you to choose a full HTML5 parser that should obey the collapsing rules. It probably won't be stable for a little while, though. The new parser is this one: https://github.com/Masterminds/html5-php

The workarounds I have seen are basically like yours (sometimes using <!-- comments --> instead of a space).
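
As a rough sketch of the comment variant, the replacement can be narrowed so that it only touches an opening tag immediately followed by its own closing tag, rather than every `><` in the document. The helper name `pad_empty_elements` is hypothetical (it is not part of QueryPath), and the regex assumes attribute values contain no `<` or `>`; treat it as a stopgap, not a real HTML parser.

```php
<?php
// Hypothetical helper, not part of QueryPath: pads <tag ...></tag> pairs
// with a filler so the serializer no longer sees them as empty.
function pad_empty_elements(string $html, string $filler = '<!-- -->'): string
{
    // Group 1: tag name. Group 2: optional attributes (assumed free of < and >).
    // \1 requires the closing tag to match the opening tag's name.
    $pattern = '~<([a-z][a-z0-9:-]*)(\s[^<>]*)?></\1\s*>~i';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($filler): string {
            $attrs = $m[2] ?? ''; // group 2 is absent when there are no attributes
            return '<' . $m[1] . $attrs . '>' . $filler . '</' . $m[1] . '>';
        },
        $html
    );
}

$padded = pad_empty_elements('<p>a</p><div></div><span class="x"></span>');
echo $padded, "\n";
// <p>a</p><div><!-- --></div><span class="x"><!-- --></span>

// Adjacent non-empty elements are left alone, unlike str_replace('><', '> <', ...).
$untouched = pad_empty_elements('<b>x</b><i>y</i>');
echo $untouched, "\n";
// <b>x</b><i>y</i>
```

Using a comment instead of a space as the filler avoids introducing visible whitespace between inline elements.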

So... I think this will get fixed for QueryPath 3.1. But right now it falls into the category of "known and *really* annoying bug".


--
http://technosophos.com
http://querypath.org