Re: Can I set QueryPath not to modify (optimise?) HTML?

TechnoSophos

Mar 27, 2013, 4:52:27 PM
to support-...@googlegroups.com
What are you using to generate the output? writeHTML() and html() should create valid HTML 4.x. writeXML() and xml() will output XHTML. PHP still lags behind on HTML 5, so outputting HTML 5 currently requires post-processing.

I've looked at a few HTML5 libraries for PHP, but haven't found one that uses DOM. I'd love to get HTML 5 support into QP, though. So if you have any recommendations….

Matt

-- 
TechnoSophos
Twitter: @technosophos
Sent with Sparrow

On Wednesday, March 27, 2013 at 1:28 PM, Vic Ktor wrote:

Hi Matt,

I've been trying out libraries for manipulating HTML for a few days, and I think QueryPath is the best! Great job!

I'm using it to parse a fragment of HTML code. The problem is that it makes some optimisations to the code that are not valid in HTML.

For example an empty div like this:

    <div id="some-id"></div>

after working with QueryPath, it becomes:

    <div id="some-id" />

This may be valid in XHTML, but it is certainly not valid in HTML. I just want to leave it unchanged. Is that possible? I tried both htmlqp() and qp(), but the result is the same.

Thank you!

--
You received this message because you are subscribed to the Google Groups "support-querypath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to support-queryp...@googlegroups.com.
To post to this group, send email to support-...@googlegroups.com.
Visit this group at http://groups.google.com/group/support-querypath?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Vic Ktor

Mar 27, 2013, 6:50:38 PM
to support-...@googlegroups.com
    echo qp('<div id="a"><div id="b"></div></div>', '#b')->html();

will output this:

    <div id="b"/>

and I believe this is not valid HTML 4.x.

I've also tried Ganon. It would be my second choice. It is also fast, but it doesn't have as many functions as QueryPath. They say it supports HTML5.

TechnoSophos

Mar 27, 2013, 6:55:46 PM
to support-...@googlegroups.com
Out of curiosity, what does it output if you use writeHTML()? It will probably try to add some boilerplate, but does it also collapse the DIV?

Adam Docherty

Mar 27, 2013, 9:11:55 PM
to support-...@googlegroups.com
From what I have seen, it always does this with empty elements. I'm not completely sure it is "always", but I have had to add a space inside elements to make them work.

Vic Ktor

Mar 28, 2013, 4:17:08 AM
to support-...@googlegroups.com
With writeHTML(), it outputs the whole document, including the DOCTYPE:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body><div id="a"><div id="b"></div></div></body></html>

This is correct, but unfortunately I can't use it like this because I need to get just the selected element in a variable.

I'm building an online HTML editor that uses jQuery on the client side, and I need to do some parsing on the server side too.
Adding a space, like webvida said, is not the best solution in my case because it can affect the layout. Any other suggestions?
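
One possible workaround, sketched here with PHP's built-in DOM extension rather than QueryPath itself: serialize only the selected node with DOMDocument::saveHTML($node) (PHP 5.3.6 or later), or with saveXML() plus the LIBXML_NOEMPTYTAG flag. The helper name outerHtml() is hypothetical, and the XPath lookup only stands in for the QueryPath selector. Getting the underlying DOMNode out of a QueryPath object (for example with get(0)) is an assumption that this thread does not confirm.

```php
<?php
// Hypothetical helper, not part of QueryPath: serialize a single DOM node
// with the HTML serializer, which writes explicit end tags for elements
// such as <div> instead of collapsing them.
function outerHtml(DOMNode $node): string
{
    // Passing a node to saveHTML() requires PHP 5.3.6 or later.
    return $node->ownerDocument->saveHTML($node);
}

$doc = new DOMDocument();
$doc->loadHTML('<div id="a"><div id="b"></div></div>');

// Stand-in for the QueryPath selector '#b'.
$xpath = new DOMXPath($doc);
$node  = $xpath->query('//*[@id="b"]')->item(0);

// HTML serialization of just the selected element.
$asHtml = outerHtml($node);

// XML serialization collapses the empty element (the behaviour reported above).
$asXml = $doc->saveXML($node);

// XML serialization with LIBXML_NOEMPTYTAG expands empty elements again.
$asXmlExpanded = $doc->saveXML($node, LIBXML_NOEMPTYTAG);

echo $asHtml, "\n", $asXml, "\n", $asXmlExpanded, "\n";
```

Note that LIBXML_NOEMPTYTAG expands every empty element, including void ones such as br, so the saveHTML() route is probably the safer choice for HTML fragments.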

TechnoSophos

Apr 2, 2013, 5:06:47 PM
to support-...@googlegroups.com
I'm actively looking for an HTML5 parser for PHP. This is looking most promising so far:


If I'm reading the documentation correctly, it can already create a DOMDocument and then serialize it. So in theory you should be able to parse a document, then pass that document into QueryPath.

You can then serialize it again with the HTML5Lib.

So in a nutshell, you'd be using QueryPath to query, but that library to parse and serialize.

I've created an issue to track this:


If you test this out, please let me know what you think. I'd love to get HTML5 support into QueryPath.


Vic Ktor

Apr 3, 2013, 3:46:35 AM
to support-...@googlegroups.com
Thank you, I really appreciate your help. The PHP version of html5lib looks unmaintained. It was last updated in 2009.

TechnoSophos

Apr 3, 2013, 11:39:57 AM
to support-...@googlegroups.com
I'm really not big on forking projects needlessly, but in this case… I did:


I've already split the project into a namespace, added a composer.json, and done just a little bit of testing. I'm also trying to enlist a little development help. Here are some of the things I am going to work on (roughly by priority):

- Add an HTML5 output writer
- Rewrite the unit tests in PHPUnit or Atoum, with better coverage
- Make some performance improvements to the parser
- Optimize the code for PHP's opcode cache

And, of course, I'll be fixing bugs as I go.

My personal goal is to make this a good general purpose DOM-compatible parser that works with QueryPath.
