QP Priorities


Ryan Mahoney

Feb 24, 2012, 11:42:35 AM
to devel-querypath
Hi Group,

Exciting to see the direction that this package is taking!

That being said, I think the biggest obstacle that QP faces in terms
of long-term adoption is not features. Arguably, it already has more
than enough features (at least for my purposes) -- it's performance.
QP has the potential to support radically improved web dev
methodologies (on the View side of things) -- but it can't be taken too
seriously if it is a major scalability/performance bottleneck for highly
trafficked sites. I get that many sites are using some level of
caching -- but not every page can be cached.

There are a few possible ideas for making QP faster...

1. Writing a native class for converting CSS selectors to XPath
queries and using the native XPath find that is already supported by
DOMDocuments. (I have something like this written non-natively)

2. Figuring out how to make persistent storage of DOMDocuments so the
parsing/loading phase can be eliminated.

Any thoughts/suggestions on these ideas?

Thanks for all your hard work and creativity!

-r

Matt

Feb 24, 2012, 12:32:59 PM
to devel-querypath
(For those just tuning in, QueryPath 3.x is under active dev right
now, with the current emphasis on refactoring code for full PHP 5.3
support, especially namespacing and PSR-0 compliance.)

Regarding #1, I'd like to hear some performance numbers. Many CSS
selectors could be quickly converted to XPath queries. I don't know
how complex it would be, though, to try to build a complete resolver.
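
To give a rough sense of the "quickly converted" part, here is a minimal
sketch in Python (it is not QueryPath code and not Ryan's converter). It
assumes a deliberately tiny CSS subset: tag names, '#id', the descendant
combinator (whitespace) and the child combinator ('>'). The name
css_to_xpath is hypothetical, and the standard library's ElementTree, which
only understands a limited XPath dialect, is used to run the result.

```python
import re
import xml.etree.ElementTree as ET


def css_to_xpath(selector):
    """Translate a tiny CSS subset (tags, '#id', ' ' and '>') to XPath.

    Hypothetical helper for illustration only; classes, attributes,
    pseudo-classes and selector groups are not handled.
    """
    # Tokens are either the '>' combinator or a run of non-space characters.
    tokens = re.findall(r">|[^\s>]+", selector)
    xpath = ""
    axis = "//"  # the first step may match at any depth
    for token in tokens:
        if token == ">":
            axis = "/"  # the next step must be a direct child
            continue
        tag, _, ident = token.partition("#")
        step = tag or "*"
        if ident:
            step += "[@id='%s']" % ident
        xpath += axis + step
        axis = "//"  # whitespace between steps means "any descendant"
    return xpath


doc = ET.fromstring(
    "<root><foo><bar id='a'/><x><bar id='b'/></x></foo><bar id='c'/></root>"
)
# ElementTree paths are relative to an element, hence the leading '.'.
children = doc.findall("." + css_to_xpath("foo > bar"))     # only bar 'a'
descendants = doc.findall("." + css_to_xpath("foo bar"))    # bars 'a' and 'b'
```

The simple cases map almost one-to-one; the harder part of a complete
resolver is everything this sketch leaves out.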

For #2, I have looked at some options for this in the past, because I
too think this is a great idea. The biggest difficulty is that a
DOMDocument cannot be serialized, which means it has to be walked. And
walking the DOM tree didn't seem to be noticeably faster than parsing
a DOM document. That said, I'm a HUGE fan of this idea, and if anyone
were to suggest some potential routes to accomplishing it, I'm all for
it.

Another option is to change the way the traversal happens.

Right now, when a selector comes in for 'foo > bar', we search
downward through the document for all <foo> elements, and then for
each foo element, we check to see which have <bar> children. Or, for
an even tougher case, 'foo bar' searches first for all <foo> elements,
and then searches all descendants for <bar> elements.
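
As a rough illustration of that top-down strategy for 'foo bar', here is a
minimal sketch in Python using the standard library's minidom. It is not
QueryPath's actual code; the function name top_down and the sample document
are assumptions made for the example.

```python
from xml.dom import minidom


def top_down(doc, ancestor, target):
    """Match 'ancestor target' by finding ancestors first, then searching
    the whole subtree below each one. Hypothetical helper, for illustration."""
    matches = []
    for outer in doc.getElementsByTagName(ancestor):
        # Every subtree is scanned in full, even where subtrees overlap.
        for inner in outer.getElementsByTagName(target):
            # Nested ancestors reach the same node again, so skip repeats.
            if not any(inner is seen for seen in matches):
                matches.append(inner)
    return matches


doc = minidom.parseString(
    "<root><foo><foo><bar id='a'/></foo><bar id='b'/></foo><bar id='c'/></root>"
)
found = [node.getAttribute("id") for node in top_down(doc, "foo", "bar")]
```

Note that nested <foo> elements cause the same subtree to be walked more than
once, and the results then need de-duplicating.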

We could drastically improve QueryPath's search performance by
traversing "bottom up":

For 'foo bar', find all 'bar' elements, and then go up the tree to see
which ones have <foo> ancestors. Because each element has only one
parent, and because the ancestors can be traversed very quickly, this
method is considerably faster.

(To visualize this, draw a tree structure and compare strategies for
starting at the top and finding a path to a specific place at the
bottom, and starting at the bottom and finding a path to the top.)
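
The bottom-up idea for 'foo bar' can be sketched along these lines, again in
Python with minidom rather than QueryPath's PHP. The function name bottom_up
and the sample document are assumptions for illustration; this is not the
unfinished implementation mentioned in this thread.

```python
from xml.dom import minidom


def bottom_up(doc, ancestor, target):
    """Match 'ancestor target' by finding targets first, then climbing the
    single parent chain of each one. Hypothetical helper, for illustration."""
    matches = []
    for node in doc.getElementsByTagName(target):
        parent = node.parentNode
        # Stop at the document node, which is not an element.
        while parent is not None and parent.nodeType == parent.ELEMENT_NODE:
            if parent.tagName == ancestor:
                matches.append(node)
                break  # one matching ancestor is enough
            parent = parent.parentNode
    return matches


doc = minidom.parseString(
    "<root><foo><foo><bar id='a'/></foo><bar id='b'/></foo><bar id='c'/></root>"
)
found = [node.getAttribute("id") for node in bottom_up(doc, "foo", "bar")]
```

Each candidate is visited once and only its ancestors are inspected, so no
de-duplication step is needed. For 'foo > bar' the loop would check only the
immediate parent.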

I started on this feature a long time ago, but just couldn't carve out
a block of time. I'd love to come back to it.

The bottom line, though, is that I am 100% in agreement with Ryan: I
would rather improve QueryPath's performance than add more features.

Matt