I posted about this on Basecamp, but now that we've started to use this list I'm going to post it here for public consumption.
We've talked about optimizing $$ in the past -- it's one of my personal goals for 1.5.1. So I took great interest in Jack Slocum's new DomQuery extension for YUI (http://www.jackslocum.com/blog/2007/01/11/domquery-css-selector-basic...). Jack is a brilliant JavaScripter and has managed to write a really, really fast CSS selector engine here.
I took a look at his code -- it's quite clever, but also verbose and inelegant in places. He handles a lot of specific CSS token combinations by hand, which results in really fast querying but also *lots* of lines of code. I resolved to write a version that was more Prototypish.
I'm still trying to make it better, but as I see it this new $$ solves several problems with the current $$:
(1) The current $$ does not filter out duplicates. You can see this on the test page: "div div" and "div div div" both return far more results than they should because certain nodes are added to the collection more than once. Calling "uniq" on the array before it's returned is *far* too costly, so I used Jack's inspired method here: it enumerates the collection and sets a property on each node, so that if the function finds a node with that property already it knows it's seen it before. (It then sets that property to "undefined" before it's done.)
(2) The current $$ is not very modular or extensible. With this new implementation, I can add new tokens very easily -- for example, all the operators for attribute matching (=, $=, ^=, *=, |=, ~=) -- because they live in a hash with the operator as the key and a comparator as the value. Similarly, adding new selectors like child (>) and adjacency (+), or even pseudoclasses (:nth-child(even)) could be done by adding a regex, an XPath translation, and a string-of-JS-code translation.
(3) The current $$ is SLOW. XPath clearly solves this problem (try the tests in Firefox and see for yourself), but not for Safari (version <2.0) or MSIE (version *anything*). So even the "slow lane" needs to be faster here. Jack claims his implementation is the fastest on earth, and though I've taken his general approach I have not yet realized his gains. Still, on costly selectors the new approach is almost twice as fast as the current $$ (even leaving out XPath), and that's with more functionality and fewer bugs (i.e., duplicated nodes).
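To make point (1) concrete, here is a minimal sketch of the marker-property technique. It is not the code from the patch: plain objects stand in for DOM nodes, and the `_seen` property name and the `unique` function name are hypothetical choices made for illustration.

```javascript
// Hypothetical marker name; any property unlikely to collide would do.
var MARK = '_seen';

function unique(nodes) {
  var results = [], i, node;
  // First pass: keep a node only if it has not been marked yet.
  for (i = 0; i < nodes.length; i++) {
    node = nodes[i];
    if (!node[MARK]) {
      node[MARK] = true;
      results.push(node);
    }
  }
  // Second pass: reset the marker so later queries start clean.
  for (i = 0; i < results.length; i++) {
    results[i][MARK] = undefined;
  }
  return results;
}

// Plain objects stand in for DOM nodes here.
var a = { id: 'a' }, b = { id: 'b' };
var deduped = unique([a, b, a, b, a]);
console.log(deduped.length); // 2
```

Each node is visited a fixed number of times, whereas a uniq-style pass has to compare every node against the ones already collected.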
I'd love to hear some feedback. I've been looking at this code for way too long now, so a fresh pair of eyes may point out something obvious that I've missed. Also, I'd love it if someone were to modify this code to add new stuff so that $$ can accommodate a wider range of selectors.
If you take a look at my patch, I use indexOf for the 3 additional selectors. I see you use indexOf for one and substr for the other 2 I was implementing. Would it be "faster" to use indexOf for all of them?
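For readers without either patch at hand, the "hash of operators" idea from point (2) could look roughly like the sketch below. It is an illustration only, not the code under discussion: the `operators` and `attrMatches` names are assumptions, and it happens to use indexOf for most comparisons without making any claim about which approach is faster.

```javascript
// Each attribute operator maps to a comparator taking the attribute's
// actual value and the value written in the selector.
var operators = {
  '=':  function(value, expected) { return value === expected; },
  '^=': function(value, expected) { return value.indexOf(expected) === 0; },
  '$=': function(value, expected) {
    return expected.length <= value.length &&
      value.slice(value.length - expected.length) === expected;
  },
  '*=': function(value, expected) { return value.indexOf(expected) !== -1; },
  '~=': function(value, expected) {
    return (' ' + value + ' ').indexOf(' ' + expected + ' ') !== -1;
  },
  '|=': function(value, expected) {
    return value === expected || value.indexOf(expected + '-') === 0;
  }
};

// Hypothetical helper: look up the comparator and apply it.
function attrMatches(value, operator, expected) {
  if (value === null || value === undefined) return false;
  var compare = operators[operator];
  return compare ? compare(String(value), expected) : false;
}

console.log(attrMatches('external link', '~=', 'link')); // true
console.log(attrMatches('english', '|=', 'en'));         // false
```

Supporting a new operator then means adding one more key to the hash rather than another branch in the matching code.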
I paused with completion of the patch due to issues with IE. I just thought I'd bring my "minimal" contribution to your attention. Looks like you're well on your way to a nice implementation.
Andrew K :)
On 1/20/07, Andrew Dupont <goo...@andrewdupont.net> wrote:
> I'd love to hear some feedback. I've been looking at this code for way too long now, so a fresh pair of eyes may point out something obvious that I've missed. Also, I'd love it if someone were to modify this code to add new stuff so that $$ can accommodate a wider range of selectors.
Andrew-
I'm reviewing your code for the second time now and still I cannot find anything bad about it. It is faster, it is better... So what are we waiting for? Let's get this into core first and then ponder about extending it even further. Changes and collaboration are easier when you have it versioned!
I certainly concur with Mislav. I've been looking at your code twice, and it looks pretty good to me. I especially like the feature modularity thing.
Yes, it weighs twice as much as the current selector implementation, but then, it's way faster, much more modular, and compressed JS is there for us anyhow.
Submit the patch, and tickle Sam until he works with it :-)
-- Christophe Porteneuve a.k.a. TDD "[They] did not know it was impossible, so they did it." --Mark Twain Email: t...@tddsworld.com
Alex Russell's post on the subject (http://blog.dojotoolkit.org/2007/02/04/dojoquery-a-css-query-engine-for-dojo) has stirred some diplomatic talks. It looks like Alex, John Resig, Jack Slocum, Dean Edwards, and I are going to pitch in toward some sort of site where all the different CSS-querying implementations can be benchmarked against uniform, real-world examples. I think this is a great idea and can only make all our engines better.
I don't think this project has any sort of timetable, though, so when I have a moment I'll clean up what I've got so far and submit a patch so we can start discussing this. And once Sam opens up branches I think it'll be much easier to collaborate on this code.
ATTENTION TO ANYONE WHO READS THIS LIST: If you're an XPath rockstar, I need to talk to you.
Cheers, Andrew
Man I just tried out your test page and WOW! I love the speed, that's what I'm talking about! I have refrained from using $$ before because speed is of utmost importance in my uses of Prototype, beyond that of elegance and maintainability unfortunately. I can't wait to have both! Mad props, Andrew. ;)
Colin
On Feb 15, 4:14 pm, Thomas Fuchs <t.fu...@wollzelle.com> wrote:
> Andrew, this is just incredible. Keep it going. :)
Thanks!
I've finally submitted this as patch #7568 (http://dev.rubyonrails.org/ticket/7568) and have updated the test page at (http://andrewdupont.net/test/double-dollar/). There are a couple of FIXMEs in there that I'd love to get polished up, so let me know if you have any ideas.
Also, selectors with pseudoelements (e.g., div:first-child) are conspicuously absent. Whoever feels like writing the requisite regexen would be my friend for life.
Can't wait until branch access happens so that we can all start hacking on this!
I'd still like to make optimizations for three common kinds of selectors:
* A single tag name ("li")
* A single class name (".external")
* Any selector with an ID in it ("div #sidebar")
The first two are important, in my opinion, because they'll get used a lot with Element.(down|up|next|previous). These are both easy optimizations to make.
The last is important because we can rely on the uniqueness of IDs to speed things up. For instance, on the benchmark page you'll notice that the "div#speech5" and "div #speech5" selectors are pretty sluggish without XPath. "div#speech5" has to grab all DIVs and find the one with an ID of "speech5"; "div #speech5" has to grab *all descendants* of all DIVs and find the one with an ID of "speech5."
A naïve approach would be to simply return $('speech5') in either of these cases, but that approach would not guarantee that the result matches the selector. (For instance, the returned node might not be a DIV or the descendant of a DIV.) A compromise, I think, is to find the ID and backtrack from there, asserting that it meets all of the conditions preceding that token in the selector. (So for "div div#speech5.dialog" it'd ask: does this node have a class name of "dialog", a tag name of "div", and an ancestor with a tag name of "div"? If so, it gets returned in a one-item array. If not, an empty array is returned.)
This approach would be much faster than the current approach, but would be algorithmic hell. I am prepared to award the Nobel Prize of Awesome to whomever manages to pull this off.
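A rough sketch of the "find the ID, then backtrack" idea follows. It makes several simplifying assumptions that are not in the proposal itself: the selector has already been parsed into an array of steps, the ID sits in the last step, and only descendant combinators are handled. The `matchesSimple` and `queryById` names, the step format, and the `getById` lookup (standing in for `$`) are all hypothetical, and plain objects stand in for DOM nodes.

```javascript
// Does a single node satisfy one parsed step (tag, id, class names)?
function matchesSimple(node, step) {
  if (step.tag && (node.tagName || '').toLowerCase() !== step.tag) return false;
  if (step.id && node.id !== step.id) return false;
  var classes = step.classes || [];
  for (var i = 0; i < classes.length; i++) {
    var padded = ' ' + (node.className || '') + ' ';
    if (padded.indexOf(' ' + classes[i] + ' ') === -1) return false;
  }
  return true;
}

// Look the node up by ID, then walk up the ancestors to verify the
// preceding steps, right to left. Returns [node] or [].
function queryById(getById, steps) {
  var last = steps[steps.length - 1];
  var node = getById(last.id);
  if (!node || !matchesSimple(node, last)) return [];
  var ancestor = node.parentNode;
  for (var i = steps.length - 2; i >= 0; i--) {
    while (ancestor && !matchesSimple(ancestor, steps[i])) {
      ancestor = ancestor.parentNode;
    }
    if (!ancestor) return [];
    ancestor = ancestor.parentNode;
  }
  return [node];
}

// "div div#speech5.dialog" against a two-level mock tree.
var outer = { tagName: 'DIV', id: '', className: '', parentNode: null };
var inner = { tagName: 'DIV', id: 'speech5', className: 'dialog', parentNode: outer };
var getById = function(id) { return id === 'speech5' ? inner : null; };

var steps = [{ tag: 'div' }, { tag: 'div', id: 'speech5', classes: ['dialog'] }];
console.log(queryById(getById, steps).length); // 1
```

The hard part flagged above remains untouched by this sketch: an ID in the middle of a selector, child and adjacent combinators, and tokens that follow the ID would all need extra handling.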
> I've finally submitted this as patch #7568 (http://dev.rubyonrails.org/ticket/7568) and have updated the test page at (http://
This is great, but you would need to provide unit tests for the extra features...
> Also, selectors with pseudoelements (e.g., div:first-child) are conspicuously absent. Whoever feels like writing the requisite regexen would be my friend for life.
I'm going to try and implement (at least some of the) pseudo-elements, and provide unit tests for current extra selectors and those I'd add. I'll try and have a look at your FIXME's as well.
Andrew Dupont wrote:
> ...
> Also, selectors with pseudoelements (e.g., div:first-child) are conspicuously absent. Whoever feels like writing the requisite regexen would be my friend for life.
> ...
Do you have a list of pseudo-elements you want to support?
- pseudo-elements are not going to be detectable in a portable manner (at least, too few of them), and are of little interest anyway.
- pseudo-*classes* are very much detectable, and will: I'm working on it these days, specifically for this $$ rewrite.
I'm pestering Andrew on the back mail channel about suggested improvements and tech questions on his implementation, adding unit tests, and gearing up to adding most (possibly all) pseudo-classes from CSS3 (section 6.6 of the spec).
On Feb 16, 8:43 am, Christophe Porteneuve <t...@tddsworld.com> wrote:
> This is great, but you would need to provide unit tests for the extra > features...
Yeah, working on that. I posted the patch in the airport in Tel Aviv right before boarding a plane, and unit tests slipped my mind at the time.
> My opinion is pretty slicing clear:
> - pseudo-elements are not going to be detectable in a portable manner (at least, too few of them), and are of little interest anyway.
> - pseudo-*classes* are very much detectable, and will: I'm working on it these days, specifically for this $$ rewrite.
You mean this the other way around, right? Pseudo-classes (hover, active, etc.) are the ones that don't seem useful to me (they select elements in certain states, which is nearly meaningless in a DOM scripting context).
On Feb 16, 11:57 am, Ken Snyder <kendsny...@gmail.com> wrote:
> Andrew Dupont wrote:
> Do you have a list of pseudo-elements you want to support?
The list you posted is more or less my list, in terms of priority.
> Do we want to address pseudo-elements that normally refer to text nodes or fragments such as :first-letter, :first-line, and ::selection?
I'd rather walk barefoot across a barbecue.
> So are you asking someone to define corresponding entries in NewSelector > such as NewSelector.criteria.pseudo and NewSelector.patterns.pseudo?
Yeah. The more specific and obscure these selectors get, the harder the pattern gets, and the harder the XPath gets. The non-xpath logic is a bit more straightforward, but then optimization is the tricky part there.
Andrew Dupont wrote:
>> Do we want to address pseudo-elements that normally refer to text nodes or fragments such as :first-letter, :first-line, and ::selection?
> I'd rather walk barefoot across a barbecue.
Oh come now, I don't know how useful it would be :D, but it would certainly be smooth to be able to manipulate the DOM using something like this:
<!--With html like this,-->
<div id="story"><p>It was the best of times, it was the worst of times</p></div>
<!--You would end up with a document equivalent to this-->
<div id="story"><p><span class="dramatic-cap">I</span><span>t was the best of times, it was the worst of times</span></p></div>
// :first-letter $$ selector
Selector.criteria.firstLetter = 'n = h.firstLetter(n, r); d = false;';
Selector.patterns.firstLetter = /:first-letter(\b|$)/;
Selector.handler.firstLetter = function(nodes, root) {
  // collect matching nodes
  return nodes.collect(function(node) {
    // travel down the DOM until a node containing text is found
    while (typeof node.nodeValue == 'undefined' && (node = node.down())) {}
    // get the first letter
    var first = node.nodeValue.substring(0, 1);
    // if the text != '', get the rest
    if (first) {
      var rest = node.nodeValue.substring(1);
      // wrap the first character and the rest of the text in respective spans
      node.update('<span>' + first + '</span><span>' + rest + '</span>');
      // return the span containing the first character
      return node.down();
    }
  });
};
That bothers me because simply querying '#story p:first-letter' modifies the document. $$ is a mechanism for reading nodes, not modifying them, so I think the confusion this would cause would outweigh any syntactic elegance.
Cheers, Andrew
> That bothers me because simply querying '#story p:first-letter' modifies the document. $$ is a mechanism for reading nodes, not modifying them, so I think the confusion this would cause would outweigh any syntactic elegance.
That's very much my view, too. Plus, the suggested implementation will add a new <span> layer *every time*. It is clear that creating a span in there could result in dramatic unexpected changes due to styling.
This is the whole point of pseudo-elements: they're named that way because they refer to things that are *not* DOM elements (as opposed to pseudo-classes, which refer to DOM elements satisfying specific state/structural criteria).
I also think we're very much on an edge case here: if you intend to manipulate first letters/words by script, you could very much wrap them in a span originally (e.g. wherever the page's XHTML is generated), CSS around it, etc.
-- Christophe Porteneuve a.k.a. TDD "[They] did not know it was impossible, so they did it." --Mark Twain Email: t...@tddsworld.com
Andrew Dupont wrote:
> That bothers me because simply querying '#story p:first-letter' modifies the document. $$ is a mechanism for reading nodes, not modifying them, so I think the confusion this would cause would outweigh any syntactic elegance.
I agree. And that gives me a much better picture of the vision for the $$ roadmap.
It would seem that additions in order of usefulness would be something like this:
Thanks for the code. I'm myself working on implementing PC's, and looking at your work is a boon. I like the general code architecture you did (how work gets split around nthFind, etc.), yet I need to do more careful inspection on it to verify that regexes are sound (they appear to be so far), and benchmark efficiency.
This latter point is my main concern here, as your attached demo page runs at an atrociously slow pace on my box here (FF2/Debian/AMD-2.4GHz)... But then, maybe it's just your innerHTML construction code on the benchmarking side.
Also, :lang relies on |=, not =, for the lang attribute check. It also primarily relies on xml:lang, not just lang. And *ideally*, it would examine DOM lineage for the node (but that's very much an edge case, I think).
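The points about |= semantics, xml:lang, and DOM lineage can be illustrated with a small sketch. It is not part of either implementation in this thread: the function names are made up, nodes are mocked as plain objects with an `attrs` hash, and comparing tags case-insensitively is an assumption of this sketch.

```javascript
// |= semantics: the value equals the tag or starts with the tag plus "-".
function langTagMatches(value, lang) {
  if (!value) return false;
  value = value.toLowerCase(); // assumption: compare case-insensitively
  lang = lang.toLowerCase();
  return value === lang || value.indexOf(lang + '-') === 0;
}

// Walk up the lineage until some node declares a language,
// preferring xml:lang over lang on the same node.
function inheritedLang(node) {
  for (var n = node; n; n = n.parentNode) {
    var attrs = n.attrs || {};
    var value = attrs['xml:lang'] || attrs.lang;
    if (value) return value;
  }
  return null;
}

function matchesLang(node, lang) {
  return langTagMatches(inheritedLang(node), lang);
}

// Mock tree: body declares en-US, a nested quote overrides with fr.
var body = { attrs: { 'xml:lang': 'en-US' }, parentNode: null };
var para = { attrs: {}, parentNode: body };
var quote = { attrs: { lang: 'fr' }, parentNode: para };

console.log(matchesLang(para, 'en'));  // true
console.log(matchesLang(quote, 'en')); // false
```

A plain attribute check on the node itself would miss `para`, which carries no language attribute of its own.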
Finally, there are no :first and :last selectors in CSS3, although their addition is nice. I'm a little concerned as to what would happen if $$ ended up supporting some CSS3 selectors in a not-quite-spec-compliant way (e.g. lang), and supporting *extra* selectors that people would then believe are usable straight in regular CSS...
Thanks again for your work! I'll dive more into it in the coming days.
-- Christophe Porteneuve a.k.a. TDD "[They] did not know it was impossible, so they did it." --Mark Twain Email: t...@tddsworld.com
Christophe Porteneuve wrote:
> ...
> Thanks for the code. I'm myself working on implementing PC's, and looking at your work is a boon. I like the general code architecture you did (how work gets split around nthFind, etc.), yet I need to do more careful inspection on it to verify that regexes are sound (they appear to be so far), and benchmark efficiency.
Yah, I tend to just be throwing out ideas with my drafts. I hope it is helpful. If nothing else it prevents you and others from going down paths that don't work. :P
> This latter point is my main concern here, as your attached demo page runs at an atrociously slow pace on my box here (FF2/Debian/AMD-2.4GHz)... But then, maybe it's just your innerHTML construction code on the benchmarking side.
Yes, it was very slow for me too with any of the "nth-" predicates. That nth routine relies on counting the number of previous siblings FOR EVERY NODE. I'm not sure if that is avoidable, but I've added the corresponding xpath... see below. The xpath syntax may need to be reworked with previous-sibling or self, I'm not sure. Or this may just be a wrong path :)
> Also, :lang relies on |=, not =, for the lang attribute check. It also primarily relies on xml:lang, not just lang. And *ideally*, it would examine DOM lineage for the node (but that's very much an edge case, I think).
> Finally, there are no :first and :last selectors in CSS3, although their addition is nice. I'm a little concerned as to what would happen if $$ ended up supporting some CSS3 selectors in a not-quite-spec-compliant way (e.g. lang), and supporting *extra* selectors that people would then believe are usable straight in regular CSS...
> Thanks again for your work! I'll dive more into it in the coming days.
I agree. I don't know why I thought :first or :last were CSS3, but I don't think we should add anything. $$ should be a method of convenience, not the exclusive way to build a list of nodes; there is a lot of power in methods such as Element.up(), Element.down(), Enumerable.inject(), etc. Not to mention the fact that styling the first and last element separately is great, but not such a great practice when trying to separate presentation from logic with JS.
I don't know how well you'll be able to implement a CSS "not()" equivalent in XPath. From what I could see, filtering out nodes in XPath depends on the content of not(). Consider:
CSS3: div.character:not(div[@id=speech2])
XPath: .//div[contains(concat(' ', @class, ' '), ' character ') and @id!='speech2']
And :empty seems tricky in XPath too. So Christophe, keep us posted with any XPath magic you have up your sleeve.... This is it from me for now with $$.
--Ken
Object.extend(NewSelector, {
  xpath: {
    descendant: "//",
    child: "/",
    adjacent: "/following-sibling::",
    tagName: "#{2}",
    className: "[contains(concat(' ', @class, ' '), ' #{1} ')]",
    id: "[@id='#{1}']",
    attrPresence: "[@#{1}]",
    firstChild: "[position()=1]",
    lastChild: "[position()=last()]",
    // count(s)
    empty: "/text()",
    nth: function(m) { // m for matches
      var isReverse = m[1]; // ('last-' or '')
      var flavor = m[2]; // ('child' or 'of-type')
      var notation = m[3]; // parens
      var ab = NewSelector.handlers.parseNthNotation(notation);
      if (flavor == 'child') {
        if (isReverse == 'last-') {
          return "[last()-position() mod "+ab[0]+"="+ab[1]+"]";
        } else {
          return "[position() mod "+ab[0]+"="+ab[1]+"]";
        }
        // use preceding-sibling::
      } else { // 'of-type'

      }
    },
    attr: function(m) {
      return new Template(NewSelector.xpath.operators[m[2]]).evaluate(m);
    },
    not: function(m) { // m for matches
  criteria: {
    tagName: 'n = h.tagName(n, r, "#{2}", d); d = false;',
    className: 'n = h.className(n, r, "#{1}", d); d = false;',
    id: 'n = h.id(n, r, "#{1}", d); d = false;',
    attrPresence: 'n = h.attrPresence(n, r, "#{1}"); d = false;',
    attr: 'n = h.attr(n, r, "#{1}", "#{3}", "#{2}"); d = false;',
    descendant: 'd.list = "desc";',
    child: 'd.list = "child";',
    adjacent: 'd.list = "adjacent";',
    empty: 'n = h.empty(n); d = false;',
    not: 'n = h.not(n, "#{1}"); d = false;',
    firstChild: 'n = h.findNthNodes(n, r, "child", 1, 1, false); d = false;',
    lastChild: 'n = h.findNthNodes(n, r, "child", 1, 1, true); d = false;',
    lang: 'n = h.attr(n, r, "lang", "#{1}", "|="); d = false;',
    nth: 'n = h.nth(n, r, "#{1}", "#{2}", "#{3}"); d = false;'
  },
  patterns: {
    child: /^(\s+)?>\s*/,
    adjacent: /^(\s+)?\+\s*/,
    descendant: /^\s/,
    tagName: /^(\s+)?(\*|[\w-]+)(\b|$)?/,
    id: /^#([\w-\*]+)(\b|$)/,
    className: /^\.([\w-\*]+)(\b|$)/,
    attrPresence: /^\[([\w]+)\]/,
    attr: /\[(?:@)?([\w-:]+)\s?(?:(=|.=)\s?['"]?([^\]]*?)["']?)?\](\s?>\s?|[\/\s\]]|$)/,
    firstChild: /^:first-child(\b|$)/,
    lastChild: /^:last-child(\b|$)/,
    empty: /^:empty(\b|$)/,
    not: /^:not\(([^(]+|[^(]*:nth-[^(]+\([^()]+\))\)(\b|)/,
    lang: /^:lang(\b|$)/,
    nth: /^:nth-(last-)?(child|of-type)\((\d*n[+-]\d+|\d+n|\d+|odd|even)\)(\b|$)/
    /* note: nesting of ":not" is disallowed
       (see http://www.w3.org/TR/2001/CR-css3-selectors-20011113/#negation)
       ":not" may contain other parens only if referring to the N format */
  },
  handlers: {
    /**
     * Given "nth-" notation, parse notation string and find nodes
     *
     * @param array nodes List of nodes to filter
     * @param object root (not needed?)
     * @param bool isReverse If true, counting should start from last node and continue to first
     * @param string flavor String containing 'child' or 'of-type'
     * @param string notation Notation in form an+b
     * @return array List of matching nodes
     */
    nth: function(nodes, root, isReverse, flavor, notation) {
      var groupAndOffset = NewSelector.handlers.parseNthNotation(notation);
      return NewSelector.handlers.findNthNodes(nodes, root, flavor, groupAndOffset[0], groupAndOffset[1], isReverse == 'last-');
    },
    /**
     * Given a string notation in the form an+b, return a and b
     * a and b represent group size and offset in the css "nth-" notation
     *
     * @param string notation
     * @return array [a, b]
     */
    parseNthNotation: function(notation) {
      if (notation == 'odd') return [2,1];
      else if (notation == 'even') return [2,0];
      // parse notation
      var matches = /^(\d*n)?([+-])?(\d+)?$/.exec(notation);
      // $1 grouping size (optional)
      // $2 plus or minus (optional)
      // $3 offset (optional)
      if (matches[2] && matches[2] == '-') matches[3] = matches[3] * -1;
      return [parseFloat(matches[1] || 1), parseFloat(matches[3] || 0)];
    },
    /**
     * Given "nth-" notation information, find nodes
     *
     * @param array nodes List of nodes to filter
     * @param object root (not needed?)
     * @param string flavor String containing 'child' or 'of-type'
     * @param int groupSize Size of groups (a)
     * @param int offset Offset within group (b)
     * @param bool isReverse If true, counting should start from last node and continue to first
     * @return array List of filtered nodes
     */
    findNthNodes: function(nodes, root, flavor, groupSize, offset, isReverse) {
      //console.log($A(arguments));
      // is root necessary here? I think we can omit it
      if (isReverse) nodes = nodes.reverse();
      // create a function to count the previous siblings
      if (flavor == 'child') {
        // "nth-child" / "nth-last-child"
        var countPrev = function(node) {
          return node.previousSiblings().length;
        };
      } else {
        // "nth-of-type" / "nth-last-of-type" might be more descriptively named "nth-child-with-matching-tagName"
        var countPrev = function(node,root) {
          //console.log(node.previousSiblings().findAll(function(n) { return n.tagName == node.tagName; }));
          return node.previousSiblings().findAll(function(n) {
            return n.tagName == node.tagName;
          }).length;
        };
      }
      // use our function to filter nodes
      nodes = nodes.findAll(function(node) {
        if (groupSize > 1) {
          // for groupSizes of 2 or more, use modulus
          return (countPrev(node,root) + 1) % groupSize == offset;
        } else {
          // for groupSizes of 1, an absolute offset
          return countPrev(node,root) + 1 == offset;
        }
      });
      if (isReverse) nodes = nodes.reverse();
      return nodes;
    },
    not: function(nodes, innerExpression) {
      var toRemove = NewSelector.matchElements(nodes, innerExpression);
      var ret = nodes.reject(function(node) {
        return toRemove.member(node);
      });
      //console.log(ret);
      return ret;
    },
    empty: function(nodes) {
      return nodes.findAll(function(node) {
        // could use innerHTML=='' instead but it seems ugly
        return !node.firstChild && node.nodeValue === null;
      });
    },
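As a companion to the an+b handling in the draft above, here is an alternative, standalone sketch of parsing and testing the notation. It is not a drop-in replacement for parseNthNotation: the `parseNth` and `nthMatches` names are hypothetical, and the return convention differs from the draft in that a bare number such as "5" yields a = 0 rather than a = 1. It also accepts forms like "n" and "-n+3".

```javascript
// Parse "an+b", "odd", "even", "n", "-n+3", "5", ... into [a, b].
// Returns null when the notation is not recognised.
function parseNth(notation) {
  notation = notation.replace(/\s+/g, '');
  if (notation === 'odd') return [2, 1];
  if (notation === 'even') return [2, 0];
  var m = /^(?:([+-]?\d*)n)?([+-]?\d+)?$/.exec(notation);
  if (!m || notation === '') return null;
  var a = 0;
  if (m[1] !== undefined) {
    if (m[1] === '' || m[1] === '+') a = 1;   // "n" or "+n"
    else if (m[1] === '-') a = -1;            // "-n"
    else a = parseInt(m[1], 10);
  }
  var b = m[2] !== undefined ? parseInt(m[2], 10) : 0;
  return [a, b];
}

// index is 1-based; it matches when index = a*k + b for some integer k >= 0.
function nthMatches(index, a, b) {
  if (a === 0) return index === b;
  var k = (index - b) / a;
  return k >= 0 && k === Math.floor(k);
}

console.log(parseNth('2n+1')); // [ 2, 1 ]
console.log(parseNth('-n+3')); // [ -1, 3 ]
console.log(nthMatches(3, 2, 1)); // true
```

Testing for a non-negative integer k, rather than comparing a modulus against the offset, keeps cases such as 2n+2 and negative group sizes consistent.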
On Feb 18, 9:45 pm, Ken Snyder <kendsny...@gmail.com> wrote:
> Yes, it was very slow for me too with any of the "nth-" predicates. That nth routine relies on counting the number of previous siblings FOR EVERY NODE. I'm not sure if that is avoidable, but I've added the corresponding xpath... see below. The xpath syntax may need to be reworked with previous-sibling or self, I'm not sure. Or this may just be a wrong path :)
DomQuery sidesteps this by setting an expando "nodeIndex" property the first time it iterates through a group of child nodes.
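A minimal sketch of that caching idea, under stated assumptions: this is not DomQuery's code, only the "nodeIndex" name comes from the description above, and the `_indexed` flag, the function names, and the mock parent/child objects are hypothetical.

```javascript
// Number a parent's children once, storing the 1-based position on each child.
function indexChildren(parent) {
  if (parent._indexed) return; // already numbered during this query
  var children = parent.children;
  for (var i = 0; i < children.length; i++) {
    children[i].nodeIndex = i + 1;
  }
  parent._indexed = true;
}

// Keep the nodes whose position among their siblings satisfies an+b.
function filterNthChild(nodes, a, b) {
  var results = [];
  for (var i = 0; i < nodes.length; i++) {
    var node = nodes[i];
    if (!node.parentNode) continue;
    indexChildren(node.parentNode);
    var diff = node.nodeIndex - b;
    var hit = a === 0 ? diff === 0 : (diff % a === 0 && diff / a >= 0);
    if (hit) results.push(node);
  }
  return results;
}

// Mock list with five items.
var list = { children: [] };
for (var n = 1; n <= 5; n++) {
  list.children.push({ name: 'item' + n, parentNode: list });
}

var odd = filterNthChild(list.children, 2, 1);
console.log(odd.map(function(item) { return item.name; }).join(','));
// item1,item3,item5
```

Each sibling group is walked once instead of once per node. A real implementation would also have to clear or invalidate the cached indexes when the query finishes, since the DOM can change between queries.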
> I don't know how well you'll be able to implement a CSS "not()" equivalent in XPath. From what I could see, filtering out nodes in XPath depends on the content of not(). Consider:
> CSS3: div.character:not(div[@id=speech2])
> XPath: .//div[contains(concat(' ', @class, ' '), ' character ') and @id!='speech2']
Luckily, "not" is a function in XPath. It returns the opposite boolean of whatever it contains:
.//div[not(@id="speech1")]
.//div[not(contains(concat(' ', @class, ' '), ' character ') and @id='speech2')]
Still a little tricky because of the placement of the brackets. We might have to remove the brackets from Selector.xpath.* so that we can insert them properly based on the context of the predicate. Also, I'm not even sure the second example qualifies as a "simple selector" the way the CSS3 spec defines it.
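One way to picture the "remove the brackets" idea: keep each condition as a bare XPath expression and add the brackets, or a not(...) wrapper, only when the predicate is assembled. The sketch below is an assumption-laden illustration rather than a proposed change to Selector.xpath; the `conditions`, `predicate`, and `negate` names are hypothetical.

```javascript
// Conditions as bare expressions, with no surrounding brackets.
var conditions = {
  className: function(name) {
    return "contains(concat(' ', @class, ' '), ' " + name + " ')";
  },
  id: function(id) { return "@id='" + id + "'"; },
  attrPresence: function(name) { return "@" + name; }
};

// Join conditions with "and" and wrap them in a single predicate.
function predicate(parts) {
  return parts.length ? "[" + parts.join(" and ") + "]" : "";
}

// Wrap conditions in XPath's not() function.
function negate(parts) {
  return "not(" + parts.join(" and ") + ")";
}

var xpath = ".//div" + predicate([
  conditions.className("character"),
  negate([conditions.id("speech2")])
]);
console.log(xpath);
// .//div[contains(concat(' ', @class, ' '), ' character ') and not(@id='speech2')]
```

Because the brackets are added in one place, a negated condition can sit inside the same predicate as the positive ones without any string surgery.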
Thanks for your help, Ken -- you've saved me some trial and error with XPath syntax.