jsdom version 0.3.0 released!

107 views
Skip to first unread message

Domenic Denicola

unread,
Dec 10, 2012, 3:35:48 AM12/10/12
to js...@googlegroups.com, diego....@gmail.com, ch...@chad-cat-lore-eddie.com
I'm happy to announce jsdom 0.3.0, my first non-patch version released since joining the project.

The biggest news is that we now have a real querySelector implementation, courtesy of Diego Perini's nwmatcher project. Diego's been super-helpful in this, accepting my pull request to add a hook for jsdom, and promptly fixing any small issues I found in his implementation. This fixes something like 10 issues in the GitHub issue tracker, as our substandard Sizzle-based querySelector has been until now one of the biggest problems with jsdom.

Almost as exciting is the addition of support for <style> and <link href="stylesheet"> elements, which now correctly affect getComputedStyle. This is due to a wonderful pull request by Chad Walker, who you may recognize as the one who made elements' `cssText` and `style` properties work correctly in the 0.2.16–0.2.18 releases. Great stuff.

There's a few more changes, all of which are nice stuff, but I wanted to highlight those two contributions since they're a) awesome; and b) will likely have an impact on your programs' behavior. Check out the rest at

https://github.com/tmpvar/jsdom/blob/master/Changelog.md#030

and let me know how this new release works for you!

Eric Goldberg

unread,
Dec 11, 2012, 12:46:48 AM12/11/12
to js...@googlegroups.com, diego....@gmail.com, ch...@chad-cat-lore-eddie.com
Congrats on the new release!

However, per  your request about how this new release works for me, I'm getting this error message crashing my node process:


Error: The string ";.rhfWrapper .shoveler .price-per-unit", is not a valid CSS selector
    at emit (/project_dir/node_modules/jsdom/node_modules/nwmatcher/src/nwmatcher.js:830:37)
    at Object.match (/project_dir/node_modules/jsdom/node_modules/nwmatcher/src/nwmatcher.js:1303:11)
    at Object.exports.applyQuerySelector.doc.createElement.element.matchesSelector (/project_dir/node_modules/jsdom/lib/jsdom/selectors/index.js:32:49)
    at exports.createWindow.DOMWindow.getComputedStyle (/project_dir/node_modules/jsdom/lib/jsdom/browser/index.js:243:20)
    at Array.forEach (native)
    at exports.createWindow.DOMWindow.getComputedStyle (/project_dir/node_modules/jsdom/lib/jsdom/browser/index.js:242:17)
    at Object.forEach (native)
    at exports.createWindow.DOMWindow.getComputedStyle (/project_dir/node_modules/jsdom/lib/jsdom/browser/index.js:241:15)

 
The line last call before jsdom is:

var computedStyle = window.parent.document.defaultView.getComputedStyle(node, null);

I think it may have to do with the HTML I'm scraping using invalid/dirty markup (in this case, looks like amazon has a superious semicolon in their CSS selector in their inline <style> block). I got this error while testing a scraper on this URL:


It's totally invalid CSS on the part of the content owner, but it'd be nice if the library didn't barf... especially since there's so much bad CSS out there!

cheers,
e

Domenic Denicola

unread,
Dec 11, 2012, 1:09:16 AM12/11/12
to js...@googlegroups.com

Thanks for finding this! 0.3.1 pushed with that fixed.

Eric Goldberg

unread,
Dec 11, 2012, 1:50:12 AM12/11/12
to js...@googlegroups.com
Confirmed. The Amazon URL I posted works in 0.3.1 and 0.2.19, but not 0.3.0. 

Thanks for the fast turnaround time!

Eric Goldberg

unread,
Dec 11, 2012, 11:24:30 AM12/11/12
to Diego Perini, js...@googlegroups.com, ch...@chad-cat-lore-eddie.com
I guess it just depends on what the proper default behavior is. Is jsdom being sold as an HTML validator? Or a jquery-like DOM walker? I think the latter is the case. Everyone is using it for screen scraping. And if you have a screen scraper that (by default) chokes on crap HTML, you're going to just see a bunch of people migrating to using phantomjs or other solutions.

There's also the case that this is arguably a regression. I get that parsing style tags is an enhancement, but if it breaks 10% of pages that it worked on in 0.2.19, that ends up being a big deal. You have to look at the common case and the edge case. There are a handful of people who need jsdom to break on bad CSS markup. There are millions of people who need it to act like jquery and "just work" :)

Either way, great job to the team on this awesome project!
e

On Tue, Dec 11, 2012 at 5:17 AM, Diego Perini <diego....@gmail.com> wrote:
Eric,
in nwmatcher this can be fixed by setting a simple flag (fx: in production):

    Dom.configure({ VERBOSITY: false });

In any case I would prefer to leave this "true" if the developer is NOT interested into catching these validation errors and still maintain a log of the wrong behavior. The latter can be fixed at a higher level (as showed). Current fix/patch seems OK to me.

Wrong ... is still wrong. However with the above infos everybody can make notifications fit their needs (break processing or not) at different stages of dev/dep.

Being able to only return an empty Array was not the right solution, also because the goal is to match Query Selector API which doesn't work like that. Developers should match that (QSA), with the required flexibility when feasible :)

PS: there are currently other invalid selectors in the current sizzle tests like "!=" and several unquoted "href" values.



--
Sent from my email

Domenic Denicola

unread,
Dec 11, 2012, 11:39:46 AM12/11/12
to js...@googlegroups.com, Diego Perini, ch...@chad-cat-lore-eddie.com
To be clear, jsdom does not choke on invalid HTML. However, JavaScript inside of jsdom that throws errors will... well, throw errors.

There was a bug in 0.3.0, wherein jsdom did not behave like a web browser in terms of CSS parsing behavior. The correct DOM/CSSOM thing to do is to discard such rules. If the correct behavior was to throw an error (as it is when passing invalid selectors to querySelector), then jsdom would do that.


From: js...@googlegroups.com [js...@googlegroups.com] on behalf of Eric Goldberg [eric.g...@gmail.com]
Sent: Tuesday, December 11, 2012 11:24
To: Diego Perini
Cc: js...@googlegroups.com; ch...@chad-cat-lore-eddie.com

Subject: Re: jsdom version 0.3.0 released!

Godmar Back

unread,
Dec 11, 2012, 1:50:29 PM12/11/12
to js...@googlegroups.com
BTW, what's the expected result of running tests with 0.3.1?

I'm seeing:

TOTALS: 7/2559 failed; 99% success

- Godmar

Domenic Denicola

unread,
Dec 11, 2012, 1:54:08 PM12/11/12
to js...@googlegroups.com
That's in the README these days:

https://github.com/tmpvar/jsdom#test-compliance

Curious which extra test you're seeing fail...

________________________________________
From: js...@googlegroups.com [js...@googlegroups.com] on behalf of Godmar Back [god...@gmail.com]
Sent: Tuesday, December 11, 2012 13:50
To: js...@googlegroups.com

Godmar Back

unread,
Dec 11, 2012, 2:02:37 PM12/11/12
to js...@googlegroups.com
READMEs are always good. Here's the failure.

$ ./runner -t filename_with_spaces_in_script_tag_can_be_read
running level2/html.js level2/html
✖ level2/html/filename_with_spaces_in_script_tag_can_be_read

ERROR null vs [ { [Error: ENOENT, open
'/home/gback/cloudbrowser/jsdom/test/level2/html/files/js/script']
errno: 34,
code: 'ENOENT',
path: '/home/gback/cloudbrowser/jsdom/test/level2/html/files/js/script' } ]

There should be no errors when using scripts with spaces in their
filenames AssertionError: There should be no errors when using scripts
with spaces in their filenames
at Object.strictEqual
(/home/gback/cloudbrowser/jsdom/node_modules/nodeunit/lib/types.js:83:39)
at /home/gback/cloudbrowser/jsdom/test/level2/html.js:19806:14
at Array.<anonymous> (/home/gback/cloudbrowser/jsdom/lib/jsdom.js:206:39)
at EventEmitter._tickCallback (node.js:190:38) Error
at /home/gback/cloudbrowser/jsdom/test/runner:188:58
at Array.forEach (native)
at Object.testDone (/home/gback/cloudbrowser/jsdom/test/runner:173:18)
at Array.0 (/home/gback/cloudbrowser/jsdom/node_modules/nodeunit/lib/types.js:145:25)
at EventEmitter._tickCallback (node.js:190:38)

level2/html 0/1 0%
------------------------------
TOTALS: 1/1 failed; 0% success
TIME: 408ms
$ node -v
v0.6.21-pre

Godmar Back

unread,
Dec 11, 2012, 2:13:34 PM12/11/12
to js...@googlegroups.com
PS: the reason is this call to URL.resolve:
https://github.com/tmpvar/jsdom/blob/master/lib/jsdom/level2/html.js#L94

which turns "script with spaces.js" into "script"

I'm wondering though if it isn't the test that's broken. Are spaces
in URLs legal or should they be URIencoded?

- Godmar

Domenic Denicola

unread,
Dec 19, 2012, 10:57:34 PM12/19/12
to js...@googlegroups.com
The intent of this test was to allow scripts with spaces in their filenames, not in URLs. It's curious that this fails for you; I can't get it to fail locally or on Travis. More investigation welcome.

> -----Original Message-----
> From: js...@googlegroups.com [mailto:js...@googlegroups.com] On
> Behalf Of Godmar Back
> Sent: Tuesday, December 11, 2012 14:14
> To: js...@googlegroups.com
> Subject: Re: jsdom version 0.3.0 released!
>

Godmar Back

unread,
Dec 20, 2012, 12:50:53 PM12/20/12
to js...@googlegroups.com

As I said in https://groups.google.com/d/msg/jsdom/20cPVDZvgpc/OWD-mjjapPAJ the problem is with URL.resolve.

Here's a test case that shows the url.resolve API call that makes it fail

$ cat whyitfails.js
var URL = require("url");
base = "file:///home/gback/cloudbrowser/jsdom/test/level2/html.js";
href = "./html/files/js/script with spaces.js";
console.log(URL.resolve(base, href));
$ node whyitfails.js
file:///home/gback/cloudbrowser/jsdom/test/level2/html/files/js/script
$

Running node 0.6.21-pre.  

 - Godmar

Reply all
Reply to author
Forward
0 new messages