jsdom evals js from crawled *real* html pages

Alexandru Topliceanu

Jan 4, 2011, 1:06:07 PM
to nodejs
Hi,

I'm building a crawler using node, jsdom and node-crawler.
I want to use jsdom's jQueryify method after the HTML has been fed to
jsdom.
This fails because scripts inside the fetched page (and inline code,
such as event handlers) throw errors and exceptions when jsdom runs them.
Not to mention, executing untrusted scripts is a big security issue.
I have built a small hack into node-crawler to strip all JS from
the retrieved HTML before jQuerifying it, but I feel this is not an
elegant solution.
Any thoughts on this?

Thank you,
Alex

PS. Sorry if I'm posting this in the wrong place ;D

Elijah Insua

Jan 4, 2011, 1:59:26 PM
to nod...@googlegroups.com
When you call jsdom, disable the script execution feature like so:

jsdom("html", null, { features: { 'ProcessExternalResources' : false } })

If you don't even want the scripts to be fetched you can set the FetchExternalResources feature to false as well.
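
For reference, both settings can be combined in a single options object. This is only a sketch: `safeOptions` and `parseWithoutScripts` are hypothetical names, and the `jsdom` function is passed in as an argument rather than imported, since how it is loaded depends on the jsdom version in use. The feature names and the call shape are taken from the snippet above.

```javascript
// Options that turn off both fetching and executing external resources.
// Feature names are the ones mentioned in this reply.
var safeOptions = {
  features: {
    FetchExternalResources: false,   // do not download scripts at all
    ProcessExternalResources: false  // do not execute scripts
  }
};

// Hypothetical helper: `jsdom` is whatever function builds a document
// from an HTML string, called with the same argument order as above.
function parseWithoutScripts(jsdom, html) {
  return jsdom(html, null, safeOptions);
}
```

With a crawler, the fetched page body would be passed as `html`, and the returned document could then be handed to jQueryify without the page's own scripts having run.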

-- Elijah

