Dynamic content scrape with Node.js
 
Mark Hahn  
From: Mark Hahn <m...@hahnca.com>
Date: Sat, 6 Oct 2012 14:09:17 -0700
Local: Sat, Oct 6 2012 5:09 pm
Subject: Re: [nodejs] Dynamic content scrape with Node.js

1) You should consider using the node `request` module to scrape instead of cURL.

2) Any scraping is only going to return what you request. This is only
going to be the initially provided static content.  You are getting this
from the server, not the client. There is no way to get anything from the
client.

3) You will have to simulate the client and run the JS inside of your app.
 The easiest way to do this is to use a "headless" client.  I suggest you
use Zombie at http://zombie.labnotes.org

On Sat, Oct 6, 2012 at 1:34 PM, Narek Musakhanyan <nmusa...@gmail.com> wrote:

rektide  
From: rektide <rekt...@voodoowarez.com>
Date: Sat, 6 Oct 2012 18:04:13 -0400
Local: Sat, Oct 6 2012 6:04 pm
Subject: Re: [nodejs] Dynamic content scrape with Node.js
Only just picked it up last week, but it worked well enough-- node.io.  It exposes a
jQuery-esque interface for querying scraped pages. Extremely high level, "just works"
scraping module, in my book!

It also has a fairly sizable task-processing system built in, which I have not used.

Good luck:
https://github.com/chriso/node.io

-rektide

Dave Kuhn  
From: Dave Kuhn <david.s.k...@gmail.com>
Date: Sat, 6 Oct 2012 20:46:05 -0700
Local: Sat, Oct 6 2012 11:46 pm
Subject: Re: [nodejs] Dynamic content scrape with Node.js

Good suggestions so far, though I highly recommend you check out phantomjs.org. Phantom is a headless version of WebKit, the rendering engine behind Chrome and Safari. In my book it's the most comprehensive solution for handling AJAX content when scraping, since it's technically the same as interacting with a page loaded in your browser.

--
Dave Kuhn

Stephan Bardubitzki  
From: Stephan Bardubitzki <bard...@gmail.com>
Date: Sun, 7 Oct 2012 16:14:50 -0700
Local: Sun, Oct 7 2012 7:14 pm
Subject: Re: [nodejs] Dynamic content scrape with Node.js

Another option would be

https://github.com/MatthewMueller/cheerio

Tutorial:

http://vimeo.com/31950192

Chad Engler  
From: "Chad Engler" <Chad.Eng...@patlive.com>
Date: Mon, 8 Oct 2012 12:53:15 -0400
Local: Mon, Oct 8 2012 12:53 pm
Subject: RE: [nodejs] Dynamic content scrape with Node.js

This is probably the same person who asked this question on
StackOverflow:

http://stackoverflow.com/questions/12630891/scrape-data-generated-by-javascript-on-server-side-from-webpages-aspx

I have already answered his question there; he just didn't like the answer:

http://stackoverflow.com/questions/12630891/scrape-data-generated-by-javascript-on-server-side-from-webpages-aspx#comment17032399_12630891

-Chad

        On Sat, Oct 06, 2012 at 01:34:03PM -0700, Narek Musakhanyan wrote:

                Hey guys. I tried to scrape data from a website using
                PHP's cURL library, but I failed, since cURL only lets
                you scrape static content. The content I want to scrape
                changes via JavaScript (AJAX), and since cURL can't
                handle that, I couldn't scrape it with cURL. I heard
                this type of thing can be done with Node. Basically, I
                need my Node app to run the page's JS, wait until the
                AJAX is done, and then pass the result to PHP. Is that
                possible with Node.js? I don't know Node and have to
                start from scratch, so I am asking you to point me to
                the right Node framework to get the result I described.

        --
        Job Board: http://jobs.nodejs.org/
        Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
        You received this message because you are subscribed to the Google
        Groups "nodejs" group.
        To post to this group, send email to nodejs@googlegroups.com
        To unsubscribe from this group, send email to
        nodejs+unsubscribe@googlegroups.com
        For more options, visit this group at
        http://groups.google.com/group/nodejs?hl=en?hl=en

greelgorke  
From: greelgorke <greelgo...@gmail.com>
Date: Tue, 9 Oct 2012 00:25:06 -0700 (PDT)
Local: Tues, Oct 9 2012 3:25 am
Subject: Re: [nodejs] Dynamic content scrape with Node.js

Why so complicated? Just find out the URL of the AJAX request and make it
yourself with whatever library you want...
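greelgorke's suggestion can be sketched in TypeScript roughly as follows. It assumes you have already found the AJAX endpoint in your browser's developer tools (the example URL and the `page`/`sort` parameters are hypothetical), that the endpoint returns JSON, and that you run Node 18 or later, where `fetch` and `URL` are globals. jQuery-style AJAX calls send an `X-Requested-With: XMLHttpRequest` header and some servers check for it, so the sketch sets it; whether your target needs it is an assumption.

```typescript
// The request the page's own JavaScript would send, reconstructed by hand.
interface AjaxRequest {
  url: string;
  headers: Record<string, string>;
}

// Rebuild the AJAX call: endpoint URL plus query parameters and headers.
function buildAjaxRequest(
  endpoint: string,
  params: Record<string, string>,
): AjaxRequest {
  const url = new URL(endpoint);
  for (const key of Object.keys(params)) {
    url.searchParams.set(key, params[key]);
  }
  return {
    url: url.toString(),
    headers: {
      // Sent by jQuery-style AJAX calls; some servers check for it (assumption).
      "X-Requested-With": "XMLHttpRequest",
      Accept: "application/json",
    },
  };
}

// Replay the request with Node 18+'s global fetch and parse the JSON body.
async function fetchAjaxData(req: AjaxRequest): Promise<unknown> {
  const res = await fetch(req.url, { headers: req.headers });
  if (!res.ok) {
    throw new Error(`AJAX endpoint returned HTTP ${res.status}`);
  }
  return res.json();
}
```

Usage against the hypothetical endpoint: `fetchAjaxData(buildAjaxRequest("https://example.com/data/prices.json", { page: "2" })).then(console.log);`. No JavaScript from the page is executed, which is why this stays fast and simple but breaks when the page computes its request parameters in script.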

On Monday, October 8, 2012 18:53:27 UTC+2, Chad Engler wrote:

Dave Kuhn  
From: Dave Kuhn <david.s.k...@gmail.com>
Date: Tue, 9 Oct 2012 07:26:42 -0700
Local: Tues, Oct 9 2012 10:26 am
Subject: Re: [nodejs] Dynamic content scrape with Node.js

True, you can get pretty far doing that, but it gets difficult when crucial bits of information are hidden inside script tags and the like. Managing cookies for ASP.NET pages, among others, is also a pain in the butt. PhantomJS spares you all that hassle by giving you a fully resolved DOM and automatic cookie support.
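Without a headless browser, the two chores Dave describes look roughly like this TypeScript sketch: pulling a JSON object out of an inline `<script>` tag, and carrying cookies (such as ASP.NET's `ASP.NET_SessionId`) from one response to the next request. It assumes the page assigns its data as `var <name> = {...};` with valid JSON and no `};` inside string values, and the cookie jar ignores `Domain`, `Path`, `Expires`, and similar attributes. Both are deliberate simplifications, which is part of the argument for letting a real browser engine like PhantomJS do this work.

```typescript
// Find `var <varName> = {...};` in inline scripts and parse the object literal as JSON.
// Assumes varName is a plain identifier and the object is valid JSON containing no "};".
function extractScriptJson(html: string, varName: string): unknown {
  const pattern = new RegExp(`var\\s+${varName}\\s*=\\s*(\\{[\\s\\S]*?\\});`);
  const match = pattern.exec(html);
  return match ? JSON.parse(match[1]) : undefined;
}

// Minimal cookie jar: remembers name=value pairs from Set-Cookie header values
// and replays them as a Cookie request header. Ignores Domain, Path, Expires, etc.
class CookieJar {
  private readonly cookies = new Map<string, string>();

  store(setCookieHeaders: string[]): void {
    for (const header of setCookieHeaders) {
      const pair = header.split(";")[0]; // drop attributes such as "path=/"
      const eq = pair.indexOf("=");
      if (eq <= 0) continue; // skip malformed entries
      this.cookies.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
    }
  }

  header(): string {
    const parts: string[] = [];
    this.cookies.forEach((value, name) => {
      parts.push(`${name}=${value}`);
    });
    return parts.join("; ");
  }
}
```

In use, you would feed each response's `Set-Cookie` values to `jar.store(...)` and send `jar.header()` as the `Cookie` header on the next request; a later cookie with the same name overwrites the earlier value.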

--  
Dave Kuhn