Parallel Browsing


short extender

Jun 13, 2012, 2:43:45 AM
to casp...@googlegroups.com
Hi,

I'm sorry if this is a dumb question.  What I'd like to do is open a URL a, read some information from a, open a separate URL b, read some information from b, and then put this information into a form on a and submit it.  So it's quite like "opening a new tab" and doing something in the background.  Is there a straightforward nice way to do this, without unnecessarily reloading a?

Thanks,
SE

Nicolas Perriault

Jun 13, 2012, 3:12:23 AM
to casp...@googlegroups.com
On Wed, Jun 13, 2012 at 8:43 AM, short extender
<justfo...@googlemail.com> wrote:

> Is there a straightforward nice way to do this, without unnecessarily
> reloading a?

var casper = require('casper').create();
var data = {};

casper.start('http://sitea.com/page1.htm', function() {
    data.title = this.getTitle();
});

casper.thenOpen('http://siteb.com/page2.htm', function() {
    this.fill('form', data, true);
});

casper.run();

++

--
Nicolas Perriault
https://nicolas.perriault.net/
http://www.akei.com/
Skype: nperriault
Phone: +33 (0) 660 92 08 67

short extender

Jun 13, 2012, 3:38:43 AM
to casp...@googlegroups.com
Thank you for your quick reply, but I don't think this is what I meant.  I'd like to post something to "page1.htm" after having retrieved "page2.htm", without reloading "page1.htm".

SE

Nicolas Perriault

Jun 13, 2012, 3:48:59 AM
to casp...@googlegroups.com
2012/6/13 short extender <justfo...@googlemail.com>:

> Thank you for your quick reply, but I don't think this is what I meant.  I'd
> like to post something to "page1.htm" after having retrieved "page2.htm",
> without reloading "page1.htm".

Ah ok I see, sorry for the misunderstanding. There's no official
support of parallel browsing right now in casperjs, but I've just
tried to set up two instances of casper and it looks to be working:

var google = require('casper').create();
var yahoo = require('casper').create();

google.start('http://google.com/');

yahoo.start('http://yahoo.com/', function() {
    this.echo(google.getTitle());
});

google.run(function() {});

yahoo.run(function() {});

setTimeout(function() {
    yahoo.exit();
}, 5000);

Handle with care though.

short extender

Jun 13, 2012, 12:27:20 PM
to casp...@googlegroups.com
Awesome, thanks!

I think that, in general, being able to create several instances simultaneously is a good feature.  I saw some posts where people were running CasperJS several times to scrape simultaneously and were discussing performance issues.  This approach can probably run many more instances for the same amount of memory.

Best,
SE

short extender

Jun 23, 2012, 4:42:50 AM
to casp...@googlegroups.com
Sorry to bother you again. Coming back to this: here is how I'd like to use it, but it doesn't work and I can't see why:


var google = require('casper').create();

google.start('http://google.com/', function() {
    // Do something like get the first result for "casperjs"
    var newBrowser = require('casper').create();
    newBrowser.start('http://http://casperjs.org/', function() {
        // This is never reached, unfortunately
        // Do something like return the first word of the website to use for the next google search
    });
    newBrowser.run();
    newBrowser.exit();
});

google.run();

It would be really helpful if one could spawn a second casper instance like this.  Is there a way to do this?

Thanks!

SE


short extender

Jun 23, 2012, 4:43:44 AM
to casp...@googlegroups.com


On Saturday, June 23, 2012 1:42:50 AM UTC-7, short extender wrote:
> newBrowser.start('http://http://casperjs.org/', function() {

Of course just one http... that's not the problem.

Nicolas Perriault

Jun 23, 2012, 5:18:50 AM
to casp...@googlegroups.com
On Sat, Jun 23, 2012 at 10:42 AM, short extender
<justfo...@googlemail.com> wrote:

> It would be really helpful if one could spawn a second casper instance like
> this.  Is there a way to do this?

You may have a look at the dynamic.js sample:
https://github.com/n1k0/casperjs/blob/master/samples/dynamic.js

Notice how it can run suites consecutively passing the `check` closure
to casper.run(). That should get you started, but in my own
experience, these kinds of parallel workflows are very, very tedious
to handle.
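
A minimal sketch of that consecutive pattern, written as plain JavaScript so the control flow is visible. `makeCheck` and `visit` are hypothetical names, not part of CasperJS, and this is not the dynamic.js source; the only calls assumed on the instance are `run(onComplete)` and `exit()`, as used elsewhere in this thread.

```javascript
// Hypothetical helper (not a CasperJS API): builds a `check` closure that
// queues one URL per run and exits once the list is exhausted.
function makeCheck(urls, visit) {
    var index = 0;
    return function check() {
        if (index < urls.length) {
            // Queue the steps for the next URL, then run again with the
            // same closure as the completion callback.
            visit.call(this, urls[index]);
            index++;
            this.run(check);
        } else {
            this.exit();
        }
    };
}

// Intended use with a real instance (assumption, untested here):
//   var casper = require('casper').create();
//   casper.start();
//   casper.run(makeCheck(['http://google.com/', 'http://casperjs.org/'],
//       function(url) {
//           this.start(url, function() { this.echo(this.getTitle()); });
//       }));
```

Each URL is only opened after the previous run has completed, so this is sequential rather than parallel; the benefit is that `exit()` is never called while steps are still pending.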

HTH

me

Jun 26, 2012, 7:54:59 PM
to casp...@googlegroups.com
I'm confused as to how https://github.com/n1k0/casperjs/blob/master/samples/dynamic.js works in parallel when it runs suites consecutively.

I too want to crawl sites in parallel (although I'm not inserting data from one site into another as the OP).  I wanted multiple separate Casper instances, preferably spawned by a master Node.js process.  Unfortunately, I could not get it to work.  My next solution is below:

var casper = require('casper'),
    c1Done = false,
    c2Done = false;

var c1 = casper.create({verbose:true,logLevel:'debug'});
c1.on('run.complete',function(){this.test.comment('c1 run.complete');});
c1.on('exit',        function(s){this.test.comment('c1 exit: '+s);});
c1.on('error',       function(m,b){c1Done = true; this.test.comment('c1 error: '+m);});
c1.on('load.failed', function(){c1Done = true; this.test.comment('c1 load.failed');});
c1.start('http://google.com',function(){this.test.comment('c1 start');});
c1.run(function(){
  this.echo('c1 onComplete');
  c1Done = true;
  if(c2Done)
    this.exit('c1');
});

var c2 = casper.create({verbose:true,logLevel:'debug'});
c2.on('run.complete',function(){this.test.comment('c2 run.complete');});
c2.on('exit',        function(s){this.test.comment('c2 exit: '+s);});
c2.on('error',       function(m,b){c2Done = true; this.test.comment('c2 error: '+m);});
c2.on('load.failed', function(){c2Done = true; this.test.comment('c2 load.failed');});
c2.start('http://bing.com',function(){this.test.comment('c2 start');});
c2.run(function(){
  this.echo('c2 onComplete');
  c2Done = true;
  if(c1Done)
    this.exit('c2');
});

Not sure if this is the best way, especially since exit() is only being called on one of the Casper vars, but it seems to work as I need it to.  Notice that c1 or c2 can finish first. This, to me, means it's truly parallel.

If I'm wrong, please correct me.
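
The two hand-written flags in the script above generalize to any number of instances. A minimal sketch, assuming only the `run(onComplete)`, `on(...)` and `exit()` calls already used in this thread; `createBarrier` is a hypothetical helper, not a CasperJS API.

```javascript
// Hypothetical helper: returns a function that marks one named instance as
// finished and fires `onAllDone` exactly once, when every name has reported.
function createBarrier(names, onAllDone) {
    var pending = Object.create(null);
    var remaining = 0;
    names.forEach(function(name) {
        if (!pending[name]) {
            pending[name] = true;
            remaining++;
        }
    });
    return function done(name) {
        if (!pending[name]) {
            return false; // unknown name, or this instance already reported
        }
        pending[name] = false;
        remaining--;
        if (remaining === 0) {
            onAllDone();
        }
        return true;
    };
}

// Intended use (assumption, mirrors the c1/c2 script above):
//   var done = createBarrier(['c1', 'c2'], function() { c1.exit(); });
//   c1.run(function() { done('c1'); });
//   c2.run(function() { done('c2'); });
//   c1.on('error', function() { done('c1'); }); // a failure also counts as finished
```

Because a name can only report once, an instance that fires both an error event and its completion callback does not trigger the exit early.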

short extender

Jun 27, 2012, 3:05:54 AM
to casp...@googlegroups.com
Thank you very much, will try!

Рустем Муслимов

Jan 27, 2013, 7:13:19 AM
to casp...@googlegroups.com
Yet another attempt at parallel loading:

var google = require('casper').create({
    logLevel: 'info',
    verbose: true,
});

var google_e = false, bing_e = false;

function exit() {
    if (google_e && bing_e) {
        google.log('Exit.', 'info');
        google.exit();
    }
}

google.start('http://google.com/', function() {
    this.log(google.getTitle(), 'info');

    var bing = require('casper').create({
        logLevel: 'info',
        verbose: true,
    });

    bing.start('http://www.bing.com/', function() {
        bing.log(bing.getTitle(), 'info');
    });

    bing.run(function() {
        this.log('Bing done.', 'info');
        bing_e = true;
        exit();
    });
});


google.run(function() {
    this.log('Google Done.', 'info');
    google_e = true;
    exit();
});



Jonathan Langevin

Aug 25, 2014, 10:41:00 AM
to casp...@googlegroups.com
Is the session data (cookies, etc.) shared across the spawned instances, for example if I opened two different Google URLs instead of google.com and bing.com, or does each instance create its own session?

apk

Dec 20, 2017, 6:52:59 AM
to CasperJS
This is exactly what I want to know as well: is context/session data (cookies, headers, etc.) also shared among these multiple casper instances?

I see different ways of running multiple casper instances:
- As above: multiple casper instances in one script, running in one shell.
- Open several shell prompts and run scripts in parallel in those shells (even if it's the same script).
- What if we use different proxies in different casper instances, whether created in the same script or in scripts running in different shells?
- Does it make any difference if we use the phantom/casper executable from the same location in the scenarios above?  Or if we make multiple copies of the phantom/casper executables and use one copy per shell?
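
For the separate-process scenarios in this list, each process can at least be given its own cookie file and proxy on the command line. A minimal sketch, assuming a `casperjs` executable on the PATH that forwards PhantomJS's `--cookies-file` and `--proxy` options; `buildArgs`, `launchAll`, the script name and the cookie file names are hypothetical, and this does not settle how instances inside one script share state.

```javascript
// Hypothetical helper: builds the argument list for one isolated run.
function buildArgs(script, index, proxy) {
    // One cookie file per process so persisted cookies do not collide.
    var args = ['--cookies-file=cookies-' + index + '.txt'];
    if (proxy) {
        args.push('--proxy=' + proxy);
    }
    args.push(script);
    return args;
}

// Launches one OS process per proxy entry from Node.js.
// Assumption: `casperjs` is installed and on the PATH.
function launchAll(script, proxies) {
    var spawn = require('child_process').spawn;
    return proxies.map(function(proxy, index) {
        return spawn('casperjs', buildArgs(script, index, proxy), { stdio: 'inherit' });
    });
}

// Example (not executed here):
//   launchAll('scrape.js', ['10.0.0.1:8080', '10.0.0.2:8080']);
```

Separate OS processes do not share in-memory state, so this keeps the runs apart at the cost of one PhantomJS process per job.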

What happens to context/session data in each of the scenarios above?

Many combinations!
Does anybody have any experience with this?

Ken Soh

Jan 10, 2018, 11:30:28 PM
to CasperJS
I don't have much experience with this, but as I understand it, the bottleneck when running in parallel is the temp files used by PhantomJS, and maybe by CasperJS too. Installing separate copies of them on the same server should work, but I have not tested this thoroughly. I look forward to hearing more from others here.