Running a CasperJS script in Python using the subprocess module


Joshua Serry

May 19, 2014, 6:01:32 PM
to casp...@googlegroups.com
Hi All,

I would like to run a CasperJS script from Python.

I've been told you can do this using Python's subprocess module. However, I am not overly technical, and so far I have not understood how to call a CasperJS script from Python.

Could someone please create a step-by-step example that explains each line, as well as where my CasperJS script needs to be located relative to my IPython notebook and Python scripts?

I'd love to be able to pass information from CasperJS back to Python so I could use the pandas module to store the data I scrape in an Excel spreadsheet or a database.

If there are ways of exporting data from CasperJS directly to Excel or something similar, that would be great too, though I've not yet seen any examples of this on YouTube.

Many Thanks, Josh.

The script could be as simple as printing "Hello World", so long as that value (i.e. the output from CasperJS) can be captured and used by Python.

 

Srdjan Matic

May 21, 2014, 4:41:52 AM
to casp...@googlegroups.com
Hi Joshua.
Here is how you can achieve that (assuming all the output CasperJS generates goes to *stdout*).

The Python script should look like this:

#!/usr/bin/env python

import subprocess

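CASPERJS_EXECUTABLE = "/bin/casperjs"  # <-- put the path to your casperjs executable here
CASPERJS_SCRIPT = "/tmp/example_casper.js"  # path to the script that casperjs should execute

# check_output runs the command and returns everything it wrote to stdout;
# universal_newlines=True makes the result a text string rather than bytes on Python 3
stdout_as_string = subprocess.check_output([CASPERJS_EXECUTABLE, CASPERJS_SCRIPT], universal_newlines=True)
print(stdout_as_string)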


And this is the content of my "/tmp/example_casper.js" file:
 
var casper = require('casper').create();

casper.start('http://www.google.com/', function() {
    this.echo(this.getTitle());
});

casper.thenOpen('http://www.yahoo.com', function() {
    this.echo(this.getTitle());
});

casper.run()
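
As a follow-up to the question about getting the scraped data into Excel: a minimal sketch, assuming Python 3 and that the CasperJS script prints one value per line. The captured stdout can be split into rows and written out as CSV, which Excel opens directly. The function names, the `page_title` column header and the sample titles are made up for illustration; if `check_output` returns bytes, call `.decode()` on the result first.

```python
import csv
import io


def stdout_to_rows(stdout_text):
    """Split captured stdout into one single-column row per non-empty line."""
    return [[line.strip()] for line in stdout_text.splitlines() if line.strip()]


def rows_to_csv(rows, header):
    """Render the rows as CSV text; save it to a .csv file to open it in Excel."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Stand-in for the string returned by subprocess.check_output (made-up titles).
sample_stdout = "Google\nYahoo\n"
csv_text = rows_to_csv(stdout_to_rows(sample_stdout), ["page_title"])
print(csv_text)
```

The same list of rows could be handed to pandas instead of the csv module if pandas is installed.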

Joshua Serry

May 25, 2014, 6:28:50 PM
to casp...@googlegroups.com
WOW SRDJAN MATIC

Thanks for such a detailed example of how to run a Python script that calls a CasperJS script.


After much messing around I managed to get your scripts working in Windows PowerShell (on my computer they wouldn't run in IPython or in the command prompt). That's my fault, not yours; your script is perfect.

In my Python script I would like to pass a variable to CasperJS (probably a list, or maybe two lists or a dictionary; I doubt CasperJS would understand a pandas DataFrame?).

QUESTION --> I can't figure out how to modify what you have given me to pass a list of search terms from Python to CasperJS, iterate over them, and return the results to Python. Could you please shed some light on this?

I know the example below is syntactically incorrect, but I wanted to give you a sense of what I am trying to do.

QUESTION --> Also, pardon my lack of knowledge, but what exactly is STDOUT?


The python script should look like this:

#!/usr/bin/env python

import subprocess

CASPERJS_EXECUTABLE = "/bin/casperjs" # <-- here you put the path to your casperjs executable
CASPERJS_SCRIPT = "/tmp/example_casper.js" # this is the name of the script that casperjs should execute

stdout_as_string = subprocess.check_output([CASPERJS_EXECUTABLE, CASPERJS_SCRIPT], search_terms[])  < -- Here I passed the list of search terms as a parameter, this is how I would do it with VBA anyway.
 
print stdout_as_string


And this is the content of my "/tmp/example_casper.js" file:
 
var casper = require('casper').create();

   results[]     < --- this is a list which stores all of the results, I want to pass this back to python for further processing.
 
for i in searchterms[]:
    casper.start('http://www.google.com/q=' + searchterm[i], function() {   <-- added the list of search terms, and a for loop to iterate over the search terms
    title = this.getTitle());
          
            results = results[] + title
 
   next i
});

casper.run()
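
On the two questions above, one possible approach, offered as a sketch and not a tested CasperJS setup. STDOUT ("standard output") is the stream a program prints to; `this.echo(...)` in CasperJS and `print` in Python both write to it, and `subprocess.check_output` returns whatever the child process wrote there. To send a list across, each search term can be appended to the command as its own argument, and the results can come back as one JSON string printed to stdout. On the CasperJS side this assumes the script reads the terms from `casper.cli.args` and finishes with something like `this.echo(JSON.stringify(results))`; check that against the CasperJS documentation. So that the Python side runs without CasperJS installed, the example below uses a stand-in child process (a one-line Python program that echoes its arguments as JSON); `build_command`, `run_and_parse` and `ECHO_CHILD` are names invented here.

```python
import json
import subprocess
import sys


def build_command(prefix, search_terms):
    """Append each search term to the command as one positional argument."""
    return list(prefix) + list(search_terms)


def run_and_parse(command):
    """Run the child process and decode the JSON it printed to stdout."""
    raw = subprocess.check_output(command)
    return json.loads(raw.decode("utf-8"))


# Stand-in child so the sketch runs without CasperJS: a tiny Python program
# that echoes its arguments back as a JSON list on stdout.
ECHO_CHILD = "import json, sys; print(json.dumps(sys.argv[1:]))"

search_terms = ["casperjs", "python subprocess"]
command = build_command([sys.executable, "-c", ECHO_CHILD], search_terms)
results = run_and_parse(command)
print(results)
```

With a real script, the prefix would be `[CASPERJS_EXECUTABLE, CASPERJS_SCRIPT]` from the earlier example, and `results` would hold whatever list the CasperJS script printed as JSON.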