MultiThreading With PhantomJS


Clicky

Feb 15, 2013, 3:56:43 PM
to phan...@googlegroups.com
Hello,

I'm using a Java headless browser library called HtmlUnit, but because its JavaScript execution is unreliable, I decided that PhantomJS would be better for what I'm doing. However, I noticed that using PhantomJS in a similar fashion to HtmlUnit requires either starting it as separate processes or creating multiple webpages, neither of which can be called multi-threading. My question is this: will the following be possible with PhantomJS?

1. In HtmlUnit I can create a new WebClient on each new thread, each with its own proxy, so I can run something like 50 threads all scraping and parsing web pages. Can PhantomJS be used in a similar way? If yes, how?

You are doing a good job with PhantomJS, and I appreciate your effort in giving us this powerful headless browser.

Wishing you all the best.

James Greene

Feb 15, 2013, 7:44:01 PM
to phan...@googlegroups.com

Sure, just spin up several PhantomJS processes. They are all self-contained, with the possible exceptions of caching/storage file location and, if using the "webserver" module, port number.
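A minimal sketch of that approach, assuming a Node.js orchestrator and a `phantomjs` binary on the PATH. The script name `scrape.js`, the proxies, the data directories, and the URLs are all hypothetical placeholders. Each worker gets its own proxy plus its own cookie and local-storage locations, so the processes do not share storage files:

```javascript
// orchestrator.js: run with Node.js; assumes a `phantomjs` binary on the PATH.
var spawn = require('child_process').spawn;

// Build the command line for one worker. PhantomJS options go before the
// script name; anything after the script is passed to the script itself.
function buildWorkerArgs(script, worker) {
  return [
    '--proxy=' + worker.proxy,
    '--cookies-file=' + worker.dataDir + '/cookies.txt',
    '--local-storage-path=' + worker.dataDir + '/storage',
    script,
    worker.url
  ];
}

// Start one PhantomJS process per worker description and return the children.
function startWorkers(script, workers) {
  return workers.map(function (worker) {
    var child = spawn('phantomjs', buildWorkerArgs(script, worker), { stdio: 'inherit' });
    // Report a failed launch (for example, a missing binary) instead of crashing.
    child.on('error', function (err) {
      console.error('could not start worker for ' + worker.url + ': ' + err.message);
    });
    return child;
  });
}

// Hypothetical worker list; every value here is a placeholder.
var workers = [
  { proxy: '10.0.0.1:8080', dataDir: '/tmp/phantom-0', url: 'http://example.com/a' },
  { proxy: '10.0.0.2:8080', dataDir: '/tmp/phantom-1', url: 'http://example.com/b' }
];

// To launch the workers: startWorkers('scrape.js', workers);
```

The launch call is left as a comment so the file can be loaded without starting anything. Extending the `workers` list to 50 entries gives 50 independent processes, which is the closest equivalent to 50 HtmlUnit threads.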

Alternatively, you could keep just a handful of PhantomJS processes running (think of them as a service) and send work to them by various mechanisms: add a file to one or more folders that the PhantomJS instances are "watching"; have your PhantomJS instances listen on HTTP ports with the "webserver" module and send HTTP requests to them from your orchestrator process; or you could probably hook up some sort of WebSocket stream between PhantomJS and the orchestrator process, etc.
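The "webserver" variant might look roughly like the sketch below. The file name `worker.js`, the default port 8081, and the `/render?url=...` request shape are assumptions made for illustration, not anything PhantomJS prescribes. The PhantomJS-specific part is guarded so that the helper function can also be loaded and checked outside PhantomJS:

```javascript
// worker.js: start with `phantomjs worker.js 8081` (the port is a hypothetical choice).

// Extract the "url" query parameter from a request path such as
// "/render?url=http%3A%2F%2Fexample.com". Returns null when it is absent.
function targetFromRequest(requestUrl) {
  var match = /[?&]url=([^&]*)/.exec(requestUrl);
  return match ? decodeURIComponent(match[1]) : null;
}

// Only start the server when running inside PhantomJS.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var port = parseInt(system.args[1], 10) || 8081;
  var server = require('webserver').create();

  var listening = server.listen(port, function (request, response) {
    var target = targetFromRequest(request.url);
    if (!target) {
      response.statusCode = 400;
      response.write('missing url parameter');
      response.close();
      return;
    }
    // One page per request; it is closed once the response has been sent.
    var page = require('webpage').create();
    page.open(target, function (status) {
      response.statusCode = status === 'success' ? 200 : 502;
      response.write(status === 'success' ? page.content : 'load failed');
      response.close();
      page.close();
    });
  });

  if (!listening) {
    console.log('could not listen on port ' + port);
    phantom.exit(1);
  }
}
```

The orchestrator would then request something like `http://localhost:8081/render?url=...` from each long-running worker, with each worker started on a different port.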

~~James


hazzadous

Feb 15, 2013, 8:37:04 PM
to phan...@googlegroups.com
On a side note, do we operate in a cooperative fashion (i.e., only one JS script can execute at any one time), or does each webpage have its own thread of execution? It seems like the former. Obviously that comes with its own problems when executing external content, e.g., there is no way to stop infinite loops other than at the process level.
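Stopping a runaway script at the process level could be sketched as follows, assuming a Node.js orchestrator. `runWithTimeout` is a hypothetical helper, and it uses the blocking `spawnSync` call only to keep the example short; a real orchestrator running many workers would more likely use the asynchronous `spawn` with its own timer:

```javascript
var spawnSync = require('child_process').spawnSync;

// Run a command and have Node.js kill it if it is still running after timeoutMs.
// Returns whether the timeout was hit, plus the exit status and captured stdout.
function runWithTimeout(command, args, timeoutMs) {
  var result = spawnSync(command, args, { timeout: timeoutMs, encoding: 'utf8' });
  var timedOut = Boolean(result.error && result.error.code === 'ETIMEDOUT');
  return { timedOut: timedOut, status: result.status, stdout: result.stdout };
}

// Hypothetical usage: give a PhantomJS script 30 seconds before killing it.
// var outcome = runWithTimeout('phantomjs', ['scrape.js', 'http://example.com/'], 30000);
// if (outcome.timedOut) { console.log('worker was killed after the timeout'); }
```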

hazzadous

Feb 15, 2013, 8:52:30 PM
to phan...@googlegroups.com
For reference, my test:

var webpage = require('webpage');
var page = webpage.create();
page.content = '<script>for(;;);</script>';
phantom.exit();

James Greene

Feb 15, 2013, 8:58:40 PM
to phan...@googlegroups.com

You are correct: because JavaScript is single-threaded, an individual PhantomJS process is also single-threaded.
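As a generic illustration of that single-threaded model (plain JavaScript, not a test of PhantomJS internals), a timer callback cannot run while a synchronous loop is still occupying the thread:

```javascript
// Schedule a callback to run as soon as possible.
var timerFired = false;
setTimeout(function () { timerFired = true; }, 0);

// Busy-wait for roughly 50 ms. The timer is already due, but its callback
// cannot run until this synchronous code gives control back to the event loop.
var start = Date.now();
while (Date.now() - start < 50) { /* spin */ }

var firedDuringLoop = timerFired;
console.log('timer fired during loop: ' + firedDuringLoop); // prints "false"

// When run as a PhantomJS script, exit explicitly.
if (typeof phantom !== 'undefined') { phantom.exit(); }
```

A loop that never ends, like the `for(;;);` test above, therefore starves everything else that shares the thread.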
~~James

James Greene

Feb 15, 2013, 9:01:31 PM
to phan...@googlegroups.com

This is also why modern browsers have a prompt to halt script execution for long-running scripts. Not sure if QtWebKit exposes such a signal for us to connect to but it would be very useful!
~~James

James Greene

Feb 15, 2013, 9:12:19 PM
to phan...@googlegroups.com

No signal, rather we just need to override the `QWebPage#shouldInterruptJavaScript` method:
    http://qt-project.org/doc/qt-5.0/qtwebkit/qwebpage.html#shouldInterruptJavaScript

We already override that, but we currently just tell it to keep processing. We could easily add a signal/callback there to allow the user to programmatically halt the script. If you are interested in this functionality, please file a new issue for the feature request and then reply here with the link to it.

Thanks!

~~James

hazzadous

Feb 15, 2013, 9:28:29 PM
to phan...@googlegroups.com
Great. Just back from crashing Chrome to see what it does. Funny stuff: it seems to treat scripts run from DevTools differently from on-page scripts.

Will file for a feature request, thanks.

On another side note, and in an attempt to bring something useful to the OP, I've seen people mention using webdis as a way of running worker processes in PhantomJS. It may be of some interest, although I have never tried it.

Anderson Wiese

Feb 16, 2013, 1:09:45 PM
to phan...@googlegroups.com
If I remember correctly from the early days of Chrome, one of its architectural features was that it runs each tab in a separate process. So each page environment is single-threaded, but independent of other pages. That could affect what you see when testing runaway behavior in Chrome. Back then I did some testing to prove to myself that scripts that would bring down the whole Firefox browser only affected one tab in Chrome.