How to scale PhantomJS to run many processes concurrently?

@GotNoSugarBaby

Jan 25, 2012, 5:41:49 AM
to phan...@googlegroups.com
Hi all,
I would really appreciate some help with this problem if anyone has the time - thank you.

A simplified and paraphrased version of the use case is:
  1. Many users have an account in my app (AppLocal).
  2. My users all have accounts with 2 other apps (AppRemote1 and AppRemote2).
  3. Users of AppRemote1&2 use both sites regularly, repeating the same tasks on both.
  4. AppLocal users store their login details for AppRemote1&2 in their AppLocal account.
  5. AppLocal users now do each of those tasks once, in AppLocal, instead of separately at AppRemote1&2.
  6. AppRemote1 has a lame API and AppRemote2 doesn't provide one, so we use PhantomJS to control AppRemote1&2 via their front ends.
Sounds a little weird I know...

In this use case, maintaining a queue into a single PhantomJS instance would be far too slow for even a small number of users. My current thinking is that I'll need a large number of individual PhantomJS sessions running independently of each other (right?).

What general direction would you guys send me in to try and support this use case? 
Would you recommend running many instances on Virtual Servers and relaying jobs between them? or something else?

Thanks for your time,

Jamie

Solomon White

Jan 25, 2012, 8:27:35 AM
to phan...@googlegroups.com
On Wed, Jan 25, 2012 at 3:41 AM, @GotNoSugarBaby <siun...@gmail.com> wrote:
> What general direction would you guys send me in to try and support this use case?
> Would you recommend running many instances on Virtual Servers and relaying jobs between them? or something else?

If I was building something like this, I would set up a message queuing system to distribute the jobs, and then horizontal scaling becomes easier.  I mostly work in Ruby, so I would recommend checking out Resque or Beanstalk -- but there's probably something similar available if you are working in another language.

Hope this helps,

Solomon
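A minimal sketch of the queue-and-workers pattern Solomon describes, using Python's standard-library `queue.Queue` and threads as an in-process stand-in for a broker such as Resque or beanstalkd. `run_job` is hypothetical: in the use case above it would drive one PhantomJS session against AppRemote1 or AppRemote2. Each worker pulls jobs until it receives a `None` stop sentinel.

```python
import queue
import threading


def run_job(job):
    """Hypothetical job handler. In the use case above it would drive one
    PhantomJS session; here it just tags the job so the flow is visible."""
    return f"done:{job}"


def worker(jobs, results, lock):
    # Pull jobs until the None stop sentinel arrives.
    while True:
        job = jobs.get()
        if job is None:
            break
        result = run_job(job)
        with lock:
            results.append(result)


def process_all(job_list, n_workers):
    """Push every job onto a shared queue and let n_workers consume it."""
    jobs = queue.Queue()
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, results, lock))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for job in job_list:
        jobs.put(job)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker so each one exits
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(sorted(process_all([f"user{i}" for i in range(6)], n_workers=3)))
```

Scaling horizontally then means running more consumers; with a real broker those consumers can live on other machines, which is what makes the queue worth having.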

Jon Leighton

Jan 25, 2012, 8:51:00 AM
to phan...@googlegroups.com
On 25/01/12 13:27, Solomon White wrote:
> If I was building something like this, I would set up a message queuing
> system to distribute the job, and then horizontal scaling becomes
> easier. I mostly work in Ruby, so I would recommend checking out Resque
> <https://github.com/defunkt/resque> or Beanstalk
> <http://kr.github.com/beanstalkd/> -- but there's probably something
> similar available if you are working in another language.

Most of the time is presumably going to be spent doing network I/O, so
you could use a single phantomjs process to achieve a level of
concurrency. Remember that you can create any number of WebPage objects
within PhantomJS. And I believe each WebPage will be in its own thread
and so won't block other WebPages. You could receive messages on a
WebSocket inside the "main" phantomjs context, and then use those messages
to spawn WebPages that do stuff.

This is kinda how Poltergeist works - see the code at
https://github.com/jonleighton/poltergeist.

Cheers

--
http://jonathanleighton.com/
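The concurrency Jon describes comes from overlapping network waits rather than from threads, as Leo points out in the next reply. A minimal Python sketch of the idea using asyncio (an analogy, not PhantomJS's API): each `load_page` coroutine stands in for one WebPage, `asyncio.sleep` stands in for network latency, and the example URLs are placeholders. Ten loads complete in about the time of one, all on a single thread.

```python
import asyncio
import threading
import time


async def load_page(url, delay):
    # Stand-in for one WebPage: awaiting models waiting on the network,
    # during which the event loop is free to service the other pages.
    await asyncio.sleep(delay)
    return url, threading.get_ident()


async def load_all(urls, delay=0.2):
    # Start every load at once and wait for all of them.
    return await asyncio.gather(*(load_page(u, delay) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page{i}" for i in range(10)]
    start = time.perf_counter()
    results = asyncio.run(load_all(urls))
    elapsed = time.perf_counter() - start
    thread_ids = {tid for _, tid in results}
    print(f"{len(results)} pages in {elapsed:.2f}s on {len(thread_ids)} thread(s)")
```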

Leo Franchi

Jan 25, 2012, 9:05:36 AM
to phantomjs
On Jan 25, 8:51 am, Jon Leighton <j...@jonathanleighton.com> wrote:
> Most of the time is presumably going to be spent doing network I/O, so
> you could use a single phantomjs process to achieve a level of
> concurrency. Remember that you can create any number of WebPage objects
> within PhantomJS. And I believe each WebPage will be in its own thread
> and so won't block other WebPages. You could receive messages on a
> WebSocket inside the "main" phantomjs context, and then use those messages
> to spawn WebPages that do stuff.

I don't think each WebPage will be in its own thread. QtWebKit in 4.x
is not threaded and must live on the main thread (due to it using
QWidgets, which must be on the GUI thread). In Tomahawk we get
'around' this by making all the JS we execute use asynchronous
XHR (as that's been the slow part for us). If you look at
Phantom::createWebPage, it simply creates a new WebPage/QWebPage in the
main thread.

cheers,
leo
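Leo's point is that everything runs on one main thread, so any synchronous wait stalls every page at once, which is why Tomahawk makes its XHRs asynchronous. The Python sketch below illustrates the same effect with asyncio (an analogy, not QtWebKit code; all names are illustrative): a heartbeat task records when it gets to run, and the largest gap between beats shows how long the thread was stalled by a blocking versus a non-blocking 0.3 s wait.

```python
import asyncio
import time


async def heartbeat(ticks, interval=0.05, count=6):
    # Record each moment this task gets to run; long gaps mean the thread stalled.
    for _ in range(count):
        ticks.append(time.perf_counter())
        await asyncio.sleep(interval)


async def blocking_fetch():
    await asyncio.sleep(0.01)  # let the heartbeat start first
    time.sleep(0.3)  # synchronous wait: nothing else on this thread can run


async def async_fetch():
    await asyncio.sleep(0.01)
    await asyncio.sleep(0.3)  # asynchronous wait: the loop keeps serving others


async def max_gap(fetch):
    """Run the heartbeat alongside fetch() and return the longest gap between beats."""
    ticks = []
    await asyncio.gather(heartbeat(ticks), fetch())
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    print(f"blocking: {asyncio.run(max_gap(blocking_fetch)):.2f}s")
    print(f"async:    {asyncio.run(max_gap(async_fetch)):.2f}s")
```

With `blocking_fetch` the largest gap is roughly 0.3 s; with `async_fetch` it stays near the 0.05 s heartbeat interval.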

Jon Leighton

Jan 25, 2012, 9:22:20 AM
to phan...@googlegroups.com
On 25/01/12 14:05, Leo Franchi wrote:
> I don't think each WebPage will be in its own thread. QtWebKit in 4.x
> is not threaded and must live on the main thread (due to it using
> QWidgets, which must be on the GUI thread). In Tomahawk we get
> 'around' this by making all the JS we execute use asynchronous
> XHR (as that's been the slow part for us). If you look at
> Phantom::createWebPage, it simply creates a new WebPage/QWebPage in the
> main thread.

You're right, it's not threaded, but several different pages can be
doing network I/O at the same time. So if the bottleneck is network I/O
then a degree of concurrency could be achieved this way, but it would
certainly need to be complemented by having a pool of processes too.

Cheers

--
http://jonathanleighton.com/
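Jon's combined suggestion (a pool of processes, each overlapping several page loads) can be sketched as follows, again in Python with simulated work. Each child process stands in for one PhantomJS instance and runs its batch concurrently with asyncio, while the parent starts all children at once and gathers their JSON output. `CHILD_PROGRAM`, `split_round_robin` and `run_hybrid` are illustrative names, not part of any PhantomJS tooling; a real worker would launch `phantomjs` with a script instead of a Python child.

```python
import json
import subprocess
import sys

# Child program: stands in for one PhantomJS process with several page loads
# in flight at once. Each "page load" is simulated with asyncio.sleep.
CHILD_PROGRAM = r"""
import asyncio, json, os, sys

async def load_page(job):
    await asyncio.sleep(job["delay"])  # simulated network wait
    return {"job": job["id"], "pid": os.getpid()}

async def main(batch):
    return await asyncio.gather(*(load_page(j) for j in batch))

batch = json.loads(sys.argv[1])
print(json.dumps(asyncio.run(main(batch))))
"""


def split_round_robin(jobs, n):
    """Deal jobs out to n batches in turn, dropping any empty batches."""
    batches = [[] for _ in range(n)]
    for i, job in enumerate(jobs):
        batches[i % n].append(job)
    return [b for b in batches if b]


def run_hybrid(jobs, n_processes):
    """Start one child per batch, all at once, then collect their results."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_PROGRAM, json.dumps(batch)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for batch in split_round_robin(jobs, n_processes)
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(f"worker exited with code {proc.returncode}")
        results.extend(json.loads(out))
    return results


if __name__ == "__main__":
    jobs = [{"id": i, "delay": 0.2} for i in range(12)]
    results = run_hybrid(jobs, n_processes=3)
    pids = {r["pid"] for r in results}
    print(f"{len(results)} jobs handled by {len(pids)} processes")
```

Round-robin splitting is the simplest choice; pulling from a shared queue balances uneven jobs better.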

@GotNoSugarBaby

Jan 25, 2012, 4:49:57 PM
to phan...@googlegroups.com
Thanks a lot Jon, Solomon, Leo. I'll try some ideas in keeping with your comments.

Ariya Hidayat

Jan 26, 2012, 9:53:50 AM
to phan...@googlegroups.com
Check also some projects listed in
http://code.google.com/p/phantomjs/wiki/WhoUsesPhantomJS. Screenshot
services usually use another framework (Play2, Node.js) to control
PhantomJS instances and interface them with the rest of the world.

For best performance, running multiple PhantomJS instances is still
the best way to go. Just experiment with the workload and see how
many of them you can run at the same time. I'd also be interested to
know the results: quantitative metrics like these can be used to fine-tune
future versions of PhantomJS to maximize its "instances per virtual server" quality.

Regards,

--
Ariya Hidayat, http://ariya.ofilabs.com
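Ariya's advice to experiment can be made concrete with a small harness that runs the same batch of jobs at several pool sizes and reports throughput. The job here is a placeholder child process that sleeps for 0.1 s; for a real measurement, `JOB_CMD` would be replaced with a PhantomJS invocation such as `["phantomjs", "job.js", url]`, where `job.js` is a hypothetical script. A thread pool is enough on the controlling side because each thread only waits on its child process.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor

# Placeholder job: a child process that sleeps briefly. For a real measurement
# this would be a PhantomJS run, e.g. ["phantomjs", "job.js", url] (job.js is hypothetical).
JOB_CMD = [sys.executable, "-c", "import time; time.sleep(0.1)"]


def run_one(_):
    # Each call launches one child process and waits for it to exit.
    subprocess.run(JOB_CMD, check=True)


def throughput(n_jobs, pool_size):
    """Jobs per second when at most pool_size child processes run at once."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        list(pool.map(run_one, range(n_jobs)))
    return n_jobs / (time.perf_counter() - start)


if __name__ == "__main__":
    for size in (1, 2, 4, 8):
        print(f"pool={size}: {throughput(8, size):.1f} jobs/s")
```

The pool size where throughput stops improving (or memory runs out) is the practical "instances per virtual server" figure for that workload.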

Ryan Wilson

Jun 4, 2015, 3:47:09 PM
to phan...@googlegroups.com
Hello Ariya,

I want to say that PhantomJS is a nice, stable headless browser. I am currently trying to integrate it into our existing VB.NET software testing tool. We use Selenium as a .dll in our software, but I have run into a problem when running more than one instance of PhantomJS through Selenium on different threads. A single instance of PhantomJS runs fine, but with more than one, the test crashes partway through with an exception reporting a null response from the HTTP request.

I was wondering if you had any advice.

Thank You,

Ryan

Georgios Diamantopoulos

Mar 24, 2017, 9:57:00 AM
to phantomjs
Gentlemen, 

Any ideas how to get loads of PhantomJS instances running on Linux? 

I get 

bash: fork: retry: No child processes
bash: fork: Resource temporarily unavailable

after about 700 processes. It's not the fs.file-max limit; I tried that. :(

Georgios
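The thread ends without an answer. One thing worth checking (a suggestion, not confirmed by anyone here): `fork: Resource temporarily unavailable` is the EAGAIN error, and fork(2) lists its usual causes on Linux as the per-user `RLIMIT_NPROC` limit (`ulimit -u`), `kernel.threads-max`, `kernel.pid_max`, and a cgroup `pids.max` limit, none of which is `fs.file-max`. On Linux `RLIMIT_NPROC` counts threads as well as processes, so if each PhantomJS process runs several threads (worth verifying with `ps -eLf`), a few hundred processes can exhaust it. The sketch below only reads those limits; it does not change them, and it does not read the cgroup limit.

```python
import resource
from pathlib import Path


def read_int(path):
    """Read the first integer from a /proc file, or None if it is unavailable."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def process_limits():
    """Collect limits that fork(2) documents as possible causes of EAGAIN."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "RLIMIT_NPROC soft (ulimit -u)": soft,
        "RLIMIT_NPROC hard": hard,
        "kernel.pid_max": read_int("/proc/sys/kernel/pid_max"),
        "kernel.threads-max": read_int("/proc/sys/kernel/threads-max"),
    }


if __name__ == "__main__":
    for name, value in process_limits().items():
        if value is None:
            shown = "n/a"
        elif value == resource.RLIM_INFINITY:
            shown = "unlimited"
        else:
            shown = value
        print(f"{name}: {shown}")
```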