What general direction would you guys send me in to try and support this use case? Would you recommend running many instances on virtual servers and relaying jobs between them, or something else?
Most of the time is presumably going to be spent doing network I/O, so
you could use a single phantomjs process to achieve a level of
concurrency. Remember that you can create any number of WebPage objects
within PhantomJS. And I believe each WebPage will be in its own thread
and so won't block other WebPages. You could receive messages on a Web
Socket inside the "main" phantomjs context, and then use those messages
to spawn WebPages that do stuff.
This is kinda how Poltergeist works - see the code at
https://github.com/jonleighton/poltergeist.
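
To make the "many WebPage objects in one process" idea concrete, here is a rough sketch rather than a tested recipe. The helper name openAll, its createPage parameter and the example URLs are all made up for illustration; the only PhantomJS calls it relies on are require('webpage').create(), page.open(), page.close() and phantom.exit(). Where the URLs come from (a Web Socket, a file, anything else) is left out.

```javascript
// Open every URL in its own WebPage so the network requests overlap.
// createPage is a factory for page objects (hypothetical parameter);
// under PhantomJS it would wrap require('webpage').create().
function openAll(urls, createPage, done) {
  var results = [];
  var pending = urls.length;
  if (pending === 0) { done(results); return; }
  urls.forEach(function (url, i) {
    var page = createPage();
    page.open(url, function (status) {
      results[i] = { url: url, status: status };
      page.close();            // release the page once it has loaded
      pending -= 1;
      if (pending === 0) { done(results); }
    });
  });
}

// This part only runs under PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
  var urls = ['http://example.com/', 'http://example.org/']; // placeholder URLs
  openAll(urls, function () { return require('webpage').create(); },
    function (results) {
      results.forEach(function (r) { console.log(r.status + ' ' + r.url); });
      phantom.exit();
    });
}
```

All the pages are started before any of them finishes, so the time spent waiting on the network is shared. Anything CPU-heavy inside a page would still hold up the others.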
Cheers
You're right, it's not threaded, but several different pages can be
doing network I/O at the same time. So if the bottleneck is network I/O
then a degree of concurrency could be achieved this way, but it would
certainly need to be complemented by having a pool of processes too.
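
As a sketch of what the pool side might look like: the snippet below caps how many jobs are in flight at once. It assumes the driver is a Node.js script, which is only one option. runPool, phantomJob and worker.js are hypothetical names, not part of PhantomJS; worker.js stands for a PhantomJS script that loads the URL passed as its argument.

```javascript
// Run jobs with at most `limit` in flight at any time.
// startJob(job, cb) starts one job and calls cb(err, result) when it ends.
function runPool(jobs, limit, startJob, done) {
  var results = [];
  var next = 0;       // index of the next job to start
  var active = 0;     // jobs currently running
  var finished = 0;   // jobs that have completed

  function launch() {
    while (active < limit && next < jobs.length) {
      (function (i) {
        active += 1;
        startJob(jobs[i], function (err, result) {
          results[i] = err ? { error: String(err) } : { result: result };
          active -= 1;
          finished += 1;
          if (finished === jobs.length) { done(results); } else { launch(); }
        });
      })(next++);
    }
  }

  if (jobs.length === 0) { done(results); } else { launch(); }
}

// Hypothetical job runner for a Node.js driver: one phantomjs process per URL.
// Assumes `phantomjs` is on the PATH and `worker.js` exists.
function phantomJob(url, cb) {
  require('child_process').execFile('phantomjs', ['worker.js', url],
    function (err, stdout) { cb(err, stdout); });
}

// Usage (not executed here):
//   runPool(urls, 4, phantomJob, function (results) { console.log(results); });
```

The limit is the number to experiment with: raise it until throughput stops improving or the machine runs out of memory.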
Cheers
For best performance, running multiple PhantomJS instances is still
the best way to go. Just experiment with the workload and see how
many of them you can run at the same time. I'd also be interested to
know the results: quantitative metrics can be used to fine-tune future
versions of PhantomJS and maximize its "instances per virtual server" quality.
Regards,
--
Ariya Hidayat, http://ariya.ofilabs.com