PhantomJS API design: methods defaulting sync vs. async?


James Greene

Apr 4, 2013, 11:42:43 AM
to phan...@googlegroups.com
Hey, all!

One thing I've been noodling on as PhantomJS has matured is the question of whether our API methods should default to operating synchronously or asynchronously (assuming both make sense for the given method).

Historically, PhantomJS API methods have been created with the default-named method (e.g. `WebPage#evaluate`) being synchronous, with another method carrying an "Async" suffix added to expose the asynchronous version (e.g. `WebPage#evaluateAsync`).

This is notably the exact opposite of how Node.js API methods are designed — they create the default named method as asynchronous and then add another method with a suffix of "Sync" to expose the synchronous version.
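For readers less familiar with the Node.js convention, here is a minimal sketch of it using Node's own `fs` module. The scratch file name is made up for the illustration, and the PhantomJS names appear only in comments for comparison.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical scratch file, used only for this illustration.
var file = path.join(os.tmpdir(), 'naming-demo-' + process.pid + '.txt');
fs.writeFileSync(file, 'hello');

// Node.js style: the plain name is asynchronous and takes a callback...
fs.readFile(file, 'utf8', function (err, text) {
  if (err) throw err;
  console.log('async: ' + text);
  fs.unlinkSync(file); // clean up once the asynchronous read has finished
});

// ...and the "Sync" suffix marks the blocking variant.
var syncText = fs.readFileSync(file, 'utf8');
console.log('sync: ' + syncText);

// PhantomJS style is the mirror image: page.evaluate blocks, and
// page.evaluateAsync is the suffixed, non-blocking variant.
```

The "sync" line prints first, because the callback passed to `fs.readFile` only runs after the current script has finished its synchronous work.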

So the discussion I'd like to pose is as follows:
Are we content with the API method pattern we've been following to date, or would we like to try to stay closer in sync (no pun intended) with the Node.js API methods?

Obviously, switching now would be a fairly big change, but it is also one that we certainly wouldn't want to tackle even further down the road. Given that not all of our users are comfortable with JavaScript yet still find PhantomJS a useful tool, it is also possible (perhaps even likely) that synchronous defaults are preferable for them.

Thoughts?

Sincerely,
    James Greene

P.S. The catalyst for finally starting this discussion officially was Issue #10980.

Robert

Apr 4, 2013, 5:01:02 PM
to phan...@googlegroups.com
My two cents:

When making changes like this, it is better to create new method names rather than change the behavior of existing methods.

James Greene

Apr 4, 2013, 5:28:05 PM
to phan...@googlegroups.com

Robert:
Valid point for consumer deprecation, yes, though definitely not ideal for project maintenance.

Do you have a preference for which mode is used for default method implementations? :)

Sincerely,
   James Greene


Robert

Apr 4, 2013, 6:39:35 PM
to phan...@googlegroups.com
I prefer synchronous methods as the default, because this is the way people use phantomjs. For example:

Load a page, fill out a form, submit, take a screenshot. 

Load a page, look for the presence of an element.

In each of these cases, what the user wants to do is fundamentally synchronous. With asynchronous methods, users need to write their own promise library or event loop and shoehorn the API into doing what they wanted to do in the first place. This is why CasperJS is so popular and why phantomjs has a reputation for being hard to use. One shouldn't need to write a setInterval loop just to fill out a form and submit it.

Moving phantomjs more in the async direction makes the tool harder, not easier, to use. There are notable exceptions, but because these are exceptions they can be labelled Async.

Nodejs, on the other hand, was designed as a non-blocking server, which is very different from the phantomjs use cases. In phantomjs, we are always waiting: waiting for the page to finish loading, waiting for the DOM to render, etc. Phantomjs, as a client, is naturally driven by a blocking API. I would not try to make phantomjs be more like nodejs.


Introducing async methods means more work to make them behave synchronously.

Bryan Bishop

Apr 4, 2013, 7:13:12 PM
to phan...@googlegroups.com, Bryan Bishop
On Thu, Apr 4, 2013 at 5:39 PM, Robert <rsjan...@gmail.com> wrote:
> I prefer synchronous methods as the default, because this is the way people use phantomjs.

Hmm, I have a completely opposite opinion. I use phantomjs asynchronously, just like the vast majority of other javascript projects I poke at. This doesn't "make phantomjs more like nodejs"-- what you are observing is an already-existing property of how javascript/browsers work.

Robert

Apr 4, 2013, 7:31:23 PM
to phan...@googlegroups.com, Bryan Bishop
JavaScript in browsers is asynchronous, yes, because it is event driven and must not block the user or agent too much. But the agents *driving* the browsers are synchronous, unless you open one tab, then open another tab while the first tab is still loading, etc. I think very few people use phantomjs this way. They open one tab, wait for it to load, then click an element, wait for the response, etc. You cannot click the element before the DOM is loaded -- this is the source of much frustration. Things need to be done in order, and you are blocking (taking no action) until the next step appears.

But I agree that, because JavaScript has typically been used on the receiving side of user action, many people think of JavaScript as inherently asynchronous. It is not more asynchronous than C or any other language. If you are waiting for events, then write asynchronous code. If you are the one driving the events, then the code is going to be synchronous because you have a test case to run through and that case needs to be processed in order. You end up writing a lot of (ugly) contortions to make sure you are done waiting and don't take the next step prematurely.

James Greene

Apr 23, 2013, 7:47:40 PM
to phan...@googlegroups.com
Any other opinions on this?

Sincerely,
    James Greene




Christoph Burgmer

Apr 24, 2013, 6:37:22 AM
to phan...@googlegroups.com
Everything that happens asynchronously should be done asynchronously.

Developers who want to simplify things, or who prefer a synchronous style, can use something like async.js.

My 2c

execjosh

Apr 24, 2013, 11:08:04 AM
to phan...@googlegroups.com
I think that any function that could be computationally expensive should be asynchronous, too. For example, `page.render`, `page.evaluate`, `page.setContent`, `p{hantom,age}.injectJs`, etc. might have been better implemented as asynchronous functions in the first place. However, little stuff like `p{hantom,age}.addCookie` does not necessarily need to be asynchronous, because there's not a lot of computation involved.

I also agree that JavaScript does not necessarily need to be asynchronous.  As such, each public API (including those of modules like fs) should be scrutinized, asking, "is this operation intrinsically asynchronous?"  If the answer is "yes", then the pair would be `func{,Sync}` where a synchronous version makes sense; and, if "no", then, `func{,Async}` where an asynchronous version makes sense.  Arguably the most important function in PhantomJS, `page.open`, is intrinsically asynchronous (lots of i/o); however, it doesn't really make sense to have a synchronous version (e.g., the tabbed browsing use case).

As a side note, I do think that the callback-with-error-as-first-param model is very nice and I would love to see it in PhantomJS's APIs where it makes sense (like in the fs module).  But, this would require careful analysis of the major PhantomJS use cases out there.  Maybe I'm off on a path by myself, though...
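To make the callback-with-error-as-first-param model concrete, here is a rough sketch. Both `toErrorFirst` and `parseConfigSync` are hypothetical names invented for the illustration; neither is a PhantomJS or Node.js API.

```javascript
// Hypothetical helper: wraps a synchronous function so that it reports its
// outcome through a callback(err, result) on a later tick of the event loop.
function toErrorFirst(syncFn) {
  return function () {
    var args = Array.prototype.slice.call(arguments);
    var callback = args.pop(); // by convention the callback comes last
    setTimeout(function () {
      var result;
      try {
        result = syncFn.apply(null, args);
      } catch (err) {
        callback(err); // failure: the error is the first (and only) argument
        return;
      }
      callback(null, result); // success: null error, then the result
    }, 0);
  };
}

// Stand-in for a blocking operation such as reading and parsing a file.
function parseConfigSync(text) {
  return JSON.parse(text);
}

var parseConfig = toErrorFirst(parseConfigSync);

parseConfig('{"viewport": 1024}', function (err, config) {
  if (err) {
    console.log('failed: ' + err.message);
    return;
  }
  console.log('viewport: ' + config.viewport);
});
```

The success callback is invoked outside the `try` block on purpose, so an exception thrown by the caller's own callback is not mistaken for a failure of the wrapped function.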

Also, about the "async is difficult" stuff, that problem has already been taken care of by various open-sourced js libraries.  And, since there is nothing preventing the user from using a library in PhantomJS, I don't see a problem.

Just my 2¢, also ;)

Bryan Bishop

Apr 24, 2013, 1:41:19 PM
to phan...@googlegroups.com, Bryan Bishop
On Wed, Apr 24, 2013 at 10:08 AM, execjosh <buil...@gmail.com> wrote:
> Also, about the "async is difficult" stuff, that problem has already been
> taken care of by various open-sourced js libraries. And, since there is
> nothing preventing the user from using a library in PhantomJS, I don't see a
> problem.

On that note, I should point out that Q works pretty well in
PhantomJS. I am having some trouble figuring out the best way to
determine module compatibility with PhantomJS in bulk. Obviously,
anything with no dependencies and no require() calls is probably okay
(but not always, because of different JS features...). Anyway, use Q if
you hate async or whatever.
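
As a rough illustration of how a promise chain keeps asynchronous steps in order, here is a sketch that uses the standard `Promise` rather than Q itself (Q exposes a comparable `then` chain). The `step` helper and the step names are made up; each step only records its name after a short delay and touches no real page.

```javascript
// Hypothetical stand-in for an asynchronous browser step. It appends its name
// to `log` after a short delay; nothing here is a PhantomJS or Q API.
function step(name, log) {
  return function () {
    return new Promise(function (resolve) {
      setTimeout(function () {
        log.push(name);
        resolve();
      }, 10);
    });
  };
}

var log = [];

// Each then() waits for the previous step's promise before starting the next.
var finished = Promise.resolve()
  .then(step('open page', log))
  .then(step('fill form', log))
  .then(step('submit', log))
  .then(step('screenshot', log))
  .then(function () {
    console.log(log.join(' -> '));
    return log;
  });
```

The steps run strictly one after another, so the script reads top to bottom even though nothing blocks.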

Darren Cook

Apr 24, 2013, 8:29:39 PM
to phan...@googlegroups.com
> I think that any function that could be computationally expensive should be
> asynchronous, too. For example, `page.render`, `page.evaluate`, `
> page.setContent`, `p{hantom,age}.injectJs`, etc. might have been better
> implemented as asynchronous functions in the first place. However, little
> stuff like `p{hantom,age}.addCookie` do not necessarily need to be
> asynchronous, because there's not a lot of computation involved.

I'd much prefer all functions to be async or all functions to be sync.
Having to look in the manual every time to find out which way a function
goes would be a real pain. (Intuitions differ: e.g. why should
setContent be async, and addCookie be sync; surely they are both just
assigning to a string, and therefore super quick?)(*)

Darren


P.S. If asked to step off the fence I'd say all functions should be Sync
by default, and you have to add the Async suffix for an async version.
Which I think is the existing behaviour.


*: That is not a question for the list, just an example of how
perceptions can easily differ. :-)

Steven de Salas

Mar 13, 2015, 10:23:41 PM
to phan...@googlegroups.com
Hi James, my 2c.

While I personally tend to favour synchronous code for testing (because that's how humans generally interact with a web page), given the maturity of PhantomJS I would assume that it's better to leave it as it is rather than try to change the default behaviour. A change would be quite a surprise to current developers and would affect thousands of scripts out there that rely on the existing functionality.

Having said that, there is nothing to stop one from using PhantomJS synchronously; it just requires a bit of creativity.

For example, you can call a function that waits until a certain condition has been met:

var page = require('webpage').create();
page.open('http://en.wikipedia.org');
// Block until the page has finished loading.
phantom.waitFor(function() { return !page.loading; });
page.render('wikipedia.org.png');

This would depend on a function as follows:

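phantom.waitFor = function(callback) {
  do {
    // Dispatching a dummy event is meant to let PhantomJS process
    // pending events between checks of the condition.
    this.page.sendEvent('mousemove');
  } while (!callback());
};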


You can use this approach every time you wait for something to happen (loading a page, clicking a button that triggers an XHR request, etc.).
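
For comparison, the same "wait until a condition holds" idea can be written without blocking, by polling on a timer. This is only a sketch: the `waitFor` signature, its default timeout and interval, and the `fakePage` object are assumptions made for the illustration, not part of the PhantomJS API.

```javascript
// Hypothetical helper: polls test() every intervalMs until it returns true,
// then calls onReady(null). Gives up with an Error after timeoutMs.
function waitFor(test, onReady, timeoutMs, intervalMs) {
  var started = Date.now();
  var timer = setInterval(function () {
    var ok;
    try {
      ok = test();
    } catch (err) {
      clearInterval(timer);
      onReady(err);
      return;
    }
    if (ok) {
      clearInterval(timer);
      onReady(null);
    } else if (Date.now() - started >= (timeoutMs || 3000)) {
      clearInterval(timer);
      onReady(new Error('waitFor timed out'));
    }
  }, intervalMs || 50);
}

// Stand-in for page.loading flipping to false; no real page is involved.
var fakePage = { loading: true };
setTimeout(function () { fakePage.loading = false; }, 200);

waitFor(function () { return !fakePage.loading; }, function (err) {
  console.log(err ? err.message : 'page finished loading');
});
```

The trade-off is the one debated throughout this thread: the polling version never blocks the event loop, but the code that follows the wait has to live inside the callback.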

Ivan

Mar 16, 2015, 5:35:23 PM
to phan...@googlegroups.com
I personally use PhantomJS for web scraping. When I first found this beautiful project I was excited, but after some learning I realized that it is too hard to use for even the simplest scraping tasks. Then I found and tried CasperJS, which solves most of the problems, but even in Casper it felt unnatural to do the most basic things with a webpage, just because it was asynchronous. Finally I decided to fork the project and make PhantomJS as synchronous as I could. I implemented an opensync function and many synchronous wait functions (like waitForSelector, waitForVisible, etc.; Casper has all of those, but they are async). So now I can create really complex web scrapers in just a few lines of code. I definitely vote for SYNC as a first-class citizen; it just makes life easier.
