[NEW API] Canceling network requests

Vitaliy Slobodin

unread,

Dec 27, 2012, 5:06:27 AM12/27/12

to phan...@googlegroups.com

Issue: http://code.google.com/p/phantomjs/issues/detail?id=230

Old API:

page.onResourceRequested = function(requestData) {
    // request data
}

New API:

page.onResourceRequested = function(request) {
    request.cancel(); // cancel the request
    request.data // access the request data like in the old API
}

Tests:

All tests passed

The new api will allow the user to cancel the network requests to js files, css, images, etc. Access the old data request is also presented.

My first idea was to pass QNetworkReply in a WebPage context. But this can lead to unexpected results. To avoid this, I created a new class to hold the pointer to QNetworkReply instance, but the user can access only to the 1 method - to cancel the request. As you can see, I deleted emitting signal 'resourceRequested' in the 'createRequest' method, because the abort request immediately after it is creating, WebKit gets stuck in infinite loop (msg loop on Windows). Since WebKit is waiting for the end of the request.

Also, in addition to new API, I think this is can be helpful - page should emit the signal 'networkError' if it couldn't load the resource.

Waiting for your comment, questions, etc.

Thanks. And sorry for my English.

Ariya Hidayat

unread,

Dec 27, 2012, 11:15:34 AM12/27/12

to phan...@googlegroups.com

Hi Vitaly,

The new API will be very useful! Thank for looking into this.

Some minor comments follow.

Using a different method signature would cause problem with
compatibility with existing code. I don't see how we can detect
upfront whether one form is used instead of the other. Another
strategy would be to add an extra function argument, e.g.:

page.onResourceRequested = function(requestData, requestController) {
requestController.cancel();
}

I like the idea of network error callback. Looking at your
implementation, it seems to relate to one particular network "reply
and thus we can just call it something like onResourceError or
something similar.

Thank you!

Best regards,

--
Ariya Hidayat, http://ariya.ofilabs.com
http://twitter.com/ariyahidayat
http://gplus.to/ariyahidayat

James Greene

unread,

Dec 27, 2012, 1:09:32 PM12/27/12

to phan...@googlegroups.com

Initial thoughts:
1. I'm not familiar enough with QtWebKit to comment on the implementation but I love the functionality.
2. I'd prefer to see `cancel` renamed to `abort`, which is the common term used for this HTTP/XHR functionality.
3. Is `data` the right word? I think its OK but it makes me think of the request/response body instead of what I would generally consider the request/response metadata.
4. More broadly (beyond just this commit), it would be good to get our HTTP classes/models more aligned with Node.js's HTTP classes where possible.

~~James

--
You received this message because you are subscribed to the Google Groups "phantomjs" group.
Visit this group at http://groups.google.com/group/phantomjs?hl=en.

James Greene

unread,

Dec 27, 2012, 1:16:03 PM12/27/12

to phan...@googlegroups.com

Oh yes, I also really like the network resource error callback idea. :)

I think Ariya's proposal for an alternate method signature is a good idea, too.

I would still advise renaming `cancel` to `abort`, and otherwise working to align our HTTP models with Node.js's where feasible.

~~James

Vitaliy Slobodin

unread,

Dec 27, 2012, 1:23:34 PM12/27/12

to phan...@googlegroups.com

1. I'm looked to internals of WebKit specially, to see how I can implement this API much better.

2. Already done :)

3. I'll do it as Ariya mentioned above.

4. I'm not familiar with node.js much, but I'll take a look

Vitaliy Slobodin

unread,

Dec 28, 2012, 11:07:41 AM12/28/12

to phan...@googlegroups.com

Hi again.

All done. Waiting for further review.

Branch: https://github.com/Vitallium/phantomjs/tree/API-allow-to-cancel-requests

Thanks.

Ariya Hidayat

unread,

Dec 30, 2012, 2:50:10 AM12/30/12

to phan...@googlegroups.com

Hi Vitaliy,

Overall it looks good to me. Some minor picks (as you clean-up the
branch for the pull request):

* Please separate the network error callback to its own commit and
issue. This makes it easy to track it for the ChangeLog.
* Add a simple test to verify that abort() really works.

A possible bonus: a new example where e.g. all CSS files are blocked
from loading.

Vivek Simha

unread,

Dec 31, 2012, 7:16:42 AM12/31/12

to phan...@googlegroups.com

Hi Vitaliy,

Just came across your fix.

I compiled the branch and using a simple modified version of examples/netsniff.js (attached).

abort or cancel are not exposed on the request object passed into the onResourceRequested handler.

Seems like the branch is compiled correctly.

What am I missing?

-vs

netsniff.js

James Greene

unread,

Dec 31, 2012, 8:05:23 AM12/31/12

to phan...@googlegroups.com

Vivek --
Check the updated code in Vitaliy's branch: the handler for `onResourceRequested` now takes a 2nd argument which exposes the `abort` method. The 1st argument remains in its original form instead of adding a layer with `abort` and `data`.
~~James

--

Vitaliy Slobodin

unread,

Dec 31, 2012, 8:06:50 AM12/31/12

to phan...@googlegroups.com

You missing the second argument in the onResourceRequested callback.
It has two arguments:
1) The request data
2) The request handler
you need to modify onResourceRequested handler, e.g.
page.onResourceRequested = function (reqData, req) {
// some code
}

Vivek Simha wrote:

Branch:
https://github.com/Vitallium/phantomjs/tree/API-allow-to-cancel-requests

Vivek Simha

unread,

Dec 31, 2012, 8:18:06 AM12/31/12

to phan...@googlegroups.com

Aaah! I see it working now! :)

Not sure I understand the Ariya's rationale for suggesting an extra parameter.

But then, this is the first time I've looked into the code itself - so my opinion should totally be discounted.

Thanks for a wonderful project and the quick responses.

-vs

Tom Moor

unread,

Jan 31, 2013, 8:25:32 PM1/31/13

to phan...@googlegroups.com

Hey all,

This doesn't seem to be on the current stable still - is that correct? In which case I guess I'll have to compile from source?

Cheers,

James Greene

unread,

Jan 31, 2013, 8:46:53 PM1/31/13

to phan...@googlegroups.com

That's correct, it will be in the 1.9 release (March 20th):

http://code.google.com/p/phantomjs/issues/detail?id=230

Otherwise, you can build it locally as you already stated:

http://phantomjs.org/build.html

Sincerely,
James Greene

--
You received this message because you are subscribed to the Google Groups "phantomjs" group.

To unsubscribe from this group and stop receiving emails from it, send an email to phantomjs+...@googlegroups.com.

Visit this group at http://groups.google.com/group/phantomjs?hl=en.

For more options, visit https://groups.google.com/groups/opt_out.

Tom Aizenberg

unread,

Mar 1, 2013, 9:32:55 AM3/1/13

to phan...@googlegroups.com

I installed master just today. Then I tried the following:

page_inst.onResourceRequested = function (reqData, req) {

setTimeout(function(){req.abort()}, 1);

}

If I do req.abort() without the setTimeout everything goes well, but with the setTimeout it doesn't:

TypeError: 'undefined' is not a function (evaluating 'req.abort()')

Is this expected behaviour or could it be a bug?

James Greene

unread,

Mar 1, 2013, 11:08:07 AM3/1/13

to phan...@googlegroups.com

Vitaliy can answer more authoritatively but I would say this behavior is [mostly] expected as the `setTimeout` call causes the callback to become asynchronous and skip to the next event loop, which is AFTER the `onResourceRequested` callback handler has finished. This also implies that the request was already issued and cannot be aborted anymore. Think of this `abort` implementation as more of a `prevent`/`disallow` concept: it is only effective/possible BEFORE the HTTP call is allowed to be sent.

Likewise, because of the nature of this API, I would assume the JsNetworkRequest object is disposed of just a few lines after the resolution of the `onResourceRequested` callback, thus why the `req` object becomes `undefined` for you. The mixed C++/JS nature of Qt makes it more complex as a simple closure like yours does not guarantee the survival of the object.

~~James

Tom Aizenberg

unread,

Mar 1, 2013, 11:16:56 AM3/1/13

to phan...@googlegroups.com

Thanks for the reply, that sounds logical to me.

What I am trying to achieve with this setTimeout is that the request kills itself when the server takes too long too answer (the 1 millisecond is just for testing).

James Greene

unread,

Mar 2, 2013, 8:34:43 AM3/2/13

to phan...@googlegroups.com

Sure, your use case makes sense.

Oddly enough, I don't think Qt/WebKit offers a way to specify the max timeout for QNetworkRequest/QNetworkReply objects, which would be the ideal approach for your scenario.

As such, you could actually need access to the QNetworkReply object [instead of a proxy to the QNetworkRequest object as here] to abort the request after it has been sent. We could probably achieve this with a similar proxy object for the `onResourceReceived` callbacks but those callbacks only get triggered on first byte received and last byte received. Therefore, these callbacks are not useful in this scenario if it takes a long time to establish the initial connection; only if the actual data transfer is slow/long. Perhaps we could workaround this scenario with either/both a "onResourceRequestSent" or a polling "onResourceProgress" callback?

I think there is actually a pending PR for the latter but I haven't looked at it yet. I'm unsure if this callback would trigger before or only after the initial `onResourcedReceived` callback.

~~James

Message has been deleted

Tom Aizenberg

unread,

Mar 11, 2013, 7:25:35 AM3/11/13

to phan...@googlegroups.com

I've looked around the web and indeed it seems that there is no built-in way to handle network timeout. There is even an open bug for it here: https://bugreports.qt-project.org/browse/QTBUG-3443. Most people facing this issue handle it by initializing a timer after the request is sent that aborts the request after the given timeout. This is impossible to do from the Javascript code because of the earlier mentioned problem that there is no reference to the request object once running.

Handling timeouts is something normal for networking libraries IMO, compare for example libraries like Python's Requests or Ruby's Net::HTTP. It would be nice I think if PhantomJS implements a timer internally that handles timeouts. Then e.g. one can set a maximum timeout on page.open that on timeout aborts the request and throws an error so the developer can handle it.

James Greene

unread,

Mar 11, 2013, 8:05:06 AM3/11/13

to phan...@googlegroups.com

Tom —

Good idea. Please file an issue for the feature request:

https://code.google.com/p/phantomjs/issues/entry

Sincerely,
James Greene

On Mon, Mar 11, 2013 at 6:25 AM, Tom Aizenberg <tom.ai...@gmail.com> wrote:

I've looked around the web and indeed it seems that there is no built-in way to handle network timeout. There is even an open bug for it here: https://bugreports.qt-project.org/browse/QTBUG-3443. Most people facing this issue handle it by initializing a timer after the request is sent that aborts the request after the given timeout. This is impossible to do from the Javascript code because of the earlier mentioned problem that there is no reference to the request object once running.

Handling timeouts is something normal for networking libraries IMO, compare for example libraries like Python's Requests or Ruby's Net::HTTP. It would be nice I think if PhantomJS implements a timer internally that handles timeouts. Then e.g. one can set a maximum timeout on page.open that on timeout aborts the request and throws an error so the developer can handle it.

--

Tom Aizenberg

unread,

Mar 11, 2013, 8:46:04 AM3/11/13

to phan...@googlegroups.com

Thanks James, I created an issue here: https://code.google.com/p/phantomjs/issues/detail?id=1129. I will try to implement this myself first.

Reply all

Reply to author

Forward