Re: [nodejs] How to catch timeouts for individual http client requests?


Julian Lannigan

Oct 11, 2012, 9:52:15 AM
to nod...@googlegroups.com
Hi Jason,

This looks like a job for async's queue. It will help you correctly manage a very large number of download tasks. Refer to my example and the documentation on how to use it.

See this updated version of your gist.
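
For readers without access to the gist, here is a minimal sketch of the idea behind a concurrency-limited queue. It is not the gist's code and not the async library's implementation; createQueue and its methods are hypothetical names used only for illustration. The point it shows is that only a fixed number of tasks are handed to the worker at a time, and each finished task frees a slot for the next one.

```javascript
// Hypothetical helper, for illustration only: a tiny task queue that never
// hands more than `concurrency` tasks to `worker` at the same time.
function createQueue(worker, concurrency) {
  var pending = []; // tasks waiting for a free slot
  var running = 0;  // tasks currently handed to the worker

  function next() {
    while (running < concurrency && pending.length > 0) {
      var task = pending.shift();
      running++;
      worker(task, function done() {
        running--; // free the slot, then try to start another task
        next();
      });
    }
  }

  return {
    push: function (task) { pending.push(task); next(); },
    running: function () { return running; },
    length: function () { return pending.length; }
  };
}

// Demo with a fake worker that holds on to its callbacks so tasks can be
// finished by hand.
var finishers = [];
var demo = createQueue(function (task, done) { finishers.push(done); }, 2);
['a', 'b', 'c', 'd'].forEach(function (t) { demo.push(t); });
console.log(demo.running(), demo.length()); // 2 running, 2 waiting
finishers[0]();                             // 'a' finishes, so 'c' starts
console.log(demo.running(), demo.length()); // 2 running, 1 waiting
```

In real code the worker would perform one http.get per task and call done exactly once when the request finishes or fails. The async library's async.queue(worker, concurrency) fills this role and is the better choice in practice.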

Julian Lannigan



On Thu, Oct 11, 2012 at 12:13 AM, Jason Venable <jason....@gmail.com> wrote:
First a disclaimer that I am very new to node so please point me in the right direction if this is not the right place or I have not provided enough or the correct details.

I have a use case where I need to perform many HTTP GET requests against many URLs, possibly thousands. There is no need at all for these to be synchronous or to complete in order. Node seems like a great fit, but there is one thing I can't quite figure out.

I have a loop that queues up a ton of HTTP GET requests. Node starts chipping away at them, but some URLs in this big pool may time out. I'd like to catch these individual URL timeouts and handle them appropriately. I have called .setTimeout() on the GET request, but the timeout appears to be attached to an individual socket. That socket could very well be handling, or queued up with, hundreds of requests (if not all of them), and if one URL times out the entire socket gets killed, which also times out the other pending URLs.

Code to reproduce is below. It simply does 1000 GETs against different Google domains with a setTimeout() of 5 seconds. It works great for about 5 seconds, and then nearly every request that has been queued up is killed. I assume this is because I am killing the entire socket that all the pooled requests depend on.

Do I need to open a unique socket for each request? How would I go about doing this? 

Is there anything else I am doing wrong, or should be doing differently, to handle this type of scenario?


var http = require('http');
var extensions = ['com', 'ac', 'ad', 'ae', 'com.af', 'com.ag', 'com.ai', 'am', 'it.ao', 
'com.ar', 'as', 'at', 'com.au', 'az', 'ba', 'com.bd', 'be', 'bf', 'bg', 'com.bh', 'bi', 
'bj', 'com.bn', 'com.bo', 'com.br', 'bs', 'co.bw', 'com.by', 'com.bz', 'ca', 'com.kh', 
'cc', 'cd', 'cf', 'cat', 'cg', 'ch', 'ci', 'co.ck', 'cl', 'cm', 'cn', 'com.co', 'co.cr', 
'com.cu', 'cv', 'cz', 'de', 'dj', 'dk', 'dm', 'com.do', 'dz', 'com.ec', 'ee', 'com.eg', 
'es', 'com.et', 'fi', 'com.fj', 'fm', 'fr', 'ga', 'gd', 'ge', 'gf', 'gg', 'com.gh', 
'com.gi', 'gl', 'gm', 'gp', 'gr', 'com.gt', 'gy', 'com.hk', 'hn', 'hr', 'ht', 'hu', 
'co.id', 'iq', 'ie', 'co.il', 'im', 'co.in', 'io', 'is', 'it', 'je', 'com.jm', 'jo', 
'co.jp', 'co.ke', 'com.kh', 'ki', 'kg', 'co.kr', 'com.kw', 'kz', 'la', 'com.lb', 
'com.lc', 'li', 'lk', 'co.ls', 'lt', 'lu', 'lv', 'com.ly', 'co.ma', 'md', 'me', 'mg', 
'mk', 'ml', 'mn', 'ms', 'com.mt', 'mu', 'mv', 'mw', 'com.mx', 'com.my', 'co.mz', 
'com.na', 'ne', 'com.nf', 'com.ng', 'com.ni', 'nl', 'no', 'com.np', 'nr', 'nu', 'co.nz', 
'com.om', 'com.pa', 'com.pe', 'com.ph', 'com.pk', 'pl', 'pn', 'com.pr', 'ps', 'pt', 
'com.py', 'com.qa', 'ro', 'rs', 'ru', 'rw', 'com.sa', 'com.sb', 'sc', 'se', 'com.sg', 
'sh', 'si', 'sk', 'com.sl', 'sn', 'sm', 'so', 'st', 'com.sv', 'td', 'tg', 'co.th', 
'com.tj', 'tk', 'tl', 'tm', 'to', 'com.tn', 'com.tr', 'tt', 'com.tw', 'co.tz', 'com.ua', 
'co.ug', 'co.uk', 'us', 'com.uy', 'co.uz', 'com.vc', 'co.ve', 'vg', 'co.vi', 'com.vn', 
'vu', 'ws', 'co.za', 'co.zm', 'co.zw'];
var extCount = 0;

// Do a lot of HTTP GET requests to different Google pages
for (var i = 0; i < 1000; i++) {
    var url = 'www.google.' + extensions[extCount];
    doRequest(url);
    // Increment the extension counter
    extCount++;
    // Reset the extension counter if necessary
    if (extCount === extensions.length) {
        extCount = 0;
    }
}

function doRequest(url) {
    http.get({host: url}, function (res) {
        console.log(url + ' : ' + res.statusCode);
    }).on('error', function (e) {
        console.log(url + ' error: ' + e.message);
    }).setTimeout(5000, function () {
        this.abort(); // This kills all pooled/queued HTTP GET requests waiting to be processed
        // How do I catch a timeout for each individual URL request?
    });
}

Oh, and should I post this on stack exchange or are double posts frowned upon?

Thanks,
Jason


Jason Venable

Oct 13, 2012, 11:22:27 AM
to nod...@googlegroups.com
Julian,

Thank you so much! This works wonderfully.

I had actually looked at async queue about a week ago and thought, "Hmm, this may work for what I am trying to do", but never got around to looking at it in detail.

Thanks again,
Jason

Jason Venable

Oct 13, 2012, 4:01:35 PM
to nod...@googlegroups.com
And here it is with the timeout added back that does not kill every queued up request.
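
The gist itself is not reproduced in this thread. As a rough sketch of one way to get a per-request timeout, each request can own a plain setTimeout timer instead of relying on the socket timeout, so that an expired timer aborts only its own request. getWithTimeout is a hypothetical helper name, and this is an assumption about the approach, not necessarily what the gist does.

```javascript
var http = require('http');

// Hypothetical helper: GET one URL with its own timer. `callback` is called
// exactly once, with (err) on failure or timeout, or (null, statusCode).
function getWithTimeout(options, ms, callback) {
  var finished = false;

  function finish(err, statusCode) {
    if (finished) { return; } // ignore late errors after a timeout or response
    finished = true;
    clearTimeout(timer);
    callback(err, statusCode);
  }

  var req = http.get(options, function (res) {
    res.resume(); // discard the body so the socket can be released
    finish(null, res.statusCode);
  });

  req.on('error', function (e) { finish(e); });

  // Per-request timer: only this request is aborted when it fires.
  var timer = setTimeout(function () {
    finish(new Error('timed out after ' + ms + ' ms'));
    req.abort(); // any error emitted by the abort is swallowed by the guard
  }, ms);
}
```

Used as a queue worker, this would be called as getWithTimeout({host: url}, 5000, callback), with callback being the queue's completion callback. The finished flag also ensures that callback runs only once per task.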

Julian Lannigan

Oct 14, 2012, 12:03:08 PM
to nod...@googlegroups.com
Hi Jason,

I'm glad this was able to help.  I really like async :).

My only comment on your gist is that your http.get can execute the queue callback up to two times. This is because you call it in both the callback of the http.get and the error event handler of the http.get. I'm not sure whether the http callback still gets executed if you abort the request in the timeout.

Note that if a request does emit the error event, it will run the error handler and the http callback with an error in the first argument. If you call the queue callback more than once in the same task, it can spawn queue workers above the concurrency limit when your queue is already saturated, and these extra workers will persist until the queue is empty or no longer saturated. That could turn into a real problem with really large queues.

Check out my queue test, which demonstrates the above statement. Run it once as is, then uncomment line 15 and run it again. See the difference in the concurrency of the tasks.
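
One common way to rule out the double-callback case described above is to wrap the queue callback so that only its first invocation has any effect. The sketch below is separate from the queue test mentioned above. It uses a hypothetical once helper and a fake task that reports completion twice, the way a request might if both its response callback and its error handler fire.

```javascript
// Hypothetical helper: returns a wrapper that invokes fn at most one time.
function once(fn) {
  var called = false;
  return function () {
    if (called) { return; }
    called = true;
    return fn.apply(this, arguments);
  };
}

// Fake task standing in for an http.get whose response callback and
// error handler both end up calling the queue callback.
function flakyTask(done) {
  done(null, 'response');
  done(new Error('socket hang up'));
}

var unguarded = 0;
flakyTask(function () { unguarded++; });

var guarded = 0;
flakyTask(once(function () { guarded++; }));

console.log('unguarded callback ran ' + unguarded + ' times'); // 2
console.log('guarded callback ran ' + guarded + ' time');      // 1
```

Inside a queue worker this would look like var done = once(callback); with done passed to both the response callback and the error handler.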

Julian Lannigan


