Max parallel http.client requests

Mick

Jun 1, 2012, 8:00:12 AM
to nodejs
I have to scrape thousands of different websites, as fast as possible.
On a single node process I was able to fetch 10 URLs per second.
However, if I fork the task to 10 worker processes, I can reach 64 reqs/sec.

Why is that?
Why am I limited to 10 reqs/sec on a single process, and why do I have to
spawn workers to reach 64 reqs/sec?

- I am not reaching max sockets/host (agent.maxSockets) limit: all
urls are from unique hosts.
- I am not reaching max file descriptors limit (AFAIK): my ulimit -n
is 2560, and lsof shows that my scraper never uses more than 20 file
descriptors.

Is there any limit I don't know about? I am on Mac OS X.
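
[Editor's note: the per-host socket cap mentioned above can be inspected or set explicitly on an http.Agent. This is a minimal sketch, not code from the thread; the value 100 and the helper name countActiveSockets are assumptions made for illustration.]

```javascript
'use strict';
const http = require('http');

// An Agent with an explicit per-host socket cap (100 is an arbitrary example value).
const agent = new http.Agent({ maxSockets: 100 });

// Count the sockets the agent currently has in use, summed over all hosts.
// agent.sockets maps a host key to an array of open sockets.
function countActiveSockets(a) {
  return Object.values(a.sockets).reduce((n, list) => n + list.length, 0);
}

console.log('maxSockets per host:', agent.maxSockets);
console.log('sockets in use:', countActiveSockets(agent));
```

Passing `{ agent }` in the options of `http.get` or `http.request` makes a request use this agent instead of the global one.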

Ben Noordhuis

Jun 1, 2012, 10:23:46 AM
to nod...@googlegroups.com
Can you post or link to your code?

Mick

Jun 1, 2012, 11:42:44 AM
to nodejs
I don't think posting the whole code here would benefit the question.
The process is very simple: 1. start 10 workers with
child_process.fork; 2. read 400 random urls from db; 3. pass each url
to random worker; 4. calculate number of responses per second.
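
[Editor's note: the four steps described above might look roughly like the sketch below. This is not Mick's code, which was never posted. The function names, the `{ url }` message shape, and the worker script path are all hypothetical, and step 2 (reading URLs from the db) is left out.]

```javascript
'use strict';
const { fork } = require('child_process');

// Step 1: start `count` workers from a worker script (the path is hypothetical).
function startWorkers(workerPath, count) {
  const workers = [];
  for (let i = 0; i < count; i++) workers.push(fork(workerPath));
  return workers;
}

// Step 3: hand each URL to a randomly chosen worker.
// A worker only needs a send(message) method, which ChildProcess provides.
function dispatch(urls, workers, random = Math.random) {
  for (const url of urls) {
    const worker = workers[Math.floor(random() * workers.length)];
    worker.send({ url });
  }
}

// Step 4: responses per second over an elapsed interval given in milliseconds.
function responsesPerSecond(responseCount, elapsedMs) {
  return elapsedMs > 0 ? (responseCount * 1000) / elapsedMs : 0;
}
```

With real workers this would be called as `dispatch(urls, startWorkers('./worker.js', 10))`, where `./worker.js` is an assumed script that fetches each URL it receives and reports back with `process.send`.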

I've increased settings for kern.maxfiles, kern.maxfilesperproc,
kern.ipc.somaxconn, and kern.ipc.maxsockets in sysctl.conf, and
rebooted. No effect.

On Jun 1, 5:23 pm, Ben Noordhuis <i...@bnoordhuis.nl> wrote:

Isaac Schlueter

Jun 5, 2012, 12:29:22 AM
to nod...@googlegroups.com
On Fri, Jun 1, 2012 at 8:42 AM, Mick <micko...@gmail.com> wrote:
> I don't think posting the whole code here would benefit the question.

Describing code in English is notoriously buggy.

Please post a link to the code. Otherwise, it is impossible to help
you, because we can't investigate what you're doing.