Error: too many open files


Robin Chou

May 1, 2014, 4:47:14 PM
to nsq-...@googlegroups.com
Hey there,

I'm running into an issue with my NSQ setup. After a certain number of messages, NSQ starts closing connections when I try to send to it. Checking the logs, I see this output:

ERROR: diskqueue(api_calls) failed to sync - open api_calls.diskqueue.meta.dat.tmp: too many open files

Now, I know I can raise the open file limit, and that seems to fix it, but why is this needed in the first place? Is my implementation at fault? (I'm using a mux service to expose a public interface to my web apps.)

Thanks and any advice would be appreciated.

NSQ v0.2.27
Go 1.2
EC2 micro instance behind ELB

Matt Reiferson

May 1, 2014, 4:55:41 PM
to Robin Chou, nsq-...@googlegroups.com
Hi Robin,

As more topics/channels/clients interact with the process, it will use more file descriptors for various IO purposes. We aren’t aware of any leaks, so it could just be that your producers are opening persistent connections, or that you have more topics/channels/clients than your current limit allows.

Hope this helps.

Thanks,

Matt

Andrew Diamond

Aug 1, 2014, 5:32:28 PM
to nsq-...@googlegroups.com, chou...@gmail.com
I get this error as well using nsqd v0.2.29-alpha and go-nsq 0.3.7.

I have three consumers reading out of three channels. My HandleMessage functions all call message.DisableAutoResponse() so they can process messages asynchronously.

One of my consumers does some potentially long-running processing, so it calls message.Touch() in three different places to let nsqd know it's still working on the message.

When this consumer has been running for a while and I run this command

lsof | grep TCP | wc -l

I'll often see 40,000+ open TCP connections. Listing them with this command

lsof | grep TCP

shows that all the open connections look like this:

bag_proce 14751 14766 ubuntu 112u IPv4 3579235 0t0 TCP localhost:43709->localhost:4151 (ESTABLISHED)
bag_proce 14751 14766 ubuntu 114u IPv4 3579304 0t0 TCP localhost:43740->localhost:4151 (ESTABLISHED)

That is, they are all TCP connections to nsqd on port 4151.

As soon as I kill the consumer that's making the Touch() calls, the number of TCP connections drops from 40,000+ to a dozen or so.

If I restart that consumer, but comment out all the message.Touch() calls, the number of open TCP connections hovers around 400, which is fine. I can see connections are getting closed, and I don't get the "too many open files" error.

Is there a problem with the Touch command in the V2 API in either nsqd or go-nsq?

Andrew

Matt Reiferson

Aug 1, 2014, 5:40:22 PM
to Andrew Diamond, nsq-...@googlegroups.com, Robin Chou
Hi Andrew,

Can you provide some code?

What’s really odd is that 4151 is the HTTP port, which would mean something is publishing to nsqd but not properly closing connections.

Andrew Diamond

Aug 1, 2014, 5:44:01 PM
to nsq-...@googlegroups.com, chou...@gmail.com
Looks like I may have spoken too soon. I'm still seeing too many open TCP connections to nsqd, even with the Touch calls commented out.

I'll dig in a little further and see what I can find.

Andrew Diamond

Aug 1, 2014, 5:45:13 PM
to nsq-...@googlegroups.com, andrew....@aptrust.org, chou...@gmail.com
OK. Thanks for that info. I will look at that. One of my consumers does post to another topic. I'm probably not closing that connection.

Thanks for that info!

Andrew

Andrew Diamond

Aug 1, 2014, 6:17:19 PM
to nsq-...@googlegroups.com, andrew....@aptrust.org, chou...@gmail.com
Looks like you were right. Thanks for steering me in the right direction.

When you publish a message to a topic via HTTP, nsqd sends a simple "OK" response. If your HTTP client does not read and close the response body, the connection stays open forever.

Thanks for your help!

Andrew
