couchdb server connection refused error

Tommy Chheng

Jul 24, 2009, 6:51:40 PM
to couchd...@incubator.apache.org
If I have CouchDB running for a period of time, it starts returning
"Connection refused" when I simply curl it.

I did a ps and found the beam process is still running. If I restart
it, it'll work again, but only for a period of time before it gives
back a connection refused error.

I gisted some error output in more detail here:
http://gist.github.com/154594

I'm running on an Ubuntu 8.10 EC2 instance.

Any ideas why it falls into this "connection refused" state, and how I can
fix it?

Thanks,
tommy

Paul Davis

Jul 24, 2009, 6:58:05 PM
to us...@couchdb.apache.org
Tommy,

When you run into the connection refused error, try doing:

$ netstat -tap tcp

to see if there's a crapload of open sockets. If so, my money is on your
client library not properly closing connections.

Paul Davis

Tommy Chheng

Jul 24, 2009, 8:23:44 PM
to us...@couchdb.apache.org
You are right, there were about 30 of these:
tcp  0  0  localhost.localdo:60607  localhost.localdom:5984  TIME_WAIT  -

I'll let the CouchRest team know.

-
Tommy

Brian Candler

Jul 26, 2009, 8:34:56 PM
to Tommy Chheng, us...@couchdb.apache.org
On Fri, Jul 24, 2009 at 05:23:44PM -0700, Tommy Chheng wrote:
> You are right, there was about 30 of these:
> tcp 0 0 localhost.localdo:60607 localhost.localdom:5984
> TIME_WAIT -

If there are only TIME_WAIT connections then that's pretty normal. It's a
consequence of RestClient opening a separate TCP/HTTP connection for each
request. I don't think it should cause the couchdb process to stop
accepting further connections.
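
For what it's worth, the difference is easy to see from Ruby itself - a
rough, untested illustration, assuming CouchDB is listening on 127.0.0.1:5984:

  require 'net/http'
  require 'uri'

  uri = URI.parse("http://127.0.0.1:5984/")

  # one TCP connection per request, roughly what RestClient does for each
  # call; each finished connection then lingers in TIME_WAIT for a while
  3.times { Net::HTTP.get_response(uri) }

  # one connection reused for several requests, closed when the block returns
  Net::HTTP.start(uri.host, uri.port) do |http|
    3.times { http.get('/') }
  end

Either way, the TIME_WAIT entries themselves are harmless.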

Paul Davis

Jul 26, 2009, 8:42:43 PM
to us...@couchdb.apache.org
AFAIK, you only have to exhaust whatever the server port's backlog is
set to before it starts returning connection refused errors.

Tommy Chheng

Jul 27, 2009, 1:07:02 AM
to us...@couchdb.apache.org

Any other thoughts on why CouchDB would give a connection refused error?
I'm doing an ajax call, so it's sending 10 requests at a time. I would
think the CouchDB server could handle 10 at a time, but it only returns
5 or so responses successfully.
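
A rough way I can check this outside the browser (untested sketch, assuming
CouchDB on 127.0.0.1:5984) is to fire the 10 requests from Ruby threads and
see what comes back:

  require 'net/http'
  require 'uri'

  uri = URI.parse("http://127.0.0.1:5984/")
  threads = (1..10).map do |i|
    Thread.new do
      begin
        "request #{i}: #{Net::HTTP.get_response(uri).code}"
      rescue Errno::ECONNREFUSED => e
        "request #{i}: #{e.class}"
      end
    end
  end
  threads.each { |t| puts t.value }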

Chris Anderson

Jul 27, 2009, 2:54:03 PM
to us...@couchdb.apache.org

I've found on my Mac it's hard to get Ruby to open more than about 10
concurrent requests. There's also normally a limit to the # of
requests an Ajax application can make at the same time.

Maybe you're running into something like this?

Chris

--
Chris Anderson
http://jchrisa.net
http://couch.io

Tommy Chheng

Jul 27, 2009, 3:15:06 PM
to us...@couchdb.apache.org
Hi Chris,
The odd thing is I'm only seeing the problem on the EC2 instance;
the Rails app runs fine locally with 10 ajax calls on my 2-year-old
MacBook.

When you had problems getting Ruby to run more than 10 requests, how
did it fail? Did it return a 500, or did the client app just not make
all the requests?

Thanks,
tommy

Brian Candler

Jul 29, 2009, 5:32:17 AM
to Tommy Chheng, us...@couchdb.apache.org
On Sun, Jul 26, 2009 at 10:07:02PM -0700, Tommy Chheng wrote:
> Any other thoughts why CouchDB would give a connection refused error?
> I'm doing an ajax call so it's sending 10 requests at a time.

10 requests concurrently?

I'd have thought that Erlang could pick these up quickly, but there *is* a
TCP listen queue, which on many systems defaults to 5.

That is, if Erlang were sequentially doing accept - process - accept -
process ..., you might get this error if there were more than 5 outstanding
requests. It should be easy to test if you write a little _external handler
which just does a sleep.
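
Something along these lines, perhaps (untested, and from memory of the
ExternalProcesses wiki page, so check the exact request/response format
there - CouchDB hands the handler one JSON request per line on stdin and
expects one JSON response per line on stdout):

  #!/usr/bin/env ruby
  require 'rubygems'
  require 'json'

  STDOUT.sync = true
  STDIN.each_line do |line|
    JSON.parse(line)   # the request itself is ignored; we only want the delay
    sleep 10           # hold the request open for a while
    puts({"code" => 200, "json" => {"slept" => 10}}.to_json)
  end

Point ten concurrent requests at it and see how many get through before you
start seeing connection refused.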

The TCP listen queue is tunable at the C level - I don't know if Erlang
provides a way to set it though.

Regards,

Brian.

Tommy Chheng

Aug 10, 2009, 12:55:05 AM
to us...@couchdb.apache.org
I found the source of the problem for the connection refused error.

Ubuntu defaults to an open file limit of 1024. I had a lot of files open
on the system, so new TCP connections could not be created. The
file limit can be checked with ulimit -n.
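
You can also sanity-check this from inside the Ruby process itself - a rough,
untested snippet (Linux-specific, and Process.getrlimit may not exist on
older Rubies):

  soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
  puts "nofile limit: soft=#{soft} hard=#{hard}"
  puts "fds open right now: #{Dir.entries("/proc/#{Process.pid}/fd").size - 2}"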

I raised the limits in the /etc/security/limits.conf file by adding
these lines:
* soft nofile 32768
* hard nofile 32768

-
Tommy

Paul Davis

Aug 10, 2009, 1:38:40 AM
to us...@couchdb.apache.org
Tommy,

Thanks for sending a resolution on that one.

Odd that the error is a failed-to-connect error and not an error about
open files. Unless of course that was just curl not showing the
underlying errno and lumping all errors into connection errors.

Paul

Brian Candler

Aug 10, 2009, 4:19:57 AM
to Tommy Chheng, us...@couchdb.apache.org
On Sun, Aug 09, 2009 at 09:55:05PM -0700, Tommy Chheng wrote:
> I found the source of the problem for the connection refused error.
>
> Ubuntu defaults a open file limit of 1024. I had a lot of files opened
> on the system so any new TCP connections could not be created. The file
> limit can be checked with ulimit -n

Was the open files bottleneck hit on the client process, or the couchdb
erlang process?

I imagine it was the former.

> I raised the limits in the /etc/security/limits.conf file by adding
> these lines:
> * soft nofile 32768
> * hard nofile 32768

For the uid which was running the client application?

Regards,

Brian.

Tommy Chheng

Aug 10, 2009, 2:17:01 PM
to Brian Candler, us...@couchdb.apache.org

On Aug 10, 2009, at 1:19 AM, Brian Candler wrote:

> On Sun, Aug 09, 2009 at 09:55:05PM -0700, Tommy Chheng wrote:
>> I found the source of the problem for the connection refused error.
>>
>> Ubuntu defaults a open file limit of 1024. I had a lot of files
>> opened
>> on the system so any new TCP connections could not be created. The
>> file
>> limit can be checked with ulimit -n
>
> Was the open files bottleneck hit on the client process, or the
> couchdb
> erlang process?
>
> I imagine it was the former.

They are both on the same machine, so it wouldn't matter because it is
at the OS level? If too many connections are being opened, the OS will
refuse to open more, no matter whether it's the client or the CouchDB
process.

>
>> I raised the limits in the /etc/security/limits.conf file by adding
>> these lines:
>> * soft nofile 32768
>> * hard nofile 32768
>
> For the uid which was running the client application?

I set it for all users in the limits file.

>
> Regards,
>
> Brian.

Brian Candler

Aug 10, 2009, 3:49:14 PM
to Tommy Chheng, us...@couchdb.apache.org
On Mon, Aug 10, 2009 at 11:17:01AM -0700, Tommy Chheng wrote:
>> Was the open files bottleneck hit on the client process, or the couchdb
>> erlang process?
>>
>> I imagine it was the former.
>>
> They are both on the same machine, so it wouldn't matter because it is
> at the OS level?
> If too many connections are being opened, the OS will refuse to
> open more, no matter whether it's the client or the CouchDB process.

It matters because it is a per-process limit, not a system-wide limit.
(Well, there may be a system-wide limit on file descriptors too, but that
would be set elsewhere, as a sysctl tunable I think)

> I raised the limits in the /etc/security/limits.conf file by adding
>
> these lines:
>
> * soft nofile 32768
>
> * hard nofile 32768
>
> For the uid which was running the client application?
>
> I set it for all users in the limits file.

That's a bit of a sledgehammer approach, and will leave you vulnerable to
denial-of-service attacks from other users.

The limits are set like they are to give you some protection from this. If
the client program runs as uid foo, then you should just give uid foo this
benefit.

I don't know what the client program actually is, though. Is it a web
browser? In that case you would have to give every user who runs a web
browser on that machine this privilege. However it seems remarkable that a
web browser would open 1000+ concurrent file handles, since code running in
the browser doesn't have direct filesystem access anyway (unless you're
running Java applets?)

Or is it some middleware application, which receives requests from the
browser clients, and forwards them onto the backends?

You might want to see if you can improve the application by closing files
when you're no longer using them. If your app really needs to have 1,000
files open concurrently then so be it, but if it's a file descriptor leak
then you'll want to plug it, otherwise you'll just die a bit later when you
reach 32K open files.
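
In Ruby the easiest habit is to use the block forms, which close the handle
for you even if the block raises - a trivial illustration (the file name is
just an example):

  File.open('/tmp/scratch.txt', 'w') do |f|
    f.puts 'done with this one'
  end                        # fd released here

  # equivalent to the explicit version:
  f = File.open('/tmp/scratch.txt', 'w')
  begin
    f.puts 'done with this one'
  ensure
    f.close                  # always runs, even on error
  end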

Regards,

Brian.

Tommy Chheng

Aug 10, 2009, 4:05:42 PM
to Brian Candler, us...@couchdb.apache.org
> I don't know what the client program actually is, though. Is it a web
> browser? In that case you would have to give every user who runs a web
> browser on that machine this privilege. However it seems remarkable
> that a
> web browser would open 1000+ concurrent file handles, since code
> running in
> the browser doesn't have direct filesystem access anyway (unless
> you're
> running Java applets?)
>
> Or is it some middleware application, which receives requests from the
> browser clients, and forwards them onto the backends?

It is a Ruby app using CouchRest (which uses the restclient/net ruby libs).

I'm basically comparing one document against all other documents (30K+
documents in the dataset, so it's a huge number of connections if the
connections aren't being closed properly) like this:

  grants = NsfGrant.all.paginate(:page => current_page, :per_page => page_size)
  grants.each do |doc2|
    NsfGrantSimilarity.compute_and_store(doc1, doc2)
  end

I suspect there could be a file descriptor leak (due to connections not
being closed) in the Ruby app; I'll have to investigate more to find
the source of the problem.

Brian Candler

Aug 10, 2009, 4:19:14 PM
to Tommy Chheng, us...@couchdb.apache.org
On Mon, Aug 10, 2009 at 01:05:42PM -0700, Tommy Chheng wrote:
> It is a Ruby app using CouchRest (which uses the restclient/net ruby libs).
>
> I'm basically comparing one document against all other documents (30K+
> documents in the dataset, so it's a huge number of connections if the
> connections aren't being closed properly) like this:
>   grants = NsfGrant.all.paginate(:page => current_page, :per_page => page_size)
>   grants.each do |doc2|
>     NsfGrantSimilarity.compute_and_store(doc1, doc2)
>   end

But presumably NsfGrant.all only makes a single HTTP request, not 30K
separate requests? Looking at "netstat -n" will give you a rough idea, at
least for seeing how many sockets are left in TIME_WAIT state, but the
surest way is with tcpdump:

tcpdump -i lo -n -s0 'host 127.0.0.1 and tcp dst port 5984 and
(tcp[tcpflags] & tcp-syn != 0)'

should show you one line for each new HTTP connection made to CouchDB.

But in any case, for parsing 30K documents, you may not want to load all 30K
into RAM and then compare them afterwards. CouchRest lets you do a streaming
view, so that one object is read at a time - I think if you call view with a
block, then it works this way automatically. You need to have curl installed
for this to work, as it shells out to a separate curl process and then reads
the response one line at a time.

# Query a CouchDB view as defined by a <tt>_design</tt> document. Accepts
# parameters as described in http://wiki.apache.org/couchdb/HttpViewApi
def view(name, params = {}, &block)
  keys = params.delete(:keys)
  name = name.split('/') # I think this will always be length == 2, but maybe not...
  dname = name.shift
  vname = name.join('/')
  url = CouchRest.paramify_url "#{@uri}/_design/#{dname}/_view/#{vname}", params
  if keys
    CouchRest.post(url, {:keys => keys})
  else
    if block_given?
      @streamer.view("_design/#{dname}/_view/#{vname}", params, &block)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    else
      CouchRest.get url
    end
  end
end
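
For example, roughly (untested, and the database URL and view name here are
made up - substitute your own):

  db = CouchRest.database("http://127.0.0.1:5984/nsf_grants")
  db.view("grants/word_counts") do |row|
    # each row is handed to the block as it is read, so the whole result
    # set never has to sit in RAM at once
    do_something_with(row)   # stand-in for your own per-row processing
  end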

HTH,

Brian.

Paul Davis

Aug 10, 2009, 4:00:23 PM
to us...@couchdb.apache.org
Idle thought, but I'm suddenly fairly certain that there was a bug fix
in 0.9.1 for leaking file handles. I also realized that it must be
the server with too many open files, as the call to curl certainly
isn't running out of descriptors, and the server running out could
definitely cause a connection refused error. Narf. Not sure why it
took so long to put that one together.

Can you try upgrading to 0.9.1 to see if the error persists? There
shouldn't be any sort of incompatibility between the releases, so it'd just
be a matter of building and installing.

Paul Davis

Tommy Chheng

Aug 10, 2009, 4:31:59 PM
to us...@couchdb.apache.org

On Aug 10, 2009, at 1:19 PM, Brian Candler wrote:

> On Mon, Aug 10, 2009 at 01:05:42PM -0700, Tommy Chheng wrote:
>> It is a Ruby app using CouchRest (which uses the restclient/net ruby libs).
>>
>> I'm basically comparing one document against all other documents (30K+
>> documents in the dataset, so it's a huge number of connections if the
>> connections aren't being closed properly) like this:
>>   grants = NsfGrant.all.paginate(:page => current_page, :per_page => page_size)
>>   grants.each do |doc2|
>>     NsfGrantSimilarity.compute_and_store(doc1, doc2)
>>   end
>
> But presumably NsfGrant.all only makes a single HTTP request, not 30K
> separate requests?

NsfGrant.all will make one query (per paginated result), but I make
another query PER document to get a document's word count list (via a
view) in the NsfGrantSimilarity.compute_and_store method, so it will
be trying to do 30K separate requests.

> Looking at "netstat -n" will give you a rough idea, at
> least for seeing how many sockets are left in TIME_WAIT state, but the
> surest way is with tcpdump:
>
> tcpdump -i lo -n -s0 'host 127.0.0.1 and tcp dst port 5984 and
> (tcp[tcpflags] & tcp-syn != 0)'
>
> should show you one line for each new HTTP connection made to CouchDB.

it'll show 13 lines of this:
20:29:03.255746 IP 127.0.0.1.58119 > 127.0.0.1.5984: S
3662357700:3662357700(0) win 32792 <mss 16396,sackOK,timestamp
112518115 0,nop,wscale 6>

failing on the client side with Errno::ECONNREFUSED: Connection
refused - connect(2)
from /usr/lib/ruby/1.8/net/http.rb:560:in `initialize'
from /usr/lib/ruby/1.8/net/http.rb:560:in `open'
from /usr/lib/ruby/1.8/net/http.rb:560:in `connect'
from /usr/lib/ruby/1.8/timeout.rb:53:in `timeout'


>
> But in any case, for parsing 30K documents, you may not want to load
> all 30K
> into RAM and then compare then afterwards. Couchrest lets you do a
> streaming
> view, so that one object is read at a time - I think if you call
> view with a
> block, then it works this way automatically. You need to have curl
> installed
> for this to work, as it shells out a separate curl process and then
> reads
> the response one line at a time.

Thanks, I'll have to try this approach.

Brian Candler

Aug 12, 2009, 4:36:17 PM
to Tommy Chheng, us...@couchdb.apache.org
On Mon, Aug 10, 2009 at 01:31:59PM -0700, Tommy Chheng wrote:
> NsfGrant.all will make one query (per paginated result), but I make
> another query PER document to get a document's word count list (via a
> view) in the NsfGrantSimilarity.compute_and_store method, so it will
> be trying to do 30K separate requests.

Then you will speed up your application tons by doing this in batches using
a multi-key fetch with POST :-)
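
Something along these lines (untested sketch - the view name is made up, but
the :keys handling is the POST path in the view method I pasted earlier):

  ids = grants.map { |g| g.id }
  result = db.view("grants/word_counts", :keys => ids)
  result["rows"].each do |row|
    NsfGrantSimilarity.compute_and_store(doc1, row)
  end

One POST per page of grants instead of one GET per grant.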

Cheers,

Brian.
