I did a ps and found the beam process is still running. If I restart
it, it'll work again, but only for a period of time before it goes
back to connection refused.
I gisted some error output in more detail here:
http://gist.github.com/154594
I'm running on an Ubuntu 8.10 EC2 instance.
Any ideas why it reverts to a "connection refused" error and how can I
fix it?
Thanks,
tommy
When you get into a connection refused error, try doing:
$ netstat -tap tcp
To see if it's a crap load of open sockets. If so, my money is on your
client library not properly closing connections.
Paul Davis
I'll let the CouchRest team know.
-
Tommy
If there are only TIME_WAIT connections then that's pretty normal. It's a
consequence of RestClient opening a separate TCP/HTTP connection for each
request. I don't think it should cause the couchdb process to stop
accepting further connections.
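Roughly, each call in a loop like this opens and tears down its own TCP
connection (just a sketch; the URL and database name are placeholders), which
is where those TIME_WAIT sockets come from:

# Sketch only: every RestClient call below opens its own TCP connection and
# closes it when the response arrives, leaving a client-side socket in
# TIME_WAIT for each request.
require 'rubygems'
require 'rest_client'

10.times do
  RestClient.get("http://127.0.0.1:5984/mydb/_all_docs")  # "mydb" is a placeholder
end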
Any other thoughts why CouchDB would give a connection refused error?
I'm doing an Ajax call so it's sending 10 requests at a time. I would
think the couchdb server could handle 10 at a time, but it only returns
5 or so responses successfully.
I've found on my Mac it's hard to get Ruby to open more than about 10
concurrent requests. There's also normally a limit to the # of
requests an Ajax application can make at the same time.
Maybe you're running into something like this?
Chris
--
Chris Anderson
http://jchrisa.net
http://couch.io
When you had problems getting Ruby to run more than 10 requests, how
did it fail? Did it return a 500, or did the client app just not make
all the requests?
Thanks,
tommy
10 requests concurrently?
I'd have thought that Erlang could pick these up quickly, but there *is* a
TCP listen queue, which on many systems defaults to 5.
That is, if Erlang were sequentially doing accept - process - accept -
process ..., you might get this error if there were more than 5 outstanding
requests. It should be easy to test if you write a little _external handler
which just does a sleep.
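For instance, something like this would do (an untested sketch that assumes
the one-JSON-object-per-line protocol the _external interface speaks; it just
sleeps before answering OK):

#!/usr/bin/env ruby
# Sketch of a deliberately slow _external handler: CouchDB writes one JSON
# request per line to stdin and expects one JSON response per line on stdout.
# Sleeping keeps each request outstanding so you can watch the listen queue.
require 'rubygems'
require 'json'

STDOUT.sync = true
while line = STDIN.gets
  request = JSON.parse(line)   # the request object from CouchDB (unused here)
  sleep 5                      # hold the request open for a while
  puts({"code" => 200, "json" => {"ok" => true}}.to_json)
end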
The TCP listen queue is tunable at the C level - I don't know if Erlang
provides a way to set it though.
Regards,
Brian.
Ubuntu defaults to an open file limit of 1024. I had a lot of files open
on the system, so no new TCP connections could be created. The
file limit can be checked with ulimit -n
I raised the limits in the /etc/security/limits.conf file by adding
these lines:
* soft nofile 32768
* hard nofile 32768
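To see how close a process is getting to that limit, a quick check like this
works on Linux (it just counts the /proc fd entries for the current process):

# Rough count of file descriptors open in this Ruby process (Linux-only,
# relies on /proc).
open_fds = Dir.entries("/proc/#{Process.pid}/fd").size - 2  # drop "." and ".."
puts "open file descriptors: #{open_fds}"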
-
Tommy
Thanks for sending a resolution on that one.
Odd that the error is a failed to connect error and not an error about
open files. Unless of course that was just curl not showing the
underlying errno and just lumping all errors into connection errors.
Paul
Was the open files bottleneck hit on the client process, or the couchdb
erlang process?
I imagine it was the former.
> I raised the limits in the /etc/security/limits.conf file by adding
> these lines:
> * soft nofile 32768
> * hard nofile 32768
For the uid which was running the client application?
Regards,
Brian.
> On Sun, Aug 09, 2009 at 09:55:05PM -0700, Tommy Chheng wrote:
>> I found the source of the problem for the connection refused error.
>>
>> Ubuntu defaults to an open file limit of 1024. I had a lot of files open
>> on the system, so no new TCP connections could be created. The file
>> limit can be checked with ulimit -n
>
> Was the open files bottleneck hit on the client process, or the couchdb
> erlang process?
>
> I imagine it was the former.
They are both on the same machine, so it wouldn't matter because it is
at the OS level? If too many connections are being opened, the OS will
refuse to open more, no matter whether it's the client or the couchdb
process.
>
>> I raised the limits in the /etc/security/limits.conf file by adding
>> these lines:
>> * soft nofile 32768
>> * hard nofile 32768
>
> For the uid which was running the client application?
I set it for all users in the limits file.
>
> Regards,
>
> Brian.
It matters because it is a per-process limit, not a system-wide limit.
(Well, there may be a system-wide limit on file descriptors too, but that
would be set elsewhere, as a sysctl tunable I think)
> I raised the limits in the /etc/security/limits.conf file by adding
> these lines:
> * soft nofile 32768
> * hard nofile 32768
>
> For the uid which was running the client application?
>
> I set it for all users in the limits file.
That's a bit of a sledgehammer approach, and will leave you vulnerable to
denial-of-service attacks from other users.
The limits are set the way they are to give you some protection from this. If
the client program runs as uid foo, then you should just give uid foo this
benefit.
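That is, in /etc/security/limits.conf, something like this instead (with foo
being whichever uid actually runs the client):

foo soft nofile 32768
foo hard nofile 32768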
I don't know what the client program actually is, though. Is it a web
browser? In that case you would have to give every user who runs a web
browser on that machine this privilege. However it seems remarkable that a
web browser would open 1000+ concurrent file handles, since code running in
the browser doesn't have direct filesystem access anyway (unless you're
running Java applets?)
Or is it some middleware application, which receives requests from the
browser clients, and forwards them onto the backends?
You might want to see if you can improve the application by closing files
when you're no longer using them. If your app really needs to have 1,000
files open concurrently then so be it, but if it's a file descriptor leak
then you'll want to plug it, otherwise you'll just die a bit later when you
reach 32K open files.
Regards,
Brian.
It is a Ruby app using CouchRest (which uses the restclient/net ruby lib).
I'm basically comparing one document against all other documents (30K+
documents in the dataset, so it's a huge number of connections if they
aren't being closed properly) like this:
grants = NsfGrant.all.paginate(:page => current_page, :per_page => page_size)
grants.each do |doc2|
  NsfGrantSimilarity.compute_and_store(doc1, doc2)
end
I suspect there could be a file descriptor leak (due to connections not
being closed) in the Ruby app; I'll have to investigate more to find
the source of the problem.
But presumably NsfGrant.all only makes a single HTTP request, not 30K
separate requests? Looking at "netstat -n" will give you a rough idea, at
least for seeing how many sockets are left in TIME_WAIT state, but the
surest way is with tcpdump:
tcpdump -i lo -n -s0 'host 127.0.0.1 and tcp dst port 5984 and
(tcp[tcpflags] & tcp-syn != 0)'
should show you one line for each new HTTP connection made to CouchDB.
But in any case, for parsing 30K documents, you may not want to load all 30K
into RAM and then compare them afterwards. Couchrest lets you do a streaming
view, so that one object is read at a time - I think if you call view with a
block, then it works this way automatically. You need to have curl installed
for this to work, as it shells out a separate curl process and then reads
the response one line at a time.
# Query a CouchDB view as defined by a <tt>_design</tt> document. Accepts
# parameters as described in http://wiki.apache.org/couchdb/HttpViewApi
def view(name, params = {}, &block)
  keys = params.delete(:keys)
  name = name.split('/') # I think this will always be length == 2, but maybe not...
  dname = name.shift
  vname = name.join('/')
  url = CouchRest.paramify_url "#{@uri}/_design/#{dname}/_view/#{vname}", params
  if keys
    CouchRest.post(url, {:keys => keys})
  else
    if block_given?
      @streamer.view("_design/#{dname}/_view/#{vname}", params, &block)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    else
      CouchRest.get url
    end
  end
end
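So from your app, a streaming call would look something like this (the
database handle and view name are just illustrative):

# Illustrative only: stream rows from a (hypothetical) word-count view one at
# a time instead of loading everything into RAM first.
db = CouchRest.database("http://127.0.0.1:5984/nsf_grants")
db.view("nsf_grants/word_counts") do |row|
  # row is one view row, e.g. {"id" => ..., "key" => ..., "value" => ...}
  process(row)   # placeholder for whatever you do with each row
end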
HTH,
Brian.
Can you try upgrading to 0.9.1 to see if the error persists? There
shouldn't be any sort of incompatibility in the releases, so it'd just
be a matter of building and installing.
Paul Davis
> On Mon, Aug 10, 2009 at 01:05:42PM -0700, Tommy Chheng wrote:
>> It is a Ruby app using CouchRest (which uses the restclient/net ruby lib).
>>
>> I'm basically comparing one document against all other documents (30K+
>> documents in the dataset, so it's a huge number of connections if they
>> aren't being closed properly) like this:
>>
>> grants = NsfGrant.all.paginate(:page => current_page, :per_page => page_size)
>> grants.each do |doc2|
>>   NsfGrantSimilarity.compute_and_store(doc1, doc2)
>> end
>
> But presumably NsfGrant.all only makes a single HTTP request, not 30K
> separate requests?
NsfGrant.all will make one query (per paginated result), but I make
another query PER document to get a document's word count list (via a
view) in the NsfGrantSimilarity.compute_and_store method, so it will
end up making 30K separate requests.
> Looking at "netstat -n" will give you a rough idea, at
> least for seeing how many sockets are left in TIME_WAIT state, but the
> surest way is with tcpdump:
>
> tcpdump -i lo -n -s0 'host 127.0.0.1 and tcp dst port 5984 and
> (tcp[tcpflags] & tcp-syn != 0)'
>
> should show you one line for each new HTTP connection made to CouchDB.
it'll show 13 lines of this:
20:29:03.255746 IP 127.0.0.1.58119 > 127.0.0.1.5984: S
3662357700:3662357700(0) win 32792 <mss 16396,sackOK,timestamp
112518115 0,nop,wscale 6>
failing on the client side with Errno::ECONNREFUSED: Connection
refused - connect(2)
from /usr/lib/ruby/1.8/net/http.rb:560:in `initialize'
from /usr/lib/ruby/1.8/net/http.rb:560:in `open'
from /usr/lib/ruby/1.8/net/http.rb:560:in `connect'
from /usr/lib/ruby/1.8/timeout.rb:53:in `timeout'
>
> But in any case, for parsing 30K documents, you may not want to load all 30K
> into RAM and then compare them afterwards. Couchrest lets you do a streaming
> view, so that one object is read at a time - I think if you call view with a
> block, then it works this way automatically. You need to have curl installed
> for this to work, as it shells out a separate curl process and then reads
> the response one line at a time.
Thanks, I'll have to try this approach.
Then you will speed up your application tons by doing this in batches using
a multi-key fetch with POST :-)
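Something along these lines, say (view name, batch size and accessors are
illustrative; the :keys branch of the view method above does the POST for you):

# Illustrative only: fetch the word counts for a batch of grant ids in one
# POST per batch, assuming the view emits the grant id as its key and each
# grant object exposes its document id as #id.
require 'enumerator'   # for each_slice on older 1.8 rubies

ids = grants.map { |g| g.id }
ids.each_slice(100) do |batch|
  result = db.view("nsf_grants/word_counts", :keys => batch)
  result["rows"].each do |row|
    # row["key"] is the grant id, row["value"] is its word-count list
  end
end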
Cheers,
Brian.