Hi,
On Mar 12, 12:31 pm, mamcxyz <mamc...@gmail.com> wrote:
> I'm getting this message from a live site, more or less 1-2 each week:
Do you have your Django apps set up to email you when a 500 Server
Error occurs? If so, does the full traceback show that these failures
happen whenever a search engine crawler is accessing your site?
> OperationalError: could not connect to server: Network is unreachable
> Is the server running on host "localhost" and accepting
> TCP/IP connections on port 5432?
>
> The site run recent trunk of django and postgress with pyso2.
>
> I don't see a pattern (each time is a diferent page) and don't see a
> way to replicate it. My local test are fine.
>
> The site is deployed on joyent....
Are you on a Joyent Solaris "Accelerator"? If so, here is some
analysis that might help:
On Solaris, the socket TIME_WAIT parameter defaults to 4 minutes (the
RFC-recommended value, but far too high for local TCP/IP sockets).
Linux, FreeBSD, etc. default to something like 1 or 2 minutes. What
this means is that closed TCP/IP pgsql connections take longer to
clear up on Solaris. So if, let's say, you allow a maximum of 100
simultaneous pgsql connections and a search engine crawler hits your
site and manages to issue 110 requests in the span of a minute (the
Joyent Accelerator boxes are pretty fast and can serve far more
requests than that per minute), you will easily hit the pgsql limit,
and further connections will be refused until the connections in the
TIME_WAIT state clear out. You can see if this is happening by looking
at the number of pgsql connections in TIME_WAIT using:
netstat -an | grep "5432"
Do this right when you get the error message above. Chances are that
you will see a lot of sockets in the TIME_WAIT state.
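If eyeballing the netstat output gets tedious, the counting can be scripted. What follows is only a rough sketch: the function names (count_states, count_live) are made up for illustration, and it assumes the connection state is the last column of each line and that addresses end in ".5432" (Solaris style) or ":5432" (Linux style). Adjust it if your netstat prints something different.

```python
import subprocess
from collections import Counter


def count_states(netstat_output, port=5432):
    """Count TCP states for lines of `netstat -an` output that mention `port`.

    Assumes the state (TIME_WAIT, ESTABLISHED, ...) is the last column and
    that an address column ends with ".<port>" or ":<port>".
    """
    suffixes = (".%d" % port, ":%d" % port)
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # Only keep lines where some address column ends with the port.
        if not any(field.endswith(suffixes) for field in fields[:-1]):
            continue
        state = fields[-1]
        # Skip lines whose last column is not a state name such as TIME_WAIT.
        if state.isupper() and state.replace("_", "").isalpha():
            counts[state] += 1
    return counts


def count_live(port=5432):
    """Run `netstat -an` on this machine and count the states for `port`."""
    output = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    ).stdout
    return count_states(output, port)
```

Calling count_live() right after the error shows up and looking at the "TIME_WAIT" entry should tell you whether you are anywhere near your pgsql connection limit.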
The easiest solution is to simply switch to Unix domain sockets. For
pgsql, that means setting your DATABASE_HOST to "" (an empty string)
instead of localhost. Make a few requests to your app after that and
run the above netstat command. You should see no new TCP/IP
connections in the TIME_WAIT state. If it doesn't work, open up your
pg_hba.conf and make sure the Unix domain sockets entry is uncommented
and set to an appropriate authentication method (and then restart
pgsql). You might also want to disable the TCP socket entries while
you are there.
The Unix domain sockets solution works for you because your pgsql DB
is on the same host as your Django app. If that weren't so, you would
have to switch back to TCP/IP sockets.
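For reference, the relevant part of settings.py would look roughly like the sketch below. Only the empty DATABASE_HOST is the actual change; the database name and user are placeholders invented for illustration, so substitute your own.

```python
# settings.py (fragment); values marked "placeholder" are made up.
DATABASE_ENGINE = "postgresql_psycopg2"
DATABASE_NAME = "mysite"        # placeholder
DATABASE_USER = "mysite_user"   # placeholder
DATABASE_PASSWORD = ""          # placeholder; depends on your pg_hba.conf auth method
DATABASE_HOST = ""              # empty string: connect over the Unix domain socket
DATABASE_PORT = ""              # empty string: use the default port
```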
The second easiest solution is to use connection pooling (pgpool2)
which allows you to reuse a controlled small number of connections for
multiple requests.
There is another issue on Joyent/Solaris that you should be aware of:
if you are using postgresql from the Blastwave packaging system, it's
not compiled with the option that enables a thread-safe libpq (the
library that ultimately is used by psycopg). This causes steady memory
leaks and could lead to intermittent problems. See the note I've
quoted below from the psycopg2 INSTALL file.
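One way to check a given libpq is to ask it directly: libpq exports a PQisthreadsafe() function (in reasonably recent PostgreSQL releases) that returns 1 for a thread-safe build. The sketch below calls it through ctypes. The helper name is made up, and the big assumption is that ctypes finds the same libpq that your psycopg2 is linked against, which may not be true if you have more than one PostgreSQL install on the box.

```python
import ctypes
import ctypes.util


def libpq_is_threadsafe():
    """Return True/False for the libpq that ctypes can locate, or None
    if no libpq is found or it does not export PQisthreadsafe()."""
    path = ctypes.util.find_library("pq")
    if path is None:
        return None
    try:
        libpq = ctypes.CDLL(path)
        func = libpq.PQisthreadsafe
    except (OSError, AttributeError):
        # Library could not be loaded, or it predates PQisthreadsafe().
        return None
    func.restype = ctypes.c_int
    func.argtypes = []
    return bool(func())
```

A result of False would match the memory leak described in the note; None just means the check could not be run.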
Hope this helps,
-Rajesh
Compiling and installing psycopg
********************************
** Important note: if you plan to use psycopg2 in a multithreaded
application, make sure that your libpq has been compiled with the
--with-thread-safety option. psycopg2 will work correctly even with a
non-thread-safe libpq but libpq will leak memory.