I have a Django and Celery application running in Google Kubernetes Engine. It connects to my CloudSQL instance (Postgres) through a Kubernetes service running the CloudSQL Proxy. Database connections and queries generally work fine, but occasionally we get bursts of errors from broken connections. They are raised in Python like this:

    OperationalError: could not connect to server: Connection refused
        Is the server running on host "cloudsql-proxy-service" and accepting
        TCP/IP connections on port 3306?

or

    OperationalError: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

I can't find anything in the logs of the CloudSQL instance that would explain this. The CloudSQL proxy logs contain some messages like this:

    2018/04/24 18:55:18 Instance <project_name>:us-central1:<instance_name> closed connection

However, the timestamps of those messages don't reliably line up with the times of the Python errors. I have tried setting CONN_MAX_AGE and TCP keepalives in Django's settings.py like this:

    import os

    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql',
            'NAME': '<db_name>',
            'USER': os.environ.get('DB_USER', None),
            'HOST': os.environ.get('DB_HOST', None),
            'PORT': os.environ.get('DB_PORT', None),
            'PASSWORD': os.environ.get('DB_PASSWORD', None),
            # Seconds to keep a connection open for reuse; 0 closes it after each request.
            'CONN_MAX_AGE': int(os.environ.get('CONN_MAX_AGE', 0)),
            # Passed through to libpq as connection parameters (TCP keepalives).
            'OPTIONS': {
                'keepalives': 1,
                'keepalives_idle': 480,
                'keepalives_interval': 10,
                'keepalives_count': 3,
            },
        },
    }

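For reference, here is a rough sketch of what those keepalive values should imply. It assumes the usual TCP keepalive behaviour: the first probe goes out after `keepalives_idle` seconds of inactivity, and the connection is declared dead after `keepalives_count` unanswered probes sent `keepalives_interval` seconds apart. The helper name `dead_peer_detection_seconds` is made up for illustration, and the result is only an approximation.

```python
def dead_peer_detection_seconds(idle, interval, count):
    """Approximate seconds before an idle, silently dropped connection is declared dead.

    Assumes standard TCP keepalive semantics: wait `idle` seconds, then send up to
    `count` probes spaced `interval` seconds apart.
    """
    return idle + interval * count


# Values copied from the OPTIONS dict in settings.py above.
detection = dead_peer_detection_seconds(idle=480, interval=10, count=3)
print(detection)  # 510
```

If that assumption holds, a connection that dies while idle could go unnoticed for roughly eight and a half minutes with these settings.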
Those settings didn't seem to make a difference. We still get the same errors in bursts: about 20 errors over the span of 2-3 minutes, 2-3 times per day.
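To check more systematically whether the proxy's "closed connection" messages line up with the Python errors, something like the following sketch could be used. It assumes both sets of timestamps are in the same timezone and that the error times have been exported from wherever the errors are collected. The function names, the 5-second window, and the sample error timestamps are all hypothetical; only the proxy log line format comes from the logs quoted above.

```python
from datetime import datetime, timedelta

# Timestamp layout at the start of each proxy log line, e.g. "2018/04/24 18:55:18".
PROXY_TS_FORMAT = "%Y/%m/%d %H:%M:%S"


def parse_proxy_closes(log_lines):
    """Return the timestamps of proxy log lines that report a closed connection."""
    stamps = []
    for line in log_lines:
        if "closed connection" not in line:
            continue
        # The timestamp occupies the first 19 characters of the line.
        stamps.append(datetime.strptime(line[:19], PROXY_TS_FORMAT))
    return stamps


def errors_near_closes(error_times, close_times, window_seconds=5):
    """Return the error timestamps within window_seconds of any proxy close event."""
    window = timedelta(seconds=window_seconds)
    return [e for e in error_times
            if any(abs(e - c) <= window for c in close_times)]


proxy_log = [
    "2018/04/24 18:55:18 Instance <project_name>:us-central1:<instance_name> closed connection",
]
# Made-up error timestamps, purely for illustration.
error_times = [
    datetime(2018, 4, 24, 18, 55, 20),
    datetime(2018, 4, 24, 19, 30, 0),
]

closes = parse_proxy_closes(proxy_log)
matched = errors_near_closes(error_times, closes)
print(matched)
```

Errors that never show up in `matched`, even with a wider window, would suggest the proxy's "closed connection" messages are not the direct cause.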