Troubleshooting "too many concurrent requests"

Jake Czyz

unread,

Jun 22, 2012, 5:48:47 AM6/22/12

to google-cloud...@googlegroups.com

Hello,

I've been working on a GAE / Cloud SQL app for many weeks and keep
hitting the wall on various GC-SQL limits that have forced me to
refactor my app several times. A persistent one I'm running into is
the error:

google.storage.speckle.python.api.rdbms.0: Instance has too many
concurrent requests: 101

Does anyone have any advice for troubleshooting this error or
otherwise coding database connections in a way that
minimizes/mitigates this limitation?

My app has a couple hundred clients, and I've already had to refactor
it substantially to use the datastore instead of GC-SQL for most
interaction with clients (data is then synced to-from GC-SQL via cron
jobs... which also are exceeding the 60 second Deadline occasionally,
but that's another issue...). Right now, there's only a single type
of request that my clients still perform that needs to access the
GC-SQL DB directly (as opposed to datastore), due to consistency
requirements. This request happens, ballpark, once every 30 seconds or
so, on average, though sometimes there are a bunch in a row (i.e. one
every couple seconds for a minute or two).

I can't even view processes (i.e. via "show processlist") and kill
threads via the Google APIs console SQL Prompt, as it displays "Failed
to load databases." I know that restarting my instance (and waiting
several minutes) usually will get me back on track, but I've gotten
frustrated and am wondering what other strategies or ideas folks have
for dealing with the database connection limits.

Thanks in advance,
Jake

Ken Ashcraft

unread,

Jun 22, 2012, 4:12:46 PM6/22/12

to google-cloud...@googlegroups.com

These might help:

http://stackoverflow.com/questions/10424442/what-are-the-connection-limits-for-google-cloud-sql-from-app-engine-and-how-to

http://stackoverflow.com/questions/10158805/whats-a-good-approach-to-managing-the-db-connection-in-a-google-cloud-sql-gae/10438622#10438622

Jake Czyz

unread,

Jun 23, 2012, 8:46:25 AM6/23/12

to google-cloud...@googlegroups.com

Hi Ken,

I'm the one that asked both of those questions. :-)

I still hit the limits and my application becomes unusable after a
while with everything waiting for the SQL processes that are hung and
using up the 100 connection limit it seems. It's only after I restart
my instance in the API console that things start working again. Is
there a way to programatically restart the instance? Is there a way to
reduce the timeout on the connections so that it's the same/less than
the GAE timeout? This way, long-running SQL queries would fail and
GAE could catch them, instead of GAE failing on deadline and the SQL
sitting there using up a connection.

Other ideas? I can't be the only one with 200+ clients that
occasionally trigger GC-SQL queries. I have a hard time believing a
Google cloud product isn't designed to scale to my miniscule load.

Thanks!
Jake

Ken Ashcraft

unread,

Jun 25, 2012, 7:42:03 PM6/25/12

to google-cloud...@googlegroups.com

On Sat, Jun 23, 2012 at 5:46 AM, Jake Czyz <jjc....@gmail.com> wrote:

Hi Ken,

I'm the one that asked both of those questions. :-)

I still hit the limits and my application becomes unusable after a
while with everything waiting for the SQL processes that are hung and
using up the 100 connection limit it seems. It's only after I restart
my instance in the API console that things start working again. Is
there a way to programatically restart the instance?

No, sorry.

Is there a way to
reduce the timeout on the connections so that it's the same/less than
the GAE timeout?

Not really. You can reduce the deadlock timeout within mysql.

http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_lock_wait_timeout

You could also set up a cron job to kill long-running queries.

Other ideas? I can't be the only one with 200+ clients that
occasionally trigger GC-SQL queries. I have a hard time believing a
Google cloud product isn't designed to scale to my miniscule load.

200+ simultaneous clients sounds like thousands of QPS. I wouldn't call that miniscule. Cloud SQL doesn't do anything magical to make your database scale. It is a mysql server running on fault tolerant storage. No autosharding or anything like that. Keep in mind that because we do synchronous replication, write latencies are higher than what you'd get with a local mysql. That might cause you to hold locks longer than you'd expect.

We can increase the limit, but every time we've seen somebody run into this situation, increasing the limit would just make things worse. More simultaneous queries would mean more contention and more timeouts.

What is your application?

Ken

Jake Czyz

unread,

Jun 28, 2012, 5:23:17 AM6/28/12

to google-cloud...@googlegroups.com

Hi Ken,

Thanks. The 200+ clients rarely trigger a DB call now, as I'm using
the datastore for caching data and have cron jobs that periodically
flush data to/from GC-SQL and datastore. The DB hangs under a load of
under 1 QPS (according to the GAE dashboard, though I know those
numbers aren't one-to-one for GC-SQL queries) and remains hung even
when there's only a single client in 10 minutes trying to access the
DB. I.e. once the connections are maxed out, none seem to time out in
a quick enough fashion for the system to heal itself.

The application is a research project that involves worker nodes and
the GAE/GC-SQL app as the control master that allocates work and
aggregates results. I can give you more specific detail off-channel,
if that might help. It's a university/Google collaboration, actually.

In general, things work for long stretches now, except that every x
number of hours I come back to find the DB hung with maxed out
connections ("Instance has too many concurrent requests: 101"), which
I can only resolve by restarting my instance manually via the APIs
console. I'm unable to see what the maxed out connections are doing,
as neither the GC-SQL APIs console nor commandline tool can even
connect to the DB to poke around when the connections are maxed out.
There's probably some deadlock scenario that creeps up occasionally,
or otherwise it's the case of random load occasionally leading to a
QPS spike that causes a chain of events (i.e. GAE deadlines --> DB
connections hanging --> GAE using up all 100 connections --> splat).

I think I'd like to take your advice and write another cron job that
kills GC-SQL processes that are stuck waiting too long when the DB
appears down. But, this job would still fail to work once all 100
connections are established (after all, it needs a connection itself
to see the processlist and execute the kill -- same issue that makes
the APIs console mostly useless in this scenario). I'd have to run it
very often to have a chance at being useful. So, this brings me to
another related question:

Is there any programmatic introspection/management available for
GC-SQL instances? E.g. is there a better way to detect and/or
troubleshoot the type of scenario I'm running into? It seems that if
there's an APIs console for it that the team could code up a meta API
for controlling the instance. For example, if I could have GAE calls
that detect the state of the DB, check the number of connections,
(maybe even get the min,max,avg process connection time), test when
the DB fails to load (as the APIs console typically reports under the
above scenario), and allow me to make a programmatic call to restart
the instance, that would be very useful.

Relatedly, is there any logging visible for the GC-SQL itself?
There's a "Logs" tab in the APIs console, but that only shows when I
first created the instance. It could be a bit more verbose. :-)

Thanks for your time and help!
Jake

Ken Ashcraft

unread,

Jun 28, 2012, 12:16:39 PM6/28/12

to google-cloud...@googlegroups.com

Let's take this discussion off list since we'll need to get into details specific to your application.

Reply all

Reply to author

Forward