App serving HTTP 500 out of nowhere

Bernd Final

Sep 19, 2011, 6:08:57 PM

to Google App Engine

I've had this issue 5 or 6 times already. My app has been running for a
long time and I've had no problems so far, except this one:

Out of nowhere, latency jumps to 20-30 seconds and the app serves HTTP 500
errors for 99.9% of all requests. In most Servlets (yeah, I'm using Java, if
that matters) I'm accessing the memcache, sometimes the datastore
(Master/Slave). As you can imagine, the app is totally unusable when
this happens. When it occurred the first time I didn't do anything special,
just waited for a couple of hours until the app recovered on its own.

Dashboard Milliseconds/Request Chart:
Yesterday: http://bit.ly/pSF66E
Last 30 Days: http://bit.ly/nYdZ2S

When it happened again and again I tried to figure out what was going on.
Luckily I had an admin Servlet deployed which allowed me to
clear the entire memcache. I tried it, and voila, everything was back to
normal.

- The memcache size was up to 1.3-1.5 MB in total (I did a rough
estimation with the ByteArrayOutputStream trick). 12 objects are
stored in the memcache. Each object is a POJO which implements
Serializable and contains some Strings, Integers and ArrayLists.
- My first idea was to minimize the memory usage, because I thought I
was hitting the 1 MB limit. At the time I didn't know the limit applies to
each object, not to the whole cache. Anyway, I was able to reduce the
size by 30-40%, so the total is now about 800 KB to 1 MB.
- I'm running a cron job which cleans garbage out of the memcache.
- This is how I access the memcache on every request:
CacheManager.getInstance().getCacheFactory().createCache(Collections.emptyMap());
- I'm using the put and get methods for storing and retrieving.
- The key length is about 160 bytes.
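
For readers unfamiliar with the "ByteArrayOutputStream trick" mentioned in the list above, here is a minimal sketch of how such a rough estimate can be made. It assumes the cached values go through standard Java serialization, which the thread does not confirm for the memcache itself. The class name, method names, and the limit constant are made up for illustration, and the 1 MB figure is simply the one quoted in the post.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;

public class CacheSizeEstimate {

    // Per-value limit as quoted in the thread ("1MB"); an assumption, not an official figure.
    static final int ASSUMED_VALUE_LIMIT_BYTES = 1024 * 1024;

    // Serializes the value with standard Java serialization and returns the number of bytes written.
    static int serializedSize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // True if the serialized form is no larger than the assumed per-value limit.
    static boolean fitsInOneValue(Serializable value) {
        return serializedSize(value) <= ASSUMED_VALUE_LIMIT_BYTES;
    }

    public static void main(String[] args) {
        // Hypothetical sample value standing in for one of the cached POJOs.
        ArrayList<String> sample = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            sample.add("item-" + i);
        }
        System.out.println(serializedSize(sample) + " bytes, fits: " + fitsInOneValue(sample));
    }
}
```

Running the check per object rather than summing over the whole cache matches the point made above that the limit applies to each value individually. The number is only an estimate, since the service may encode values differently.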

Unfortunately I have not been able to solve this issue yet. I would be very
happy for ANY input.

Bernd F

Sep 27, 2011, 7:39:06 AM

to Google App Engine

Tonight it happened again, but after ~1 hour the app recovered on its
own. There must be something wrong with App Engine.

Ikai Lan (Google)

Sep 28, 2011, 4:51:34 PM

to google-a...@googlegroups.com

Could these be related to datastore latency spikes? Those have been known to occur regularly on master/slave datastore, and we are encouraging developers to move to high replication. You can sign up for migration tool access here:

https://docs.google.com/a/google.com/spreadsheet/viewform?authkey=CLXR0LMN&formkey=dERMcDZuMnlycHoyZDd4Vy1PNXlhWlE6MQ#gid=0

--

Ikai Lan
Developer Programs Engineer, Google App Engine

plus.ikailan.com | twitter.com/ikai

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

Bernd F

Oct 2, 2011, 1:09:17 PM

to Google App Engine

I don't think this is related to M/S datastore spikes. The app has
been running stably for months, and I've been suffering from this problem
for 3-4 weeks now.

FYI, I have also created an issue in the production tracker:
http://code.google.com/p/googleappengine/issues/detail?id=5790

- The whole app is down; even Servlets that don't use the datastore
respond with an HTTP 500 error.

- When I do nothing, the app recovers by itself after a few hours.

- I once managed to fix it by changing the Max Idle Instances
setting back to automatic. So this could be a hint that all my
instances were somehow blocked and App Engine wasn't able to spin up
more instances.

- The usual fix is to clear all data from the memcache. Once the
memcache is empty, everything goes back to normal immediately. The
memcache is constantly filled with 12 items, each with a size <1MB. I
agree one could think it is a code issue, but as I said the whole app
is down; even Servlets that don't use the memcache or the datastore
are not accessible. I've refactored my code 10 times already.

I don't know what to do anymore. All I get is "move to HRD", but I
think I'll have to move away from App Engine. To make this clear: I will
move to HRD when the tool is released. Yeah, I know there is a beta.
But anyway, what does that have to do with my problem? The whole app is down
while the system status for M/S says okie dokie.

Jay Young

Oct 2, 2011, 2:27:33 PM

to google-a...@googlegroups.com

It sounds like you have a bad value in memcache and when a servlet hits it, it blocks the instance that's executing it. When you have multiple instances running, other requests can be served by other instances. When you set it to 1, that one faulty instance gets backed up and requests start timing out. That's the only reason I can think of that clearing memcache AND changing your # of instances would solve the problem. It's like your bad memcache value is a blockage in a stream. Clearing memcache clears the block. Increasing the number of instances allows other requests to flow around the block.

Bernd F

Oct 2, 2011, 4:58:16 PM

to Google App Engine

Thank you for your input, Jay. At least there is something I can cling
to. Could you be a little bit more specific about this bad value?

What could that be? Or better said: Is that anything I can control?

Btw: After I set the Max Idle value to Automatic a few days ago it
happened again (earlier today).

Stephen Johnson

Oct 2, 2011, 5:24:05 PM

to google-a...@googlegroups.com

You don't mention if you have threadsafe on. If so, you might be experiencing some sort of deadlock caused by the way you are creating your JCache on every request. I'd try moving away from JCache and accessing the MemcacheService directly. I create a MemcacheService object for each thread and reuse that object. I've had no problems like the ones you've described, but my QPS is probably not as high as yours. Here's some sample code:

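import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

// One MemcacheService per thread, created on first use and then reused.
private static ThreadLocal<MemcacheService> memcacheService = new ThreadLocal<MemcacheService>() {
    @Override
    protected synchronized MemcacheService initialValue() {
        return MemcacheServiceFactory.getMemcacheService();
    }
};

// Returns the calling thread's MemcacheService.
public static MemcacheService getService() {
    return memcacheService.get();
}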

If you try switching to MemcacheService directly, please let us know if it helps or not.

Stephen

CortexConnect.com

Bernd F

Oct 2, 2011, 6:16:45 PM

to Google App Engine

Hi Stephen,

Yes, I switched to threadsafe on Sept 12th because of the new billing
policy, but the issue had already happened before that.

I'll change my code to use the low-level MemcacheService instead of
JCache immediately. With JCache I tried both: saving the instance in a
static variable for reuse, and creating the cache object on every
request. I can tell you that there is no difference (at least with
JCache) in CPU usage and servlet execution time.

Thank you very much for your suggestion!!

Stephen Johnson

Oct 2, 2011, 8:31:33 PM

to google-a...@googlegroups.com

Not sure it will help, but hopefully it does. You never know.

Bernd F

Oct 3, 2011, 4:57:34 AM

to Google App Engine

Just want to let you know that the app, including the new memcache
code, survived the first night.

Stephen Johnson

Oct 3, 2011, 12:44:44 PM

to google-a...@googlegroups.com

That's excellent news! Let's hope it lasts. Please let me know. Thx.
