URGENT: memcache errors out on write, the entire site is down

jon

Oct 25, 2011, 3:29:06 AM
to Google App Engine
Hi,

Writing to memcache is down for one of our apps; our other apps are OK.

The error message says:
java.lang.RuntimeException: com.google.appengine.api.memcache.MemcacheServiceException: Memcache put: Set failed to set 20 keys

The affected app id: thecrowdvoice

Could someone from Google urgently take a look please?

Thanks in advance,
Jon

jon

Oct 25, 2011, 5:16:45 AM
to Google App Engine
Memcache for our app is back now.

According to the log, memcache write operations started throwing
com.google.appengine.api.memcache.MemcacheServiceException at 17:49
(Melbourne time) and stopped at 18:56. In other words memcache was
unavailable for over 1 hour.

We use memcache heavily, so our site would've been down for that long
if we hadn't stepped in to turn off all uses of memcache.

Question for Google: is there an affinity between an app and its
memcache service provider? How can memcache consistently error out for
the same application for that long?

James Broberg

Oct 25, 2011, 8:26:05 AM
to google-a...@googlegroups.com
Sounds familiar:

http://code.google.com/p/googleappengine/issues/detail?id=5790
http://code.google.com/p/googleappengine/issues/detail?id=6167

jon

Oct 25, 2011, 9:17:34 AM
to Google App Engine
Thanks James for pointing them out. They're not entirely identical in
that the symptom is different (i.e. the exception I got is different).
However there's a common pattern whereby an app seems to be assigned a
memcache "service provider" and if this provider misbehaves the app
will be stuck with it instead of getting reassigned a new, healthy
replacement.

James Broberg

Oct 25, 2011, 6:32:28 PM
to google-a...@googlegroups.com
Fair enough. At least you got a memcache exception :) In our case
performance just deteriorated and eventually it timed out.

jon

Oct 26, 2011, 7:28:55 PM
to Google App Engine
How did you fix/get around this problem?

It was pointed out to me that the MemcacheService by default should
*NOT* throw any exception, therefore what I was seeing is a bug.

Can anyone from Google confirm if this is the case? I'm using 1.5.4.

Johan Euphrosine

Oct 26, 2011, 8:12:28 PM
to google-a...@googlegroups.com
Hi Jon,

It is important that you have proper exception handling for all your
API calls, as there is always a possibility of them failing (otherwise
we wouldn't document those methods as throwing an exception). In the
catch block you should usually fall back or retry gracefully: for
memcache it makes sense to fall back on the datastore (more latency, but
more reliable).
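
A minimal sketch of that fallback pattern follows. It does not use the App Engine SDK: `SimpleCache`, `AlwaysFailingCache`, `FallbackReader` and the `datastoreLoader` function are hypothetical stand-ins invented for illustration, and real code would probably catch the narrower MemcacheServiceException rather than RuntimeException.

```java
import java.util.function.Function;

// Hypothetical stand-in for a cache client; NOT the App Engine MemcacheService.
interface SimpleCache {
    Object get(String key);             // may throw RuntimeException on a service error
    void put(String key, Object value); // may throw RuntimeException on a service error
}

// Test double that fails every call, similar to the outage described in this thread.
class AlwaysFailingCache implements SimpleCache {
    public Object get(String key) {
        throw new RuntimeException("Memcache get failed");
    }

    public void put(String key, Object value) {
        throw new RuntimeException("Memcache put: Set failed to set 1 keys");
    }
}

// Reads through the cache and falls back to the slower store on any cache failure.
class FallbackReader {
    private final SimpleCache cache;
    private final Function<String, Object> datastoreLoader; // assumed datastore lookup

    FallbackReader(SimpleCache cache, Function<String, Object> datastoreLoader) {
        this.cache = cache;
        this.datastoreLoader = datastoreLoader;
    }

    Object read(String key) {
        try {
            Object cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
        } catch (RuntimeException e) {
            // Cache read failed: treat it as a miss and carry on.
        }
        Object value = datastoreLoader.apply(key);
        if (value != null) {
            try {
                cache.put(key, value);
            } catch (RuntimeException e) {
                // Cache write failed: the request still succeeds, just uncached.
            }
        }
        return value;
    }
}
```

With `new FallbackReader(new AlwaysFailingCache(), key -> "db:" + key)`, a call to `read("user42")` still returns the value produced by the loader even though both cache calls throw.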

In addition, you can use the capabilities API to proactively query whether a
given API is available. This is described in detail by Nick Johnson
in the following blog post:
http://blog.notdot.net/2010/03/Handling-downtime-The-capabilities-API-and-testing
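
As a rough illustration of gating on availability, the sketch below picks a code path based on a probe. The `CacheGate` class and its `cacheAvailable` probe are hypothetical; in a real app the probe would presumably wrap the capabilities API from the blog post, whose exact calls are not shown here.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical helper: chooses between a cached and a direct code path.
class CacheGate {
    private final BooleanSupplier cacheAvailable; // assumed availability probe

    CacheGate(BooleanSupplier cacheAvailable) {
        this.cacheAvailable = cacheAvailable;
    }

    // Runs cachedPath when the probe reports the cache as available,
    // and directPath (e.g. a datastore read) otherwise.
    <T> T choose(Supplier<T> cachedPath, Supplier<T> directPath) {
        return cacheAvailable.getAsBoolean() ? cachedPath.get() : directPath.get();
    }
}
```

A probe only reports what the service says about itself, so a check like this complements exception handling around the actual calls rather than replacing it.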

Hope that helps.

--
Johan Euphrosine (proppy)
Developer Programs Engineer
Google Developer Relations

Jeff Schnitzer

Oct 27, 2011, 12:37:39 AM
to google-a...@googlegroups.com
Are you saying that we should not expect MemcacheService.set/getErrorHandler(), with its default value of LogAndContinueErrorHandler, to do what it says? I expect MemcacheService to fail silently.

Is it possible that the error handler showed up in 1.5.5?  The OP mentioned in another thread that he is running on 1.5.4.

Jeff

Johan Euphrosine

Oct 27, 2011, 1:19:08 AM
to google-a...@googlegroups.com
Jeff, you are right: the default handler is supposed to handle both
DeserializationError and ServiceError:
http://code.google.com/p/googleappengine/source/browse/trunk/java/src/main/com/google/appengine/api/memcache/LogAndContinueErrorHandler.java

Jon, did you change the default memcache error handler?

Johan Euphrosine

Oct 27, 2011, 1:34:00 AM
to google-a...@googlegroups.com
After taking a closer look at the implementation, I found the method
throwing the reported exception (MemcacheServiceException):
http://www.google.com/codesearch#Qx8E-7HUBTk/trunk/java/src/main/com/google/appengine/api/memcache/AsyncMemcacheServiceImpl.java&l=462

It appears that it doesn't rely on getErrorHandler for routing errors
and throws MemcacheServiceException directly.

That should be pretty easy to verify using a unit test.

Let me know if I missed something.

Simon Knott

Oct 27, 2011, 4:54:38 AM
to google-a...@googlegroups.com
Sorry Johan, so are you confirming that this is a bug in the Memcache Service?

The JavaDoc for the handleServiceError method on ErrorHandler quite clearly states that this method will be called for all MemcacheService methods in the event of a service error.

Simon Knott

Oct 27, 2011, 5:04:02 AM
to google-a...@googlegroups.com
Looking at that code for the MemcacheService implementations, the error handler isn't used at all!

My code is currently written to depend on the ErrorHandler allowing the service to fail silently - it would be great if you could confirm one way or another whether the error handlers are now useless.

Cheers,
Simon

Simon Knott

Oct 27, 2011, 11:07:25 AM
to google-a...@googlegroups.com
Now that I've got some caffeine in my system, I'll correct myself - it looks like the error handler isn't used for "put" operations at all, and it's possible that individual increment calls will fail non-silently as well.  The rest of the calls either use the error handler correctly, or just fail silently anyway.
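
To make the reported asymmetry concrete, here is a toy model of it, written as a self-checking sketch. Nothing in it is SDK code: `ToyMemcache`, `ToyErrorHandler` and `ToyMemcacheCheck` are hypothetical names, and the model simply encodes the behaviour described above (get routes failures to the handler, put throws directly). A real unit test would exercise the actual service implementation instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical handler interface, loosely modelled on the idea of an error handler.
interface ToyErrorHandler {
    void handleServiceError(RuntimeException e);
}

// Toy cache encoding the behaviour described in this thread; not the real service.
class ToyMemcache {
    private final Map<String, Object> store = new HashMap<>();
    private final ToyErrorHandler handler;
    private boolean backendDown;

    ToyMemcache(ToyErrorHandler handler) {
        this.handler = handler;
    }

    void setBackendDown(boolean down) {
        this.backendDown = down;
    }

    // get: a backend failure goes to the handler and is reported as a miss.
    Object get(String key) {
        if (backendDown) {
            handler.handleServiceError(new RuntimeException("Memcache get failed"));
            return null;
        }
        return store.get(key);
    }

    // put: a backend failure bypasses the handler and is thrown to the caller.
    void put(String key, Object value) {
        if (backendDown) {
            throw new RuntimeException("Memcache put: Set failed to set 1 keys");
        }
        store.put(key, value);
    }
}

class ToyMemcacheCheck {
    // True when get was handled quietly while put threw, with the backend down.
    static boolean putThrowsWhileGetIsHandled() {
        int[] handled = {0};
        ToyMemcache cache = new ToyMemcache(e -> handled[0]++);
        cache.setBackendDown(true);

        Object miss = cache.get("k");
        boolean putThrew = false;
        try {
            cache.put("k", "v");
        } catch (RuntimeException e) {
            putThrew = true;
        }
        return miss == null && handled[0] == 1 && putThrew;
    }
}
```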

Cheers,
Simon

jon

Oct 27, 2011, 5:18:21 PM
to Google App Engine
Johan, no, I didn't change the default error handler.

In our case memcache failed when executing putAll().

Are you confirming that this is currently a bug in the trunk (hence
1.5.5)?

It has a rather scary implication, i.e. a misbehaving memcache would take
down a site.

Johan Euphrosine

Oct 27, 2011, 11:03:08 PM
to google-a...@googlegroups.com
Could you open a bug on the public issue tracker?
http://code.google.com/p/googleappengine/issues/entry?template=Java%20defect

If you can, please attach a unit test that exhibits the bad behaviour.

Thanks in advance.

Johan Euphrosine

Oct 28, 2011, 3:14:00 AM
to google-a...@googlegroups.com
After taking a look at the documentation it is explicit that put will
throw an exception in case of an RPC error.

See:
http://code.google.com/appengine/docs/java/javadoc/com/google/appengine/api/memcache/MemcacheService.html#put(java.lang.Object, java.lang.Object, com.google.appengine.api.memcache.Expiration, com.google.appengine.api.memcache.MemcacheService.SetPolicy)

put

void put(java.lang.Object key, java.lang.Object value)
A convenience shortcut, equivalent to put(key, value, null,
SetPolicy.SET_ALWAYS).
Parameters:
key - key of the new entry
value - value for the new entry
Throws:
java.lang.IllegalArgumentException - if the key or value type can't be
stored as a cache item. They should be Serializable.
MemcacheServiceException - if server respond with an error.

Hope that clears things up.

jon

Nov 1, 2011, 8:04:30 AM
to Google App Engine

> After taking a look at the documentation it is explicit that put will
> throw an exception in case of an RPC error.

OK just to make sure that I understand this correctly,
MemcacheServiceException will be thrown when put() encounters an RPC
error (and this is the correct behaviour), therefore the calling code
is expected to handle it. Is that correct?

jon

Nov 2, 2011, 9:36:25 PM
to Google App Engine
According to
http://code.google.com/appengine/docs/java/javadoc/com/google/appengine/api/memcache/BaseMemcacheService.html#setErrorHandler(com.google.appengine.api.memcache.ErrorHandler)
:

Registers a new ErrorHandler. The handler is called for errors which
are not the application's fault, like a *network timeout*. The handler
can choose to propagate the error or suppress it. Errors which are
caused by an incorrect use of the API will not be directed to the
handler but rather will be thrown directly.
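
One way to read that contract is sketched below: service failures go to a handler, which may suppress or rethrow them, while API misuse is thrown straight to the caller. All names here (`ServiceFailure`, `BulkCache`, `HandlerRoutingCache`) are hypothetical stand-ins rather than App Engine classes, and the sketch shows the documented intent, not how the SDK is implemented.

```java
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical exception for errors that are not the application's fault.
class ServiceFailure extends RuntimeException {
    ServiceFailure(String message) {
        super(message);
    }
}

// Hypothetical bulk-write interface; may throw ServiceFailure (service error)
// or IllegalArgumentException (incorrect use of the API).
interface BulkCache {
    void putAll(Map<String, Object> values);
}

// Wrapper that routes service failures to a handler, as the quoted Javadoc describes.
class HandlerRoutingCache {
    private final BulkCache delegate;
    private final Consumer<ServiceFailure> errorHandler;

    HandlerRoutingCache(BulkCache delegate, Consumer<ServiceFailure> errorHandler) {
        this.delegate = delegate;
        this.errorHandler = errorHandler;
    }

    void putAll(Map<String, Object> values) {
        try {
            delegate.putAll(values);
        } catch (ServiceFailure e) {
            // The handler decides: log and continue, or rethrow to propagate.
            errorHandler.accept(e);
        }
        // IllegalArgumentException is deliberately not caught: misuse should surface.
    }
}
```

A handler that only logs makes a failed putAll() a no-op for the caller, while a handler that rethrows reproduces the strict behaviour.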

I'm going to report a bug to the tracker.

jon

Nov 2, 2011, 9:46:39 PM
to Google App Engine
Could GAE/J users help star this issue as it has a very scary
implication? If you're unlucky enough to get this error, your putAll()
attempts will fail with a MemcacheServiceException, effectively taking
down your site!

http://code.google.com/p/googleappengine/issues/detail?id=6236
