Just last night at midnight I updated my app to use <sessions-enabled>
and to store data in the datastore for logged-in users.
Ever since then, I've been getting dozens of
DatastoreTimeoutExceptions, both reading from the database and writing
to it. My code goes like this, on every authenticated request:
Key key = KeyFactory.createKey("alterego", user.getNickname());
Entity e = datastoreService.get(key);
// ... do stuff with the data ...
Entity entity = new Entity("alterego", user.getNickname());
for (String key : myProps.keySet()) {
entity.setProperty(key, myProps.get(key));
}
datastoreService.put(entity);
These very simple get/put calls are returning these DTE exceptions...
not all the time, but quite frequently. I can see no way to debug
this.
What can I do now?
-Dan
PS Prior to last night, I just stored all of the user's data on the
client-side in cookies, so I could be stateless on the server-side.
But due to bug 1227 http://code.google.com/p/googleappengine/issues/detail?id=1227 I couldn't store the data all in one cookie. So I split the data up
into multiple cookies. But then my users started reporting cookie
corruption; my logs showed the cookies would contain garbage like
"alteregHTTP/1.1 200 OK". So then I ported the whole thing over to
use the datastore, and now I get this!
In your issue Dan, you providing loads of info about what the bug is
and how to reproduce it.
However, for playalterego.com specifically you're likely to work
through the issue (with or without AppEngine folk's help). You have
ten available apps to deploy to. Would it be possible to dedicate one
of them to the minimalist representation of this bug?
> Just last night at midnight I updated my app to use <sessions-enabled>
> and to store data in the datastore for logged-in users.
> Ever since then, I've been getting dozens of
> DatastoreTimeoutExceptions, both reading from the database and writing
> to it. My code goes like this, on every authenticated request:
> These very simple get/put calls are returning these DTE exceptions...
> not all the time, but quite frequently. I can see no way to debug
> this.
> What can I do now?
> -Dan
> PS Prior to last night, I just stored all of the user's data on the
> client-side in cookies, so I could be stateless on the server-side.
> But due to bug 1227 http://code.google.com/p/googleappengine/issues/detail?id=1227 > I couldn't store the data all in one cookie. So I split the data up
> into multiple cookies. But then my users started reporting cookie
> corruption; my logs showed the cookies would contain garbage like
> "alteregHTTP/1.1 200 OK". So then I ported the whole thing over to
> use the datastore, and now I get this!
I'm sorry that your experience hasn't been entirely positive up to this
point -- hopefully we can help change that.
Have you re-deployed your application since these errors first appeared?
This shouldn't make any difference, but some developers have reported that
other intermittent issues have disappeared after a simple re-deploy, so it's
as good a place as any to start.
Are you in a position to roll back to using cookies to manage the state of
some or all of your users? It will help us know whether the introduction of
App Engine sessions is at fault here or if there is something else happening
behind-the-scenes. I would also be interested in knowing what type of data
you're using the session for since App Engine's datastore is used to back
these sessions (along with memcache).
I'm sorry again that you're running into these issues, and I hope we can get
them fixed soon.
> Just last night at midnight I updated my app to use <sessions-enabled>
> and to store data in the datastore for logged-in users.
> Ever since then, I've been getting dozens of
> DatastoreTimeoutExceptions, both reading from the database and writing
> to it. My code goes like this, on every authenticated request:
> These very simple get/put calls are returning these DTE exceptions...
> not all the time, but quite frequently. I can see no way to debug
> this.
> What can I do now?
> -Dan
> PS Prior to last night, I just stored all of the user's data on the
> client-side in cookies, so I could be stateless on the server-side.
> But due to bug 1227
> http://code.google.com/p/googleappengine/issues/detail?id=1227 > I couldn't store the data all in one cookie. So I split the data up
> into multiple cookies. But then my users started reporting cookie
> corruption; my logs showed the cookies would contain garbage like
> "alteregHTTP/1.1 200 OK". So then I ported the whole thing over to
> use the datastore, and now I get this!
I've redeployed dozens of times while investigating the problem.
I only have to persist one object: a simple dictionary hashmap of
strings to strings. It's always smaller than 4K (because it used to
fit in a single cookie when I ran it on a regular Tomcat machine).
The hashmap typically has a few dozen keys in it.
When I put the dictionary in the session I just put the entire map in
with session.setAttribute(myProps). The session auto-saves it to the
datastore as a blob.
When writing to the DB, I create a new entity, copy the values over,
and datastoreService.put(entity). That way it's saved to the
datastore as a structured entity.
In either case, sometimes the datastore barfs out a
DatastoreTimeoutException; the request fails, my users get errors, and
the write attempt is lost.
When you suggest partially switching back to cookies, what do you
mean, exactly? Do you mean I should try setting up an alternate
version that doesn't touch the datastore? (i.e. doesn't use sessions
AND doesn't use the datastore API?) I've done that; obviously I don't
get DatastoreTimeoutExceptions when I don't use the datastore... but I
assume that's not what you meant. :-)
Do you mean to try setting up a version of the app that doesn't have
<sessions-enabled> (so it's only playable if you're logged in), but
still uses the datastore API directly? I'm about 99% sure that I'll
get the same problem, but I really don't dare to do it because the
problem doesn't repro 100% of the time; the only way to get it to show
up would be to turn it on for all my users or test the app with a load
test tool, which I'd really rather not do because Google will bill me
for that! But if you write back and say that's the ONLY way you can
help me, I'll do it.
I will note that my logs show that I'm getting DTEs not only in my own
code, but also in Google's code! Here's a stacktrace, but the key
lines are highlighted below. It shows that even Google's code can't
write a 2 kilobyte hashmap to the datastore without timing out.
at com.google.apphosting.runtime.jetty.SessionManager
$AppEngineSession.save(SessionManager.java:153)
at com.google.apphosting.runtime.jetty.SaveSessionFilter.doFilter
(SaveSessionFilter.java:41)
Uncaught exception from servlet
com.google.appengine.api.datastore.DatastoreTimeoutException: Unknown
at
com.google.appengine.api.datastore.DatastoreApiHelper.translateError
(DatastoreApiHelper.java:37)
at com.google.appengine.api.datastore.DatastoreApiHelper.makeSyncCall
(DatastoreApiHelper.java:55)
at com.google.appengine.api.datastore.DatastoreServiceImpl$2.run
(DatastoreServiceImpl.java:169)
at
com.google.appengine.api.datastore.TransactionRunner.runInTransaction
(TransactionRunner.java:29)
at com.google.appengine.api.datastore.DatastoreServiceImpl.put
(DatastoreServiceImpl.java:157)
at com.google.appengine.api.datastore.DatastoreServiceImpl.put
(DatastoreServiceImpl.java:137)
at com.google.appengine.api.datastore.DatastoreServiceImpl.put
(DatastoreServiceImpl.java:129)
at com.google.apphosting.runtime.jetty.SessionManager
$AppEngineSession.save(SessionManager.java:153)
at com.google.apphosting.runtime.jetty.SaveSessionFilter.doFilter
(SaveSessionFilter.java:41)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter
(ServletHandler.java:1084)
at
com.google.apphosting.utils.servlet.TransactionCleanupFilter.doFilter
(TransactionCleanupFilter.java:43)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter
(ServletHandler.java:1084)
at org.mortbay.jetty.servlet.ServletHandler.handle
(ServletHandler.java:360)
at org.mortbay.jetty.security.SecurityHandler.handle
(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle
(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle
(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:
405)
at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.handle
(AppVersionHandlerMap.java:237)
at org.mortbay.jetty.handler.HandlerWrapper.handle
(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:313)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:
506)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete
(HttpConnection.java:830)
at com.google.apphosting.runtime.jetty.RpcRequestParser.parseAvailable
(RpcRequestParser.java:76)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:381)
at
com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceReques t
(JettyServletEngineAdapter.java:125)
at com.google.apphosting.runtime.JavaRuntime.handleRequest
(JavaRuntime.java:235)
at com.google.apphosting.base.RuntimePb$EvaluationRuntime
$6.handleBlockingRequest(RuntimePb.java:4755)
at com.google.apphosting.base.RuntimePb$EvaluationRuntime
$6.handleBlockingRequest(RuntimePb.java:4753)
at com.google.net.rpc.impl.BlockingApplicationHandler.handleRequest
(BlockingApplicationHandler.java:24)
at com.google.net.rpc.impl.RpcUtil.runRpcInApplication(RpcUtil.java:
359)
at com.google.net.rpc.impl.Server$2.run(Server.java:800)
at com.google.tracing.LocalTraceSpanRunnable.run
(LocalTraceSpanRunnable.java:56)
at com.google.tracing.LocalTraceSpanBuilder.internalContinueSpan
(LocalTraceSpanBuilder.java:510)
at com.google.net.rpc.impl.Server.startRpc(Server.java:756)
at com.google.net.rpc.impl.Server.processRequest(Server.java:348)
at com.google.net.rpc.impl.ServerConnection.messageReceived
(ServerConnection.java:459)
at com.google.net.rpc.impl.RpcConnection.parseMessages
(RpcConnection.java:319)
at com.google.net.rpc.impl.RpcConnection.dataReceived
(RpcConnection.java:290)
at com.google.net.async.Connection.handleReadEvent(Connection.java:
419)
at com.google.net.async.EventDispatcher.processNetworkEvents
(EventDispatcher.java:762)
at com.google.net.async.EventDispatcher.internalLoop
(EventDispatcher.java:207)
at com.google.net.async.EventDispatcher.loop(EventDispatcher.java:
101)
at com.google.net.rpc.RpcService.runUntilServerShutdown
(RpcService.java:251)
at com.google.apphosting.runtime.JavaRuntime$RpcRunnable.run
(JavaRuntime.java:373)
at java.lang.Thread.run(Unknown Source)
On Jun 8, 6:01 pm, "Jason (Google)" <apija...@google.com> wrote:
> I'm sorry that your experience hasn't been entirely positive up to this
> point -- hopefully we can help change that.
> Have you re-deployed your application since these errors first appeared?
> This shouldn't make any difference, but some developers have reported that
> other intermittent issues have disappeared after a simple re-deploy, so it's
> as good a place as any to start.
> Are you in a position to roll back to using cookies to manage the state of
> some or all of your users? It will help us know whether the introduction of
> App Engine sessions is at fault here or if there is something else happening
> behind-the-scenes. I would also be interested in knowing what type of data
> you're using the session for since App Engine's datastore is used to back
> these sessions (along with memcache).
> I'm sorry again that you're running into these issues, and I hope we can get
> them fixed soon.
> - Jason
> On Sun, Jun 7, 2009 at 11:35 AM, dfabulich <danfabul...@gmail.com> wrote:
> > Just last night at midnight I updated my app to use <sessions-enabled>
> > and to store data in the datastore for logged-in users.
> > Ever since then, I've been getting dozens of
> > DatastoreTimeoutExceptions, both reading from the database and writing
> > to it. My code goes like this, on every authenticated request:
> > These very simple get/put calls are returning these DTE exceptions...
> > not all the time, but quite frequently. I can see no way to debug
> > this.
> > What can I do now?
> > -Dan
> > PS Prior to last night, I just stored all of the user's data on the
> > client-side in cookies, so I could be stateless on the server-side.
> > But due to bug 1227
> >http://code.google.com/p/googleappengine/issues/detail?id=1227 > > I couldn't store the data all in one cookie. So I split the data up
> > into multiple cookies. But then my users started reporting cookie
> > corruption; my logs showed the cookies would contain garbage like
> > "alteregHTTP/1.1 200 OK". So then I ported the whole thing over to
> > use the datastore, and now I get this!
Hi Dan,
Your announcement that you had ported this game to App Engine showed up in
my Google Alerts a few weeks after our launch and I was really excited to
see it. I think this is a great example of the kind of application that we
want to host on App Engine, and I'm definitely interested in helping to
resolve these issues.
The first problem you mentioned is
http://code.google.com/p/googleappengine/issues/detail?id=1227. I've
confirmed that this is a legitimate bug, although the behavior is the same
in both Java and Python so I'm a bit surprised that it has gone unnoticed
for this long. I'll try to get this fixed in the next major release of App
Engine.
As for the datastore errors, I've been monitoring your error rate for the
past few days. You're seeing about a 0.05% - 0.1% error rate on your
dynamic requests. If I assume those are all attributable to datastore
timeouts, and that you are making an average of 1-2 datastore API calls per
request, that's about a 0.1%-0.2% rate of datastore API calls that result in
timeouts. This is a bit higher than average, but within the expected error
rate for the datastore. The datastore is a highly distributed piece of
infrastructure, and there will always be some non-zero failure rate. We're
working on adding additional retry logic at a lower level which should help
in some of the cases where your code is not directly involved (e.g.
retrieving and storing sessions), but in your own code I would encourage you
to add some retry logic.
Another approach to avoid timeouts in Get's is to also store your data in
memcache. You can eliminate a large number of your datastore gets by first
trying to retrieve entities from memcache (and writing through to memcache
after a put), which will significantly reduce the number of datastore get
calls that you are making (and correspondingly reduce your exposure to
datastore timeout issues). This is already happening behind the scenes for
when retrieving sessions.
Let me know if there is anything else I can do to help.
On Tue, Jun 9, 2009 at 4:10 PM, dfabulich <danfabul...@gmail.com> wrote:
> I've redeployed dozens of times while investigating the problem.
> I only have to persist one object: a simple dictionary hashmap of
> strings to strings. It's always smaller than 4K (because it used to
> fit in a single cookie when I ran it on a regular Tomcat machine).
> The hashmap typically has a few dozen keys in it.
> When I put the dictionary in the session I just put the entire map in
> with session.setAttribute(myProps). The session auto-saves it to the
> datastore as a blob.
> When writing to the DB, I create a new entity, copy the values over,
> and datastoreService.put(entity). That way it's saved to the
> datastore as a structured entity.
> In either case, sometimes the datastore barfs out a
> DatastoreTimeoutException; the request fails, my users get errors, and
> the write attempt is lost.
> When you suggest partially switching back to cookies, what do you
> mean, exactly? Do you mean I should try setting up an alternate
> version that doesn't touch the datastore? (i.e. doesn't use sessions
> AND doesn't use the datastore API?) I've done that; obviously I don't
> get DatastoreTimeoutExceptions when I don't use the datastore... but I
> assume that's not what you meant. :-)
> Do you mean to try setting up a version of the app that doesn't have
> <sessions-enabled> (so it's only playable if you're logged in), but
> still uses the datastore API directly? I'm about 99% sure that I'll
> get the same problem, but I really don't dare to do it because the
> problem doesn't repro 100% of the time; the only way to get it to show
> up would be to turn it on for all my users or test the app with a load
> test tool, which I'd really rather not do because Google will bill me
> for that! But if you write back and say that's the ONLY way you can
> help me, I'll do it.
> I will note that my logs show that I'm getting DTEs not only in my own
> code, but also in Google's code! Here's a stacktrace, but the key
> lines are highlighted below. It shows that even Google's code can't
> write a 2 kilobyte hashmap to the datastore without timing out.
> at com.google.apphosting.runtime.jetty.SessionManager
> $AppEngineSession.save(SessionManager.java:153)
> at com.google.apphosting.runtime.jetty.SaveSessionFilter.doFilter
> (SaveSessionFilter.java:41)
> Uncaught exception from servlet
> com.google.appengine.api.datastore.DatastoreTimeoutException: Unknown
> at
> com.google.appengine.api.datastore.DatastoreApiHelper.translateError
> (DatastoreApiHelper.java:37)
> at
> com.google.appengine.api.datastore.DatastoreApiHelper.makeSyncCall
> (DatastoreApiHelper.java:55)
> at com.google.appengine.api.datastore.DatastoreServiceImpl$2.run
> (DatastoreServiceImpl.java:169)
> at
> com.google.appengine.api.datastore.TransactionRunner.runInTransaction
> (TransactionRunner.java:29)
> at com.google.appengine.api.datastore.DatastoreServiceImpl.put
> (DatastoreServiceImpl.java:157)
> at com.google.appengine.api.datastore.DatastoreServiceImpl.put
> (DatastoreServiceImpl.java:137)
> at com.google.appengine.api.datastore.DatastoreServiceImpl.put
> (DatastoreServiceImpl.java:129)
> at com.google.apphosting.runtime.jetty.SessionManager
> $AppEngineSession.save(SessionManager.java:153)
> at com.google.apphosting.runtime.jetty.SaveSessionFilter.doFilter
> (SaveSessionFilter.java:41)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter
> (ServletHandler.java:1084)
> at
> com.google.apphosting.utils.servlet.TransactionCleanupFilter.doFilter
> (TransactionCleanupFilter.java:43)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter
> (ServletHandler.java:1084)
> at org.mortbay.jetty.servlet.ServletHandler.handle
> (ServletHandler.java:360)
> at org.mortbay.jetty.security.SecurityHandler.handle
> (SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle
> (SessionHandler.java:181)
> at org.mortbay.jetty.handler.ContextHandler.handle
> (ContextHandler.java:712)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:
> 405)
> at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.handle
> (AppVersionHandlerMap.java:237)
> at org.mortbay.jetty.handler.HandlerWrapper.handle
> (HandlerWrapper.java:139)
> at org.mortbay.jetty.Server.handle(Server.java:313)
> at
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:
> 506)
> at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete
> (HttpConnection.java:830)
> at
> com.google.apphosting.runtime.jetty.RpcRequestParser.parseAvailable
> (RpcRequestParser.java:76)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:381)
> at
> com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceReques t
> (JettyServletEngineAdapter.java:125)
> at com.google.apphosting.runtime.JavaRuntime.handleRequest
> (JavaRuntime.java:235)
> at com.google.apphosting.base.RuntimePb$EvaluationRuntime
> $6.handleBlockingRequest(RuntimePb.java:4755)
> at com.google.apphosting.base.RuntimePb$EvaluationRuntime
> $6.handleBlockingRequest(RuntimePb.java:4753)
> at com.google.net.rpc.impl.BlockingApplicationHandler.handleRequest
> (BlockingApplicationHandler.java:24)
> at com.google.net.rpc.impl.RpcUtil.runRpcInApplication(RpcUtil.java:
> 359)
> at com.google.net.rpc.impl.Server$2.run(Server.java:800)
> at com.google.tracing.LocalTraceSpanRunnable.run
> (LocalTraceSpanRunnable.java:56)
> at com.google.tracing.LocalTraceSpanBuilder.internalContinueSpan
> (LocalTraceSpanBuilder.java:510)
> at com.google.net.rpc.impl.Server.startRpc(Server.java:756)
> at com.google.net.rpc.impl.Server.processRequest(Server.java:348)
> at com.google.net.rpc.impl.ServerConnection.messageReceived
> (ServerConnection.java:459)
> at com.google.net.rpc.impl.RpcConnection.parseMessages
> (RpcConnection.java:319)
> at com.google.net.rpc.impl.RpcConnection.dataReceived
> (RpcConnection.java:290)
> at com.google.net.async.Connection.handleReadEvent(Connection.java:
> 419)
> at com.google.net.async.EventDispatcher.processNetworkEvents
> (EventDispatcher.java:762)
> at com.google.net.async.EventDispatcher.internalLoop
> (EventDispatcher.java:207)
> at com.google.net.async.EventDispatcher.loop(EventDispatcher.java:
> 101)
> at com.google.net.rpc.RpcService.runUntilServerShutdown
> (RpcService.java:251)
> at com.google.apphosting.runtime.JavaRuntime$RpcRunnable.run
> (JavaRuntime.java:373)
> at java.lang.Thread.run(Unknown Source)
> On Jun 8, 6:01 pm, "Jason (Google)" <apija...@google.com> wrote:
> > I'm sorry that your experience hasn't been entirely positive up to this
> > point -- hopefully we can help change that.
> > Have you re-deployed your application since these errors first appeared?
> > This shouldn't make any difference, but some developers have reported
> that
> > other intermittent issues have disappeared after a simple re-deploy, so
> it's
> > as good a place as any to start.
> > Are you in a position to roll back to using cookies to manage the state
> of
> > some or all of your users? It will help us know whether the introduction
> of
> > App Engine sessions is at fault here or if there is something else
> happening
> > behind-the-scenes. I would also be interested in knowing what type of
> data
> > you're using the session for since App Engine's datastore is used to back
> > these sessions (along with
Note I had actually already added retry logic before your post, so I
think my "natural" timeout rate is probably higher than a simple
glance at the error log would show. I look forward to the day when
bug 1227 is fixed and I can return to statelessness on the server-
side. :-)
Today, I only use Appengine sessions when the user is not logged in;
when the user is logged in, I always read/write directly to the
datastore. Based on this conversation, I'm tempted to always store
the data in the session, but to also back the data up in the datastore
when possible.
But what about those DTEs I'm seeing in Google's SaveSessionFilter?
Does the session filter already implement retry logic? If so, how
many times does it retry?
What exactly do users see when Google's SaveSessionFilter gets a
timeout writing to the data store? Does the client get a 500 error?
Is the data updated in memcache, or does a DTE in the session filter
prevent writing to the memcache?
In other words: can I trust Google's SaveSessionFilter? Or do I need
to rewrite it myself if I want to avoid these timeouts?
On Jun 10, 9:31 am, Don Schwarz <schwa...@google.com> wrote:
> Hi Dan,
> Your announcement that you had ported this game to App Engine showed up in
> my Google Alerts a few weeks after our launch and I was really excited to
> see it. I think this is a great example of the kind of application that we
> want to host on App Engine, and I'm definitely interested in helping to
> resolve these issues.
> The first problem you mentioned ishttp://code.google.com/p/googleappengine/issues/detail?id=1227. I've
> confirmed that this is a legitimate bug, although the behavior is the same
> in both Java and Python so I'm a bit surprised that it has gone unnoticed
> for this long. I'll try to get this fixed in the next major release of App
> Engine.
> As for the datastore errors, I've been monitoring your error rate for the
> past few days. You're seeing about a 0.05% - 0.1% error rate on your
> dynamic requests. If I assume those are all attributable to datastore
> timeouts, and that you are making an average of 1-2 datastore API calls per
> request, that's about a 0.1%-0.2% rate of datastore API calls that result in
> timeouts. This is a bit higher than average, but within the expected error
> rate for the datastore. The datastore is a highly distributed piece of
> infrastructure, and there will always be some non-zero failure rate. We're
> working on adding additional retry logic at a lower level which should help
> in some of the cases where your code is not directly involved (e.g.
> retrieving and storing sessions), but in your own code I would encourage you
> to add some retry logic.
> Another approach to avoid timeouts in Get's is to also store your data in
> memcache. You can eliminate a large number of your datastore gets by first
> trying to retrieve entities from memcache (and writing through to memcache
> after a put), which will significantly reduce the number of datastore get
> calls that you are making (and correspondingly reduce your exposure to
> datastore timeout issues). This is already happening behind the scenes for
> when retrieving sessions.
> Let me know if there is anything else I can do to help.
> Thanks,
> Don
> On Tue, Jun 9, 2009 at 4:10 PM, dfabulich <danfabul...@gmail.com> wrote:
> > I've redeployed dozens of times while investigating the problem.
> > I only have to persist one object: a simple dictionary hashmap of
> > strings to strings. It's always smaller than 4K (because it used to
> > fit in a single cookie when I ran it on a regular Tomcat machine).
> > The hashmap typically has a few dozen keys in it.
> > When I put the dictionary in the session I just put the entire map in
> > with session.setAttribute(myProps). The session auto-saves it to the
> > datastore as a blob.
> > When writing to the DB, I create a new entity, copy the values over,
> > and datastoreService.put(entity). That way it's saved to the
> > datastore as a structured entity.
> > In either case, sometimes the datastore barfs out a
> > DatastoreTimeoutException; the request fails, my users get errors, and
> > the write attempt is lost.
> > When you suggest partially switching back to cookies, what do you
> > mean, exactly? Do you mean I should try setting up an alternate
> > version that doesn't touch the datastore? (i.e. doesn't use sessions
> > AND doesn't use the datastore API?) I've done that; obviously I don't
> > get DatastoreTimeoutExceptions when I don't use the datastore... but I
> > assume that's not what you meant. :-)
> > Do you mean to try setting up a version of the app that doesn't have
> > <sessions-enabled> (so it's only playable if you're logged in), but
> > still uses the datastore API directly? I'm about 99% sure that I'll
> > get the same problem, but I really don't dare to do it because the
> > problem doesn't repro 100% of the time; the only way to get it to show
> > up would be to turn it on for all my users or test the app with a load
> > test tool, which I'd really rather not do because Google will bill me
> > for that! But if you write back and say that's the ONLY way you can
> > help me, I'll do it.
> > I will note that my logs show that I'm getting DTEs not only in my own
> > code, but also in Google's code! Here's a stacktrace, but the key
> > lines are highlighted below. It shows that even Google's code can't
> > write a 2 kilobyte hashmap to the datastore without timing out.
> > at com.google.apphosting.runtime.jetty.SessionManager
> > $AppEngineSession.save(SessionManager.java:153)
> > at com.google.apphosting.runtime.jetty.SaveSessionFilter.doFilter
> > (SaveSessionFilter.java:41)
> > Uncaught exception from servlet
> > com.google.appengine.api.datastore.DatastoreTimeoutException: Unknown
> > at
> > com.google.appengine.api.datastore.DatastoreApiHelper.translateError
> > (DatastoreApiHelper.java:37)
> > at
> > com.google.appengine.api.datastore.DatastoreApiHelper.makeSyncCall
> > (DatastoreApiHelper.java:55)
> > at com.google.appengine.api.datastore.DatastoreServiceImpl$2.run
> > (DatastoreServiceImpl.java:169)
> > at
> > com.google.appengine.api.datastore.TransactionRunner.runInTransaction
> > (TransactionRunner.java:29)
> > at com.google.appengine.api.datastore.DatastoreServiceImpl.put
> > (DatastoreServiceImpl.java:157)
> > at com.google.appengine.api.datastore.DatastoreServiceImpl.put
> > (DatastoreServiceImpl.java:137)
> > at com.google.appengine.api.datastore.DatastoreServiceImpl.put
> > (DatastoreServiceImpl.java:129)
> > at com.google.apphosting.runtime.jetty.SessionManager
> > $AppEngineSession.save(SessionManager.java:153)
> > at com.google.apphosting.runtime.jetty.SaveSessionFilter.doFilter
> > (SaveSessionFilter.java:41)
> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter
> > (ServletHandler.java:1084)
> > at
> > com.google.apphosting.utils.servlet.TransactionCleanupFilter.doFilter
> > (TransactionCleanupFilter.java:43)
> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter
> > (ServletHandler.java:1084)
> > at org.mortbay.jetty.servlet.ServletHandler.handle
> > (ServletHandler.java:360)
> > at org.mortbay.jetty.security.SecurityHandler.handle
> > (SecurityHandler.java:216)
> > at org.mortbay.jetty.servlet.SessionHandler.handle
> > (SessionHandler.java:181)
> > at org.mortbay.jetty.handler.ContextHandler.handle
> > (ContextHandler.java:712)
> > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:
> > 405)
> > at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.handle
> > (AppVersionHandlerMap.java:237)
> > at org.mortbay.jetty.handler.HandlerWrapper.handle
> > (HandlerWrapper.java:139)
> > at org.mortbay.jetty.Server.handle(Server.java:313)
> > at
> > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:
> > 506)
> > at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete
> > (HttpConnection.java:830)
> > at
> > com.google.apphosting.runtime.jetty.RpcRequestParser.parseAvailable
> > (RpcRequestParser.java:76)
> > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:381)
> > at
> > com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceReques t
> > (JettyServletEngineAdapter.java:125)
> > at com.google.apphosting.runtime.JavaRuntime.handleRequest
> > (JavaRuntime.java:235)
> > at com.google.apphosting.base.RuntimePb$EvaluationRuntime
> > $6.handleBlockingRequest(RuntimePb.java:4755)
> > at com.google.apphosting.base.RuntimePb$EvaluationRuntime
> > $6.handleBlockingRequest(RuntimePb.java:4753)
> > at com.google.net.rpc.impl.BlockingApplicationHandler.handleRequest
> > (BlockingApplicationHandler.java:24)
> > at com.google.net.rpc.impl.RpcUtil.runRpcInApplication(RpcUtil.java:
> > 359)
> > at com.google.net.rpc.impl.Server$2.run(Server.java:800)
> > at com.google.tracing.LocalTraceSpanRunnable.run
> > (LocalTraceSpanRunnable.java:56)
> > at com.google.tracing.LocalTraceSpanBuilder.internalContinueSpan
> > (LocalTraceSpanBuilder.java:510)
> > at com.google.net.rpc.impl.Server.startRpc(Server.java:756)
> > at com.google.net.rpc.impl.Server.processRequest(Server.java:348)
> > at com.google.net.rpc.impl.ServerConnection.messageReceived
> > (ServerConnection.java:459)
> > at
Another question: I never need to query my entities; I only need to
store them in full and retrieve them by key. Would I have better luck
with the datastore timeouts if I saved my data as a single-column
blob? Does the number of columns have a meaningful impact on
datastore performance?
On Jun 12, 9:25 pm, dfabulich <danfabul...@gmail.com> wrote:
> Note I had actually already added retry logic before your post, so I
> think my "natural" timeout rate is probably higher than a simple
> glance at the error log would show. I look forward to the day when
> bug 1227 is fixed and I can return to statelessness on the server-
> side. :-)
> Today, I only use Appengine sessions when the user is not logged in;
> when the user is logged in, I always read/write directly to the
> datastore. Based on this conversation, I'm tempted to always store
> the data in the session, but to also back the data up in the datastore
> when possible.
> But what about those DTEs I'm seeing in Google's SaveSessionFilter?
> Does the session filter already implement retry logic? If so, how
> many times does it retry?
> What exactly do users see when Google's SaveSessionFilter gets a
> timeout writing to the data store? Does the client get a 500 error?
> Is the data updated in memcache, or does a DTE in the session filter
> prevent writing to the memcache?
> In other words: can I trust Google's SaveSessionFilter? Or do I need
> to rewrite it myself if I want to avoid these timeouts?
> On Jun 10, 9:31 am, Don Schwarz <schwa...@google.com> wrote:
> > Hi Dan,
> > Your announcement that you had ported this game to App Engine showed up in
> > my Google Alerts a few weeks after our launch and I was really excited to
> > see it. I think this is a great example of the kind of application that we
> > want to host on App Engine, and I'm definitely interested in helping to
> > resolve these issues.
> > The first problem you mentioned ishttp://code.google.com/p/googleappengine/issues/detail?id=1227. I've
> > confirmed that this is a legitimate bug, although the behavior is the same
> > in both Java and Python so I'm a bit surprised that it has gone unnoticed
> > for this long. I'll try to get this fixed in the next major release of App
> > Engine.
> > As for the datastore errors, I've been monitoring your error rate for the
> > past few days. You're seeing about a 0.05% - 0.1% error rate on your
> > dynamic requests. If I assume those are all attributable to datastore
> > timeouts, and that you are making an average of 1-2 datastore API calls per
> > request, that's about a 0.1%-0.2% rate of datastore API calls that result in
> > timeouts. This is a bit higher than average, but within the expected error
> > rate for the datastore. The datastore is a highly distributed piece of
> > infrastructure, and there will always be some non-zero failure rate. We're
> > working on adding additional retry logic at a lower level which should help
> > in some of the cases where your code is not directly involved (e.g.
> > retrieving and storing sessions), but in your own code I would encourage you
> > to add some retry logic.
> > Another approach to avoid timeouts in Get's is to also store your data in
> > memcache. You can eliminate a large number of your datastore gets by first
> > trying to retrieve entities from memcache (and writing through to memcache
> > after a put), which will significantly reduce the number of datastore get
> > calls that you are making (and correspondingly reduce your exposure to
> > datastore timeout issues). This is already happening behind the scenes for
> > when retrieving sessions.
> > Let me know if there is anything else I can do to help.
> > Thanks,
> > Don
> > On Tue, Jun 9, 2009 at 4:10 PM, dfabulich <danfabul...@gmail.com> wrote:
> > > I've redeployed dozens of times while investigating the problem.
> > > I only have to persist one object: a simple dictionary hashmap of
> > > strings to strings. It's always smaller than 4K (because it used to
> > > fit in a single cookie when I ran it on a regular Tomcat machine).
> > > The hashmap typically has a few dozen keys in it.
> > > When I put the dictionary in the session I just put the entire map in
> > > with session.setAttribute(myProps). The session auto-saves it to the
> > > datastore as a blob.
> > > When writing to the DB, I create a new entity, copy the values over,
> > > and datastoreService.put(entity). That way it's saved to the
> > > datastore as a structured entity.
> > > In either case, sometimes the datastore barfs out a
> > > DatastoreTimeoutException; the request fails, my users get errors, and
> > > the write attempt is lost.
> > > When you suggest partially switching back to cookies, what do you
> > > mean, exactly? Do you mean I should try setting up an alternate
> > > version that doesn't touch the datastore? (i.e. doesn't use sessions
> > > AND doesn't use the datastore API?) I've done that; obviously I don't
> > > get DatastoreTimeoutExceptions when I don't use the datastore... but I
> > > assume that's not what you meant. :-)
> > > Do you mean to try setting up a version of the app that doesn't have
> > > <sessions-enabled> (so it's only playable if you're logged in), but
> > > still uses the datastore API directly? I'm about 99% sure that I'll
> > > get the same problem, but I really don't dare to do it because the
> > > problem doesn't repro 100% of the time; the only way to get it to show
> > > up would be to turn it on for all my users or test the app with a load
> > > test tool, which I'd really rather not do because Google will bill me
> > > for that! But if you write back and say that's the ONLY way you can
> > > help me, I'll do it.
> > > I will note that my logs show that I'm getting DTEs not only in my own
> > > code, but also in Google's code! Here's a stacktrace, but the key
> > > lines are highlighted below. It shows that even Google's code can't
> > > write a 2 kilobyte hashmap to the datastore without timing out.
> > > at com.google.apphosting.runtime.jetty.SessionManager
> > > $AppEngineSession.save(SessionManager.java:153)
> > > at com.google.apphosting.runtime.jetty.SaveSessionFilter.doFilter
> > > (SaveSessionFilter.java:41)
> > > Uncaught exception from servlet
> > > com.google.appengine.api.datastore.DatastoreTimeoutException: Unknown
> > > at
> > > com.google.appengine.api.datastore.DatastoreApiHelper.translateError
> > > (DatastoreApiHelper.java:37)
> > > at
> > > com.google.appengine.api.datastore.DatastoreApiHelper.makeSyncCall
> > > (DatastoreApiHelper.java:55)
> > > at com.google.appengine.api.datastore.DatastoreServiceImpl$2.run
> > > (DatastoreServiceImpl.java:169)
> > > at
> > > com.google.appengine.api.datastore.TransactionRunner.runInTransaction
> > > (TransactionRunner.java:29)
> > > at com.google.appengine.api.datastore.DatastoreServiceImpl.put
> > > (DatastoreServiceImpl.java:157)
> > > at com.google.appengine.api.datastore.DatastoreServiceImpl.put
> > > (DatastoreServiceImpl.java:137)
> > > at com.google.appengine.api.datastore.DatastoreServiceImpl.put
> > > (DatastoreServiceImpl.java:129)
> > > at com.google.apphosting.runtime.jetty.SessionManager
> > > $AppEngineSession.save(SessionManager.java:153)
> > > at com.google.apphosting.runtime.jetty.SaveSessionFilter.doFilter
> > > (SaveSessionFilter.java:41)
> > > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter
> > > (ServletHandler.java:1084)
> > > at
> > > com.google.apphosting.utils.servlet.TransactionCleanupFilter.doFilter
> > > (TransactionCleanupFilter.java:43)
> > > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter
> > > (ServletHandler.java:1084)
> > > at org.mortbay.jetty.servlet.ServletHandler.handle
> > > (ServletHandler.java:360)
> > > at org.mortbay.jetty.security.SecurityHandler.handle
> > > (SecurityHandler.java:216)
> > > at org.mortbay.jetty.servlet.SessionHandler.handle
> > > (SessionHandler.java:181)
> > > at org.mortbay.jetty.handler.ContextHandler.handle
> > > (ContextHandler.java:712)
> > > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:
> > > 405)
> > > at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.handle
> > > (AppVersionHandlerMap.java:237)
> > > at org.mortbay.jetty.handler.HandlerWrapper.handle
> > > (HandlerWrapper.java:139)
> > > at org.mortbay.jetty.Server.handle(Server.java:313)
> > > at
> > > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:
> > > 506)
> > > at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete
> > > (HttpConnection.java:830)
> > > at
> > > com.google.apphosting.runtime.jetty.RpcRequestParser.parseAvailable
> > > (RpcRequestParser.java:76)
> > > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:381)
> > > at
> > > com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceReques t
> > > (JettyServletEngineAdapter.java:125)
> > > at com.google.apphosting.runtime.JavaRuntime.handleRequest
> > > (JavaRuntime.java:235)
> > > at com.google.apphosting.base.RuntimePb$EvaluationRuntime
> > > $6.handleBlockingRequest(RuntimePb.java:4755)
> > > at com.google.apphosting.base.RuntimePb$EvaluationRuntime
> > > $6.handleBlockingRequest(RuntimePb.java:4753)
> > > at
> Another question: I never need to query my entities; I only need to
> store them in full and retrieve them by key. Would I have better luck
> with the datastore timeouts if I saved my data as a single-column
> blob? Does the number of columns have a meaningful impact on
> datastore performance?
It does have an impact on performance (mostly due to having to update
indexes), but the timeouts you are almost certainly due to data being
temporarily unavailable rather than actually taking too long to process, so
I would not expect it to matter in your case.
> On Jun 12, 9:25 pm, dfabulich <danfabul...@gmail.com> wrote:
> > Thanks for your helpful remarks, Don.
> > Note I had actually already added retry logic before your post, so I
> > think my "natural" timeout rate is probably higher than a simple
> > glance at the error log would show. I look forward to the day when
> > bug 1227 is fixed and I can return to statelessness on the server-
> > side. :-)
> > Today, I only use Appengine sessions when the user is not logged in;
> > when the user is logged in, I always read/write directly to the
> > datastore. Based on this conversation, I'm tempted to always store
> > the data in the session, but to also back the data up in the datastore
> > when possible.
> > But what about those DTEs I'm seeing in Google's SaveSessionFilter?
> > Does the session filter already implement retry logic? If so, how
> > many times does it retry?
> > What exactly do users see when Google's SaveSessionFilter gets a
> > timeout writing to the data store? Does the client get a 500 error?
> > Is the data updated in memcache, or does a DTE in the session filter
> > prevent writing to the memcache?
> > In other words: can I trust Google's SaveSessionFilter? Or do I need
> > to rewrite it myself if I want to avoid these timeouts?
We do not currently have any retry logic in SaveSessionFilter. This is
something that we'll investigate to help resolve the issue that you filed.
I believe that a datastore timeout received while saving a session currently
results in a 500 to the user. It does not flush the updated session to
memcache (which I believe is a good thing, as otherwise they would be
inconsistent).
> > On Jun 10, 9:31 am, Don Schwarz <schwa...@google.com> wrote:
> > > Hi Dan,
> > > Your announcement that you had ported this game to App Engine showed up
> in
> > > my Google Alerts a few weeks after our launch and I was really excited
> to
> > > see it. I think this is a great example of the kind of application
> that we
> > > want to host on App Engine, and I'm definitely interested in helping to
> > > resolve these issues.
> > > The first problem you mentioned ishttp://
> code.google.com/p/googleappengine/issues/detail?id=1227. I've
> > > confirmed that this is a legitimate bug, although the behavior is the
> same
> > > in both Java and Python so I'm a bit surprised that it has gone
> unnoticed
> > > for this long. I'll try to get this fixed in the next major release of
> App
> > > Engine.
> > > As for the datastore errors, I've been monitoring your error rate for
> the
> > > past few days. You're seeing about a 0.05% - 0.1% error rate on your
> > > dynamic requests. If I assume those are all attributable to datastore
> > > timeouts, and that you are making an average of 1-2 datastore API calls
> per
> > > request, that's about a 0.1%-0.2% rate of datastore API calls that
> result in
> > > timeouts. This is a bit higher than average, but within the expected
> error
> > > rate for the datastore. The datastore is a highly distributed piece of
> > > infrastructure, and there will always be some non-zero failure rate.
> We're
> > > working on adding additional retry logic at a lower level which should
> help
> > > in some of the cases where your code is not directly involved (e.g.
> > > retrieving and storing sessions), but in your own code I would
> encourage you
> > > to add some retry logic.
> > > Another approach to avoid timeouts in Get's is to also store your data
> in
> > > memcache. You can eliminate a large number of your datastore gets by
> first
> > > trying to retrieve entities from memcache (and writing through to
> memcache
> > > after a put), which will significantly reduce the number of datastore
> get
> > > calls that you are making (and correspondingly reduce your exposure to
> > > datastore timeout issues). This is already happening behind the scenes
> for
> > > when retrieving sessions.
> > > Let me know if there is anything else I can do to help.
> > > Thanks,
> > > Don
> > > On Tue, Jun 9, 2009 at 4:10 PM, dfabulich <danfabul...@gmail.com>
> wrote:
> > > > I've redeployed dozens of times while investigating the problem.
> > > > I only have to persist one object: a simple dictionary hashmap of
> > > > strings to strings. It's always smaller than 4K (because it used to
> > > > fit in a single cookie when I ran it on a regular Tomcat machine).
> > > > The hashmap typically has a few dozen keys in it.
> > > > When I put the dictionary in the session I just put the entire map in
> > > > with session.setAttribute(myProps). The session auto-saves it to the
> > > > datastore as a blob.
> > > > When writing to the DB, I create a new entity, copy the values over,
> > > > and datastoreService.put(entity). That way it's saved to the
> > > > datastore as a structured entity.
> > > > In either case, sometimes the datastore barfs out a
> > > > DatastoreTimeoutException; the request fails, my users get errors,
> and
> > > > the write attempt is lost.
> > > > When you suggest partially switching back to cookies, what do you
> > > > mean, exactly? Do you mean I should try setting up an alternate
> > > > version that doesn't touch the datastore? (i.e. doesn't use sessions
> > > > AND doesn't use the datastore API?) I've done that; obviously I
> don't
> > > > get DatastoreTimeoutExceptions when I don't use the datastore... but
> I
> > > > assume that's not what you meant. :-)
> > > > Do you mean to try setting up a version of the app that doesn't have
> > > > <sessions-enabled> (so it's only playable if you're logged in), but
> > > > still uses the datastore API directly? I'm about 99% sure that I'll
> > > > get the same problem, but I really don't dare to do it because the
> > > > problem doesn't repro 100% of the time; the only way to get it to
> show
> > > > up would be to turn it on for all my users or test the app with a
> load
> > > > test tool, which I'd really rather not do because Google will bill me
> > > > for that! But if you write back and say that's the ONLY way you can
> > > > help me, I'll do it.
> > > > I will note that my logs show that I'm getting DTEs not only in my
> own
> > > > code, but also in Google's code! Here's a stacktrace, but the key
> > > > lines are highlighted below. It shows that even Google's code can't
> > > > write a 2 kilobyte hashmap to the datastore without timing out.