Re: [google-appengine-go] App crashing without any info

75 views
Skip to first unread message

David Symonds

unread,
May 11, 2013, 7:11:54 PM5/11/13
to alexandr...@gmail.com, google-ap...@googlegroups.com
Yes, panics are normally captured and presented in the admin console
logs. The failure mode you are observing will usually happen if either
some part of your app invokes something like log.Fatal (or os.Exit),
or it panics outside the goroutine that is servicing a request.

Alexandre Normand

unread,
May 11, 2013, 10:28:26 PM5/11/13
to google-ap...@googlegroups.com, alexandr...@gmail.com
Interesting. This makes sense to me that an explicit exit (direct or indirect) would manifest like this but it's not the case here: I double checked and I'm not triggering any form of exit. The last possibility is a panic in a goroutine but I'm not using any goroutines in my app at the moment, at least not knowingly. I guess it's possible that the datastore calls I'm making could trigger execution of goroutines which could potentially panic. I alluded to using task queues for the persistence of my batches of data in my original post but I still haven't started any of that work and my current implementation still runs datastore Puts serially and in the scope of a user request.

I'll do a quick scan of the datastore source for goroutines hoping to shed some light on this but that could potentially just isolate the problem to my usage of the datastore or a bug in the datastore? Fixing it is still tricky because I don't have any trace of that potential panic in an independent goroutine. The only bit of info I can actually use to troubleshoot this is the last data that gets Put in the datastore. I should be able to identify what bit of data it's failing on and maybe that could be enough to expose the issue. 

All that said, I'm open to suggestions/tips. I'm fairly new to GAE/Go troubleshooting. 

Cheers! 

Alexandre Normand

unread,
May 12, 2013, 2:03:49 AM5/12/13
to google-ap...@googlegroups.com, alexandr...@gmail.com
I'm not sure how much this could be relevant to others but I'll post some findings here about this issue I'm having:
  • the datastore does use a goroutine for some work (the saveEntity() runs propertiesToProto() inside of a goroutine and loadEntity() does the same with protoToProperties())
  • after looking at the delay package, I decided to give that a try since I was going to do that anyway. It's somewhat helpful because I can clearly see two tasks that fail repeatedly in the queue. Each tasks actually does the persist of a chunk of data. If I could see the Call parameters, I would probably be able to identify quickly what the issue is but as far as I can tell, there's no way to see that from the task's mapreduce job details.  
  • one benefit of using tasks and delay to call each one is that I can actually see logs now for all that were successful and all units of work that are not causing problems actually do complete. It's not terribly useful to find what the problem is though. 
Overall, it seems like my problem occurs during the propertiesToProto() call done inside of the goroutine invoked in the datastore's saveEntity() and I'll probably be able to identify what data is causing the crash after some analysis. Hopefully, once I see the data, I'll know how to address it. 

I wish there was a way for the datastore to log all panics or that the behaviour of the dev instance allowed me to reproduce it locally. Which brings me to a question I keep asking myself: would the latter be a bug? I'm guessing it's expected that the dev instance isn't 100% faithful to a real GAE instance but should a scenario like the one I'm experiencing be consistent between dev/the real thing™?

David Symonds

unread,
May 13, 2013, 12:15:55 PM5/13/13
to Alexandre Normand, google-ap...@googlegroups.com
On Sat, May 11, 2013 at 11:03 PM, Alexandre Normand
<alexandr...@gmail.com> wrote:

> I wish there was a way for the datastore to log all panics or that the
> behaviour of the dev instance allowed me to reproduce it locally. Which
> brings me to a question I keep asking myself: would the latter be a bug? I'm
> guessing it's expected that the dev instance isn't 100% faithful to a real
> GAE instance but should a scenario like the one I'm experiencing be
> consistent between dev/the real thing™?

The dev and prod datastore behaviour is meant to be identical as much
as possible. It should be considered a bug if they diverge, though of
course there will be some cases where it is simply infeasible to
replicate the exact behaviour.

Alexandre Normand

unread,
May 13, 2013, 1:19:21 PM5/13/13
to google-ap...@googlegroups.com, Alexandre Normand
It's what I thought. I've seen problems before that only show up when running on prod (the 500 Puts in a MultiPut limit was one of them, but this one seemed legitimate) but I'm not sure yet if there's a valid explanation for this different behavior. I've been switching my code around to make it easier to troubleshoot but I'm still working on it. I'm doing more of that this week and hopefully, I'll find the root cause and be in a good position to determine if there's a bug or if there's a legitimate reason for the failure I'm experiencing. 

Thanks for the feedback.

Alexandre Normand

unread,
May 14, 2013, 2:26:41 AM5/14/13
to google-ap...@googlegroups.com, Alexandre Normand
Hey David,
Thanks again for the reply. The more I play with this, the more this seems like a bug. I'm still not sure if the bug would be that the datastore api fails ungracefully caused by me misusing it or if it's a legitimate bug for something that should be working. In any case, this isn't related to me reaching a quota limit and it works fine with the same data/scenario on a dev instance. 

I opened a ticket but I'm still very much in the dark as to what's really happening so there's not as much context as I'd like. 


Cheers,

David Symonds

unread,
May 14, 2013, 11:48:18 AM5/14/13
to Alexandre Normand, google-ap...@googlegroups.com
Thanks. Let's take further discussion on to the bug.
Reply all
Reply to author
Forward
0 new messages