Regularly hanging datastore backups


re...@el-tramo.be

Jul 21, 2015, 6:00:41 AM
to google-a...@googlegroups.com
Our app has had recurring issues with never-ending datastore backups since the beginning of May. We run a nightly scheduled backup of our datastore. Every now and then since May (sometimes a few days in a row, sometimes only once every few weeks), a backup keeps running forever:

- We can see in the datastore admin page that the backup is still running
- We see mapreduce tasks being constantly queued in our default queue, and the backups never end (we have had backups running for weeks without ending).
- Even after aborting the infinite backups, we keep seeing mapreduce tasks being queued every second, and the process can't be stopped. The result is that we constantly have ah-builtin-datastore instances running to handle these requests, which seemingly fail, so our monthly bill is a lot higher than usual.

The only workaround we found was to abort the backup, disable our default queue for a few seconds, and purge it a few times. After that, it seems to stop queuing mapreduce tasks when we enable the queue again.

Does anyone have any clue what's going on?

thanks,
Remko


PS: I filed an issue for this on https://code.google.com/p/googleappengine/issues (12120), but I'm not getting any response there, so I'm trying the forums.

Nick (Cloud Platform Support)

Jul 21, 2015, 12:33:34 PM
to google-a...@googlegroups.com, re...@el-tramo.be
Hi Remko,

Google Groups isn't the place to post specific technical issues, as this forum is meant more for general discussion of the platform and services. 

If you would like help with a technical issue, you should post to Stack Overflow [1] or Server Fault [2].

If you believe you've identified and can reproduce an issue with the platform itself (behaviour is different from documentation or error occurs during normal use), then you should proceed (as I note you have done) to open a public issue tracker [3] issue with enough detail to reproduce the issue on our side, or if possible, an attached app that can be used to directly observe the behaviour. 

Your issue report has a decent amount of information, and according to the triage process for public issue tracker issues, it will shortly be picked up and followed up on. Triage is generally based on how recent the issue is, the number of stars, and the severity of the issue.

As some general advice on your issue: given that the backup appears to be coordinated via a MapReduce job, pausing the relevant queue and purging all of its tasks should terminate the job if you don't have access to the MapReduce job ID and the ability to call a programmatic abort.
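
As a rough sketch of the purge step only (pausing the queue is assumed to be done separately, for example from the console), something like the following could repeat the purge until the queue reports no tasks. It assumes the queue object exposes `purge()` and `fetch_statistics().tasks`, as `google.appengine.api.taskqueue.Queue` does in the legacy Python runtime. The names `purge_until_empty` and `FakeQueue` are hypothetical, and `FakeQueue` exists only so the sketch can be run outside App Engine.

```python
import collections
import time

# Minimal stand-in for the statistics object; only `tasks` is used here.
Stats = collections.namedtuple("Stats", ["tasks"])


def purge_until_empty(queue, max_attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Purge `queue` repeatedly until its statistics report zero tasks.

    Returns the number of purge calls made. Raises RuntimeError if tasks are
    still reported after `max_attempts` purges.
    """
    for attempt in range(1, max_attempts + 1):
        queue.purge()
        sleep(delay_seconds)  # give the purge a moment before checking again
        if queue.fetch_statistics().tasks == 0:
            return attempt
    raise RuntimeError("queue still has tasks after %d purges" % max_attempts)


class FakeQueue(object):
    """Hypothetical test double: simulates tasks reappearing after a purge."""

    def __init__(self, tasks_after_each_purge):
        self._pending = list(tasks_after_each_purge)
        self._tasks = 0
        self.purges = 0

    def purge(self):
        self.purges += 1
        # After each purge, some tasks may have been re-enqueued by the job.
        self._tasks = self._pending.pop(0) if self._pending else 0

    def fetch_statistics(self):
        return Stats(tasks=self._tasks)
```

On App Engine this would be called as `purge_until_empty(taskqueue.Queue('default'))` after pausing the queue. Note that purging the default queue removes every task in it, not only the backup's mapreduce tasks.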

To better determine what's happening, a more in-depth technical investigation will be needed. When the issue is processed, more information may be requested from you, and an engineer on the Google side may be able to look into the details.

Given that you've already done everything you can (short of purchasing support [4] and opening a ticket), it appears the ball is in our court now, and given that public issue tracker support for individual issues is completely free, I hope you don't mind waiting a little while until the issue is triaged.

If you would like to open a thread in this forum discussing the platform or services in broader terms, starting a discussion that would be useful for other users to join, feel free to do so.

Have a great day!
