How to delete 800 mln records from Datastore?


Kuba Włodarczyk

Nov 26, 2018, 11:13:44 AM
to Google App Engine
I've got around 800 million (!) records in my Datastore entities. How can I remove them quickly?
I'm trying to delete them via deferred tasks (a Python script), but it is extremely slow...
I would appreciate any help. Thanks.

Vitaly Bogomolov

Nov 26, 2018, 2:27:56 PM
to Google App Engine
Hi Kuba.

The free quota for entity deletes is 20K records per day. So on the free quota alone, deleting this data would take about 40K days (800M / 20K).

Or you can delete the data in one day and be charged about $800 ($0.02 for every 20K deletes over quota), plus additional costs for running instances.


WBR, Vitaly.
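
The arithmetic behind these two options can be checked with a short sketch. It uses only the figures quoted in the post above (20K free deletes per day, $0.02 per 20K deletes over quota); those rates are taken as assumptions from the thread, not as current pricing, and the function names are made up for illustration.

```python
# Inputs as quoted in the thread; treat them as assumptions, not current pricing.
TOTAL_ENTITIES = 800 * 10**6      # entities to delete
FREE_DELETES_PER_DAY = 20000      # free daily delete quota
PRICE_PER_BLOCK_USD = 0.02        # price per block of deletes over quota
BLOCK_SIZE = 20000                # deletes per billed block


def days_on_free_quota(total, free_per_day):
    """Days needed if only the free daily quota is used (rounded up)."""
    return -(-total // free_per_day)


def one_day_cost_usd(total, free_per_day, price, block):
    """Cost of deleting everything in one day, after one day's free quota."""
    billable = max(total - free_per_day, 0)
    return billable / block * price


days = days_on_free_quota(TOTAL_ENTITIES, FREE_DELETES_PER_DAY)
cost = one_day_cost_usd(TOTAL_ENTITIES, FREE_DELETES_PER_DAY,
                        PRICE_PER_BLOCK_USD, BLOCK_SIZE)
print(days, round(cost, 2))
```

With these inputs the sketch gives 40,000 days on the free quota, or roughly $800 for a one-day delete, before instance costs.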

Kuba Włodarczyk

Nov 26, 2018, 3:13:47 PM
to google-a...@googlegroups.com
Thank you for your answer. That clarifies a lot. But costs aside, how can I do this in, let's say, one day (the expensive option)? I prefer Python.
--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengi...@googlegroups.com.
To post to this group, send email to google-a...@googlegroups.com.
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/a57fc329-6aca-4db5-9c55-5650809374fb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Vitaly Bogomolov

Nov 26, 2018, 5:09:51 PM
to Google App Engine
Something like this. The code is not tested and the constants may be different, except 800M ;)

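from google.appengine.api.taskqueue import Queue, Task
from google.appengine.ext import ndb

QUEUE = Queue('default')
TOTAL_RECORDS = 800000000
RECORDS_PER_TASK = 100
TASKS_PER_ADD = 20
FETCH_SIZE = 20


def backend_function():
    # Enqueue enough tasks to cover every record, 20 tasks per add() call.
    for _ in xrange(TOTAL_RECORDS // (RECORDS_PER_TASK * TASKS_PER_ADD)):
        QUEUE.add([Task(url='/remove_100_records_handler')
                   for _ in range(TASKS_PER_ADD)])


def remove_100_records_handler():
    # YourDatastoreTable is a placeholder for your ndb.Model subclass.
    for _ in range(RECORDS_PER_TASK // FETCH_SIZE):
        keys = YourDatastoreTable.query().fetch(FETCH_SIZE, keys_only=True)
        if not keys:
            break  # nothing left to delete
        ndb.delete_multi(keys)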

WBR, Vitaly
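
To gauge whether a fan-out like the one above could finish in one day, a rough sizing sketch may help. It reuses the constants from the post (100 records per task, 20 tasks per add call); the one-day target and the `size_fan_out` helper are assumptions for illustration, and the sketch ignores retries, contention, and queue rate settings.

```python
SECONDS_PER_DAY = 24 * 60 * 60

TOTAL_ENTITIES = 800 * 10**6   # entities to delete
ENTITIES_PER_TASK = 100        # records each task removes (from the post)
TASKS_PER_ADD = 20             # tasks enqueued per add() call (from the post)


def size_fan_out(total, per_task, tasks_per_add, seconds):
    """Return task counts and the sustained rates needed to finish in time."""
    tasks = -(-total // per_task)             # ceiling division
    add_calls = -(-tasks // tasks_per_add)    # ceiling division
    return {
        "tasks": tasks,
        "add_calls": add_calls,
        "deletes_per_second": total / float(seconds),
        "tasks_per_second": tasks / float(seconds),
    }


plan = size_fan_out(TOTAL_ENTITIES, ENTITIES_PER_TASK,
                    TASKS_PER_ADD, SECONDS_PER_DAY)
for name, value in sorted(plan.items()):
    print(name, value)
```

Under these assumptions the job needs 8 million tasks and a sustained rate of a little over 9,000 deletes per second (about 93 tasks per second), so the queue configuration and instance count would have to allow that throughput.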

Mohammad I (Cloud Platform Support)

Nov 26, 2018, 6:50:22 PM
to Google App Engine

Hello Kuba,


You can delete entities in bulk from Cloud Datastore using Cloud Dataflow [1], a managed service for developing and executing data processing workflows. Please also see this section [2] for best practices on deleting from Cloud Datastore.


[1] https://cloud.google.com/datastore/docs/bulk-delete

[2] https://cloud.google.com/datastore/docs/best-practices#deletions




Attila-Mihaly Balazs

Nov 27, 2018, 12:07:50 AM
to Google App Engine
AFAIK the simplest way to delete them, which requires *no code to be written*, is the deprecated but still working Datastore Admin. In your Google Cloud Console, go to Datastore > Admin, click "Open Datastore Admin", select the entity kind you want to delete, and click "Delete Entities". This will kick off a distributed, fan-out MapReduce job that will delete the entities in a couple of hours.

Of course, as Vitaly said, this will cost you money.

Attila

Kuba Włodarczyk

Nov 27, 2018, 6:42:31 AM
to Google App Engine
Thanks Attila-Mihaly,

The cost is not a problem if it is around what Vitaly said. Regarding your solution, I've tried "Datastore Admin", but the delete tasks behave weirdly.
The job on the list says "(0 steps completed, 1 active)".
Going to the details gives me this (please see the attached screenshot).

Mohammad, thanks for your suggestions. I've tried this as well; however, I couldn't track the progress, so I stopped the task after 7 hours. I also wasn't sure how to set it up, as I couldn't find any guide. I entered a query like "SELECT * from Transaction". Is that OK? Transaction is the entity kind I would like to remove completely.


Jakub
Screen Shot 2018-11-27 at 12.38.51.png

Amit (Google Cloud Support)

Nov 28, 2018, 6:01:46 PM
to Google App Engine

Hello Kuba,


The link [1] that Mohammad shared provides the steps for setting up a Cloud Dataflow job to delete entities in bulk, and I can see you are already on the right track. After selecting ‘Datastore to Datastore Delete’, enter that query if ‘Transaction’ is your entity name. You can monitor the progress using the Cloud Dataflow Monitoring Interface. For more details, please check this link [2].


[1] https://cloud.google.com/datastore/docs/bulk-delete#deleting_entities_in_bulk

[2] https://cloud.google.com/dataflow/docs/guides/using-monitoring-intf

