Java App Engine bulk delete entities

Nick

Feb 21, 2010, 4:13:20 PM
to Google App Engine
I'm trying to delete all the entities of a specific kind (150k+). My
App Engine app is in Java. This is a 5-minute task on any other database/
platform. I am unable to find a working solution, custom or out of the box!

Steps I have taken
1. Tried to follow the documented way to delete entities.
http://code.google.com/appengine/docs/java/datastore/queriesandindexes.html#Delete_By_Query
-This times out (App Engine has a 30-second request limit) and deletes
nothing

2. Used this Java class to delete records via a custom task (50
requests put in the queue at a time)
http://stackoverflow.com/questions/108822/delete-all-data-for-a-kind-in-google-app-engine#answer-1882697
-dss.delete(keys) reports 128 to 256 entities deleted per request
-I hit my CPU quota in ~4 hours. I've tried this for days with no end
in sight

3. Tried to manually delete the records via the "Datastore Viewer" on
the App Engine website
-Datastore Viewer returns "Error: Server Error. The server encountered
an error and could not complete your request. If the problem persists,
please report your problem and mention this error message and the
query that caused it." (for days)
-Datastore Statistics returns the same error (for days)

4. Tried to learn/use the Python remote_api to delete my entities.
-Configured the remote_api, installed the Mac
GoogleAppEngineLauncher, and tried to follow a bunch of tutorials
(http://code.google.com/appengine/articles/remote_api.html). I'm still
working on this route, but this cannot be a real solution, right?
It requires custom code, working knowledge of Python, and manual
iterative execution, if it even works for Java entities.

5. Tried to search the forums
-Issue: Datastore Viewer returns 500 error
-Open since May 23, 2008: http://code.google.com/p/googleappengine/issues/detail?id=384
-Cites null properties in entities as the cause. The solution (might be) to
update or delete the entities....!

Jeff Schnitzer

Feb 22, 2010, 3:35:15 PM
to google-a...@googlegroups.com
I don't think going to Python is going to help you in any way. Look
at your logs: is the bulk of the time spent in api_cpu_ms?

I've also noticed that deletes are slow and expensive. I haven't
found any public documentation about this, but I'm (wildly)
speculating it's an inherent aspect of BigTable - it probably does
some equivalent of a vacuum with every delete. It makes sense that
they would be more expensive than simple writes.

Your delete speed is not far off from my experience, but still pretty
bad. How many indexed fields do you have on that entity?

Here's what I'd suggest:

* Use the task queue and the low-level API with a keys-only query to
iterate and delete everything. Use smallish batches (maybe 50). I
find this handy:
http://code.google.com/p/gaevfs/source/browse/trunk/src/com/newatlanta/appengine/taskqueue/Deferred.java

* Sign up for billing so you don't get CPU limited. Maybe it'll cost
you a buck or two to delete everything. Yawn.

* Run and wait.
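
For what it's worth, the loop each task would run could look roughly like the sketch below. This is only an illustration: KeyStore, InMemoryKeyStore, BulkDeleteSketch and deleteSome are made-up names, and the in-memory store stands in for the datastore so the code runs without the App Engine SDK. In a real app I'd expect fetchKeys to map onto a low-level Query with setKeysOnly() plus a fetch limit, and deleteAll onto DatastoreService.delete(keys), with the handler enqueueing a follow-up task whenever deleteSome returns true. The maxBatches cap is an assumed stand-in for staying under the 30-second request limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two datastore calls a delete task needs.
// Assumed mapping on App Engine: fetchKeys = keys-only query with a limit,
// deleteAll = batch delete by key.
interface KeyStore {
    List<Long> fetchKeys(String kind, int limit);
    void deleteAll(List<Long> keys);
}

// In-memory fake so the sketch can run anywhere; it ignores the kind name.
class InMemoryKeyStore implements KeyStore {
    private final List<Long> keys = new ArrayList<>();

    InMemoryKeyStore(int count) {
        for (long i = 0; i < count; i++) {
            keys.add(i);
        }
    }

    public List<Long> fetchKeys(String kind, int limit) {
        return new ArrayList<>(keys.subList(0, Math.min(limit, keys.size())));
    }

    public void deleteAll(List<Long> toDelete) {
        keys.removeAll(toDelete);
    }

    int remaining() {
        return keys.size();
    }
}

class BulkDeleteSketch {
    // Deletes at most maxBatches batches of batchSize keys in one task run.
    // Returns true if the last batch was full, meaning more entities may
    // remain and the caller should enqueue another task.
    static boolean deleteSome(KeyStore store, String kind, int batchSize, int maxBatches) {
        for (int i = 0; i < maxBatches; i++) {
            List<Long> batch = store.fetchKeys(kind, batchSize);
            if (batch.isEmpty()) {
                return false; // nothing left to delete
            }
            store.deleteAll(batch);
            if (batch.size() < batchSize) {
                return false; // short batch: that was the tail
            }
        }
        return true; // budget for this task used up; continue in a new task
    }

    public static void main(String[] args) {
        InMemoryKeyStore store = new InMemoryKeyStore(120);
        int taskRuns = 0;
        boolean more = true;
        // Each loop iteration simulates one task pulled off the queue.
        while (more) {
            more = deleteSome(store, "MyKind", 50, 1);
            taskRuns++;
        }
        System.out.println(taskRuns + " task runs, " + store.remaining() + " entities left");
    }
}
```

Keeping each task small and letting it re-enqueue itself means no single request has to finish the whole job, which is the point of using the queue here.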

Alternatively, delete your project and start from scratch in a new application.

Jeff
