how should I clean up inside the datastore?


conman

Jun 14, 2008, 2:01:20 AM
to Google App Engine
I must ask my question again:
how should one manage to delete a large set of entities in the
datastore if process time is restricted?
It is easy to create a large number of entities (distributed over many
requests) but I don't see how I can get rid of them.

e = Model.all().fetch(...)
db.delete(e)
That seems to be the most efficient way to do it, and it must run inside
a loop because of the fetch limit of 1000 - is that correct?
But then all entities must be in one entity group. The docs state that
entity groups should be kept as small as possible, so storing a large
number in one group only for efficient deleting is not a clever solution.

A silly thought:
Maybe it is not necessary to delete datastore entities if I cut the
connection to them?
i.e. I have a lot of Posts in a Forum and I delete the Forum - my Posts
won't be used by my application anymore and I can forget about
them... but does that lead to exceeding my app quota? I read a storage
limit of 500 MB - but is that datastore storage or file system
storage?... Ok I am pretty sure it's datastore storage :)

You see, I am pretty stuck on how to handle this problem...
How do you manage the cleanup in your applications?

Thanks!
Constantin

Marzia Niccolai

Jun 16, 2008, 3:06:18 PM
to google-a...@googlegroups.com
Hi Constantin,
 In order to bulk delete entities from the datastore, you will need to set up a handler that will handle fetching and deleting entities a few at a time, then set up a job to hit that URL every 10 or so seconds. Or you can delete them manually through the admin console's data viewer.
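
A minimal sketch of the per-request step described above, not an official recipe. The batch size of 100 and the helper names are assumptions chosen for illustration. `fetch` and `delete` are passed in as callables so the logic can run outside App Engine; inside a Python handler they would presumably be `lambda n: Model.all().fetch(n)` and `db.delete`.

```python
def delete_batch(fetch, delete, batch_size=100):
    """Delete at most batch_size entities and report progress.

    fetch(n)  -> list of up to n remaining entities (stand-in for
                 Model.all().fetch(n) on App Engine)
    delete(x) -> removes the given entities (stand-in for db.delete)
    Returns (deleted_count, more_may_remain).
    """
    entities = fetch(batch_size)
    if entities:
        delete(entities)
    # A full batch suggests there may be more left for the next request.
    return len(entities), len(entities) == batch_size


# In-memory stand-in for the datastore, for illustration only.
store = list(range(250))


def fake_fetch(n):
    return store[:n]


def fake_delete(entities):
    for e in entities:
        store.remove(e)


# Each call models one short request; an external job repeats the request
# until the handler reports that nothing more may remain.
results = []
while True:
    deleted, more = delete_batch(fake_fetch, fake_delete)
    results.append(deleted)
    if not more:
        break
```

Keeping each request to one small batch is what lets the work fit inside the per-request time limit; the total is spread over as many requests as it takes.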

 I know this isn't the ideal solution, and we are working on making these tasks easier.

 You could, as you have suggested, simply not delete the data, and just ignore the entities in your datastore you don't want to use.  However, as you pointed out, these entities will count against your 500MB storage limit.

-Marzia

Brett Morgan

Jun 17, 2008, 5:43:27 AM
to google-a...@googlegroups.com
Marzia,

I have a short (ten line?) script that is polling my GAE app
(taskr.appspot.com) and deleting dead data. I appear to have confused
the data viewer in the GAE admin interface and it is throwing 500
errors.

I'm just curious if I am doing anything bad here?
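
For readers wondering what such a polling script might look like, here is a rough sketch. It is not the script mentioned in this post. The cleanup URL, the `b"done"` reply convention, and the function name are all hypothetical; the 10 second interval follows the suggestion made earlier in the thread.

```python
import time
import urllib.request


def poll_until_done(url, open_url=urllib.request.urlopen,
                    interval=10, max_requests=1000, sleep=time.sleep):
    """Request url repeatedly until the response body reads b"done".

    Returns the number of requests made. The b"done" marker is an assumed
    convention between this script and the cleanup handler.
    """
    for attempt in range(1, max_requests + 1):
        body = open_url(url).read().strip()
        if body == b"done":
            return attempt
        # Wait before asking the handler to delete the next batch.
        sleep(interval)
    return max_requests


# Example call (hypothetical URL, not run here):
# poll_until_done("http://your-app.appspot.com/cleanup")
```

The `max_requests` cap is there so a handler that never answers "done" cannot keep the script running forever.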

brett

--

Brett Morgan http://brett.morgan.googlepages.com/

Roger Filomeno

Jun 17, 2008, 5:47:23 AM
to google-a...@googlegroups.com
Datastore seems to be down at the moment?
--
--
Roger P. Filomeno
TwitSnap: The coolest Twitter widget for your site! http://www.twitsnap.com

http://corruptedpartition.blogspot.com/
TEL#: +1-360-968-1767
SMS: send MSG GODIE <YOUR MESSAGE> to 2948

$> who | grep -i blond | date; cd ~; unzip; touch; strip; finger; mount; gasp; yes; uptime; umount; sleep

javaDinosaur

Jun 17, 2008, 6:51:08 AM
to Google App Engine
> Datastore seems to be down at the moment?

I make that 3 recent reports about a Datastore outage.

Best keep an eye on things here... http://groups.google.com/group/google-appengine-downtime-notify

GAEfan

Jun 19, 2008, 12:41:02 PM
to Google App Engine
Marzia,

Please help me to understand the structure here...

I have uploaded my CSV file via GAE BulkUploader. Then, I made
changes to it and uploaded it again. It did not overwrite the first
set, so now my data is duplicated. Understood.

I then went into the Dashboard/DataViewer and manually deleted all the
entries, then uploaded my CSV fresh. Now, the ID/Name starts at 1234,
instead of 1. I really would just like to start from a clean slate,
and delete that entire Datastore/Model. Any way to do that?

What if I just rename my Model locally, say from 'db1' to 'db2' and re-
deploy? I would imagine that would give me a fresh start, but what
happens to db1? It will no longer be linked to any model I have in my
app. Will it eventually die? Will it count against my quota?

Again, the question is, "Any way to delete it?"

Thanks,

Ken

Bryan Donlan

Jun 19, 2008, 4:09:09 PM
to Google App Engine
The automatically-generated keys are not guaranteed to go in
sequential order, nor are they guaranteed to start at any particular
value. Just don't worry about it, and engineer your app to not care
what values are automatically assigned (either deal with whatever GAE
picks, or provide a key of your own).

GAEfan

Jun 19, 2008, 4:46:49 PM
to Google App Engine
Then what purpose do they serve, and what good are they as KEYS?

Keys are not meant to be random.

Still would like to delete my Models on the real datastore.

Thanks.

Bryan Donlan

Jun 19, 2008, 4:53:21 PM
to Google App Engine


On Jun 19, 4:46 pm, GAEfan <ken...@gmail.com> wrote:
> Then what purpose do they serve, and what good are they as KEYS?
>
> Keys are not meant to be random.

Sure they are. A key names a record. It doesn't particularly have any
meaning besides what you assign to it. Do you really need to control
the meaning of the key?

Remember that having a monotonically incrementing counter is a write
bottleneck and will limit scalability. Not to mention that if a server
fails between getting an ID and writing an object using it, you'll
have a sequence number gap.
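
The gap scenario can be shown with a toy simulation. This is only an illustration with made-up names (`SequentialAllocator`, `write_records`); it says nothing about how the datastore allocates IDs internally.

```python
import itertools


class SequentialAllocator:
    """Single shared counter: every writer must go through it."""

    def __init__(self):
        self._next = itertools.count(1)

    def allocate(self):
        return next(self._next)


def write_records(allocator, names, fail_on=None):
    """Allocate an ID for each name, then "write" the record.

    A name equal to fail_on simulates a server that dies after taking an
    ID but before storing the record.
    """
    stored = {}
    for name in names:
        record_id = allocator.allocate()
        if name == fail_on:
            continue  # ID consumed, nothing written
        stored[record_id] = name
    return stored


stored = write_records(SequentialAllocator(), ["a", "b", "c"], fail_on="b")
print(sorted(stored))  # IDs 1 and 3 are stored; 2 is the gap
```

Because every writer has to call the same `allocate`, that single counter is also the point where concurrent writes would queue up.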

> Still would like to delete my Models on the real datastore.

Delete everything in the model and wait a while. The models aren't
explicitly stored (as explained in one of the IO videos), but
(apparently) instead computed from the data inserted, and periodically
pruned.

>
> Thanks.

Bryan Donlan

Jun 19, 2008, 4:54:39 PM
to Google App Engine
On Thu, Jun 19, 2008 at 4:53 PM, Bryan Donlan <bdo...@gmail.com> wrote:
>
>
>
> On Jun 19, 4:46 pm, GAEfan <ken...@gmail.com> wrote:
>> Then what purpose do they serve, and what good are they as KEYS?
>>
>> Keys are not meant to be random.
>
> Sure they are. A key names a record. It doesn't particularly have any
> meaning besides what you assign to it. Do you really need to control
> the meaning of the key?

I should clarify this - if you want to assign meaning to the key, you
should pick the key yourself. For example, a username might make for a
good key; or you can just not care what the key is.
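
A small sketch of the difference, using an in-memory stand-in rather than the real datastore. `FakeStore` and the sample rows are invented for illustration. In the Python `db` API the corresponding pieces would be the `key_name` constructor argument and `Model.get_by_key_name`.

```python
import itertools


class FakeStore:
    """In-memory stand-in for one datastore kind, for illustration only."""

    def __init__(self):
        self.rows = {}
        self._auto_ids = itertools.count(1)

    def put(self, data, key_name=None):
        # With no key_name the store picks an ID itself.
        key = key_name if key_name is not None else next(self._auto_ids)
        self.rows[key] = data
        return key


# Made-up sample rows.
users = [{"username": "alice", "city": "Oslo"},
         {"username": "bob", "city": "Lima"}]

auto = FakeStore()
named = FakeStore()
for _ in range(2):  # upload the same rows twice
    for row in users:
        auto.put(row)
        named.put(row, key_name=row["username"])

print(len(auto.rows), len(named.rows))  # 4 2
```

With a key you choose yourself, uploading the same rows a second time overwrites the existing records instead of duplicating them.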

GAEfan

Jun 19, 2008, 5:02:52 PM
to Google App Engine
So that may answer the question...

If the old datastore models are "periodically pruned", then I can just
rename the model and start with a new one? The old one will get
purged eventually?

If that is the correct answer, I can live with that (as I don't think
I'll run up against the quota while waiting for the purge).

Still would be nice and concrete to be able to clear the datastore and
start fresh.

I'll go with that and see what happens.

Thanks, Bryan.

Bryan Donlan

Jun 19, 2008, 6:17:48 PM
to google-a...@googlegroups.com
On Thu, Jun 19, 2008 at 5:02 PM, GAEfan <ken...@gmail.com> wrote:
>
> So that may answer the question...
>
> If the old datastore models are "periodically pruned", then I can just
> rename the model and start with a new one? The old one will get
> purged eventually?
>
> If that is the correct answer, I can live with that (as I don't think
> I'll run up against the quota while waiting for the purge).

I don't think models count against your quota - provided there are no
objects in the datastore with that model.

Why bother renaming it though? Just use the same model. You /will/ get
keys out of sequence later, no matter what, so might as well deal with
it now.

javaDinosaur

Jun 19, 2008, 6:57:53 PM
to Google App Engine
> Keys are not meant to be random.

GAEfan, you could find it useful to do a bit of background reading
on "surrogate" v. "natural" keys. The appengine datastore defaults to a
surrogate key allocation system. Google has made the correct design
call here IMO.

http://en.wikipedia.org/wiki/Surrogate_key

http://www.agiledata.org/essays/keys.html#Comparison

I agree that the Datastore needs to support a batch truncate API. Also
an option to reset the surrogate key base number to 1 would be nice.
Bryan is correct that such a reset feature is not essential, but
classic SQL databases support the reset concept so it must be a
reasonable idea. When regenerating test or demo datasets it is useful
if entities have familiar ID values.

Brett Morgan

Jun 19, 2008, 7:39:34 PM
to google-a...@googlegroups.com
First point, we are working on a distributed system.

Counters are inherently hard to do on a distributed system. To be able
to scale and to handle failure, keys need to be random.

There are ways of getting a sequence for created items (see the google
io talks), but the problem is that you introduce a serious bottleneck
in doing so.

Why do you need a sequence for the keys? Auditability for completeness?

--

Brett Morgan http://brett.morgan.googlepages.com/

Brett Morgan

Jun 19, 2008, 7:42:07 PM
to google-a...@googlegroups.com
On Fri, Jun 20, 2008 at 8:57 AM, javaDinosaur <jonat...@hotmail.co.uk> wrote:
> I agree that the Datastore needs to support a batch truncate API.

Do we have an issue for this yet?

> Also
> an option to reset the surrogate key base number to 1 would be nice,
> Bryan is correct that such a reset feature is not essential but
> classic SQL databases support the reset concept so it must be a
> reasonable idea. When regenerating test or demo datasets it is useful
> if entities have familiar ID values.

Keeping people in familiar territory is not a reasonable goal. It is
mutually incompatible with GAE's goal of scaling to millions of users.
Using sequences, and other global state, are the first things you need
to toss out the window to hit scale.

javaDinosaur

Jun 19, 2008, 8:24:46 PM
to Google App Engine
> It is
> mutually incompatible with GAE's goal of scaling to millions of users.
> Using sequences, and other global state, are the first things you need
> to toss out the window to hit scale.

Err. I am only suggesting that the existing per-application global
datastore ID sequence number could be reset when a future datastore
purge_all function completed its purge.