Hi Everyone,
So I have been going through the process of deleting more things and
wanted to share some results. Previously I had deleted roughly 300k
entities, each with a single indexed list, at an average cost of
132ms per delete, using the mapreduce method.
I then realised I needed to change my schema again, and had to delete
288373 entries from a kind with the following properties:
public class Pw {
    @Persistent private String[] a;
    @Persistent private String[] b;
}
This time I used the Java remote_api library to batch delete, driven
by the following Python code:
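from google.appengine.ext import db

class deleter(object):
    def run(self, kind, batch_size=200):
        # Keys-only query, so no entity data is transferred.
        q = db.GqlQuery("select __key__ from %s" % (kind))
        entities = q.fetch(batch_size)
        while entities:
            db.delete(entities)
            # Resume after the last fetched key instead of rescanning.
            q.with_cursor(q.cursor())
            entities = q.fetch(batch_size)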
This job completed with 8.8 billed hours, or a 110ms average per
delete. However, I had to restart the deletion several times because
I kept receiving an "Unknown Java Error" from remote_api. Each error
correlated with a spike in the latency of the instance running the
remote_api, and nothing showed up in the dashboard logs.
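For anyone hitting the same restarts, a retry loop around each batch might save doing it by hand. This is only a rough sketch, not what I actually ran: `delete_in_batches`, `fetch_keys` and `delete_keys` are made-up names, with the two callables standing in for `q.fetch` and `db.delete`, and the retry count and backoff values are guesses.

```python
import time


def delete_in_batches(fetch_keys, delete_keys, batch_size=200,
                      max_retries=5, backoff_seconds=1.0, sleep=time.sleep):
    """Delete everything fetch_keys returns, retrying failed batches.

    fetch_keys(batch_size) -> list of keys still present (empty when done)
    delete_keys(keys)      -> deletes them; may raise on a transient error
    Both callables are hypothetical stand-ins for q.fetch and db.delete.
    """
    deleted = 0
    while True:
        keys = fetch_keys(batch_size)
        if not keys:
            return deleted
        for attempt in range(max_retries):
            try:
                delete_keys(keys)
                break
            except Exception:
                # Give up only after the last allowed attempt.
                if attempt == max_retries - 1:
                    raise
                # Exponential backoff before retrying the same batch.
                sleep(backoff_seconds * (2 ** attempt))
        deleted += len(keys)
```

With the db API this could presumably be called as `delete_in_batches(lambda n: db.GqlQuery("select __key__ from Pw").fetch(n), db.delete)`, re-running the keys-only query for each batch rather than carrying a cursor.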
Later I decided to delete another 288373 entries from a kind, with the
following properties:
public class Pi {
    @Persistent private String n;
    @Persistent
    @Extension(vendorName = "datanucleus", key = "gae.unindexed", value = "true")
    private Long[] a;
    @Persistent
    @Extension(vendorName = "datanucleus", key = "gae.unindexed", value = "true")
    private Long[] b;
}
This time I uploaded a Python project with the remote_api enabled and
fired up a shell, then deleted those entries with only 3.5 billed
hours, for an average of 44ms per delete!
The delete process using the Python remote_api ran without incident;
no "Unknown Errors" were received.
Also, interestingly, according to the datastore status during this
time period it should have been costing between 100ms and 500ms per
delete!
I'm not sure whether my 110ms average was due to having more indexed
properties or to using the Java remote_api.
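For reference, the per-delete averages above are just billed time divided by the number of entities. A quick way to check the arithmetic (the function name `ms_per_delete` is only for illustration):

```python
def ms_per_delete(billed_hours, entity_count):
    """Average billed milliseconds per deleted entity."""
    return billed_hours * 3600 * 1000 / entity_count


# Java remote_api run: 8.8 billed hours for 288373 entities
print(round(ms_per_delete(8.8, 288373)))  # 110

# Python remote_api run: 3.5 billed hours for 288373 entities
print(round(ms_per_delete(3.5, 288373)))  # 44
```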
Hopefully I won't have to delete much else in the future but at least
44ms is much more tolerable.
It would be nice to be able to mark a kind as "lazy delete" and allow
Google to delete the data at their convenience without being billed
(nightly maintenance?).
Cheers!