Bulk upload creates duplicates


caryp palmer

Apr 19, 2008, 7:55:23 PM
to google-a...@googlegroups.com
I'm using the bulk upload tool to post a CSV file, but if I post the same file more than once the items are duplicated. Assuming the primary key is auto-generated, how do I get a unique index on the model with a bulk load operation? To boot, the upload only allows 10 items per post.
 

Kevin Kuphal

Apr 20, 2008, 1:56:34 AM
to Google App Engine
You have to define a handler that sets the key_name during the load.
I used something like this in myloader.py

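from google.appengine.api import datastore
from google.appengine.ext import bulkload

class SomethingLoader(bulkload.Loader):
  def __init__(self):
    # The first CSV column carries the key name; the rest are properties.
    bulkload.Loader.__init__(self, 'Something',
                             [('key_name', str),
                              ('name', str),
                              ('status', str)
                              ])

  def HandleEntity(self, entity):
    # Build a new entity whose key name comes from the CSV column.
    newent = datastore.Entity('Something', name=entity['key_name'])
    del entity['key_name']
    newent.update(entity)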

I removed the key_name property because it also ended up being stored
(which I didn't want). Just pass the key_name in your CSV and it will
use it or just remove it from the CSV and generate it in the
HandleEntity if you need to. This ends up setting the entity just as
if you had passed key_name during the constructor in regular code.
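
To illustrate why a key name makes repeated uploads idempotent, here is a minimal sketch that does not use the App Engine SDK at all. A plain dict stands in for the datastore, and fake_put and load_rows are hypothetical names made up for this illustration. An entity stored without a key name gets a fresh auto-generated ID every time, whereas one stored under a key name overwrites the earlier copy.

```python
import itertools

# A plain dict stands in for the datastore; this is an illustration,
# not the App Engine API.
_auto_ids = itertools.count(1)


def fake_put(store, kind, props, key_name=None):
    """Store props under (kind, key_name), or under a fresh auto ID."""
    key = (kind, key_name if key_name is not None else next(_auto_ids))
    store[key] = dict(props)
    return key


def load_rows(store, kind, rows, use_key_name):
    """Load CSV-like rows; the first column is treated as the key name."""
    for key_name, name, status in rows:
        props = {'name': name, 'status': status}
        fake_put(store, kind, props, key_name if use_key_name else None)


rows = [('k1', 'first', 'open'), ('k2', 'second', 'closed')]

without_keys = {}
load_rows(without_keys, 'Something', rows, use_key_name=False)
load_rows(without_keys, 'Something', rows, use_key_name=False)  # same file again

with_keys = {}
load_rows(with_keys, 'Something', rows, use_key_name=True)
load_rows(with_keys, 'Something', rows, use_key_name=True)  # same file again

print(len(without_keys))  # 4: every row duplicated
print(len(with_keys))     # 2: second upload overwrote the first
```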

Kevin

Jeff Hinrichs

Apr 20, 2008, 12:13:12 PM
to Google App Engine


On Apr 19, 6:55 pm, "caryp palmer" <atomst...@gmail.com> wrote:
> to boot upload only allows 10 items per post.

The number of items per post is user-configurable. See --batch_size in the usage:

Usage:
  ../google_appengine/tools/bulkload_client.py [flags]

  --debug             Show debugging information. (Optional)
  --cookie=<string>   Whole Cookie header to supply to the server, including
                      the parameter name (e.g., "ACSID=..."). (Optional)
  --url=<string>      URL endpoint to post to for importing data. (Required)
  --batch_size=<int>  Number of Entity objects to include in each post to
                      the URL endpoint. The more data per row/Entity, the
                      smaller the batch size should be. (Default 10)
  --filename=<path>   Path to the CSV file to import. (Required)
  --kind=<string>     Name of the Entity object kind to put in the datastore.
                      (Required)
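
As a rough picture of what --batch_size controls, the sketch below splits CSV rows into groups of at most batch_size rows, one group per post. This is not the code of bulkload_client.py; iter_batches is a hypothetical helper and the sample rows are made up.

```python
import csv
import io


def iter_batches(rows, batch_size=10):
    """Yield lists of at most batch_size rows, in order."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


# 25 made-up rows: key_name, name, status
sample = ''.join('k%d,name%d,open\n' % (i, i) for i in range(25))
reader = csv.reader(io.StringIO(sample))
sizes = [len(b) for b in iter_batches(reader, batch_size=10)]
print(sizes)  # [10, 10, 5]
```

With the default of 10, a 25-row file would go up in three posts. Rows with a lot of data per entity would call for a smaller value, as the usage text says.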

Filip

Apr 24, 2008, 7:54:47 AM
to Google App Engine
Kevin,

First, how did you find this info?

Second, what's the update function? I know put, and get_or_insert,
could you point me to the update docs? I'm assuming it does an update-
or-insert (upsert) kind of operation?

Third, for the handler, you do an update and return nothing, whereas
the Bulk Upload article returns a record. What's the story behind
that?

Thanks for the input,
Filip
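
On the second question, datastore.Entity in the SDK is a subclass of dict, so update in the handler is the ordinary dict.update method: it copies properties from one mapping into another and does not write to the datastore by itself. A minimal stand-in sketch, where FakeEntity is a hypothetical class written for this illustration and not the SDK one:

```python
class FakeEntity(dict):
    """Hypothetical stand-in for datastore.Entity: a dict plus a key name."""

    def __init__(self, kind, name=None):
        dict.__init__(self)
        self.kind = kind
        self.name = name


# 'entity' plays the role of the row the loader parsed from the CSV.
entity = FakeEntity('Something')
entity['key_name'] = 'k1'
entity['name'] = 'first'
entity['status'] = 'open'

newent = FakeEntity('Something', name=entity['key_name'])
del entity['key_name']   # keep the key name out of the stored properties
newent.update(entity)    # plain dict.update: copies properties, no write

print(newent.name)       # k1
print(sorted(newent.items()))
```

Since update only copies values in memory, the new entity presumably still has to be returned from the handler (as in the Bulk Upload article) or put explicitly before it reaches the datastore.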

ilial

Apr 26, 2008, 12:08:50 PM
to Google App Engine
I've tried this with a SearchableEntity but it didn't work. Datastore
viewer showed the Key Name as empty.
Should work since SearchableEntity inherits from datastore.Entity.

  def HandleEntity(self, entity):
    name = entity['agency_name']+entity['reference_number']
    logging.debug("name="+name)
    ent = search.SearchableEntity(entity, name=name)
    return ent

Any ideas?

ilia.




ilial

Apr 26, 2008, 12:36:27 PM
to Google App Engine
argh finally got it:

  def HandleEntity(self, entity):
    name = entity['agency_name']+entity['reference_number']
    logging.debug("name="+name)
    newent = datastore.Entity('Contract', name=name)
    newent.update(entity)
    ent = search.SearchableEntity(newent)
    return ent
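
One thing to watch with a key name built by plain concatenation is that different field pairs can produce the same string. A minimal sketch, assuming the two fields are arbitrary strings; make_key_name and the '|' separator are hypothetical choices for illustration, and the separator only helps if it cannot occur in the first field:

```python
def make_key_name(agency_name, reference_number, sep='|'):
    """Join the two fields with a separator so field boundaries stay visible."""
    return agency_name + sep + reference_number


# Plain concatenation cannot tell these two rows apart.
print('AB' + '12' == 'AB1' + '2')  # True

# With a separator the two rows get distinct key names.
print(make_key_name('AB', '12'))   # AB|12
print(make_key_name('AB1', '2'))   # AB1|2
```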