Bulk Uploader with ReferenceProperty

Gayle Laakmann

May 1, 2008, 2:17:22 PM
to Google App Engine
Suppose my datastore looks something like this:

from google.appengine.ext import db

class UserInfo(db.Model):
    name = db.StringProperty()
    email = db.StringProperty()

class Message(db.Model):
    author = db.ReferenceProperty(UserInfo)
    message = db.StringProperty()

How do I use the bulk uploader to create the reference properties?
The documentation for the bulk uploader is rather lacking.

Matteo Crippa

May 1, 2008, 4:23:44 PM
to Google App Engine

Brett Morgan

May 1, 2008, 9:07:50 PM
to google-a...@googlegroups.com
Why are you normalising your data structure like that?

This means that rendering any one page, say a page of communications
for a user, will require pulling a large number of datastore
entities. This isn't optimal.

It would be more performant to denormalise your datastore so that
rendering a user's messages requires loading only one datastore entity.

--

Brett Morgan http://brett.morgan.googlepages.com/

Gayle Laakmann

May 2, 2008, 2:16:10 PM
to Google App Engine
That article is a good explanation on how to use BlobProperty, but I'm
not sure how that helps me with the bulk uploader...

Gayle Laakmann

May 2, 2008, 2:38:19 PM
to Google App Engine
Good point Brett, and I'll keep that in mind. My question is really
about how you do any sort of processing on the data. The example
provided at http://code.google.com/appengine/articles/bulkload.html
simply takes a row of data and translates that into an entity. What
if I need to do something - anything - else?

Brett Morgan

May 2, 2008, 2:52:14 PM
to google-a...@googlegroups.com
On Sat, May 3, 2008 at 4:38 AM, Gayle Laakmann <gay...@gmail.com> wrote:
>
> Good point Brett, and I'll keep that in mind. My question is really
> about how you do any sort of processing on the data. The example
> provided at http://code.google.com/appengine/articles/bulkload.html
> simply takes a row of data and translates that into an entity. What
> if I need to do something - anything - else?

For one of the tools that I am working on, a time series data
visualiser, I am actually contemplating whether to do the decimating
calculations in the server side code, or in the client side code, of
my load process. The calculations are actually reasonably light in
this case, so I can probably get away with server side code. But if I
were doing anything heavier, I'd probably pre-process my data client
side before uploading it.

You can do work server side, but you just have to do it in multiple
calls, polled from some client side web client. It means you can add a
progress bar to an admin interface, if nothing else.
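
A minimal sketch of that "multiple calls" idea, in plain Python rather than App Engine code. The names process_batch and run_to_completion are hypothetical; in a real app each call to process_batch would be one request from the polling client, and the returned fraction is what a progress bar would display.

```python
def process_batch(rows, cursor, batch_size, handle_row):
    """Process at most batch_size rows, starting at index cursor.

    Returns (next_cursor, fraction_done). The caller keeps calling
    with next_cursor until fraction_done reaches 1.0.
    """
    end = min(cursor + batch_size, len(rows))
    for row in rows[cursor:end]:
        handle_row(row)
    fraction_done = 1.0 if not rows else end / float(len(rows))
    return end, fraction_done


def run_to_completion(rows, batch_size, handle_row):
    """Stand-in for the polling client: one call per batch."""
    cursor = 0
    progress = []
    while True:
        cursor, done = process_batch(rows, cursor, batch_size, handle_row)
        progress.append(done)  # a UI would update its progress bar here
        if done >= 1.0:
            return progress
```

Keeping each call small is the point: no single request has to do all of the work.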

Matteo Crippa

May 2, 2008, 3:03:46 PM
to Google App Engine
You are right :x Sorry for mis-posting...

Btw according to this: http://code.google.com/appengine/docs/datastore/entitiesandmodels.html#References

You could first populate your UserInfo entities, and then the Message
ones, passing the message and the email.

You have to pass the email because it is a unique value: it lets you
easily retrieve the key of the right user and store it as the author
value.
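
A minimal sketch of that two-pass idea in plain Python, with dictionaries standing in for the datastore. None of this is the App Engine API: load_users, load_messages, the 'UserInfo:N' key format, and the example.com addresses are all made up for illustration. Users are loaded first and indexed by email; each message row is then resolved to its author's key.

```python
import csv
import io


def load_users(user_csv):
    """First pass: store each user and index it by email."""
    users_by_email = {}
    for row in csv.reader(io.StringIO(user_csv), skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        name, email = row
        key = 'UserInfo:%d' % (len(users_by_email) + 1)  # hypothetical key
        users_by_email[email] = {'key': key, 'name': name, 'email': email}
    return users_by_email


def load_messages(message_csv, users_by_email):
    """Second pass: replace each email with the matching user's key."""
    messages, skipped = [], []
    for row in csv.reader(io.StringIO(message_csv), skipinitialspace=True):
        if not row:
            continue
        email, text = row
        user = users_by_email.get(email)
        if user is None:
            skipped.append(email)  # no such user; report instead of guessing
            continue
        messages.append({'author': user['key'], 'message': text})
    return messages, skipped
```

This only works if the email really is unique per user, which is the assumption the lookup rests on.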

Gayle Laakmann

May 2, 2008, 3:44:51 PM
to Google App Engine
Could you give a bit more detail on the syntax? I mean, I understand
the algorithm - that's easy. I just don't understand the syntax for
doing it. Here, I'll take a shot at it and maybe you can correct it?

The csv looks like this:
jo...@email.com, Hello
b...@foo.com, How are you

And I have this method:
def getUserByEmail(self, email):
    user = ... # look up in the datastore
    return user

And here's my shot at when to use getUserByEmail. I know that this
isn't correct, because this syntax would imply that getUserByEmail
takes two parameters, which it doesn't. So what's the correct way to
write this?

class MessageLoader(bulkload.Loader):
    def __init__(self):
        # Our 'Message' entity contains an email string and a message
        bulkload.Loader.__init__(self, 'Message',
                                 [self.getUserByEmail('email', str),
                                  ('message', str),
                                 ])

    def HandleEntity(self, entity):
        ent = search.SearchableEntity(entity)
        return ent

if __name__ == '__main__':
    bulkload.main(MessageLoader())



On May 2, 12:03 pm, Matteo Crippa <matteo.cri...@gmail.com> wrote:
> You are right :x Sorry for mis-posting...
>
> Btw according to this:http://code.google.com/appengine/docs/datastore/entitiesandmodels.htm...

RobT

May 2, 2008, 5:32:58 PM
to Google App Engine
Brett,

Because I'm curious, how would you suggest setting up the data
structure in this example?


On May 1, 6:07 pm, "Brett Morgan" <brett.mor...@gmail.com> wrote:

Matteo Crippa

May 3, 2008, 6:15:41 AM
to Google App Engine
I have just had a look at /ext/bulkload/__init__.py, and you can
probably do what you want in one step rather than the two I
suggested above.

Looking at source code I found this:

If you want to do extra processing before the entities are stored, you can
subclass Loader and override HandleEntity. HandleEntity is called once with
each entity that is imported from the CSV data. You can return one or more
entities from HandleEntity to be stored in its place, or None if nothing
should be stored.

For example, this loads calendar events and stores them as
datastore_entities.Event entities. It also populates their author field with a
reference to the corresponding datastore_entities.Contact entity. If no Contact
entity exists yet for the given author, it creates one and stores it first.

class EventLoader(bulkload.Loader):
    def __init__(self):
        # Call the base class here; EventLoader.__init__ would call itself forever.
        bulkload.Loader.__init__(self, 'Event',
                                 [('title', str),
                                  ('creator', str),
                                  ('where', str),
                                  ('startTime', lambda x:
                                   datetime.datetime.fromtimestamp(float(x))),
                                 ])

    def HandleEntity(self, entity):
        event = datastore_entities.Event(entity.title)
        event.update(entity)

        creator = event['creator']
        if creator:
            # Look up the Contact for this creator; create and store one if missing.
            contact = datastore.Query('Contact', {'title': creator}).Get(1)
            if not contact:
                contact = [datastore_entities.Contact(creator)]
                datastore.Put(contact[0])
            event['author'] = contact[0].key()

        return event

if __name__ == '__main__':
    bulkload.main(EventLoader())

In other words, you can try modifying this snippet to populate both
kinds, but you can operate on the data only inside your HandleEntity
override, not in the bulkload.Loader.__init__ call.
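
Applied to the UserInfo/Message case, the shape might look like the sketch below. It is plain Python with dictionaries standing in for entities, not the bulkload API: COLUMNS, row_to_entity, handle_entity, and the 'UserInfo:<email>' key format are hypothetical. It shows two things from the docstring: each column converter is a one-argument callable, and the lookup belongs in the HandleEntity-style hook, which returns the entities to store or None.

```python
# Column spec in the same (name, converter) shape the Loader takes.
COLUMNS = [('email', str), ('message', str)]


def row_to_entity(row, columns=COLUMNS):
    """Apply each one-argument converter to its CSV value."""
    return dict((name, convert(value.strip()))
                for (name, convert), value in zip(columns, row))


def handle_entity(entity, users_by_email):
    """Resolve the email to an author key, creating the user if needed.

    Returns a list of entities to store, or None to store nothing.
    """
    email = entity.pop('email')
    if not email:
        return None  # nothing sensible to store without an author
    to_store = []
    user = users_by_email.get(email)
    if user is None:
        # Hypothetical key format; a real datastore assigns its own keys.
        user = {'kind': 'UserInfo', 'key': 'UserInfo:' + email, 'email': email}
        users_by_email[email] = user
        to_store.append(user)
    entity['kind'] = 'Message'
    entity['author'] = user['key']
    to_store.append(entity)
    return to_store
```

The first message from a new email yields two entities (the user, then the message); later messages from the same email yield only the message.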