Appengine datastore performance


nick

Dec 4, 2010, 7:04:17 AM
to google-a...@googlegroups.com
Hi!

I want to handle millions of entries in one table. Normally I would loop over every row and check it against my filters (like name="foobar").

Is there a best practice, or something to read, for getting better performance?


greets
nick

Wim den Ouden

Dec 4, 2010, 7:14:57 AM
to google-a...@googlegroups.com
http://code.google.com/p/relat/wiki/gaetips#Mapreduce
gr
wim

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.



--
gr
Wim den Ouden
Gae based E-business (web) apps

nick

Dec 4, 2010, 7:34:43 AM
to google-a...@googlegroups.com
Wow, nice info, thanks! :-)

If I had millions of log entries stored in one table (l_entry: title, content, date, host)

and wanted all log entries where host="myhost" (as a JSON response), how would you design the query? (This would be slow, wouldn't it?)

(Excuse my bad English :-) )


thx and greets
nick
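
A note on the host="myhost" case: a full scan should not be needed, because datastore queries are answered from indexes. In the Python db API an equality filter such as LogEntry.all().filter('host =', 'myhost') only reads matching entries (LogEntry is an assumed model name for the l_entry table). The sketch below is plain Python rather than App Engine code; the ENTRIES list and both helper functions are hypothetical and only illustrate why an index lookup does not grow with the number of non-matching rows. The date field is left out to keep the JSON step simple.

```python
import json
from collections import defaultdict

# Hypothetical in-memory stand-in for the l_entry table; not the datastore API.
ENTRIES = [
    {"title": "boot", "content": "system started", "host": "myhost"},
    {"title": "disk", "content": "disk 91% full", "host": "otherhost"},
    {"title": "login", "content": "user nick logged in", "host": "myhost"},
]


def build_host_index(entries):
    """Map each host value to the positions of its entries, like a property index."""
    index = defaultdict(list)
    for position, entry in enumerate(entries):
        index[entry["host"]].append(position)
    return index


def entries_for_host(entries, index, host, limit=1000):
    """Return up to `limit` entries for one host without visiting other rows."""
    return [entries[position] for position in index.get(host, [])[:limit]]


HOST_INDEX = build_host_index(ENTRIES)
payload = json.dumps(entries_for_host(ENTRIES, HOST_INDEX, "myhost"))
```

The limit parameter hints at the other half of the design: with millions of matches, the response would have to be returned in pages rather than in one JSON document.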

Wim den Ouden

Dec 4, 2010, 7:48:13 AM
to google-a...@googlegroups.com
This is a link to what you need for the *.yaml files and a very simple demo. You put your code in the def (in this example, lower_case_posts(entity)). Mapreduce runs through all entities, sometimes in 20 parallel sessions (no filter is possible yet), so first check whether the entity is one you need and then run your code. Mapreduce is now in the standard library.

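from google.appengine.ext import db
# Assumes the appengine-mapreduce library is on the import path.
from mapreduce.operation import db as db_op


# (your kind)
class Post(db.Model):
  name = db.StringProperty(default="")
  message = db.TextProperty(default="")
  time = db.DateTimeProperty(auto_now_add=True)


def lower_case_posts(entity):
  # Map handler: called once for every Post entity.
  entity.message = entity.message.lower()
  yield db_op.Put(entity)  # queue the changed entity for writing


def upper_case_posts(entity):
  # Same handler shape, upper-casing instead.
  entity.message = entity.message.upper()
  yield db_op.Put(entity)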

gr
wim
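
Since no filter is possible, the "first check whether the entity is what you need" step has to live inside the map handler itself. A minimal sketch of that pattern follows. It runs without the App Engine SDK, so Put and Post here are hypothetical stand-ins for db_op.Put and the model above, and lower_case_posts_for and wanted_name are made-up names.

```python
from collections import namedtuple

# Hypothetical stand-ins so the sketch runs without the App Engine SDK.
Put = namedtuple("Put", ["entity"])  # plays the role of db_op.Put


class Post(object):
    def __init__(self, name, message):
        self.name = name
        self.message = message


def lower_case_posts_for(entity, wanted_name="foobar"):
    """Map handler: skip entities that do not match, then do the work."""
    if entity.name != wanted_name:
        return  # yields nothing, so this entity is left untouched
    entity.message = entity.message.lower()
    yield Put(entity)


posts = [Post("foobar", "HELLO"), Post("other", "KEEP ME")]
# Drive the handler by hand; in the real library the framework does this.
operations = [op for post in posts for op in lower_case_posts_for(post)]
```

Every entity is still visited, so this saves writes, not reads.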

nick

Dec 4, 2010, 7:50:15 AM
to google-a...@googlegroups.com
many thanks!

I'll read through this stuff.

thanks!

nick

Wim den Ouden

Dec 4, 2010, 7:54:56 AM
to google-a...@googlegroups.com
Forgot to say:
Mapreduce can also walk (amazingly fast) through zip and text files stored in blobs, and the list is growing.
gr
wim

nick

Dec 4, 2010, 8:13:13 AM
to google-a...@googlegroups.com
I did mapreduce with Hadoop a year ago.
This mapreduce thing has a lot of potential,

but it's very hard for me to understand how to use it with the datastore ;-)

Is this right?
I set up some "processes" that split the table,
every process just parses its part,
and then they join the results?

Is it modular? Could I write my app and, if my normal queries are too slow, add the mapreduce part later? :-)

greets
nick
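
The split / process / join picture described above can be sketched in plain Python. This is only an illustration of the idea, not how the App Engine library is implemented: the function names are made up, and the thread pool is a stand-in for whatever the library uses to run its parallel sessions.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(items, shard_count):
    """Deal items round-robin into shard_count lists."""
    return [items[i::shard_count] for i in range(shard_count)]


def count_hosts(shard):
    """Map step for one shard: count log entries per host."""
    counts = {}
    for host in shard:
        counts[host] = counts.get(host, 0) + 1
    return counts


def merge_counts(partials):
    """Join step: add the per-shard counts together."""
    total = {}
    for partial in partials:
        for host, count in partial.items():
            total[host] = total.get(host, 0) + count
    return total


hosts = ["myhost", "otherhost", "myhost", "myhost", "otherhost"]
shards = split_into_shards(hosts, shard_count=2)
with ThreadPoolExecutor(max_workers=2) as pool:
    totals = merge_counts(pool.map(count_hosts, shards))
```

Because each shard is processed independently, the per-shard function can be written and tested on its own, which is what makes the approach modular.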

Wim den Ouden

Dec 4, 2010, 8:57:01 AM
to google-a...@googlegroups.com
Mapreduce does the splitting for you, using parallel sessions.
There is a control module to start it from your code (normally you start it from a dashboard), but I don't know yet whether the overhead takes too much time.
There are no docs yet.
In the local development server some libraries are missing; see http://code.google.com/p/relat/wiki/python25 at the end.
I'm also still finding out how far it goes and where to use it.
gr
wim



Nick Heppner

Dec 4, 2010, 9:00:27 AM
to google-a...@googlegroups.com
What about this:

make a host entity the parent and its log entries the children?

I only need to write one entry at a time, but I could very quickly read every entry that is a child of my parent host. Does that work?

nick
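
On the parent/child idea: in the datastore a child's key contains its parent's key as a prefix, and the db API offers ancestor queries (Query.ancestor(parent)) that return only entities under one parent. The sketch below imitates that with sorted tuples in plain Python. The key layout, KEYS and children_of are hypothetical and only show why such a lookup is a short range scan rather than a pass over every log entry.

```python
from bisect import bisect_left

# Hypothetical key paths: (parent kind, parent name, child kind, child id).
KEYS = sorted([
    ("Host", "myhost", "LogEntry", 1),
    ("Host", "myhost", "LogEntry", 2),
    ("Host", "otherhost", "LogEntry", 1),
    ("Host", "myhost", "LogEntry", 3),
])


def children_of(sorted_keys, parent):
    """Return all keys whose path starts with `parent`, using a range scan."""
    start = bisect_left(sorted_keys, parent)
    result = []
    for key in sorted_keys[start:]:
        if key[:len(parent)] != parent:
            break  # left the parent's key range
        result.append(key)
    return result


my_logs = children_of(KEYS, ("Host", "myhost"))
```

One trade-off to keep in mind: all children of one parent form a single entity group, and writes within an entity group are serialized, so a very busy host could become a write bottleneck. An indexed filter on a plain host property avoids that.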


Wim den Ouden

Dec 4, 2010, 9:06:53 AM
to google-a...@googlegroups.com

nick

Dec 4, 2010, 9:25:13 AM
to google-a...@googlegroups.com
no :-)

does that work for me?

Wim den Ouden

Dec 5, 2010, 8:46:02 AM
to google-a...@googlegroups.com
Hi Nick,
I was playing (and learning) with mapreduce, but I'm starting to try to build a kind of mapreduce myself: more flexible and better integrated (http://code.google.com/p/relat/wiki/gaetips#Mapreduce), with more possibilities, though maybe a bit slower.
gr
wim
