efficient DB queries

100 views
Skip to first unread message

Udi Milo

unread,
Mar 21, 2012, 5:08:40 PM3/21/12
to web...@googlegroups.com
Hi,

I have two db tables:

board (name, created_on)
article(board, name, title)

currently, in my html I do a naive loop {{for article in articles}} {{=article.board.name}} {{pass}}

I would like to change it and do something like:
articles = db.select.all.......
board_ids = set(map(lambda a: a.board_id, articles))
boards = db.selected all of the id.

my problem is:
if I do: board_ids = set(map(lambda a: a.board.id, articles))
it goes to fetch the board every time, to get the id.
and if I don't I have a reference and not an Id...

can someone explain how to do it efficiently in python?


UDi.



Anthony

unread,
Mar 21, 2012, 5:22:50 PM3/21/12
to web...@googlegroups.com
I have two db tables:

board (name, created_on)
article(board, name, title)

currently, in my html I do a naive loop {{for article in articles}} {{=article.board.name}} {{pass}}

I would like to change it and do something like:
articles = db.select.all.......
board_ids = set(map(lambda a: a.board_id, articles))

Instead of a.board_id, do you mean a.board (I don't see a "board_id" field listed in your article table definition)? Also, is db.article.board a reference field to the db.board table? In that case, that means it is already storing the id of the referenced record in the db.board table (that's what reference fields store), so you don't need to retrieve it separately.

Anyway, if you need the db.board.name value for all the records you are selecting from the db.article table, you should probably just do a join so you can get everything in a single query -- see http://web2py.com/books/default/chapter/29/6#Inner-joins.
 
Anthony

Udi Milo

unread,
Mar 21, 2012, 7:30:58 PM3/21/12
to web...@googlegroups.com
Anthony,

Thanks for answering so quickly.
I did mean board and not board_id

My question is very basic.
I know that the article table has an Id column that is really the board reference, but when I look at the result of the query and unification of:
board_ids = set(map(lambda a: a.board, articles))

I get a set of references,
how do I use that set in another query to get all the boards?
also, just to be sure, running this lambda expression does not hit the db to fetch the boards, right?

I can't use join b/c I'm running on GAE.

Anthony

unread,
Mar 21, 2012, 8:30:54 PM3/21/12
to web...@googlegroups.com
I didn't realize you were on GAE. I'm not quite sure how GAE handles this, so perhaps someone with more GAE experience can chime in. In an RDBMS, you should be able to do:

board_ids = set([a.board for a in articles])
boards = db(db.boards.id.belongs(board_ids)).select()

But not sure if that works on GAE.

Anthony

Udi Milo

unread,
Mar 21, 2012, 9:26:44 PM3/21/12
to web...@googlegroups.com
It doesn't.
I found this: 
""""
GAE does not support belongs and does not support OR. You have to do: 

rows = db(db.media_type.name=='paper').select()&db 
(db.media_type.name=='cd').select() 

The & is done at the web2py level but since records are exclusive and 
you are not sorting them, there is no major slowdown. 
"""""

Since I'm a python newbie, how would you go about building the loop that creates a very long query? you need to do some kind of python eval tricks here that I don't know.
I'm guessing its going to look like

for id in board_ids:
   query = query + new query(id)

but how do you do it in python?

Anthony

unread,
Mar 21, 2012, 11:52:33 PM3/21/12
to web...@googlegroups.com
I think that is outdated -- according to this (https://groups.google.com/d/msg/web2py/vWqOET74qg4/92DLUFTUsN0J), GAE does now support belongs but is limited to 30 items per query (so you have to break it up). Perhaps there is a better way, though.

Anthony

Udi Milo

unread,
Mar 22, 2012, 1:05:46 AM3/22/12
to web...@googlegroups.com
I went with:

    query = ''
    for id in board_ids:
        query += 'db(db.board.id==%d).select()|' %id
    query = query[:-1]
    boards = eval(query)

I have no idea how to evaluate performance of that query though....
if you have any other ideas (I cannot limit 30 right now without rethinking other things), please share them.

Wikus van de Merwe

unread,
Mar 22, 2012, 2:25:36 PM3/22/12
to web...@googlegroups.com
This is not the best way. It will run a separate query for each board id. On GAE you want to do
as much as you can in a single query. So the best way is to construct a list of GAE keys and
get all entities together. Unfortunately, AFAIK, there is no support for GAE batch queries in DAL.
You can, however, try to refer to the GAE datastore directly.

from google.appengine.ext import db as gdb

articles = db().select(db.articles.ALL)
keys = [gdb.Key.from_path("boards", a.board_id) for a in articles]
boards = gdb.get(keys)

But I'm afraid this is not an elegant solution as it probably won't work without a gdb.Model defined
for the board entity, and that would go against DRY [1]. Anyway, if you want to do it, see the GAE
docs [2] for details on model definition.

Depending on how often are you going to perform this query a better solution might be redesigning
your database. Forget about 3NF and add the board attributes to each article entity so that you
can fetch everything together in a single query.

[1] https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
[2] http://code.google.com/appengine/docs/python/datastore/entities.html

howesc

unread,
Mar 22, 2012, 7:32:28 PM3/22/12
to web...@googlegroups.com
i use:

ids = [...generate your list of ids...]

rows = db(db.board.id.belongs(ids[0:30]).select()
for i in range(30,len(ids), 30):
  rows & db(db.board.id.belongs(ids[i:i+30]).select()

so that gets me my "in" statement working with GAE's 30 items per in limit.

Anthony

unread,
Mar 22, 2012, 8:45:44 PM3/22/12
to web...@googlegroups.com
i use:

ids = [...generate your list of ids...]

rows = db(db.board.id.belongs(ids[0:​30]).select()
for i in range(30,len(ids), 30):
  rows & db(db.board.id.belongs(ids[i:​i+30]).select()

so that gets me my "in" statement working with GAE's 30 items per in limit.

According to the GAE docs, under the hood that still results in a separate datastore query for every id: http://code.google.com/appengine/docs/python/datastore/gqlreference.html.

Anthony
Reply all
Reply to author
Forward
0 new messages