Re: [web2py] Complex query with left outer join

700 views
Skip to first unread message

Alec Taylor

unread,
Jul 25, 2012, 12:28:45 PM7/25/12
to web...@googlegroups.com


On Thu, Jul 26, 2012 at 12:13 AM, jw <jens...@googlemail.com> wrote:
Hi,

my question is about the following query. I'm planning to program an e-learning software. For this I use three tables to select the cards to learn for today.
Table "cards" contains the question, "userCard" contains the progress for the special student (lastTimeLearned as a multiplicator for the calculating of the next time to learn), "subject" contains the name of the subject and groupCard contains the answer for the special group.

groupCards = db((db.groupCards.card_id == db.card.id) \
        & (db.groupCards.studyGroup_id == auth.user.studyGroup_id) \
        & (session.chosenSubject_id == db.card.subject_id)    
        & (db.userCard.stage <= 5) \
        & (db.userCard.lastTimeLearned < (datetime.date.today() - datetime.timedelta(days=(1))))
        ).select(db.userCard.id,db.card.id, left=db.userCard.on((db.userCard.card_id == db.card.id)))

I would like to select in this query all the cards for a specific student in a choosen subject, with the answers of his learnGroup (groupCard) and only the cards he doesn't learned today. I try to use the left outer join, because there are cards, the user learns them for the first time.
I tried many hours to find a correct query but now i feel desperate.

The other thing is the '1' in the timedelta. I get the exception, that it is not allowed to use a field in timedelta.

from datetime import date, timedelta

        & (db.userCard.lastTimeLearned < (date.today() - timedelta(days=1)))

Can somebody please help me?
P.S.: Sory for my english!

--
 
 
 

pbreit

unread,
Jul 25, 2012, 12:58:11 PM7/25/12
to web...@googlegroups.com
I don't really understand what you're trying to do but I would recommend trying to simplify. First, you don't always need to do a join or a left join. Try to figure out if those are really necessary. For example, do you really need to join the questions and the progress or can you retrieve those separately? Second, you're only selecting for two fields. That may be OK but I would suggest selecting for all fields (ie, delete "db.userCard.id,db.card.id," from select()). If you want to optimize *after* you get it working, then you can do so. Third, working with datetime and DB can be difficult. Try to get the query working before adding a timedate query component.

When I build a complex query, I start with one query, make it work, then add a query, make it work, and so on.

vinic...@gmail.com

unread,
Jul 25, 2012, 1:06:59 PM7/25/12
to web...@googlegroups.com
timedelta syntax is timedelta(days=1)

vinic...@gmail.com

unread,
Jul 25, 2012, 1:11:40 PM7/25/12
to web...@googlegroups.com
Indeed, pbreit.

Did you check if your SQL query is correct?
print db._lastsql would help, too.

Baby steps, baby. Baby steps.

--
Vinicius Assef
> --
>
>
>

jw

unread,
Jul 26, 2012, 11:23:51 AM7/26/12
to web...@googlegroups.com
Ok guys... I did step for step and now the query works for the stage (progress).
The second thing (stage is multiplicator for time) I have no glue how to move on.

cards2learn = db((db.groupCards.card_id == db.card.id) \

        & (db.groupCards.studyGroup_id == auth.user.studyGroup_id) \
        & (session.chosenSubject_id == db.card.subject_id) \
        & (db.userCard.card_id == db.card.id)
        & (db.userCard.stage <= 5) \
        & (db.userCard.lastTimeLearned < (date.today() - timedelta(days=(1))))
        ).select(db.userCard.ALL,db.card.ALL,db.groupCards.ALL, left=db.userCard.on((db.userCard.card_id == db.card.id)))

timedelta(days=(1)) should be timedelta(days=(db.userCard.stage)) but it says:

<type 'exceptions.TypeError'> unsupported type for timedelta days component: Field


How do I use the db.userCard.stage in timedelta (stage is an integer field in the table)?

Many thanks for the previous answers!

vinic...@gmail.com

unread,
Jul 26, 2012, 11:41:40 AM7/26/12
to web...@googlegroups.com
Some suggestions, but I've not tested any of them:

1) Use timedelta(days=int(db.userCard.stage)) to convert your Field to int.

2) Split your query in 2 parts. The 2nd should use find() or exclude().

3) Use db.executesql() to insert your SQL statement.


--
Vinicius Assef
> --
>
>
>

Niphlod

unread,
Jul 26, 2012, 11:53:31 AM7/26/12
to
Currently you'd have to code a view in the database (or simply a "string" condition to pass on) to achieve that.

date.today() - timedelta(something) is a python "fixed" value that doesn't get re-evaluated at runtime by the database substituting the "something" for each value of every row.

If you need the structure to be portable and absolutely need to have a single query fetching all of this, you need to change your db structure.

If you're going to run always on a "fixed" db engine you could write something exploiting the functionalities of that db engine.

Remember that most of the times is easier (and the resources negligible) to fetch rows from the db and filter them with python.

cards2learn = db((db.groupCards.card_id == db.card.id) \
       
& (db.groupCards.studyGroup_id == auth.user.studyGroup_id) \
       
& (session.chosenSubject_id == db.card.subject_id) \
       
& (db.userCard.card_id == db.card.id)
       
& (db.userCard.stage <= 5) \

       
#& (db.userCard.lastTimeLearned < (date.today() - timedelta(days=(1))))
       
).select(db.userCard.ALL,db.card.ALL,db.groupCards.ALL, left=db.userCard.on((db.userCard.card_id == db.card.id)))

cards2learn
.exclude(lambda row: row.usercard.lasttimelearned < request.now - timedelta(days=row.usercard.stage))





 should work. (totally untested)

jw

unread,
Jul 26, 2012, 11:54:02 AM7/26/12
to


On Thursday, July 26, 2012 5:41:40 PM UTC+2, viniciusban wrote:
Some suggestions, but I've not tested any of them:

1) Use timedelta(days=int(db.userCard.stage)) to convert your Field to int.

The typecast brings the following exception:

<type 'exceptions.TypeError'> int() argument must be a string or a number, not 'Field'

I think the problem is, that in this case it is not the particalur db.userCard.stage selected which correspond to itself in db.userCard.lastTimeLearned. Could that be?
 

2) Split your query in 2 parts. The 2nd should use find() or exclude().

I'll try that after reading the qeb2py manual.
 

3) Use db.executesql() to insert your SQL statement.

I haven't heard about this opportunity. I'll look it up

Thank you!
 

vinic...@gmail.com

unread,
Jul 26, 2012, 11:58:24 AM7/26/12
to web...@googlegroups.com
I think you're right, @Niphlod.

I didn't note this.

@jw, tell us how you solved it, ok?

--
Vinicius Assef



On 07/26/2012 12:52 PM, Niphlod wrote:
> Currently you'd have to code a view in the database (or simply a
> "string" condition to pass on) to achieve that.
>
> date.today() - timedelta(something) is a python "fixed" value that
> doesn't get re-evaluated at runtime by the database substituting the
> "something" for each value of every row.
>
> If you need the structure to be portable and absolutely need to have a
> single query fetching all of this, you need to change your db structure.
>
> If you're going to run always on a "fixed" db engine you could write
> something exploiting the functionalities of that db engine.
>
> Remember that most of the times is easier (and the resources negligible)
> to fetch rows from the db and filter them with python.
>
> |
> cards2learn =db((db.groupCards.card_id ==db.card.id <http://db.card.id>)\
> &(db.groupCards.studyGroup_id ==auth.user.studyGroup_id)\
> &(session.chosenSubject_id ==db.card.subject_id)\
> &(db.userCard.card_id ==db.card.id <http://db.card.id>)
> &(db.userCard.stage <=5)\
> #& (db.userCard.lastTimeLearned < (date.today() - timedelta(days=(1))))
> ).select(db.userCard.ALL,db.
>
> card.ALL,db.groupCards.ALL,left=db.userCard.on((db.userCard.card_id
> ==db.card.id <http://db.card.id>)))
>
> cards2learn.exclude(lambdarow:row.usercard.lasttimelearned
> <request.now -timedelta(days=row.usercard.stage))
>
> |
>
>
>
>
>
> should work. (totally untested)
>
> On Thursday, July 26, 2012 5:23:51 PM UTC+2, jw wrote:
>
> Ok guys... I did step for step and now the query works for the stage
> (progress).
> The second thing (stage is multiplicator for time) I have no glue
> how to move on.
>
> cards2learn = db((db.groupCards.card_id == db.card.id
> <http://db.card.id>) \
> & (db.groupCards.studyGroup_id == auth.user.studyGroup_id) \
> & (session.chosenSubject_id == db.card.subject_id) \
> & (db.userCard.card_id == db.card.id <http://db.card.id>)
> & (db.userCard.stage <= 5) \
> & (db.userCard.lastTimeLearned < (date.today() -
> timedelta(days=(1))))
> ).select(db.userCard.ALL,db.card.ALL,db.groupCards.ALL,
> left=db.userCard.on((db.userCard.card_id == db.card.id
> <http://db.card.id>)))
>
> timedelta(days=(1)) should be timedelta(days=(db.userCard.stage))
> but it says:
>
>
> <type 'exceptions.TypeError'> unsupported type for timedelta
> days component: Field
>
>
> How do I use the db.userCard.stage in timedelta (stage is an integer
> field in the table)?
>
> Many thanks for the previous answers!
>
> --
>
>
>

jw

unread,
Jul 26, 2012, 12:03:24 PM7/26/12
to web...@googlegroups.com
cards2learn.exclude(lambda row: row.userCard.lastTimeLearned > date.today() - timedelta(days=row.userCard.stage))  

Niphlod, you made my day!! Thank you so much! And thank you all for the advices!



On Thursday, July 26, 2012 5:52:29 PM UTC+2, Niphlod wrote:
Currently you'd have to code a view in the database (or simply a "string" condition to pass on) to achieve that.

date.today() - timedelta(something) is a python "fixed" value that doesn't get re-evaluated at runtime by the database substituting the "something" for each value of every row.

If you need the structure to be portable and absolutely need to have a single query fetching all of this, you need to change your db structure.

If you're going to run always on a "fixed" db engine you could write something exploiting the functionalities of that db engine.

Remember that most of the times is easier (and the resources negligible) to fetch rows from the db and filter them with python.

cards2learn = db((db.groupCards.card_id == db.card.id) \
       
& (db.groupCards.studyGroup_id == auth.user.studyGroup_id) \
       
& (session.chosenSubject_id == db.card.subject_id) \
       
& (db.userCard.card_id == db.card.id)
       
& (db.userCard.stage <= 5) \

       
#& (db.userCard.lastTimeLearned < (date.today() - timedelta(days=(1))))
       
).select(db.userCard.ALL,db.card.ALL,db.groupCards.ALL, left=db.userCard.on((db.userCard.card_id == db.card.id)))

cards2learn
.exclude(lambda row: row.usercard.lasttimelearned < request.now - timedelta(days=row.usercard.stage))





 should work. (totally untested)

On Thursday, July 26, 2012 5:23:51 PM UTC+2, jw wrote:

Niphlod

unread,
Jul 26, 2012, 2:57:10 PM7/26/12
to web...@googlegroups.com
no probl.
To elaborate this further for others having the same issue....

People (programmers, often) tend to forget that db engines are programs. They have their own algorithms too (and they take time).
It's true that they have pretty rock-solid algos and do their work very fast, but having python filtering rows objects sometimes is even faster than specifying some super-duper "where" filters in the query itself.
When the "super-duper" where is going to cut out the 90% of 10000 rows returned it can be safer to look into the database algos, but if you're going to fetch a (relatively small) set and cut it out (with filter() or search()) by the 20%, it's generally a good advice to do that in plain python.
As a general "rule of thumb", condition based on a function like yours (the db should have been forced to calculate the date whose difference is now - the number of days in another column, always variable) is the best scenario where this kind of things is high valuable, i.e. probably faster in python than the db (assuming that you're able to write that in plain T-SQL).

I actually had this kind of problem and (I'm a professional db developer and not a python programmer) tried to write a super-optimized query (was a scoring system based on pretty complicated combined averages).

When I implemented the same logic in plain python, other than being far more readable and maintainable, I cut off execution times from ~25 min to ~47 sec.
Reply all
Reply to author
Forward
0 new messages