(More complex) Aggregations in web2py and using SQLFORM.grid (DAL vs. SQL vs. pandas vs. ???)

422 views
Skip to first unread message

ste...@gmail.com

unread,
May 20, 2016, 2:34:08 PM5/20/16
to web2py-users
Hello,

I've got a problem and don't know if it's (easily) solvable in web2py.
There're two tables, one for properties and one for their valuations (one property can have zero to multiple valuations):

    db.define_table('property',
                    Field('property_id', 'id'),
                    Field('name', 'string'),
                    Field('purchase_price', 'float'))

    db.define_table('property_valuation',
                    Field('property_id', 'reference property'),
                    Field('valuation_date', 'date'),
                    Field('amount', 'float'))

Now I want as a result a table with one record for each property with the purchase price and the last valuation amount (and date). In SQL (e.g. SQLite) it would be

    with t1 as (
        -- get last valuation for each property (id and date)
        select property_id, max(valuation_date) as last_valuation_date
        from property_valuation
        group by property_id
    ), t2 as (
        -- get last valuation record for each property
        select pv.*
        from property_valuation pv
        inner join t1
            on t1.property_id = pv.property_id
            and t1.last_valuation_date = pv.valuation_date
    )
    select
        p.name
        ,p.purchase_price
        ,t2.valuation_date
        ,t2.amount
    from property p
    left join t2
        on t2.property_id = p.property_id

How is this best done in web2py? I'm struggling with many similar problems where the analysis are a little bit more difficult than just a simple count etc. I have no problems formulating the solution with e.g. SQL but want to use the SQLFORM.grid function on the other side for representing the results to have the same "appearance" for end user. So I'm thinking about to create views in the database and than create table definitions with "migrate=False". What do you think about that approach?
Or should I leave the web2py DAL as soon as I've more complex analytics, do them in SQL directly or e.g. pandas and use something like Javascript DataTable for representing results? (a feature for downloading results as csv would also be nice)

stex

villas

unread,
May 20, 2016, 4:11:24 PM5/20/16
to web2py-users
Your sql looks a little over-complicated...
Anyhow,  see grouping and counting in the book.
If you're interested in creating a rows object from raw sql,  take a look at this idea.
Best wishes.

ste...@gmail.com

unread,
May 24, 2016, 10:54:37 AM5/24/16
to web2py-users

Hi villas,

thanks for your comment and the links (especially the second one).
Since I want to learn if you can show me a better solution in SQL I would highly appreciate it. Normally I prefer do write my SQL "stepwise" so I don't have to read inside out. Here's a small SQLite setup:

-- setup
create table prop (id int, name text, price int);
insert into prop values (1, 'aaa', 10), (2, 'bbb', 15), (3, 'ccc', 20);

create table prop_val (id int, val_date text, amount int);
insert into prop_val values
(1, '2016-01-01', 12),
(1, '2016-01-10', 15),
(2, '2016-01-01', 16),
(2, '2016-01-10', 17),
(3, '2016-01-01', 21),
(3, '2016-01-10', 22);

-- my query

with t1 as (
    -- get last valuation for each property (id and date)
    select id, max(val_date) as last_val_date
    from prop_val
    group by id

), t2 as (
    -- get last valuation record for each property
    select pv.*
    from prop_val pv
    inner join t1
        on t1.id = pv.id
        and t1.last_val_date = pv.val_date
)
select
    p.name
    ,p.price
    ,t2.val_date
    ,t2.amount
from prop p
left join t2
    on t2.id = p.id

-- (expected) result
name    price   val_date    amount
aaa     10      2016-01-10  15
bbb     15      2016-01-10  17
ccc     20      2016-01-10  22



But still regarding web2py: I've already read the grouping etc. documentation but still find it hard. Often one has to aggregate data and further aggregation is based on that aggregated data (therefore the small example above where one first has to find the max or last date). In SQL that's easy, one just can work with common table expressions; same in R or pandas, where one creates data frames on the fly. But in web2py I cannot "aggregate"/group a table and join the result back to another table, am I wrong? How are web2py professionals are solving that? Iterating through resulting Rows objects and comparing/aggregating them in new data structures?

Thanks and best regards,
stex


villas

unread,
May 24, 2016, 6:26:03 PM5/24/16
to web...@googlegroups.com
So why can't you do something like this...

db.define_table('prop',

                Field('name','string'),
                Field('price','integer'),
               )
if not db(db.prop.id>0).count():
    db.prop.insert(name="aaa", price=10)
    db.prop.insert(name="bbb", price=15)
    db.prop.insert(name="ccc", price=20)

db.define_table('prop_val',
                Field('prop_id','reference prop'),
                Field('val_date',length=8),
                Field('amount','integer'),
               )
if not db(db.prop_val.id>0).count():
    db.prop_val.insert(prop_id=1, val_date="2016-01-01", amount=1)
    db.prop_val.insert(prop_id=1, val_date="2016-01-15", amount=15)
    db.prop_val.insert(prop_id=2, val_date="2016-01-01", amount=10)
    db.prop_val.insert(prop_id=2, val_date="2016-01-20", amount=6)
    db.prop_val.insert(prop_id=3, val_date="2016-01-01", amount=29)
    db.prop_val.insert(prop_id=3, val_date="2016-01-25", amount=25)

maxdate = db.prop_val.val_date.max()
maxval = db.prop_val.amount.max()
rows = db(db.prop.id == db.prop_val.prop_id).select(db.prop.name,maxdate,maxval,groupby=db.prop.name)
for r in rows: print r

SQL would be:
SELECT  prop.name, MAX(prop_val.val_date), MAX(prop_val.amount) FROM prop_val, prop WHERE (prop.id = prop_val.prop_id) GROUP BY prop.name

ste...@gmail.com

unread,
May 25, 2016, 5:32:02 AM5/25/16
to web2py-users
Hi villas,

thanks for your example but that delivers the wrong result. With your query we get

name     date         amount
aaa      2016-01-15   15
bbb      2016-01-20   10
ccc      2016-01-25   29


The result with your data should be (perhaps with additional price column)

name     date         amount   (price)
aaa      2016-01-15   15        10
bbb      2016-01-20   6         15
ccc      2016-01-25   25        20


I want the last/max val_date for each prop_id and the amount of this last val_date; in your query you just select the max val_date and the max amount.
With

    maxdate = db.prop_val.val_date.max()
    rows1 = db().select(db.prop_val.prop_id, maxdate, groupby=db.prop_val.prop_id)


I get the last date for each prop_id; but after that how do I get the amount for that prop_id/date and combine this result with the name and price for each prop (in the most simple way)?

stex

ste...@gmail.com

unread,
May 25, 2016, 7:53:08 AM5/25/16
to web2py-users
Ok, with the DAL I solved it in the following way and did the join "manually":

    # get last val_date for each prop
    maxdate = db.prop_val.val_date.max().with_alias('max_val_date')
    rows1 = db().select(db.prop_val.prop_id.with_alias('prop_id'), maxdate, groupby=db.prop_val.prop_id)
    # get name, price, last val_date and amount for each prop
    result = []
    for row in rows1:
        query = (db.prop_val.val_date == row.max_val_date) & \
                (db.prop_val.prop_id == row.prop_id) & \
                (db.prop_val.prop_id == db.prop.id)
        rows = db(query).select(db.prop.name.with_alias('name'),
                                db.prop.price.with_alias('price'),
                                db.prop_val.val_date.with_alias('val_date'),
                                db.prop_val.amount.with_alias('amount'))
        result.extend(rows)


But I think this method will become cumbersome if the analysis is a little bit more complicated and more aggregates have to be joined together.

stex

villas

unread,
May 25, 2016, 6:21:56 PM5/25/16
to web2py-users
Hi Stex

Hmm, again your solution seems very complex. 

Luckily,  you confirmed the results you expect so I propose the following solution would be simpler?

maxdate = db.prop_val.val_date.max()
rows = db(db.prop.id == db.prop_val.prop_id).select(
db.prop.name,maxdate,db.prop_val.amount,db.prop.price,groupby=db.prop.name)


Result:
prop.nameMAX(prop_val.val_date)prop_val.amountprop.price

ste...@gmail.com

unread,
May 26, 2016, 3:11:27 PM5/26/16
to web2py-users
Hi villas,

perhaps my "problem" is: if you group/aggregate in SQL, you are allowed only to put attributes in the select-clause which are either grouped (i.e. which also appear in the group-clause) or which are aggregated (e.g. with sum, max, min, avg) at least as far as my knowledge goes. The following is allowed/valid, because it is grouped by a and b, c and d are aggregated.

    SELECT a, b, sum(c), sum(d)
    FROM mytable
    GROUP BY a, b


The following isn't allowed, because e is not aggregated nor grouped.

    SELECT a, b, sum(c), sum(d), e
    FROM mytable
    GROUP BY a, b


The point is: the second query is e.g. allowed by SQLite - if you do the same query in e.g. MS SQL Server, you'll get an error message because e has to be either aggregated with an function or it has to be in the group-by-clause.

Your web2py-code/-example generates the following SQL statement

    SELECT  prop.name, MAX(prop_val.val_date), prop_val.amount, prop.price
    FROM prop_val, prop
    WHERE (prop.id = prop_val.prop_id)
    GROUP BY prop.name;

which has the above described "problem"; it will be accepted by SQLite but e.g. not by SQL Server because prop_val.amount and prop.price aren't in the group-by-clause nor are they aggregated. Perhaps web2py/SQLite delivers the right result by random or SQLite has another algorithm in the background and is more intelligent than MS SQL Server :-), I don't know.

Thanks for your effort and example!

stex

villas

unread,
May 26, 2016, 5:13:36 PM5/26/16
to web2py-users
I think I understand and you are right,  DBs normally require the fielded listed in the GROUP BY and ORDER BY too. 

However,  the DAL enables you to build such queries,  so I'm not sure what the problem is.  However,  I hope you have found a solution now.

Best wishes.
Reply all
Reply to author
Forward
0 new messages