Getting only the latest "version" in a query of items with versions.

57 Aufrufe
Direkt zur ersten ungelesenen Nachricht

Encompass solutions

ungelesen,
22.08.2016, 03:52:5522.08.16
an web2py-users
Consider the following pseudo model.

item
 ->name = "string"

version
 ->item_id =  item.id
 ->version_date = "datetime"


While I can easily create a collection of the item with it's versions.
all_items = db((db.item.id > 0) & (db.version.item_id == db.item.id)).select(orderby=db.item.name | db.version.version_date)

How do get just all items with just the latest version of each item without having to do this....
items = []
current_id = all_items.first().item.id 
for thing in all_items:
    if thing.item.id != current_id:
        current_id = thing.item.id
        items.append(thing)

It seems a bit silly and heavy to be doing this especially since my data could get quite large.  I imaging the database has some way to do this, just never learned how.

Ideas on how this could be done?

BR,
Jason Brower

Niphlod

ungelesen,
22.08.2016, 07:45:4422.08.16
an web2py-users
max(version_date) ..... group by item.id 

Encompass solutions

ungelesen,
23.08.2016, 05:05:5223.08.16
an web2py-users
This document doesn't mention your method or using max()

http://web2py.com/books/default/chapter/29/06/the-database-abstraction-layer#sum-avg-min-max-and-len

Or I don't understand how you would do it.
Could you provide greater detail on how to build that query?
BR,
Jason

Niphlod

ungelesen,
23.08.2016, 05:15:1323.08.16
an web2py-users
what you want is the latest version for each item_id . That is the row having the greatest version_date if you divide your dataset for each item_id.

that is what groupby item_id does..... and what max(version_date) does too.

Encompass solutions

ungelesen,
25.08.2016, 01:29:0925.08.16
an web2py-users
Does this seem sensible?  It seems to work with my initial tests.

latest_versions = db(  (db.item.id == db.item_version.artifact_id) &
                            (db.item_version.id > 0)
                        ).select(db.item.ALL,db.item_version.version_date.max(), groupby=db.item.id)

the .max() feature, at least what I found, was totally undocumented.
We should have an example database as part of the documentation with a collection of examples around it so we can all relate better. :/

On Monday, August 22, 2016 at 10:52:55 AM UTC+3, Encompass solutions wrote:

Dave S

ungelesen,
25.08.2016, 01:36:5825.08.16
an web2py-users


On Wednesday, August 24, 2016 at 10:29:09 PM UTC-7, Encompass solutions wrote:
Does this seem sensible?  It seems to work with my initial tests.

latest_versions = db(  (db.item.id == db.item_version.artifact_id) &
                            (db.item_version.id > 0)
                        ).select(db.item.ALL,db.item_version.version_date.max(), groupby=db.item.id)

the .max() feature, at least what I found, was totally undocumented.
We should have an example database as part of the documentation with a collection of examples around it so we can all relate better. :/


Examples in book, using it on severity of logged events.

/dps
 

Encompass solutions

ungelesen,
25.08.2016, 01:41:5325.08.16
an web2py-users
Grr, And now I can't get the db.item_version.ALL without Postgresql panicking about not having the item in the group buy.  How do I get the fields in the result set?  Do I need to place it in as a belongs or something?

latest_versions = db(  (db.item.id == db.item_version.artifact_id) &
                            (db.item_version.id > 0)
                        ).select(db.item.ALL, db.item_version.ALL, db.item_version.version_date.max(), groupby=db.item.id)

Encompass solutions

ungelesen,
25.08.2016, 01:52:2325.08.16
an web2py-users
I suppose your right, but I was a little thrown, by the:
max(variable_here)
That was mentioned was not the solution at all, I kept looking for ways to use max as a function.
My issue now, is that the group by doesn't like me getting all the tables I want in the return.
BR,
Jason Brower

Niphlod

ungelesen,
25.08.2016, 15:31:4625.08.16
an web2py-users
you're thinking with a human mind instead of thinking on how a database works... with sets of data.

How can you ask a database to return a single set grouped by something and at the same time as for granular records??? 
When you use groupby, you can just ask for granular records of the columns you grouped for AND aggregates on every other (min, max, count, sum, avg, count distinct, etc).

id--name--variety
1--apple--fruit
2--pear--fruit
3--salad--vegetable
4--spinach--vegetable
5--spinach--vegetable

Let's answer "Please, tell me how many varieties I hold"... 
Two. You can have 2 lines back if you group by variety. No more, no less.
"Ok, database, tell me how many of each variety I have"
Always two lines, no more, no less. In addition to variety, you ask for the count of each.

count--variety
2--fruit
3--vegetable

"Ok, database, tell me how many names there are for each variety"
Always two lines. In addition, you ask for the count distinct of fruits

count distinct--variety
2--fruit
2--vegetable

In set theory, THERE'S no way to ask for the id of the original row if you're grouping by something.

However, you can nest queries (or use windowing functions, that ATM aren't in pydal) to first ask the database to group, and then select a record that matches the grouped property plus the aggregate you chose, like "the row in the group that has that variety and has the last id (which is max())".

the first set would be
max(id)--variety
2--fruit
5--vegetable

and the second a join to the original, returning 

2--pear--fruit
5--spinach--vegetable

sooooo.... you can have the latest version_date for each item_id (which correspond to max(version_date) for each group), but you can't ask for anything more which isn't an aggregate, like the item name, in a single shot. At least without resorting to more complicated queries.
Allen antworten
Antwort an Autor
Weiterleiten
0 neue Nachrichten