Selecting distinct entries based on max timestamp

2,279 views
Skip to first unread message

Paul Giralt

unread,
Nov 18, 2016, 2:08:05 PM11/18/16
to sqlalchemy
I'm having trouble figuring out how to accomplish this task using SQLAlchemy. Basically I have a table that maps a user's skill levels as follows: 

class Skillmap(db.Model):
    __tablename__ = 'skillmap'
    id = db.Column(db.UUID(), primary_key=True)
    user_id = db.Column(db.UUID(), db.ForeignKey('user.id'))
    skill_id = db.Column(db.UUID(), db.ForeignKey('skill.id'))
    level_id = db.Column(db.UUID(), db.ForeignKey('level.id'))
    timestamp = db.Column(db.DateTime, default=datetime.utcnow(), onupdate=datetime.utcnow())

A particular user will be associated with a variety of skills and those skills are at a certain level (say 1 - 10 for simplicity). 

So for example, I could insert into the DB an entry that says: 

user1   skill1    level 5    timestamp 1
user1   skill2    level 4    timestamp 2
user1   skill3    level 7    timestamp 3

then later on, the user updates their skill: 

user1   skill1    level 10    timestamp 4
user1   skill2    level 10    timestamp 5

When the user updates the skill, I don't update the DB record, Rather, I insert a new record into the DB so that I can store the history of the skill level over time. 

When I want to query for the current skill levels for a user (latest timestamp), I want to get back the latest entries for any given skill. In other words, for the above example I'd like to get the results: 

user1   skill1    level 10    timestamp 4
user1   skill2    level 10    timestamp 5
user1   skill3    level 7      timestamp 3

I've tried playing around with func.max() but I haven't been able to get it to do what I want. 

I found this article that describes a solution for a similar question but the solution is just SQL syntax: 


Is there an easy way to accomplish this using SQLAlchemy? Any help would be greatly appreciated. 



 

mike bayer

unread,
Nov 18, 2016, 5:16:38 PM11/18/16
to sqlal...@googlegroups.com
this style of query is common and a query of similar form is illustrated
here:

https://docs.sqlalchemy.org/en/latest/orm/tutorial.html#using-subqueries

the query in the SO answer would look like:

subq = s.query(
topten.home,
func.max(topten.datetime).label("maxdatetime")
).group_by(topten.home).subquery("groupedtt")

q = s.query(TT).join(
subq,
and_(
TT.home == subq.c.home, TT.datetime == subq.c.maxdatetime))




>
>
>
>
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
> To post to this group, send email to sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

Jonathan Rogers

unread,
Nov 18, 2016, 5:17:59 PM11/18/16
to sqlalchemy

The top answer uses a JOIN to a subquery from the same table, which can be a bit tricky in SQLAlchemy. Start with "Using Subqueries" in the "Object Relational Tutorial". Whenever I need to use the same table independently in different parts of the same query, I define aliases to keep everything straight. Use the aliased() function on a mapped class.

http://docs.sqlalchemy.org/en/rel_1_0/orm/tutorial.html#using-subqueries
http://docs.sqlalchemy.org/en/rel_1_0/orm/query.html?highlight=aliased#sqlalchemy.orm.aliased

Though your table design will work, there is an easier to query design called a "temporal state table" in which there are two timestamps instead of one. The pair of timestamps define the period during which the row applies. Current rows will have no end timestamp. I use Postgres's range types to implement this design:

CREATE TABLE skillmap_temporal (
    valid_period daterange,
    user_id integer,
    skill_id integer,
    level_id integer
);

To get the current row for user 5:

SELECT * FROM skillmap_temporal WHERE user_id = 5 and upper_inf(valid_period);

I learned about this approach and various others from the excellent "Developing Time-Oriented Database Applications in SQL." Reading that book transformed my thinking use dates and timestamps. What we often need to model, as in your case, is a dimension of time independent of other column values. Because the book is out of print, the author has made it available to download free of cost: http://www.cs.arizona.edu/people/rts/tdbbook.pdf

Paul Giralt

unread,
Nov 21, 2016, 10:32:10 AM11/21/16
to sqlalchemy
Than you Mike and Jonathan for your replies. I will experiment with this and let you know how it works out. 

b...@sfi.ca

unread,
Nov 21, 2016, 1:06:46 PM11/21/16
to sqlalchemy
Thanks for the book reference Jonathan, it looks very useful!

Jonathan Vanasco

unread,
Nov 21, 2016, 4:33:43 PM11/21/16
to sqlalchemy
FWIW, I recently had a similar situation and ended using 2 tables -- one as a transaction log and the other with a constantly updated entry.   While your approach worked fine, the database performed significantly better with 2 tables because the (vast) majority of queries only needed the most recent element.
Reply all
Reply to author
Forward
0 new messages