Query object modification

58 views
Skip to first unread message

Luis Del Rio IV

unread,
Apr 12, 2023, 5:21:22 PM4/12/23
to sqlalchemy
I am currently using the following sqlalchemy code,

        _query = super().get_query(model, info, sort, **args)
        query = _query.group_by(
            func.date_format(DataModel.timestamp, "%Y-%m-%d %H")
        )
        return query

I am trying to aggregate the the max value of a field from DataModel.value utilizing the group by clause.

In simple sql, you would do the following.

SELECT  max(data.value)  AS data_value
from data
GROUP BY date_format(data.timestamp, "%Y-%m-%d %H")

What would the proper way to express this? The various methods I have tried somehow overwrite the original query and do not map to our attributes.

I have some additional context on the issue
  1. We are using the graphene-sqlalchemy package.
  2. When you do a GraphQL web api query, the package creates a sqlalchemy.orm.query.Query object.
  3. We want to modify this standard query that the package creates so that we can do the group_by action to help us get the maximum time series value for every hour because this is not possible using GraphQL.
  4. Talking to the graphene-sqlalchemy team they told us the best place to do the modification to the standardized query is in the get_query method (line 67) in this file: https://github.com/graphql-python/graphene-sqlalchemy/blob/master/graphene_sqlalchemy/fields.py
  5. The standardized query we need to modify translates to this SQL statement: 
SELECT sy_data.oid_id, sy_data.rrd_timestamp, sy_data.cabinet_id, sy_data
.customer_id, sy_data.value, sy_data.active_flag, sy_data.timestamp
FROM sy_data
WHERE sy_data.rrd_timestamp >= %(rrd_timestamp_1)s AND sy_data.rrd_timestamp <= %(rrd_timestamp_2)s AND (sy_data.oid_id = %(oid_id_1)s OR sy_data.oid_id = %(o
id_id_2)s) ORDER BY sy_data.oid_id ASC, sy_data.rrd_timestamp ASC

Therefore we need to find a way to insert a func.max for the values in the first part of the SELECT statement, (before the FROM). It is easy for us to apend the group_by like this.

query.group_by(
func.date_format(DataModel.timestamp, "%Y-%m-%d %H")
)

The big issue for us is to figure out how to insert the func.max

Getting a solution to this will help the graphene-sqlalchemy team create better documentation.

Mike Bayer

unread,
Apr 12, 2023, 6:12:46 PM4/12/23
to noreply-spamdigest via sqlalchemy


On Wed, Apr 12, 2023, at 5:21 PM, Luis Del Rio IV wrote:
I am currently using the following sqlalchemy code,

        _query = super().get_query(model, info, sort, **args)
        query = _query.group_by(
            func.date_format(DataModel.timestamp, "%Y-%m-%d %H")
        )
        return query

I am trying to aggregate the the max value of a field from DataModel.value utilizing the group by clause.

In simple sql, you would do the following.

SELECT  max(data.value)  AS data_value
from data
GROUP BY date_format(data.timestamp, "%Y-%m-%d %H")

What would the proper way to express this? The various methods I have tried somehow overwrite the original query and do not map to our attributes.

using legacy query style:

q1 = session.query(func.max(Data.value)).group_by(
    func.date_format(Data.timestamp, "%Y-%m-%d %H")
)


using 2.0 style select(), replace "session.query" with "select":

s1 = select(func.max(Data.value)).group_by(
    func.date_format(Data.timestamp, "%Y-%m-%d %H")
)

POC script is at the bottom of this email.



I have some additional context on the issue
  1. We are using the graphene-sqlalchemy package.
  2. When you do a GraphQL web api query, the package creates a sqlalchemy.orm.query.Query object.
  3. We want to modify this standard query that the package creates so that we can do the group_by action to help us get the maximum time series value for every hour because this is not possible using GraphQL.
  4. Talking to the graphene-sqlalchemy team they told us the best place to do the modification to the standardized query is in the get_query method (line 67) in this file: https://github.com/graphql-python/graphene-sqlalchemy/blob/master/graphene_sqlalchemy/fields.py
  5. The standardized query we need to modify translates to this SQL statement: 
SELECT sy_data.oid_id, sy_data.rrd_timestamp, sy_data.cabinet_id, sy_data
.customer_id, sy_data.value, sy_data.active_flag, sy_data.timestamp
FROM sy_data
WHERE sy_data.rrd_timestamp >= %(rrd_timestamp_1)s AND sy_data.rrd_timestamp <= %(rrd_timestamp_2)s AND (sy_data.oid_id = %(oid_id_1)s OR sy_data.oid_id = %(o
id_id_2)s) ORDER BY sy_data.oid_id ASC, sy_data.rrd_timestamp ASC

Therefore we need to find a way to insert a func.max for the values in the first part of the SELECT statement, (before the FROM). It is easy for us to apend the group_by like this.

if you have a Query which renders the above SQL, you can add group_by() to it in place.   but if these queries are being translated into GraphQL, and GraphQL does not have any concept of GROUP BY, then it wont work, what you want to do would not be possible unless a GraphQL query exists that does what you need.


query.group_by(
func.date_format(DataModel.timestamp, "%Y-%m-%d %H")
)

The big issue for us is to figure out how to insert the func.max



from sqlalchemy import Column
from sqlalchemy import DateTime
from sqlalchemy import func
from sqlalchemy import Integer
from sqlalchemy import select
from sqlalchemy.orm import declarative_base
from sqlalchemy.orm import Session

Base = declarative_base()


class Data(Base):
    __tablename__ = "data"

    id = Column(Integer, primary_key=True)
    value = Column(Integer)
    timestamp = Column(DateTime)


s = Session()

q1 = s.query(func.max(Data.value)).group_by(
    func.date_format(Data.timestamp, "%Y-%m-%d %H")
)

print(q1)

s1 = select(func.max(Data.value)).group_by(
    func.date_format(Data.timestamp, "%Y-%m-%d %H")
)

print(s1)

Peter Harrison

unread,
Apr 13, 2023, 1:13:44 PM4/13/23
to sqlalchemy
Thanks Mike,

Ideally we'd prefer to find a solution via Graphene-SQLAlchemy. Unfortunately we don't have the luxury of creating our own query when interacting with Graphene-SQLAlchemy.

So the key question for us is, can you modify an existing sqlalchemy.orm.query.Query object to insert a "func.max(Data.value)" object into the select? This is what Graphene-SQLAlchemy is giving us.
If this is possible, then the group_by part is easy. We have tested that frequently, the hard part is the modifying the original "select" object.

1. We have tried add_columns, but that adds in incompatible object type in the GraphQL results making it an unusable option.
2. We thought that modifying the select would be  possible using data with "statement.froms" but can't figure out how to update the MetaData object in it

If modifying the "select" after its creation is not possible, we need to start considering using a separate reporting table with hourly data.

Luis Del Rio IV

unread,
Apr 13, 2023, 4:15:00 PM4/13/23
to sqlalchemy
Mike,

Would this select query be able to get our aggregated data?

        query = select(func.max(DataModel.value)).select_from(_query.subquery()).group_by(
            func.date_format(DataModel.timestamp, "%Y-%m-%d %H")
        )

We tried this route but are now getting this error

  "errors": [
    {
      "message": "'Select' object has no attribute 'statement'",
      "locations": [
        {
          "line": 2,
          "column": 3
        }
      ],

Mike Bayer

unread,
Apr 13, 2023, 5:56:15 PM4/13/23
to noreply-spamdigest via sqlalchemy


On Thu, Apr 13, 2023, at 1:13 PM, Peter Harrison wrote:
Thanks Mike,

Ideally we'd prefer to find a solution via Graphene-SQLAlchemy. Unfortunately we don't have the luxury of creating our own query when interacting with Graphene-SQLAlchemy.

So the key question for us is, can you modify an existing sqlalchemy.orm.query.Query object to insert a "func.max(Data.value)" object into the select?






This is what Graphene-SQLAlchemy is giving us.
If this is possible, then the group_by part is easy. We have tested that frequently, the hard part is the modifying the original "select" object.

1. We have tried add_columns, but that adds in incompatible object type in the GraphQL results making it an unusable option.

I dont know what that means... wouldn't this be an issue for graphene-SQLAlchemy to fix?  this is a basic Query method.  if you can query individual columns to GraphQL, it should work.  Then as stated before, if whatever graphene-sqlalchemy does is not compatible with individual columns, that would have to be resolved on that end.


2. We thought that modifying the select would be  possible using data with "statement.froms" but can't figure out how to update the MetaData object in it

again this is very vague and doesn't make much sense.   if you want to add a fromclause to a select() or a query, there's an add_froms() method.   This has nothing to do with updating MetaData objects, which are not part of the "froms" of a select and are only a collection that Table objects belong towards, and they have nothing to do with how SELECT statements are rendered.


If modifying the "select" after its creation is not possible, we need to start considering using a separate reporting table with hourly data.

yes unfortunately I know nothing about graphene-sqlalchemy or graphql and these sound like details that are deeply embedded in a larger system, the people who created this extension would have the best chance of helping with new integrations.  

Mike Bayer

unread,
Apr 13, 2023, 5:58:20 PM4/13/23
to noreply-spamdigest via sqlalchemy


On Thu, Apr 13, 2023, at 4:14 PM, Luis Del Rio IV wrote:
Mike,

Would this select query be able to get our aggregated data?

        query = select(func.max(DataModel.value)).select_from(_query.subquery()).group_by(
            func.date_format(DataModel.timestamp, "%Y-%m-%d %H")
        )

We tried this route but are now getting this error

  "errors": [
    {
      "message": "'Select' object has no attribute 'statement'",
      "locations": [
        {
          "line": 2,
          "column": 3
        }
      ],

While there is no stack trace that would illustrate the line of code that is actually making this mistake and this is not illustrated in the line of code indicated above, it looks like you are attempting to call upon an attribute named .statement, which is something that's on Query, but not on select(), so if you have a select() object, you would not want to call .statement first.



Reply all
Reply to author
Forward
0 new messages