Unable to reference label in subquery at top level of query

15 views
Skip to first unread message

Ian Miller

unread,
Feb 5, 2019, 11:51:25 AM2/5/19
to sqlalchemy
Hello all -

I am relatively new to using SQLAlchemy for more complex use cases. I am in the process of creating a time series query, but I am unable to reference a column by its alias at the top level of the query.

This is the query that I am trying to address that SQLAlchemy is currently generating:

SELECT non_interval_query.metadata_value AS non_interval_query_metadata_value,
       coalesce
(sum(non_interval_query.coalesce_2), 0) AS coalesce_1,
       timestamp
FROM
 
(SELECT generate_series(date_trunc('day', date('2019-01-06T00:00:00+00:00')), date_trunc('day', date('2019-01-12T00:00:00+00:00')), '1 day') AS timestamp) AS time_series
LEFT OUTER JOIN
 
(SELECT post_metadata_1.metadata_value AS post_metadata_1_metadata_value,
          post_metadata_2
.metadata_value AS post_metadata_2_metadata_value,
          vw_post
.created_at AS vw_post_created_at,
          coalesce
(count(DISTINCT vw_post.id), 0) AS coalesce_1
   FROM vw_post
   JOIN post_metadata AS post_metadata_1 ON post_metadata_1
.post_id = vw_post.id
   JOIN post_metadata AS post_metadata_2 ON post_metadata_2
.post_id = vw_post.id
   WHERE post_metadata_1
.metadata_value IN ('<metadata_values>')
     AND post_metadata_2
.metadata_value IN ('<metadata_value>')
     AND vw_post
.created_at >= '2019-01-06T00:00:00+00:00'
     AND vw_post
.created_at <= '2019-01-12T00:00:00+00:00'
     AND post_metadata_1
.schema_uid = '<schema_uid>'
     AND post_metadata_1
.metadata_name = '<metadata_name>'
     AND post_metadata_2
.schema_uid = '<schema_uid>'
     AND post_metadata_2
.metadata_name = '<metadata_name>'
     AND vw_post
.license_id IN (<license_ids>)
   GROUP BY vw_post
.created_at,
            post_metadata_1
.metadata_value,
            post_metadata_2
.metadata_value,
            vw_post
.created_at) AS non_interval_query ON date_trunc('day', created_at) = timestamp;

You'll notice that "non_interval_query.metadata_value AS non_interval_query_metadata_value" specified at the beginning of the query is ambiguous due to the 2 "metadata_value" selects in the "non_interval_query" subquery. What I'm trying to do is have 2 selects at the top level - one for "non_interval_query.post_metadata_1_metadata_value" and one for "non_interval_query.post_metadata_2_metadata_value".


For reference, here is the code used to generate the above query:


def apply_date_group_by(self, session, query, range_gb_params):
    field_name = self.db.get("column")
    model
= self._object.get("model")

   
if not field_name or not model:
        raise ValueError("Invalid date group by")

    gb_column
= self._build_column()
    interval
= range_gb_params.get("interval")
    interval_type
= range_gb_params.get("interval_type")

    time_series
= func.generate_series(
        func
.date_trunc(interval_type, func.date(range_gb_params["start"])),
        func
.date_trunc(interval_type, func.date(range_gb_params["end"])),
        interval
,
   
).label("timestamp")

    ts_column
= column("timestamp")

    time_series_query
= session.query(time_series).subquery("time_series")
    non_interval_query
= query.subquery("non_interval_query")
   
# have to replace the original gb_column with the 'timestamp' column
    # in order to properly merge the dataset into the time series dataset
    non_gb_columns, gbs = self._prepare_non_gb_columns(
        ts_column
, gb_column, non_interval_query.columns
   
)

   
# construct query with correct position passed in from `range_gb_params`
    query_position = range_gb_params.get("query_index_position", 0)
    non_gb_columns
.insert(query_position, ts_column)

    date_gb_query
= session.query(*non_gb_columns).select_from(
        time_series_query
.outerjoin(
            non_interval_query
,
            func
.date_trunc(interval_type, column(field_name)) == ts_column,
       
)
   
)

   
if gbs:
        date_gb_query = date_gb_query.group_by(*gbs)

   
return date_gb_query.order_by(ts_column)


Any help on this would be greatly appreciated!

Mike Bayer

unread,
Feb 5, 2019, 12:59:19 PM2/5/19
to sqlal...@googlegroups.com
so this code is incomplete, referring to something called
"_prepare_non_gb_columns" which is likely where this is going wrong,
the subquery that you SELECT from has a .c namespace from which you
would be selecting both
non_interval_query.c.post_metadata_1_metadata_value and
non_interval_query.c.post_metadata_2_metadata_value from, separately.
The names that are available on .c. come directly from the label names
you use in the subquery, like
metadata_value.label("post_metadata_1_metadata_value").

I mocked the important part there up as a script below, but I did it
in Core which is easier for this kind of query, but then for
demonstration I adapted it to Query as well. long term plan is to
unify these two query interfaces more completely.

Core:

from sqlalchemy import table, column, select


post_metadata = table(
"post_metadata", column("post_id"), column("metadata_value")
)

vw_post = table("vw_post", column("id"))


post_metadata_1 = post_metadata.alias("post_metadata_1")
post_metadata_2 = post_metadata.alias("post_metadata_2")

non_interval_query = (
select(
[
post_metadata_1.c.metadata_value.label(
"post_metadata_1_metadata_value"
),
post_metadata_2.c.metadata_value.label(
"post_metadata_2_metadata_value"
),
]
)
.select_from(
vw_post.join(
post_metadata_1, post_metadata_1.c.post_id == vw_post.c.id
).join(post_metadata_2, post_metadata_2.c.post_id == vw_post.c.id)
)
.alias("non_interval_query")
)


stmt = select(
[
non_interval_query.c.post_metadata_1_metadata_value,
non_interval_query.c.post_metadata_2_metadata_value,
]
).apply_labels()

print(stmt)



ORM version:

from sqlalchemy import table, column, select


post_metadata = table(
"post_metadata", column("post_id"), column("metadata_value")
)

vw_post = table("vw_post", column("id"))


from sqlalchemy.orm import Session, aliased

s = Session()

post_metadata_1 = aliased(post_metadata, "post_metadata_1")
post_metadata_2 = aliased(post_metadata, "post_metadata_2")

non_interval_query = (
s.query(
post_metadata_1.c.metadata_value.label(
"post_metadata_1_metadata_value"
),
post_metadata_2.c.metadata_value.label(
"post_metadata_2_metadata_value"
),
)
.select_from(vw_post)
.join(post_metadata_1, post_metadata_1.c.post_id == vw_post.c.id)
.join(post_metadata_2, post_metadata_2.c.post_id == vw_post.c.id)
.subquery("non_interval_query")
)


stmt = s.query(
non_interval_query.c.post_metadata_1_metadata_value,
non_interval_query.c.post_metadata_2_metadata_value,
)

print(stmt)
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
> ---
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
> To post to this group, send email to sqlal...@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

Ian Miller

unread,
Feb 7, 2019, 10:45:07 AM2/7/19
to sqlalchemy
Hello Mike,

Thank you for your response! I have currently constructed the ORM implementation that you suggested in your response. Here's the code:

def _rebuild_non_interval_query_for_group_by(self, session, query):

   
from sqlalchemy import table, column, select

   
from sqlalchemy.orm import aliased
   
from collections import defaultdict, OrderedDict


    post_metadata
= table(
       
"post_metadata", column("post_id"), column("metadata_value")
   
)

    campaign_metadata
= table(
       
"campaign_metadata", column("campaign_id"), column("metadata_value")
   
)
    asset_metadata
= table(
       
"asset_metadata", column("asset_id"), column("metadata_value")
   
)

    vw_asset
= table("vw_asset", column("id"))
    vw_campaign
= table("vw_campaign", column("id"))

    vw_post
= table("vw_post", column("id"))


    METADATA_PRIMARY_TABLE_MAP
= {
        asset_metadata
.name: vw_asset,
        campaign_metadata
.name: vw_campaign,
        post_metadata
.name: vw_post,
   
}

    METADATA_NAME_TABLE_MAP
= {
        asset_metadata
.name: asset_metadata,
        campaign_metadata
.name: campaign_metadata,
        post_metadata
.name: post_metadata,
   
}

    primary_tables
= set()
    metadata_columns_count
= defaultdict(int)
    metadata_alias
= OrderedDict()
    columns
= []
   
for c in query.c._all_columns:
       
if c.name == "metadata_value":
            parent_column
= list(c.base_columns)[0]
            table
= parent_column.table
            primary_tables
.add(METADATA_PRIMARY_TABLE_MAP[table.name])
            metadata_columns_count
[METADATA_NAME_TABLE_MAP[table.name]] += 1
            alias_number
= metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]]
            alias_name
= "{}_{}".format(table.name, alias_number)
           
alias = aliased(parent_column.table, alias_name)
            metadata_alias
[alias_name] = alias
            column
= alias.c.metadata_value.label(
               
"{}_{}_{}".format(table.name, alias_number, "metadata_value")
           
)
            columns
.append(column)
       
else:
            columns
.append(c)

   
# start constructing query
    non_interval_query
= session.query(*columns).select_from(*primary_tables)

   
for alias_name, alias in metadata_alias.items():
        object_type
= self._get_object_type_from_metadata_name(alias.original.name)
        non_interval_query
= (
            non_interval_query
           
.join(
               
alias,
                getattr
(alias.c, "{}_id".format(object_type)) == METADATA_PRIMARY_TABLE_MAP[alias.original.name].c.id
           
)
       
)

    non_interval_query
= non_interval_query.subquery("non_interval_query")

   
return non_interval_query



The "metadata_alias" values are [('post_metadata_1", alias), ('post_metadata_2', alias)] - the alias correspond to the post_metadata_1 and post_metadata_2 alias in your example. However, when I reference these in the join, the aliased table names are not "post_metadata_1" or "post_metadata_2" - they're "post_metadata_3" and "post_metadata_4". I'm unable to figure out why there's a new join seemingly created instead of referencing the aliased tables that were passed in.

Here's the query that the above generates:

SELECT post_metadata_1.metadata_value AS post_metadata_1_metadata_value,
       post_metadata_2
.metadata_value AS post_metadata_2_metadata_value,

       non_interval_query
.created_at,
       non_interval_query
.coalesce_1 \nFROM
 
(SELECT post_metadata_3.metadata_value AS metadata_value, post_metadata_4.metadata_value AS metadata_value, vw_post.created_at AS created_at, coalesce(count(DISTINCT vw_post.id), :coalesce_2) AS coalesce_1 \nFROM vw_post
   JOIN post_metadata AS post_metadata_3 ON post_metadata_3
.post_id = vw_post.id
   JOIN post_metadata AS post_metadata_4 ON post_metadata_4
.post_id = vw_post.id \nWHERE post_metadata_3.metadata_value IN (:metadata_value_1, :metadata_value_2)
   AND post_metadata_4
.metadata_value IN (:metadata_value_3, :metadata_value_4)
   AND vw_post
.created_at >= :created_at_1
   AND vw_post
.created_at <= :created_at_2
   AND post_metadata_3
.schema_uid = :schema_uid_1
   AND post_metadata_3
.metadata_name = :metadata_name_1
   AND post_metadata_4
.schema_uid = :schema_uid_2
   AND post_metadata_4
.metadata_name = :metadata_name_2
   AND vw_post
.license_id IN (:license_id_1, :license_id_2)
   GROUP BY vw_post
.created_at, post_metadata_3.metadata_value, post_metadata_4.metadata_value, vw_post.created_at) AS non_interval_query,

                                      vw_post
JOIN post_metadata AS post_metadata_1 ON post_metadata_1
.post_id = vw_post.
id
JOIN post_metadata AS post_metadata_2 ON post_metadata_2
.post_id = vw_post.id;

Ian Miller

unread,
Feb 7, 2019, 11:18:42 AM2/7/19
to sqlalchemy
Ah - this occurs because the non-metadata_value columns added in from "query.c._all_columns" are still referencing the original query. I need to figure out how to rebuild those columns as well so that they correctly reference the clauses in the new query.


On Tuesday, February 5, 2019 at 11:51:25 AM UTC-5, Ian Miller wrote:

Mike Bayer

unread,
Feb 7, 2019, 12:53:51 PM2/7/19
to sqlal...@googlegroups.com
just as a note, don't access .c._all_columns, just iterate over selectable.c
Message has been deleted

Ian Miller

unread,
Feb 7, 2019, 2:51:07 PM2/7/19
to sqlalchemy
The reason why I iterated over .c._all_columns was because it shows the references to the 2 "metadata_value" columns, whereas selectable.c only references 1.

For example, selectable.c shows the following:

(Pdb++) list(query.c)
[Column('metadata_value', VARCHAR(length=255), table=<non_interval_query>), Column('created_at', TIMESTAMP(), table=<non_interval_query>), <sqlalchemy.sql.elements.ColumnClause at 0x7f15e1f6fef0; %(139731962070520 coalesce)s>]

selectable.c._all_columns shows the following:

(Pdb++) query.c._all_columns
[Column('metadata_value', VARCHAR(length=255), table=<non_interval_query>), Column('metadata_value', VARCHAR(length=255), table=<non_interval_query>), Column('created_at', TIMESTAMP(), table=<non_interval_query>), <sqlalchemy.sql.elements.ColumnClause at 0x7f15e1f6fef0; %(139731962070520 coalesce)s>]

That way I'm able to keep track of the position of the column in the query, and replace with the rebuilt column accordingly.

Mike Bayer

unread,
Feb 7, 2019, 5:13:41 PM2/7/19
to sqlal...@googlegroups.com
On Thu, Feb 7, 2019 at 2:49 PM Ian Miller <irmil...@gmail.com> wrote:
>
> The reason why I iterated over .c._all_columns was because it shows the references to the 2 "metadata_value" columns, whereas selectable.c only references 1.


so that's a sign that you need to apply labels to these columns
because they are conflicting. with select(), call
select().apply_labels() so that these two columns are prefixed with
their table name in a label.


>
> For example, selectable.c shows the following:
>
> (Pdb++) list(query.c)
> [Column('metadata_value', VARCHAR(length=255), table=<non_interval_query>), Column('created_at', TIMESTAMP(), table=<non_interval_query>), <sqlalchemy.sql.elements.ColumnClause at 0x7f15e1f6fef0; %(139731962070520 coalesce)s>]Enter code here...
>
> selectable.c._all_columns shows the following:
>
> (Pdb++) query.c._all_columns
> [Column('metadata_value', VARCHAR(length=255), table=<non_interval_query>), Column('metadata_value', VARCHAR(length=255), table=<non_interval_query>), Column('created_at', TIMESTAMP(), table=<non_interval_query>), <sqlalchemy.sql.elements.ColumnClause at 0x7f15e1f6fef0; %(139731962070520 coalesce)s>]Enter code here...
>
>
>
>
> On Thursday, February 7, 2019 at 12:53:51 PM UTC-5, Mike Bayer wrote:
>>

Ian Miller

unread,
Feb 7, 2019, 7:04:29 PM2/7/19
to sqlalchemy
:facepalm: i should have seen that much sooner. Thank you so much for your help. No longer need to do any of that crazy stuff I attempted!
Reply all
Reply to author
Forward
0 new messages