recursive cte use current row as anchor

39 views
Skip to first unread message

niuji...@gmail.com

unread,
Sep 24, 2021, 1:47:50 AM9/24/21
to sqlalchemy
class Staff(Base):
      id  = Column(Integer, primary_key=True)
      selfreferencing_staff_id = Column(
        Integer,
        ForeignKey('staff_table.id',
                   onupdate="CASCADE",
                   ondelete='SET NULL'))
      _enabling_factor = Column(Integer)
      effective_enabling_factor = column_property(...) # I have trouble in this line here.
      team_members = relationship('Staff',
                               backref=backref(
                                   'supervisor',
                                   remote_side='Staff.id'))


This is a self-referencing lineage. Each staff has one supervisor above them. Each staff has a `_enabling_factor`, which can be either a Integer, or Null. A staff's `effective_enabling_factor` is either their own `_enabling_factor` value, or their supervisor's `effective_enabling_factor` if their own is Null.

This seems to be a case to use recursive CTE.
I can construct the query for a certain staff member, e.g. staff #5:

recursive_cte = select([Staff.id,  Staff._enabling_factor, Staff.selfreferencing_staff_id]).where(Staff.id==5).cte(recursive=True)

lineage_nodes = recursive_cte.union_all(select([Staff.id,  Staff._enabling_factor, Staff.selfreferencing_staff_id]).join(recursive_cte, recursive_cte.c.selfreferencing_staff_id==Staff.record_id).filter(recursive_cte.c._enabling_factor == None))

marker_carrying_supervisor_id = select(sasql.func.min(lineage_nodes.c.id)).scalar_subquery()

select(Staff._enabling_factor).where(Staff.id==marker_carrying_supervisor_id)


However, I don't see how I can write this recursive CTE as a column_property on the `Staff` class. Instead of giving specific primary key  (e.g. #5), I need to somehow reference current row as the anchor.

How to solve this?

Mike Bayer

unread,
Sep 24, 2021, 4:00:01 PM9/24/21
to noreply-spamdigest via sqlalchemy
this is a hefty query to dig in to but column_property() subqueries have to be formed in terms of a correlated subquery.  So instead of injecting a particular primary key into it, you set it to point to the Staff.id column.

correlated subqueries are not terrific performers and the construct can be a little bit clumsy in the ORM as well, however, the second example at https://docs.sqlalchemy.org/en/14/orm/mapped_sql_expr.html#using-column-property shows the general idea.

a first step to understanding might be to write out the SQL you think you want when you SELECT some Staff rows, where one of the columns in the row is the "effective_enabling_factor".   that column needs to be written as a correlated subquery for it to be compatible with column_property().
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

niuji...@gmail.com

unread,
Sep 25, 2021, 12:04:49 AM9/25/21
to sqlalchemy
Hi Mike, thanks for pointing out the direction.
I've worked out the SQL, but failed when converting to SQLAlchemy construct.

My SQL query looks like this:

SELECT id, (
        WITH lineage_nodes (id, _enabling_factor, selfreferencing_staff_id) AS
        (
            SELECT anchor_s.id, anchor_s._enabling_factor, anchor_s.selfreferencing_staff_id
            FROM staff_table AS anchor_s
            WHERE anchor_s.id = outer_s.id
            
            UNION ALL
                
            SELECT s.id, s._enabling_factor, s.selfreferencing_staff_id
            FROM lineage_nodes AS l
                INNER JOIN staff_table AS s
                    ON l.selfreferencing_staff_id = s.id
            WHERE l._enabling_factor IS NULL
        ),
            
        top_node_id (top_id) AS
        (
            SELECT MIN(id) AS top_id FROM lineage_nodes
        )
            
        SELECT staff_table._enabling_factor
        FROM staff_table
            INNER JOIN top_node_id
                ON staff_table.id = top_node_id.top_id
        ) AS effective_enabling_factor
FROM staff_table AS outer_s;




My Python codes looks like this:


class Staff(Base):
      id  = Column(Integer, primary_key=True)
      selfreferencing_staff_id = Column(
        Integer,
        ForeignKey('staff_table.id',
                   onupdate="CASCADE",
                   ondelete='SET NULL'))
      _enabling_factor = Column(Integer)

      ## codes below doesn't work:
      anchor_s = __class__.__table__.alias(name="anchor_s")
      s = __class__.__table__.alias(name="s")

       recursive_cte = select([
            id, _enabling_factor, selfreferencing_staff_id
           ]).select_from(anchor_s).where(anchor_s.c.id == id).cte(
            name="lineage_nodes", recursive=True)

       lineage_nodes = recursive_cte.union_all(
           select([
            s.c.id, s.c._enabling_factor, s.c.selfreferencing_staff_id
            ]).join(recursive_cte, recursive_cte.c.selfreferencing_staff_id ==
                s.c.id).filter(recursive_cte.c._enabling_factor == None))

       top_id = select(sasql.func.min(lineage_nodes.c.id))
       effective_enabling_factor = column_property(...) # I have trouble in this line here.
       ## codes above has a NameError: name '__class__' is not defined
     
       team_members = relationship('Staff',
                               backref=backref(
                                   'supervisor',
                                   remote_side='Staff.id'))

Mike Bayer

unread,
Sep 25, 2021, 11:26:46 PM9/25/21
to noreply-spamdigest via sqlalchemy
well __class__.__table__ isn't going to be there inside the class body, just to make things simple you will want to add this column_property() after the Staff class is fully defined; then you make use of Staff.<col> to get at columns.  https://docs.sqlalchemy.org/en/14/orm/mapped_sql_expr.html#using-column-property will show how to assign a new column_property to the class after the fact.

next, the query you have is correlating, but i really dont have any idea if SQLAlchemy is going to handle a correlated column deep inside of a CTE like that.   The form you have, with the CTE embedded in the parenthesis, is only possible with the most recent SQLAlchemy 1.4.24 release where someone contributed a new parameter called "nesting", which means the CTE should not be moved to the top level of the SELECT.   still, not really sure what a CTE will do inside a correlated subquery like that.   For an example of how to use "nesting" see example four at https://docs.sqlalchemy.org/en/14/core/selectable.html?highlight=hascte#sqlalchemy.sql.expression.HasCTE.cte .  for the correlate, when you write out the select() that refers to "outer_s", add .correlate(Staff) to it, which means Staff isn't added to the FROM list, it's assumed to be on the outside.

give those a try but im not totally sure CTEs work as correlated subqueries right now, it's not been tried.  I assume you've confirmed this query actually runs, im surprised you can even correlate inside of a CTE like that.

niuji...@gmail.com

unread,
Sep 26, 2021, 2:56:06 AM9/26/21
to sqlalchemy

Hi Mike,

Yes the SQL code runs as desired, but the Python code doesn't, unfortunately.
After reading the references you pointed out, my Python code looks like this:

class Staff(Base):
      id  = Column(Integer, primary_key=True)
      selfreferencing_staff_id = Column(
        Integer,
        ForeignKey('staff_table.id',
                   onupdate="CASCADE",
                   ondelete='SET NULL'))
      _enabling_factor = Column(Integer)
     
       team_members = relationship('Staff',
                               backref=backref(
                                   'supervisor',
                                   remote_side='Staff.id'))
                                   
anchor_s = Staff.__table__.alias(name="anchor_s")
s = Staff.__table__.alias(name="s")
final_s = Staff.__table__.alias(name="final_s")

recursive_cte = select([
            anchor_s.c.id, anchor_s.c._enabling_factor, anchor_s.c.selfreferencing_staff_id
           ]).select_from(anchor_s).where(anchor_s.c.id == Staff.id).cte(
            name="lineage_nodes", recursive=True)

lineage_nodes = recursive_cte.union_all(
           select([
            s.c.id, s.c._enabling_factor, s.c.selfreferencing_staff_id
            ]).join(recursive_cte, recursive_cte.c.selfreferencing_staff_id ==
                s.c.id).filter(recursive_cte.c._enabling_factor == None))

top_id = select(sasql.func.min(lineage_nodes.c.id))
Staff.effective_enabling_factor = column_property(select(final_s.c._enabling_factor).where(
        final_s.c.record_id == top_id).scalar_subquery())  # This is where I define the desired column_property.




The problems seems to be that this code doesn't generate `correlated subquery` for each `Staff` row ( as in the SQL code `WHERE anchor_s.id = outer_s.id`). Could you take a look?

Mike Bayer

unread,
Sep 26, 2021, 10:35:20 AM9/26/21
to noreply-spamdigest via sqlalchemy
OK there's various small issues here but overall SQLAlchemy's new "nesting" feature does not seem to work correctly yet for a RECURSIVE CTE with UNION, meaning it's not very useful for RECURSIVE right now.

But here's the thing, I'm not as SQL expert as some people think but I don't think it should be necessary for a CTE to actually be embedded in a subquery.  You can keep the CTE on top, where it can more efficiently create a set of all rows, and then refer to it within a normal correlated subquery that's in the column_property().

ive written the below MCVE we can work with going forward.  see how close this is, but the goal is, keep the CTE on top as it normally can be and do correlations outside of it.

from sqlalchemy import Column
from sqlalchemy import ForeignKey
from sqlalchemy import func
from sqlalchemy import Integer
from sqlalchemy import select
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import backref
from sqlalchemy.orm import column_property
from sqlalchemy.orm import relationship
from sqlalchemy.orm import Session

Base = declarative_base()


class Staff(Base):
    __tablename__ = "staff"

    id = Column(Integer, primary_key=True)

    selfreferencing_staff_id = Column(
        Integer,
        ForeignKey("staff.id", onupdate="CASCADE", ondelete="SET NULL"),
    )
    _enabling_factor = Column(Integer)
    record_id = Column(Integer)

    team_members = relationship(
        "Staff", backref=backref("supervisor", remote_side="Staff.id")
    )


anchor_s = Staff.__table__.alias(name="anchor_s")
s = Staff.__table__.alias(name="s")
final_s = Staff.__table__.alias(name="final_s")

recursive_cte = (
    select(
        [
            anchor_s.c.id.label("anchor_id"),
            anchor_s.c._enabling_factor,
            anchor_s.c.selfreferencing_staff_id,
        ]
    ).select_from(anchor_s)
    # this part we are going to move to the outside
    # .where(anchor_s.c.id == Staff.id)
    .cte(name="lineage_nodes", recursive=True)
)

recursive_cte_a = recursive_cte.alias()

lineage_nodes = recursive_cte.union_all(
    select([s.c.id, s.c._enabling_factor, s.c.selfreferencing_staff_id])
    .join(
        recursive_cte_a, recursive_cte_a.c.selfreferencing_staff_id == s.c.id
    )
    .filter(recursive_cte_a.c._enabling_factor == None)
)

top_id = (
    select(func.min(lineage_nodes.c.anchor_id))
    # here's where we moved it
    .where(lineage_nodes.c.anchor_id == Staff.id)
    .correlate(Staff)
    .scalar_subquery()
)


Staff.effective_enabling_factor = column_property(
    select(final_s.c._enabling_factor)
    .where(final_s.c.record_id == top_id)
    .scalar_subquery()
)  # This is where I define the desired column_property.


s = Session()

q = s.query(Staff)
print(q)

niuji...@gmail.com

unread,
Sep 26, 2021, 3:53:54 PM9/26/21
to sqlalchemy
Hi Mike,
This code doesn't seem to yield desired result, now all the `effective_enabling_factor` is `None`, except for those rows that carries a `_enabling_factor` on itself. In other words, it doesn't seem to recusively search the next immediately parent's marker at all.

And the SQL it generates is a little confusing, it seems to me that we introduce a `recursive_cte_a` aliased table and gain nothing from it:

WITH RECURSIVE lineage_nodes(
  anchor_id, _enabling_factor, selfreferencing_staff_id
) AS (
  SELECT 
    anchor_f.id AS anchor_id, 
    anchor_f._enabling_factor AS _enabling_factor, 
    anchor_f.selfreferencing_staff_id AS selfreferencing_staff_id 
  FROM 
    staff AS anchor_f 
  UNION ALL 
  SELECT 
    anchor_f.id AS id, 
    anchor_f._enabling_factor AS _enabling_factor, 
    anchor_f.selfreferencing_staff_id AS selfreferencing_staff_id 
  FROM 
    staff AS anchor_f 
    JOIN lineage_nodes AS anon_2 ON anon_2.selfreferencing_staff_id = anchor_f.id 
  WHERE 
    anon_2._enabling_factor IS NULL
SELECT 
  staff.id AS staff_id, 
  staff._enabling_factor AS staff__enabling_factor, 
  (
    SELECT 
      final_f._enabling_factor 
    FROM 
      staff AS final_f 
    WHERE 
      final_f.id = (
        SELECT 
          min(lineage_nodes.anchor_id) AS min_1 
        FROM 
          lineage_nodes 
        WHERE 
          lineage_nodes.anchor_id = staff.id
      )
  ) AS anon_1 
FROM 
  staff


Mike Bayer

unread,
Sep 26, 2021, 4:01:25 PM9/26/21
to noreply-spamdigest via sqlalchemy
i applied an alias to the CTE before unioning it, please take that out and try again.  CTE inside the subquery can't happen, but shoudnt be necessary.

Mike Bayer

unread,
Sep 26, 2021, 4:05:38 PM9/26/21
to noreply-spamdigest via sqlalchemy
changing it more the way you had

lineage_nodes = recursive_cte.union_all(
    select([s.c.id, s.c._enabling_factor, s.c.selfreferencing_staff_id])
    .join(
        recursive_cte, recursive_cte.c.selfreferencing_staff_id == s.c.id
    )
    .filter(recursive_cte.c._enabling_factor == None)
)

produces:

WITH RECURSIVE lineage_nodes(anchor_id, _enabling_factor, selfreferencing_staff_id) AS
(SELECT anchor_s.id AS anchor_id, anchor_s._enabling_factor AS _enabling_factor, anchor_s.selfreferencing_staff_id AS selfreferencing_staff_id
FROM staff AS anchor_s UNION ALL SELECT s.id AS id, s._enabling_factor AS _enabling_factor, s.selfreferencing_staff_id AS selfreferencing_staff_id
FROM staff AS s JOIN lineage_nodes ON lineage_nodes.selfreferencing_staff_id = s.id
WHERE lineage_nodes._enabling_factor IS NULL)
SELECT staff.id AS staff_id, staff.selfreferencing_staff_id AS staff_selfreferencing_staff_id, staff._enabling_factor AS staff__enabling_factor, staff.record_id AS staff_record_id, (SELECT final_s._enabling_factor
FROM staff AS final_s
WHERE final_s.record_id = (SELECT min(lineage_nodes.anchor_id) AS min_1
FROM lineage_nodes
WHERE lineage_nodes.anchor_id = staff.id)) AS anon_1
FROM staff

so the UNION joins out to "lineage_nodes" again.  I've not really worked with recursive CTEs much but i think that's the idea.
Reply all
Reply to author
Forward
0 new messages