ORM relationships and (PostgreSQL) partition constraint-exclusion

252 views
Skip to first unread message

Adrian

unread,
Dec 5, 2013, 5:44:35 AM12/5/13
to sqlal...@googlegroups.com
Hi All,

I have a few partitioned tables in my PostgreSQL database but I do not know yet how to make the ORM relationship() with partition constraint-exclusion on the instance level. Constraint-exclusion does not work with joins and requires scalar values - the problem is that I would need to add an additional WHERE clause to the primaryjoin (which adds the partition key) if the relationship is accessed from the instance level, e.g. user.addresses. Is there a mechanism in relationship() to distinguish between class-based joins (User.addresses) and instance-level access?

Adrian Schreyer

unread,
Dec 5, 2013, 5:58:09 AM12/5/13
to sqlal...@googlegroups.com
Never mind,

the problem was that I specified the clause in a secondaryjoin and not in the primaryjoin of the relationship().


On Thu, Dec 5, 2013 at 10:44 AM, Adrian <adrian....@gmail.com> wrote:
Hi All,

I have a few partitioned tables in my PostgreSQL database but I do not know yet how to make the ORM relationship() with partition constraint-exclusion on the instance level. Constraint-exclusion does not work with joins and requires scalar values - the problem is that I would need to add an additional WHERE clause to the primaryjoin (which adds the partition key) if the relationship is accessed from the instance level, e.g. user.addresses. Is there a mechanism in relationship() to distinguish between class-based joins (User.addresses) and instance-level access?

--
You received this message because you are subscribed to a topic in the Google Groups "sqlalchemy" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/sqlalchemy/ov-mYWA7XAM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.

Adrian Schreyer

unread,
Dec 5, 2013, 6:57:38 AM12/5/13
to sqlal...@googlegroups.com
Actually that was a bit too early but I tracked the problem down to the many-to-many relationship. Parameters are only interpolated (e.g. %(param_1)s) for the primaryjoin to the secondary table. Is there a technique to force relationship() to interpolate a parameter between the 1st and 3rd table instead of using only table.column=table.column?

Michael Bayer

unread,
Dec 5, 2013, 10:31:41 AM12/5/13
to sqlal...@googlegroups.com
On Dec 5, 2013, at 6:57 AM, Adrian Schreyer <adrian....@gmail.com> wrote:

Actually that was a bit too early but I tracked the problem down to the many-to-many relationship. Parameters are only interpolated (e.g. %(param_1)s) for the primaryjoin to the secondary table. Is there a technique to force relationship() to interpolate a parameter between the 1st and 3rd table instead of using only table.column=table.column?

there’s no reason why that would be the case can you provide more specifics?






On Thu, Dec 5, 2013 at 10:58 AM, Adrian Schreyer <adrian....@gmail.com> wrote:
Never mind,

the problem was that I specified the clause in a secondaryjoin and not in the primaryjoin of the relationship().


On Thu, Dec 5, 2013 at 10:44 AM, Adrian <adrian....@gmail.com> wrote:
Hi All,

I have a few partitioned tables in my PostgreSQL database but I do not know yet how to make the ORM relationship() with partition constraint-exclusion on the instance level. Constraint-exclusion does not work with joins and requires scalar values - the problem is that I would need to add an additional WHERE clause to the primaryjoin (which adds the partition key) if the relationship is accessed from the instance level, e.g. user.addresses. Is there a mechanism in relationship() to distinguish between class-based joins (User.addresses) and instance-level access?

--
You received this message because you are subscribed to a topic in the Google Groups "sqlalchemy" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/sqlalchemy/ov-mYWA7XAM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.



--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
signature.asc

Adrian Schreyer

unread,
Dec 5, 2013, 11:03:12 AM12/5/13
to sqlal...@googlegroups.com

Given the three mappings FirstSecond and Partitioned, I want to declare a relationship between First and Partitioned. The problem is that Partitioned is partitioned by partition_key which is a column in First but not in SecondSecond however contains the identifier that actually links First to specific rows in the partitioned table.

So far the mapping looks like this mock example:


partitioned = relationship("Partitioned",
    secondary=Base.metadata.tables['schema.seconds'],
    primaryjoin="and_(First.first_id==Second.first_id, First.partition_key==Partitioned.partition_key)",
    secondaryjoin="Second.other_id==Partitioned.other_id",
    foreign_keys="[Second.first_id, Partitioned.partition_key, Partitioned.other_id]",
    uselist=True, innerjoin=True, lazy='dynamic')

It works, but it only interpolates the First.first_id with the actual value which normally makes sense but to make the PostgreSQL constraint-exclusion work the First.partition_key would need to be interpolated with the proper value as well. Right now it is only given as First.partition_key==Partitioned.partition_key.

Does that make sense? I am not sure if my relationship configuration is wrong or if this kind of mapping is simply not supported.

Michael Bayer

unread,
Dec 5, 2013, 11:37:13 AM12/5/13
to sqlal...@googlegroups.com
oh, you want to refer to the tertiary table in both the primary and secondary join.    so right this pattern does not correspond to the A->secondary->B pattern and isn’t really a classic many-to-many.

a quick way to map these are to use non primary mappers (was going to just paraphrase, but let me just try it out b.c. these are fun anyway, and I want to see the new joining behavior we have in 0.9…):

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class First(Base):
    __tablename__ = 'first'

    first_id = Column(Integer, primary_key=True)
    partition_key = Column(String)

    def __repr__(self):
        return ("First(%s, %s)" % (self.first_id, self.partition_key))

class Second(Base):
    __tablename__ = 'second'

    id = Column(Integer, primary_key=True)
    first_id = Column(Integer)
    other_id = Column(Integer)

class Partitioned(Base):
    __tablename__ = 'partitioned'

    id = Column(Integer, primary_key=True)
    partition_key = Column(String)
    other_id = Column(Integer)

    def __repr__(self):
        return ("Partitioned(%s, %s)" % (self.partition_key, self.other_id))


j = join(Partitioned, Second, Partitioned.other_id == Second.other_id)

partitioned_second = mapper(Partitioned, j, non_primary=True, properties={
        # note we need to disambiguate columns here - the join()
        # will provide them as j.c.<tablename>_<colname> for access,
        # but they retain their real names in the mapping
        "id": j.c.partitioned_id,
        "other_id": [j.c.partitioned_other_id, j.c.second_other_id],
        "secondary_id": j.c.second_id
    })

First.partitioned = relationship(
                            partitioned_second,
                            primaryjoin=and_(
                                First.partition_key == partitioned_second.c.partition_key,
                                First.first_id == foreign(partitioned_second.c.first_id)
                            ), innerjoin=True)

e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)
Base.metadata.drop_all(e)
Base.metadata.create_all(e)
s = Session(e)
s.add_all([
    First(first_id=1, partition_key='p1'),
    First(first_id=2, partition_key='p1'),
    First(first_id=3, partition_key='p2'),
    Second(first_id=1, other_id=1),
    Second(first_id=2, other_id=1),
    Second(first_id=3, other_id=2),
    Partitioned(partition_key='p1', other_id=1),
    Partitioned(partition_key='p1', other_id=2),
    Partitioned(partition_key='p2', other_id=2),
])
s.commit()

for row in s.query(First, Partitioned).join(First.partitioned):
    print(row)

for f in s.query(First):
    for p in f.partitioned:
        print(f.partition_key, p.partition_key)


I mapped to a join directly, and not a select, so as long as we aren’t using SQLite (and are using 0.9) we get nested join behavior like this:

SELECT first.first_id AS first_first_id, first.partition_key AS first_partition_key, partitioned.id AS partitioned_id, partitioned.partition_key AS partitioned_partition_key, partitioned.other_id AS partitioned_other_id 
FROM first JOIN (partitioned JOIN second ON partitioned.other_id = second.other_id) ON first.partition_key = partitioned.partition_key AND first.first_id = second.first_id
2013-12-05 11:27:18,347 INFO sqlalchemy.engine.base.Engine {}
(First(1, p1), Partitioned(p1, 1))
(First(2, p1), Partitioned(p1, 1))
(First(3, p2), Partitioned(p2, 2))


the load of f.partitioned will load the Partitioned objects in terms of the “partitioned_second” mapper, so those objects will have those extra cols from “second” on them.  You can screw around with this using exclude_properties for those cols you don’t need to refer to on the mapping, and perhaps primary_key if the mapper complains, such as:

partitioned_second = mapper(Partitioned, j, non_primary=True, properties={
        "id": j.c.partitioned_id,
        "other_id": [j.c.partitioned_other_id, j.c.second_other_id],
    }, exclude_properties=[j.c.second_id], primary_key=[j.c.partitioned_id, j.c.second_other_id])

or you can just ignore those extra attributes on some of your Partitioned objects.
signature.asc

Adrian Schreyer

unread,
Dec 5, 2013, 1:55:53 PM12/5/13
to sqlal...@googlegroups.com

The partitioned relationship actually referred to the tertiary table in both the primary and secondary join - the problem for me was that in the primaryjoin


primaryjoin="and_(First.first_id==Second.first_id, First.partition_key==Partitioned.partition_key)"

only First.first_id will be interpolated with the actual value first_id of the instance in question whereas First.partition_key on the other hand will be interpolated as column object. The problem is that in this case First.partition_key has to be interpolated with the actual value to get the constraint-exclusion to work. In a normal many-to-many relationship this would not be necessary and maybe that is why it only interpolates the values for the join on the secondary table.

The partitioned relationship emits a query like this if the attribute is accessed:


SELECT partitioned.*
  FROM partitioned, second, first
 WHERE %(param_1)s = second.first_id
       AND first.partition_key = partitioned.partition_key
       AND second.other_id = partitioned.other_id

But I would need first.partitioned_key to be %(param_2)s.

So far I used a @property around a query function to add the partition_key to query.filter() manually.

Michael Bayer

unread,
Dec 5, 2013, 2:18:23 PM12/5/13
to sqlal...@googlegroups.com
With the example I gave, when accessing .partitioned on a First instance, the lazy loader will convert all columns from “First” into a bound parameter, it emits this:


 SELECT partitioned.other_id AS partitioned_other_id, second.other_id AS second_other_id, partitioned.partition_key AS partitioned_partition_key, second.first_id AS second_first_id 
FROM partitioned JOIN second ON partitioned.other_id = second.other_id 
WHERE ? = partitioned.partition_key AND ? = second.first_id
2013-12-05 14:14:42,689 INFO sqlalchemy.engine.base.Engine (u'p1', 2)


“first.partition_key” is not in the query, it’s replaced by ‘p1’ in this case, the value that was assigned to that First instance.    There is no “secondary” table per se in the example I gave.
signature.asc

Adrian Schreyer

unread,
Dec 5, 2013, 2:41:54 PM12/5/13
to sqlal...@googlegroups.com
I will try this out then, thanks for your help! I assume this works in 0.9 only?

Michael Bayer

unread,
Dec 5, 2013, 2:47:25 PM12/5/13
to sqlal...@googlegroups.com
it should work in 0.8 as well (and can be done even in 0.7 with some adjustments), just not the more optimized nested JOIN part.
signature.asc
Reply all
Reply to author
Forward
0 new messages