Re: [sqlalchemy] limit in context of entities not rows

53 views
Skip to first unread message

Simon King

unread,
Jul 4, 2019, 6:20:19 AM7/4/19
to sqlal...@googlegroups.com
I would first figure out how you would do this in SQL, and then
translate that to SQLAlchemy. In this case, the EXISTS operator might
work:

SELECT *
FROM department
WHERE EXISTS (
SELECT 1
FROM employee
WHERE employee.department_id = department.id
AND employee.name IN (...)
)
LIMIT 2

The docs contain some examples of how to use EXISTS:

https://docs.sqlalchemy.org/en/13/orm/tutorial.html#using-exists

session.query(Department).filter(Department.employees.any(Employee.name.in_(...)))

I've no idea about the performance of this though. It seems like it's
going to have to evaluate the subquery for every row of the parent
table. If the parent table is huge, and you're only looking at a small
number of employees, that might be wasted effort. Under those
conditions, it would be better to find the employees first, then
select the matching departments:

SELECT *
FROM department
WHERE id in (
SELECT department_id
FROM employee
WHERE name IN (...)
)
LIMIT 2

employee_subquery =
session.query(Employee.department_id).filter(Employee.name.in_(...)).subquery()
departments = session.query(Department).filter(Department.id.in_(employee_subquery)).limit(2)

Simon

On Thu, Jul 4, 2019 at 10:27 AM Victor Olex
<victo...@vtenterprise.com> wrote:
>
> Using ORM querying what is the best practice for limiting the output to a given number of resulting entities?
>
> Consider this model:
>
> from sqlalchemy import Column, DateTime, String, Integer, ForeignKey, func
> from sqlalchemy.orm import relationship, backref
> from sqlalchemy.ext.declarative import declarative_base
> from sqlalchemy import create_engine
> from sqlalchemy.orm import sessionmaker
>
> Base = declarative_base()
>
> class Department(Base):
> __tablename__ = 'department'
> id = Column(Integer, primary_key=True)
> name = Column(String)
>
> def __repr__(self):
> return f'Department({self.id}, {self.name})'
>
>
> class Employee(Base):
> __tablename__ = 'employee'
> id = Column(Integer, primary_key=True)
> name = Column(String)
> department_id = Column(Integer, ForeignKey('department.id'))
> # Use cascade='delete,all' to propagate the deletion of a Department onto its Employees
> department = relationship(
> Department,
> backref=backref('employees', uselist=True,
> cascade='delete,all'))
> engine = create_engine('sqlite://', echo=True)
> Base.metadata.create_all(engine)
>
> Session = sessionmaker(bind=engine)
> session = Session()
> session.add(Department(id=1, name='IT'))
> session.add(Department(id=2, name='Finance'))
> session.add(Department(id=3, name='Sales'))
> session.add(Employee(id=1, name='Victor', department_id=1))
> session.add(Employee(id=2, name='Michal', department_id=1))
> session.add(Employee(id=3, name='Kinga', department_id=2))
> session.add(Employee(id=4, name='Andy', department_id=3))
> session.commit()
>
> Now, let's query for the list of Departments given some criteria on the Employee:
> session.query(Department).join(Employee).filter(Employee.name.in_(['Andy', 'Kinga', 'Victor', 'Michal'])).all()
>
> As expected we get:
>
> [Department(1, IT), Department(2, Finance), Department(3, Sales)]
>
> Now suppose our intent is to limit the number of results, and we would prefer to use a LIMIT clause to do the filtering on the database side:
>
> session.query(Department).join(Employee).filter(Employee.name.in_(['Andy', 'Kinga', 'Victor', 'Michal'])).limit(2).all()
>
> This (in my case) resulted in:
>
> [Department(1, IT)]
>
> The reason for this is that the limit gets applied to the resulting rows from the joined tables, which happen to begin with the two employees from IT (this is non-deterministic unless order by is also used). Since both represent the same entity, only one instance is returned.
>
> Other approaches (and their shortcomings) are:
>
> Using a DISTINCT clause prior to LIMIT - won't work if any field is include non-comparable types like IMAGE, BLOB
> Using a subquery on the Department with LIMIT - this may filter out departments, which would otherwise match
>
> What works is wrapping the entire query in a subquery, but only selecting a DISTINCT Department.id field, and using that in the IN clause. It seems quite convoluted and some databases might not do great to optimize this away.
>
> sq = session.query(Department.id).join(Employee).filter(Employee.name.in_(['Andy', 'Kinga', 'Victor', 'Michal'])).distinct().limit(2).subquery()
> session.query(Department).filter(Department.id.in_(sq)).all()
>
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
> ---
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
> To post to this group, send email to sqlal...@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/d4e19b87-3442-4b5e-801a-3005fc4c433e%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages