Flask SQlAlchemy BaseQuery Paginate not working correctly

80 views
Skip to first unread message

Prerna Pandit

unread,
Aug 13, 2020, 10:30:47 AM8/13/20
to sqlalchemy
Hello, I've been struggling with this issue for the past couple of days and  would really, truly appreciate if someone could please give me pointers or direction as to what I might be missing.


Here are my models;
class Aggregate(db.Model):
    id = db.Column(UUID(as_uuid=True), primary_key=True,
                   server_default=db.text('uuid_generate_v4()'))
    blocks = db.relationship('AggregateBlock', cascade='all, delete-orphan',
                             passive_deletes=True, back_populates='aggregate')..


class AggregateBlock(db.Model):

    id
= db.Column(UUID(as_uuid=True), primary_key=True,
                   server_default
=db.text('uuid_generate_v4()'))
    block_id
= db.Column(UUID(as_uuid=True),
                         db
.ForeignKey('blocks.id', ondelete='CASCADE'), nullable=False, index=True)
    aggregate_id
= db.Column(UUID(as_uuid=True),
                             db
.ForeignKey('aggregates.id', ondelete='RESTRICT'), nullable=False)
    block
= db.relationship('Block', back_populates='aggregates')
    aggregate
= db.relationship('Aggregate', back_populates='blocks')


    

class Block(db.Model):
    id = db.Column(UUID(as_uuid=True), primary_key=True,
                   server_default
=db.text('uuid_generate_v4()'))
    is_complete
= db.Column(db.Boolean, default=False)
    aggregates
= db.relationship('AggregateBlock', cascade='all, delete-orphan',
                                 passive_deletes
=True, back_populates='block')


from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy()

select = db.session.query(Aggregate).join(AggregateBlock).join(Block).filter(Block.is_complete == complete)

print(len(select.all())

print(len(select.paginate(per_page=20).items())


If I do a select.all(), I get the right number of rows which is 47.  However, if I try to paginate for a per_page size say 20, I lot a less rows like 11.  
select.paginate(per_page=20).
The number could go up to 21 or so as I increase the page size.  Why would paginate decrease the number of returned records?

Simon King

unread,
Aug 14, 2020, 10:08:23 AM8/14/20
to sqlal...@googlegroups.com
"paginate" is not an SQLAlchemy function, so you'd be better off
asking the author of whatever is providing that feature.

However, I would guess that maybe paginate is naively applying
something like "LIMIT 20" to the query. This doesn't work properly
when you join along a one-to-many relationship, because if you have
(for example) 2 "parent" objects, each with 5 "child" objects, the
query will return 10 rows, but SQLAlchemy de-duplicates the results to
return just the 2 parent objects.

Simon
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
> ---
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/e5a14305-4e63-4467-9610-1faf3f8c8412o%40googlegroups.com.

Prerna Pandit

unread,
Aug 14, 2020, 10:56:21 AM8/14/20
to sqlalchemy
Hey Simon,

Thanks so much for replying to my question.  I reworked my code to use sqlalchemy ORM and took off flask and paginate so I can narrow down the issue. My models now extend from declarative_base.

engine = create_engine('postgresql://postgres:postgres@localhost:5400/postgres')
Session = sessionmaker(bind=engine)

Base = declarative_base()
session = Session()

Models

class Aggregate(Base):
    __tablename__ = 'aggregates'
    id = Column(UUID(as_uuid=True), primary_key=True,
                   server_default=text('uuid_generate_v4()'))
    site_id = Column(UUID(as_uuid=True), nullable=True)
    created_at = Column(DateTime, default=sa.func.now())
    created_by = Column(UUID(as_uuid=True), nullable=True)
    updated_at = Column(DateTime, default=sa.func.now(), onupdate=sa.func.now())
    updated_by = Column(UUID(as_uuid=True), nullable=True)
    blocks = relationship('AggregateBlock', cascade='all, delete-orphan',
                              passive_deletes=True, back_populates='aggregate')
                              

class Block(Base):

    __tablename__ = 'blocks'
    id = Column(UUID(as_uuid=True), primary_key=True,
                   server_default=text('uuid_generate_v4()'))
    type = Column(Text, nullable=False)
    heading = Column(Text, nullable=True)
    subheading = Column(Text, nullable=True)
    label = Column(Text, nullable=True)
    is_complete = Column(Boolean, default=False)
    created_at = Column(DateTime, default=sa.func.now())
    created_by = Column(UUID(as_uuid=True), nullable=True)
    updated_at = Column(DateTime, default=sa.func.now(), onupdate=sa.func.now())
    updated_by = Column(UUID(as_uuid=True), nullable=True)
    aggregates = relationship('AggregateBlock', cascade='all, delete-orphan',
                                 passive_deletes=True, back_populates='block')


class AggregateBlock(Base):
    __tablename__ = 'aggregate_blocks'
    id = Column(UUID(as_uuid=True), primary_key=True,
                   server_default=text('uuid_generate_v4()'))
    block_id = Column(UUID(as_uuid=True),
                         ForeignKey('blocks.id', ondelete='CASCADE'), nullable=False, index=True)
    aggregate_id = Column(UUID(as_uuid=True),
                             ForeignKey('aggregates.id', ondelete='RESTRICT'), nullable=False)
    position = Column(Integer, nullable=False)
    block = relationship('Block', back_populates='aggregates')
    aggregate = relationship('Aggregate', back_populates='blocks')


Query:

select = session.query(Aggregate).order_by(Aggregate.created_at) \
            .join(AggregateBlock) \
            .join(Block) \
            .filter(Block.is_complete == complete) \

all_results = select.all()
limit_results = select.limit(20).all()


I still get inconsistent results when I apply limit. Like select.all() will return 47 rows but  with limit it'll return anywhere between 11 to 15.  If I take the generated SQL query and run it directly in psql, I get the correct count.

SELECT aggregates.id AS aggregates_id, aggregates.site_id AS aggregates_site_id, aggregates.created_at AS aggregates_created_at, aggregates.created_by AS aggregates_created_by, aggregates.updated_at AS aggregates_updated_at, aggregates.updated_by AS aggregates_updated_by
FROM aggregates JOIN aggregate_blocks ON aggregates.id = aggregate_blocks.aggregate_id JOIN blocks ON blocks.id = aggregate_blocks.block_id
WHERE blocks.is_complete = false ORDER BY aggregates.created_at

 LIMIT 20





On Friday, August 14, 2020 at 10:08:23 AM UTC-4, Simon King wrote:
"paginate" is not an SQLAlchemy function, so you'd be better off
asking the author of whatever is providing that feature.

However, I would guess that maybe paginate is naively applying
something like "LIMIT 20" to the query. This doesn't work properly
when you join along a one-to-many relationship, because if you have
(for example) 2 "parent" objects, each with 5 "child" objects, the
query will return 10 rows, but SQLAlchemy de-duplicates the results to
return just the 2 parent objects.

Simon

> To unsubscribe from this group and stop receiving emails from it, send an email to sqlal...@googlegroups.com.

Simon King

unread,
Aug 15, 2020, 3:22:02 PM8/15/20
to sqlal...@googlegroups.com
This is the same problem: you're writing a query that joins 3 tables
together, and then applying a "LIMIT 20" to that query. If you look
carefully at your 20 rows of psql output, I expect you'll see the same
aggregates_id appear more than once. There are less than 20 distinct
Aggregate objects. When SQLAlchemy receives those rows, it skips the
duplicates.

The SQL looks something like this:

SELECT ... FROM aggregates
LEFT JOIN aggregate_blocks ON ...
LEFT JOIN blocks ON ...
LIMIT 20

You need the LIMIT to apply just to the aggregates table, rather than
the joined tables. This is a bit tricky because you want to filter by
a field in one of those joined tables. You could use an EXISTS
function, something like this:

SELECT * FROM aggregates
WHERE EXISTS (
SELECT 1
FROM aggregate_blocks
INNER JOIN blocks ON aggregate_blocks.block_id = block.id
WHERE aggregate_blocks.aggregate_id = aggregates.id
AND blocks.is_complete = false
)
LIMIT 20

If you run that in psql, I think you should get 20 *different*
aggregates rows back.

If that works, then to turn it into SQLAlchemy ORM syntax, you should
use Query.exists():

https://docs.sqlalchemy.org/en/13/orm/query.html#sqlalchemy.orm.query.Query.exists

Hope that helps,

Simon
> To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/d643cddb-3c9d-4ac9-8388-6746f584c6e6o%40googlegroups.com.

Prerna Pandit

unread,
Aug 17, 2020, 9:32:41 AM8/17/20
to sqlalchemy
Hey Simon,

I am indeed getting back duplicates of aggregate ids. I reworked my query to using exists like you suggested and now I get back correct result count.  Thanks so much for your suggestion and saving my day!  I can now move ahead on my work ticket.


Many, many, thanks again, appreciate it!!

Prerna
Reply all
Reply to author
Forward
0 new messages