Bulk Operations with Joined Table Inheritance

59 views
Skip to first unread message

Robert Sami

unread,
Jan 24, 2017, 4:49:40 PM1/24/17
to sqlalchemy
Hi SQLAlchemy wizards.

I was interested in using the new bulk operations API (http://docs.sqlalchemy.org/en/latest/orm/persistence_techniques.html#bulk-operations) but was looking for some advice based on my use case.

I have a class “FooDerived” which corresponds to a table that is linked to “FooBase” using joined table inheritance. I want to use the bulk_save_objects method to save, lets say 100,000 instances of “FooDerived”.

One option would be the following:

```
session.bulk_save_objects([FooBase() for i in range(100000)])
session.flush()
foo_base_models = FooBase.query.filter(/* Assume its possible to filter for the newly created objects*/).all()
session.bulk_save_objects([FooDerived(id=base.id) for base in foo_base_models])
```

Is there a better way?

Thank you!

mike bayer

unread,
Jan 24, 2017, 6:00:19 PM1/24/17
to sqlal...@googlegroups.com
this would be expressed much more clearly and efficiently using Core
constructs, and you need a way of knowing that primary key for FooBase()
because how you have it above where it auto-generates the primary key,
it would perform 100K SELECT statements :


foobase = FooBase.__table__
fooderived = FooDerived.__table__
with engine.begin() as conn:
my_first_pk = conn.scalar(select([func.max(foobase.c.id)]))
conn.execute(
foobase.insert(),
{"id": ident + my_first_pk, "data": "whatever"} for ident in
range(100000)
)
conn.execute(
fooderived.insert(),
{"id": ident + my_firstpk, "data": "whatever"} for ident in
range(100000)
)


of course you need to make sure no other transactions are INSERTing rows
with this approach or they will throw off your primary key counter.



>




> Thank you!
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
> To post to this group, send email to sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

Robert Sami

unread,
Jan 24, 2017, 7:40:55 PM1/24/17
to sqlal...@googlegroups.com
Thanks for the response Mike,

I agree that using Core is pretty clean. One approach I considered was the following:

```
res = conn.execute(FooBase.__table__.insert(returning=[FooBase.id], values=[{} for i in range(100000)]))
conn.execute(FooDerived.__table__.insert(values=[{'id': _id, data: 'whatever'} for _id, in res.fetchall()]))
```

This is similar to the approach you outlined above, but also robust to other transactions inserting in the table.

The reason I would prefer to use `session.bulk_save_objects()` is that this method is aware of default attribute values of objects. For example:

```
class FooDerived(..):
  ...
  data = db.Column(db.Integer, default=17)
```

Creating a bunch of `FooDerived` objects will automatically set the `data` attributes to their default value. So I was hoping there was some way of using `session.bulk_save_objects()` to a similar effect as the Core approach I shared above, which uses a "RETURNING" clause to know the primary keys of the newly inserted `FooBase` rows. If not, do you have any other thoughts or suggestions on how to get the best of both worlds?

Many thanks!


To post to this group, send email to sqlal...@googlegroups.com
--
SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- You received this message because you are subscribed to a topic in the Google Groups "sqlalchemy" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/sqlalchemy/MpC7I2WV4aM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to sqlalchemy+unsubscribe@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.

mike bayer

unread,
Jan 25, 2017, 10:02:07 AM1/25/17
to sqlal...@googlegroups.com


On 01/24/2017 07:40 PM, Robert Sami wrote:
> Thanks for the response Mike,
>
> I agree that using Core is pretty clean. One approach I considered was
> the following:
>
> ```
> res = conn.execute(FooBase.__table__.insert(returning=[FooBase.id],
> values=[{} for i in range(100000)]))
> conn.execute(FooDerived.__table__.insert(values=[{'id': _id, data:
> 'whatever'} for _id, in res.fetchall()]))
> ```
>
> This is similar to the approach you outlined above, but also robust to
> other transactions inserting in the table.

OK, so usually RETURNING doesn't work for "executemany()", but I see
there you are packing them into one big VALUES clause and ultimately
using cursor.execute(), so that should work, though you want to chunk
the sizes into batches of 1000 or so or your SQL statement will grow too
large.


>
> The reason I would prefer to use `session.bulk_save_objects()` is that
> this method is aware of default attribute values of objects. For example:
>
> ```
> class FooDerived(..):
> ...
> data = db.Column(db.Integer, default=17)

that "default" is interpreted by the Core, not the ORM. So your core
statement should handle it too and you'd see those "17"s going in. If
not, let's get an MCVE and figure out why.


> ```
>
> Creating a bunch of `FooDerived` objects will automatically set the
> `data` attributes to their default value. So I was hoping there was some
> way of using `session.bulk_save_objects()` to a similar effect as the
> Core approach I shared above, which uses a "RETURNING" clause to know
> the primary keys of the newly inserted `FooBase` rows. If not, do you
> have any other thoughts or suggestions on how to get the best of both
> worlds?
>
> Many thanks!
>
> On Tue, Jan 24, 2017 at 3:00 PM, mike bayer <mik...@zzzcomputing.com
> <mailto:mik...@zzzcomputing.com>> wrote:
>
>
>
> On 01/24/2017 04:49 PM, Robert Sami wrote:
>
> Hi SQLAlchemy wizards.
>
> I was interested in using the new bulk operations API
> (http://docs.sqlalchemy.org/en/latest/orm/persistence_techniques.html#bulk-operations
> <http://docs.sqlalchemy.org/en/latest/orm/persistence_techniques.html#bulk-operations>)
> but was looking for some advice based on my use case.
>
> I have a class “FooDerived” which corresponds to a table that is
> linked
> to “FooBase” using joined table inheritance. I want to use the
> bulk_save_objects method to save, lets say 100,000 instances of
> “FooDerived”.
>
> One option would be the following:
>
> ```
> session.bulk_save_objects([FooBase() for i in range(100000)])
> session.flush()
> foo_base_models = FooBase.query.filter(/* Assume its possible to
> filter
> for the newly created objects*/).all()
> session.bulk_save_objects([FooDerived(id=base.id
> <http://base.id>) for base in
> foo_base_models])
> ```
>
> Is there a better way?
>
>
> this would be expressed much more clearly and efficiently using Core
> constructs, and you need a way of knowing that primary key for
> FooBase() because how you have it above where it auto-generates the
> primary key, it would perform 100K SELECT statements :
>
>
> foobase = FooBase.__table__
> fooderived = FooDerived.__table__
> with engine.begin() as conn:
> my_first_pk = conn.scalar(select([func.max(foobase.c.id
> <http://foobase.c.id>)]))
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy%2Bunsu...@googlegroups.com>
> <mailto:sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy%2Bunsu...@googlegroups.com>>.
> To post to this group, send email to sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>
> <mailto:sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>>.
> <https://groups.google.com/group/sqlalchemy>.
> For more options, visit https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>.
>
>
> --
> SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve
> <http://stackoverflow.com/help/mcve> for a full description.
> --- You received this message because you are subscribed to a topic
> in the Google Groups "sqlalchemy" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/sqlalchemy/MpC7I2WV4aM/unsubscribe
> <https://groups.google.com/d/topic/sqlalchemy/MpC7I2WV4aM/unsubscribe>.
> To unsubscribe from this group and all its topics, send an email to
> sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy%2Bunsu...@googlegroups.com>.
> To post to this group, send email to sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>.
> <https://groups.google.com/group/sqlalchemy>.
> For more options, visit https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>.
>
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
> To post to this group, send email to sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>.

Robert Sami

unread,
Jan 25, 2017, 5:07:04 PM1/25/17
to sqlal...@googlegroups.com
On Wed, Jan 25, 2017 at 7:02 AM, mike bayer <mik...@zzzcomputing.com> wrote:


On 01/24/2017 07:40 PM, Robert Sami wrote:
Thanks for the response Mike,

I agree that using Core is pretty clean. One approach I considered was
the following:

```
res = conn.execute(FooBase.__table__.insert(returning=[FooBase.id],
values=[{} for i in range(100000)]))
conn.execute(FooDerived.__table__.insert(values=[{'id': _id, data:
'whatever'} for _id, in res.fetchall()]))
```

This is similar to the approach you outlined above, but also robust to
other transactions inserting in the table.

OK, so usually RETURNING doesn't work for "executemany()", but I see there you are packing them into one big VALUES clause and ultimately using cursor.execute(), so that should work, though you want to chunk the sizes into batches of 1000 or so or your SQL statement will grow too large.

Ah, thanks for the tip!
 



The reason I would prefer to use `session.bulk_save_objects()` is that
this method is aware of default attribute values of objects. For example:

```
class FooDerived(..):
  ...
  data = db.Column(db.Integer, default=17)

that "default" is interpreted by the Core, not the ORM.   So your core statement should handle it too and you'd see those "17"s going in.  If not, let's get an MCVE and figure out why.


OK, thanks for explaining. FWIW this was an incorrect assumption on my part, rather than based on any experience or observation, so I'll get you a MCVE if anything unexpected comes up. Thanks for clarifying!
 

```

        To post to this group, send email to sqlal...@googlegroups.com
        <mailto:sqlalchemy@googlegroups.com>
        <mailto:sqlalchemy@googlegroups.com
        <mailto:sqlalchemy@googlegroups.com>>.

        Visit this group at https://groups.google.com/group/sqlalchemy
        <https://groups.google.com/group/sqlalchemy>.
        For more options, visit https://groups.google.com/d/optout
        <https://groups.google.com/d/optout>.


    --
    SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper

    http://www.sqlalchemy.org/

    To post example code, please provide an MCVE: Minimal, Complete, and
    Verifiable Example.  See  http://stackoverflow.com/help/mcve
    <http://stackoverflow.com/help/mcve> for a full description.
    --- You received this message because you are subscribed to a topic
    in the Google Groups "sqlalchemy" group.
    To unsubscribe from this topic, visit
    https://groups.google.com/d/topic/sqlalchemy/MpC7I2WV4aM/unsubscribe
    <https://groups.google.com/d/topic/sqlalchemy/MpC7I2WV4aM/unsubscribe>.

    To unsubscribe from this group and all its topics, send an email to

    To post to this group, send email to sqlal...@googlegroups.com



--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and
Verifiable Example. See http://stackoverflow.com/help/mcve for a full
description.
---
You received this message because you are subscribed to the Google
Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send

To post to this group, send email to sqlal...@googlegroups.com
--
SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- You received this message because you are subscribed to a topic in the Google Groups "sqlalchemy" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/sqlalchemy/MpC7I2WV4aM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to sqlalchemy+unsubscribe@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.

Thanks for building an amazing piece of software :D 
Reply all
Reply to author
Forward
0 new messages