On 02/29/2016 05:38 PM, Alex Hewson wrote:
> Hello All,
>
> I'm trying to use the new bulk_save_objects() to improve performance on
> bulk inserts, and have run into a problem. If bulk_save_objects() is
> used to save objects of a polymorphic class..
>
> 1. They are created correctly in the DB, with polymorphic type column
> populated correctly
> 2. BUT queries for the new objects will return objects of the incorrect type.
> In my case I'm getting instances of Child1 back when I would expect
> to get a Child2.
turn on echo=True, and you'll see this:
INSERT INTO "Child1" DEFAULT VALUES
2016-02-29 17:48:11,349 INFO sqlalchemy.engine.base.Engine ((), (), (),
(), (), (), (), () ... displaying 10 of 1000 total bound parameter sets
... (), ())
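for reference, that's just a flag on the engine; assuming an in-memory SQLite URL, which is my guess at what your elided __main__ block uses, it looks something like:

from sqlalchemy import create_engine

# echo=True logs every emitted SQL statement and its bound parameters
# via the "sqlalchemy.engine" logger at INFO level
engine = create_engine("sqlite://", echo=True)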
what you will notice in that output is that the Child1 table is receiving
entirely empty rows; the primary key values from Entity are nowhere to
be found. SQLite does not enforce foreign keys by default, so it's just
auto-generating identifiers here, something that wouldn't happen on most
other databases, where this column wouldn't act as an autoincrement by
default. If you run it on Postgresql you get:
sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) null value in
column "MyId" violates not-null constraint
DETAIL: Failing row contains (null).
[SQL: 'INSERT INTO "Child1" DEFAULT VALUES'] [parameters: ({}, {}, {},
{}, {}, {}, {}, {} ... displaying 10 of 1000 total bound parameter sets
... {}, {})]
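side note, since SQLite's lack of FK enforcement is part of what lets this
slip through silently: if you do want SQLite to enforce FOREIGN KEY
constraints in general, the usual recipe is to turn the pragma on
per-connection (it won't reproduce the Postgresql NOT NULL failure above,
it's just the standard way to get FK enforcement on SQLite):

from sqlalchemy import event
from sqlalchemy.engine import Engine

@event.listens_for(Engine, "connect")
def _set_sqlite_pragma(dbapi_connection, connection_record):
    # SQLite only enforces FOREIGN KEY constraints when this pragma
    # is switched on for the connection
    cursor = dbapi_connection.cursor()
    cursor.execute("PRAGMA foreign_keys=ON")
    cursor.close()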
What's happening here is documented; however, the documentation for the
bulk operations is a little long, and the part referring to this use case
might benefit from a little more boldface, and probably should be listed
more clearly as "will not work" instead of with a somewhat casual
"however". At
http://docs.sqlalchemy.org/en/rel_1_0/orm/persistence_techniques.html#orm-compatibility
(emphasis added):
Multi-table mappings, such as joined-inheritance - **however**, an
object to be inserted across multiple tables either **needs to have
primary key identifiers fully populated ahead of time**, else the
Session.bulk_save_objects.return_defaults flag must be used, which will
greatly reduce the performance benefits
what we mean here is this:
c1_many = [Child1(Id=i+1, MyId=i+1, Content="c1inst_%d" % i)
           for i in range(0, 1000)]
c2_many = [Child2(Id=i+1001, MyId=i+1001, Content="c2inst_%d" % i)
           for i in range(0, 1000)]
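the other route the quoted paragraph mentions, if you can't assign the ids
yourself, is the return_defaults flag; per that paragraph it works for the
multi-table case but gives back much of the performance win, since the newly
generated Entity.Id values have to be fetched. roughly (assuming a session
object as in your script's __main__):

# alternative sketch: let the database generate Entity.Id and have the
# ORM fetch it back so Child1.MyId / Child2.MyId can be populated;
# as the docs note, this greatly reduces the bulk-insert speedup
c1_many = [Child1(Content="c1inst_%d" % i) for i in range(0, 1000)]
c2_many = [Child2(Content="c2inst_%d" % i) for i in range(0, 1000)]
session.bulk_save_objects(c1_many + c2_many, return_defaults=True)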
In SQLAlchemy 1.1, things are much easier to spot, even if you're using
a non-FK/non-autoincrement enforcing database like SQLite; running this
program immediately catches the problem on the Python side:
sqlalchemy.exc.CompileError: Column 'Child1.MyId' is marked as a
member of the primary key for table 'Child1', but has no Python-side or
server-side default generator indicated, nor does it indicate
'autoincrement=True' or 'nullable=True', and no explicit value is
passed. Primary key columns typically may not store NULL.
This is because 1.1 has changed the logic of the "autoincrement" flag
and added deeper checks for NULL primary key values, as described at
http://docs.sqlalchemy.org/en/latest/changelog/migration_11.html#the-autoincrement-directive-is-no-longer-implicitly-enabled-for-a-composite-primary-key-column.
So the mitigations for the fact that your specific test case silently
fails include:
1. this whole issue only silently passes on SQLite, not on any of the
higher-volume databases where you'd want to use bulk operations in the
first place
2. the documentation here should be spruced up to list this practice as a
**warning**, including a boldface note up in the earlier paragraph about
the fetching of inserted primary keys being disabled (that fetch is the
slowest part of the INSERT, so it has no place within bulk inserts; hence
you must populate columns dependent on a PK up front, which means the PK
itself needs to be populated up front in those cases where you need it;
see the sketch after this list)
3. SQLAlchemy 1.1 won't let these INSERTs proceed without a primary key
value when the column is not configured as an "autoincrement"
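to put that together with your test case, here's roughly what the working
pattern looks like (your class names; the engine/session setup is just my
guess at what the elided __main__ block below does):

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# primary keys assigned up front on both the base table side (Id) and
# the child table side (MyId), so no server-generated value is needed
c1_many = [Child1(Id=i + 1, MyId=i + 1, Content="c1inst_%d" % i)
           for i in range(0, 1000)]
c2_many = [Child2(Id=i + 1001, MyId=i + 1001, Content="c2inst_%d" % i)
           for i in range(0, 1000)]

session.bulk_save_objects(c1_many)
session.bulk_save_objects(c2_many)
session.commit()

# a row written as Child2 now comes back as Child2
assert isinstance(session.query(Entity).get(1001), Child2)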
Thanks for the clear test case here.
>
> The following code demonstrates the problem:
>
> #!/usr/bin/env python3
> # -*- coding: utf-8 -*-
>
> from sqlalchemy import create_engine
> from sqlalchemy import Column, Integer, SmallInteger, String, ForeignKey
> from sqlalchemy.orm import sessionmaker
> from sqlalchemy.ext.declarative import declarative_base
>
> Base = declarative_base()
>
> class Entity(Base):
>     __tablename__ = 'Entity'
>     Id = Column(Integer, primary_key=True, nullable=False)
>     Content = Column(String)
>     _polytype = Column(SmallInteger, nullable=False)
>
>     __mapper_args__ = {
>         'polymorphic_identity': 1,
>         'polymorphic_on': _polytype
>     }
>
> class Child1(Entity):
>     __tablename__ = 'Child1'
>     MyId = Column(ForeignKey("Entity.Id"), primary_key=True)
>     __mapper_args__ = {'polymorphic_identity': 11}
>
> class Child2(Entity):
>     __tablename__ = 'Child2'
>     MyId = Column(ForeignKey("Entity.Id"), primary_key=True)
>     __mapper_args__ = {'polymorphic_identity': 12}
>
>
> if __name__ == '__main__':