removing entity_name in 0.5


Michael Bayer

Jul 27, 2008, 4:40:21 PM
to sqlalchemy
hey list -

I've hinted at this before, and today I took a last look over the
current "entity_name" implementation in 0.5, since I'd like either a
completion or total removal of this feature in the final release
(which has to be in the next month or so). I've come up with some
alternative implementations, but with all of them, the fundamental
flaw of entity_name remains present, and prevents any implementation
from making sense completely. The usefulness of entity_name is only
present within very specific guidelines, which with 0.5's more
prominent class-oriented behavior become that much more restricted.
There's also no entity_name use case which isn't addressed more simply
using multiple classes.

First, some history. entity_name popped up in SQLAlchemy sometime
around the 0.1 or 0.2 series, back when I was the sole creator of SA,
we didn't have a lot of users indicating what worked and what didn't,
and a lot of SA's features were created in response to their presence
on Hibernate, the widely used Java ORM. entity_name is straight out
of Hibernate's featureset, the idea being that a single class can be
mapped to multiple tables. Hibernate's central use case for this
feature is described here:
http://www.hibernate.org/hib_docs/v3/reference/en/html/persistent-classes.html#persistent-classes-dynamicmodels
You can see it's still tagged as "experimental", with lots of
caveats about "expect many type errors" and such.

Within SQLA, entity_name is one of two methods to create multiple
mappers for the same class. In contrast to "non_primary", which
creates a load-only mapper which requires that there be a primary
"persistence" mapper already defined, entity_name seeks to define
multiple "persistence" mappers with no requirement of any "primary"
non-entity-name mapper. What it amounts to is that the persistence
strategy of a class instance is not decided until the instance is
associated with a Session.

In Hibernate, you are responsible for explicitly defining your class
and all of its accessor methods fully, including scalar values, object
references and collections. Hibernate then uses its XML-configured
mappings to "instrument" the user defined class using various forms of
proxies, some of which are instrumented directly within bytecode, but
these instrumentations define very little of the class' behavior;
you've already defined that by using explicitly typed methods which
return things like ints, Sets, Maps, etc. You also define getter/
setters for everything, and you have to determine the method by which
all data is ultimately associated with an instance (i.e. all the
instance variables). This is what people mean when they talk about
"XML pushups".

SQLAlchemy, because it's Python, doesn't have to jump through those
kinds of hoops - we can just use Python descriptors (i.e. like
property) to make each accessor on the class behave however we want at
mapper definition time, and duplicate specification of attributes and
collection behaviors is not necessary, nor are explicit "getter" and
"setter" methods. The behavior of attributes is determined directly
by the mapping definition, and user-defined explicitness regarding
behavior is only an option (i.e. descriptors, "synonym"). So while
both Hibernate and SQLAlchemy affect class and instance behavior via
the persistence strategy, in SQLA and most other Python ORMs it's to a
much greater degree. If it weren't, the level of explicitness
required to use the ORM would be overly cumbersome.
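
As a rough illustration of the descriptor point, here is a toy sketch in plain Python. It is not SQLAlchemy's actual instrumentation; ColumnAttribute and toy_mapper are made-up names, and the change tracking is only an assumption about what a mapping might want to do:

```python
class ColumnAttribute:
    """Toy descriptor standing in for an ORM-instrumented attribute."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class-level access returns the descriptor itself
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        # Remember which attributes were touched, then store the value.
        instance.__dict__.setdefault("_modified", set()).add(self.name)
        instance.__dict__[self.name] = value


def toy_mapper(cls, column_names):
    """Hypothetical stand-in for mapper(): installs descriptors on cls."""
    for name in column_names:
        attr = ColumnAttribute()
        # setattr() after class creation does not call __set_name__ for us.
        attr.__set_name__(cls, name)
        setattr(cls, name, attr)
    return cls


class Record:
    pass  # no getters, setters or attribute declarations needed


toy_mapper(Record, ["data", "status"])

r = Record()
r.data = "some record"
```

The class body stays empty; everything about how "data" and "status" behave comes from the mapping step, which is why the mapping and the class behavior are so tightly coupled.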

So, since the persistence strategy determined by the mapping ties
directly to class and instance behavior, the entity name feature
amounts to nothing more than "late typing" of an instance, with the
added anti-bonus of the instance existing in an ambiguously typed
state before the explicit "persistence" step is taken. The fact that
entity_name introduces an inherent implicitness which says "we're not
entirely sure what class this really is" leads to all kinds of broken
scenarios, and the fact that it's an exotic way of avoiding the usage
of distinct classes means it's essentially a redundant, more implicit
way of doing something which is easily achieved explicitly.

For example, we have a class Record, which can be stored in one of two
tables. So we map Record twice using two different entity names:

mapper(Record, table1, entity_name="t1")
mapper(Record, table2, entity_name="t2")

This seems straightforward, and we'd expect that we can now deal with
an instance of Record as well as its class, specifying entity_name
only at the point of persistence:

r1 = Record("some record")
session.save(r1, entity_name="t1")
session.commit()

But, suppose Record is mapped like this:

t1_mapper = mapper(Record, table1, properties={
    'subrec':relation(SubRec)
}, entity_name="t1")

t2_mapper = mapper(Record, table2, entity_name="t2")

Now when we say "r1 = Record()", do we have a "subrec" collection or
not? It's entirely ambiguous. There's an unlimited number of
ambiguous scenarios here, including totally different sets of
relations on each mapper, relations with the same name but different
collection/object types, column-based relations with different
behavior, etc. If you were to experiment with conundrums like these
in 0.4, you'd find that the behavior is more or less arbitrary.
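
To make the ambiguity concrete, here is a toy sketch in plain Python. It is not SQLAlchemy code: the registry, toy_mapper and promised_attributes are hypothetical names, and the "data" and "status" attributes are assumed to exist on both tables purely for illustration:

```python
# Toy registry, NOT SQLAlchemy: (class, entity_name) -> attribute names.
mappings = {}


def toy_mapper(cls, attributes, entity_name):
    """Hypothetical stand-in for mapper(..., entity_name=...)."""
    mappings[(cls, entity_name)] = set(attributes)


def promised_attributes(cls):
    """Split attributes into those every mapping agrees on and the rest."""
    candidates = [attrs for (c, _name), attrs in mappings.items() if c is cls]
    if not candidates:
        return set(), set()
    certain = set.intersection(*candidates)
    ambiguous = set.union(*candidates) - certain
    return certain, ambiguous


class Record:
    pass


toy_mapper(Record, ["data", "status", "subrec"], entity_name="t1")
toy_mapper(Record, ["data", "status"], entity_name="t2")

certain, ambiguous = promised_attributes(Record)
```

Until an entity name is chosen, a fresh Record() can only promise the attributes in "certain"; anything in "ambiguous", such as "subrec", may or may not be there.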

The issues become worse when we look at the behavior of cascades,
which are another way that instances are associated with a session:

mapper(Parent, sometable, properties={
    'r1s':relation(t1_mapper),
    'r2s':relation(t2_mapper)
})

Parent can reference a collection of Record objects via its "r1s"
collection, in which case "t1" persistence is desired, or "r2s", in
which case "t2" persistence is desired. So the following breakage
occurs readily:

p1 = Parent()
p2 = Parent()
r1 = Record()
p1.r1s.append(r1)
p2.r2s.append(r1)

Which is a paradox, since r1 can't be mapped to both "table1" and
"table2" at the same time! The Session needs to raise an error on
the second "append()" since "r1" would already be persisted under the
"t1" entity_name.
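
The kind of check the Session would need can be sketched in plain Python. This is a toy, not the real Session: ToySession, attach and EntityNameConflict are made-up names, and attach() stands in for whatever a cascade on "r1s" or "r2s" would do behind the scenes:

```python
class EntityNameConflict(Exception):
    """Raised when an instance is claimed under a second entity name."""


class ToySession:
    def __init__(self):
        # id(instance) -> (instance, entity_name); the instance is kept
        # alongside so its id cannot be reused while we track it.
        self._entity_names = {}

    def attach(self, instance, entity_name):
        existing = self._entity_names.get(id(instance))
        if existing is not None and existing[1] != entity_name:
            raise EntityNameConflict(
                "instance already attached as %r, cannot attach as %r"
                % (existing[1], entity_name)
            )
        self._entity_names[id(instance)] = (instance, entity_name)


class Record:
    pass


session = ToySession()
r1 = Record()
session.attach(r1, "t1")      # what p1.r1s.append(r1) would imply

try:
    session.attach(r1, "t2")  # what p2.r2s.append(r1) would imply
    conflict_raised = False
except EntityNameConflict:
    conflict_raised = True
```

Note that the error only shows up at the second append, far from the place where the object was created, which is part of what makes the feature awkward.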

The 0.4 series more or less ignored issues like these, but in 0.5 they
become impossible to ignore, since now we want to say:

session.query(Record.data, Record.status).filter(
    Record.subrec.contains(foo)).all()

We have a much greater reliance on class-bound attributes as
persistence descriptors, as in 0.5 they have distinctly different
meaning than a plain Table-bound column does. Entity name totally
gets in the way here, and requires something along these lines:

r1_entity = entity(Record, "t1")
session.query(r1_entity.data, r1_entity.status).filter(
    r1_entity.subrec.contains(foo)).all()

Above, it's clear that an entity_name mapped class is useless for
describing the persistence of a class, and an explicit token tied to a
particular mapping is needed in any case.

To which any sane observer should be crying out now, "why aren't you
just using distinct classes for this?" The answer is we should.
The only difference between when we use explicit classes and when we
use entity name is one of "I don't want to decide yet what type this
object is". It's the difference between:

r1 = Record()
sess.save(r1, entity_name="t1") # decide that Record is a T1Record

and:

r1 = T1Record() # decide that Record is a T1Record
sess.save(r1)

The first example, which uses "late typing", is entirely possible
without the ORM being involved. You can build a conversion function:

def make_t1_record(rec):
    return T1Record(rec.data, rec.status)

r1 = Record()
sess.save(make_t1_record(r1))

Or you can go deeper, and change the __class__ (you'd want to abstract
this out more, but this is the general idea):

r1 = Record()
r1.__class__ = T1Record
sess.save(r1)

Either of the above is fine with SQLA (the second example needs us to
modify a particular assertion within the session code, but that's
trivial). But more importantly they move the issue of 'deciding the
type of an object' out into the application space and away from the
ORM. The "ideal" of an instance existing in complete isolation from
its persistence is one of the ideals SQLAlchemy (as well as Hibernate)
was founded upon, but this is usually not entirely true except for the
simplest of cases, and even Hibernate's documentation suggests this. For
a feature to be a core SA feature, and not just a recipe, it has to
"scale up" to the non-simple cases, which is something entity_name has
never really been able to do.
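
Both application-side approaches can be tried without any ORM at all. The following is a minimal sketch in plain Python, assuming Record takes "data" and "status" in its constructor and that T1Record and T2Record are simple subclasses; retype() is a hypothetical helper name, and none of this touches mapper instrumentation:

```python
class Record:
    def __init__(self, data=None, status=None):
        self.data = data
        self.status = status


class T1Record(Record):
    """Would be mapped to table1."""


class T2Record(Record):
    """Would be mapped to table2."""


def make_t1_record(rec):
    # Conversion: build a new, explicitly typed object from the generic one.
    return T1Record(rec.data, rec.status)


def retype(rec, target_cls):
    # "Late typing" in application space: narrow the instance's class
    # in place, but only to a subclass of what it already is.
    if not issubclass(target_cls, type(rec)):
        raise TypeError("%s is not a subclass of %s"
                        % (target_cls.__name__, type(rec).__name__))
    rec.__class__ = target_cls
    return rec


generic = Record("some record", "new")
converted = make_t1_record(generic)   # generic itself stays a Record

other = Record("another record", "new")
retype(other, T2Record)               # same object, now a T2Record
```

Either way the decision about which table an object belongs to is made by an explicit line of application code, and by the time the object reaches the session its type is no longer in question.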

So, removing entity_name entirely for 0.5.0 is my plan at the
moment. Just like the Python 2.3 removal (which is done), this is a
last call for appeals/suggestions/etc regarding entity_name, as I'm
going to start on a removal branch (which will greatly simplify the
instrumentation internals, too).

a...@svilendobrev.com

Jul 28, 2008, 5:21:14 AM
to sqlal...@googlegroups.com
if u're targeting the ORM to be consistent, it would mean any feature
that cannot withstand an D/A\B\C inheritance and its usage/variants
(esp. relations) has to either get fixed - or removed - or explicitly
warned about its inabilities.

entity-naming seems like one - it carries some implicitness that looks
like (sub)typing. i'm thinking if its gone, how one object can be
persisted in two completely separate ways (different schemas etc) but
maybe there is a way. This usecase is not really a subtyping,
although it probably could be represented as such -- the main app.
using only the base, while different persistency-dealers subclass
on-the-run into their own sub-representation - or make copies.
Conversion from one db-repr to another should go through base...
Something like each persistency-dealer having its own variant of
whole main class-hierarchy... entity_name does this implicitly,
right?

another one that comes to mind is concrete-table inheritance; it works
in simple cases and breaks in more complex ones. While using dbcook i
have invented some way of avoiding it, by having base-classes that
define attributes/relations but are not ORMapped
(DBCOOK_no_mapping=True) - but then the ORM has no idea of that
inheritance _at_all. the other way is "dont use", use
joined-table-inheritance.
do u have any plans for fixing that? i'm not really needing it - this
is for consistency.

anything else similar?

svilen

Michael Bayer

Jul 28, 2008, 9:59:31 AM
to sqlal...@googlegroups.com

On Jul 28, 2008, at 5:21 AM, a...@svilendobrev.com wrote:

> another one that comes to mind is concrete-table inheritance; it works
> in simple cases and breaks in more complex ones. While using dbcook i
> have invented some way of avoiding it, by having base-classes that
> define attributes/relations but are not ORMapped
> (DBCOOK_no_mapping=True) - but then the ORM has no idea of that
> inheritance _at_all. the other way is "dont use", use
> joined-table-inheritance.
> do u have any plans for fixing that? i'm not really needing it - this
> is for consistency.

concrete should always be improved as needed. unlike entity_name, it
is a real use case without any simpler substitute. i dont think
concrete's issues are as severe, either.


jason kirtland

Aug 2, 2008, 5:39:47 PM
to sqlal...@googlegroups.com


+1 on removing it. The basic aims can be achieved reasonably sanely by
defining additional classes on the user side. To my mind, the big
drawback is that the multi-class approach does not help people mapping
onto existing domain objects. entity_name did promise to fulfill the
data mapper pattern in that use case and allow persistence behavior to
be controlled externally. But having been fairly deep in the guts on
this myself I don't see this particular approach ever getting as solid
as I might like.
