Seems like I'm getting a grip on what's really going to be different
in 0.4. I think it would be a good idea for me to put out
there some of the things I want to remove, as well as a few notable
backwards-incompatible changes, just to give a heads up. Note that
this list covers only things that you'll *have* to do in order to use
0.4... I'm not including things that are deprecated but will still
work throughout 0.4.
1. import structure
The biggest thing, and I'm not sure if people are ready for this one,
is separating "sqlalchemy" from "sqlalchemy.orm". It's been the case
for a long time that you can do your imports like this:
from sqlalchemy import *
from sqlalchemy.orm import *
and obviously you can import the specific classes explicitly, i.e.
from sqlalchemy import Table, Column
from sqlalchemy.orm import mapper, relation, backref
In 0.4, "sqlalchemy" will no longer pull the whole contents of
"sqlalchemy.orm" into its namespace. This means that to use mappers,
you *have* to import from the "sqlalchemy.orm" package explicitly as
above. This is partially to raise awareness of the fact that there
is a pretty strict separation between the two packages, and to
encourage better organization of concerns. So if you use
mapper(), relation(), backref(), create_session(), eagerload()/
lazyload(), you have to import them from sqlalchemy.orm. This
upcoming change has been mentioned on the tutorial page for several
months now, too.
What the frameworks and such can do *right now* is to start
importing as appropriate from 'sqlalchemy.orm' for object-relational
functionality. That way this won't present an issue with an 0.4
upgrade. Like I mentioned, if there's really some severe issue with
this, we can perhaps come up with a hack to "backfill"
'sqlalchemy.orm' into 'sqlalchemy' for some software package that
can't be updated, but I really want to try to get just this one
little cleanup of concerns out there.
2. assignmapper query methods
The next biggest thing is assignmapper. OK, I'm not taking
assignmapper away. But I am going to change the interface. All
querying will be available off of a single attribute, "query". Most
likely the parentheses (i.e. class.query()) will not be needed. So:
MyClass.query.filter_by(street='123 green street').all()
This is because we can't just keep putting every single method from
Query on the mapped class. Plus, with Query being generative, it
makes even less sense for any of the methods to be off of the class
directly. I'd like to just take all the other select(), select_by()
methods off of it, because I really want people to stop using them.
You can start using MyClass.query() right now, which will still work
with the parens in 0.4, and in 0.3.9 I'll try to get
MyClass.query.<foo> to work as well, so you can be totally forwards
compatible with 0.3.9.
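For the curious, a no-parens MyClass.query attribute can be built with
an ordinary descriptor. Here is a plain-Python sketch of the idea (all
names here are hypothetical illustrations, not SQLAlchemy's actual
implementation):

```python
class Query:
    """Toy stand-in for a generative query bound to a mapped class."""
    def __init__(self, cls, criteria=None):
        self.cls = cls
        self.criteria = dict(criteria or {})

    def filter_by(self, **kwargs):
        # generative: return a NEW Query instead of mutating this one
        merged = dict(self.criteria)
        merged.update(kwargs)
        return Query(self.cls, merged)

    def all(self):
        # a real implementation would run SQL; this sketch just
        # demonstrates the attribute-access pattern
        return []


class QueryProperty:
    """Descriptor that builds a fresh Query on each attribute access,
    which is what makes MyClass.query (no parentheses) possible."""
    def __get__(self, instance, owner):
        return Query(owner)


class MyClass:
    query = QueryProperty()
```

With this, MyClass.query.filter_by(street='123 green street').all()
works exactly as written above, with no call parens on "query".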
3. assignmapper myobject.flush()
This is the other thing I really want to get rid of on assignmapper.
It is the most overused and anti-patternish thing out there. It
doesn't predictably handle the objects attached to the flushed
instance (not for any strong technical reason, just that it's a
complicated case which I'd rather not have to bother with), and it
works against the kinds of patterns the Session is intended to be
used for. You can still flush an individual instance or a group of
instances, which is valid in certain cases, by calling
session.flush([<objects>])... but that ensures that you really mean
to do it.
4. global_connect() / default_metadata
This also covers the ability to say "Table('sometable', Column(...)...)"
etc. without using any MetaData. This one, I know, is going to raise
some ire. But I look at it this way: someday, Guido is going to
take a look at SQLAlchemy, and when that day comes, I don't want there
to be a trace that this ugly thing ever existed... it screams "SA can't
decide how its API should look". Plus, it used DynamicMetaData, which
is a totally misunderstood object that isn't going away but will be
much more downplayed. You're responsible for your own MetaData object
and for telling your Table objects about it.
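The direction, sketched in plain Python rather than actual SQLAlchemy
code (the class names below are made up purely for illustration):
tables no longer fall back to a hidden default collection; every table
names its owner explicitly.

```python
class TableCollection:
    """Stand-in for an explicitly owned MetaData-style registry."""
    def __init__(self):
        self.tables = {}


class TableDef:
    """Stand-in for Table: registration requires an explicit owner,
    so there is no module-level default to silently attach to."""
    def __init__(self, name, owner):
        self.name = name
        owner.tables[name] = self


meta = TableCollection()             # you create and own this object
users = TableDef('sometable', meta)  # and every table says so up front
```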
5. clear_mapper()
Note this is *not* clear_mappers(), which is a pretty important
function. This one, the ability to clear just *one* mapper, is going
away... because this capability never really existed anyway. The total
set of mappers for your application organize themselves into their
own symbiotic ecosystem... splicing just one out of it is a
configurational game of Jenga you'll never win. I don't think this
was a very commonly used function.
Taking more of a stroll down memory lane, see if you remember these
tunes:
6. cascade_mappers()
If you've been around long enough to know what this one does, I
commend you. But if you've then not been paying attention enough to
know that I hate this function, shame on you! This thing just
sucks. Folks are free to copy it into their own library of mediocre
functions if they actually use it for something.
7. query.select_by_whatever('something')
Yeah, back when we thought ActiveRecord was cool. Or at least
someone told me it was. Anyway, I don't think this one is too common.
8. table.select().execute(col1=5, col2=7) == "SELECT * FROM TABLE
WHERE col1=5 AND col2=7".
I am fairly certain that nobody knows what I am talking about here,
since this one was never documented. If you think it's cool, too bad:
you missed the past two years of enjoying this doomed trick.
9. ProxyEngine
This is a feature that I *like*; it's just that it hasn't worked for
about 18 months. The ProxyEngine *will* be back at some later date,
in a new form, but for now it's offline.
10. SelectResults (sort of)
If you're using SelectResults, it will still import and act *mostly*
the same as it used to... but it's just a placeholder now that does
almost nothing, since Query has all of its functionality. So the
behavior of join() changes slightly (see #13 below).
11. everything in sqlalchemy.mods
I just realized there's nothing in here we should be keeping. This
would include "sqlalchemy.mods.threadlocal", which is evil, and the
never-used "legacy_session" and "selectresults" mods. The
selectresults mod is particularly unneeded because, as in #10, all of
SelectResults is on Query now.
Next up are some things that are not going away but are going to
change backwards-incompatibly:
12. select.order_by() and select.group_by() (i.e. on the SQL select()
construct) are going to be generative, i.e. they return a new select
object. The old in-place behavior will be available as
select.append_order_by() and select.append_group_by() (yes, I decided
to keep both generative and in-place APIs for this, but we're going
to push the generative API and leave the "append_" versions just in
docstrings).
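What "generative" means here, as a minimal plain-Python sketch
(hypothetical class, not the real select() construct): each call hands
back a new statement object, leaving the original untouched.

```python
class SelectSketch:
    """Toy model of a select() with generative and in-place variants."""
    def __init__(self, order=(), group=()):
        self.order = tuple(order)
        self.group = tuple(group)

    def order_by(self, *cols):
        # generative: returns a NEW object; self is unchanged
        return SelectSketch(self.order + cols, self.group)

    def group_by(self, *cols):
        return SelectSketch(self.order, self.group + cols)

    def append_order_by(self, *cols):
        # in-place variant, mirroring select.append_order_by()
        self.order = self.order + cols
```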
13. Also slightly backwards-incompatible: session.query(A).join
('b').join('c') in 0.3 produces "select * from a join b on a.id=b.id
join c on b.id=c.id". In 0.4, both 'b' and 'c' will be interpreted
as attributes on 'A', as in "select * from a join b on a.id=b.id join
c on a.id=c.id". To get a join from A->b->c, do session.query(A).join
(['b', 'c']). This new way allows you to re-join from the root as
often as you like. If you need totally custom joins, use
query.select_from(<custom join objects>). There is also a cool
"aliased" version of joins you can do in query.filter_by().
14. On the subject of filter_by(): filter_by(*clauseelements,
**kwargs) becomes just filter_by(**kwargs). You put the Table/
Column expressions in a filter() call. The deprecated select_by()/
count_by()/etc. will still be there and will allow the old style.
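The join-path change in #13 above is easiest to see by computing the
resulting ON-pair chains. Here is a small illustrative helper (not
SQLAlchemy code) modeling the 0.4 rules described there: each join()
call restarts at the root entity, and a list argument chains through
intermediate relations.

```python
def join_pairs(root, join_calls):
    """Compute (left, right) join pairs under the 0.4 semantics.
    Each element of join_calls is the argument to one join() call:
    a single relation name, or a list of names."""
    pairs = []
    for arg in join_calls:
        path = arg if isinstance(arg, list) else [arg]
        left = root            # every join() call restarts at the root
        for rel in path:
            pairs.append((left, rel))
            left = rel         # a list chains onward within one call
    return pairs

# 0.4: query(A).join('b').join('c') -> both joined from A:
#   join_pairs('a', ['b', 'c'])   == [('a', 'b'), ('a', 'c')]
# 0.4: query(A).join(['b', 'c'])  -> an A->b->c chain:
#   join_pairs('a', [['b', 'c']]) == [('a', 'b'), ('b', 'c')]
```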
15. Custom dictionary collections are now much more powerful, at the
expense of being slightly more finicky (i.e.
collection_class=MyDict). The quickest way to make your custom
dictionary class compatible is to either subclass dict, or add the
mixin class sqlalchemy.orm.collections.MappedCollection (i.e. class
MyDict(MappedCollection):). The __iter__() method goes back to
normal (returns keys), and you also need to implement a remove()
method. We'll see if we can produce a forwards-compatible emulation
of this in 0.3.9.
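A rough sketch of what a compatible keyed collection might look like
(plain Python; the real MappedCollection mixin has its own
instrumentation, so treat these names as illustrative only):

```python
class KeyedCollection(dict):
    """dict subclass in the spirit described above: __iter__ stays
    dict-normal (yields keys), and a remove() method is provided."""
    def __init__(self, keyfunc):
        dict.__init__(self)
        self.keyfunc = keyfunc

    def add(self, value):
        # key each value by whatever attribute keyfunc extracts
        self[self.keyfunc(value)] = value

    def remove(self, value):
        del self[self.keyfunc(value)]
```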
...and there you have it, all I can think of for now.
I've been reading all the changes; it all sounds like a good cleanup
of the current SQLAlchemy. I don't have concerns about the
modifications; I mostly want to comment on point #2, assignmapper.
Moving everything under query() really helps against conflicting
names that you may give to your class methods/attributes.
About #3, I did start playing with myobj.flush(), but it crashed
sometimes because I wasn't flushing the right objects in the right
order. It was just a pain. I can even remember some situation (in a
for loop, AFAIR) where I just couldn't figure out how to flush
objects correctly. So I just session.flush() all of it, and SA does
the work for me.
I really don't like #7.
I like what you'll be doing on #13.
It's nice seeing such effort. Thanks Mike for this great toolkit.
Regards,
--
Alexandre CONRAD
Michael Bayer wrote:
Your proposal sounds pretty sensible; in fact, I don't use most of
those features. Just one concern:
> 2. assignmapper query methods
Are the basics like select and select_by still going to be available
directly? I hope so, as I use them extensively.
Paul
select and select_by are exactly the methods that won't be directly
off the class; they will be available as
class.query.select_by(whatever). However, select() and select_by()
are deprecated throughout 0.4, in favor of filter()/filter_by() and
other generative methods... they'll be gone by 0.5.
> 4. global_connect() / default_metadata
(...)
> (...) Plus it used DynamicMetaData which
> is a totally misunderstood object that isnt going away but will be
> much more downplayed. youre responsible for your own MetaData object
> and telling your Table objects about it.
Just wondering: I've noticed you've made similar DynamicMetaData
comments in other responses in the past. Downplayed or not, do you
consider it deprecated in any way? For me it seems the only natural
way to set up my metadata.
For example, my server has a schema.py module establishing the table
structure, but which at module import time has no idea where the
database engine is going to be. So the metadata in the schema.py
module is a DynamicMetaData instance.
Elsewhere, startup code determines the database location, creates the
engine referencing it, and then connects the metadata instance in the
already-loaded schema module to that engine.
Any other approach would seem to imply knowing the engine URL at the
point when my schema module is being imported (so it can be used to
create the metadata).
Is this what you consider "totally misunderstood"?
-- David
I have exactly the same architecture and exactly the same question.
In fact, my goal in Mailman 3 is to be able to let sites configure
the system to use any supported database backend, just by tweaking
the configuration variable that specifies the engine url. We'll ship
with SQLite, but it would be awesome if I didn't have to do anything
else to 'automatically' support PostgreSQL or MySQL, etc. Although I
haven't tried it with these other backends, it currently works great
with alternative SQLite database file locations (such as the tempfile
one I use during a test suite run).
Note that I'm using Elixir, but I don't think that should matter.
I really hope the feature is retained. I'd actually be surprised if
it went away, because it seems like such a huge win for SA.
-Barry
DynamicMetaData is not deprecated, but it may be renamed in 0.4 to
clarify its role. For this setup, a plain MetaData will suffice: it
is "dynamic" in the sense of being late-binding and re-bindable.
After you get your engine sorted out, you can connect() the engine
and metadata just as you do now. The binding will take effect
process-wide.
DynamicMetaData works just the same, but the binding is late AND
scoped per-thread. It's fairly rare to need that, except...
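The scoping difference can be sketched in plain Python (hypothetical
classes, purely to show where the bound engine lives): a process-wide
late bind is one shared slot, while the DynamicMetaData-style bind
lives in thread-local storage.

```python
import threading


class ProcessWideBind:
    """One slot, visible to every thread (plain MetaData style)."""
    def __init__(self):
        self.engine = None


class PerThreadBind:
    """Late binding scoped per-thread (DynamicMetaData style)."""
    def __init__(self):
        self._local = threading.local()

    @property
    def engine(self):
        # each thread sees only what it bound itself
        return getattr(self._local, 'engine', None)

    @engine.setter
    def engine(self, value):
        self._local.engine = value
```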
Barry wrote:
> I have exactly the same architecture and exactly the same
> question. In fact, my goal in Mailman 3 is to be able to let
> sites configure the system to use any supported database backend,
> just by tweaking the configuration variable that specifies the
> engine url. (...)
The regular MetaData gives you this configuration flexibility for a
given installation.
If you want to support simultaneous and distinct 'configurations'
within a threaded process, each with its own set of database tables
(possibly mixing backends as well), then a DMD is perfect. Every
thread connects the DMD to the engine of its choice before work
starts.
In a Mailman context I could imagine a single fat worker process at
an ISP that serviced lots of domains, each "owned" by a different
user with separate data storage.
-j
> DynamicMetaData is not deprecated, but it may be renamed in 0.4 to
> clarify its role. For this setup, a MetaData will suffice- it is
> "dynamic" as in late binding and re-bindable. After you get your
> engine sorted out, you can connect() the engine and metadata just
> as you do now. The binding will take effect process-wide.
Ah, that's definitely a misunderstanding on my part (and is perhaps
the sort of thing Michael was referring to). I don't need
thread-specific connections, but I had mentally linked the ability to
use connect() at all with DynamicMetaData.
In re-reading the documentation with your comments in mind, it's clear
that the docs do mention connecting a normal MetaData object, but the
examples (and those in the source tree) tend to use BoundMetaData or
DynamicMetaData, so I think I probably mentally excluded a plain
MetaData as an option. The fact that the discussion of the single
global MetaData object also uses DynamicMetaData probably didn't help
my mental picture. Of course, my bad for not also checking out the
class itself. Perhaps having the docs show the use of connect() with
MetaData itself might be helpful to other new users.
-- David
On Jul 3, 10:00 pm, David Bolen <db3l....@gmail.com> wrote:
> In re-reading the documentation with your comments in mind, it's clear
> that the docs do mention connecting a normal MetaData object, but the
> examples (and those in the source tree) tend to use BoundMetaData or
> DynamicMetaData (...) Perhaps having the docs show the use of
> connect() with MetaData itself might be helpful to other new users.
This is all stuff that's been sorting itself out as we've gone
through the 0.3 versions. At this stage I can see that having the
name "BoundMetaData" floating around works against things being
simple, so the docs/book are going to talk mostly about "MetaData".
For DMD it seems like it would probably be a good idea to call it
ThreadLocalMetaData; that name is very specific to the one less
common use case where a single process dishes out among many
independent databases.
The pattern I'm starting to use in 0.4 with metadata replaces
'connect()' with just the 'engine' property:
meta = MetaData()
engine = create_engine(...)
meta.engine = engine
as well as:
meta = MetaData()
meta.engine = 'sqlite://'
> select and select_by are exactly the methods that won't be directly off
> the class; they will be available as class.query.select_by(whatever).
Personally, I would like the most common methods to stay: probably
select, select_by and get. The reason is that I like those operations
to be quick to code, as I use them a lot. Still, I realise you have a
lot of things to consider in defining the API, and nothing will be
perfect for everyone.
Paul
So far I really like all the new stuff, especially using the query
generator. I've got a question: what is going to be the preferred
method to replace get (or get_by)? What I've been using is
query().filter_by(something='something').list()[0], but that feels
cumbersome. From one of the previous posts it looks like one() might
be what I'm looking for, but that's obviously not in the current
release. Do you have an ETA for the 0.4 build?
Thanks for the info and a great product.
Jose
There are two huge issues with select(). The biggest is that people
confuse it with sql.select(); they say "my select isn't taking
argument X" or "it's not doing Y", and it's because they are
confusing the two.
The other is that you have more than one way to do the same thing,
and select[_by]() is the more rigid and inconsistent way. There's
query.select(whereclause, order_by=foo, group_by=foo), which becomes
query.filter(whereclause).order_by(foo).group_by(foo). But then
there is *no* query.select_by(arg1=foo, arg2=bar, order_by=foo,
group_by=foo); select_by doesn't work that way, and we tell people,
oh, do query.filter_by(**kwargs).order_by(foo).group_by(foo).select().
Too confusing.
The new interface is super clean, consistent and flexible, and with
that, we are able to add more features onto it; a cluttered interface
doesn't accept new functionality as easily.
> Dear Michael,
>
> so far I really like all the new stuff, especially using the query
> generator. I've got a question, what is going to be the preferred
> method to replace get (or get_by). What I've been using is
> query().filter_by(something='something').list()[0] but that feels
> cumbersome.
SA 0.3.8 supports query().filter_by(crit).scalar(). In 0.3.9, most
of the 0.4 query interface will be available, i.e. you'll be able to
use one(), first(), and all(). We're going to *try* to get a decent
level of "forwards compatibility" into 0.3.9.
get() itself is a special method which is remaining.
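To make those fetch methods concrete, here's a toy model (not the
real Query class; minimal hypothetical versions consistent with how
these names behave):

```python
class ResultSketch:
    """Toy result object: all() -> list of rows; first() -> first row
    or None; one() -> exactly one row or an error; scalar()
    (deprecated spelling) -> first column of the first row."""
    def __init__(self, rows):
        self._rows = list(rows)

    def all(self):
        return self._rows

    def first(self):
        return self._rows[0] if self._rows else None

    def one(self):
        if len(self._rows) != 1:
            raise ValueError('expected exactly one row, got %d'
                             % len(self._rows))
        return self._rows[0]

    def scalar(self):
        row = self.first()
        return row[0] if row is not None else None
```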
As far as an ETA: from a "new feature" perspective, most of what I
wanted to be there is implemented in the branch. We still want to
nail down some naming-convention stuff (and that is important too, so
we might get the naming conventions into 0.3.9 as an optional set of
things as well). There's also a moderate amount of bug fixes which
may or may not be done on day one. Also totally not done at all are
the updated docs, which I'm really looking forward to, because 0.4 is
going to be much more of a joy to document, due to it being more
consistent and having better answers to questions like "how do I load
polymorphically".
So for an ETA, I'd like it to go out in the next 4-6 weeks, or
perhaps sooner if things go well.
On Jul 4, 7:30 pm, Jose Galvez <jj.gal...@gmail.com> wrote:
> Thanks Michael,
> I went back and reread the "Proposal" thread and I finally get what
> scalar() does and how it is different from one(). But how would first()
> differ from scalar(), and how would all() differ from list()? At first
> blush they look like they would return the same type of query object.
> Jose
list() and scalar() get deprecated and go away in 0.5.
You'd think so, but in this case it's wrong. Query does something
distinctly different from a SQL statement, hence removing the name
overlap will help eliminate confusion over this.
The closest analogy on the SQL statement side would be to provide
all(), one(), and first() methods. But the SQL statement returns a
ResultProxy, which has fetchXXX semantics, so it doesn't really fit
so much.
The reason they're different is that the ResultProxy is a lot more
flexible in its particular situation, which is that there's no
"class" representing the structure of the row... so an index-based,
column-name-based, and column-object-based indexing system makes more
sense.
On Jul 5, 10:52 am, "Rick Morrison" <rickmorri...@gmail.com> wrote:
> But scalar() is useful on the SQL-API side for getting real scalar values
> like count(*) and etc. In this role, it functions as one would expect
> scalar() to do, getting a scalar value instead of a result set.
>
> ...or is it just the badly-named Query.scalar() that will be going away?
Query.scalar() goes away; result.scalar() stays, since yes, "scalar"
is the best word there (first column of first row).
On Jul 3, 2007, at 8:14 PM, jason kirtland wrote:
> Barry wrote:
>> I have exactly the same architecture and exactly the same
>> question. In fact, my goal in Mailman 3 is to be able to let
>> sites configure the system to use any supported database backend,
>> just by tweaking the configuration variable that specifies the
>> engine url. We'll ship with SQLite, but it would be awesome if I
>> didn't have to do anything else to 'automatically' support
>> PostgreSQL or MySQL, etc. Although I haven't tried it with these
>> other backends, it currently works great with alternative SQLite
>> database file locations (such as the tempfile one I use during a
>> test suite run).
>
> The regular MetaData gives you this configuration flexibility for a
> given installation.
Cool.
> If you want to support simultaneous and distinct 'configurations'
> within a threaded process, each with its own set of database tables
> (possibly mixing backends as well), then a DMD is perfect. Every
> thread connects the DMD to the engine of its choice before work
> starts.
>
> In a Mailman context I could imagine a single fat worker process at
> an ISP that serviced lots of domains, each "owned" by a different
> user with separate data storage.
Possibly, although I'm not thinking about that level of division.
One thing I /would/ like to be able to do is connect to a different
database for each 'storage domain' within a single process (Mailman
itself will continue to be single-threaded).
What I mean by that is that Mailman has at least three separate
related collections of data: mailing lists, users, and messages. It
should be possible to put each of those three in separate databases
using three different engine urls. The classic use case is this: say
my user database lived in my web application, but that web app was
separate from the mailing list system, and the data for the lists
lived in a separate database. It should be possible for Mailman to
get list configuration data from database B and user data from the
web app's database A. Similarly, you might want to put the message
storage in database C, say the one that your archiver used.
Of course, this means that you can't design your data model to have
foreign keys across these three storages, but that's not hard, though
you have to manage consistency at the application layer. I'm not
sure such a design is actually feasible with SQLAlchemy though; I
think it wasn't back when I actually tried to do this ages ago.
-Barry
On Jul 5, 5:46 pm, Barry Warsaw <b...@python.org> wrote:
> What I mean by that is that Mailman has at least three separate
> related collections of data: mailing lists, users, and messages. It
> should be possible to put each of those three in separate databases
> using three different engine urls. (...)
> I'm not sure such a design is actually feasible with SQLAlchemy
> though; I think it wasn't back when I actually tried to do this
> ages ago.
It's quite feasible, particularly since you are separating concerns
at the table/class level (as opposed to the row level, which is the
"sharding" thing I've been talking about). There are three general
approaches to this:
1. Use three separate MetaData objects, each bound to its
appropriate engine.
2. Don't bind your MetaData. Use an explicit Connection for every
operation, and ensure you use the right engine/connection for the
particular tables you're dealing with.
3. Similar to #2, if your app is ORM-centric: build your own
create_session() function which, after creating a new session, uses
session.bind_mapper() and/or session.bind_table() to associate each
mapper or underlying table with its appropriate engine (or
connection), then returns the session.
In all cases, when using the ORM, you just can't have any "eager
loads" across databases, obviously (since they use JOIN).
You can also combine all three methods. Binds set up explicitly on
the session will override any metadata-level binds, but things you
don't bind will fall back to what's on the table's metadata.
(see the docstring for session.get_bind() here:
http://www.sqlalchemy.org/docs/sqlalchemy_orm_session.html#docstrings_sqlalchemy.orm.session_Session
)
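That precedence ("session binds win, then fall back to the table's
metadata") can be sketched as a tiny resolution function. The names
below are illustrative stand-ins, not the real get_bind():

```python
class TableStub:
    """Minimal stand-in for a Table that knows its MetaData."""
    def __init__(self, name, metadata):
        self.name = name
        self.metadata = metadata


def get_bind(table, session_binds, metadata_binds):
    """Session-level binds (as set up via bind_table()/bind_mapper())
    take precedence; otherwise fall back to whatever engine the
    table's metadata is bound to."""
    if table in session_binds:
        return session_binds[table]
    return metadata_binds.get(table.metadata)


meta_a = object()                  # stand-in metadata, bound below
users = TableStub('users', meta_a)
```

For example, a table whose metadata is bound to one engine can be
overridden for a single session by binding the table directly.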
On Jul 5, 2007, at 6:00 PM, Michael Bayer wrote:
> On Jul 5, 5:46 pm, Barry Warsaw <b...@python.org> wrote:
>> What I mean by that is that Mailman has at least three separate
>> related collections of data: mailing lists, users, and messages. (...)
>
> its quite feasable, particularly since you are separating concerns at
> the table/class level (as opposed to the row level, which is the
> "sharding" thing ive been talking about). there are three general
> approaches to this:
Thanks very much for the information, Michael. I'm stashing this
message away for a tasty later meal. :)
-Barry