SQLAlchemy best practices

Martin

unread,

Sep 20, 2006, 12:42:40 PM9/20/06

to pylons-discuss

Hi all,

I've been playing around with pylons + SQLAlchemy for quite some time
now and things are starting to come together. However, there are some
aspects that I am still wondering about... and seeing how many
questions there are about Pylons + SQLAlchemy, it looks like others
have questions, too. So, could we maybe set up a section in the
docs/wiki about "SQLAlchemy best practices"?

Things that I am still unsure about:
1) What is the recommended place to define the SQLAlchemy-engine?
Currently, I have attached the enginge to the g-object... is this good
or not?

2) Where best to initialize the SQLAlchemy session? In the
models/__init__.py? Somewhere else? Or rather use the threadlocal-mod?
(Mike Bayer does not recommend the latter, if I understand that other
discussion thread correctly) Or rather sessioncontext-mod?

3) Better to define Tables with bound metadata or unbound metadata?

I am aware that lots of different things actually work... that's why I
am confused about which way is best. Anybody else wants to add some
questions? Maybe we can collect questions and answers here, and later
turn the results into a wiki page. Maybe Mike and Ben - knowing the
internals - can help with some of the aspects?

Cheers and thanks,
Martin

Michael Bayer

unread,

Sep 20, 2006, 1:19:42 PM9/20/06

to pylons-...@googlegroups.com

since SA is not a framework, and also because its evident that
different people have different tolerances for verbosity, SA breaks
the golden python rule of "there should only be one obvious way to do
it". by allowing options for things to happen automatically, it
breaks "explicit is better than implicit", or at least gives you the
option to break that rule. we even still haven't fixed the hardcoded
pixel sizes on the website's CSS so SA is just a general outlaw in
many regards.

anyway, its also a product of SA taking a typical python route of
starting out as super-minimal and magical, and then the users
complaining that they couldnt understand/control what was going on.

so i think its best to view SQLAlchemy from the beginning in the most
explicit way possible, and then as those concepts are understood,
options for less explicit operation can be added in. the general
graph of explicit/implicit options is:

1. engines - plain / threadlocal
2. statement execution - via Connections (engine.connect()) / via
engine.execute / via metadata
3. metadata - unbound / bound
4. sessions - via create_session / via sessioncontext or threadlocal
5. mapped objects - using session.save()/update()/etc. or plugging
sessioncontext/threadlocal into mappers so that objects are
automatically attached
6. querying - via session.query(someclass) / via assignmapper
7. the whole package - ActiveMapper / explicit mappers

for #1, the "plain" engine is probably better. the "threadlocal"
engine allows some patterns that were popular in 0.1, namely that the
explicit Connection used is tracked against the current thread so
that multiple functions can share the same connection without passing
it around, and that you can call begin/commit directly on the engine
instance without keeping track of a transaction object. those
features i think are somewhat OK but are probably not worth the
confusion, even though they are pretty transparent and dont affect much.

for #2, we are talking about these three scenarios:

conn = engine.connect()
result = conn.execute(mytable.select())

result = engine.execute(mytable.select())

result = mytable.select().execute()

its hard to say which one is better here, it depends on the structure
of the application.

#3 is sort of combined with the #2 and determines if a statement can
execute itself. it also implies that you dont need to specify an
engine explicitly with create_session. I tend to go with unbound
metadata since it feels a little more flexible.

#4 - for quick things, i use create_session, for an application, you
probably want to use a sessioncontext. or just your own threadlocal
variable, which i find is even easier if you dont need the other
things sessioncontext can do.

#5 - i prefer the more explicit route for this, so that I can choose
when/what objects get placed in the session. a lot of people prefer
the more automatic route.

#6 - im not a fan of assignmapper, but there is an overwhelming
desire among the users to have this functionality since it gives much
more of an "active-record" feel to their objects. again, i like the
explicit approach.

#7 - this is the grand ballroom of implicit behavior, the full
"active record" approach that combines Tables, Mappers, Sessions into
one big ball :). I had always intended that SA would be flexible
enough to support an active record facade, but early on I stopped
working on it so that I could focus on the more explicit API which
was hard enough to get right. I then had the idea that "active
record" would not even be needed; by having just one approach which
is slightly more verbose, we'd be more pythonic (expicit/one way to
do it), and it only involved a few extra lines of code to have the
Table/Mapper/class be separate (sessions were still implicit at that
time). But then came the repeated cries of "SA is too complicated!
too hard !", which are still alive and well today...so i let them add
in ActiveMapper to which those who like that sort of thing are
adamant of its necessity.

the SA tutorial uses bound metadata but is otherwise explicit about
sessions and such. i think it gives the best mix of things that are
well suited to be "implicit" and other things that are clearer if
they are explicit.

Alexandre CONRAD

unread,

Sep 20, 2006, 2:27:47 PM9/20/06

to pylons-...@googlegroups.com

Martin wrote:

> I've been playing around with pylons + SQLAlchemy for quite some time
> now and things are starting to come together. However, there are some
> aspects that I am still wondering about... and seeing how many
> questions there are about Pylons + SQLAlchemy, it looks like others
> have questions, too. So, could we maybe set up a section in the
> docs/wiki about "SQLAlchemy best practices"?
>
> Things that I am still unsure about:
> 1) What is the recommended place to define the SQLAlchemy-engine?
> Currently, I have attached the enginge to the g-object... is this good
> or not?
>
> 2) Where best to initialize the SQLAlchemy session? In the
> models/__init__.py? Somewhere else? Or rather use the threadlocal-mod?
> (Mike Bayer does not recommend the latter, if I understand that other
> discussion thread correctly) Or rather sessioncontext-mod?
>
> 3) Better to define Tables with bound metadata or unbound metadata?

I'm in the same case, I think there's a real lack about setting up SA
with Pylons the *explicit* way.

I started to write some "placeholding" code in files, the code does
*NOT* work, but here is how I like the idea best (cause I like things
well organized).

First, I created a "tables" file where I can hold all my SA tables.

models/my_tables.py:

=============
from sqlalchemy import *

# Clients
clients_table = Table('clients', metadata,
Column('id_client', Integer, primary_key=True),
Column('name', String(40), nullable=False),
)

# Sites
sites_table = Table('sites', metadata,
Column('id_site', Integer, primary_key=True),
Column('id_client', Integer,
ForeignKey('clients.id_client')),
Column('name', String(40), nullable=False),
)
=============

Then my python objects stored into a "models" file.

models/my_models.py:

=============
from sqlalchemy import *

class Client(object):
def __init__(self, name):
self.name = name

class Site(object):
def __init__(self, name, client_id):
self.name = name
self.name = client_id
=============

Then I'd like to map tables and my python objects together.

models/__init__.py:

=============
from sqlalchemy import *

from my_tables import clients_table, sites_table
from my_models import Client, Site

metadata = DynamicMetaData() # Gotta do something with that somewhere...

client_mapper = mapper(clients_table, Client)
site_mapper = mapper(sites_table, Site)
=============

This is where I'm not sure how to plug things together. I'd like to
setup database connection on every request. I followed "Connecting to
the Database" from QuickWiki
(http://pylonshq.com/docs/0.9.2/quick_wiki.html#connecting-to-the-database)
and adapted it the explicit way
(http://groups.google.com/group/pylons-discuss/browse_thread/thread/35d2e633c534c56c/64ace94f78514ee2).
But here we have a problem. When I have my import statments for the
tables, it's gonna complains about "metadata" as not defined. But I just
read on the SA site about "Using the global Metadata object"
(http://www.sqlalchemy.org/docs/metadata.myt#metadata_tables_using).
This might be the trick for it. But we're for sure less explicit. Gotta
dig that out. It might be obvious but I'm still new to that.

But basicly, that's how I see things best. I don't want everything
inside the models/__init__.py file because bigger projects might take up
some pages and I don't want to have it all-in-one-file-long-file. It
might not be the right way to do it, but I hope it is. :)

Regards,
--
Alexandre CONRAD

Michael Bayer

unread,

Sep 20, 2006, 2:34:41 PM9/20/06

to pylons-...@googlegroups.com

On Sep 20, 2006, at 2:27 PM, Alexandre CONRAD wrote:

>
> I'm in the same case, I think there's a real lack about setting up SA
> with Pylons the *explicit* way.
>

its because the pylons examples (as well as a lot of the turbogears
stuff) was all figured out during sqlalchemy 0.1, which was only the
first 5 months or so of SA even being released. you now have this
situation where new users come in and learn the 0.2 style from the
tutorial, and come up against all these tutorials an examples created
against 0.1 :). it will all work itself out soon. but yeah pylons
should get the "threadlocal" thing out of there in favor of something
more explicit (also seeking resources to beef up the "sessioncontext"
docs on the SA site. it pretty much needs its own tutorial).

Joe

unread,

Sep 20, 2006, 9:33:28 PM9/20/06

to pylons-...@googlegroups.com

>From an encapsulation standpoint, I would tend to put tables and classes
together by "subject", e.g.,

--- models/clients.py ---
from sqlalchemy import * # probably better to restrict this

clients_table = Table('clients', metadata,
Column('id_client', Integer, primary_key=True),
Column('name', String(40), nullable=False),
)

class Client(object):

def __init__(self, name):
self.name = name

client_mapper = mapper(clients_table, Client)

--- models/sites.py ---
from sqlalchemy import *

sites_table = Table('sites', metadata,
Column('id_site', Integer, primary_key=True),
Column('id_client', Integer,
ForeignKey('clients.id_client')),
Column('name', String(40), nullable=False),
)

class Site(object):

def __init__(self, name, client_id):
self.name = name
self.name = client_id

site_mapper = mapper(sites_table, Site)

--- models/__init__.py ---
from sqlalchemy import *
from clients import Client
from sites import Site

metadata = DynamicMetaData()
---

The point is that if I add or change a column to the clients table, I'll
probably have to change the Client class. If one class depends on
another, I could import the latter in the former module. Granted that
can get tricky with circular dependencies, but those are very rare. Now
this is all from my C/C++ "upbringing" so I must admit I have little
practical experience with this approach in Python.

> But basicly, that's how I see things best. I don't want everything
> inside the models/__init__.py file because bigger projects might take up
> some pages and I don't want to have it all-in-one-file-long-file. It
> might not be the right way to do it, but I hope it is. :)

The one-big-model-file is an issue that I had raised in an earlier post.
I don't understand the internals of Python and Pylons well enough, but
it seems like a waste of cycles to load a ten-table/class model for say,
a contact page which will only trigger a mail send and probably not even
need to access the database.

I was also discussing with Mike Bayer the issue of not repeating the
database schema. The autoload=True option allows you to eliminate the
repetition but there is a cost involved: you have to be connected to
the database (which means you can't use DynamicMetaData) because SA has
to query the db's catalogs (or data dictionary) to do the autoload.
OTOH, I wonder how does Django's ORM and SLQObject (which emphasize DRY)
manage this. They presumably implement autoload and not give you a
choice. Even with connection pooling and session and database caches,
it seems to me there's a performance impact, unless there's some
selective mechanism that only queries that which is needed.