Joining across multiple databases using temporary tables in a web-app

544 views
Skip to first unread message

RedBaron

unread,
Sep 10, 2012, 1:15:52 AM9/10/12
to sqlal...@googlegroups.com
I am writing a web-app using Pyramid and in a couple of views I need to do a join across databases. I have two separate models in my web-app and initialize SQlalchemy for both separately like so

   Database - users:
   DBSession_users = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
   Base2 = declarative_base()
   DBSession_users.configure(bind=engine)
   Base2.metadata.bind = engine
   
   Database - master:
   DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
   Base = declarative_base()
   DBSession.configure(bind=engine)
   Base.metadata.bind = engine

I ensure that appropriate engines are passed to both the models. The models work fine in isolation.
For a view I need to join a table in users DB with the result set generated from master DB. The result is generated on user-input so the filter criteria are generated on the fly (In SQLAlchemy). Therefore, I can't use hard-coded SQL (Which anyway appears to be bad idea). So I decided to get the result of DBSession query and store it in a temporary table in users. Here's the code that I wrote for this

            metadata = MetaData()
            
            temp1 = Table("temp1",metadata,
                         Column('id',Integer,primary_key=True,autoincrement=True),
                         Column('ip',mysql.MSInteger(unsigned=True)),
                         Column('total_count',Integer),
                         Column('distinct_count',Integer),
                         prefixes=['TEMPORARY'],
                         )
            class TempTableUser(object): 
                def __init__(self,ip,tot_count,distinct_count): 
                    self.id=None
                    self.ip=ip
                    self.total_count = tot_count
                    self.distinct_count = distinct_count
                
            mapper(TempTableUser,temp1)
            temp1.create(Base2.metadata.bind)

Then for the result set (which was obtained from session.query().filter(...).all(), I copied each entry into the table
      session_users.add(TempTableUser(int(entry.ip_inet),int(entry.total),int(entry.count_unique)))

Here session_users=DBSEssion_users()

Then I could join TempTableUsers to other tables and get the result.

The problem is that DBSession_users and DBSession are initialized at App level and so do not close till app is closed (Web server shut-down). So the TempTable persists across requests and grows. I decided to drop the table everytime BEFORE creating it. The code becomes

            temp1.drop(Base2.metadata.bind,True)
            temp1.create(Base2.metadata.bind)

Now the program works fine.
But does this create a potential race-condition? Since temporary table is apparently shared between different requests, can it happen that in case of two near-simultaneous requests, the second request might drop the table before first request has actually populated/used it?

Is there any other/better way of doing what I need to do? I am using pyramid_tm and for transaction management and SQLAlchemy 0.7.8

Thanks

Michael Bayer

unread,
Sep 10, 2012, 10:34:12 AM9/10/12
to sqlal...@googlegroups.com
On Sep 10, 2012, at 1:15 AM, RedBaron wrote:

I am writing a web-app using Pyramid and in a couple of views I need to do a join across databases. I have two separate models in my web-app and initialize SQlalchemy for both separately like so

   Database - users:
   DBSession_users = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
   Base2 = declarative_base()
   DBSession_users.configure(bind=engine)
   Base2.metadata.bind = engine
   
   Database - master:
   DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
   Base = declarative_base()
   DBSession.configure(bind=engine)
   Base.metadata.bind = engine

I'm on a bit of a witch hunt to track down all tutorials on the web that are illustrating the usage of "metadata.bind = engine".   I took a quick look at the SQLAlchemy demo on the Pyramid site (https://github.com/Pylons/pyramid/blob/1.3-branch/docs/tutorials/wiki2/src/models/tutorial/__init__.py) and it's not using this form, though the previous version was.

So just to remove the hard linkage of "metadata" and "engine", you can do it like this:

DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
Base = declarative_base()
DBSession.configure(bind=engine)

you're already setting the "bind" for DBSession so there's no need to associate this with MetaData.    All this gives you here is the ability to say metadata.create_all() instead of metadata.create_all(engine). 



I ensure that appropriate engines are passed to both the models. The models work fine in isolation.

the engines are associated with the Sessions here.   The metadata.bind "model" association doesn't actually take effect due to that.

For a view I need to join a table in users DB with the result set generated from master DB. The result is generated on user-input so the filter criteria are generated on the fly (In SQLAlchemy). Therefore, I can't use hard-coded SQL (Which anyway appears to be bad idea). So I decided to get the result of DBSession query and store it in a temporary table in users. Here's the code that I wrote for this

            metadata = MetaData()
            
            temp1 = Table("temp1",metadata,
                         Column('id',Integer,primary_key=True,autoincrement=True),
                         Column('ip',mysql.MSInteger(unsigned=True)),
                         Column('total_count',Integer),
                         Column('distinct_count',Integer),
                         prefixes=['TEMPORARY'],
                         )
            class TempTableUser(object): 
                def __init__(self,ip,tot_count,distinct_count): 
                    self.id=None
                    self.ip=ip
                    self.total_count = tot_count
                    self.distinct_count = distinct_count
                
            mapper(TempTableUser,temp1)
            temp1.create(Base2.metadata.bind)

Then for the result set (which was obtained from session.query().filter(...).all(), I copied each entry into the table
      session_users.add(TempTableUser(int(entry.ip_inet),int(entry.total),int(entry.count_unique)))

that is fine, though if it were me, I'd probably use a core SQL construct to make the operation much more efficient, rather than pulling in a heavy unit-of-work process:

temp1 = Table(...)
temp1.create(DBSession_users.bind)   # ha ha! 
DBSession_users.execute(
     temp1.insert(),
    [
{'id':id, 'ip':ip, 'total_count':total_count, 'distinct_count':distinct_count}
       for id, ip, total_count, distinct_count in DBSession.query(...)
   ]
)



Here session_users=DBSEssion_users()

Then I could join TempTableUsers to other tables and get the result.

The problem is that DBSession_users and DBSession are initialized at App level and so do not close till app is closed (Web server shut-down). So the TempTable persists across requests and grows.

"close" here is a vague term.  the DBSession objects represent transactions starting and ending.   Your app should be structured such that those transactions definitely end at the end of each request, and in fact that's what the ZopeTransactionExtension will get you here, so you are covered in that department.   The temp table lasting beyond that would be because it lasts for the duration of the DBAPI connection itself.   Since the Engine uses a connection pool, this is the effect you're getting.

So there are two ways right off that would solve this problem.  One is to not use connection pooling, so that when the Zope transaction extension tells the Session to close out the transaction, the connection is totally closed for real.  This would be accomplished via create_engine(url, pool_class=NullPool).

Of course not using connection pooling is a hard restriction.  The other easy way which unfortunately is not available to you here would be to create the temporary table so that it is scoped for a transaction, not a connection, but only Postgresql supports this, not MySQL.   So..... lets see your next thing.

I decided to drop the table everytime BEFORE creating it. The code becomes

            temp1.drop(Base2.metadata.bind,True)
            temp1.create(Base2.metadata.bind)

Now the program works fine.
But does this create a potential race-condition? Since temporary table is apparently shared between different requests, can it happen that in case of two near-simultaneous requests, the second request might drop the table before first request has actually populated/used it?

Well no, because when your request has a hold of that DBAPI connection, no other request has access to that connection.    The DBAPI connection is exclusive to a single request at a time so there is no danger here.   

Is there any other/better way of doing what I need to do? I am using pyramid_tm and for transaction management and SQLAlchemy 0.7.8

A totally different way to approach this would be to replicate the target table across the two databases, depending on how many tables you need to do this with and how often.   It would be less awkward on the application side.

RedBaron

unread,
Sep 11, 2012, 1:05:43 AM9/11/12
to sqlal...@googlegroups.com
Thanks for helping me with that one. I guess I'll stick to what I am doing at the moment (minus the Bind.metadata ofcourse!!) while adding a core SQL insert()

chloe decker

unread,
Jun 8, 2020, 8:55:26 AM6/8/20
to sqlalchemy
Thanks for helping me with that one. I guess I'll stick to what I am doing at the moment (minus the Bind.metadata ofcourse!!) while adding a core SQL insert()

On Monday, 10 September 2012 19:34:12 UTC+5, Michael Bayer wrote:
On Sep 10, 2012, at 1:15 AM, RedBaron wrote:

I am writing a web-app using Pyramid and in a couple of views I need to do a join across databases. I have two separate models in my web-app and initialize SQlalchemy for both separately like so

   Database - users:
   DBSession_users = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
   Base2 = declarative_base()
   DBSession_users.configure(bind=engine)
   Base2.metadata.bind = engine
   
   Database - master:
   DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
   Base = declarative_base()
   DBSession.configure(bind=engine)
   Base.metadata.bind = engine

I'm on a bit of a witch hunt to track down all tutorials on the web that are illustrating the usage of "metadata.bind = engine".   I took a quick look at the SQLAlchemy astroworld hoodie demo on the Pyramid site (https://github.com/Pylons/pyramid/blob/1.3-branch/docs/tutorials/wiki2/src/models/tutorial/__init__.pycheck here and it's not using this form, though the previous version was.
Reply all
Reply to author
Forward
0 new messages