Old but Gold -> SQLA + Twisted

51 views
Skip to first unread message

Jean Marcel Duvoisin Shmidt

unread,
Sep 5, 2014, 10:23:25 AM9/5/14
to sqlal...@googlegroups.com
Hi everyone!

I have a more complex and architectural question to make,
it will be a bit long, but I want to make myself clear as I already have done some research ;D

in our company we have some *really* cool stuff made out of SQLA, we wrote over its ORM an abstraction to allow us to
build the same schema on the same database but in different 'database level schema' (from Postgresql), allowing us to
use the same codebase, same database, same architecture and separate clients content easily. We also managed to build
a EAV (entity, attribute, value) database in top of that, that allow us to change the database schema any time, any way,
and get it running it without touching the DB... aaannnddd in top of that we got a Schematics to represent the database
schema and allow us to work as a ORM over the SQLAORM.

Yeh its crazy, but it works, and its really *fast*. We handle thousands of client in that way, and we are happy with it, not
counting how flexible is the code and the database now.

But here comes the problem. We want to scale it up (not just out), and we made some tests on the past with SQLA + Twisted
using, Gevent, Twisted, Psycopg. First we managed successfully to integrate the SQLA-Core + SQLA-ORM + Our EAV-ORM
with twisted using twisted.deferToThread, with works nicely but it is not exactly what we wanted. This takes out all the purpose
of using twisted in the first place, as we end up with a threaded model, where queries are running in threads, and we have a
main thread managing all of that. What we really wanted is to make the app *assyncronous* on top of the ORM.

Than we managed to use assync features of Psycopg with twisted, and in a similar mode that is done with Alchimia. We where
able to make it work with SQLA-Core. But we found out that the ORM is completely designed with the synchronous paradigm,
for logical reasons of course - as 99.9% of the users will use it synchronously, and we though that the best way to make it
work is overwrite the Session, Query, SessionQuery and other classes to make it work with the deferred concept
(collection, CollectionAdapter, Attributes, and so on).

As an app developer there is no problem to create a session and all the ambient to handle every request on SQLA.
With provides isolation avoiding any concurrent problems, this can be done as deferred concept uses concurrent points.

But it is a nightmare to change all SQLA-ORM to handle future promisses and deferreds.

So, the central point in my question is, does any one had tried it before? What is the best path to follow to make
the SQLA-ORM work with promises? Any ideas? Is there any work in progress in that direction? I'll just paste a
really small segment of the code I've been working on, just to you all get the idea:

class AssyncSessionTransaction(SessionTransaction):
    def _connection_for_bind(self, bind):
        self._assert_active()

        if bind in self._connections:
            defrr = defer.Deferred()
            defrr.callback(self._connections[bind][0])
            return defrr

        if self._parent:
            defrr = self._parent._connection_for_bind(bind)
            if not self.nested:
                return defrr
        else:
            if isinstance(bind, engine.Connection):
                if bind.engine in self._connections:
                    raise sa_exc.InvalidRequestError(
                        "Session already has a Connection associated for the "
                        "given Connection's Engine")
                defrr = defer.Deferred()
                defrr.callback(bind)
            else:
                defrr = bind.contextual_connect()

        def start_transaction(conn):
            if self.session.twophase and self._parent is None:
                transaction = conn.begin_twophase()
            elif self.nested:
                transaction = conn.begin_nested()
            else:
                transaction = conn.begin()

            self._connections[conn] = self._connections[conn.engine] = \
              (conn, transaction, conn is not bind)
            self.session.dispatch.after_begin(self.session, self, conn)

            return conn

        defrr.addCallback(start_transaction)
        return defrr






Richard Gerd Kuesters

unread,
Sep 8, 2014, 9:08:39 AM9/8/14
to sqlal...@googlegroups.com
hello all :)

from the past years, i've been working on solutions to the "problem" described by Jean (we are co-workers, and we use twisted and sqlalchemy, A LOT), and as everybody may already know, it's a very complicated combination, since we have to do a lot of code around to have a consistent application.

ok, that's not bad, but ... since we are a team of several developers, from 'grey haired pythonistas' to 'fishes in a barrel', it's hard to mantain a quality standard, so we decided to take ALL our codebase based on twisted and sqlalchemy to give it a try on creating a better integration between them, specially if you are more acquainted to asynchronous programming.

RESULTS

  1. we were able to create an "asynchronous" sqlalchemy, but it relies on deep object copy, so every promise (or deferred, on twisted's language) generates an overhead that is not welcome in our standards (or every programmer with a brain, lol);

  2. without deep copy, furthermore we were able to have again a fully promise version of sqlalchemy, but we had to left aside all the ORM codebase and work only with low level sqlalchemy. that's a good result, but again, we'll have a problem with the fishes not using the ORM.


THE PROBLEM


SO, after reading a lot about the internals of sqlalchemy AND tons of solutions out there (sAsync, etc, etc) we always ended up with the same problem: thread safety. the orm design of sqlalchemy, specially the session states, are all designed to be thread safe (Mike, correct me if i'm wrong), so from there the work might be huge, but we are willing to work on a solution - specially if our main rdbms, postgres, have one (if not the best) of dialects implementations in sa. this solution we expect even to publish for others to use :)

THREAD SAFETY

ok, as Jean already stated (those are actually my words) that 99,9% of the programmers who uses sqlalchemy are quite happy with the solutions it already provides, which are in fact very, very usable. we have no problem with that.

but, what if we want to go further on this? i mean, we can already control a lot of things in sqlalchemy. i mean, a lot, really. it is very flexible in almost all aspects. but ... why it isn't when the subject is the session state? or it is and we are missing something?

FINAL THOUGHTS

the bottom line is not about twisted, just to be clear, but to implement a non thread safe session state system that can work with async frameworks (gevent, twisted, tornado) and so on. is that really possible?


my best regards,
richard.

Michael Bayer

unread,
Sep 8, 2014, 10:31:20 AM9/8/14
to sqlal...@googlegroups.com
so anytime you say:

myobject.someattribute

you return a promise?  because with the ORM, any attribute can trigger a SQL query.





--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Richard Gerd Kuesters

unread,
Sep 8, 2014, 10:47:53 AM9/8/14
to sqlal...@googlegroups.com
hello Mike!

yeap, that would be the point. even though the object might already have this value somewhere, the result would be a promise, always.


best regards,
richard.

Michael Bayer

unread,
Sep 8, 2014, 10:59:51 AM9/8/14
to sqlal...@googlegroups.com
its almost like if it could return a promise, but then some kind of syntactical magic would just handle that we already know it’s there, and just hide it, and then just do some kind of deferment so that we can just write the next line of code right below it.   because promises and deferreds, it is 100% pre-determined when these will happen!   if only this completely predictable, repetitive, boilerplate task of receiving a deferral then waiting til the next line of code in the operation could be...somehow…*automated*….. 


or to put it another way: why are you comfortable with the ORM’s implicit SQL on attribute access, but not with gevent’s implicit “defer on IO” ?

Richard Gerd Kuesters

unread,
Sep 8, 2014, 1:06:05 PM9/8/14
to sqlal...@googlegroups.com
the thing is i'm comfortable with all that. what i'm trying to automate, somehow, is the need to "let my fishes" transit around deferreds (or threads) objects that once belongs to one session and can easily be lost if its states changes in this process, including proper session handling (open, use, close).

i'm not talking about something magical here. let's say i need to integrate my app with another network apps -- that may have some latency, or want to spawn a task based on some object that needs to be found first, then just say to the user: "hey, i'll work on it" and that's it, i'll request a promise and close the connection to the user.

Jean Marcel Duvoisin Shmidt

unread,
Sep 8, 2014, 1:50:16 PM9/8/14
to sqlal...@googlegroups.com
Hi!

Maybe I wasn't very clear with what I was suggesting....

think of the code:
value = object.attribute

be like:
do_stuff_and_reply_user(myvar):
     return return_to_user(process_it(myvar)) #deferreds again xD

object.attribute.addCallback(do_stuff_and_reply_user)

#or something like

deff = session.query(MyClass).all()
defrr.addCallback(...)

I know, it doesn’t makes much sense when you first see it but let me explain it better.
The propose of the ORM is to provide the high level abstraction for queries, allowing
programmers forget about data representation and only focus on data and data relations.
This can be stupid for simple queries, but when you have some N to N relation, with multiple
foreign keys or some little more complex relationship it can be really handy and help a lot
in the codding process.

Using deferred/promises will allow use this power of the ORM abstraction in a assync
environment easily. When you are programming it's only needed to know that when data
is retrieved on some object instead of the data itself, you get the deferred.

it could be something like:

MyAssyncBase(SQLABase):
    def __get__(...):
        #some sort of wrapping.....
        #....
        #and when you got the value, somewere, somehow:
        defrr.callback(value)   #or let it to the gevent, twisted, tornado or any other asynchronous framework for python.

I really don't have a clue now, but I'm looking for it! That's why we are asking here
because maybe you will know where is the best path to follow....

--
You received this message because you are subscribed to a topic in the Google Groups "sqlalchemy" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/sqlalchemy/VS6QeioMmbA/unsubscribe.
To unsubscribe from this group and all its topics, send an email to sqlalchemy+...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages