Questions about DB engine setup


Zsolt Ero

Apr 27, 2019, 10:34:13 AM
to pylons-discuss
I'm trying to set up an API which would use SQLAlchemy Core (not the ORM) + PostgreSQL. The server is a Google-managed PostgreSQL instance, on an external IP.

I have a couple of questions. Since I needed to manually add SSL certificates as connect_args to create_engine(), plus some additional arguments, I'm using create_engine() directly. My questions are related to this:

1. Does this look ok?

import zope.sqlalchemy
from populus_lib.config import in_worker, pg_certs, pg_url
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def get_session_factory(engine):
    factory = sessionmaker()
    factory.configure(bind=engine)
    return factory

def get_tm_session(session_factory, transaction_manager):
    dbsession = session_factory()
    zope.sqlalchemy.register(dbsession, transaction_manager=transaction_manager)
    return dbsession

def includeme(config):
    settings = config.get_settings()
    settings['tm.manager_hook'] = 'pyramid_tm.explicit_manager'

    config.include('pyramid_tm')

    engine = create_engine(
        pg_url,
        connect_args=pg_certs,
        pool_pre_ping=True,
        pool_reset_on_return='rollback' if in_worker else None,  # in_worker means production
    )

    session_factory = get_session_factory(engine)
    config.registry['dbsession_factory'] = session_factory

    config.add_request_method(
        lambda r: get_tm_session(session_factory, r.tm),
        'dbsession',
        reify=True,
    )


2. Since I'm not using ORM, but core only, do I need from sqlalchemy.orm import sessionmaker?

3. Is pool_pre_ping supported with Pyramid's way of session/transaction handling? I want to be sure that external server disconnects/reconnects are handled automatically and I think using pool_pre_ping is the best for this.

4. Isn't pool_reset_on_return conflicting with pyramid_tm / session handling? I only need this in development, since the SQL server is in the US and I'm in Europe, and without this setting SQLAlchemy adds a huge overhead to each query, around 300 ms.

5. Finally, what's puzzling me is that if I create a view like this:

from time import sleep

def ping(request):
    print(id(request.dbsession.connection().engine))
    sleep(60)

And I run this via curl from two concurrent terminal windows, I get equal ids in pserve / Waitress, while I get different ids with gunicorn defaults (which I believe is multiprocessing).

As I understand it, workers each need their own engine instance, don't they? I think the gunicorn behaviour is good, but I'm puzzled by the pserve/Waitress behaviour. Is this by design?

Michael Merickel

Apr 27, 2019, 4:34:38 PM
to Pylons
The session and sessionmaker are only necessary if you want to hook into pyramid_tm, because zope.sqlalchemy requires a session object and won't work with a simple connection. That being said, I would want to hook into it, so that pyramid_tm handles the commits for you, though you might abstract that a bit.

- You'll probably want to adjust z.sqla.register() to pass in initial_state='changed', unless you want to always be calling z.sqla.mark_changed(session). This relates to a very minor optimization z.sqla does with the ORM, where it knows which operations are mutations versus selects; that detection won't work with core.

- You might want to abstract your api so request.dbconn or something is the connection, and just leave request.dbsession in the background hooked to z.sqla to deal with commit/rollback.

- pool_pre_ping is done at the connection pool / engine level and will work with core or the orm.

- pool_reset_on_return is just a last resort when you're done with a connection, normal pyramid_tm usage will already close the txn such that when it's returned to the pool no cleanup needs to be done. "rollback" (the default) should be fine.

- The ids are probably the same because you're checking the id of the engine (which effectively represents the connection pool). There is usually only one engine instance per process, created via create_engine() or engine_from_config(), and from there separate connections are created.
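A minimal sketch of that last point, using an in-memory SQLite engine for illustration:

```python
from sqlalchemy import create_engine

# One engine (and thus one pool) per process; id(engine) is the same
# everywhere in that process.
engine = create_engine("sqlite://")

c1 = engine.connect()
c2 = engine.connect()

# Distinct connection objects, but both point back at the same engine.
assert c1 is not c2
assert c1.engine is engine and c2.engine is engine

c1.close()
c2.close()
```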

Hope that helps,

- Michael

--
You received this message because you are subscribed to the Google Groups "pylons-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pylons-discus...@googlegroups.com.
To post to this group, send email to pylons-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pylons-discuss/e6dc1571-5b50-4b77-be1e-62c9c5964f4d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mike Orr

Apr 27, 2019, 11:43:33 PM
to pylons-...@googlegroups.com
On Sat, Apr 27, 2019 at 1:34 PM Michael Merickel <mmer...@gmail.com> wrote:
>
> The session and sessionmaker are only necessary if you want to hook into pyramid_tm, because zope.sqlalchemy requires a session object and won't work with a simple connection. That being said, I would want to hook into it, so that pyramid_tm handles the commits for you.

'session.execute()' executes SQL like 'engine.execute()' does, so you
can get the advantages of a request transaction without using ORM
queries. I use that in my applications and unit tests, where some
functions use the ORM queries and other functions use the underlying
tables for efficiency when returning large search results.
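To illustrate, a sketch with an in-memory SQLite table (the table is made up for the example):

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite://")
metadata = MetaData()
users = Table("users", metadata, Column("id", Integer, primary_key=True))
metadata.create_all(engine)

session = sessionmaker(bind=engine)()

# Core constructs go straight through session.execute(); no ORM classes needed.
session.execute(users.insert().values(id=1))
rows = session.execute(users.select()).fetchall()
assert [row[0] for row in rows] == [1]
```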

Jonathan Vanasco

Apr 29, 2019, 12:24:13 PM
to pylons-discuss


On Saturday, April 27, 2019 at 11:43:33 PM UTC-4, Mike Orr wrote:
 
'session.execute()' executes SQL like 'engine.execute()' does, so you
can get the advantages of a request transaction without using ORM
queries.

in addition to Mike's comments - make sure to read the zope.sqlalchemy docs on `mark_changed` or starting the session in a 'changed' state.

if you don't do that, zope.sqlalchemy won't know you did something and will roll back instead of commit.

Zsolt Ero

Apr 30, 2019, 9:02:01 PM
to pylons-discuss
Thanks a lot for all the answers. So far I've only implemented select
queries, using dbsession.execute. I'll read about the "changed" state
handling.

About the additional options: pool_pre_ping seems to work well and
doesn't seem to have any side effects.

pool_reset_on_return on the other hand is interesting, as it did make
my development queries 3x faster. (I did 50 curl queries and it was
600 ms with rollback vs 200 ms with none for a "select 1" endpoint,
between EU <> US).

My only problem was that it left the connection in "idle in
transaction" state, on which we have a 5-minute timer on the server,
so the connection gets disconnected. That breaks SQLAlchemy. Because
of this, I've commented it out (i.e. left it on "rollback").

I think this would only work if I didn't use transactions, but just
direct queries.

Wouldn't it be possible to simply add the engine to the request and
then do something like with request.engine.connect() as conn: ?

// I know this only affects the dev setup, and the overhead of a
transaction + rollback is almost nothing within the same datacenter,
but still, I'm curious how much of this session + transaction
mechanism is needed for select-only queries.

Zsolt

Jonathan Vanasco

May 1, 2019, 1:46:46 PM
to pylons-discuss
> As I understand it, workers each need their own engine instance, don't they? I think the gunicorn behaviour is good, but I'm puzzled by the pserve/Waitress behaviour. Is this by design?

Just to expand on Michael's comment on there usually only being one engine but multiple connections...

Workers don't need their own engine instances, but they do need their own connection pools.  gunicorn (and uWSGI) implement a forked-process model, and database connections are not safe to move across process boundaries, due to things inherent to how Python and the OS handle the underlying file descriptors.  if you connect to the database during setup in this model, you need to call `engine.dispose()` when setup is complete or after the fork, or else weird and bad things will eventually happen.  see https://docs.sqlalchemy.org/en/13/core/connections.html#engine-disposal .  waitress is multithreaded, and the pool is designed to be safe to share across threads.
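a runnable sketch of the dispose-after-fork idea (the hook signature matches gunicorn's post_fork; here it's just called directly, and SQLite stands in for Postgres):

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")

def post_fork(server, worker):
    # in a real gunicorn.conf.py this would import the app's engine;
    # dispose() drops connections inherited from the parent process so
    # the worker's pool opens fresh ones that it owns.
    engine.dispose()

post_fork(None, None)  # simulate gunicorn firing the hook after fork

# the pool transparently opens a new connection afterwards
with engine.connect() as conn:
    assert conn.execute(text("select 1")).scalar() == 1
```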

You should be able to just put an Engine on the request, but if you do, ensure you call `dispose()` or set up an event to ensure process safety.  it's all in the pooling docs: https://docs.sqlalchemy.org/en/13/core/pooling.html .  you'll also have to implement your own transaction management, which is honestly not fun.

if you are sure you're only doing SELECTs... you could have a dedicated engine/connection string for a read-only database user and have that bypass the transaction system entirely.  i use that pattern a lot, but you need to make sure you enforce it as read-only at the database-credentials level, or you have the potential for a lot of issues bubbling up.
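a sketch of the shape of that (SQLite in place of a real read-only Postgres role; `RO_URL` is a hypothetical DSN for the read-only credentials):

```python
from sqlalchemy import create_engine, text

# RO_URL is hypothetical: in practice, a DSN for a read-only database role.
RO_URL = "sqlite://"

# AUTOCOMMIT skips the implicit BEGIN/ROLLBACK round-trips entirely,
# which is what makes bypassing the transaction machinery pay off.
ro_engine = create_engine(RO_URL, isolation_level="AUTOCOMMIT")

with ro_engine.connect() as conn:
    value = conn.execute(text("select 1")).scalar()
assert value == 1
```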


Zsolt Ero

May 1, 2019, 6:35:33 PM
to pylons-discuss
Thanks for the answers.

I've been using gunicorn with sync workers and it seems clear to me.
If I understand correctly, up until some point everything in the
script is shared across the processes, and those objects receive
identical ids in Python. Everything which happens at import time is
definitely in this "shared" memory space. Client libraries which
cannot be shared need to be regenerated / disposed / taken care of,
as with engine.dispose().

The fact that database engines get different ids in each worker means
to me that I'm totally safe under gunicorn sync with SQLAlchemy.

With waitress, on the other hand, I'm surprised. I mean, if waitress
is multi-threaded, then how come it doesn't require every single
module and sub-dependency to be thread-safe? I've only been using
waitress for local development via pserve, but I see it's a popular
choice for a production server as well. How can it work in
multi-threaded mode if you have no idea whether every single library
and every dependency you use is thread-safe? Or do you actually audit
all your libraries?

Now, about SQL: why do I need any kind of transaction implementation?
I mean, isn't

with engine.connect() as conn:
    conn.execute(stmt)

a transaction? I'd just need to explicitly pass conn to every
function which needs DB access, but I don't see why it's important to
do anything more than this, even with write queries. (I don't know
how I would get DB access in view derivers with this pattern,
though.)
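For reference, the pattern I mean, as a runnable sketch (using engine.begin(), which makes the transaction explicit: commit on success, rollback on error):

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")

def fetch_value(conn):
    # functions needing DB access take the connection explicitly
    return conn.execute(text("select 1")).scalar()

# engine.begin() yields a connection wrapped in a transaction that is
# committed when the block exits cleanly, rolled back on exception
with engine.begin() as conn:
    value = fetch_value(conn)
assert value == 1
```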

Zsolt