Selective logging using the ORM for a single thread. Can't do it, therefore can't debug our app.


Alfred Perlstein

Oct 11, 2016, 9:59:42 PM
to sqlalchemy
Hello folks, we are using SqlA in a pretty large flask app.

One issue we have is that sometimes due to orm, serializers (marshmallow) or other complex code paths we find ourselves doing many more queries than we expected and this causes us a lot of headache.  We need to be able to track down the code paths causing the extra SQL.

In addition because the codebase is large we can't turn on logging globally otherwise we would drown in noise.

The solution we thought would be simple would be to be able to temporarily turn on logging in a single thread of execution or on a single session object... however that seems to be an elusive thing to be able to do.

What we have found is that while sqlA offers the ability to *create* an engine with "echo=True", there doesn't seem to be a sane way to *set* echo=True with an existing session.

One could try to set session.engine.echo=True, however it appears that the connection object hung from the engine object grabs the value of "echo" when instantiated and never checks to see if the engine has been reconfigured.  Surely a better way to do this would be to have a method to pass down to the Connection object to enable echo.

I've done quite a bit of looking through Google and the docs and have simply been unable to find a way to do the following:

.... lots of code previous to this line...
old_enable = session.log_enable()  # turn on logging, return the previous value of logging (so this can be nested)
... do some stuff and see which SQL queries are being made ...
session.log_set(old_enable)  # restore logging to prior to it being set.
... lots more code without any logging happening.

If I were to create a method on engine that would then reconfigure its connection object as well to remedy this, would that be an acceptable patch to the project? Or is there some other trick I am missing?

Thank you!
-Alfred

Mike Bayer

Oct 12, 2016, 2:19:17 AM
to sqlal...@googlegroups.com


On 10/11/2016 09:59 PM, Alfred Perlstein wrote:
> Hello folks, we are using SqlA in a pretty large flask app.
>
> One issue we have is that sometimes due to orm, serializers
> (marshmallow) or other complex code paths we find ourselves doing many
> more queries than we expected and this causes us a lot of headache. We
> need to be able to track down the code paths causing the extra SQL.
>
> In addition because the codebase is large we can't turn on logging
> globally otherwise we would drown in noise.
>
> The solution we thought would be simple would be to be able to
> temporarily turn on logging in a single thread of execution or on a
> single session object... however that seems to be an elusive thing to be
> able to do.
>
> What we have found is that while sqlA offers the ability to *create* an
> engine with "echo=True", there doesn't seem to be a sane way to *set*
> echo=True with an existing session.

Well first off, SQL logging is a function of the Core, not the ORM, so
the Session wouldn't know anything about it.

>
> One could try to set session.engine.echo=True, however it appears that
> the connection object hung from the engine object grabs the value of
> "echo" when instantiated and never checks to see if the engine has been
> reconfigured.

this is actually documented in the green note at the end of
http://docs.sqlalchemy.org/en/rel_1_1/core/engines.html#configuring-logging



> Surely a better way to do this would be to have a method
> to pass down to the Connection object to enable echo.

That's not how Python logging works. Logging in Python is organized as
a "cross-cutting" concern, meaning a logger presents itself as an open
stream in which log messages are unconditionally dropped. The
configuration of what messages to log, when, and where, is accomplished
within the logging configuration, not within the system that is emitting
messages to logging. SQLAlchemy's "echo" flag is merely a quick
shortcut to a very simplistic log output, and additionally it adds a
small "block" to remove the overhead of emitting messages to the log
object if we've detected they aren't being used (this occurs
independently of the "echo" flag itself).

Any non-trivial use of logging should be using Python's logging system
directly. The loggers in use are described at
http://docs.sqlalchemy.org/en/rel_1_1/core/engines.html#configuring-logging.

In this case, for the simple approach, I'd just add "threadid" to the log
format (https://docs.python.org/2/library/logging.html#logrecord-attributes)
and use grep against the stream. More ambitiously, I'd write a simple
logging.Filter that only allows messages from a certain thread (see the
example filters at
https://docs.python.org/2/howto/logging-cookbook.html#using-filters-to-impart-contextual-information).
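
For illustration, a minimal sketch of that simple approach (using the
documented "sqlalchemy.engine" logger name; the grep'd thread id is just
an example value):

import logging

# emit SQL via the standard sqlalchemy.engine logger, tagging each line
# with the emitting thread's id so the output can be grepped per thread
logging.basicConfig(format='%(thread)d %(name)s %(message)s')
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)

# afterwards, e.g.:  python app.py 2>&1 | grep '^140597'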



>
> I've done quite a bit of looking through Google and the docs and have
> simply been unable to find a way to do the following:
>
> .... lots of code previous to this line...
> old_enable = session.log_enable() # turn on logging, return the
> previous value of logging (so this can be nested)
> ... do some stuff and see which SQL queries are being made ...
> session.log_set(old_enable) # restore logging to prior to it being set.
> ... lots more code without any logging happening.
>
> If I were to create a method on engine that would then reconfigure its
> connection object as well to remedy this, would that be an acceptable
> patch to the project? Or is there some other trick I am missing?
>
> Thank you!
> -Alfred
>

Alfred Perlstein

Oct 12, 2016, 3:13:16 AM
to sqlal...@googlegroups.com
Mike,

I'm trying to find a nice way to say that this doesn't totally make life
difficult for us.

Even if we were to use the filter as described, we couldn't use it for
an existing connection; we would HAVE to restart the connection?

Does that really sound reasonable to you to force users to write all
this scaffolding just to get the SQL emitted for a few lines of code?
To me it really doesn't sound like a framework that facilitates easy
debugging. We basically have to slow down the *entire app* just to get
a very small segment of debug code.

Is there no way to "toggle" the optimization to reset itself,
specifically this:

> The SQLAlchemy Engine conserves Python function call overhead by only
> emitting log statements when the current logging level is detected as
> logging.INFO or logging.DEBUG. It only checks this level when a new
> connection is procured from the connection pool. Therefore when changing
> the logging configuration for an already-running application, any
> Connection that’s currently active, or more commonly a Session object
> that’s active in a transaction, won’t log any SQL according to the new
> configuration until a new Connection is procured (in the case of
> Session, this is after the current transaction ends and a new one begins).

Really? This is really bad, and it makes what should be a simple debugging
session into a nightmare of scaffolding that has to be written just to
do something that should be super simple, namely:

"get the sql emitted between these two points of code without having to
write a ton of scaffolding to do so."

I'm sort of speechless. If possible, any advice going forward would help,
even pointers on where to get started in the code. Turning on global
debug and taking a performance hit is not possible.

Appreciate your help on the matter.

thanks,

-Alfred

Simon King

Oct 12, 2016, 4:51:34 AM
to sqlal...@googlegroups.com
You could use the before_cursor_execute event to log the SQL based on
some debugging flag.

http://docs.sqlalchemy.org/en/latest/core/events.html#sqlalchemy.events.ConnectionEvents.before_cursor_execute

For example:

from sqlalchemy import event

DEBUGGING = False

@event.listens_for(SomeEngine, 'before_cursor_execute', named=True)
def receive_before_cursor_execute(**kw):
    if DEBUGGING:
        print(kw['statement'])

Simon

Mike Bayer

Oct 12, 2016, 9:52:31 AM
to sqlal...@googlegroups.com


On 10/12/2016 03:13 AM, Alfred Perlstein wrote:
> Mike,
>
> I'm trying to find a nice way to say that this doesn't totally make life
> difficult for us.

Hi Alfred -

I'd point out that you've spent the time and energy here to produce
about eight paragraphs describing how disappointed you'd be if you had
to at least experiment with the approach I'm describing, most of which
include hyperbolic, emotion-inducing terminology like "does that really
sound reasonable?", "this is really bad", "I'm sort of speechless".

I know it's not because of the effort involved, because you offered to
go through and patch all the logging in Connection to make your desired
feature work, which would be much more effort than my proposal, not to
mention there are at least five other ways to do this too (though a
logging filter IMO is the most idiomatic).

I don't get too many threads like these where the pushback response is
highly disproportionate to the proposal, but when I do, they are very
draining. I'm not sure what the desired effect is except that I'd be
pushed to write more code to ease your emotions and frustrations. Or if
you just wanted to vent for awhile, I can try to be OK with that.

Mike Bayer

Oct 12, 2016, 10:01:11 AM
to sqlal...@googlegroups.com


On 10/12/2016 09:51 AM, Mike Bayer wrote:
>
>
> On 10/12/2016 03:13 AM, Alfred Perlstein wrote:
>> Mike,
>>
>> I'm trying to find a nice way to say that this doesn't totally make life
>> difficult for us.

Nestled deep in your message I can see you had a concern about
performance, and perhaps you thought I was suggesting your whole
application would need to be permanently slowed. This is not the case
at all. The logging.info() call itself is entirely negligible when the
logging isn't going to a stream anywhere. Those calls are underneath a
conditional in Connection more because there are a lot more calls
related to logging.debug(), and because in the performance tuning I do
for Connection/ResultProxy etc. I am trying to shave off individual
function call counts in the single digits all the time. Also, if you
aren't using the "echo" flag (not using the "echo" flag is part of my
suggestion), the sqlalchemy.log.InstanceLogger object that is SQLA's own
internal filter to get the "echo" trickery to work is not used, and
that's the filter that probably has more call overhead than anything in
Python logging itself.

Mike Bayer

Oct 12, 2016, 1:44:15 PM
to sqlal...@googlegroups.com


On 10/12/2016 02:18 AM, Mike Bayer wrote:
>
>
> In this case, for the simple approach, I'd just add "threadid" to the log
> format (https://docs.python.org/2/library/logging.html#logrecord-attributes)
> and use grep against the stream. More ambitiously, I'd write a simple
> logging.Filter that only allows messages from a certain thread (see the
> example filters at
> https://docs.python.org/2/howto/logging-cookbook.html#using-filters-to-impart-contextual-information).


here is the filter:

import logging
import threading

class MyFilter(logging.Filter):
    def __init__(self):
        self.flags = threading.local()

    def log_sql(self):
        self.flags.log = True

    def unlog_sql(self):
        del self.flags.log

    def filter(self, record):
        return getattr(self.flags, 'log', False)


here's how to stick it on an Engine. Note the use of engine.logger instead
of hardcoding "sqlalchemy.engine.base.Engine"; because the filter is on
the logger rather than a handler, it must be applied to that logger directly:

my_filter = MyFilter()

engine.logger.setLevel(logging.INFO)
engine.logger.addFilter(my_filter)

"engine.logger" is not documented but is not an underscored attribute
and I will gladly document this as a public attribute. The filter can
also be put on the handler in use.
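
For illustration, a minimal sketch of that handler-based variant (reusing
the my_filter instance from above):

import logging

handler = logging.StreamHandler()
handler.addFilter(my_filter)  # a filter on a handler applies to every record it receives

sqla_logger = logging.getLogger('sqlalchemy.engine')
sqla_logger.setLevel(logging.INFO)
sqla_logger.addHandler(handler)
sqla_logger.propagate = False  # keep the filtered output from also reaching root handlers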

Setting logger.setLevel() back to logging.WARN will cause all subsequent
connections to revert to sending no messages to the logger and the
filter will remain un-invoked at that point. The filter itself can also
be removed using engine.logger.removeFilter(my_filter).

Using normal Python logging patterns is preferable to adding new flags
and behaviors to SQLAlchemy's "echo" feature. Adding per-connection,
detectable-during-change "echo" flags would add complexity and testing
burden to the SQLAlchemy project, as this would mean special code and
conditionals would need to be added to Connection's logging logic that
must be guaranteed to behave in a very specific way to suit this
specific use case.


here is a demo. hope this helps.

import logging
import threading


class MyFilter(logging.Filter):
    def __init__(self):
        self.flags = threading.local()

    def log_sql(self):
        self.flags.log = True

    def unlog_sql(self):
        del self.flags.log

    def filter(self, record):
        return getattr(self.flags, 'log', False)


if __name__ == '__main__':
    from sqlalchemy import Integer, Column, create_engine, ForeignKey
    from sqlalchemy.orm import relationship, Session
    from sqlalchemy.ext.declarative import declarative_base
    import time

    Base = declarative_base()

    class A(Base):
        __tablename__ = 'a'
        id = Column(Integer, primary_key=True)
        bs = relationship("B")

    class B(Base):
        __tablename__ = 'b'
        id = Column(Integer, primary_key=True)
        a_id = Column(ForeignKey('a.id'))

    e = create_engine("postgresql://scott:tiger@localhost/test")
    Base.metadata.drop_all(e)
    Base.metadata.create_all(e)

    def worker(log=False):
        for j in range(5):
            s = Session(e)
            a1 = A(bs=[B(), B(), B()])
            s.add(a1)
            s.flush()

            # note we are mid-transaction.

            if log:
                my_filter.log_sql()
            for a in s.query(A).filter_by(id=a1.id):
                a.bs
            if log:
                my_filter.unlog_sql()

            s.commit()

            time.sleep(.5)

    my_filter = MyFilter()

    logging.basicConfig(
        format='%(thread)d %(message)s'
    )
    e.logger.setLevel(logging.INFO)
    e.logger.addFilter(my_filter)

    threads = [threading.Thread(
        target=worker, args=(i == 6, )) for i in range(10)]
    for t in threads:
        t.start()

    for t in threads:
        t.join()

Alfred Perlstein

Oct 12, 2016, 5:49:57 PM
to sqlal...@googlegroups.com
Mike,

Thank you, in fact thank you very, very much. Writing out the code was
really above and beyond my expectations. I will use this snippet in our
app.

The following problems remain for us (and also likely other users of
sqla) although they are not immediate issues:

1) I need to make sure that I can use "threading.local"; since we are
based on flask, we actually use werkzeug's threads, which I now have to
investigate for compatibility with this snippet. Likewise anyone
using greenlets may have to modify the snippet to fit with greenlets.
That said, I do believe that porting this snippet to werkzeug's thread
primitives should insulate us from greenlet/native-threads problems, but
then those folks not using werkzeug don't have that option. This is why
overall it appears problematic to me.... however I've not confirmed
this yet; it's possible that "threading.local" works for most cases
(greenlets and werkzeug) but... I have to test.
2) It looks to me like by using the engine's logger we would be
turning on logging across all threads. What happens when we have 1000
threads? What will happen to us performance-wise?

So as a question, if I ever get to it, would you entertain patches to do
this without turning on global logging and/or being tied to the
threading implementation?

btw, if you're wondering where I'm coming from with these insane scaling
questions.... I used to be CTO of OKCupid and scaled them, now at a new
place, so these things matter to me and my team.

thanks again Mike and apologies for the tone of my original email!
-Alfred

Alfred Perlstein

Oct 12, 2016, 5:58:41 PM
to sqlal...@googlegroups.com


On 10/12/16 1:51 AM, Simon King wrote:
> You could use the before_cursor_execute event to log the SQL based on
> some debugging flag.
>
> http://docs.sqlalchemy.org/en/latest/core/events.html#sqlalchemy.events.ConnectionEvents.before_cursor_execute
>
> For example:
>
> from sqlalchemy import event
>
> DEBUGGING = False
>
> @event.listens_for(SomeEngine, 'before_cursor_execute', named=True)
> def receive_before_cursor_execute(**kw):
>     if DEBUGGING:
>         print(kw['statement'])
>
> Simon
Oh nice!! Will this be called in the context of the same thread
executing the sql? This would allow me to debug without formatting all
debugs I believe...

-Alfred

Jonathan Vanasco

Oct 12, 2016, 6:45:45 PM
to sqlalchemy

On Wednesday, October 12, 2016 at 5:58:41 PM UTC-4, Alfred Perlstein wrote:
Oh nice!! Will this be called in the context of the same thread
executing the sql?  This would allow me to debug without formatting all
debugs I believe...

It depends on how you're using threads and connections.  It should/does under most situations.   It is entirely possible that you are using things wrong or are abusing the library -- lots of people do that inadvertently (myself included).

Check out the docs on http://docs.sqlalchemy.org/en/latest/orm/contextual.html , and there is also some stuff on threadlocal under pooling.

For stuff like this, I just use the approach that Mike advocated -- just filter the standard python logger.  An `.ini` file configures my loggers to print the thread id on the line prefix, and I direct sqlalchemy logging to a dedicated file.  Then I just grep that file for a given thread id.
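
For illustration, a rough dictConfig equivalent of that kind of .ini
setup (the filename and format are hypothetical):

import logging.config

logging.config.dictConfig({
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'threaded': {'format': '%(thread)d %(levelname)s %(name)s %(message)s'},
    },
    'handlers': {
        'sqlfile': {
            'class': 'logging.FileHandler',
            'filename': 'sql.log',  # hypothetical dedicated file
            'formatter': 'threaded',
        },
    },
    'loggers': {
        'sqlalchemy.engine': {
            'level': 'INFO',
            'handlers': ['sqlfile'],
            'propagate': False,
        },
    },
})

# then, e.g.:  grep '^140597 ' sql.log   (the thread id of interest)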

None of the performance issues you raise make much sense to me -- the overhead is negligible, and this would all be done in a dev environment. Turning something like this on/off in production would worry me about the application design.

Any chance that you're coming from another programming language? It sounds like you're unfamiliar with some intricacies of how Python (not just SqlAlchemy) approaches certain things, and are overthinking them.
 

Mike Bayer

Oct 12, 2016, 7:11:40 PM
to sqlal...@googlegroups.com


On 10/12/2016 05:49 PM, Alfred Perlstein wrote:
>
>
> Mike,
>
> Thank you, in fact thank you very, very much. Writing out the code was
> really above and beyond my expectations. I will use this snippet in our
> app.
>
> The following problems remain for us (and also likely other users of
> sqla) although they are not immediate issues:
>
> 1) I need to make sure that I can use "threading.local"; since we are
> based on flask, we actually use werkzeug's threads, which I now have to
> investigate for compatibility with this snippet. Likewise anyone
> using greenlets may have to modify the snippet to fit with greenlets.
> That said, I do believe that porting this snippet to werkzeug's thread
> primitives should insulate us from greenlet/native-threads problems, but
> then those folks not using werkzeug don't have that option. This is why
> overall it appears problematic to me.... however I've not confirmed
> this yet; it's possible that "threading.local" works for most cases
> (greenlets and werkzeug) but... I have to test.

threading.local() should work with greenlets if you're doing global
monkeypatching. Otherwise there should be a similar construct in
gevent that does this, or more simply don't use any kind of "local()"
object, just check a dictionary for greenlet or thread id in the filter.
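
For illustration, a sketch of that dictionary-based variant (hypothetical
names; keyed here on the stdlib thread id, where a greenlet setup would
substitute its own id function):

import logging
from threading import get_ident  # Python 3; on Python 2, thread.get_ident


class IdFilter(logging.Filter):
    # same log_sql()/unlog_sql() interface as MyFilter above, but no local()
    def __init__(self):
        self.active_ids = set()

    def log_sql(self):
        self.active_ids.add(get_ident())

    def unlog_sql(self):
        self.active_ids.discard(get_ident())

    def filter(self, record):
        return get_ident() in self.active_ids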


> 2) It looks to me like by using the engine's logger we would be
> turning on logging across all threads. What happens when we have 1000
> threads? What will happen to us performance-wise?

So I've looked in detail at how echo works and it is true that the
InstanceLogger does not make use of a global log level and instead calls
upon the ._log() method of the logger, doing the level calculation
itself. If the Connection had this logger directly, none of the other
Connection objects would see that they need to send records to the
logging calls.

However, all the records passed to the log methods are passed as fixed
objects with little to no overhead, and would be blocked by the filter
before being processed. The logging calls make sure to defer all string
interpolation and __repr__ calls of objects until they are rendered by a
formatter which would not take place with the filter blocking the
records from getting there.
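
To illustrate the difference (a hypothetical logger and values):

import logging

logger = logging.getLogger('myapp')

rows, obj = 10, object()

# deferred: the %-interpolation and repr() only run if a handler actually
# processes the record; a filter returning False blocks both
logger.info("loaded %s rows for %r", rows, obj)

# eager: .format() and repr() run on every call, filtered or not
logger.info("loaded {} rows for {!r}".format(rows, obj))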

The whole way that Connection even checks these flags is not really how
logging was meant to be used; you're supposed to just send messages to
the log objects and let the handlers and filters work it out. The
flags are just to eke out that tiny bit of extra reduction in method calls,
but these wouldn't have much of an effect even on thousands of threads.
But it's true, it isn't zero either.

>
> So as a question, if I ever get to it, would you entertain patches to do
> this without turning on global logging and/or being tied to the
> threading implementation?


So, to keep it zero, the event approach would be one way to go, since
events can be added to a Connection object directly (that was a big
change made some years ago; it used to be just Engine level). Adding a
new attribute connection.logger that defaults to self.engine.logger
would allow InstanceLogger to be local to a connection, but still the
_echo flag would need to somehow work into the log system; this is of
course possible, however I'm trying to keep everything to do with how
logging / echo works in one place, and this would incur some kind of
two-level system of "echo" between Engine and Connection that would need
a lot of tests. If Connection has its own logger then we'd think that
each Connection should be able to have independent logging names like
Engine does, etc. It would not be a 2 line pull request.

I also worry that the precedent being set would be that anytime someone
needs to do unusual things with logging, they are going to want to add
new complexity to the ".echo" flag rather than going through the normal
logging system, which IMO is extremely flexible and should be trusted to
scale up as well as everything else.

Of course if turning on logging.INFO and adding the filter that blocks
99% of all log messages does prove to add some significant performance
impact, that changes everything and we'd have to decide that Python
logging does need to be worked around at scale. But Python logging is
very widely used deep inside many networking related systems without
much performance impact being noticed.



>
> btw, if you're wondering where I'm coming from with these insane scaling
> questions.... I used to be CTO of OKCupid and scaled them, now at a new
> place, so these things matter to me and my team.

the "1000 greenlets" model is one I'm familiar with in Openstack (in
that they've used that setting, but I found that that number of
greenlets was never utilized for real). Are your servers truly using
1000 database connections in a single process?


>
> thanks again Mike and apologies for the tone of my original email!

we can all get along, no worries.


> -Alfred
>

Alfred Perlstein

Oct 15, 2016, 11:13:20 PM
to sqlal...@googlegroups.com
Hey Jonathan,

I probably should give a quick blurb about me so people understand where I'm coming from:
Earlier I added some low level asserts to the OSX kernel, FreeBSD kernel, and JunOS kernels.  My FreeBSD work is on github/splbio; I've contributed to a few projects outside of FreeBSD as well, git and a few others.
CTO okcupid 2005-2007 (did the initial scaling work, C++/python)
Leader of FreeNAS project (kernel+python django) ~2011-2012
Built a 10gig threat appliance (kernel+python django) 2012-2015
Currently I'm building a new startup based on flask+sqlalchemy. To be fair, our scaling issues aren't as huge these days, but it would be nice to start a dialog with the community on how to scale certain things, including debugging, so that pushes to prod allow for debugging of new features for select users.

Anyhow, I really don't see a problem with having logging on prod, provided that it's not going to impact performance across the board, and this is why I asked the questions I did.

I'm going to poke around and see what I can/can-not use from this discussion.

What does however have me worried is that I'm getting the impression that this community might not be one that shares the same methodologies that I've used in the past to be successful in my projects.  I'm hoping we can learn from each other; moving away from sqla and flask would be very, very painful for us, so I hope we can come to a situation where ideas and code can be shared to improve everyone's experiences.

Maybe now that I've shared my background we can have a more open discussion and not make assumptions about skill levels and backgrounds.

thanks,
-Alfred

Jonathan Vanasco

Oct 17, 2016, 1:34:32 PM
to sqlalchemy
Alfred-

I'm sorry you felt offended and misinterpreted my comment.  I did not think you were unskilled in the slightest.  Quite the opposite (I did look you up).

Your concerns and solutions were in line with how every highly experienced lower-level engineer I've worked with has approached an issue in a Python app.  I was hoping to spark a positive discussion, not offend you.

Alfred Perlstein

Oct 17, 2016, 11:33:49 PM
to sqlal...@googlegroups.com

I was not offended personally; however, I was annoyed that a conversation on a topic that I felt was helpful to the community was being shut down.

When I shared the response I got, the feedback was that the received answer fell under the umbrella of "mansplaining", and that a response such as that is problematic and explains why folks are turned off to tech; it certainly factors into my weariness with engaging with various communities out there.

It's not that I have any doubt of my skills, but the way they were called out was problematic because it shifted the topic from solving a problem to a question about the skillset of the person presenting the problem.  That was the cause of irritation.

The only offense taken was that my time was wasted and that further research into the problem was being brushed aside by questioning skill rather than looking into the actual problem presented.

Anyhow, I found something useful, and I will be looking at making it more generally useful and hopefully sharing with the community.

Adding a snippet like this to my app was very helpful; however, I'm wondering if there's something I'm missing that may cause a catastrophe?

I'm looking at maybe making it useful for detecting if a new piece of code happens to also cause more than "N" queries to suddenly happen, making the app slow.

Is there maybe a better way?

thanks,

-Alfred


+++ b/jetbridge/models/quiz_attempt.py
@@ -84,12 +84,21 @@ class QuizAttempt(JBModel, db.Model):
             query = query.options(jl)
         return query
 
+    def new_score_before_cursor_execute(self, conn, cursor, statement, parameters, context, executemany):
+        log.info("quiz_attempt id: {} ".format(self.id))
+        log.info("vvvvvvvvvvvvvvvvvvv\nNEW_SCORE: Received statement: %s\n^^^^^^^^^^^^^^^^^^\n", statement)
+
+
     def new_score(self, question_id=None, question=None):
         """The new scoring system.
 
         Returns a triple (total_score, base_score, timer_bonus)
         """
         _log_scores("Called new_score")
+        from sqlalchemy import event
+        listen_obj = db.session.connection()
+        fun = self.new_score_before_cursor_execute
+        event.listen(listen_obj, "before_cursor_execute", fun)
 
         table = dict()
         """
@@ -143,6 +152,7 @@ class QuizAttempt(JBModel, db.Model):
         log.debug("loops = {}".format(loops))
         log.debug("score: {} (base: {}, timer_bonus: {})".format(total_score, base_score, timer_bonus))
         """
+        event.remove(listen_obj, "before_cursor_execute", fun)
         return {
             "score_total": total_score,
             "score_base": base_score,


Mike Bayer

Oct 18, 2016, 9:29:37 AM
to sqlal...@googlegroups.com


On 10/17/2016 11:33 PM, Alfred Perlstein wrote:
>
> Adding a snippet like this to my app was very helpful, however I'm
> wondering if there's something I'm missing that may cause a catastrophe?
>
> I'm looking at maybe making it useful for detecting if a new piece of
> code happens to also cause more then "N" queries to suddenly happen,
> making the app slow.
>
> Is there maybe a better way?

So, especially since you are trying to maximize performance, you should:

1. put imports at the module level. imports inside of functions are
usually not necessary and especially inside of multithreaded code can
cause race conditions.

2. make a single event listener that you attach just once to the engine.
The event registration mechanics are not cheap either.

3. don't use "".format() inside of the log statement, format() is slow
and also doesn't take advantage of logging's late string interpolation
feature

4. There is a pretty good, generalized way to apply stats to catch
changes in execution, SQLAlchemy's own test suite uses this technique
underneath test/aaa_profiling/.


   a. apply a decorator to functions you want to monitor:

@monitor_sqlcalls
def my_app_thing():
    ...

   b. The decorator sets up a context that is intercepted by the
before_cursor_execute() event listener you've set up (this can be keyed
to threadid, or session.connection(), or a token in
session.connection().info, etc.)

c. The decorator also makes a key based on the function that is
being decorated. The function's physical code position is best, e.g.
looking at fn.__code__ you can get the filename, line number, etc. An
example of this is in sqlalchemy/ext/baked.py also.

   d. Count the SQL call by incrementing a counter that is local to the
@monitor_sqlcalls context.

   e. at the end, using that code key, you retrieve from a fixed
datafile (sqlite database, dbm file, pickled file, etc) what the
expected number of SQL calls is. If the current number of calls exceeds
it by a certain threshold, emit a warning.

f. The datafile itself is written out as part of your
application's source code. Whenever a new method is added or modified,
you run the app in test mode with a flag set up such that the decorator
*writes to the file* instead of comparing. SQLAlchemy's file is here:
https://github.com/zzzeek/sqlalchemy/blob/master/test/profiles.txt

g. your test suite should also be making sure these methods
are called and thresholds aren't passed (turn warnings into exceptions
so that tests fail).

This approach can also be made extremely performant by making sure you
load up that profile datafile just once, and cache it in memory in a
dictionary.
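
For illustration, here is a condensed sketch of steps (a) through (e)
(names like monitor_sqlcalls and EXPECTED are hypothetical; the datafile
load/write of steps (f) and (g) is left out):

import functools
import threading
import warnings

from sqlalchemy import event
from sqlalchemy.engine import Engine

EXPECTED = {}  # step (f): load once from your profile datafile, cache here
_active = threading.local()


def monitor_sqlcalls(fn):
    # step (c): key the function by its physical code position
    key = (fn.__code__.co_filename, fn.__code__.co_firstlineno)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        _active.count = 0  # step (a): open a counting context for this thread
        try:
            return fn(*args, **kwargs)
        finally:
            count, _active.count = _active.count, None
            expected = EXPECTED.get(key)  # step (e): compare to recorded count
            if expected is not None and count > expected:
                warnings.warn(
                    "%s:%s issued %d queries, expected %d" %
                    (key[0], key[1], count, expected))
    return wrapper


# steps (b) and (d): count statements only while a monitored context is open
@event.listens_for(Engine, 'before_cursor_execute')
def _count_statements(conn, cursor, statement, parameters, context,
                      executemany):
    if getattr(_active, 'count', None) is not None:
        _active.count += 1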

Jonathan Vanasco

Oct 19, 2016, 1:36:59 PM
to sqlalchemy


On Tuesday, October 18, 2016 at 9:29:37 AM UTC-4, Mike Bayer wrote:

This approach can also be made extremely performant by making sure you
load up that profile datafile just once, and cache it in memory in a
dictionary.

What's also nice about the decorator approach in this respect is that you can use it as a factory and disable/enable it on startup to eliminate runtime performance overhead when not needed:

    def monitor_sqlcalls(wrapped):
        if DO_NOT_DECORATE:
            return wrapped

        def wrapper(*args, **kwargs):
            result = wrapped(*args, **kwargs)
            return result
        return wrapper

    @monitor_sqlcalls
    def foo():
        return True
    
If you toggle it off, you'll only have a performance hit at application startup, while Python compiles the functions and the factory returns the underlying function, essentially leaving it undecorated.   This can let you decorate profiling code on everything you want, then only run it on a few select deployments / processes.

If you really want to shave off some milliseconds and are using CPython, you can set the env variable `PYTHONOPTIMIZE` to change the `__debug__` constant from True to False (or invoke Python with `-O`/`-OO`), and nest any random debugging work below it.  Anything in an `if __debug__:` block is only compiled in when PYTHONOPTIMIZE is unset.  The compiler only handles the very simple `if __debug__:` construct; `if not __debug__` needs to be written as `if __debug__: pass` plus an `else:`, and if you want `__debug__` as part of an `and`/`or` expression, you'll need to split that into two if statements.
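
A tiny illustration: the first block below disappears entirely under `python -O`, while the second is the workaround shape for "not __debug__" behavior:

if __debug__:
    # present only in non-optimized runs; stripped at compile time under -O
    print("debug build: query monitoring enabled")

if __debug__:
    pass
else:
    # runs only under -O; the "if not __debug__" workaround
    print("optimized build")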

Another trick is to log stuff using `statsd` from Etsy.  It's an external daemon written in node.js, and clients send data to it via UDP.  We have a handful of functions wrapped in statsd decorators that track timing; some track average results or the number of postgres interactions.  The uwsgi containers also report into statsd, allowing us to easily correlate everything to server loads across the network at the time.   The upside is that there is almost no overhead; the downside is that this performance stems from UDP transmissions, which are not guaranteed to be received.
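
For illustration, a sketch assuming the common `statsd` client package from PyPI (the metric names are hypothetical):

import statsd

stats = statsd.StatsClient('localhost', 8125)  # fire-and-forget UDP

@stats.timer('quiz.new_score')  # wall-clock timing for each call
def new_score():
    stats.incr('quiz.new_score.pg_queries')  # count a database interaction
    return True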

