Re: finding the source of "interesting" SQL queries

226 views
Skip to first unread message

Russell Keith-Magee

unread,
Oct 22, 2012, 7:35:26 PM10/22/12
to django-d...@googlegroups.com
Hi Matt,

I'm not aware of any community maintained solution for this. However, interestingly, what you've suggested (including some contextual stack information in a query comment) is something that was suggested by Cal Henderson at the very first DjangoCon. 

I'm not sure Ned's "global request object" is necessary to make this happen, though; it should be enough to just get the current stack at the time the database code is invoked, roll back a few frames until you hit user-space code, and use that file and line number as the comment.

I'd certainly be interested in seeing a patch that does this.

Yours,
Russ Magee %-)

On Tue, Oct 23, 2012 at 4:52 AM, Matt McClure <matthew...@gmail.com> wrote:
From time to time the pager goes off, and I need to find the source of a given query hitting MySQL. It might be poorly performing, or flooding MySQL in high volume. In any case, I'd love some instrumentation that could point me at a lead in my Django site that's more specific than a given user or IP address. I'm considering adding a comment to the query that includes some part of the call stack, possibly based on Ned Batchelder's "global request" code[1]. Is there a community maintained solution to similar problems already in the wild? Or failing that, an approach that the community would like to include in Django?


--

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To view this discussion on the web visit https://groups.google.com/d/msg/django-developers/-/PAcaIZ8CvlUJ.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

Matt McClure

unread,
Oct 23, 2012, 12:23:46 AM10/23/12
to django-d...@googlegroups.com
Thanks, Russell. That's similar to the approach we were thinking of implementing. Hopefully we'll have a straw man to share shortly.

Matt

Àlex Pérez

unread,
Oct 24, 2012, 1:14:49 AM10/24/12
to django-d...@googlegroups.com

You are modifying the settings on runtime , that practise is not recommended by django and personally i have had bad experience with that...

El 24/10/2012 00:26, "Marty Woodlee" <marty....@gmail.com> escribió:
So the straw man solution we've implemented for now looks like this. (Background: we have a settings.THREADLOCALS construct which serves in a capacity similar to Ned's global requests; also, our monkeypatch_method decorator does what you probably think it does.) First, we've a tiny bit of middleware:

from django.conf import settings

class RequestURIMiddleware:
    """
    Set the request URI in THREADLOCALS so that it can be used to comment
    SQL queries with what triggered them.
    """
    def process_request(self, request):
        settings.THREADLOCALS.request_uri = request.build_absolute_uri()

And then we've monkeypatched the cursor method on BaseDatabaseWrapper like so:

from django.conf import settings
from django.db.backends import BaseDatabaseWrapper, util
from patchers import monkeypatch_method
import traceback
 
class CommentingCursorWrapper(util.CursorDebugWrapper):
    def execute(self, sql, *args, **kwargs):
        """
        Before sending to the DB, this adds a comment to the SQL with notes about the query's origin
        """
        try:
            if getattr(settings, 'SQL_COMMENTS_ALWAYS_TRACEBACK', False):
                raise AttributeError
            # THREADLOCALS.request_uri is (usually) populated by RequestURIMiddleware
            origin_comment = ' /* Originated from request to {0}'.format(settings.THREADLOCALS.request_uri)
        except AttributeError:
            # If no URI available (e.g., Celery task), report the first non-Django point in the call stack:
            tb = reversed(traceback.format_stack()[:-1])  # walk it bottom-up, excluding this frame
            for frame in tb:
                if 'django' not in frame:
                    origin_comment = ' /* Originated at{0}'.format(frame.split('\n')[0])
                    break
        origin_comment = origin_comment.replace('%', '%%')
        origin_comment = origin_comment.replace('*/', '\*\/')
        sql += origin_comment + ' */'
        return self.cursor.execute(sql, *args, **kwargs)

 
@monkeypatch_method(django.db.backends.BaseDatabaseWrapper)
def cursor(self, *args, **kwargs):
    return CommentingCursorWrapper(cursor._original_cursor_function(self, *args, **kwargs), self)


So in short, we comment the SQL with the URI when it's available, and fall back to the (presumably more expensive to gather) traceback information when it's not.

This is just the monkeypatch we're throwing in to solve our immediate problem, but it'd be easy enough to convert into a patch. Before we do, any thoughts/criticisms of this approach?

Best,
Marty

On Monday, October 22, 2012 11:23:47 PM UTC-5, Matt McClure wrote:
Thanks, Russell. That's similar to the approach we were thinking of implementing. Hopefully we'll have a straw man to share shortly.

Matt

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To view this discussion on the web visit https://groups.google.com/d/msg/django-developers/-/-X0AFXuzDv8J.

Marty Woodlee

unread,
Oct 25, 2012, 10:52:37 AM10/25/12
to django-d...@googlegroups.com
Yeah, I don't think we'd try to do anything like that in our actual Django patch... the THREADLOCALS thing has worked well for us but I realize that's a pretty big shift to impose on others. If anything I think the patch might simply modify the existing CursorDebugWrapper to add the traceback-based comment based on DEBUG being True (and/or some other setting).

Also, we discovered a small bug in the above for cases where manual queries were being sent with semicolons already on them. MySQL was seeing it as two queries in the same cursor execution and raising a ProgrammingError. Just in case someone stumbles across this while Googling and wants to use this solution, here's the diff vs. the above code that fixes it:

-        sql += origin_comment + ' */'
+        sql = sql.rstrip(' ;')
+        sql = ''.join([sql, origin_comment, ' */'])

Anssi Kääriäinen

unread,
Oct 25, 2012, 11:06:41 AM10/25/12
to Django developers
On 25 loka, 17:52, Marty Woodlee <marty.wood...@gmail.com> wrote:
> Yeah, I don't think we'd try to do anything like that in our actual Django
> patch... the THREADLOCALS thing has worked well for us but I realize that's
> a pretty big shift to impose on others. If anything I think the patch might
> simply modify the existing CursorDebugWrapper to add the traceback-based
> comment based on DEBUG being True (and/or some other setting).
>
> Also, we discovered a small bug in the above for cases where manual queries
> were being sent with semicolons already on them. MySQL was seeing it as two
> queries in the same cursor execution and raising a ProgrammingError. Just
> in case someone stumbles across this while Googling and wants to use this
> solution, here's the diff vs. the above code that fixes it:
>
> -        sql += origin_comment + ' */'
> +        sql = sql.rstrip(' ;')
> +        sql = ''.join([sql, origin_comment, ' */'])

Some other possible problems:
- In PostgreSQL comments nest (this is the SQL standard behaviour, I
don't know if other DBs implement this or the non-nesting comments
approach).
- If the query ends in line comment (one starting with --) and an
attacker manages to supply in a newline then there is potential for
SQL injection. This should not be possible if using the uri or single
line of the stack trace but worth checking still in the code.

We have currently two different CursorWrappers, debug and default one.
Having a way to supply different CursorWrappers would allow users to
do anything they want, maybe rewriting some queries to use stored
procedures or whatever... We could also have some often needed
cursorwrappers in core.

I don't think it is a good idea to implement something that will allow
annotating the source of the query in SQL comment but nothing else.

- Anssi

Marty Woodlee

unread,
Oct 25, 2012, 12:23:49 PM10/25/12
to django-d...@googlegroups.com
Good point, and I agree, allowing the user to use their own cursor wrappers is probably the best solution. At least, it would allow us to avoid the monkeypatching we're having to use here. From there, this particular wrapping could just turn into a django-snippet. I'll try to work on something oriented towards that instead.  It may take a while--priorities--but I'll post back here when I come up with something. Thanks!
Reply all
Reply to author
Forward
0 new messages