Dogpile.cache w/ SQLAlchemy

David McKeone

Sep 24, 2012, 7:15:21 AM
to sqlal...@googlegroups.com
As per this comment: http://techspot.zzzeek.org/2012/04/19/using-beaker-for-caching-why-you-ll-want-to-switch-to-dogpile.cache/#comment-503780670

Has any work been put into an example for using Dogpile.cache with SQLAlchemy?  I'm about to embark on implementing caching and I don't want to re-invent the wheel by creating my own adaptation if I don't need to.


Michael Bayer

Sep 24, 2012, 10:15:24 AM
to sqlal...@googlegroups.com
yes, the example in 0.8 should be changed to this, but I haven't done it yet.   dogpile's usage is similar to Beaker's as far as the general calling pattern goes.   A tutorial-format version of the example using dogpile is attached.

caching_query.py

David McKeone

Sep 24, 2012, 10:24:41 AM
to sqlal...@googlegroups.com
Great, thanks Mike!

David McKeone

Sep 26, 2012, 11:31:34 AM
to sqlal...@googlegroups.com
This example has worked really well so far, but I thought I'd mention that I ran into a few small things and made some changes.

1) This one was minor. I wanted to be able to use the cache_regions on multiple databases from the same program, so I made the CachingQuery class take a parameter with the regions dict it operates on -- simple enough, and similar to the Beaker example, which takes a cache manager instance. (Both changes are sketched at the end of this message.)

2) This was a little more interesting.  I replaced an existing simplistic caching implementation that stored database results with the new dogpile.cache implementation, and I saw my request times slow down from 16ms to 66ms.  It kind of shocked me that it was so much slower, so I ran cProfile and gprof2dot against it (sketched just below) to see what was happening.  I found that the key-generation algorithm in _key_from_query -- specifically visitors.traverse -- was causing the slow-down.  Once I added a small feature to use explicit cache keys, it was back to normal.  Admittedly, the tables I'm caching in this case are quite large: three preference tables that have 100+ columns each.  However, I think it does serve as a bit of a warning, since I imagine the traverse only gets slower as the query gets more complicated (although I haven't tested that).  Automatic cache-key generation based on the query is nice, but there is certainly a price to be paid in some circumstances.
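
The profiling itself was nothing exotic -- roughly the following, with expensive_request() standing in for my real request handler (a hypothetical name, obviously):

import cProfile
import pstats

# profile a single request; expensive_request() is a stand-in for
# whatever actually exercises the cached queries
cProfile.run("expensive_request()", "request.pstats")

# quick look at the hot spots from within Python
stats = pstats.Stats("request.pstats")
stats.sort_stats("cumulative").print_stats(20)

# the same .pstats file feeds gprof2dot for the call-graph picture:
#   gprof2dot -f pstats request.pstats | dot -Tpng -o profile.png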

Anyway, thanks again for the example, Mike.  I imagine that some (maybe all) of these things could be due to the fact that it is just an example, but since it is new I thought I'd pass along my experience.  Its simplicity certainly helped me to see where all the pieces fit and to start making these changes.
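
In case it's useful to anyone else, here's roughly what the two changes look like.  This is only a sketch against the structure of the attached caching_query.py (where FromCache is a MapperOption subclass); the regions parameter and the cache_key override are my additions, not part of the stock example:

from functools import partial

from dogpile.cache import make_region
from sqlalchemy.orm import Query, sessionmaker

class CachingQuery(Query):
    def __init__(self, regions, *args, **kw):
        # change 1: regions is a dict of name -> dogpile.cache
        # CacheRegion, passed in so each database can have its own set
        self.cache_regions = regions
        Query.__init__(self, *args, **kw)

class FromCache(object):  # subclasses MapperOption in the full example
    def __init__(self, region="default", cache_key=None):
        self.region = region
        # change 2: an explicit key skips the visitors.traverse()
        # key generation entirely
        self.cache_key = cache_key

# wiring: bind a sessionmaker to one particular regions dict
regions = {"default": make_region().configure("dogpile.cache.memory")}
Session = sessionmaker(query_cls=partial(CachingQuery, regions))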



Michael Bayer

Sep 26, 2012, 1:11:25 PM
to sqlal...@googlegroups.com
On Sep 26, 2012, at 11:31 AM, David McKeone wrote:

This example has worked really well so far, but I thought I'd mention that I ran into a few small things and made some changes.

1) This one was minor. I wanted to be able to use the cache_regions on multiple databases from the same program, so I made the CachingQuery class take a parameter with the regions dict it operates on -- simple enough, and similar to the Beaker example, which takes a cache manager instance.

yup, that's why this whole thing is an "example"... so people can tailor it without the need to maintain a large and complex API/docs



2) This was a little more interesting.  I replaced an existing simplistic caching implementation that stored database results with the new dogpile.cache implementation, and I saw my request times slow down from 16ms to 66ms.  It kind of shocked me that it was so much slower, so I ran cProfile and gprof2dot against it to see what was happening.  I found that the key-generation algorithm in _key_from_query -- specifically visitors.traverse -- was causing the slow-down.  Once I added a small feature to use explicit cache keys, it was back to normal.  Admittedly, the tables I'm caching in this case are quite large: three preference tables that have 100+ columns each.

yes, I'm not thrilled about the means it uses to generate the key right now.   Previously, I had it just using the "binds" that were in the Query, but we had cases where bound parameters were embedded elsewhere in the statement too, affecting things.   So as a quick fix I changed it to just traverse the whole statement, but this can be optimized significantly more than it is.  It's the way it is so that it "works" more completely on the first go-around.
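
Condensed, the current approach boils down to roughly this (simplified from the example's caching_query.py; the actual file has more going on):

from hashlib import md5

from sqlalchemy.sql import visitors

def _key_from_query(query):
    """Derive a cache key from a Query: statement text plus bound values."""
    stmt = query.with_labels().statement
    values = []

    def visit_bindparam(bind):
        # resolve the actual value behind each bound parameter
        if bind.key in query._params:
            value = query._params[bind.key]
        elif bind.callable:
            value = bind.callable()
        else:
            value = bind.value
        values.append(str(value))

    # traverse the *entire* statement, so binds hiding in subqueries,
    # the columns clause, etc. aren't missed -- this is the traversal
    # that showed up in your profile
    visitors.traverse(stmt, {}, {'bindparam': visit_bindparam})
    return md5((str(stmt) + " ".join(values)).encode('utf-8')).hexdigest()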


 However, I think it does serve as a bit of a warning, since I imagine the traverse only gets slower as the query gets more complicated (although I haven't tested that).  Automatic cache-key generation based on the query is nice, but there is certainly a price to be paid in some circumstances.

yup, hence a usage example :)   100-column tables aren't the norm either, which probably added to the slowdown.



Anyway, thanks again for the example, Mike.  I imagine that some (maybe all) of these things could be due to the fact that it is just an example, but since it is new I thought I'd pass along my experience.  Its simplicity certainly helped me to see where all the pieces fit and to start making these changes.


On Monday, September 24, 2012 4:15:29 PM UTC+2, Michael Bayer wrote:
yes, the example in 0.8 should be changed to this, but I haven't done it yet.   dogpile's usage is similar to Beaker's as far as the general calling pattern goes.   A tutorial-format version of the example using dogpile is attached.



David McKeone

Sep 26, 2012, 4:09:26 PM
to sqlal...@googlegroups.com
Fair enough about the example stuff; I kind of figured, but thought I'd share my experience anyway.  It's such a fine line between authors getting a bead on real-world usage and hearing the same question over and over again.

Previously, I had it just using the "binds" that were in the Query, but we had cases where bound parameters were embedded elsewhere in the statement too, affecting things.   So as a quick fix I changed it to just traverse the whole statement, but this can be optimized significantly more than it is.  It's the way it is so that it "works" more completely on the first go-around.

Is there a thread I can pull at here?  I'd love to hear about performant key-generation options that are more flexible than explicit cache keys (which are obviously highly prone to developer error).   Looking at the query class, I'm guessing you are still using md5(unicode(query.statement)) for the query portion of the key, but what are you using for the "binds"?  (Sorry, I'm not quite familiar with the internals of SQLAlchemy yet -- I can't find that attribute or anything similar in the source.)  Are there other alternatives for cache keys that are worth exploring?  (Feel free to just post links or what have you; I'm happy to do the reading.)

Michael Bayer

Sep 26, 2012, 4:43:15 PM
to sqlal...@googlegroups.com
This is the changeset in question; it has pretty much all the detail, including a link to the stackoverflow issue that reported the problem.

diff -r affaa93fad92 -r b0e8d02e6218 CHANGES
--- a/CHANGES Tue Feb 14 10:16:16 2012 -0500
+++ b/CHANGES Tue Feb 14 12:04:04 2012 -0500
@@ -128,6 +128,13 @@
   - [bug] Added missing compilation support for 
     LONG [ticket:2401]
 
+- examples
+  - [bug] Altered _params_from_query() function
+    in Beaker example to pull bindparams from the
+    fully compiled statement, as a quick means
+    to get everything including subqueries in the
+    columns clause, etc.
+
 0.7.5 (January 28, 2012)
 =====
 - orm
diff -r affaa93fad92 -r b0e8d02e6218 examples/beaker_caching/caching_query.py
--- a/examples/beaker_caching/caching_query.py Tue Feb 14 10:16:16 2012 -0500
+++ b/examples/beaker_caching/caching_query.py Tue Feb 14 12:04:04 2012 -0500
@@ -268,8 +268,12 @@
             value = bind.value
 
         v.append(value)
-    if query._criterion is not None:
-        visitors.traverse(query._criterion, {}, {'bindparam':visit_bindparam})
-    for f in query._from_obj:
-        visitors.traverse(f, {}, {'bindparam':visit_bindparam})
+
+    # TODO: this pulls the binds from the final compiled statement.
+    # ideally, this would be a little more performant if it pulled
+    # from query._criterion and others directly, however this would
+    # need to be implemented not to miss anything, including
+    # subqueries in the columns clause.  See
+    visitors.traverse(query.statement, {}, {'bindparam':visit_bindparam})
     return v


