Need help on caching, background jobs & manual ORM cache refresh (Pyramid project)


Learner

Jun 5, 2012, 7:49:10 AM
to pylons-discuss
Hello Pyramid gurus,

I have been searching for quick tutorials on caching, background jobs
& ORM-related topics. I found quite a few resources which seem to be
very informative. Since I am new to both Python & Pyramid, I thought I
would seek the opinion of experienced people before I go ahead and use
anything I found on the web. Any help is very much appreciated.

1. Caching:
The simple use case is: I want to show the top 10 or 20 articles on
my wiki application. I would like to cache the db result the first
time the query is executed, before I render the data. The cache should
refresh automatically every hour or so.

2. Background Jobs:
I am using SQLAlchemy in my application.
All the data needed for the application comes from XML/CSV files. Is
there any way in Pyramid I can create a background job and schedule it
to run every 30 minutes or so? The job will look at one particular folder
every time it runs, and if there are any XML/CSV files it will pick
them up and process them. Since this is a simple ETL job, SQLAlchemy is
not aware of the DB changes. So does this confuse any of the ORM
caching mechanisms and show stale data? If so, how would I be able
to notify the ORM to rebuild its cache?

Thanks for your time.

cheers
-Bkumar

Jason

Jun 5, 2012, 11:22:38 AM
to pylons-...@googlegroups.com


On Tuesday, June 5, 2012 7:49:10 AM UTC-4, Learner wrote:
> Hello Pyramid gurus,
>
> I have been searching for quick tutorials on caching, background jobs
> & ORM-related topics. I found quite a few resources which seem to be
> very informative. Since I am new to both Python & Pyramid, I thought I
> would seek the opinion of experienced people before I go ahead and use
> anything I found on the web. Any help is very much appreciated.
>
> 1. Caching:
> The simple use case is: I want to show the top 10 or 20 articles on
> my wiki application. I would like to cache the db result the first
> time the query is executed, before I render the data. The cache should
> refresh automatically every hour or so.

As far as caching is concerned, you will be better off caching the result of your view. Beaker has decorators for caching individual functions/methods for a specified period of time (look for the cache_region decorator). This way, not only will the database results be cached, but also the processing required to turn them into the template values. I don't know if there is a way to also cache the rendered template with Pyramid.
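To make the idea of caching a function's result for a fixed period concrete, here is a minimal standard-library sketch. It is not Beaker's implementation or API; the `cache_for` decorator, the `top_articles` function, and the `calls` list are hypothetical names invented for illustration, and the sketch is not thread-safe.

```python
import time
from functools import wraps


def cache_for(seconds, clock=time.monotonic):
    """Cache a function's return value per argument tuple for `seconds`."""
    def decorator(func):
        store = {}  # maps args -> (expiry_time, value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]          # still fresh: skip the real call
            value = func(*args)        # expired or missing: recompute
            store[args] = (now + seconds, value)
            return value

        # Clearing the store forces a recompute on the next call.
        wrapper.invalidate = store.clear
        return wrapper
    return decorator


calls = []  # records each time the "database" is actually hit


@cache_for(3600)
def top_articles(limit):
    # Hypothetical stand-in for the SQLAlchemy query plus view processing.
    calls.append(limit)
    return ["article-%d" % i for i in range(1, limit + 1)]


first = top_articles(10)
second = top_articles(10)   # served from the cache; no second "query"
```

Because the whole function is wrapped, both the query and any post-processing are skipped on a cache hit, which is the advantage described above over caching only the raw database rows.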


> 2. Background Jobs:
> I am using SQLAlchemy in my application.
> All the data needed for the application comes from XML/CSV files. Is
> there any way in Pyramid I can create a background job and schedule it
> to run every 30 minutes or so? The job will look at one particular folder
> every time it runs, and if there are any XML/CSV files it will pick
> them up and process them. Since this is a simple ETL job, SQLAlchemy is
> not aware of the DB changes. So does this confuse any of the ORM
> caching mechanisms and show stale data? If so, how would I be able
> to notify the ORM to rebuild its cache?
>
> Thanks for your time.

Are the XML files parsed and the data then inserted into a database that Pyramid uses? If so, perhaps a cron job would be better suited to that. If you are using caching, the data will not be refreshed in Pyramid until the cache refreshes. If you are using Beaker, you can force the cache to refresh on the next hit.

Are you sure you need all this caching, though? It seems unnecessarily complicated. Pyramid is very fast, SQLAlchemy is very fast, and your database will probably be caching the query plans as well, so it's going to be very fast. I would recommend building your application with no caching, and then adding it later if it is needed. That way you can worry about getting the loaded data displaying correctly (especially since your data setup is a little more complex) before having to figure out a caching system.

-- Jason

 

Learner

Jun 5, 2012, 3:36:37 PM
to pylons-discuss
Thanks, Jason. I will take your suggestion.
Your hint about Beaker caching is helpful too.

cheers
-Bkumar

Mike Orr

Jun 6, 2012, 3:03:59 PM
to pylons-...@googlegroups.com
Pyramid reacts to web requests. It does not run periodic jobs without
being prodded by a request. You can use 'prequest' in a cron script to
send the application a request. But that gets into authorization
because you probably don't want Internet yobbos accessing those URLs.
There are different philosophies of model design, but the one I follow
says that the model should not depend on the rest of the application
or framework. So if you've designed it that way, you can write a
command script that imports the model and makes any needed changes to
the database. But then if you're using caching, how do you tell the
application to expire its cache? Perhaps just let the cache continue
until it's scheduled to expire.
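A command script of the kind described above might look like the following minimal sketch. It uses the standard library's `sqlite3` and `csv` modules as stand-ins for the SQLAlchemy model so that it runs on its own; the `load_csv_files` function, the `articles` table with its `title` and `views` columns, and the `.done` renaming convention are all assumptions made for illustration.

```python
import csv
import sqlite3
from pathlib import Path


def load_csv_files(inbox, conn):
    """Insert rows from every *.csv in `inbox`, then mark each file done."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (title TEXT, views INTEGER)"
    )
    loaded = 0
    for path in sorted(Path(inbox).glob("*.csv")):
        with path.open(newline="") as handle:
            rows = [(r["title"], int(r["views"])) for r in csv.DictReader(handle)]
        # One transaction per file: commit on success, roll back on error.
        with conn:
            conn.executemany("INSERT INTO articles VALUES (?, ?)", rows)
        # Rename so the next scheduled run does not load the same file twice.
        path.rename(path.with_name(path.name + ".done"))
        loaded += len(rows)
    return loaded
```

A script that calls this function could then be scheduled outside the web application, for example with a crontab entry beginning `*/30 * * * *` to run every 30 minutes. The web application is not involved at all, so any cache it holds simply keeps serving the old data until it expires.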

As for having the application monitor changes in files, there's a
kernel feature to do this in some OSes, but not in a way convenient to
Pyramid. The routine would block until a file changes, or the kernel
would trigger a callback when it changes. Neither of these paradigms
fits into a Pyramid application very well.

The other way to do this is with a long-running thread, separate from
the WSGI server's thread pool. You would spawn a thread in the main
function, and it would periodically do its job and sleep for an
interval. However, this adds complexity to the application, so it may
not have much advantage over a cron job, especially when using
SQLAlchemy, where two different processes can modify the database
without any interaction between them. (The server-based DBs allow this
by design, while SQLite uses sophisticated file locking to allow it.)
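For comparison, the long-running thread approach can be sketched with the standard library alone. This is only an illustration under stated assumptions: `start_periodic` and `job` are hypothetical names, the 0.01 second interval is chosen so the demo finishes quickly, and in a real Pyramid application the call would go in the main function with an interval such as 1800 seconds.

```python
import threading


def start_periodic(job, interval, stop_event):
    """Run `job()` every `interval` seconds in a daemon thread until stopped."""
    def loop():
        # Event.wait() doubles as an interruptible sleep: it returns True
        # as soon as stop_event is set, and False after the timeout.
        while not stop_event.wait(interval):
            job()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


runs = []
done = threading.Event()
stop = threading.Event()


def job():
    # Hypothetical stand-in for "scan the folder and load new files".
    runs.append(len(runs) + 1)
    if len(runs) >= 3:
        done.set()


worker = start_periodic(job, 0.01, stop)
done.wait(timeout=5)    # let the job fire a few times
stop.set()              # ask the loop to exit
worker.join(timeout=5)
```

One thing to keep in mind with this design: if the server runs several worker processes, each process would start its own copy of the thread, which is part of the added complexity compared with a single cron job.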

--
Mike Orr <slugg...@gmail.com>

jerry

Jun 7, 2012, 6:35:18 AM
to pylons-discuss
If you like beaker, you might also want to take a look at retools
( http://pypi.python.org/pypi/retools/0.2 ), a newer caching library
specifically for Redis from the same author.

Jerry

Michael Bayer

Jun 7, 2012, 12:56:56 PM
to pylons-...@googlegroups.com
For caching, I'd use dogpile.cache: https://bitbucket.org/zzzeek/dogpile.cache/

It is specifically the replacement for Beaker caching, and it is much simpler and more performant.

SQLAlchemy 0.8 will convert the "beaker caching" examples to use dogpile instead. Attached is a script from a recent tutorial I gave, which illustrates a typical dogpile/SQLAlchemy caching configuration in the spirit of the Beaker caching example.

caching_query.py

Jason

Jun 7, 2012, 1:11:01 PM
to pylons-...@googlegroups.com
Do you know if Pyramid's default session cache will be changing from beaker to dogpile in the near future? 

Mike Orr

Jun 7, 2012, 1:46:43 PM
to pylons-...@googlegroups.com
On Thu, Jun 7, 2012 at 10:11 AM, Jason <ja...@deadtreepages.com> wrote:
> Do you know if Pyramid's default session cache will be changing from beaker
> to dogpile in the near future?

Pyramid has no default session backend. 'pyramid_beaker' is an
add-on. I suppose 'pyramid_dogpile' will appear when somebody gets
around to writing it. But it won't be "recommended" until Dogpile has
been stable for a while. Last I heard Dogpile was in alpha, but it may
be further along now.

--
Mike Orr <slugg...@gmail.com>

Michael Bayer

Jun 10, 2012, 10:32:10 AM
to pylons-...@googlegroups.com
Well, also, Dogpile doesn't do HTTP sessions, just data caching.

We're still stuck with Beaker for that, until someone wants to change things (I only use it for client side sessions).

