anyone work on temporary values from expensive functions? (perhaps dogpile+celery)

32 views
Skip to first unread message

Jonathan Vanasco

unread,
Jun 17, 2016, 7:28:15 PM6/17/16
to pylons-discuss
I've run into a peculiar issue on one of our systems.

A handful of sql queries have become quite expensive [database growth]. Even with aggressive optimizations, some queries can take 45 seconds. [Migrating to a different database or hardware infrastructure is likely the only avenue for improvement]

I've started to sketch out an idea to handle this, and wondering if anyone else here has had a similar need or has come up with a better solution.  

The general idea is that instead of the pyramid app using dogpile.cache to generate an expensive element, it messages celery to do that; and immediately returns a temporary value. 

The current setup

* `n1` number of servers running Pyramid (currently anywhere from 3-9 servers)
* 1 server runs Celery

Rough Idea:

* A `dogpile.cache` backed value generator will have a default value which is defined as`default = las_computed_value or initial_default`.
   * `initial_default` is in the source code.
   * `last_computed_value` might be a database or filesystem backed value.
   * the idea is to deploy with an acceptable value, but have any system restarts persist a better value.

* A `dogpile.cache` miss will immediately return `default`, and send a message to `Celery` for value generation

* Most subsequent cache misses will see the dogpile lock, and return `default`

* Some dogpile cache misses will not see the lock, and message Celery.  (this could be from a lock timeout or cache reset.  this annoyingly happens)

* When `Celery` receives a message, it uses it's own dogpile lock to ensure only one value generator is called.

* When `Celery` generates a value, it not only populates the cache, but also saves the data as the `computed_default` to persist across shutdowns





Vincent Catalano

unread,
Jun 17, 2016, 7:54:49 PM6/17/16
to pylons-...@googlegroups.com
Taking a step back a bit, have you considered something like a materialized view at the database level for your complex queries?

--
You received this message because you are subscribed to the Google Groups "pylons-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pylons-discus...@googlegroups.com.
To post to this group, send email to pylons-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pylons-discuss/cfc52891-f7a3-4e64-a9a4-6dc362c249ff%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Vincent Catalano
Software Engineer and Web Developer,
(520).603.8944

Jonathan Vanasco

unread,
Jun 17, 2016, 8:22:55 PM6/17/16
to pylons-discuss


On Friday, June 17, 2016 at 7:54:49 PM UTC-4, Vincent Catalano wrote:
Taking a step back a bit, have you considered something like a materialized view at the database level for your complex queries?

I *wish* I could use them for this use-case.  Unfortunately I think that would cause more problems that it could solve.  It wouldn't scale to the number of tables that would be needed (multiple ones per user; thousands of users and growing), and there would be a lot of management overhead in terms of cleanup.  The Postgres datastore is also safeguarded a lot more (i.e. the machines are "more expensive" and are priority backups) while other machines that would house this on files or redis have plenty of resources to spare and are cheap to operate.  

Jonathan Vanasco

unread,
Jun 17, 2016, 9:31:55 PM6/17/16
to pylons-discuss
prototyping this was pretty quick.   the only hard bit will be pulling out the correct pyramid settings data for celery (I hardcoded it for now)

the general recipe is here:

Reply all
Reply to author
Forward
0 new messages