Raw access to cache table


Erik Cederstrand

Dec 18, 2014, 6:22:03 AM
to Django Users
Hi list,

I'm using Django as a hub to synchronize data between various external systems. To do this, I have some long-running Django management commands that are started as cron jobs. To ensure that only one job is working on a data set at any one time, I have implemented a locking system using the Django cache system.
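For reference, a minimal sketch of what such a cache-based lock could look like (the key scheme, the PID-as-value convention and the use of cache.add() below are illustrative assumptions, not the exact code in use):

import os

from django.core.cache import cache

# Hypothetical key scheme: one lock key per data set, with the owning PID as
# the value so an orphaned lock can be traced back to its process.
LOCK_PREFIX = 'lock'


def acquire_lock(dataset_id, timeout=None):
    key = '%s:%s' % (LOCK_PREFIX, dataset_id)
    # cache.add() only stores the value if the key is not already present, so
    # a second job trying to lock the same data set gets False back.
    return cache.add(key, os.getpid(), timeout)


def release_lock(dataset_id):
    cache.delete('%s:%s' % (LOCK_PREFIX, dataset_id))

Note that cache.add() is not guaranteed to be atomic on every backend, so this is best-effort locking rather than a hard guarantee.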

Due to my less-than-stellar programming talent, a cron job may need to be killed once in a while. To avoid leaving orphaned locks in the cache, I want to clean up any locks before exiting, and I may not know which cache keys exist at the time of exit. Therefore, I want to write a signal handler that searches the Django cache for any entries that can be traced to the current process (my cache keys can be used for this purpose) and delete those.

AFAICS, the Django cache API can't be used for this. There's cache.clear(), but I don't want to delete all cache entries. I'm using the database backend, so I'm thinking I could access the database table directly and issue custom SQL on that. But it does feel hackish, so maybe one of you has a better approach? Maybe I got the whole thing backwards?

Here's my cache setup from settings.py:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.db.DatabaseCache',
        'LOCATION': 'dispatch_cache',
        'TIMEOUT': None,
    }
}


Thanks,
Erik

Vijay Khemlani

Dec 18, 2014, 9:14:57 AM
to django...@googlegroups.com
I'm not sure that using a cache system as a lock mechanism is a good idea.

What I would do is set up a task queue (using Celery and RabbitMQ) with a single worker; that way you guarantee that only one task is running at a time, and you can queue as many as you want. Also, each task has a maximum lifespan; after that it kills itself and the worker starts executing the next task in the queue automatically.
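A minimal sketch of that setup, assuming Celery with a RabbitMQ broker (the module name, broker URL, time limit and task below are made-up examples):

# tasks.py (hypothetical module name)
from celery import Celery

# Assumes a RabbitMQ broker running locally; adjust the URL for your setup.
app = Celery('tasks', broker='amqp://guest:guest@localhost//')


@app.task(time_limit=3600)  # hard limit: the task is killed after one hour
def sync_dataset(dataset_id):
    # ... do the long-running synchronization work here ...
    pass

Running the worker with a concurrency of one, e.g. "celery -A tasks worker --concurrency=1", is what guarantees that only a single task executes at a time.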



Avraham Serour

Dec 18, 2014, 9:19:07 AM
to django...@googlegroups.com
I implemented a network lock using Redis. The module had some other functionality as well, but the basic lock class was:

from redis import StrictRedis


class Lock():
    """
    Regular behavior lock, works across the network, useful for distributed processes
    """
    def __init__(self, name):
        self.redis = StrictRedis(host='dev3', db=2)
        self.redis._use_lua_lock = False
        self.lock = self.redis.lock(name)

    def __enter__(self, blocking=True):
        # print('Acquiring lock')
        return self.lock.acquire(blocking)

    def __exit__(self, exc_type=None, exc_val=None, exc_tb=None):
        # print('releasing lock')
        self.lock.release()

    acquire = __enter__
    release = __exit__
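Hypothetical usage (the lock name and the function doing the work are made up):

# Only one process at a time gets past this point for the same lock name;
# the lock is released even if the block raises an exception.
with Lock('sync-dataset-42'):
    do_expensive_sync()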

Javier Guerra Giraldez

Dec 18, 2014, 9:25:44 AM
to django...@googlegroups.com
On Thu, Dec 18, 2014 at 6:20 AM, Erik Cederstrand
<erik+...@cederstrand.dk> wrote:
> I'm using Django as a hub to synchronize data between various external systems. To do this, I have some long-running Django management commands that are started as cron jobs. To ensure that only one job is working on a data set at any one time, I have implemented a locking system using the Django cache system.


There are many ways to skin this cat; one of them is doing it the Unix way with flock. In short, it's a command-line tool that makes it easy to do "if I'm already running, exit now" in shell scripts.

Check the man page; it lists a few ways to do this kind of thing.

http://linux.die.net/man/1/flock

--
Javier

François Schiettecatte

Dec 18, 2014, 11:16:12 AM
to django...@googlegroups.com
Hi

I use fcntl.flock() from the fcntl module. I have used this method in C, Perl and Python, and it works great. Happy to share code.
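Not the exact code referred to above, but a minimal sketch of the fcntl.flock() approach (the lock file path is just an example):

import fcntl
import sys

# Hypothetical lock file path; any path writable by the cron user will do.
LOCK_FILE = '/tmp/sync_job.lock'

lock_fd = open(LOCK_FILE, 'w')
try:
    # LOCK_EX = exclusive lock, LOCK_NB = fail instead of blocking if another
    # process already holds the lock.
    fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
except IOError:
    sys.exit('Another instance is already running')

# ... do the work; the lock is released automatically when the process exits ...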

François

Erik Cederstrand

Dec 19, 2014, 8:03:39 AM
to Django Users

> On 18/12/2014 at 12.20, Erik Cederstrand <erik+...@cederstrand.dk> wrote:
>
> Hi list,
>
> I'm using Django as a hub to synchronize data between various external systems. To do this, I have some long-running Django management commands that are started as cron jobs. To ensure that only one job is working on a data set at any one time, I have implemented a locking system using the Django cache system.
>
> Due to my less-than-stellar programming talent, a cron job may need to be killed once in a while. To avoid leaving orphaned locks in the cache, I want to clean up any locks before exiting, and I may not know which cache keys exist at the time of exit. Therefore, I want to write a signal handler that searches the Django cache for any entries that can be traced to the current process (my cache keys can be used for this purpose) and delete those.
>
> AFAICS, the Django cache API can't be used for this. There's cache.clear(), but I don't want to delete all cache entries. I'm using the database backend, so I'm thinking I could access the database table directly and issue custom SQL on that. But it does feel hackish, so maybe one of you has a better approach? Maybe I got the whole thing backwards?

Just to wrap this up, I wrote a function to get all or some cache entries. Most of the code is copy-paste from django.core.cache.backends.db.


import base64
import pickle

from django.core.cache import cache
from django.db import connections, router
from django.utils.encoding import force_bytes


def get_cache_entries(key_prefix=None):
    # Return a dict of {raw_cache_key: unpickled_value} for all rows in the
    # database cache table, or only those whose key starts with key_prefix.
    db = router.db_for_read(cache.cache_model_class)
    table = connections[db].ops.quote_name(cache._table)
    with connections[db].cursor() as cursor:
        sql = "SELECT cache_key, value FROM %s" % table
        params = None
        if key_prefix:
            sql += " WHERE cache_key LIKE %s"
            params = (key_prefix + '%',)
        cursor.execute(sql, params=params)
        rows = cursor.fetchall()
    res = {}
    for row in rows:
        value = connections[db].ops.process_clob(row[1])
        res[row[0]] = pickle.loads(base64.b64decode(force_bytes(value)))
    return res
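Hypothetical usage (the 'lock' key prefix is made up; note that the raw cache_key values stored in the table are the output of make_key(), so with default settings they look like ':1:<your key>'):

import os

prefix = ':1:lock:%s' % os.getpid()
for raw_key, value in get_cache_entries(key_prefix=prefix).items():
    print(raw_key, value)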


Erik

Collin Anderson

Dec 20, 2014, 11:55:43 PM
to django...@googlegroups.com
Hi Erik,

If you want a nicer interface: I just ran manage.py inspectdb on one of my databases and got this for the cache table. Not sure why Django still does it by hand.

from django.db import models


class Cache(models.Model):
    cache_key = models.CharField(primary_key=True, max_length=255)
    value = models.TextField()
    expires = models.DateTimeField()

    class Meta:
        managed = False
        db_table = 'cache_table'

Collin

Russell Keith-Magee

Dec 21, 2014, 6:41:34 PM
to Django Users
On Sun, Dec 21, 2014 at 12:55 PM, Collin Anderson <cmawe...@gmail.com> wrote:
Hi Erik,

If you want a nicer interface: I just ran manage.py inspectdb on one of my databases and got this for the cache table. Not sure why Django still does it by hand.

Gather 'round, children, and let Grandpa fill you in on the details... :-)

Although it isn't huge, Django's ORM does have an overhead. If you're looking for the general capability to query rows on a database, this overhead is a reasonable price to pay - you lose a little speed, but it's a lot easier to express complex queries. However, in the case of a cache table, you need to issue exactly 2 queries:

1) Get me the object with key X.

2) Set the value of key X.

These two queries are trivial to write -- so trivial, in fact, that they don't even need to be rewritten as they move across database backends. They're as "vanilla" as a query can get in SQL.

So - this becomes a case study in when *not* to use the ORM. We know the 2 queries that need to be issued. That SQL isn't hard to write or maintain. And using the ORM would impose a non-trivial runtime overhead - and by definition, a cache backend is supposed to be as efficient as possible. Raw SQL is the "right" answer here.

(well.... not using a database backed cache backend is the *really* right answer, but I'll let that slide for the moment...)

If you *do* want to do complex queries on the database cache table, the approach suggested by Collin is as good as any. An unmanaged model (managed = False) will give you ORM operations over an arbitrary table - including Django's own internal tables.

That said, I'll also concur that the database cache backend is the wrong answer here. If you're writing a cron script, the approach I've always used is a PIDfile-based lock.
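A minimal sketch of the PIDfile idea (not necessarily the exact recipe referred to above; the path and the staleness check are illustrative only):

import os
import sys

# Hypothetical PID file location for the cron job.
PIDFILE = '/tmp/sync_job.pid'

if os.path.exists(PIDFILE):
    with open(PIDFILE) as f:
        old_pid = int(f.read().strip())
    try:
        os.kill(old_pid, 0)  # signal 0 only checks whether the process exists
    except OSError:
        os.remove(PIDFILE)  # stale PID file left behind by a killed job
    else:
        sys.exit('Already running as PID %d' % old_pid)

with open(PIDFILE, 'w') as f:
    f.write(str(os.getpid()))
try:
    pass  # ... do the actual work here ...
finally:
    os.remove(PIDFILE)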


Yours,
Russ Magee %-)

Erik Cederstrand

Dec 22, 2014, 3:36:57 PM
to Django Users
Hi Collin,

> On 21/12/2014 at 05.55, Collin Anderson <cmawe...@gmail.com> wrote:
>
> If you want a nicer interface: I just ran manage.py inspectdb on one of my databases and got this for the cache table. Not sure why Django still does it by hand.
>
> class Cache(models.Model):
>     cache_key = models.CharField(primary_key=True, max_length=255)
>     value = models.TextField()
>     expires = models.DateTimeField()
>
>     class Meta:
>         managed = False
>         db_table = 'cache_table'

Thanks, that's actually a nice trick. I'd still need to un-pickle the values, but I can use the ORM to query cache_key.
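A minimal sketch of that, assuming the unmanaged Cache model above (the 'lock' key prefix is again made up; the raw cache_key values include Django's own ':<version>:' prefix, e.g. ':1:lock:...' with default settings):

import base64
import os
import pickle

from django.utils.encoding import force_bytes

prefix = ':1:lock:%s' % os.getpid()
for row in Cache.objects.filter(cache_key__startswith=prefix):
    value = pickle.loads(base64.b64decode(force_bytes(row.value)))
    print(row.cache_key, value)
    # row.delete() would remove an orphaned lock entry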

Erik

Erik Cederstrand

Dec 22, 2014, 3:49:17 PM
to Django Users
Hi Russell,

> On 22/12/2014 at 00.40, Russell Keith-Magee <rus...@keith-magee.com> wrote:
>
> If you *do* want to do complex queries on the database cache table, the approach suggested by Collin is as good as any. An unmanaged model (managed = False) will give you ORM operations over an arbitrary table - including Django's own internal tables.
>
> That said, I'll also concur that the database cache backend is the wrong answer here. If you're writing a cron script, the approach I've always used is a PIDfile-based lock.

Thanks for taking the time to explain the design considerations for the cache mechanism.

I agree that using the cache to hold locks is a bit awkward, but DB transactions would be running way too long for my taste, and I can't use pidfile locks because my dataset can get updates from both cron jobs and my REST API. But I should probably take a hard look at my design again.

Erik

Collin Anderson

Dec 23, 2014, 11:00:32 AM
to django...@googlegroups.com
Hi Russ,

Makes sense. Thanks.

Collin