I want to make sure that the contents of this field are encrypted in
the database, but decrypted whenever I work with the field in Django -
nice and transparent to all the developers. I could add a new
property to my model and have it decrypt the field, but this field's
been around for ages and I don't want to modify everything that depends on it.
Sub-classing CharField seemed the obvious choice and encrypting on
save is easy - I simply override the Field's get_db_prep_save()
method.
But I can't find a post-load hook for decrypting the value as soon as
the field's value gets loaded from the db.
This is the part that works well - encrypting on save (silly
encryption algorithm permitting:)

from django.db import models

class EncryptedCharField(models.CharField):
    def _encrypt(self, value):
        # toy cipher: shift every character up by one
        if value is not None:
            value = ''.join([chr(ord(char)+1) for char in str(value)])
        return value

    def _decrypt(self, value):
        # inverse of _encrypt: shift every character down by one
        if value is not None:
            value = ''.join([chr(ord(char)-1) for char in str(value)])
        return value

    def get_internal_type(self):
        return "CharField"

    def get_db_prep_save(self, value):
        value = self._encrypt(value)
        return super(EncryptedCharField, self).get_db_prep_save(value)

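As a quick sanity check on the toy cipher, the two helpers can be lifted out of the field and run without Django. This is only a sketch: shift_encrypt and shift_decrypt are hypothetical names for standalone copies of _encrypt and _decrypt, and the character shift is of course not real encryption.

```python
def shift_encrypt(value):
    """Shift every character up by one code point (toy cipher, not secure)."""
    if value is not None:
        value = ''.join(chr(ord(char) + 1) for char in str(value))
    return value


def shift_decrypt(value):
    """Undo shift_encrypt by shifting every character down by one."""
    if value is not None:
        value = ''.join(chr(ord(char) - 1) for char in str(value))
    return value


stored = shift_encrypt("secret")
print(stored)                 # what would be written to the database
print(shift_decrypt(stored))  # the original clear text again
```

The None check matters: without it, str(None) would be "encrypted" and stored as a four-character string.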
Does anyone have any idea how I could hook into and change the field's
value as soon as the value gets loaded from the db?
Where can I call my _decrypt() method from? I would like to call it
once - the moment the results from the Model's QuerySet are parsed and
the Model's field values populated.
This is something that's been raised before, and there were a few
ideas proposed. Malcolm's looking over a variety of Field subclassing
issues, including what you're asking for.
That's not a great answer, but in the meantime, you might want to take
a look at the lazy instantiation in the GIS branch[1]. That code isn't
exactly what you want, but it might help get you started on a
descriptor-based approach to do what you're asking for until there's a
proper solution.
-Gul
Thanks Marty - that was exactly what I was looking for. Using a proxy
with descriptors solved my problem - what a nifty, sneaky little
trick!
I had some difficulty determining the context of the __set__ calls:
either the value is being loaded from the db, in which case the inbound
value is the encrypted db value (and needs to be decrypted, but only
then), or a user is changing the value (in clear text, in which case
the setter should not decrypt it). Hence the flag.
Here's my take on it:

from django.db import models

class CharEncryptProxy(object):
    def __init__(self, field):
        self._field = field
        self._decrypted = False

    def __get__(self, obj, type=None):
        return obj.__dict__[self._field.attname]

    def __set__(self, obj, value):
        # only decrypt first time if set by db
        if (not self._decrypted) and (value != ''):
            obj.__dict__[self._field.attname] = self._field._decrypt(value)
        else:
            obj.__dict__[self._field.attname] = value
        self._decrypted = True


class CharEncryptField(models.CharField):
    def _encrypt(self, value):
        if value is not None:
            value = ''.join([chr(ord(char)+1) for char in str(value)])
        return value

    def _decrypt(self, value):
        if value is not None:
            value = ''.join([chr(ord(char)-1) for char in str(value)])
        return value

    def get_internal_type(self):
        return "CharField"

    def get_db_prep_save(self, value):
        value = self._encrypt(value)
        return super(CharEncryptField, self).get_db_prep_save(value)

    def contribute_to_class(self, cls, name):
        super(CharEncryptField, self).contribute_to_class(cls, name)
        # override with proxy
        setattr(cls, self.attname, CharEncryptProxy(self))

Works like a charm - thanks again for putting me on the right track!
I've been following this thread in the background. Modulo any small
bugs, this looks like exactly the right approach. As you've worked out,
all the bits for subclassing are present. The only slightly painful
thing is having to work out the __set__ and __get__ constructs and
realising that a wrapper field, as you've done, can make things easier.
That's the part I want to automate a bit. Probably via metaclassing so
that it doesn't interfere with whatever Field subclass you are
inheriting from. I'm writing the code at the moment, having had a few
minutes this week to chew over approaches with Jacob and Jeremy Dunck
whilst we were in the same location.
What you've written won't become obsolete in any way, either, since it's
exactly what we're going to do under the covers and Django will just be
providing a convenience shortcut. You can always ignore the shortcut and
go directly to writing __set__ and __get__ functions if you need to.
Regards,
Malcolm
I would've loved to be there for that conversation, it sounds like a
blast! Okay, so I've been spending far too much time on these sorts of
things, and I'm possibly enjoying them entirely too much.
More seriously though, I'm glad to hear that you're taking that
approach. It seems like this issue of determining whether the field is
populated during Model instantiation or elsewhere is going to be the
common case, so it'll be great to avoid all that boilerplate.
-Gul
IMHO this specific problem cannot be solved using proxy descriptors as
I did in the solution above.
So I was back at 1^2 - finding a post_db_load-hook where I can change
a field's value immediately after loading from the db. Since there is
none yet, I rolled my own and the QuerySet's iterator method seemed
the perfect spot for the hook. I added this right at the end - just
before the final "yield obj" statement in the iterator method (i.e.
django.db.models.query.QuerySet.iterator):
------------------------------------------
for field in obj._meta.fields:
    if hasattr(field, 'get_db_post_load'):
        setattr(obj, field.attname,
                field.get_db_post_load(getattr(obj, field.attname)))
yield obj
------------------------------------------
Then, in my CharEncryptField, I did away with the proxy descriptor
(quite reluctantly, actually - gotta luv that bit of slick:) and
hooked into my new custom hook:

from django.db import models

class CharEncryptField(models.CharField):
    def encryptValue(self, value):
        # leave NULLs alone, otherwise str(None) would get encrypted
        if value is None:
            return value
        return ''.join([chr(ord(char)+1) for char in str(value)])

    def decryptValue(self, value):
        if value is None:
            return value
        return ''.join([chr(ord(char)-1) for char in str(value)])

    def get_internal_type(self):
        return "CharField"

    def get_db_prep_save(self, value):
        value = self.encryptValue(value)
        return super(CharEncryptField, self).get_db_prep_save(value)

    def get_db_post_load(self, value):
        value = self.decryptValue(value)
        # no super method here (yet)
        return value

This solved the problem with the context and looks a lot neater
`modulo` that boilerplate ;>
This isn't entirely accurate. What I notice in your code is that you
were storing the _decrypted flag on the descriptor object itself,
which exists as a single instance for all instances of your model.
Instead, you'd need to store the flag as an attribute on each
individual model instance after the descriptor is first accessed. I
ran into this same problem.[1]
The solution that was pointed out to me was get_cache_name(). It's
undocumented, but is used for ReverseSingleRelatedObjectDescriptor
internally. One of my patches[2] shows how I used it. That patch
probably won't go into Django as is, but it shows how you can tackle
that problem successfully without hacking QuerySets.
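To make the per-instance idea concrete, here is a minimal, Django-free sketch. It is not the code from the patch: DecryptOnLoadProxy, Account, shift_decrypt and the '_<attname>_decrypted' marker name are all hypothetical, and a real field would more likely derive the marker name from something like get_cache_name(). The only point is that the "already decrypted" marker is kept in each instance's __dict__ instead of on the shared descriptor.

```python
class DecryptOnLoadProxy(object):
    """Descriptor that decrypts the first value assigned to each instance.

    Hypothetical sketch: the "already decrypted" marker lives in the
    instance's __dict__, not on the descriptor, so instances don't share it.
    """

    def __init__(self, attname, decrypt):
        self.attname = attname
        self.decrypt = decrypt
        # Assumed naming scheme for the per-instance marker.
        self.flag_name = '_%s_decrypted' % attname

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not on an instance
        return obj.__dict__[self.attname]

    def __set__(self, obj, value):
        # First assignment on this instance is assumed to come from the db.
        if not obj.__dict__.get(self.flag_name) and value not in (None, ''):
            value = self.decrypt(value)
        obj.__dict__[self.attname] = value
        obj.__dict__[self.flag_name] = True


def shift_decrypt(value):
    """Toy inverse cipher: shift every character down by one."""
    return ''.join(chr(ord(char) - 1) for char in value)


class Account(object):
    secret = DecryptOnLoadProxy('secret', shift_decrypt)

    def __init__(self, secret):
        self.secret = secret  # stands in for Django populating the field


a = Account('tfdsfu')
b = Account('qbttxpse')
print(a.secret, b.secret)  # both decrypted, each instance has its own marker
a.secret = 'changed'
print(a.secret)            # later clear-text assignments are stored as-is
```

With the flag stored on the descriptor, b would have kept its encrypted value, because a had already flipped the shared flag. Note that this sketch still assumes the first assignment on an instance always carries a value from the database, which the toy __init__ simply fakes.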
-Gul
[1] http://groups.google.com/group/django-developers/browse_thread/thread/93fc74069e10aa17
[2] http://code.djangoproject.net/attachment/ticket/3982/lazy_attribute.diff