How to sub-class a CharField for a custom EncryptedCharField


z0n3z00t

Jul 25, 2007, 4:38:22 PM
to Django developers
I'm trying to create a custom CharField that would encrypt a field's
contents before saving to the db, and conversely decrypt the field's
values from the db as it loads (using a public key).

I want to make sure that the contents of this field are encrypted in
the database, but decrypted whenever I work with the field in Django -
nice and transparent to all the developers. I could add a new
property to my model and have it decrypt the field, but this field's
been around for ages and I don't want to mod all the dependencies.

Sub-classing CharField seemed the obvious choice and encrypting on
save is easy - I simply override the Field's get_db_prep_save()
method.

But I can't find a post-load hook for decrypting the value as soon as
the field's value gets loaded from the db.

This is the part that works well - encrypting on save (silly
encryption algorithm permitting:)

class EncryptedCharField(models.CharField):

    def _encrypt(self, value):
        if value is not None:
            value = ''.join([chr(ord(char)+1) for char in str(value)])
        return value

    def _decrypt(self, value):
        if value is not None:
            value = ''.join([chr(ord(char)-1) for char in str(value)])
        return value

    def get_internal_type(self):
        return "CharField"

    def get_db_prep_save(self, value):
        value = self._encrypt(value)
        return super(EncryptedCharField, self).get_db_prep_save(value)
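
To make the stored value concrete, here is a minimal standalone sketch of the same toy cipher, pulled out of the field class so that it runs without Django. The function names shift_encrypt and shift_decrypt are hypothetical; the logic is meant to mirror _encrypt() and _decrypt() above, and it is no more secure than they are.

```python
def shift_encrypt(value):
    """Shift every character up by one code point (toy cipher, not secure)."""
    if value is not None:
        value = ''.join(chr(ord(char) + 1) for char in str(value))
    return value


def shift_decrypt(value):
    """Undo shift_encrypt by shifting every character down by one."""
    if value is not None:
        value = ''.join(chr(ord(char) - 1) for char in str(value))
    return value


stored = shift_encrypt("secret")    # what would be written to the db
restored = shift_decrypt(stored)    # what the model should expose
print(stored, restored)             # prints: tfdsfu secret
```

None passes through untouched in both directions, so a NULL column would stay NULL.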


Does anyone have any idea how I could hook into and change the field's
value as soon as the value gets loaded from the db?

Where can I call my _decrypt() method from? I would like to call it
once - the moment the results from the Model's QuerySet are parsed and
the Model's field values populated.

Marty Alchin

Jul 25, 2007, 5:10:59 PM
to django-d...@googlegroups.com
On 7/25/07, z0n3z00t <maxn...@gmail.com> wrote:
> Does anyone have any idea how I could hook into and change the field's
> value as soon as the value gets loaded from the db?

This is something that's been raised before, and there were a few
ideas proposed. Malcolm's looking over a variety of Field subclassing
issues, including what you're asking for.

That's not a great answer, but in the meantime, you might want to take
a look at the lazy instantiation in the GIS branch[1]. That code isn't
exactly what you want, but it might help get you started on a
descriptor-based approach to do what you're asking for until there's a
proper solution.

-Gul

[1] http://code.djangoproject.com/changeset/5657

z0n3z00t

Jul 25, 2007, 7:43:04 PM
to Django developers
> That's not a great answer, but in the meantime, you might want to take
> a look at the lazy instantiation in the GIS branch[1]. That code isn't
> exactly what you want, but it might help get you started on a
> descriptor-based approach to do what you're asking for until there's a
> proper solution.

Thanks Marty - that was exactly what I was looking for. Using a proxy
with descriptors solved my problem - what a nifty, sneaky little
trick!

I had some difficulty determining the context of the __set__ calls:
on loading from the db, the inbound value is the encrypted db value
(which needs to be decrypted only then), whereas when a user changes
the value it arrives in clear text (in which case the setter should
not decrypt it). Hence the flag.

Here's my take on it:

class CharEncryptProxy(object):
    def __init__(self, field):
        self._field = field
        self._decrypted = False

    def __get__(self, obj, type=None):
        return obj.__dict__[self._field.attname]

    def __set__(self, obj, value):
        # only decrypt first time if set by db
        if (not self._decrypted) and (value != ''):
            obj.__dict__[self._field.attname] = self._field._decrypt(value)
        else:
            obj.__dict__[self._field.attname] = value
        self._decrypted = True

class CharEncryptField(models.CharField):

    def _encrypt(self, value):
        if value is not None:
            value = ''.join([chr(ord(char)+1) for char in str(value)])
        return value

    def _decrypt(self, value):
        if value is not None:
            value = ''.join([chr(ord(char)-1) for char in str(value)])
        return value

    def get_internal_type(self):
        return "CharField"

    def get_db_prep_save(self, value):
        value = self._encrypt(value)
        return super(CharEncryptField, self).get_db_prep_save(value)

    def contribute_to_class(self, cls, name):
        super(CharEncryptField, self).contribute_to_class(cls, name)
        # override with proxy
        setattr(cls, self.attname, CharEncryptProxy(self))

Works like a charm - thanks again for putting me on the right track!

Malcolm Tredinnick

Jul 26, 2007, 12:45:10 PM
to django-d...@googlegroups.com

I've been following this thread in the background. Modulo any small
bugs, this looks like exactly the right approach. As you've worked out,
all the bits for subclassing are present. The only slightly painful
thing is having to work out the __set__ and __get__ constructs and
realising that a wrapper field, as you've done, can make things easier.
That's the part I want to automate a bit. Probably via metaclassing so
that it doesn't interfere with whatever Field subclass you are
inheriting from. I'm writing the code at the moment, having had a few
minutes this week to chew over approaches with Jacob and Jeremy Dunck
whilst we were in the same location.

What you've written won't become obsolete in any way, either, since it's
exactly what we're going to do under the covers and Django will just be
providing a convenience shortcut. You can always ignore the shortcut and
go directly to writing __set__ and __get__ functions if you need to.

Regards,
Malcolm

Marty Alchin

Jul 26, 2007, 4:03:59 PM
to django-d...@googlegroups.com
On 7/26/07, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
> I'm writing the code at the moment, having had a few
> minutes this week to chew over approaches with Jacob and Jeremy Dunck
> whilst we were in the same location.

I would've loved to be there for that conversation, it sounds like a
blast! Okay, so I've been spending far too much time on these sorts of
things, and I'm possibly enjoying them entirely too much.

More seriously though, I'm glad to hear that you're taking that
approach. It seems like this issue of determining whether the field is
populated during Model instantiation or elsewhere is going to be the
common case, so it'll be great to avoid all that boilerplate.

-Gul

z0n3z00t

Aug 2, 2007, 5:03:33 AM
to Django developers
In the interest of anyone following this issue, I must confess that my
previous solution broke down the minute I deployed it. What was
interesting was that the `painful thing` - finding the context from
which the __set__ is called, proved an impossible task without a
framework hook.

IMHO this specific problem cannot be solved using proxy descriptors as
I did in the solution above.

So I was back at 1^2 - finding a post_db_load-hook where I can change
a field's value immediately after loading from the db. Since there is
none yet, I rolled my own and the QuerySet's iterator method seemed
the perfect spot for the hook. I added this right at the end - just
before the final "yield obj" statement in the iterator method (i.e.
django.db.models.query.QuerySet.iterator):

------------------------------------------
for field in obj._meta.fields:
    if hasattr(field, 'get_db_post_load'):
        setattr(obj, field.attname,
                field.get_db_post_load(getattr(obj, field.attname)))

yield obj
------------------------------------------

Then, in my CharEncryptField, I did away with the proxy descriptor
(quite reluctantly, actually - gotta luv that bit of slick:) and
hooked into my new custom hook:

class CharEncryptField(models.CharField):
    def encryptValue(self, value):
        return ''.join([chr(ord(char)+1) for char in str(value)])

    def decryptValue(self, value):
        return ''.join([chr(ord(char)-1) for char in str(value)])

    def get_internal_type(self):
        return "CharField"

    def get_db_prep_save(self, value):
        value = self.encryptValue(value)
        return super(CharEncryptField, self).get_db_prep_save(value)

    def get_db_post_load(self, value):
        value = self.decryptValue(value)
        # no super method here (yet)
        return value


This solved the problem with the context and looks a lot neater
`modulo` that boilerplate ;>
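
For anyone who wants to try the hook idea without patching Django, the following is a rough sketch of the same loop run against plain Python stand-ins. PlainField, ShiftField, Meta, Row and apply_post_load_hooks are all hypothetical names invented for the illustration; only get_db_post_load, attname and _meta.fields come from the code above, and the real QuerySet.iterator does considerably more than this.

```python
class PlainField(object):
    """Stand-in for an ordinary field with no post-load hook."""

    def __init__(self, attname):
        self.attname = attname


class ShiftField(PlainField):
    """Stand-in for CharEncryptField: defines the custom hook."""

    def get_db_post_load(self, value):
        # toy decryption: shift every character down by one
        return ''.join(chr(ord(char) - 1) for char in str(value))


class Meta(object):
    """Stand-in for the model's _meta options object."""

    def __init__(self, fields):
        self.fields = fields


class Row(object):
    """Stand-in for a model instance freshly built from a db row."""

    _meta = Meta([PlainField('name'), ShiftField('secret')])

    def __init__(self, name, secret):
        self.name = name
        self.secret = secret


def apply_post_load_hooks(obj):
    """Run get_db_post_load on every field that defines it."""
    for field in obj._meta.fields:
        if hasattr(field, 'get_db_post_load'):
            setattr(obj, field.attname,
                    field.get_db_post_load(getattr(obj, field.attname)))
    return obj


row = apply_post_load_hooks(Row('alice', 'tfdsfu'))
print(row.name, row.secret)   # prints: alice secret
```

Because the hook runs exactly once, at load time, later assignments to the attribute are never mistaken for database values.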

Marty Alchin

Aug 2, 2007, 6:55:58 AM
to django-d...@googlegroups.com
On 8/2/07, z0n3z00t <maxn...@gmail.com> wrote:
> In the interest of anyone following this issue, I must confess that my
> previous solution broke down the minute I deployed it. What was
> interesting was that the `painful thing` - finding the context from
> which the __set__ is called, proved an impossible task without a
> framework hook.

This isn't entirely accurate. What I notice in your code is that you
were storing the _decrypted flag on the descriptor object itself,
which exists as a single instance for all instances of your model.
Instead, you'd need to store the flag as an attribute on each
individual model instance after the descriptor is first accessed. I
ran into this same problem.[1]
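
The shared-flag effect can be reproduced outside Django. This is only a sketch: SharedFlagProxy and Record are hypothetical stand-ins for the proxy and a model class, and the "db load" is simulated by a plain assignment.

```python
class SharedFlagProxy(object):
    """Stand-in for the proxy: the flag lives on the descriptor itself."""

    def __init__(self, attname):
        self.attname = attname
        self._decrypted = False   # one flag shared by every Record

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__[self.attname]

    def __set__(self, obj, value):
        if not self._decrypted and value != '':
            # toy decryption: shift every character down by one
            value = ''.join(chr(ord(char) - 1) for char in value)
        obj.__dict__[self.attname] = value
        self._decrypted = True


class Record(object):
    # hypothetical plain class standing in for a Django model
    secret = SharedFlagProxy('secret')


first = Record()
first.secret = 'tfdsfu'     # simulated db load: decrypted as intended
second = Record()
second.secret = 'tfdsfu'    # simulated db load: flag already set, left encrypted
print(first.secret, second.secret)   # prints: secret tfdsfu
```

Only the very first assignment in the whole process is decrypted, which would explain why the approach looked fine in a quick test and failed once deployed.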

The solution that was pointed out to me was get_cache_name(). It's
undocumented, but is used for ReverseSingleRelatedObjectDescriptor
internally. One of my patches[2] shows how I used it. That patch
probably won't go into Django as is, but it shows how you can tackle
that problem successfully without hacking QuerySets.
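
As a rough illustration of keeping the flag on each instance, here is a sketch using plain Python stand-ins. PerInstanceProxy, Record and the hand-built flag name are hypothetical; the flag name is built by hand here rather than with get_cache_name(), and the "db load" is simulated by a plain assignment.

```python
class PerInstanceProxy(object):
    """Descriptor that records the 'already decrypted' state per instance."""

    def __init__(self, attname):
        self.attname = attname
        # hypothetical per-instance attribute name for the flag
        self.flag_name = '_%s_decrypted' % attname

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__[self.attname]

    def __set__(self, obj, value):
        already = obj.__dict__.get(self.flag_name, False)
        if not already and value != '':
            # toy decryption: shift every character down by one
            value = ''.join(chr(ord(char) - 1) for char in value)
        obj.__dict__[self.attname] = value
        obj.__dict__[self.flag_name] = True


class Record(object):
    # hypothetical plain class standing in for a Django model
    secret = PerInstanceProxy('secret')


first = Record()
first.secret = 'tfdsfu'     # simulated db load
second = Record()
second.secret = 'tfdsfu'    # simulated db load on a different instance
first.secret = 'changed'    # later user assignment, stored as-is
print(first.secret, second.secret)   # prints: changed secret
```

Note that this sketch still assumes the first assignment on each instance comes from the database; an object created in code with a clear-text value would be "decrypted" by mistake.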

-Gul

[1] http://groups.google.com/group/django-developers/browse_thread/thread/93fc74069e10aa17
[2] http://code.djangoproject.net/attachment/ticket/3982/lazy_attribute.diff
