Looking for a way to detect changes in database records with low storage footprint

464 views
Skip to first unread message

DJ-Tom

unread,
Oct 9, 2013, 11:05:29 AM10/9/13
to django...@googlegroups.com
Hi,

a customer has requested a function to "compare" reports.

So let's say I create a database report on Monday, then the data is constantly changing the whole week and on the following Monday it should be possible to get a report that either contains only changed and new records or all records plus a column that indicates that something has changed.

It is not important to know exactly which field has changed but only that something has changed.

I'm already storing a "change_date" for each record, but this is also updated if the user just saves the same data again without actually changing any content.

Also it is important to be able to not only compare based on date/time, but based on a specific reports status.

So I thought about storing a hash value for every record for each report so I could compare this later on.

Is there any functionality that makes creating this hash, e.g. by serializing each record into a string and creating a hash from this information?

Or does someone have any other (better) idea how to implement this?

Thanks
Thomas

Nikolas Stevenson-Molnar

unread,
Oct 9, 2013, 1:07:54 PM10/9/13
to django...@googlegroups.com
A hash should work fine. Alternatively, you could diff the two records,
using something like difflib: http://docs.python.org/2/library/difflib.html

_Nik
> --
> You received this message because you are subscribed to the Google
> Groups "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to django-users...@googlegroups.com.
> To post to this group, send email to django...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-users.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/c8051a13-ac70-4a7f-9498-000f135c1f4f%40googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

DJ-Tom

unread,
Oct 10, 2013, 9:55:27 AM10/10/13
to django...@googlegroups.com

Thanks, but *how* can I create a hash from a database object in Django - is there a generic method to iterate over all (or only specific) fields and get a hash value?

C. Kirby

unread,
Oct 10, 2013, 10:41:54 AM10/10/13
to django...@googlegroups.com
You could grab the __dict__ of the instance and use that.

import hashlib

hash = hashlib.md5()
hash.update(str(MyModel.objects.get(pk = 1).__dict__))
hash.digest()

Tom Evans

unread,
Oct 10, 2013, 1:01:32 PM10/10/13
to django...@googlegroups.com
On Thu, Oct 10, 2013 at 2:55 PM, DJ-Tom <event...@gmail.com> wrote:
>
> Thanks, but *how* can I create a hash from a database object in Django - is
> there a generic method to iterate over all (or only specific) fields and get
> a hash value?
>

This is not a full solution, but you can select out extra fields (and
then subsequently filter on them) using the .extra() method on
querysets.

Eg, if you were using mysql and wanted a field that was a hash of pk
(id), first_name and last_name fields, then you could do something
like so:

>>> User.objects.filter(id__lt=10).extra(
... select={'hash': 'MD5(CONCAT(id, first_name,
last_name))'}).values_list('hash')
[('a2147bd4161f55a7ca402d8971b6a45e',),
('663dfc6f51e0ef2c9fa6691527722f20',),
('045dde540781fe013c58c3ebb9383df6',),
('70079b2943da078f437a3678578c78f7',),
('2b997848117758e1844875b1119702bb',),
('2602ecfa423f969614af5b6174c6863e',),
('c9f0f895fb98ab9159f51fd0297e236d',)]

I don't know that this necessarily helps in your aim though. It might
be better to have a custom save method which updates a
"report_fields_modified" timestamp column when fields that affect your
report are modified.

Unfortunately, (IIRC) you cannot easily determine which fields have
been modified without first loading a pristine copy of the object you
are saving from the database, which has obvious performance issues.

Cheers

Tom

DJ-Tom

unread,
Oct 14, 2013, 3:50:19 AM10/14/13
to django...@googlegroups.com, teva...@googlemail.com

Thanks, this might be a way to go - since the .extra() method may create a dependency on the SQL dialect used I will have to be careful though ;-)
Reply all
Reply to author
Forward
0 new messages