Running a custom function when a queryset is evaluated.

20 views
Skip to first unread message

David Nielson

unread,
Oct 21, 2016, 6:18:18 PM10/21/16
to Django users
I'm really hitting a wall on this issue...maybe there's someone out there who has insight on this.

In this application there is an Employee model with its accompanying table in the database. In order to track the "effective dates" of certain fields, I have removed those columns from the table, and have moved them to a table with the following structure:
 
EmployeeData
* id
* employee_id
* field
* value
* effective_start_date
 
(where 'field' can be any of the original column names)
 
I have created a function that takes a list of employees and pre-fetches the related data (the values for fields that are "active" at a given time. Note that this has nothing to do with django's prefetch_related functionality)
 
To avoid code duplication, I'd like to have this function run any time the queryset is evaluated (when the results are fetched from the database and the list of employees is generated). I've already created a custom QuerySet class. However I don't know where to put this function so it is run on the resulting list of employees upon queryset evaluation.

---------------------------------------------------------------------------
For reference the current setup is as follows:
---------------------------------------------------------------------------

from django import models
from django.contrib.auth.models import BaseUserManager from collections import defaultdict

class EmployeeQuerySet(models.query.QuerySet):
    ...
    def get(self, *args, **kwargs):
        """ If getting a single employee, automatically apply the current data fields """
        employee = super(EmployeeQuerySet, self).get(*args, **kwargs)
        print "...prefetching the data for a single employee..."
        prefetch_employee_data([employee]) # this function is defined below, but would normally be defined first for obvious reasons
        return employee


class EmployeeManager(BaseUserManager):
    ...

    def get_query_set(self):
        return EmployeeQuerySet(self.model)


class Employee(models.Model):
    objects = EmployeeManager()

    first_name = models.CharField(max_length=20)

class EmployeeData(models.Model):
    employee = ForeignKey(Employee, related_name='data')
    field = CharField(max_length=20, ...)
    value = CharField(max_length=50, ...)
    effective_date = DateField()


function prefetch_employee_data(employees):
    """ Prefetches employee data for a list of employees """
    today = datetime.date.today().strftime('%Y-%m-%d')
    employee_ids_string = ','.join([str(x.id) for x in employees]) 

    # This query obtains the values for the fields that are effective as of today. It's a raw query since I wasn't able to create this within django's framework
    data_query = "SELECT * FROM (SELECT * FROM core_employeedata WHERE effective_date <= '%s' AND employee_id IN (%s) ORDER BY effective_date DESC) t GROUP BY t.employee_id, t.field" %(today, employee_ids_string)

    data = get_model('core', 'EmployeeData').objects.raw(data_query)

    data_dict = defaultdict(list) # the keys are employee id, the values are list of tuples (field, value), obtained from the data records, for the given employee
    for record in data:
        data_dict[record.employee_id].append( (record.field, record.value) )

    # apply the data to the employees
    for employee in employees:
        for field, value in data_dict.get(employee.id, []):
            setattr(employee, field, value)

---------------------------------------------------------------------------
EXAMPLE:
---------------------------------------------------------------------------

I have the following employee:
+----+------------+
| id | first_name |
+----+------------+
| 10 | Jane       |
+----+------------+

I then have the following data records for the above employee:

+-----+-------------+------------+-------+----------------+
| id  | employee_id | field      | value | effective_date |
+-----+-------------+------------+-------+----------------+
| 1   |          10 | department | A     | 2016-09-10     |
| 2   |          10 | department | B     | 2016-10-02     |
| 3   |          10 | manager    | Bob   | 2016-10-13     |
+-----+-------------+------------+-------+----------------+

So in a shell:

>>> employee = Employee.objects.get(id=10)
...prefetching the data for a single employee...
>>> employee.first_name
'Jane'
>>> # Even though these aren't actual fields on the Employee class, the 
>>> attributes now exist because of the prefetch_employee_data() 
>>> function
>>>
>>> employee.department # Note that the EmployeeData record with id #1 
>>> above is ignored because a newer value exists (id #2)
'B'
>>> employee.manager
'Bob'

---------------------------------------------------------------------------

So as you can see, I can call the function on a single employee when the get() method is invoked, and it works exactly as expected.

I now need to know where to place this function call when the EmployeeQuerySet is evaluated (when the data is fetched from the database). The problem is that since querysets are lazy, it's not as simple as overriding the fetch() method of the queryset.

Thanks in advance!!!

David Nielson

unread,
Oct 21, 2016, 6:27:43 PM10/21/16
to Django users
I noticed a typo in my original message: "function prefetch_employee_data(employees):" should be "def prefetch_employee_data(employees):"
Reply all
Reply to author
Forward
0 new messages