How to aggregate on insert?


Erik Cederstrand

May 23, 2016, 3:58:57 PM
to Django Users
Hi,

I have inherited a legacy Item model that has a composite unique key consisting of a Customer ID and a per-customer, incrementing Item ID. Assume I can't change the model.

On inserts, the legacy code would let the database increment the Item ID to avoid race conditions. Something like this:

INSERT INTO item_table (customer_id, item_id, name, ...) VALUES (123, (SELECT MAX(item_id) FROM item_table WHERE customer_id =123) + 1, 'MyItem', ...);


Is there any way I can do the same using the Django ORM without opening up for race conditions? I.e. something better than:

i = Item(customer_id=123, name='MyItem', ...)
i.item_id = Item.objects.filter(customer_id=123).aggregate(Max('item_id'))['item_id__max'] + 1
i.save()

Or do I just wrap that in a loop and catch IntegrityError if I don't want to use raw SQL?
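For concreteness, that retry idea might look something like the following sketch. The IntegrityError class and the create_with_retry/create names are stand-ins invented for illustration so the snippet runs outside Django; in a real project the exception would be django.db.IntegrityError and create() would do the aggregate-then-save above.

```python
class IntegrityError(Exception):
    """Stand-in for django.db.IntegrityError (local so this runs standalone)."""


def create_with_retry(create, max_attempts=5):
    """Call create() until it succeeds or the attempts run out.

    create() is expected to compute MAX(item_id) + 1 and save; if a
    concurrent insert wins the race, the unique index on
    (customer_id, item_id) makes save() raise IntegrityError, and we
    simply recompute and try again.
    """
    for attempt in range(max_attempts):
        try:
            return create()
        except IntegrityError:
            if attempt == max_attempts - 1:
                raise  # persistent failure: let the caller deal with it
```

Bounding the attempts (rather than looping forever) matters because a repeated IntegrityError may signal a real problem rather than a lost race.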


Thanks,
Erik

Ketan Bhatt

May 23, 2016, 4:58:04 PM
to Django users
Hey Erik,

What Django version are you on?

Erik Cederstrand

May 23, 2016, 6:38:24 PM
to Django Users

> Den 23. maj 2016 kl. 22.49 skrev Ketan Bhatt <ketanbh...@gmail.com>:
>
> Hey Erik,
>
> What Django version are you on?

I'm on Django 1.9.

Erik

James Schneider

May 23, 2016, 7:12:23 PM
to django...@googlegroups.com
On Mon, May 23, 2016 at 12:58 PM, Erik Cederstrand <erik+...@cederstrand.dk> wrote:
> Hi,
>
> I have inherited a legacy Item model that has a composite unique key consisting of a Customer ID and a per-customer, incrementing Item ID. Assume I can't change the model.
>
> On inserts, the legacy code would let the database increment the Item ID to avoid race conditions. Something like this:
>
>    INSERT INTO item_table (customer_id, item_id, name, ...) VALUES (123, (SELECT MAX(item_id) FROM item_table WHERE customer_id =123) + 1, 'MyItem', ...);
>
> Is there any way I can do the same using the Django ORM without opening up for race conditions? I.e. something better than:
>
>    i = Item(customer_id=123, name='MyItem', ...)
>    i.item_id = Item.objects.filter(customer_id=123).aggregate(Max('item_id'))['item_id__max'] + 1
>    i.save()

I feel like an explicit transaction wrapping this set of queries would be the way to go to help avoid (or at least detect) a race condition.
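The idea can be demonstrated outside Django with the stdlib sqlite3 module as a stand-in (table name and helper invented for the demo; with Django on Postgres you'd reach for transaction.atomic plus row/table locking or a stricter isolation level instead):

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so we control
# transaction boundaries explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE item_table ("
    " customer_id INTEGER, item_id INTEGER, name TEXT,"
    " UNIQUE (customer_id, item_id))"
)


def insert_item(conn, customer_id, name):
    # BEGIN IMMEDIATE takes the write lock *before* the SELECT, so no
    # concurrent writer can sneak an insert in between MAX() and INSERT.
    conn.execute("BEGIN IMMEDIATE")
    try:
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(item_id), 0) + 1 FROM item_table"
            " WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO item_table (customer_id, item_id, name)"
            " VALUES (?, ?, ?)",
            (customer_id, next_id, name),
        )
        conn.execute("COMMIT")
        return next_id
    except Exception:
        conn.execute("ROLLBACK")  # known start point: nothing half-created
        raise
```

The COALESCE covers a customer's first item, where MAX() would otherwise be NULL.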


> Or do I just wrap that in a loop and catch IntegrityError if I don't want to use raw SQL?

Even with the transactions in place, you'd still need to account for times when the transaction fails. At that point the logic is up to you. I definitely wouldn't put an infinite loop in, as there may be some other problem preventing the transaction from completing; instead, raise an exception and handle it in the view accordingly. The transaction also gives you the advantage of a known starting point (nothing incidentally created as part of a failed operation), so it is easier to recover from.

The efficacy of this solution is also heavily dependent on your underlying DB and the supported level of isolation. Postgres seems to be the winner in terms of concurrency, transactions, and preemptive failure detection.
 
I don't believe there is a magic bullet for this case, but I'm no DB expert.

-James

Erik Cederstrand

May 24, 2016, 6:31:33 AM
to Django Users

> Den 24. maj 2016 kl. 01.11 skrev James Schneider <jrschn...@gmail.com>:
> On Mon, May 23, 2016 at 12:58 PM, Erik Cederstrand <erik+...@cederstrand.dk> wrote:
>
> I have inherited a legacy Item model that has a composite unique key consisting of a Customer ID and a per-customer, incrementing Item ID. Assume I can't change the model.
>
> On inserts, the legacy code would let the database increment the Item ID to avoid race conditions. Something like this:
>
> INSERT INTO item_table (customer_id, item_id, name, ...) VALUES (123, (SELECT MAX(item_id) FROM item_table WHERE customer_id =123) + 1, 'MyItem', ...);
>
> Is there any way I can do the same using the Django ORM without opening up for race conditions? I.e. something better than:
>
> i = Item(customer_id=123, name='MyItem', ...)
> i.item_id = Item.objects.filter(customer_id=123).aggregate(Max('item_id'))['item_id__max'] + 1
> i.save()
>
>
> I feel like an explicit transaction wrapping this set of queries would be the way to go to help avoid (or at least detect) a race condition:

Thank you for your suggestions! The default isolation level for PostgreSQL is READ COMMITTED, so another thread could insert an identical (customer_id, item_id) pair between the select and the insert. I don't think a transaction would help here. I have a unique index on (customer_id, item_id) so save() would fail, and I only have one write statement, so there's nothing else to roll back if save() fails.

After digging around a bit more in the documentation and Django codebase, this solution seems to work for me:


from django.db.models import Value
from .models import Customer, Item


class IncrementingValue(Value):
    def __init__(self, field, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.f = field

    def as_sql(self, compiler, connection):
        return '(SELECT MAX(%s) FROM %s WHERE customer_id=%%s) + 1' % (
            self.f.get_attname_column()[1],
            self.f.model._meta.db_table,
        ), [self.value.id]


c = Customer.objects.get(id=123)
i = Item.objects.create(
    customer=c,
    name='MyItem',
    item_id=IncrementingValue(value=c, field=Item._meta.get_field('item_id')),
)
i.refresh_from_db()


I haven't audited this thoroughly for SQL injection, though. I would have preferred to use the ORM to generate the SQL for the SELECT MAX, but I can't seem to get the SQL out of an aggregate() query.
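For reference, the single-statement shape that IncrementingValue produces can be exercised outside Django. A minimal sqlite3 sketch (schema invented for the demo; note it adds a COALESCE, which the raw MAX() + 1 form lacks, to handle a customer's first item where MAX() is NULL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_table (customer_id INTEGER, item_id INTEGER,"
    " name TEXT, UNIQUE (customer_id, item_id))"
)

# The read-then-increment happens inside a single INSERT, so the database
# evaluates the MAX(item_id) subquery and writes the row in one statement;
# there is no window between a separate SELECT and the INSERT.
conn.execute(
    "INSERT INTO item_table (customer_id, item_id, name) VALUES"
    " (?, (SELECT COALESCE(MAX(item_id), 0) FROM item_table"
    "      WHERE customer_id = ?) + 1, ?)",
    (123, 123, "MyItem"),
)
conn.commit()
```

As in the Django version, the caller then has to read the row back (refresh_from_db) to learn which item_id the database actually assigned.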


Thanks,
Erik