I have a table that looks like this
date car_crashes city
01.01 1 Washington
01.02 4 Washington
01.03 0 Washington
01.04 2 Washington
01.05 0 Washington
01.06 3 Washington
01.07 4 Washington
01.08 1 Washington
01.01 0 Detroit
01.02 2 Detroit
01.03 4 Detroit
01.04 2 Detroit
01.05 0 Detroit
01.06 3 Detroit
01.07 1 Detroit
I want to know how many car crashes for each day happened in the entire nation, and I can do that with this:
Model.values("date") \
.annotate(car_crashes=Sum('car_crashes')) \
.values("date", "car_crashes")
Now, let's suppose I have an array like this:
weights = [
{
"city": "Washington",
"weight": 1,
},
{
"city": "Detroit",
"weight": 2,
}
]
This means that Detroit's car crashes should be multiplied by 2 before being aggregated with Washington's.
It can be done like this:
from django.db.models import IntegerField
when_list = [When(city=w['city'], then=w['weight']) for w in weights]
case_params = {'default': 1, 'output_field': IntegerField()}
Model.objects.values('date') \
.annotate(
weighted_car_crashes=Sum(
F('car_crashes') * Case(*when_list, **case_params)
))
However, this generates very slow SQL code, especially as more properties and a larger array are introduced.
Another solution which is way faster but still sub-optimal is using pandas to :
aggregated = false
for weight in weights:
ag = Model.values("date") \
.annotate(car_crashes=Sum('car_crashes')) \
.values("date", "car_crashes")
if aggregated is False:
aggregated = ag
else:
aggregated = aggregated.union(ag)
aggregated = pd.DataFrame(aggregated)
if len(weights) > 1:
aggregated = aggregated.groupby("date", as_index=False).sum(level=[1])
This is faster, but still not as fast as what happens if, before calling pandas, I take the aggregated.query string and
wrap it with a few lines of SQL.
SELECT "date", sum("car_crashes") FROM (
// String from Python
str(aggregated.query)
)
This works perfectly when pasted into my database SQL. I could do this in Python/Django using .raw() but the documentation says to ask here before using .raw() as mostly anything could be acomplished with the ORM.
Yet, I don't see how. Once I call .union on 2 querysets, I cannot aggregate further.
aggregated.union(ag).annotate(cc=Sum('car_crashes'))
gives
Cannot compute Sum('car_crashes'): 'car_crashes' is an aggregate
Is this possible to do with the Django ORM or should I use .raw()?