Changing table name in response to a QuerySet

118 views
Skip to first unread message

Peter of the Norse

unread,
Mar 17, 2015, 1:58:02 AM3/17/15
to django...@googlegroups.com
We have two tables that — for legal reasons — are not allowed to be combined. The data is so similar that we are using PostgreSQL table inheritance, but we can’t return the information in the same page. Originally, we only had one table with a column specifying which data set to use, but or lawyers put an end to that.

So we have two tables tables with the exact same columns: product_foo and product_bar. We also have two models that use model inheritance so that they have the exact same fields: ProductFoo and ProductBar. The decision to choose which table to use is handled based on user settings.

We would like to go back to just having the one Product model, and have its table name switch as necessary. Is there a way to do that?

Peter of the Norse



James Schneider

unread,
Mar 17, 2015, 5:09:07 AM3/17/15
to django...@googlegroups.com

That sounds more like an exact use case for model inheritance. You would define a single model that matches the columns in both tables. You can define that model as abstract, then have the two real models (one for each table) inherit from that abstract model. In the definition for the child models, you would specify the DB table names to match your two tables in the existing schema.

https://docs.djangoproject.com/en/1.7/ref/models/options/#db-table

That way, you only maintain a single (abstract) model definition, even though multiple models use it (each with a separate DB table).

You may also need to mark the real models as unmanaged by the ORM if you're doing funny things directly in the DB, and you will be managing the table schema directly for those tables.

https://docs.djangoproject.com/en/1.7/ref/models/options/#managed

I'm assuming you have already determined the logic deciding which model each user uses. At that point, assuming that all other functionality remains the same (calls/processing the data, regardless of table), the calls to those models can be generalized by a simple utility function that returns a class instance of the necessary model.

def get_right_model(user):
    # logic to pick ModelA or ModelB based on user
    return ModelA

Then later, maybe in a view:

selected_model = get_right_model(self.request.user)

all_data = selected_model.objects.all()

#all_data should now contain the results of the query against the right model/table in the DB

Trying to flip between tables in the ORM by tweaking the innards probably is going to lead to excessive complication.

-James

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/0A706025-5A66-47E3-B524-222E1780ADA4%40radio1190.org.
For more options, visit https://groups.google.com/d/optout.

Peter of the Norse

unread,
Mar 19, 2015, 12:35:20 AM3/19/15
to django...@googlegroups.com
I’m already doing all of that. The problem is we often have logic that goes like:

model = Product.get_subtable(‘foo’)
queryset = model.objects.
    filter(…).
    select_related(…).
    exclude(…).
    extra(…)
    … and so on and so on. Seriously, the method that constructs the queryset is some 1500 lines long and can quite a bit to run
if queryset.count() == 0:
    model = Product.get_subtable(‘bar’)
    queryset = model.objects.
        filter(…).
        select_related(…).
        exclude(…).
        extra(…)
        … the same expensive method again.

A lot of our performance gains have come from caching the queryset creation function, but even with all of that the method can still take 200ms to run. (Seriously, there can be over 40 filters, some of which are complex Q() and F() objects.) The problem is the second queryset produces almost the exact same SQL, but with one table having a different name. Back when we were creating the raw SQL, it was a simple string replace to switch tables. I was hoping that someone had some kind of “switch table” command on the queryset or something similar.

On Mar 17, 2015, at 3:08 AM, James Schneider <jrschn...@gmail.com> wrote:

That sounds more like an exact use case for model inheritance. You would define a single model that matches the columns in both tables. You can define that model as abstract, then have the two real models (one for each table) inherit from that abstract model. In the definition for the child models, you would specify the DB table names to match your two tables in the existing schema.

https://docs.djangoproject.com/en/1.7/ref/models/options/#db-table

That way, you only maintain a single (abstract) model definition, even though multiple models use it (each with a separate DB table).

You may also need to mark the real models as unmanaged by the ORM if you're doing funny things directly in the DB, and you will be managing the table schema directly for those tables.

https://docs.djangoproject.com/en/1.7/ref/models/options/#managed

I'm assuming you have already determined the logic deciding which model each user uses. At that point, assuming that all other functionality remains the same (calls/processing the data, regardless of table), the calls to those models can be generalized by a simple utility function that returns a class instance of the necessary model.

def get_right_model(user):
    # logic to pick ModelA or ModelB based on user
    return ModelA

Then later, maybe in a view:

selected_model = get_right_model(self.request.user)

all_data = selected_model.objects.all()

#all_data should now contain the results of the query against the right model/table in the DB

Trying to flip between tables in the ORM by tweaking the innards probably is going to lead to excessive complication.

-James

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.

For more options, visit https://groups.google.com/d/optout.

James Schneider

unread,
Mar 19, 2015, 1:05:38 AM3/19/15
to django...@googlegroups.com
Wow, that's quite the queryset! 

If I understand your snippet right, it looks as though you always run an initial large query against one of the tables, and if it doesn't find anything, run the similar query against the other table. If the filters, etc. are exactly the same, you could take the raw SQL from the first query and perform a substitution for the new model name, and just run the raw SQL for the second iteration, rather than depending on the ORM to (re)build the query for you. That may not be an option if you are heavily depending on the ORM to package up the results nicely for you. Given the complexity of your query, raw SQL may buy you some serious performance gains if you are able to optimize the query that the ORM would normally generate.

This may be one of those cases where the limitations of the ORM are being reached, however, I'll defer any further comment to others on the list.

-James


Peter of the Norse

unread,
Mar 22, 2015, 10:52:40 PM3/22/15
to django...@googlegroups.com
I’m not sure I explained myself that well. Our system has two models, Product and ProductSearch. The Product tables come from third party sources, and can’t be edited from the front end. Users can setup their own ProductSearches, which are then used to construct the queryset. So our code actually looks more like:

model = Product.get_subtable(‘foo’)
logger.info(’start first run of get_queryset’)
qs = user_product_search.get_queryset(request, model)
logger.info(‘finished first run of get_queryset’) 

logger.info(’starting count() of qs for the first time’)
product_count = qs.count() 
logger.info(‘finish count() of qs’)
if product_count == 0:
    model = Product.get_subtable(‘foo’)
    logger.info(’start second run of get_queryset’)
    qs = user_product_search.get_queryset(request, model)
    logger.info(‘finished second run of get_queryset’)

In the worst case scenario, the actual count() runs in about 150-250ms, (we have some good indexes) the first get_queryset call takes ~800ms (there are DB look-ups) and the second can take 200ms (we cache the DB values).  This is 200ms slower than the old way of “c = qs.filter(source=‘foo’).count(); if c>0: qs = qs.filter(source=‘foo’) else: qs = qs.filter(source=‘bar’)”. And about 3000ms faster than building up the raw SQL that we used to do. (Actually, the problem comes because we use the queryset in a dozen different ways depending on other things that need to be looked up. Seriously, the ORM is faster than trying to roll our own.)


For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages