Strange behavior with defer and select_related

50 views
Skip to first unread message

Zach Snow

unread,
Aug 14, 2014, 12:59:08 PM8/14/14
to django-users
Hi there. I'm running into some strange behavior when using `defer` with `select_related` and I wanted to find out if I'm simply misunderstanding what should be going on, or if indeed I have stumbled onto a bug.

Summary: Using `defer` on a table that has been re-included in a query via `select_related` affects the fields returned by "original" table.

Consider in the following example that each `Company` might have many users, and each company might have been created by a user owned by the company itself or a different company. Now we'd like to query the database for a list of companies, along with enough information about the `creator` to display, say, their name and company name.

(For the sake of the example the number of fields is very small, but in practice I ran into this bug because doing multiple `select_related`s on a wide table actually caused me to hit the 1664 column limit in Postgres).

    # This is in an app "defer".
    from django.db import models

    class Company(models.Model):
        creator = models.ForeignKey('defer.User', null=True)
        name = models.CharField(max_length=32)
        description = models.TextField()

    class User(models.Model):
        company = models.ForeignKey(Company)
        username = models.CharField(max_length=32)
        name = models.CharField(max_length=32)

Now we query `Company` in such a way that it is `select_related` again into the query, and defer fields on the instance of the table that is `select_related`, but not on the "main" instance of the table.

    >>> cs = Company.objects.all().select_related(
        'creator', 'creator__company').defer('creator__company__description')
    >>>cs.first().description # Causes a query.

Printing the SQL shows what's going wrong:

    >>> print cs.query
    SELECT "defer_company"."id", "defer_company"."creator_id",
        "defer_company"."name", "defer_user"."id",
        "defer_user"."company_id", "defer_user"."username", "defer_user"."name",
        T3."id", T3."creator_id", T3."name"
    FROM "defer_company"
    LEFT OUTER JOIN "defer_user" ON ( "defer_company"."creator_id" = "defer_user"."id" )
    LEFT OUTER JOIN "defer_company" T3 ON ( "defer_user"."company_id" = T3."id" )

Note that we are *not* retrieving the field `defer_company.description`. It seems that the `defer` called on the instance of the `defer_company` table selected through the `creator` field (`T3` above) is being applied to the "first" instance of the table as well.

Updating the SQL to include that column (in the way I expected `defer` and `select_related` to behave) succeeds in returning the correct data -- in sqlite3, at least.

I encountered this behavior in version 1.6.5 and also reproduced it in 1.7c2.

Thanks!

-Zach

Zach Snow

unread,
Aug 14, 2014, 7:33:11 PM8/14/14
to django...@googlegroups.com
Upon reviewing the code in `django/db/models/sql/compiler.py` it seems clear that the list of deferred fields is per-table instead of per-table alias  -- see `deferred_column_names`, https://github.com/django/django/blob/f0b358880a6825d667c037757caac470bc526a1f/django/db/models/sql/compiler.py#L683 which for the above example will generate something like the following:

    { 'defer_company': set([u'creator_id', 'name', u'id']) }

Whereas I would expect:

    { 'T3': set([u'creator_id', 'name', u'id']) }

(As the `only_load` dictionary created by this method only lists tables for which the entire set of columns should not be loaded, we wouldn't see an entry for 'defer_company' at all). It seems that impossible to generate the correct SQL without tracking deferred fields per-table alias, but it wasn't clear to me that the information necessary to determine the table alias to which each deferred field "path" (of the form 'field1__field2__...__fieldN') should be applied is available in the relevant place in the code.

-Zach
Reply all
Reply to author
Forward
0 new messages