Stop QuerySet repr from executing queries

946 views
Skip to first unread message

Matt

unread,
Oct 10, 2019, 11:50:31 AM10/10/19
to Django developers (Contributions to Django itself)
repr(some_queryset) will execute the query and display some results from the query. This is done because it's (somewhat) helpful for debugging.


This has caused at least two people to scratch their heads about odd queries being generated, and it crashed my production database yesterday.

See #20393 and #30863

Luke Plant has said:

... you have a conflict between the goal of being information rich and with always being useful for debugging. In general, automatically seeing the results is information rich and is useful in debugging, but sometimes it causes further problems.


I have thought about it, and with this conflict of goals, I think we are going to have to err on the side of the current behaviour. It is possible to work around by not calling repr on QuerySets in your middleware, and there are too many people who will be upset (or have problems with their own debugging facilities) by changing this.

One reason people love Django is because of its attention to things like debug-ability. I can certainly see the argument here. However, I want to press on the desirability of this feature.

The argument that you can work around it by not calling repr is certainly true, but in practice, I don't think it's reasonable. Production error reporting tools (like Sentry and New Relic) will call repr when reporting on exceptions.

I see this behavior as being analogous to a file object printing the first 21 characters of a file, which it doesn't do (for good reason, I would imagine):

>>> f = open("foo.py")
>>> repr(f)
"<_io.TextIOWrapper name='foo.py' mode='r' encoding='UTF-8'>"
>>>

I think we ought to reconsider the behavior of repr on QuerySets. I'm of the opinion that we should scrap it completely. It could be replaced by something that generates the SQL that would be executed (although that may be tricky), or some kind of representation of the query filter.

Any thoughts?

Ryan Hiebert

unread,
Oct 10, 2019, 12:16:15 PM10/10/19
to django-d...@googlegroups.com
On Thu, Oct 10, 2019 at 10:50 AM Matt <ma...@satchamo.com> wrote:

I think we ought to reconsider the behavior of repr on QuerySets. I'm of the opinion that we should scrap it completely. It could be replaced by something that generates the SQL that would be executed (although that may be tricky), or some kind of representation of the query filter. 

Just speaking for myself, I think I'm roughly in agreement with the general sentiment and reasoning behind your proposal. There's one place that I think there's common utility, and which I think is the most likely the cause disruption to people, the interactive prompt. The issues that you've mentioned are also present on the interactive prompt as well, so the change would be optimal there as well. Perhaps it would be possible to ease the pain somewhat if there were a setting to do reprs as currently, perhaps _only_ on the interactive prompt, to avoid the issues you've pointed out with tools like Sentry. I'm not sure what the default setting value should be, though I do think new projects should include whatever settings would be needed to make this the default behavior for new projects, even if the setting default was to maintain backward compatibility, as has been done for other settings.

tl;dr: I like the motivation, but I'm not sure how the backward compatibility aspects will play out for the community.

Adam Johnson

unread,
Oct 10, 2019, 3:17:43 PM10/10/19
to django-d...@googlegroups.com
I’m in favour of changing repr to not execute. Luke’s reasoning seems reasonable.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CABpHFHS1%3DirRaQVWgiUnmTX%2By%3DB7Vy%2BpGLha-aoYmqTGUSj20g%40mail.gmail.com.
--
Adam

Matt

unread,
Oct 11, 2019, 10:28:49 PM10/11/19
to Django developers (Contributions to Django itself)

I reviewed the patch that was originally created for this:

 

https://github.com/django/django/pull/1055/commits/7d53d428c0323a7eca93b7b56968a895b031e2ae

 

Essentially, it only includes the results of the queryset in the repr *if* the QuerySet._result_cache has been populated. There is one minor thing I don't like about the particular implementation: it shows the empty list if the result cache is not populated. I think it should just show <QuerySet> instead of <QuerySet []>. That way you can differentiate between _result_cache=None, and _result_cache=[].

 

This is an OK approach, but the inconsistency in the representation smells a little off.

 

RawQuerySet simply does:

 

def __repr__(self):

    return "<%s: %s>" % (self.__class__.__name__, self.query)

 

which is dead simple, and useful. I think it can be argued that QuerySet should be consistent with it (otherwise, why isn't RawQuerySet executing its SQL to show the repr?)

 

Regarding other things that have been mentioned: I'm not too keen about introducing a setting for this behavior, or doing things differently in an interactive console. Django already has a lot of settings, and this seems like such a minor thing. And I don't know of any precedent for changing the behavior of Django in an interactive shell (and you can tell I'm not a fan of inconsistent behaviors).


I'm going to re-open the ticket and hope other Django devs chime in.

Ryan Hiebert

unread,
Oct 11, 2019, 11:20:26 PM10/11/19
to django-d...@googlegroups.com
On Fri, Oct 11, 2019 at 9:29 PM Matt <ma...@satchamo.com> wrote: 

I think it should just show <QuerySet> instead of <QuerySet []>.

 
I agree, I think this makes the most sense.

I think it can be argued that QuerySet should be consistent with [RawQuerySet, which just uses a string of the query in the repr]


I can see the argument I suppose, but I think most of the time that's just noise. When you want to see that, you'll say you want to see that by getting the repr of the query itself. It seems like too much for a default repr for QuerySet, IMO.


Regarding other things that have been mentioned: I'm not too keen about introducing a setting for this behavior, or doing things differently in an interactive console. Django already has a lot of settings, and this seems like such a minor thing. And I don't know of any precedent for changing the behavior of Django in an interactive shell (and you can tell I'm not a fan of inconsistent behaviors).


Me either, and I'm one that mentioned it. I personally have no misgivings about changing the repr unconditionally, and it would be my preference. I was trying to consider potential dissent to give balanced feedback.

Matt

unread,
Oct 14, 2019, 8:07:08 PM10/14/19
to Django developers (Contributions to Django itself)

This patch will print the results of the query (if it has been evaluated). I did hit one odd case while writing the tests. Essentially, if you do:

queryset = SomeModel.objects.all()
list(queryset)
SomeModel.objects.create(...)
repr(queryset)

The newly created model won't appear in the repr. I believe the existing behavior will show the newly created object. I'm still not a huge fan of this repr behavior (I lean towards just displaying the SQL like RawQuerySet).

I just noticed felixxm closed the original ticket. Unfortunately, I don't think I'm advertising the severity of this issue very well and most people are probably never going to encounter it (and if they do, it's probably their own fault). So to be more explicit about the problem...

How to crash your production database with Django:
  1. Use an error reporting system like Sentry
  2. Have a database table with millions of rows (the more the "better")
  3. Have a Django model representing the above table, with a default sort of a non-indexed field
  4. Have a local variable somewhere in the stack like `queryset = HugeTable.objects.all()` (assume it doesn't get "refined down" until deeper in your code)
  5. Trigger an uncaught exception
  6. Your error reporting layer will call repr() on that queryset local variable.
  7. repr(queryset) will trigger your database server to do SELECT ... FROM HugeTable ORDER BY NonIndexedColumn LIMIT 21 (locking it up)
  8. Bonus: Trigger all of the above on all your wsgi processes
In short, just call repr(BigTable.objects.all()) (where BigTable has an ordering on an non-indexed column and lots of rows)



On Thursday, October 10, 2019 at 8:50:31 AM UTC-7, Matt wrote:

Mariusz Felisiak

unread,
Oct 15, 2019, 4:21:14 AM10/15/19
to Django developers (Contributions to Django itself)
Hi y'all,

    I closed the original ticket because two responses is not enough to reopen it. I'm -1 to this change. IMO your issue is really niche and it's caused by multiple reasons, i.e. huge table, Meta.ordering by non-indexed column, uncaught exception, error reporting layer that calls `repr()` etc. and to be honest even with all of this fetching the first 20 rows should not crash your production database (I worked with tables that had hundreds of millions of records). I'm against this change because it will have a negative impact for most of users (like pointed by Luke[1]). It will also have a big impact on our documentation ~ 200 examples to change.

Best,
Mariusz

Carlton Gibson

unread,
Oct 15, 2019, 5:29:01 AM10/15/19
to Django developers (Contributions to Django itself)
I'd be -1 on the change here too. 

That QuerySet shows you its value in the shell is beyond valuable. 

Look at the Django Tutorial, or the Django Girl Tutorial, or any of the other learning examples, and imagine making it so that these playing with the API examples required grasping at which point QuerySets are evaluated. Users don't get that quickly. (These exact issues come up all the time on DRF — Why isn't my QuerySet updating? Because you evaluated it in your class definition...) 

If you must have a different behaviour here, use a subclass.

Kind Regards,

Carlton


On Tuesday, 15 October 2019 10:21:14 UTC+2, Mariusz Felisiak wrote:
I'm -1 to this change. ... to be honest even with all of this fetching the first 20 rows should not crash your production database... I'm against this change because it will have a negative impact for most of users ...It will also have a big impact on our documentation ~ 200 examples to change.

Alexandr Aktsipetrov

unread,
Oct 15, 2019, 5:38:52 AM10/15/19
to Django developers (Contributions to Django itself)
Current behavior is indeed usually useful but I actually remember being annoyed by it during interactive debugging - as watches affect sql log.  

What about reusing DEBUG instead of introducing brand new setting? I imagine all tutorials are in the context of scaffolded project with DEBUG=True as a default?

James Bennett

unread,
Oct 15, 2019, 5:59:27 AM10/15/19
to django-d...@googlegroups.com
This request is a bit more complicated than I think people
realize. There's no practical way I can see to have the __repr__() of
a QuerySet supply information sufficient to reconstruct it, at least
not in a manner recognizable to most users of the ORM (there's no
internal record of which query methods were called, or with which
arguments -- only the manipulations of the nodes of the underlying
Query object).

And even if we decided to go to the trouble of having QuerySet store a
record of query method calls, it's likely to be a considerable amount
of trouble, since arguments to query methods may be objects that are
difficult or impossible to safely store (such as iterators which would
be consumed in creating a string representation).

So having QuerySet.__repr__() produce the sort of "information to
reconstruct the object" typical of the __repr__() of Python's builtins
is not going to work.

Which leaves the question of what to put in the __repr__(), if it's
not going to be a way to reconstruct the QuerySet and it's not going
to be the results of the query. We would also need to ensure QuerySet
implements __str__() in a way that *does* provide the visual debugging
aid of showing the results, since currently it doesn't, and relies on
Python falling back to __repr__() to get the string representation.

And... I don't have a good answer to this. Anything that isn't either
the method chain to reconstruct the query, or the results of the
query, is not going to be a sufficient debugging aid.

Which brings us back around to the current behavior; it seems to me
that this is the most useful thing we can do. It's already documented
that a string representation of a QuerySet shows the results, which
requires evaluating it and performing the query. There are some small
optimizations we could do -- such as checking for a populated result
cache and using it, rather than re-performing the query as __repr__()
currently does -- and we could be more explicit about the fact that
performing the query is a requirement for generating a string
representation, but I think that's about the best we can do for this
issue.

charettes

unread,
Oct 15, 2019, 8:42:12 AM10/15/19
to Django developers (Contributions to Django itself)
I agree with what James eloquently said, the issue is more complicated it appears.

Simon

Harro

unread,
Oct 16, 2019, 1:53:05 AM10/16/19
to Django developers (Contributions to Django itself)
Yes, it's a complicated issue, but isn't the SQL query the ultimate representation of which methods are called or not?

Having the query evaluated during debugging has been shown to be harmful in certain situations, isn't that the most important thing to fix?

Mariusz Felisiak

unread,
Oct 16, 2019, 3:14:37 AM10/16/19
to Django developers (Contributions to Django itself)
W dniu środa, 16 października 2019 07:53:05 UTC+2 użytkownik Harro napisał:
Yes, it's a complicated issue, but isn't the SQL query the ultimate representation of which methods are called or not?

Having the query evaluated during debugging has been shown to be harmful in certain situations, isn't that the most important thing to fix?


Current behavior is extremely valuable IMO. It is not the most important thing to remove behavior that most of users use because we want to fix an edge case that was reported twice in the last six years. I agree that we can clarify this in docs. SQL query is not a solutions because not all of ORM users know SQL. Moreover the current `repr()` shows values from DB that we'll miss showing only SQL. You can check in the Django documentation how users use the current `repr()` e.g. in tutorials.

Best,
Mariusz

Matt

unread,
Oct 20, 2019, 3:47:25 PM10/20/19
to Django developers (Contributions to Django itself)
Perhaps we ought to just keep the current behavior when DEBUG is True (it seems so obvious now, I can't believe it wasn't the first thing I suggested)? Django does lots of helpful things in DEBUG mode at the expense of performance. I think this would be an innocuous change for most people.

It is not the most important thing to remove behavior that most of users use because we want to fix an edge case that was reported twice in the last six years.

I don't consider any of those individual conditions to trigger the problem "off the beaten path" for a bigger production Django site. All of them combined is obviously extremely rare, but it will effect someone else eventually. It doesn't sit well with me to not fix the issue. Django should be reliable in production for any combination of those conditions.

Michel Sabchuk

unread,
Mar 6, 2021, 6:54:56 PM3/6/21
to Django developers (Contributions to Django itself)
I agree with Matt on that. Avoiding executing the queries on production would be helpful!

Let me share my case. I use django-rest-framework in a specific project and DRF has a feature: a serializer has a fancier string representation when printed. I was using a serializer with a queryset in a view that had an exception further, which caused the serializer to be logged and thus, the queryset to be executed.

There are more details in this discussion: https://github.com/encode/django-rest-framework/discussions/7782

The short story is: I was logging this serializer passively and it was causing the execution of a query to a whole table with millions of records, without any sorting optimization, creating hard to debug performance problems.

I understand it is an unusual situation, but repeating Matt's words: Django should be reliable in production for any combination of those conditions.

Thanks!

Joachim Jablon

unread,
Mar 23, 2021, 8:40:39 AM3/23/21
to Django developers (Contributions to Django itself)
Just been bitten by that same bug (combination of Sentry, using a Queryset.as_manager() that creates an unfiltered queryset as a local variable in the stack, a create signal that errored provoking a stacktrace that contained the queryset, a table that is always filtered by a field, and sorted by another field and an index that behaves poorly without the aformentionned filter).

So would a contribution that would avoid evaluating the uncached queryset on repr when DEBUG is False likely to be accepted ? If so, I'm ready to get my hands dirty :)

Cheers !

Adam Johnson

unread,
Mar 23, 2021, 9:02:39 AM3/23/21
to django-d...@googlegroups.com
Would James' suggestion of reusing the result cache in __repr__ have solved your issue? I would like to see that first.

I'm not against the DEBUG-only fetching but there hasn't been any suggestion what to show instead that could be of use.

One can also mitigate against bad queries particularly well by setting a statement timeout at the database level (at least MySQL and PostgreSQL support this), which most users will generally want at a scale where repr() becomes a problem.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.


--
Adam

Kevin Weaver

unread,
Apr 21, 2022, 3:45:43 PM4/21/22
to Django developers (Contributions to Django itself)
I know this thread is more than a year old, but I was just bitten by this as well. We use the Elastic APM Python agent which, like Sentry, calls `repr()` on local variables that it sends as part of stack traces. We use Django REST Framework and had a serializer that looked something like this (fake) example:

```
class AccountSerializer(ModelSerializer):
  owner = PrimaryKeyRelatedField(queryset=User.objects.all())
  days_since_joined = serializers.SerializerMethodField()

  def get_days_since_joined(self, obj):
    try:
      ...
    except Exception:
      logger.exception(...)
```

What happened was:
  1. We have several places in the application that return a paginated list of, say, 50 accounts, using this serializer.
  2. Someone introduced the log line to catch exceptions when calculating `days_since_joined`, which is called for each of the 50 accounts returned on the page. Due to a bug, the exception was almost always raised.
  3. Each time the exception is raised, the Elastic APM Python agent generates a stack trace and calls `repr()` on local variables in each stack frame. One of those variables in the `get_days_since_joined` frame is the `AccountSerializer` object `self`, which includes the `owner` property. Calling `repr()` on the `owner` property evaluates the `User.objects.all()` QuerySet, which fetches the first 21 User objects from the database.
  4. We now had `SELECT ... FROM user LIMIT 21` being called in a loop 50 times each time someone tries to load a page of accounts, suddenly consuming significantly more database CPU and slowing down the application considerably until queries start timing out.
To prevent this from happening again, we ended up applying something very similar to the suggestions above as a monkey patch:

```
orig = QuerySet.__repr__

def __repr__(self):
  if settings.DEBUG or self._result_cache:
    return orig(self)

  return f"<{self.__class__.__name__} [not evaluated]>"

QuerySet.__repr__ = __repr__
```

If folks are willing to reconsider their positions on this issue, I am more than willing to do what is necessary to contribute this change upstream, so that others are not impacted by this in the future.

Thank you,

Kevin

Ian Foote

unread,
Apr 21, 2022, 5:58:14 PM4/21/22
to django-d...@googlegroups.com
I've been working on the Kolo debugging tool and as part of that I've also run into this issue. Generating unexpected queries when monitoring a django project was a nasty surprise.

In Kolo's case I was also able to work around it with a monkeypatch, but not needing this would be a nice performance win.

I also want to point to a similar case in Django where I think the default behaviour is more helpful - that of forms. Calling `str` on a form can trigger form validation, but calling `repr` avoids this. I'd be in favour of moving the current `__repr__` implementation for `QuerySet` into `__str__` and having an implementation of `__repr__` guaranteed not to generate extra queries.

Regards,
Ian

Reply all
Reply to author
Forward
0 new messages