[Django] #35801: Signals are dispatched to receivers associated with dead senders

61 views
Skip to first unread message

Django

unread,
Sep 28, 2024, 7:38:51 AM9/28/24
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
-------------------------+----------------------------------------
Reporter: bobince | Type: Bug
Status: new | Component: Core (Other)
Version: 5.1 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------+----------------------------------------
django.dispatch.Signal.receivers is keyed on the id() of the receiver and
sender (modulo a thing to make bound methods consistent).

If a sender connected to a signal is destroyed, and a new object is
allocated with the same id() and then also connected as a sender to the
signal, when the signal fires it will match the original sender and call
the receiver that was connected for that sender (as well as the new one).

Signal works around the problem of re-used ids for the receiver by having
a weakref to the receiver (since it needs a reference anyway to be able to
call it), but it doesn't for sender.

In my case this resulted in post-migrate hooks for the wrong apps being
occasionally called in migration tests that mutated INSTALLED_APPS causing
AppConfig senders to be re-created, but it can be more simply provoked
with:

{{{
from django.dispatch import Signal

receivers_called = []
def create_receiver(i):
def receiver(**kwargs):
receivers_called.append(i)
return receiver

n = 100
receivers = [create_receiver(i) for i in range(n)]

signal = Signal()
for i in range(n):
sender = object()
signal.connect(receivers[i], sender=sender)
receivers_called = []
_ = signal.send(sender=sender)
# only the receiver for the new sender object should be called
assert receivers_called == [i], f'Expected [{i}], called
{receivers_called}'
}}}

(how readily this explodes may depend on local memory allocation
differences, but it dies pretty consistently for me on iteratio
n 3.)

Perhaps Signal should be keeping a weakref to each sender as well as
receiver, and detecting when it has gone None? Does this need to
participate in the _dead_receivers mechanism?
--
Ticket URL: <https://code.djangoproject.com/ticket/35801>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Oct 2, 2024, 2:03:42 PM10/2/24
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
------------------------------+--------------------------------------
Reporter: bobince | Owner: (none)
Type: Bug | Status: closed
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+--------------------------------------
Changes (by Natalia Bidart):

* resolution: => needsinfo
* status: new => closed

Comment:

Hello bobince, thank you for taking the time to create this ticket report.

I have investigated and I have used the code that you shared to reproduce.
I applied a small change because I don't think it's correct to clear
`receivers_called = []` like that inside the loop. I also added some
`kwargs` when sending the `Signal` to ensure we are debugging the right
thing. What I find more "correct" is the following, which unless I'm
missing something should be equivalent to your proposal. Sadly, this does
not fail for me at any iteration (I even increased `n` to be `1000`):

{{{#!python
from collections import defaultdict

from django.dispatch import Signal


signal_calls = defaultdict(list)


def create_receiver(i):
def receiver(sender, **kwargs):
signal_calls[i].append((sender, kwargs))

return receiver


n = 1000
receivers = [create_receiver(i) for i in range(n)]
signal = Signal()
for i in range(n):
sender = object()
signal.connect(receivers[i], sender=sender)
_ = signal.send(sender=sender, i=i)
# only the receiver for the new sender object should be called
expected = [(sender, {"signal": signal, "i": i})]
assert signal_calls[i] == expected, f"{expected=}, got
{signal_calls=}"
}}}

Do you think you could, instead of a script, provide a failing test case
for the Django test suite to illustrate the issue you are describing?
--
Ticket URL: <https://code.djangoproject.com/ticket/35801#comment:1>

Django

unread,
Oct 2, 2024, 3:53:23 PM10/2/24
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
------------------------------+--------------------------------------
Reporter: bobince | Owner: (none)
Type: Bug | Status: closed
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+--------------------------------------
Comment (by bobince):

> I don't think it's correct to clear receivers_called = [] like that
inside the loop

It should be fine as long as the loop is in the same scope as the
definition of receivers_called. But yes, it might be clearer to reset the
list without rebinding it by doing `del receivers_called[:]`.

> Sadly, this does not fail for me at any iteration

That's because the new example stores a reference to `sender` in
`signal_calls`, which prevents the sender being destroyed and its id re-
used.

If you store `id(sender)` or `str(sender)` instead you'll get the error
again.
--
Ticket URL: <https://code.djangoproject.com/ticket/35801#comment:2>

Django

unread,
Oct 3, 2024, 10:35:46 AM10/3/24
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
------------------------------+------------------------------------
Reporter: bobince | Owner: (none)
Type: Bug | Status: new
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+------------------------------------
Changes (by Natalia Bidart):

* cc: Simon Charette (added)
* resolution: needsinfo =>
* stage: Unreviewed => Accepted
* status: closed => new

Comment:

Replying to [comment:2 bobince]:
> > I don't think it's correct to clear receivers_called = [] like that
inside the loop
>
> It should be fine as long as the loop is in the same scope as the
definition of receivers_called. But yes, it might be clearer to reset the
list without rebinding it by doing `del receivers_called[:]`.

Right, I would suggest `receivers_called.clear()`

> > Sadly, this does not fail for me at any iteration
>
> That's because the new example stores a reference to `sender` in
`signal_calls`, which prevents the sender being destroyed and its id re-
used.
>
> If you store `id(sender)` or `str(sender)` instead you'll get the error
again.

Great catch, that makes perfect sense. I tweaked my script and was able to
confirm your report. I'll accept this ticket on that basis, I couldn't
find a previous report about this.

Would you like to prepare a patch?

(Adding Simon as cc since he might have some good ideas for a robust fix.)
--
Ticket URL: <https://code.djangoproject.com/ticket/35801#comment:3>

Django

unread,
Oct 6, 2024, 8:26:18 AM10/6/24
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
------------------------------+------------------------------------
Reporter: bobince | Owner: (none)
Type: Bug | Status: new
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+------------------------------------
Comment (by bobince):

> Right, I would suggest receivers_called.clear()

(Ha ha, what is this crazy new-fangled API? Oh... Python 3.3 you say.
well. huh!)

> Would you like to prepare a patch?

Here's a candidate fix:
https://github.com/bobince/django/compare/main..signals_from_dead_senders

It raises some questions about what to do when ‘sender’ isn't weakreffable
— which admittedly is uncommon, and would already break if ‘use_caching’,
but there's no doc that rules it out as far as I can see. In this commit
it quietly falls back to making a strong reference, which would _probably_
be better than the buggy behaviour, but is still a change in behaviour.

So would certainly appreciate thoughts from someone familiar with the
background of dispatch before PR'ing.
--
Ticket URL: <https://code.djangoproject.com/ticket/35801#comment:4>

Django

unread,
Oct 9, 2024, 9:14:24 PM10/9/24
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
------------------------------+------------------------------------
Reporter: bobince | Owner: (none)
Type: Bug | Status: new
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+------------------------------------
Comment (by Simon Charette):

> In my case this resulted in post-migrate hooks for the wrong apps being
occasionally called in migration tests that mutated INSTALLED_APPS causing
AppConfig senders to be re-created

Could you provide a minimal project demonstrating the issue in a non-
synthetic manner? Before we commit to a solution (which yours appear to be
correct from a quick look) we'd want to establish under which organic
conditions the problem can be reproduced.

The Django test suite has many instances of
`@override_settings(INSTALLED_APPS)` and doesn't run into this issue and
I'm not sure it's fair to expect Django to behave properly under memory
pressure circumstances that make `id` and therefore `hash` (as the default
implementation of `__hash__` is `id()` based) to return the same value for
two distinct objects.

I think a possible alternative solution could be to identify under which
organic circumstances this problem is likely to happen, document that
signal connected with a sender must be disconnected prior to the object
being garbage collected, and adjust the infringing cases within the
framework itself.
--
Ticket URL: <https://code.djangoproject.com/ticket/35801#comment:5>

Django

unread,
Nov 19, 2024, 12:59:07 PM11/19/24
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
------------------------------+------------------------------------
Reporter: bobince | Owner: (none)
Type: Bug | Status: new
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+------------------------------------
Changes (by Sjoerd Job Postmus):

* Attachment "minimal.tar.gz" added.

Django

unread,
Nov 19, 2024, 12:59:15 PM11/19/24
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
------------------------------+------------------------------------
Reporter: bobince | Owner: (none)
Type: Bug | Status: new
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+------------------------------------
Comment (by Sjoerd Job Postmus):

I found this ticket while investigating a problem we are having in our CI,
where we have recurring failures when running `python manage.py migrate`.

In our set up, we have a lot of database migrations. Some of our models
inherit from the `DirtyFieldsMixin` provided by `django-dirtyfields`.

In `DirtyFieldsMixin.__init__`, it calls `post_save.connect`, with
`sender=self.__class__`. (Code: https://github.com/romgar/django-
dirtyfields/blob/f48bbe975c627f3f7577f018992a44b1b1cbad8d/src/dirtyfields/dirtyfields.py#L32).

What we're seeing during the migrations, is that `reset_state` is called
with an instance of a model that does *not* inherit from
`DirtyFieldsMixin`. Likely because the class definition is now in the same
area of memory as which originally contained the temporary model class of
a model that does inherit from `DirtyFieldsMixin`.

Trimming that down to a 'minimal project demonstrating the issue in a non-
synthetic manner' is however hard, because as you said Simon, it has to do
also with memory pressure, and it's hard to simulate the memory pressure
of a large project while simultaneously exhibiting a minimal example.

I did manage to trim down the workings to a minimal example:

{{{
import random

from django.db import migrations


def create_instance_withoutdirty_instance(apps, schema_editor):
WithoutDirty = apps.get_model("stress", "WithoutDirty")
WithoutDirty.objects.create()


def clear_apps_cache(apps, schema_editor):
# Whenever something schema-related changes, the apps-cache gets
cleared. To keep
# this test minimal, we clear the cache manually instead of changing
the schema.
apps.clear_cache()


def init_withdirty_instance(apps, schema_editor):
WithDirty = apps.get_model("stress", "WithDirty")
print(f"WithDirty has ID {id(WithDirty)}")
# Call __init__, don't even bother doing more.
WithDirty()


def update_withoutdirty_instance(apps, schema_editor):
WithoutDirty = apps.get_model("stress", "WithoutDirty")
print(f"WithoutDirty has ID {id(WithoutDirty)}")
instance = WithoutDirty.objects.get()
instance.save()


class Migration(migrations.Migration):

dependencies = [
('stress', '0001_initial'),
]

operations = [
migrations.RunPython(create_instance_withoutdirty_instance),
*[
migrations.RunPython(
random.choice(
[
clear_apps_cache,
init_withdirty_instance,
update_withoutdirty_instance,
]
)
)
for _ in range(1000)
]
]
}}}

(see also attached project)

The example works by randomly deciding from three options:
* clear the apps cache (as if a model was added/altered/deleted)
* Call `WithDirty()`, triggering the `.connect(...)` call.
* Call `WithoutDirty.objects.get().save()`, triggering `post_save` to
notify all its receivers.

When it fails, this is because the temporary `WithoutDirty` class is
instantiated in the same location of memory as which previously held a
temporary `WithDirty` class, causing `reset_state` to be called with the
`WithoutDirty` instance.

During testing, occasionally it would not trigger the bug, but then I'd
just delete the `db.sqlite3` file and run again.
--
Ticket URL: <https://code.djangoproject.com/ticket/35801#comment:6>

Django

unread,
Jan 4, 2025, 11:53:59 AM1/4/25
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
------------------------------+------------------------------------
Reporter: bobince | Owner: (none)
Type: Bug | Status: new
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
------------------------------+------------------------------------
Changes (by Simon Charette):

* has_patch: 0 => 1
* needs_better_patch: 0 => 1

Comment:

Thanks for all the extra details Sjoerd, this is greatly appreciated.

Here's how I think we can resume the problem in a few bullet points

1. `Signal` doesn't provide a way to connect to non-weak senders
2. `Signal.receivers` doesn't hold a strong reference to senders as it
uses `id` to retrieve them by lookup and under memory pressure `id` can
return the same value for two distinct objects
3. The migration framework creates and disposes of `ModelBase` instances
(aka `Model` subclasses) which creates a lot of memory pressure (see
#29898) and cycles through `sender` for all model signals which makes the
perfect conditions for `id(SomeFakeModel) == id(SomeOtherFakeModel)`
collisions.

With all that said I can see a few solutions which should be considered.

First making `Signal.connect(weak=True)` also use `weakref` for senders
like
[https://github.com/bobince/django/compare/main..signals_from_dead_senders
bobince implemented] seems like a good step forward. If you could open a
PR from your branch that also tests `Signal(use_caching=True)` (which
should work as it already uses a `WeakKeyDictionary` for its sender cache)
we should be land this work independently of the following proposed
solutions. I believe we should have `weakref.ref(sender)` raise a
`TypeError` instead of silencing though as the proposed code now holds a
''strong'' reference to `sender` in `receivers` which could make things
much worst but this can be discussed further during code review.

That should leave us with what should be done with signal receivers
connected with `weak=False`. In the case of `django-dirty-fields`'s usage
Sjoerd I think they should switch to `weak=True` as their `reset_state`
receiver is defined at the module level anyway so it's causing more harm
than good. There are albeit rare use cases for `weak=False` and it's a
public API so I wonder if the migration framework should do a better job
at clearing after itself by implementing `__del__` logic for `AppConfig`
and fake models for the core signals at least.
--
Ticket URL: <https://code.djangoproject.com/ticket/35801#comment:7>

Django

unread,
Jan 17, 2025, 11:15:15 PM1/17/25
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
------------------------------+------------------------------------
Reporter: bobince | Owner: (none)
Type: Bug | Status: new
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
------------------------------+------------------------------------
Changes (by Lincoln):

* cc: Lincoln (added)

--
Ticket URL: <https://code.djangoproject.com/ticket/35801#comment:8>

Django

unread,
Jan 23, 2025, 1:21:47 AM1/23/25
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
------------------------------+------------------------------------
Reporter: bobince | Owner: (none)
Type: Bug | Status: new
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
------------------------------+------------------------------------
Comment (by Simon Charette):

I've tweaked the patch and test from bobvince slightly and bundle it up in
[https://github.com/django/django/pull/19093 this PR]. I'd appreciate
feedback on it to confirm it resolves your issue when you have a minute.

I tried to give you co-author attribution by using `And Clover
<a...@doxdesk.com>` but let me know if you'd like to be referenced in a
different manner.
--
Ticket URL: <https://code.djangoproject.com/ticket/35801#comment:9>

Django

unread,
Jan 23, 2025, 1:22:01 AM1/23/25
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
------------------------------+------------------------------------
Reporter: bobince | Owner: (none)
Type: Bug | Status: new
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+------------------------------------
Changes (by Simon Charette):

* needs_better_patch: 1 => 0

--
Ticket URL: <https://code.djangoproject.com/ticket/35801#comment:10>

Django

unread,
Jan 23, 2025, 5:40:12 AM1/23/25
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
------------------------------+------------------------------------------
Reporter: bobince | Owner: Simon Charette
Type: Bug | Status: assigned
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+------------------------------------------
Changes (by Sarah Boyce):

* owner: (none) => Simon Charette
* status: new => assigned

--
Ticket URL: <https://code.djangoproject.com/ticket/35801#comment:11>

Django

unread,
Mar 12, 2025, 7:40:02 AM3/12/25
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
------------------------------+------------------------------------------
Reporter: bobince | Owner: Simon Charette
Type: Bug | Status: assigned
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------+------------------------------------------
Comment (by bobince):

Sorry for the radio silence, been a bit underwater recently!

> I'd appreciate feedback on it to confirm it resolves your issue when you
have a minute.

Thanks — yes, this works for me and I think to minimise disruption the
approach with falling back to the status quo for unweakreffable senders is
preferable to my PR's fallback to strong refs.

> let me know if you'd like to be referenced in a different manner.

no that's fine thanks!
--
Ticket URL: <https://code.djangoproject.com/ticket/35801#comment:12>

Django

unread,
Apr 23, 2025, 9:31:55 AM4/23/25
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
-------------------------------------+-------------------------------------
Reporter: bobince | Owner: Simon
| Charette
Type: Bug | Status: assigned
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* stage: Accepted => Ready for checkin

--
Ticket URL: <https://code.djangoproject.com/ticket/35801#comment:13>

Django

unread,
Apr 23, 2025, 9:36:09 AM4/23/25
to django-...@googlegroups.com
#35801: Signals are dispatched to receivers associated with dead senders
-------------------------------------+-------------------------------------
Reporter: bobince | Owner: Simon
| Charette
Type: Bug | Status: closed
Component: Core (Other) | Version: 5.1
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak <felisiak.mariusz@…>):

* resolution: => fixed
* status: assigned => closed

Comment:

In [changeset:"760121dcb1837fdee6ac7f4e2412c0d539fbff7a" 760121d]:
{{{#!CommitTicketReference repository=""
revision="760121dcb1837fdee6ac7f4e2412c0d539fbff7a"
Fixed #35801 -- Prevented collision of senders with non-overlapping
lifetimes.

As documented, the id() function can return the same value for distinct
objects with non-overlapping lifetimes which can result in signals being
sent to the wrong receivers if two distinct senders happen to have a
colliding id() value.

Since reproduction of the issue requires memory constrained
circumstances where the same exact id() is reused for two senders of the
same signal the test opt to simulate the collision by systematically
making the same id for Sender instances.

Note that we explicitly avoid keeping a strong reference to senders that
cannot be weakly referenced as that would unexpectedly prevent them from
being garbage collected. This means that id(sender) collisions could
still occur for such objects but Django itself doesn't make use of them.

Thanks Sjoerd Job Postmus for the reduced test case and Mariusz for the
review.

Co-authored-by: And Clover <a...@doxdesk.com>
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/35801#comment:14>
Reply all
Reply to author
Forward
0 new messages