[Django] #25302: BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL

13 views
Skip to first unread message

Django

unread,
Aug 22, 2015, 4:21:34 PM8/22/15
to django-...@googlegroups.com
#25302: BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
-----------------------------------------+------------------------
Reporter: aaugustin | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: 1.8
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-----------------------------------------+------------------------
Many dubious bots send a Referer header that is equal to the current URL,
presumably to work around checks for empty Referer headers. Some of these
bots are also poorly implemented and trigger a stupid amount of 404s.

BrokenLinkEmailsMiddleware is smart enough not to report 404s without a
Referer. I suggest to make it not report 404s when the Referer it equal to
the current URL either.

--
Ticket URL: <https://code.djangoproject.com/ticket/25302>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Aug 22, 2015, 4:40:19 PM8/22/15
to django-...@googlegroups.com
#25302: BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
-------------------------------+--------------------------------------

Reporter: aaugustin | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: 1.8
Severity: Normal | Resolution:

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------

Comment (by aaugustin):

Just to be clear: a 404 with a Referer equal to the current URL must be a
false positive, because there's no way you can be coming from a page that
doesn't exist.

If the page disappeared between the previous and the current request, then
the problem is solved, since the page containing the broken link no longer
exists.

--
Ticket URL: <https://code.djangoproject.com/ticket/25302#comment:1>

Django

unread,
Aug 23, 2015, 1:04:31 PM8/23/15
to django-...@googlegroups.com
#25302: BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
-------------------------------+--------------------------------------
Reporter: aaugustin | Owner: mlorant
Type: New feature | Status: assigned

Component: HTTP handling | Version: 1.8
Severity: Normal | Resolution:

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Changes (by mlorant):

* owner: nobody => mlorant
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/25302#comment:2>

Django

unread,
Aug 23, 2015, 1:58:37 PM8/23/15
to django-...@googlegroups.com
#25302: BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
-------------------------------+--------------------------------------
Reporter: aaugustin | Owner: mlorant
Type: New feature | Status: assigned
Component: HTTP handling | Version: 1.8
Severity: Normal | Resolution:

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------

Comment (by mlorant):

Should I update the documentation? It is already written that empty
referer are ignored, I think we could add something about this new
behaviour too...

See https://docs.djangoproject.com/en/1.8/howto/error-reporting/#errors

--
Ticket URL: <https://code.djangoproject.com/ticket/25302#comment:3>

Django

unread,
Aug 23, 2015, 2:49:18 PM8/23/15
to django-...@googlegroups.com
#25302: BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
-------------------------------+--------------------------------------
Reporter: aaugustin | Owner: mlorant
Type: New feature | Status: assigned
Component: HTTP handling | Version: 1.8
Severity: Normal | Resolution:

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------

Comment (by aaugustin):

Yes, you should update the documentation as well.

--
Ticket URL: <https://code.djangoproject.com/ticket/25302#comment:4>

Django

unread,
Aug 23, 2015, 4:23:47 PM8/23/15
to django-...@googlegroups.com
#25302: BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
-------------------------------+------------------------------------

Reporter: aaugustin | Owner: mlorant
Type: New feature | Status: assigned
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------
Changes (by claudep):

* version: 1.8 => master
* stage: Unreviewed => Accepted


--
Ticket URL: <https://code.djangoproject.com/ticket/25302#comment:5>

Django

unread,
Aug 24, 2015, 7:36:26 PM8/24/15
to django-...@googlegroups.com
#25302: BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
-------------------------------+------------------------------------
Reporter: aaugustin | Owner: mlorant
Type: New feature | Status: closed

Component: HTTP handling | Version: master
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------
Changes (by Tim Graham <timograham@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"4ce433e811763f29c32e0553fe1e0070fd14c6a2" 4ce433e]:
{{{
#!CommitTicketReference repository=""
revision="4ce433e811763f29c32e0553fe1e0070fd14c6a2"
Fixed #25302 -- Prevented BrokenLinkEmailsMiddleware from reporting 404s
when Referer = URL.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/25302#comment:6>

Django

unread,
Nov 26, 2015, 3:10:56 PM11/26/15
to django-...@googlegroups.com
#25302: BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
-------------------------------+------------------------------------
Reporter: aaugustin | Owner: mlorant
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------
Changes (by aaugustin):

* status: closed => new
* resolution: fixed =>


Comment:

I ran 1.9 RC 1 in production for a few days and sadly, the fix doesn't
yield the results I hoped for.

Broken bots are hardcoded to use `http://<domain><url>` as referer.
However Django's check is sensitive to the scheme. Since I run on
`https://...` the condition added to fix this ticket here never triggers.

Would it make sense to also ignore the scheme in the check?

--
Ticket URL: <https://code.djangoproject.com/ticket/25302#comment:7>

Django

unread,
Nov 26, 2015, 3:30:28 PM11/26/15
to django-...@googlegroups.com
#25302: BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
-------------------------------+------------------------------------
Reporter: aaugustin | Owner: mlorant
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by aaugustin):

I submitted a pull request: https://github.com/django/django/pull/5730.

I would like to backport it to 1.9 before it's released.

--
Ticket URL: <https://code.djangoproject.com/ticket/25302#comment:8>

Django

unread,
Nov 26, 2015, 4:08:44 PM11/26/15
to django-...@googlegroups.com
#25302: BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
-------------------------------+------------------------------------
Reporter: aaugustin | Owner: mlorant
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by mlorant):

I don't mind ignoring the scheme indeed, +100 for the PR :) Did not think
of it but it is clearly something that should be ignored.

--
Ticket URL: <https://code.djangoproject.com/ticket/25302#comment:9>

Django

unread,
Nov 26, 2015, 4:44:42 PM11/26/15
to django-...@googlegroups.com
#25302: BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
-------------------------------+------------------------------------
Reporter: aaugustin | Owner: mlorant
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by aaugustin):

One hour after posting this PR, I received 160 emails like the following
in 8 minutes:

{{{
Referrer: http://REDACTED.com/libraries/joomla/html/language/en-GB/en-
GB.jhtmldate.ini
Requested URL: /libraries/joomla/html/language/en-GB/en-GB.jhtmldate.ini
User agent: Go 1.1 package http
IP address: 162.158.56.47
}}}

I'm wondering if anyone besides me actually uses this in production...

--
Ticket URL: <https://code.djangoproject.com/ticket/25302#comment:10>

Django

unread,
Nov 26, 2015, 5:20:36 PM11/26/15
to django-...@googlegroups.com
#25302: BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
-------------------------------------+-------------------------------------
Reporter: aaugustin | Owner: mlorant

Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* has_patch: 0 => 1
* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/25302#comment:11>

Django

unread,
Nov 27, 2015, 2:12:42 AM11/27/15
to django-...@googlegroups.com
#25302: BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
-------------------------------------+-------------------------------------
Reporter: aaugustin | Owner: mlorant
Type: New feature | Status: closed

Component: HTTP handling | Version: master
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Aymeric Augustin <aymeric.augustin@…>):

* status: new => closed
* resolution: => fixed


Comment:

In [changeset:"11f10b70f3cb21c7a7f859e417adee104758221b" 11f10b7]:
{{{
#!CommitTicketReference repository=""
revision="11f10b70f3cb21c7a7f859e417adee104758221b"
Fixed #25302 (again) -- Ignored scheme when checking for bad referers.

The check introduced in 4ce433e was too strict in real life. The poorly
implemented bots this patch attempted to ignore are sloppy when it comes
to http vs. https.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/25302#comment:12>

Django

unread,
Nov 27, 2015, 2:13:24 AM11/27/15
to django-...@googlegroups.com
#25302: BrokenLinkEmailsMiddleware shouldn't report 404s when Referer = URL
-------------------------------------+-------------------------------------
Reporter: aaugustin | Owner: mlorant
Type: New feature | Status: closed
Component: HTTP handling | Version: master
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Aymeric Augustin <aymeric.augustin@…>):

In [changeset:"8dc11dc592dbd5027943462d1bb52a60e40db034" 8dc11dc5]:
{{{
#!CommitTicketReference repository=""
revision="8dc11dc592dbd5027943462d1bb52a60e40db034"
[1.9.x] Fixed #25302 (again) -- Ignored scheme when checking for bad
referers.

The check introduced in 4ce433e was too strict in real life. The poorly
implemented bots this patch attempted to ignore are sloppy when it comes
to http vs. https.

Backport of 11f10b7 from master
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/25302#comment:13>

Reply all
Reply to author
Forward
0 new messages