[Django] #20099: Allow filtering 404 reports by user agent

18 views
Skip to first unread message

Django

unread,
Mar 20, 2013, 12:40:39 PM3/20/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+--------------------
Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: Uncategorized | Version: master
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------
I get a lot of 404 reports to my inbox that are caused by various spiders.
I'd like to block these. The same way we have a mechanism for blocking 404
reports for certain URLs, we should be able to block 404 reports with user
agents that are easily discernible as spiders.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Mar 21, 2013, 8:53:52 AM3/21/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------
Changes (by claudep):

* needs_docs: => 0
* needs_better_patch: => 0
* component: Uncategorized => HTTP handling
* needs_tests: => 0
* stage: Unreviewed => Accepted


Comment:

Globally, I'm not thrilled by the proposal. Firstly, redirecting 404 to
inboxes is not a best practice, as generally it is better to redirect to a
service like Sentry. Secondly, this filtering is depending on projects,
there might be people wanting to make stats about 404 generated by
spiders. And thirdly, the spider's UA are changing and hard-coding a list
in Django would be subject to frequent updates.

However, we should ease middleware customization so as filtering is
easier. I would like to counsel people to subclass
`BrokenLinkEmailsMiddleware` and write their own `is_ignorable_404`
method. Unfortunately, the user agent is not accessible for that method.
So I'm accepting this ticket with two goals:
- add the user agent as a parameter to `is_ignorable_404`
- documenting `is_ignorable_404` method of
https://docs.djangoproject.com/en/dev/ref/middleware/#django.middleware.common.BrokenLinkEmailsMiddleware

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:1>

Django

unread,
Mar 21, 2013, 9:00:30 AM3/21/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by coolRR):

I agree regarding email, it's better to redirect to services like Sentry.

Regarding the first item, do we need anything beyond just adding `,
user_agent=None` to the definition of `is_ignorable_404` and sending in
the user agent in `process_response`?

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:2>

Django

unread,
Mar 21, 2013, 9:13:52 AM3/21/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by claudep):

Replying to [comment:2 coolRR]:


> Regarding the first item, do we need anything beyond just adding `,
user_agent=None` to the definition of `is_ignorable_404` and sending in
the user agent in `process_response`?

Yes, something like that.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:3>

Django

unread,
Mar 21, 2013, 12:11:43 PM3/21/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------
Changes (by coolRR):

* has_patch: 0 => 1


Comment:

Patch attached. Let me know whether it's good. (I don't work often with
patches, this one was generated by Git, I don't know whether you can use
it in hg.)

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:4>

Django

unread,
Mar 21, 2013, 3:55:09 PM3/21/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------
Changes (by claudep):

* needs_docs: 0 => 1
* needs_tests: 0 => 1


Comment:

Django is in Git (Python is in hg), so yes, the format is right.
Now we still miss tests and docs.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:5>

Django

unread,
Mar 21, 2013, 7:15:53 PM3/21/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by coolRR):

Here's a test. It's pretty lame; I would have preferred a test that
creates a subclass but I'm very clueless about Django's test architecture,
so I don't know how to do it in a way that won't bother other Django
developers.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:6>

Django

unread,
Mar 22, 2013, 4:16:43 AM3/22/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------
Changes (by claudep):

* needs_tests: 1 => 0


Comment:

Here's how I would have tested that functionality. Of course, testing is
often a matter of personal taste (unit-oriented or functionality-
oriented). But it's a little more in line with other tests in the same
class.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:7>

Django

unread,
Mar 22, 2013, 11:12:09 AM3/22/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by aaugustin):

At this point, it would make more sense to pass the whole `request` to
`is_ignorable_404`.

Please provide a single patch including both code, tests and docs changes.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:8>

Django

unread,
Mar 22, 2013, 3:28:30 PM3/22/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by coolRR):

aaugustin: I considered giving the request as an argument so the user
could do anything they want, but then I thought that if the user needs so
much freedom, maybe they're better off replacing
BrokenLinkEmailsMiddleware completely.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:9>

Django

unread,
Mar 22, 2013, 6:13:19 PM3/22/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by aaugustin):

I'd just like to avoid further tickets asking to add more keyword
arguments.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:10>

Django

unread,
Mar 23, 2013, 8:29:22 AM3/23/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by coolRR):

aaugustin: I've submitted a patch which changes the method to take the
request. It also moves the existing request-checking logic into that
method. I didn't do docs yet, I want to first know whether this approach
is acceptable.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:11>

Django

unread,
Mar 23, 2013, 8:45:59 AM3/23/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by aaugustin):

This is entering bikeshedding territory, the only thing I don't like in
this patch is the name `is_request_we_should_notify_for`;
`is_ignorable_request` would be much more straightforward.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:12>

Django

unread,
Mar 23, 2013, 8:49:34 AM3/23/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by coolRR):

I'm not attached to the name (I felt it was too long but couldn't come up
with a shorter one.)

Note though that `is_ignorable_request` is a negative name while the name
I introduced is positive. (I flipped the logic of the method, I think that
all those double negatives are really annoying.)

If you can come up with a short and positive name, I'll be happy to hear
it.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:13>

Django

unread,
Mar 23, 2013, 9:18:34 AM3/23/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by aaugustin):

I thought about the double negative, but I felt it was still worth
emphasizing the relation with IGNORABLE_404_URLS.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:14>

Django

unread,
Mar 23, 2013, 9:30:56 AM3/23/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by coolRR):

I'm not sure what you're suggesting. If you're suggesting to keep the old
name and negativity so it'll be easier for people to understand that this
method is related to `IGNORABLE_404_URLS`, I disagree.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:15>

Django

unread,
Mar 24, 2013, 9:14:50 AM3/24/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by aaugustin):

#10214 was closed as a duplicate. The patch for this ticket should add a
note in the docs, explaining that subclassing the middleware is the
recommended way to alter its behavior.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:16>

Django

unread,
Mar 24, 2013, 1:04:47 PM3/24/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by coolRR):

New patch with docs addition attached.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:17>

Django

unread,
Apr 1, 2013, 8:32:03 AM4/1/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by coolRR):

Can anyone review this or merge it in?

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:18>

Django

unread,
Apr 8, 2013, 5:52:11 AM4/8/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by coolRR):

''Tumbleweed''

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:19>

Django

unread,
Apr 15, 2013, 11:58:26 AM4/15/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by coolRR):

Added pull request, if that helps:

https://github.com/django/django/pull/1014

If you want, I can rename `is_request_we_should_notify_for` to
`is_relevant_request`.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:20>

Django

unread,
May 24, 2013, 11:58:16 AM5/24/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------+------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by claudep):

Revisiting this issue after some time, here's an alternative approach:
https://github.com/django/django/pull/1211

We need a referee now :-)

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:21>

Django

unread,
May 24, 2013, 12:21:44 PM5/24/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------------+-------------------------------------

Reporter: coolRR | Owner: nobody
Type: New feature | Status: new
Component: HTTP handling | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
Has patch: 1 | checkin
Needs tests: 0 | Needs documentation: 1
Easy pickings: 0 | Patch needs improvement: 0
| UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by charettes):

* cc: charettes (added)
* stage: Accepted => Ready for checkin


Comment:

Left some minor comment on @claudep's PR. Apart from those the patch looks
RFC.

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:22>

Django

unread,
May 25, 2013, 6:16:37 AM5/25/13
to django-...@googlegroups.com
#20099: Allow filtering 404 reports by user agent
-------------------------------------+-------------------------------------
Reporter: coolRR | Owner: nobody
Type: New feature | Status: closed

Component: HTTP handling | Version: master
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Ready for
Has patch: 1 | checkin
Needs tests: 0 | Needs documentation: 1
Easy pickings: 0 | Patch needs improvement: 0
| UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Claude Paroz <claude@…>):

* status: new => closed
* resolution: => fixed


Comment:

In [changeset:"f940e564e4623d531eb97a2cf1b116851003f9fd"]:
{{{
#!CommitTicketReference repository=""
revision="f940e564e4623d531eb97a2cf1b116851003f9fd"
Fixed #20099 -- Eased subclassing of BrokenLinkEmailsMiddleware

Thanks Ram Rachum for the report and the initial patch, and Simon
Charette for the review.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/20099#comment:23>

Reply all
Reply to author
Forward
0 new messages