Error email flooding on high traffic production sites


Simon Litchfield

Sep 8, 2010, 11:26:58 PM
to Django developers
Hi all

The default behaviour of sending an email on a 500 error is great.

The problem is that on a high traffic site, even a quick update can
go wrong: literally within seconds you can bring your mail server
down, crash your mail client, or render your Gmail account useless.

With "batteries included" and "production ready" ethos in mind, I
reckon this needs fixing.

1) Max emails per minute setting

2) Include alternative error handler middleware in core

I haven't tried it yet, but this looks interesting (note web2py
includes this) --
http://bitbucket.org/ashcrow/django-error-capture-middleware/wiki/Home

Thoughts? I know I'm not the only one who has run into this (Russ?)

Cheers
Simon

David P. Novakovic

Sep 8, 2010, 11:29:11 PM
to django-d...@googlegroups.com
Hey dude,

What about something like sentry or lumberjack?

I haven't looked at them too seriously, but I'd imagine there'd be a
way to do smarter summarizing of emails etc..?

D


Andy McKay

Sep 8, 2010, 11:51:42 PM
to django-d...@googlegroups.com
On 2010-09-08, at 8:29 PM, David P. Novakovic wrote:

> Hey dude,
>
> What about something like sentry or lumberjack?

Or Arecibo. There are quite a few solutions to this that don't involve sending email. In fact, I use Django taking down your email server as an example for Arecibo all the time.

This has been discussed at least once. I found the following thread, but I bet there are more.

http://groups.google.com/group/django-developers/browse_thread/thread/ea8402eef79a68de/4aa93ca80287ed52?hl=en&ie=UTF-8&q=error+email+production+%22django-developers%22&pli=1

I do think a note in the docs would be appropriate, saying that this can be an issue on high traffic sites and that you might want to investigate a plan B.
--
Andy McKay

Justin Lilly

Sep 9, 2010, 1:52:51 AM
to django-d...@googlegroups.com
This seems applicable to dcramer's django-db-log.

http://github.com/dcramer/django-db-log

This might also dovetail with logging support which may make it into
1.3, so it might be worth looping in the folks involved in that.

-justin

Russell Keith-Magee

Sep 9, 2010, 5:06:32 AM
to django-d...@googlegroups.com

This exact problem is the reason why adding logging support is on my
todo list for 1.3. Essentially, the idea is that by merging Vinay's
logging work, the current behavior of 'send an email on server error'
can become a configuration item; the default configuration would be to
send an email (for backwards compatibility), but this setting could be
easily modified to be 'write to a file', 'write to syslog', 'post to
Arecibo', 'write to database', or any other logging handler you care
to write and install -- potentially multiple handlers, if required.

If you want to keep email error handling but want to avoid the
problems you describe, you can then write (and install) a custom error
logging handler that has the properties you describe.
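
A minimal sketch of what such a handler could look like, assuming only the standard library `logging` module rather than the Django logging work itself (which had not been merged at the time of this thread). The names `RateLimitedHandler`, `max_records`, `window` and `clock` are hypothetical and chosen for illustration; the wrapped `target` would be whatever handler actually sends the email.

```python
import logging
import time


class RateLimitedHandler(logging.Handler):
    """Forward records to `target`, at most `max_records` per `window` seconds.

    Illustrative only: this is a fixed-window limiter, not part of Django.
    """

    def __init__(self, target, max_records=5, window=60.0, clock=time.monotonic):
        super().__init__(level=target.level)
        self.target = target            # the handler that really sends the email
        self.max_records = max_records  # cap per window ("max emails per minute")
        self.window = window            # window length in seconds
        self.clock = clock              # injectable so the limiter can be tested
        self._window_start = None
        self._sent = 0
        self.dropped = 0                # how many records were suppressed

    def emit(self, record):
        now = self.clock()
        # Start a fresh window on first use or once the current one has expired.
        if self._window_start is None or now - self._window_start >= self.window:
            self._window_start = now
            self._sent = 0
        if self._sent < self.max_records:
            self._sent += 1
            self.target.handle(record)
        else:
            self.dropped += 1
```

With the standard library, `target` could be a `logging.handlers.SMTPHandler`; the `dropped` counter is there so a later message can report how many errors were suppressed instead of silently losing them.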

Yours,
Russ Magee %-)

Kevin Howerton

Sep 21, 2010, 2:36:20 PM
to django-d...@googlegroups.com
Lumberjack is basically a backport of what I'd like built-in logging
support to be. I also have it built-in to django on my experimental
branch on github...

To override the default email behavior just turn on the 500handler
view in lumberjack. This is what I perceive as the best method of
catching errors currently, as catching them with signals or middleware
can be troublesome. If there is an error in your middleware or
signal, the error signal will fail to receive the error and it will
just push out to stderr.

We are currently using lumberjack on a very high traffic website for
just this purpose. We are just using the stream handler and having
apache catch the errors, though I have tested all of the other
handlers that come with lumberjack on smaller sites. I also just
added support for arecibo... and the flexibility of the python logging
system allows you to use multiple handlers ... pushing to arecibo,
database, and file all at the same time are possible for example.
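
As a rough illustration of the multiple-handler point, using only the standard library `logging` module and not lumberjack's own handlers (whose API is not shown in this thread): one logger can fan a single error out to several destinations. The function name `build_error_logger` and its parameters are hypothetical.

```python
import logging


def build_error_logger(name, stream, path):
    """Return a logger that writes each error to both a stream and a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.ERROR)
    logger.propagate = False  # keep the example isolated from the root logger

    fmt = logging.Formatter("%(levelname)s %(name)s: %(message)s")
    stream_handler = logging.StreamHandler(stream)            # e.g. stderr
    file_handler = logging.FileHandler(path, encoding="utf-8")

    # Every handler attached to the logger receives every record.
    for handler in (stream_handler, file_handler):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

A database or Arecibo destination would be one more `logging.Handler` subclass added in the same loop.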

I'm also currently working on a tool that will push the errors to a
message queue and then have them be pushed out with websockets...
which I think will be useful for development and staging servers for
debugging. Though this is more of a poor excuse for me to work with
websockets and rabbitmq, I think it could be useful. Thoughts?

Anyhow, here's the current project. If anyone has feature requests,
feedback, or criticism, I'm always game.

http://github.com/kevin/django-lumberjack

Andrew Wilkinson

Sep 30, 2010, 6:47:01 AM
to django-d...@googlegroups.com
Hi,

Sorry for digging up an old thread; I'm a bit behind on my reading of
django-dev.

I have a patch in the bug tracker that fixes this exact problem -
http://code.djangoproject.com/ticket/11565. The patch is just over a
year old now so it might not apply that cleanly to the current trunk.

The patch works by caching an MD5 of the traceback for each error with
a timeout of settings.ERROR_EMAIL_RATE_LIMIT minutes. This prevents
the same error being sent more than once in that period. It's not perfect
because a single error might cause multiple tracebacks and you'd get
one email for each distinct one. It's definitely better than the
current situation though. It also relies on you having a cache backend
set up.
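
The idea can be sketched roughly as follows. This is not the patch from the ticket: it is a standalone illustration in which an in-memory dict stands in for the Django cache backend, and the names `TracebackThrottle`, `should_send`, `rate_limit_minutes` and `clock` are hypothetical. Only the MD5-of-traceback key and the timeout in minutes come from the description above.

```python
import hashlib
import time


class TracebackThrottle:
    """Allow one email per distinct traceback per rate-limit period."""

    def __init__(self, rate_limit_minutes, clock=time.monotonic):
        self.timeout = rate_limit_minutes * 60  # the setting is in minutes
        self.clock = clock                      # injectable for testing
        self._expires = {}                      # MD5 digest -> expiry time

    def should_send(self, traceback_text):
        # Identical tracebacks hash to the same key; distinct ones do not,
        # so each distinct traceback still produces its own email.
        digest = hashlib.md5(traceback_text.encode("utf-8")).hexdigest()
        now = self.clock()
        expiry = self._expires.get(digest)
        if expiry is not None and now < expiry:
            return False  # already emailed within the period
        self._expires[digest] = now + self.timeout
        return True
```

Passing the clock in as a parameter is one way around the testing difficulty caused by the timeout: a test can advance a fake clock instead of sleeping.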

Unfortunately I never did get time to write unit tests for it, and
unit testing something like this is quite hard because of the timeout
involved.

Hope this is useful,
Andrew

Kevin Howerton

Oct 1, 2010, 11:58:03 AM
to django-d...@googlegroups.com
I feel like something like this would be better suited to be in an
external application... since it will fail without a cache-backend.

Also, the implementation will have to change as Russell is about to
commit a logging patch with ticket #12012.

Going forward, though, this would be best suited to a custom handler in
an external app. I will likely add something of this nature (some sort
of aggregation) to one of lumberjack's backends.

-k

Simon Litchfield

Oct 21, 2010, 1:22:35 AM
to Django developers
On Sep 30, 8:47 pm, Andrew Wilkinson <andrewjwilkin...@gmail.com>
wrote:
> I have a patch in the bug tracker that fixes this exact problem -
> http://code.djangoproject.com/ticket/11565. The patch is just over a
> year old now so it might not apply that cleanly to the current trunk.

Revised patch here (against r13296) --
http://code.djangoproject.com/ticket/11565