Email encoding (DKIM, long lines, etc..)

Email encoding (DKIM, long lines, etc..) not...@gmail.com 4/30/14 4:13 AM
Hi !

I am implementing DKIM validation for my emails.
I noticed that some emails do not pass DKIM validation because the body hash differs.
I followed the email flow and found that postfix automatically truncates lines longer than 998 characters (in accordance with https://tools.ietf.org/html/rfc2822#section-2.1.1).

Such emails can be generated by Sentry, by my own apps, etc.
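
To illustrate why any change to a long line breaks validation, here is a minimal sketch using only the Python standard library (written for Python 3). It computes a DKIM-style body hash with "simple" body canonicalization and SHA-256, then compares the signed body with a truncated copy. The helper names `dkim_body_hash` and `truncate_long_lines` are hypothetical, and the truncation is only a simplified stand-in for whatever the mail server actually does to the line.

```python
import base64
import hashlib

MAX_LINE = 998  # RFC 2822 section 2.1.1 limit, not counting the CRLF


def dkim_body_hash(body: bytes) -> str:
    """Return a bh= style value: base64 SHA-256 of the canonicalized body.

    Uses DKIM "simple" body canonicalization: trailing empty lines are
    removed and the body ends with exactly one CRLF.
    """
    canonical = body.rstrip(b"\r\n") + b"\r\n"
    return base64.b64encode(hashlib.sha256(canonical).digest()).decode("ascii")


def truncate_long_lines(body: bytes, limit: int = MAX_LINE) -> bytes:
    """Hypothetical stand-in for a relay that cuts lines down to `limit`."""
    return b"\r\n".join(line[:limit] for line in body.split(b"\r\n"))


signed = b"short line\r\n" + b"x" * 1200 + b"\r\n"
delivered = truncate_long_lines(signed)

# The receiver hashes what it got, which no longer matches what was signed.
print(dkim_body_hash(signed) == dkim_body_hash(delivered))  # False
```

A body whose lines all fit within the limit passes through unchanged, so its hash still matches.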

Now about Django:
When using EmailMessage, the charset is utf-8, and the Content-Transfer-Encoding is either 7bit or 8bit (it switches automatically to 8bit when the body contains non-ASCII characters).
See the ticket where the developers decided to switch from base64 to this behaviour: https://code.djangoproject.com/ticket/3472

Quick validation with:

>>> from django.core.mail import EmailMessage
>>> print EmailMessage('subject', 'body', 'fro...@example.net', ['to....@example.net']).message().as_string()
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Subject: subject
From: fro...@example.net
To: to....@example.net
Date: Wed, 30 Apr 2014 11:01:44 -0000
Message-ID: <20140430110144.1564.85693@localhost>

body


So at the moment, because Django's code deliberately avoids base64 for utf-8 emails, Django no longer makes sure that lines stay within the length limit, despite the RFC.
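
A quick way to check whether a generated message is affected is to scan its serialized form for over-long lines. This is only a sketch, assuming the message is available as a string; `find_long_lines` is a hypothetical helper, not part of Django or of Python's email package.

```python
def find_long_lines(raw_message: str, limit: int = 998):
    """Return (line_number, length_in_bytes) for every line over `limit`.

    Lengths are measured on the UTF-8 bytes, without the line ending,
    since the RFC limit applies to what is sent on the wire.
    """
    data = raw_message.encode("utf-8")
    return [
        (number, len(line))
        for number, line in enumerate(data.splitlines(), start=1)
        if len(line) > limit
    ]


raw = "Subject: subject\r\n\r\n" + "x" * 1200 + "\r\nshort line\r\n"
print(find_long_lines(raw))  # [(3, 1200)]
```

With Django, the string to check would be the output of `EmailMessage(...).message().as_string()`, as in the session above.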

I see 3 ways to make sure that the lines are not too long:
  • automatically split lines in a way that email readers can recognize (no idea how to do that properly), either in Django or in the apps
  • go back to base64 (but it seems to increase spam scores)
  • switch to quoted-printable (working code below, no idea of the potential negative impact)
from django.core.mail.message import utf8_charset
from email import Charset
utf8_charset.body_encoding = Charset.QP
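
As a rough illustration of why quoted-printable avoids the problem, the sketch below uses the standard library's `quopri` module directly (Python 3), not the Django setting above. Quoted-printable inserts soft line breaks, so a 1200-character line is split into short encoded lines and can be restored exactly on decoding. The sample body is made up for the example.

```python
import quopri

# One over-long line containing a non-ASCII character.
body = ("é" + "x" * 1200 + "\n").encode("utf-8")

encoded = quopri.encodestring(body)
longest = max(len(line) for line in encoded.splitlines())

# Length of the longest encoded line, far below the 998 limit.
print(longest)

# Decoding removes the soft line breaks and restores the original bytes.
print(quopri.decodestring(encoded) == body)
```

The trade-off is that non-ASCII bytes and "=" signs are escaped as "=XX", so the raw message is larger and less readable than with 8bit.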

What do you think?
Is Django code / app code the correct place to fix this?
Should Django respect the RFC?
Any other ideas?

Thanks !

Re: Email encoding (DKIM, long lines, etc..) not...@gmail.com 4/30/14 4:38 AM
Quick addition: the ticket where the switch from quoted-printable to 7bit/8bit was approved, though with the possibly unintended effect that lines were no longer kept short.

Re: Email encoding (DKIM, long lines, etc..) Russell Keith-Magee 4/30/14 5:12 PM

Is Django code / app code the correct place to fix this?

I'd say so. EmailMessage et al. are the end user's public interface to sending mail. If a user can use our API to generate a non-RFC-compliant mail, then that's a bug. Even if the root cause of the bug lies in a deeper layer (e.g., Python's email library), we should do everything we can to provide a workaround so that end users aren't affected.

Should Django respect the RFC?

Absolutely. Django shouldn't expose a public API that makes it possible to generate non-RFC-compliant email payloads; furthermore, if there are any de facto standards or common practices that make an email unacceptable on receipt (e.g., the problem with base64 encoding getting flagged as spam), we should be adhering to those, too.
 
Any other ideas?


It sounds like you've found a problem; this should be logged as a ticket so it isn't forgotten. If you want to try your hand at a patch, the help would be most welcome.

I'd need to do some more reading about the right solution; quoted-printable might be workable, but I don't know whether it would have any downstream consequences. Some investigation will be required. The only option I can rule out is moving back to base64, because the move away from it was made for a reason. Unless you can validate that base64 encoding is no longer penalized by popular spam services, this isn't an option.

Yours,
Russ Magee %-)
 
Re: Email encoding (DKIM, long lines, etc..) not...@gmail.com 5/2/14 6:10 AM
Thanks.

Ticket created at https://code.djangoproject.com/ticket/22561, with hopefully enough references and elements to make an educated choice.

Regards,

NotSqrt