[PATCH] notify extension: truncate lines to 998 characters to conform to RFC 5322

31 views

Skip to first unread message

William Gallafent

unread,

Oct 27, 2011, 9:51:55 AM10/27/11

to mercuri...@selenic.com

According to RFC 5322 ( http://www.rfc-editor.org/rfc/rfc5322.txt ) section 2.1.1, lines in internet messages must not exceed 998 characters in length. The Notify extension does not enforce this when sending emails containing the result of a diff. The following patch fixes this by truncating any such lines found in the diff output, adding the conventional "..." to indicate that truncation has taken place on that line, and preceding the diff output with a warning that the truncation has taken place.

I use the approach given at http://www.guyrutenberg.com/2007/10/12/conditional-expressions-in-python-24/ to retain something similar to conditional expression syntax within the list comprehension, allowing it to work with Python 2.4.

Apologies for unwrapped paragraphs above, Thunderbird's composition window not letting me wrap the text but not wrap the diff! This diff just comes from my live installation, I don't have a development tree installed here. I hope that its simplicity means it is still acceptable.

--- /usr/share/pyshared/hgext/notify.py~ 2011-08-01 23:08:59.000000000 +0000
+++ /usr/share/pyshared/hgext/notify.py 2011-10-27 13:06:12.000000000 +0000
@@ -257,6 +257,11 @@
chunks = patch.diff(self.repo, prev, ref, opts=patch.diffopts(self.ui))
difflines = ''.join(chunks).splitlines()

+ if any(len(l)>998 for l in difflines):
+ msg = _('\nwarning: long diff lines truncated to 998 characters to conform to RFC5322.\n')
+ self.ui.write(msg)
+ difflines = [(l[:995]+"...",l)[len(l)<999] for l in difflines]
+
if self.ui.configbool('notify', 'diffstat', True):
s = patch.diffstat(difflines)
# s may be nil, don't include the header if it is

--
Bill Gallafent
_______________________________________________
Mercurial-devel mailing list
Mercuri...@selenic.com
http://selenic.com/mailman/listinfo/mercurial-devel

Matt Mackall

unread,

Oct 27, 2011, 12:09:46 PM10/27/11

to William Gallafent, mercuri...@selenic.com

On Thu, 2011-10-27 at 14:51 +0100, William Gallafent wrote:
> According to RFC 5322 ( http://www.rfc-editor.org/rfc/rfc5322.txt )
> section 2.1.1, lines in internet messages must not exceed 998
> characters in length. The Notify extension does not enforce this when
> sending emails containing the result of a diff. The following patch
> fixes this by truncating any such lines found in the diff output,
> adding the conventional "..." to indicate that truncation has taken
> place on that line, and preceding the diff output with a warning that
> the truncation has taken place.

Looks like we have a case of battling, evolving standards from the early
70s.

Such a truncated diff is no longer a correct diff. Applying it may in
fact result in '...' ending up in the source erroneously. So at a
minimum, the warning needs to end up -in the email-, preferably right in
the middle of the diff where the truncation occurred to prevent it from
applying.

But we've also got the patchbomb extension, which will have the same
issue. And there, the whole point is for the remote end to apply the
patch. Intentionally crippling a patch there is a non-starter.
I'm not sure whether it's better to send a correct patch and hope the
client doesn't choke on it, or simply abort and force the user to use an
attachment.

Notably, RFC821 (1982) has this to say about line lengths:

****************************************************
* *
* TO THE MAXIMUM EXTENT POSSIBLE, IMPLEMENTATION *
* TECHNIQUES WHICH IMPOSE NO LIMITS ON THE LENGTH *
* OF THESE OBJECTS SHOULD BE USED. *
* *
****************************************************

> I use the approach given at
> http://www.guyrutenberg.com/2007/10/12/conditional-expressions-in-python-24/ to retain something similar to conditional expression syntax within the list comprehension, allowing it to work with Python 2.4.
>
> Apologies for unwrapped paragraphs above, Thunderbird's composition
> window not letting me wrap the text but not wrap the diff! This diff
> just comes from my live installation, I don't have a development tree
> installed here. I hope that its simplicity means it is still
> acceptable.
>
> --- /usr/share/pyshared/hgext/notify.py~ 2011-08-01 23:08:59.000000000 +0000
> +++ /usr/share/pyshared/hgext/notify.py 2011-10-27 13:06:12.000000000 +0000
> @@ -257,6 +257,11 @@
> chunks = patch.diff(self.repo, prev, ref, opts=patch.diffopts(self.ui))
> difflines = ''.join(chunks).splitlines()
>
> + if any(len(l)>998 for l in difflines):

The any function is post-py2.4-ism. Simpler to just use a for loop over
the lines. Or util.any.

> + msg = _('\nwarning: long diff lines truncated to 998 characters to conform to RFC5322.\n')
> + self.ui.write(msg)

ui.warning would be more appropriate here, but who's going to read it?
It'll generally end up in an web server log file.

> + difflines = [(l[:995]+"...",l)[len(l)<999] for l in difflines]
> +

You'll notice we use PEP8 spacing around operators like '>', '+', and
','.

> if self.ui.configbool('notify', 'diffstat', True):
> s = patch.diffstat(difflines)
> # s may be nil, don't include the header if it is
>
>

--
Mathematics is the supreme nostalgia of our time.

William Gallafent

unread,

Oct 27, 2011, 12:45:39 PM10/27/11

to Matt Mackall, mercuri...@selenic.com

On 27 October 2011 17:09, Matt Mackall <m...@selenic.com> wrote:
> On Thu, 2011-10-27 at 14:51 +0100, William Gallafent wrote:
>> According to RFC 5322 ( http://www.rfc-editor.org/rfc/rfc5322.txt )
>> section 2.1.1, lines in internet messages must not exceed 998
>> characters in length. The Notify extension does not enforce this when
>> sending emails containing the result of a diff. The following patch
>> fixes this by truncating any such lines found in the diff output,
>> adding the conventional "..." to indicate that truncation has taken
>> place on that line, and preceding the diff output with a warning that
>> the truncation has taken place.
>
> Looks like we have a case of battling, evolving standards from the early
> 70s.

RFC 5322 was published in 2008, but I'll take “evolving” :) …
regardless of the history of the RFC, it says “MUST”! So we should aim
to ensure that no mail originating in mercurial has any lines longer
than 998 characters, because if that is not the case, then mercurial
is sending non-compliant messages.

> Such a truncated diff is no longer a correct diff.

Agreed, but see my next paragraph.

> Applying it may in
> fact result in '...' ending up in the source erroneously. So at a
> minimum, the warning needs to end up -in the email-, preferably right in
> the middle of the diff where the truncation occurred to prevent it from
> applying.

As I understand it, the purpose of the notify extension is _not_ to
provide complete and unadulterated diffs, but to _notify_ watchers
that a change has been made, and _indicate_ the nature of that change
by providing first a diffstat, and then the first three hundred lines
(by default) of the diff.

Not to labour the point, but: the diff representing the change is
_already_ truncated, if it is longer than 300 lines; any change with a
diff longer than 300 lines will already no longer be the correct full
diff for this change.

> But we've also got the patchbomb extension, which will have the same
> issue.

Quite possibly. I expect that once a good fix for notify is agreed, a
similar fix for patchbomb will be obvious. I think we should
concentrate on the notify extension here though, and keep the two
discussions separate. My reason for that is that my understanding of
the purpose of the the patchbomb extension is that is designed to send
patches to people in order that those patches be applied; on the other
hand, the purpose of the notify extension is simply to tell
subscribers that a change has occurred, and to give them an outline
desription of the change (the fact that this description includes [a
part of] patch along with the other information is simply because that
gives a good overview of the nature of the change …)

> And there, the whole point is for the remote end to apply the
> patch. Intentionally crippling a patch there is a non-starter.

Definitely. It feels that attaching the diff as an attachment to the
email might be the only solution there; is there any reason why that
should not be done? I won't comment further here since I'd like to
stick to notify and leave patchbomb for later.

> I'm not sure whether it's better to send a correct patch and hope the
> client doesn't choke on it, or simply abort and force the user to use an
> attachment.

It feels to me that the answer to that is probably “correct patch” for
patchbomb, but “human-readable” for notify, given the different
purposes of these two extensions. In either case, conformance is my
goal here :)

> Notably, RFC821 (1982) has this to say about line lengths:
>
> ****************************************************
> * *
> * TO THE MAXIMUM EXTENT POSSIBLE, IMPLEMENTATION *
> * TECHNIQUES WHICH IMPOSE NO LIMITS ON THE LENGTH *
> * OF THESE OBJECTS SHOULD BE USED. *
> * *
> ****************************************************

Similar but less shouty text remains in section 4.5.3.1 of the current
equivalent, rfc5321. My reading, though, indicates that the following
section of the same document is more relevant to this discussion:

---B<---
4.5.3.1.6. Text Line

The maximum total length of a text line including the <CRLF> is 1000
octets (not counting the leading dot duplicated for transparency).
This number may be increased by the use of SMTP Service Extensions.
---B<---

Note also that the introduction to this section, “4.5.3.1. Size
Limits and Minimums” states “Objects larger than these sizes SHOULD be
avoided when possible.”, while also saying that “some Internet mail
constructs such as encoded X.400 addresses (RFC 2156 [35]) will often
require larger objects. Clients MAY attempt to transmit these, but
MUST be prepared for a server to reject them if they cannot be handled
by it.”. I think it is possible to avoid larger objects here without
damaging the purpose of the notify extension. If I'm wrong about that
purpose, please correct me (and explain the 300-line truncation that
already occurs!) …

Since we would rather mail sent by mercurial is not rejected by
mailservers, I suggest that being compliant here is important. The way
to be compliant probably differs between notify and patchbomb. I think
my approach is the right one for notify, and agree that something
should be done about patchbomb (outside this thread!) to make it more
compliant too, probably by wrapping the patches as attachments to
ensure their integrity is maintained.

(The rest of this message discusses only technicalities of the patch,
not its purpose or nature!)

[removed my introduction to the patch]

>> --- /usr/share/pyshared/hgext/notify.py~ 2011-08-01 23:08:59.000000000 +0000
>> +++ /usr/share/pyshared/hgext/notify.py 2011-10-27 13:06:12.000000000 +0000
>> @@ -257,6 +257,11 @@
>> chunks = patch.diff(self.repo, prev, ref, opts=patch.diffopts(self.ui))
>> difflines = ''.join(chunks).splitlines()
>>
>> + if any(len(l)>998 for l in difflines):
>
> The any function is post-py2.4-ism. Simpler to just use a for loop over
> the lines. Or util.any.

OK.

>
>> + msg = _('\nwarning: long diff lines truncated to 998 characters to conform to RFC5322.\n')
>> + self.ui.write(msg)
>
> ui.warning would be more appropriate here, but who's going to read it?
> It'll generally end up in an web server log file.

The warning is for the human recipient of the email. In the same way
as the truncation of the patch at 300 lines generates a warning for
the human recipient, so should this line-length truncation. For me it
ends up in the email sent to the recipient of the notification, as per
the other self.ui.write calls in this function (from which I copied
that line!) - confirmed in testing.

>> + difflines = [(l[:995]+"...",l)[len(l)<999] for l in difflines]
>> +
>
> You'll notice we use PEP8 spacing around operators like '>', '+', and
> ','.

OK, will correct. You can tell I'm new to Python as well as to Mercurial :)

--
Bill Gallafent.

Mads Kiilerich

unread,

Oct 27, 2011, 7:46:12 PM10/27/11

to Matt Mackall, mercuri...@selenic.com

(Discussing patch for http://mercurial.selenic.com/bts/issue3075 )

Matt Mackall wrote, On 10/27/2011 06:09 PM:
> But we've also got the patchbomb extension, which will have the same
> issue.

Really? patchbomb uses mail.mimetextpatch which 3e544c074459 changed to
handle odd patches correctly:

>>> import mercurial.mail
>>> print mercurial.mail.mimetextpatch('a\nb')
From nobody Fri Oct 28 01:19:21 2011
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

a
b
>>> print mercurial.mail.mimetextpatch('b\xc3\xb8b')
From nobody Fri Oct 28 01:20:10 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64

YsO4Yg==

>>> print mercurial.mail.mimetextpatch('x'*951)
From nobody Fri Oct 28 01:20:44 2011
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

AFAICS everything would be fine if notify did like patchbomb.

This is also why http://mercurial.selenic.com/bts/issue2414 IMHO never
was an issue.

A viable solution to http://mercurial.selenic.com/bts/issue2290 could
perhaps be to also use mime or qp if '[^\n\t\x32-\x7e]' is found.