
How to properly fold a subject header in email


Harry Putnam

Apr 17, 2010, 9:42:38 PM
to begi...@perl.org
I've acquired a massive headache from trying to look thru some of the
mail related modules on cpan that my searches on header folding
led me to. Needless to say I was a little overcome by the attempt.

I'm working on my own little home boy perl script that looks through
an `events' file in ~/ and extracts information I've put there, to
notify me by email and/or text message when one of these events is close.

A todo calendar kind of tool.
I've got it working but I had hoped when there were more than 1 or 2
things being sent out, that I could put several lines in the subject
field and fold them like you see in some email headers.

The idea is that all the events would then be in the subject
line for quick eyeball parsing.

I tried a few ways of folding the lines, but really had no idea how it
was supposed to be done.

I did start to google on it, but right away it was apparent that
several rfcs for mail are involved, and trying to read and understand
one of those looked like a lifetime occupation.

That goes for the hefty mail tools on cpan as well.

I wondered if someone here happens to know or can explain how to fold
lines in an email header such that email tools won't scream or unfold
them. I know folded lines make it thru my sendmail MTA and `gnus'
(the news/mail reader packaged with emacs), but I'm not sure about
the format involved.

Harry Putnam

Apr 17, 2010, 10:09:11 PM
to begi...@perl.org
Harry Putnam <rea...@newsguy.com> writes:

> I wondered if someone here happens to know or can explain how to fold
> lines in an email header such that email tools won't scream or unfold
> them. I know folded lines make it thru my sendmail MTA and `gnus'
> (that is the news/mail reader packaged with emacs)
> mail/news reader. But not sure about the format involved.

I may have hit on something just by experimentation.

In the tool I've written the subject field is created from an array.

The elements are fairly short, generally 10-30 characters.

I found if I put them in the array, starting with the second element,
with something like 14 leading spaces, and end the line with a
semi-colon then newline, then the folded subject line the array
creates survives mailing across the internet.

Pushing like this (after the first push):

push @subj_ar, "              $_;\n";

Ends up:


[...]
From: rea...@reader.local.lan
Subject: blabla alklka alk;
              kkvnak eia mnalvawliw aqwopia;
              blab blab abla loabae;
              kkvn akeiamn alv awliw aq w opia;
              blab blab ablaloa bae;
Date: Sat, 17 Apr 2010 20:55:04 -0500
[...]

I wondered if I just got lucky, and this is likely to lead to problems
or if I accidentally hit on part of the rules for formatting folded headers.
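
A minimal sketch of the approach described above, for anyone following along. The sub name fold_subject, the sample events, and the choice to pass the line ending in as an argument are all made up for illustration; the only thing taken from the post is the idea of ending each piece with a semicolon and starting every continuation line with a run of spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: build a folded Subject header from a list of short
# event strings. Every line after the first starts with whitespace, which
# is what marks it as a continuation of the same header field.
sub fold_subject {
    my ($eol, @events) = @_;
    my $indent = ' ' x 14;    # 14 is arbitrary; one space or tab is enough
    return 'Subject: ' . join(";$eol$indent", @events) . $eol;
}

# Made-up sample data standing in for lines pulled from the events file.
my @events = ('dentist 9am', 'pay rent', 'call Bob');

print fold_subject("\n", @events);
```

Passing the line ending as a parameter keeps the same sub usable whether the message is written with plain newlines or with CRLF.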

John W. Krahn

Apr 17, 2010, 11:28:36 PM
to Perl Beginners
Harry Putnam wrote:
> I've acquired a massive headache from trying to look thru some of the
> mail related modules on cpan that where my searches on header folding
> lead me. Needless to say I was a little overcome by the attempt.
>
> I'm working on my own little home boy perl script that looks through
> an `events' file in ~/ and extracts information I've put there, to
> notify me by email and/or text message when one of these events is close.
>
> A todo calender kind of tool.
> I've got it working but I had hoped when there were more than 1 or 2
> things being sent out, that I could put several lines in the subject
> field and fold them like you see in some email headers.
>
> The idea being that the all the events would then be in the subject
> line for quick eyeball parsing.
>
> I tried a few ways of folding the lines, but really had no idea how it
> was supposed to be done.

From:

http://www.rfc-editor.org/rfc/rfc5322.txt

<QUOTE>
2.2. Header Fields

Header fields are lines beginning with a field name, followed by a
colon (":"), followed by a field body, and terminated by CRLF. A
field name MUST be composed of printable US-ASCII characters (i.e.,
characters that have values between 33 and 126, inclusive), except
colon. A field body may be composed of printable US-ASCII characters
as well as the space (SP, ASCII value 32) and horizontal tab (HTAB,
ASCII value 9) characters (together known as the white space
characters, WSP). A field body MUST NOT include CR and LF except
when used in "folding" and "unfolding", as described in section
2.2.3. All field bodies MUST conform to the syntax described in
sections 3 and 4 of this specification.


2.2.3. Long Header Fields

Each header field is logically a single line of characters comprising
the field name, the colon, and the field body. For convenience
however, and to deal with the 998/78 character limitations per line,
the field body portion of a header field can be split into a
multiple-line representation; this is called "folding". The general
rule is that wherever this specification allows for folding white
space (not simply WSP characters), a CRLF may be inserted before any
WSP.

For example, the header field:

Subject: This is a test

can be represented as:

Subject: This
 is a test

Note: Though structured field bodies are defined in such a way
that folding can take place between many of the lexical tokens
(and even within some of the lexical tokens), folding SHOULD be
limited to placing the CRLF at higher-level syntactic breaks. For
instance, if a field body is defined as comma-separated values, it
is recommended that folding occur after the comma separating the
structured items in preference to other places where the field
could be folded, even if it is allowed elsewhere.

The process of moving from this folded multiple-line representation
of a header field to its single line representation is called
"unfolding". Unfolding is accomplished by simply removing any CRLF
that is immediately followed by WSP. Each header field should be
treated in its unfolded form for further syntactic and semantic
evaluation. An unfolded header field has no length restriction and
therefore may be indeterminately long.
</QUOTE>
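
The unfolding rule quoted above is small enough to sketch in a few lines of Perl. This is only an illustration, not a full header parser: the sub name unfold_header is made up, and accepting a bare LF as well as CRLF is an assumption added for files stored with Unix line endings (the RFC text itself only speaks of CRLF).

```perl
use strict;
use warnings;

# Hypothetical helper: unfold one header field by removing any line break
# that is immediately followed by a space or tab. The whitespace itself is
# kept, as in the RFC's description.
sub unfold_header {
    my ($field) = @_;
    # \015?\012 matches CRLF, and also a bare LF (assumption, see above).
    $field =~ s/\015?\012(?=[ \t])//g;
    return $field;
}

my $folded = "Subject: This\015\012 is a test\015\012";
print unfold_header($folded);
```

The final CRLF is left alone because it is not followed by whitespace; it terminates the field rather than folding it.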

John
--
The programmer is fighting against the two most
destructive forces in the universe: entropy and
human stupidity. -- Damian Conway

Harry Putnam

Apr 18, 2010, 11:10:58 AM
to begi...@perl.org
"John W. Krahn" <jwk...@shaw.ca> writes:

[...]

> From:
>
> http://www.rfc-editor.org/rfc/rfc5322.txt
>
> <QUOTE>
> 2.2. Header Fields

[...]

> (and even within some of the lexical tokens), folding SHOULD be
> limited to placing the CRLF at higher-level syntactic breaks. For

CRLF is mentioned in several places in rfc5322 as being used to fold.

I used a newline, and a unix newline isn't the same as CRLF, is it?

The wikipedia article on CRLF only led to more confusion. In the end
it seems to say that many internet protocols, such as smtp, are
tolerant on that question.

Maybe that's as good an explanation as I'll find.


John W. Krahn

Apr 18, 2010, 11:51:54 AM
to Perl Beginners
Harry Putnam wrote:
> "John W. Krahn" <jwk...@shaw.ca> writes:
>
> [...]
>
>> From:
>>
>> http://www.rfc-editor.org/rfc/rfc5322.txt
>>
>> <QUOTE>
>> 2.2. Header Fields
>
> [...]
>
>> (and even within some of the lexical tokens), folding SHOULD be
>> limited to placing the CRLF at higher-level syntactic breaks. For
>
> CRLF is mentioned in several places in rfc5322 as being used to fold.
>
> I used a newline and a unix newline isn't the same as CRLF is it?

No. But you can use the Socket (or IO::Socket) module to import that
constant:

perldoc Socket
[ SNIP ]
Also, some common socket "newline" constants are provided: the
constants "CR", "LF", and "CRLF", as well as $CR, $LF, and $CRLF,
which map to "\015", "\012", and "\015\012". If you do not want
to use the literal characters in your programs, then use the
constants provided here. They are not exported by default, but
can be imported individually, and with the ":crlf" export tag:
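
A short sketch of how the imported constant might be used to fold a header. The sub name crlf_fold and the sample strings are invented for illustration; only the Socket module and its ":crlf" export tag come from the documentation quoted above.

```perl
use strict;
use warnings;
use Socket qw(:crlf);    # imports $CR, $LF, $CRLF (and CR, LF, CRLF)

# Hypothetical helper: join the pieces of a header body with CRLF followed
# by a single space, so each continuation line begins with whitespace.
sub crlf_fold {
    my ($name, @parts) = @_;
    return "${name}: " . join("$CRLF ", @parts) . $CRLF;
}

my $header = crlf_fold('Subject', 'first event;', 'second event;');

# Show the line endings visibly instead of printing raw control characters.
(my $visible = $header) =~ s/\015/\\r/g;
$visible =~ s/\012/\\n/g;
print "$visible\n";
```

Whether CRLF or a plain newline is the right thing to write depends on where the message is going: CRLF is what the RFC describes for the message itself, while a script piping text to a local mail program on Unix may be expected to use plain newlines, so it is worth checking what the receiving program wants.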
