>
> So, my question is: Was there something I overlooked there, is it
> really a VM bug, or none of the above? Thanks in advance.
The VM code is in the function vm-find-leading-message-separator in the
file vm-folder.el. All that it looks for is "From " at the beginning of
the line and some sequence of digits at the end of the line. RFC 4155
is unfortunately very loose: it doesn't guarantee that the leading
separator line has any particular format. So, VM can't assume too much
about the format of the separator line.
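
To give a feel for how loose that test is, here is a rough Python sketch
of the check as described above ("From " at the start of the line, digits
at the end). The regex and the function name are a paraphrase for
illustration only, not VM's actual Emacs Lisp.

```python
import re

# Assumed paraphrase of the check described above; NOT VM's real regexp.
LOOSE_SEPARATOR = re.compile(r"^From .*[0-9]+$")

def looks_like_separator(line):
    """Return True if the line passes the loose 'From ... digits' test."""
    return LOOSE_SEPARATOR.match(line) is not None

# A genuine separator line passes the test.
print(looks_like_separator("From VM Mon Feb 6 16:51:47 2006"))
# So does an unescaped body line that happens to end in digits.
print(looks_like_separator("From now on, please call extension 4155"))
# A body line that does not end in digits is left alone.
print(looks_like_separator("From now on, please call me."))
```

The second case is the kind of body line that needs escaping before it
goes into an mbox file.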
It is common practice to put a ">" escape marker for any line that
begins with "From" right after a blank line. If you do that, then VM
should be able to handle it fine.
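
A minimal sketch of that escaping in Python, in case it helps when
building the folder by script. The function name is made up, and it
assumes that the first body line counts as following a blank line (the
one that ends the headers) and that only lines starting with "From "
(with the trailing space) need the marker.

```python
def escape_from_lines(body):
    """Prefix '>' to body lines that start with 'From ' right after a blank line."""
    out = []
    # Assumption: the body starts right after the blank line ending the headers.
    prev_blank = True
    for line in body.split("\n"):
        is_blank = (line == "")
        if prev_blank and line.startswith("From "):
            line = ">" + line
        out.append(line)
        prev_blank = is_blank
    return "\n".join(out)

body = "Hi,\n\nFrom here on it gets tricky.\nFrom mid-paragraph is left alone.\n"
print(escape_from_lines(body))
```

Only the line after the blank line gets the ">" marker; the following
"From " line is in the middle of a paragraph and is left as it is.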
Let me also suggest a simpler method to combine several single-message
files into an mbox folder. You can put them all into a directory,
select all of them in a GUI and then drag-and-drop them into a new
Thunderbird folder. (You might need to rename the files to have a
".eml" extension. I am not sure.) Then you can move the Thunderbird
folder to somewhere else and VM will handle it fine.
Cheers,
Uday
Ugh. VM uses mbox by default, but it doesn't have to. mbox is about
the cruddiest most idiotic mailbox format out there, and Unix has a
lot to answer for.
If you like messages all in one file, I suggest using mmdf format,
which is an awful lot easier to deal with.
(setq vm-default-folder-type 'mmdf)
MMDF itself is a mail system which is probably now used hardly
anywhere.
But its mailbox format is very simple, and has the great advantage
that it doesn't require the From-stuffing of mbox.
Every message is both preceded and followed by a line of the form
^A^A^A^A
(where ^A stands for the character \001, as usual).
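
For illustration, a reader for this layout can be sketched in a few
lines of Python. This is only a sketch under the description above
(each message sits between two delimiter lines); the function name is
made up, and it does no error handling for truncated folders.

```python
MMDF_DELIM = "\x01\x01\x01\x01"  # the ^A^A^A^A line

def split_mmdf(text):
    """Return the messages found between pairs of ^A^A^A^A lines."""
    messages = []
    current = None  # None means we are between messages
    for line in text.split("\n"):
        if line == MMDF_DELIM:
            if current is None:
                current = []  # opening delimiter
            else:
                messages.append("\n".join(current))
                current = None  # closing delimiter
        elif current is not None:
            current.append(line)
    return messages

folder = (
    MMDF_DELIM + "\n"
    "From: a@example.org\n\nFrom now on no escaping is needed.\n"
    + MMDF_DELIM + "\n"
    + MMDF_DELIM + "\n"
    "From: b@example.org\n\nSecond message.\n"
    + MMDF_DELIM + "\n"
)
for msg in split_mmdf(folder):
    print(repr(msg))
```

Note that the body line starting with "From " comes back untouched.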
Of course, it is still theoretically possible to cause chaos by
putting those lines into a message with
Content-Transfer-Encoding: 8bit
but it hasn't happened to me yet!
A sensible format would use content-length, and there is a
From_with_content_length or similar variant of mbox, but I don't know
how widely supported and robust it is.
After looking at RFC 4155 more closely, I notice that it defines a
"default" mbox format that is more tightly specified. This RFC came out
only in 2005, it seems, and even then it is only an "informational" RFC,
not a standard. So, if we want compatibility with other mail tools, we
cannot depend on it.
My Thunderbird folders have leading separator lines like the following:

    From - Sun Oct 03 00:20:05 2010

VM itself produces separator lines like this:

    From VM Mon Feb 6 16:51:47 2006

Neither of these would satisfy your syntax.
There could be value in defining a new mbox type for VM that is RFC
4155-compliant. I will think about it.
Cheers,
Uday
> Try "^From .+[@]?.+ .+ [+-]?[0-9][0-9][0-9][0-9]$", then. It works for
> me.
I am adding a variable vm-leading-message-separator-regexp-From_ which
you can modify if you wish.
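
Before changing it, a candidate regexp can be tried against known
separator lines. Here is a quick Python sketch using the regexp quoted
above and the Thunderbird and VM separators mentioned earlier in this
thread. It uses Python's regex dialect rather than Emacs's, so take it
as a rough check only; the helper name is made up.

```python
import re

# The regexp suggested in the quoted message, unchanged.
SUGGESTED = re.compile(r"^From .+[@]?.+ .+ [+-]?[0-9][0-9][0-9][0-9]$")

def matches(line):
    """Return True if the suggested regexp accepts the line."""
    return SUGGESTED.match(line) is not None

samples = [
    "From - Sun Oct 03 00:20:05 2010",     # Thunderbird-style separator
    "From VM Mon Feb 6 16:51:47 2006",     # VM-style separator
    "From somebody, with no date at all",  # no four-digit year at the end
]
for line in samples:
    print(matches(line), line)
```

Running the same check over the separator lines of an old folder is one
way to see whether a tighter regexp would miss any of them.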
I am reluctant to hard-code a new regexp without a careful review.
Users will have old mbox files dating back years. If the message
separators there don't satisfy the tighter constraints, then messages
will get clubbed together. Come to think of it, we used to have a lot of
problems of that kind in the early days of VM.
Cheers,
Uday
>A sensible format would use content-length...
No, because content-length is number of bytes in the message body,
which includes end-of-line characters and such, so it varies across
systems. If you move the file from one system to another, it breaks.
Usenet's Lines: would work, though.
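
A small Python sketch of the point, assuming the only thing that changes
between the two systems is the end-of-line convention (LF versus CRLF).
The helper names are made up for the example.

```python
body_lines = ["Hello,", "", "See you on Monday."]

# The same body stored with Unix (LF) and DOS-style (CRLF) line endings.
unix_body = "\n".join(body_lines) + "\n"
dos_body = "\r\n".join(body_lines) + "\r\n"

def byte_count(body):
    """Number of bytes in the body, which is what a byte-based length header counts."""
    return len(body.encode("utf-8"))

def line_count(body):
    """Number of lines in the body, which is what a Lines: header counts."""
    return len(body.splitlines())

print(byte_count(unix_body), byte_count(dos_body))  # the byte counts differ
print(line_count(unix_body), line_count(dos_body))  # the line counts agree
```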
I've had an idea to use the dot-stuffing algorithm used by SMTP: a
message is terminated by a line containing nothing but '.'; any such
line in the message gets another '.' added at the front.
A file would be multiple messages, each followed by a '.' line. It
would do line boundaries in whatever way makes it a text file in the
system where it's stored (so might have to be converted when moved to
a different system).
Er, it's up to the MTA/MUA to maintain the Content-Length correctly if
they do anything other than make a perfect binary copy.
The format already exists, and VM already supports it.
> I've had an idea to use the dot-stuffing algorithm used by SMTP: a
> message is terminated by a line containing nothing but '.'; any such
> line in the message gets another '.' added at the front.
You also need to stuff any line containing only dots, of course. That's
what From-stuffing is like, and it's evil. It should not be necessary
to mangle a message in order to store it.
I was talking about moving the file outside of the mail software.
Say, with FTP or similar. Useful for import/export, archive, etc.
>> I've had an idea to use the dot-stuffing algorithm used by SMTP: a
>> message is terminated by a line containing nothing but '.'; any such
>> line in the message gets another '.' added at the front.
>
>You also need to stuff any line containing only dots, of course.
That's what I said.
>That's what From-stuffing is like, and it's evil. It should not be
>necessary to mangle a message in order to store it.
No, SMTP's dot-stuffing is reversible. The original message is easily
restored. Every mail message sent across the Internet goes through
this: the sending MTA dot-stuffs the message, the receiver undoes it.
From-stuffing is not reversible, because a message line starting with
">From " doesn't get a second '>', so it becomes impossible to know
whether the '>' was there originally or added later. That information
is lost. That's the evil.
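
To make the loss concrete, here is a Python sketch of From-stuffing as
described above. It is simplified to work one line at a time (it ignores
the "after a blank line" condition), and the function names are made up.

```python
def from_stuff(line):
    """Add '>' to a line starting with 'From '; '>From ' lines are left alone."""
    return ">" + line if line.startswith("From ") else line

def from_unstuff(line):
    """Best-effort inverse: strip one '>' from a line starting with '>From '."""
    return line[1:] if line.startswith(">From ") else line

a = "From here on, things get lossy."
b = ">From here on, things get lossy."

# Two different original lines end up identical in the folder...
print(from_stuff(a) == from_stuff(b))
# ...so no inverse can be right for both; this one gets b wrong.
print(from_unstuff(from_stuff(a)) == a)
print(from_unstuff(from_stuff(b)) == b)
```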
SMTP's algorithm, from RFC 5321 (descendant of 821), section 4.5.2:
o Before sending a line of mail text, the SMTP client checks the
first character of the line. If it is a period, one additional
period is inserted at the beginning of the line.
o When a line of mail text is received by the SMTP server, it checks
the line. If the line is composed of a single period, it is
treated as the end of mail indicator. If the first character is a
period and there are other characters on the line, the first
character is deleted.
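
The two rules above can be sketched in Python as follows. This works on
a list of lines with the line endings already removed, and the function
names are made up; it is an illustration of the quoted rules, not a
complete SMTP implementation.

```python
def dot_stuff(line):
    """Sender side: if the line starts with a period, insert one more."""
    return "." + line if line.startswith(".") else line

def encode(lines):
    """Stuff every line, then append the lone '.' end-of-mail indicator."""
    return [dot_stuff(line) for line in lines] + ["."]

def decode(stored):
    """Receiver side: stop at a lone '.', else drop one leading period."""
    out = []
    for line in stored:
        if line == ".":
            break  # end of mail indicator
        out.append(line[1:] if line.startswith(".") else line)
    return out

message = [".", "..", ".signature", "plain text", ""]
stored = encode(message)
print(stored)
print(decode(stored) == message)  # the round trip restores every line
```

Because a line that already starts with a period always gains another
one, the receiver can tell stuffed lines from the terminator and undo
the change exactly.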
It may be what you meant, but it's not what you said. Read what you
said!
> No, SMTP's dot-stuffing is reversible. The original message is easily
Yes, SMTP's is. But it still shouldn't be necessary to mangle a message in
order to store it!