On 2/11/2011 9:14 PM, José Manuel García-Patos wrote:
> So, my question is: Was there something I overlooked there, is it really a VM bug, or none of the above? Thanks in advance.
The VM code is in the function vm-find-leading-message-separator in the file vm-folder.el. All that it looks for is "From " at the beginning of the line and some sequence of digits at the end of the line. The RFC 4155 is unfortunately very loose. It doesn't guarantee the leading separator line to be a particular format. So, VM can't assume too much about the format of the separator line.
It is common practice to escape any line that begins with "From " right after a blank line by putting a ">" marker in front of it. If you do that, then VM should be able to handle it fine.
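As a rough illustration of that escaping, here is a minimal Python sketch. The function names and the regexp are assumptions made up for this example: the regexp only approximates the loose check described above ("From " at the start of the line, digits at the end) and is not VM's actual code. For simplicity the sketch escapes every body line beginning with "From ", which is a superset of the lines-after-a-blank-line case.

```python
import re

# Approximation of the loose check described above: "From " at the start
# of the line and a run of digits at the end. Not VM's actual regexp.
LOOSE_SEPARATOR = re.compile(r"^From .*[0-9]+$")


def looks_like_separator(line: str) -> bool:
    """Return True if the line would pass the loose separator check."""
    return LOOSE_SEPARATOR.match(line) is not None


def escape_from_lines(body: str) -> str:
    """Prefix '>' to every body line that starts with 'From '."""
    out = []
    for line in body.split("\n"):
        if line.startswith("From "):
            line = ">" + line
        out.append(line)
    return "\n".join(out)
```

A body line such as "From what I recall it was in 2010" passes the loose check and would split the message in two; after escaping it starts with ">From " and no longer does.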
Let me also suggest a simpler method to combine several single-message files into an mbox folder. You can put them all into a directory, select all of them in a GUI and then drag-and-drop them into a new Thunderbird folder. (You might need to rename the files to have a ".eml" extension. I am not sure.) Then you can move the Thunderbird folder to somewhere else and VM will handle it fine.
On 2011-02-11, José Manuel García-Patos <jmgarc...@madrid.uned.es> wrote:
> I had written a Ruby script that allowed me to import my old emails from Kmail to VM. Kmail uses one file per mail, while VM (to the best of my knowledge) uses mbox, so what I did was, quite simply, concatenate all
Ugh. VM uses mbox by default, but it doesn't have to. mbox is about the cruddiest most idiotic mailbox format out there, and Unix has a lot to answer for. If you like messages all in one file, I suggest using mmdf format, which is an awful lot easier to deal with. (setq vm-default-folder-type 'mmdf)
> Thanks for the advice, though. I'll have to read about mmdf. I know nothing about it.
MMDF itself is a mail system which is probably now used hardly anywhere. But its mailbox format is very simple, and has the great advantage that it doesn't require the From-stuffing of mbox.
Every message is both preceded and followed by a line of the form ^A^A^A^A (where ^A stands for the character \001, as usual).
Of course, it is still theoretically possible to cause chaos by putting those lines into a message with Content-Transfer-Encoding: 8bit but it hasn't happened to me yet!
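The framing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `write_mmdf` and `read_mmdf` are hypothetical helper names, each message is assumed to be text ending in a newline, and the sketch makes no attempt to guard against a body that itself contains a ^A^A^A^A line.

```python
MMDF_DELIM = "\x01\x01\x01\x01"  # four ^A characters, alone on a line


def write_mmdf(messages):
    """Join messages into one MMDF-style string.

    Every message is preceded and followed by a delimiter line.
    """
    return "".join(
        MMDF_DELIM + "\n" + msg + MMDF_DELIM + "\n" for msg in messages
    )


def read_mmdf(folder):
    """Split an MMDF-style string back into its messages."""
    messages, current, inside = [], [], False
    for line in folder.split("\n"):
        if line == MMDF_DELIM:
            if inside:
                # Closing delimiter: the message is complete.
                messages.append("".join(current))
                current = []
            inside = not inside
        elif inside:
            current.append(line + "\n")
    return messages
```

Note that a body line starting with "From " passes through untouched, which is the advantage over mbox mentioned above. For real folders, Python's standard `mailbox` module also provides an `MMDF` class.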
A sensible format would use content-length, and there is a From_with_content_length or similar variant of mbox, but I don't know how widely supported and robust it is.
>> The VM code is in the function vm-find-leading-message-separator in the file vm-folder.el. All that it looks for is "From " at the beginning of the line and some sequence of digits at the end of the line. The RFC 4155 is unfortunately very loose. It doesn't guarantee the leading separator line to be a particular format. So, VM can't assume too much about the format of the separator line.
> Given the RFC says that there has to be an email address «conformant with the "addr-spec" syntax from RFC 2822», couldn't the regexp in vm-find-leading-message-separator be something more like the following?
> "^From .+@.+ .+ [+-]?[0-9][0-9][0-9][0-9]$"
> That would eliminate most, though probably not all, false positives.
After looking at RFC 4155 more closely, I notice that it defines a "default" mbox format that is more tightly defined. This RFC came out only in 2005, it seems, and even then it is only an "informational" RFC, not a standard. So, if we want compatibility with other mail tools, we cannot depend on it.
My Thunderbird folders have leading separator lines like the following:
From - Sun Oct 03 00:20:05 2010
VM itself produces separator lines like this:
From VM Mon Feb 6 16:51:47 2006
Neither of these would satisfy your syntax.
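The mismatch is easy to check. The following Python sketch runs the proposed regexp (copied verbatim from the quoted message) against the two separator lines above. The third line, with an `example.org` address, is a made-up example added only to show a line that does match.

```python
import re

# The regexp proposed in the quoted message.
PROPOSED = re.compile(r"^From .+@.+ .+ [+-]?[0-9][0-9][0-9][0-9]$")

SEPARATORS = [
    "From - Sun Oct 03 00:20:05 2010",  # Thunderbird
    "From VM Mon Feb 6 16:51:47 2006",  # VM
]

# Hypothetical separator line containing an address, for comparison.
WITH_ADDRESS = "From jdoe@example.org Sun Oct 03 00:20:05 2010"

results = {line: bool(PROPOSED.match(line)) for line in SEPARATORS}
results[WITH_ADDRESS] = bool(PROPOSED.match(WITH_ADDRESS))

for line, matched in results.items():
    print("match   " if matched else "no match", line)
```

Both real-world lines fail because the regexp requires an "@" after "From ", and neither "-" nor "VM" contains one.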
There could be value in defining a new mbox type for VM that is RFC 4155-compliant. I will think about it.
On 2/13/2011 8:22 PM, José Manuel García-Patos wrote:
> Try "^From .+[@]?.+ .+ [+-]?[0-9][0-9][0-9][0-9]$", then. It works for me.
I am adding a variable vm-leading-message-separator-regexp-From_ which you can modify if you wish.
I am reluctant to hard-code a new regexp without a careful review. Users will have old mboxes dating back years. If the message separators there don't satisfy the tighter constraints, then messages get merged together. Come to think of it, we used to have a lot of problems of that kind in the early days of VM.
No, because content-length is number of bytes in the message body, which includes end-of-line characters and such, so it varies across systems. If you move the file from one system to another, it breaks.
Usenet's Lines: would work, though.
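A small Python sketch of the point being made here, under the assumption that the file is moved by a text-mode transfer that rewrites LF line endings as CRLF. The helper names are made up for the example.

```python
# The same three-line body, before and after a text-mode transfer
# that converts LF line endings to CRLF.
body_unix = "Hello,\nthis is a test.\nBye.\n"
body_crlf = body_unix.replace("\n", "\r\n")


def content_length(body: str) -> int:
    """Number of bytes in the body, as a Content-Length header counts them."""
    return len(body.encode("utf-8"))


def line_count(body: str) -> int:
    """Number of lines in the body, as a Lines: header counts them."""
    return len(body.splitlines())


print(content_length(body_unix), content_length(body_crlf))  # byte counts differ
print(line_count(body_unix), line_count(body_crlf))          # line counts agree
```

The byte count grows by one per line after the conversion, so a stored Content-Length no longer points at the end of the message, while the line count is unaffected.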
I've had an idea to use the dot-stuffing algorithm used by SMTP: a message is terminated by a line containing nothing but '.'; any such line in the message gets another '.' added at the front.
A file would be multiple messages, each followed by a '.' line. It would do line boundaries in whatever way makes it a text file in the system where it's stored (so might have to be converted when moved to a different system).
On 2011-02-15, Kurt Hackenberg <k...@pnnnnx.kom> wrote:
> Julian Bradfield <j...@inf.ed.ac.uk> wrote:
>>A sensible format would use content-length...
> No, because content-length is number of bytes in the message body, which includes end-of-line characters and such, so it varies across systems. If you move the file from one system to another, it breaks.
Er, it's up to the MTA/MUA to maintain the Content-Length correctly if they do anything other than make a perfect binary copy. The format already exists, and VM already supports it.
> I've had an idea to use the dot-stuffing algorithm used by SMTP: a message is terminated by a line containing nothing but '.'; any such line in the message gets another '.' added at the front.
You also need to stuff any line containing only dots, of course. That's what From-stuffing is like, and it's evil. It should not be necessary to mangle a message in order to store it.
>On 2011-02-15, Kurt Hackenberg <k...@pnnnnx.kom> wrote:
>> Julian Bradfield <j...@inf.ed.ac.uk> wrote:
>>>A sensible format would use content-length...
>> No, because content-length is number of bytes in the message body, which includes end-of-line characters and such, so it varies across systems. If you move the file from one system to another, it breaks.
>Er, it's up to the MTA/MUA to maintain the Content-Length correctly if they do anything other than make a perfect binary copy. The format already exists, and VM already supports it.
I was talking about moving the file outside of the mail software. Say, with FTP or similar. Useful for import/export, archive, etc.
>> I've had an idea to use the dot-stuffing algorithm used by SMTP: a message is terminated by a line containing nothing but '.'; any such line in the message gets another '.' added at the front.
>You also need to stuff any line containing only dots, of course.
That's what I said.
>That's what From-stuffing is like, and it's evil. It should not be necessary to mangle a message in order to store it.
No, SMTP's dot-stuffing is reversible. The original message is easily restored. Every mail message sent across the Internet goes through this: the sending MTA dot-stuffs the message, the receiver undoes it.
From-stuffing is not reversible, because a message line starting with ">From " doesn't get a second '>', so it becomes impossible to know whether the '>' was there originally or added later. That information is lost. That's the evil.
SMTP's algorithm, from RFC 5321 (descendant of 821), section 4.5.2:
o Before sending a line of mail text, the SMTP client checks the first character of the line. If it is a period, one additional period is inserted at the beginning of the line.
o When a line of mail text is received by the SMTP server, it checks the line. If the line is composed of a single period, it is treated as the end of mail indicator. If the first character is a period and there are other characters on the line, the first character is deleted.
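The difference between the two kinds of stuffing can be shown with a short Python sketch. The function names are made up for this example, messages are represented as lists of lines without line terminators, and `from_stuff` implements only the plain variant described above, where a line already starting with ">From " does not get a second '>'.

```python
def dot_stuff(lines):
    """Sender side: add one '.' in front of every line starting with '.'."""
    return ["." + line if line.startswith(".") else line for line in lines]


def dot_unstuff(lines):
    """Receiver side: delete the first '.' of every line starting with '.'."""
    return [line[1:] if line.startswith(".") else line for line in lines]


def from_stuff(lines):
    """Plain mbox From-stuffing: only lines starting with 'From ' get a '>'."""
    return [">" + line if line.startswith("From ") else line for line in lines]


original = [".", "..", ".hidden", "From here", ">From there", "plain"]

# Dot-stuffing round-trips, and the stuffed text never contains a lone '.'
# line, so the end-of-mail indicator stays unambiguous.
stuffed = dot_stuff(original)
restored = dot_unstuff(stuffed)

# From-stuffing maps two different lines to the same stored line,
# so no unstuffing rule can tell them apart afterwards.
collision = (from_stuff(["From x"]), from_stuff([">From x"]))
```

Here `restored` equals `original`, while both elements of `collision` are `[">From x"]`.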
On 2011-02-15, Kurt Hackenberg <k...@pnnnnx.kom> wrote:
> Julian Bradfield <j...@inf.ed.ac.uk> wrote:
>>On 2011-02-15, Kurt Hackenberg <k...@pnnnnx.kom> wrote:
>>> I've had an idea to use the dot-stuffing algorithm used by SMTP: a message is terminated by a line containing nothing but '.'; any such line in the message gets another '.' added at the front.
>>You also need to stuff any line containing only dots, of course.
> That's what I said.
It may be what you meant, but it's not what you said. Read what you said!
> No, SMTP's dot-stuffing is reversible. The original message is easily
Yes, SMTP's is. But it still shouldn't be necessary to mangle a message in order to store it!