
Re: mbox format problem

Uday Reddy

Feb 11, 2011, 6:41:27 PM
to José Manuel García-Patos
On 2/11/2011 9:14 PM, José Manuel García-Patos wrote:

>
> So, my question is: Was there something I overlooked there, is it
> really a VM bug, or none of the above? Thanks in advance.

The VM code is in the function vm-find-leading-message-separator in the
file vm-folder.el. All that it looks for is "From " at the beginning of
the line and some sequence of digits at the end of the line. RFC
4155 is unfortunately very loose. It doesn't guarantee that the leading
separator line has a particular format. So, VM can't assume too much
about the format of the separator line.
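Here is a minimal Python sketch of the loose rule described above. It assumes a separator is any line that starts with "From " and ends in a digit. The pattern LOOSE_SEPARATOR and the function split_mbox are hypothetical illustrations, not the actual code in vm-folder.el. The sketch shows how an unescaped body line that ends in a number gets mistaken for a separator.

```python
import re

# Hypothetical approximation of the rule described above: "From " at the
# start of the line and a run of digits at the end. NOT VM's actual regexp.
LOOSE_SEPARATOR = re.compile(r"^From .*[0-9]+$")

def split_mbox(text: str) -> list[str]:
    """Split '\\n'-separated mbox text at every line matching LOOSE_SEPARATOR."""
    messages: list[list[str]] = []
    for line in text.split("\n"):
        if LOOSE_SEPARATOR.match(line):
            messages.append([line])      # a separator starts a new message
        elif messages:
            messages[-1].append(line)    # otherwise extend the current message
    return ["\n".join(m) for m in messages]
```

Given a folder whose first message contains the unescaped body line "From 1999 to 2010", split_mbox returns three "messages" instead of two. The body line becomes a message of its own.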

It is common practice to put a ">" escape marker in front of any line that
begins with "From " right after a blank line. If you do that, then VM
should be able to handle it fine.
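Below is one way José's concatenation script could apply that escaping, sketched in Python rather than Ruby. The sketch makes three assumptions:

- each message is already available as a string;
- every body line starting with "From " is escaped, which is slightly broader than escaping only lines after a blank line;
- the separator uses Thunderbird's "From -" style with an asctime date.

The names escape_from_lines and build_mbox are hypothetical.

```python
import time

def escape_from_lines(message: str) -> str:
    """Prefix '>' to every line that begins with 'From '."""
    out = []
    for line in message.split("\n"):
        out.append(">" + line if line.startswith("From ") else line)
    return "\n".join(out)

def build_mbox(messages: list[str], when: float = 0.0) -> str:
    """Concatenate single-message texts into one mbox string.

    Each message gets a 'From - <asctime date>' separator line (UTC),
    has its 'From ' lines escaped, and is followed by a blank line.
    """
    stamp = time.asctime(time.gmtime(when))
    parts = []
    for msg in messages:
        body = escape_from_lines(msg.rstrip("\n"))
        parts.append(f"From - {stamp}\n{body}\n\n")
    return "".join(parts)
```

Only the separators begin with "From " in the result. A body line such as "From here on, ok" is written as ">From here on, ok".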

Let me also suggest a simpler method to combine several single-message
files into an mbox folder. You can put them all into a directory,
select all of them in a GUI and then drag-and-drop them into a new
Thunderbird folder. (You might need to rename the files to have a
".eml" extension. I am not sure.) Then you can move the Thunderbird
folder to somewhere else and VM will handle it fine.

Cheers,
Uday

Julian Bradfield

Feb 12, 2011, 5:09:34 AM
On 2011-02-11, José Manuel García-Patos <jmga...@madrid.uned.es> wrote:
> I had written a Ruby script that allowed me to import my old emails from Kmail
> to VM. Kmail uses one file per mail, while VM (to the best of my
> knowledge) uses mbox, so what I did was, quite simply, concatenate all

Ugh. VM uses mbox by default, but it doesn't have to. mbox is about
the cruddiest most idiotic mailbox format out there, and Unix has a
lot to answer for.
If you like messages all in one file, I suggest using mmdf format,
which is an awful lot easier to deal with.
(setq vm-default-folder-type 'mmdf)

Julian Bradfield

Feb 12, 2011, 5:28:17 PM
> Thanks for the advice, though. I'll have to read about mmdf. I know
> nothing about it.

MMDF itself is a mail system which is probably now used hardly
anywhere.
But its mailbox format is very simple, and has the great advantage
that it doesn't require the From-stuffing of mbox.

Every message is both preceded and followed by a line of the form
^A^A^A^A
(where ^A stands for the character \001, as usual).

Of course, it is still theoretically possible to cause chaos by
putting those lines into a message with
Content-Transfer-Encoding: 8bit
but it hasn't happened to me yet!
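Here is a minimal Python sketch of the MMDF layout described above. Every message is wrapped in ^A^A^A^A lines, so no From-stuffing is needed. The helper names are hypothetical. Messages are assumed to use "\n" line endings, and the writer adds a final newline if one is missing. The reader also demonstrates the caveat: a message that itself contains a ^A^A^A^A line gets split wrongly.

```python
SEP = "\x01\x01\x01\x01"  # the ^A^A^A^A delimiter line

def write_mmdf(messages: list[str]) -> str:
    """Wrap each message in delimiter lines; the message text is not altered."""
    out = []
    for msg in messages:
        body = msg if msg.endswith("\n") else msg + "\n"  # ensure a final newline
        out.append(f"{SEP}\n{body}{SEP}\n")
    return "".join(out)

def read_mmdf(text: str) -> list[str]:
    """Collect the lines between each opening and closing delimiter line."""
    messages, current = [], None
    for line in text.split("\n"):
        if line == SEP:
            if current is None:
                current = []                                  # opening delimiter
            else:
                messages.append("\n".join(current) + "\n")    # closing delimiter
                current = None
        elif current is not None:
            current.append(line)
    return messages
```

A body line such as "From here on" round-trips unchanged. A message containing a bare ^A^A^A^A line, however, is truncated on reading.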

A sensible format would use content-length, and there is a
From_with_content_length or similar variant of mbox, but I don't know
how widely supported and robust it is.
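To illustrate the Content-Length idea, here is a Python sketch of a reader for a simplified layout. The layout is an assumption made for illustration only:

1. a "From " line;
2. header lines that include Content-Length;
3. a blank line;
4. exactly that many body bytes;
5. optional blank lines between messages.

This is not a specification of any particular mbox variant or of what VM supports, and read_cl_mbox is a hypothetical name. Because the reader counts bytes instead of scanning for separators, a body line starting with "From " needs no escaping.

```python
import re

# Matches a Content-Length header line inside the header block.
CL = re.compile(rb"^Content-Length:\s*(\d+)\s*$", re.IGNORECASE | re.MULTILINE)

def read_cl_mbox(data: bytes) -> list[bytes]:
    """Return the message bodies of a simplified Content-Length mbox.

    Assumes every message has a Content-Length header; a missing header
    or a missing blank line raises an exception.
    """
    bodies, pos = [], 0
    while pos < len(data):
        header_end = data.index(b"\n\n", pos) + 2      # end of 'From ' line + headers
        length = int(CL.search(data[pos:header_end]).group(1))
        bodies.append(data[header_end:header_end + length])
        pos = header_end + length
        while data[pos:pos + 1] == b"\n":              # skip blank lines between messages
            pos += 1
    return bodies
```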


Uday Reddy

Feb 13, 2011, 4:47:28 AM
to José Manuel García-Patos
On 2/12/2011 7:24 PM, José Manuel García-Patos wrote:

> Uday Reddy <uDOTsD...@cs.bham.ac.uk> writes:
>
>> The VM code is in the function vm-find-leading-message-separator in
>> the file vm-folder.el. All that it looks for is "From " at the
>> beginning of the line and some sequence of digits at the end of the
>> line. The RFC 4155 is unfortunately very loose. It doesn't guarantee
>> the leading separator line to be a particular format. So, VM can't
>> assume too much about the format of the separator line.
>
> Given the RFC says that there has to be an email address «conformant with
> the "addr-spec" syntax from RFC 2822», couldn't the regexp in
> vm-find-leading-message-separator be something more like the following?
>
> "^From .+@.+ .+ [+-]?[0-9][0-9][0-9][0-9]$"
>
> That would eliminate most, though probably not all, false positives.

After looking at RFC 4155 more closely, I notice that it defines a
"default" mbox format that is more tightly defined. This RFC came out
only in 2005, it seems, and even then it is only an "informational" RFC,
not a standard. So, if we want compatibility with other mail tools, we
cannot depend on it.

My Thunderbird folders have leading separator lines like the following:

From - Sun Oct 03 00:20:05 2010

VM itself produces separator lines like this:

From VM Mon Feb 6 16:51:47 2006

Neither of these would satisfy your syntax.
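A quick Python check of José's proposed pattern bears this out. The check runs the pattern against the two separator lines quoted above. It also adds a hypothetical RFC 4155-style line with an address; that address and date are made up.

```python
import re

# Pattern proposed above: an '@' somewhere after "From ", then a space-separated
# field and four digits (year or numeric zone offset) at the end of the line.
PROPOSED = re.compile(r"^From .+@.+ .+ [+-]?[0-9][0-9][0-9][0-9]$")

separators = {
    "Thunderbird": "From - Sun Oct 03 00:20:05 2010",
    "VM": "From VM Mon Feb 6 16:51:47 2006",
    # Hypothetical RFC 4155-style line; address and date are invented.
    "RFC 4155 style": "From user@example.com Sat Jan  3 01:05:34 1996",
}

results = {name: bool(PROPOSED.match(line)) for name, line in separators.items()}
for name, ok in results.items():
    print(f"{name:15} {'match' if ok else 'no match'}")
```

Only the line with an address matches. The Thunderbird and VM separators contain no "@".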

There could be value in defining a new mbox type for VM that is RFC
4155-compliant. I will think about it.

Cheers,
Uday

Uday Reddy

Feb 14, 2011, 8:20:10 AM
On 2/13/2011 8:22 PM, José Manuel García-Patos wrote:

> Try "^From .+[@]?.+ .+ [+-]?[0-9][0-9][0-9][0-9]$", then. It works for
> me.

I am adding a variable vm-leading-message-separator-regexp-From_ which
you can modify if you wish.

I am reluctant to hard code a new regexp without a careful review.
Users will have old mboxes dating back years. If the message
separators there don't satisfy the tighter constraints, then messages
get clubbed together. Come to think of it, we used to have a lot of
problems of that kind in the early days of VM.
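The loosened pattern makes the "@" optional, so it accepts both separator styles. The Python check below shows the trade-off that makes a careful review worthwhile: an unescaped body line that ends in a four-digit number matches too. The body line is an invented example.

```python
import re

# The loosened pattern: '[@]?' makes the '@' optional.
LOOSENED = re.compile(r"^From .+[@]?.+ .+ [+-]?[0-9][0-9][0-9][0-9]$")

lines = [
    "From - Sun Oct 03 00:20:05 2010",    # Thunderbird separator
    "From VM Mon Feb 6 16:51:47 2006",    # VM separator
    "From now on we meet in room 2010",   # ordinary body text (invented example)
]
matches = [bool(LOOSENED.match(line)) for line in lines]
print(matches)
```

All three lines match. With this pattern, an unescaped body line like the third one would still start a new message.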

Cheers,
Uday

Kurt Hackenberg

Feb 14, 2011, 7:31:10 PM

Julian Bradfield <j...@inf.ed.ac.uk> wrote:

>A sensible format would use content-length...

No, because Content-Length is the number of bytes in the message body,
which includes end-of-line characters and such, so it varies across
systems. If you move the file from one system to another, it breaks.
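The objection comes down to simple arithmetic: converting LF line endings to CRLF adds one byte per line. The small Python check below shows a stored Content-Length going stale after such a conversion. The helper content_length is hypothetical.

```python
def content_length(body: str, newline: str = "\n") -> int:
    """Byte count of the body once its line endings are written as `newline`."""
    return len(body.replace("\n", newline).encode("utf-8"))

body = "line one\nline two\nFrom here on\n"
unix = content_length(body)           # LF line endings
dos = content_length(body, "\r\n")    # the same text with CRLF line endings
print(unix, dos)                      # 31 34
```

A header written as "Content-Length: 31" on one system is three bytes short after a text-mode transfer to the other.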

Usenet's Lines: header would work, though.

I've had an idea to use the dot-stuffing algorithm used by SMTP: a
message is terminated by a line containing nothing but '.'; any such
line in the message gets another '.' added at the front.

A file would be multiple messages, each followed by a '.' line. It
would do line boundaries in whatever way makes it a text file in the
system where it's stored (so might have to be converted when moved to
a different system).
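Here is a minimal Python sketch of the proposed file format. It makes two assumptions. First, messages are lists of lines joined with "\n"; a real tool would use the local line ending. Second, stuffing follows the SMTP client rule from RFC 5321 section 4.5.2, which doubles any leading '.' rather than only lone '.' lines. All function names are hypothetical.

```python
def dot_stuff(lines: list[str]) -> list[str]:
    """Double a leading '.' on any line that starts with one."""
    return ["." + l if l.startswith(".") else l for l in lines]

def dot_unstuff(lines: list[str]) -> list[str]:
    """Undo dot_stuff: drop the first '.' of any line that starts with one."""
    return [l[1:] if l.startswith(".") else l for l in lines]

def write_dot_file(messages: list[list[str]]) -> str:
    """Write each message's stuffed lines followed by a line containing only '.'."""
    out = []
    for msg in messages:
        out.extend(dot_stuff(msg))
        out.append(".")
    return "\n".join(out) + "\n"

def read_dot_file(text: str) -> list[list[str]]:
    """Split at lone '.' lines and unstuff each message."""
    messages, current = [], []
    for line in text.split("\n")[:-1]:   # drop the empty piece after the final newline
        if line == ".":
            messages.append(dot_unstuff(current))
            current = []
        else:
            current.append(line)
    return messages
```

A message containing the lines ".", ".." and ".hidden" is stored as "..", "..." and "..hidden". It reads back exactly.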

Julian Bradfield

Feb 15, 2011, 3:47:24 AM
On 2011-02-15, Kurt Hackenberg <k...@pnnnnx.kom> wrote:
> Julian Bradfield <j...@inf.ed.ac.uk> wrote:
>>A sensible format would use content-length...
> No, because Content-Length is the number of bytes in the message body,
> which includes end-of-line characters and such, so it varies across
> systems. If you move the file from one system to another, it breaks.

Er, it's up to the MTA/MUA to maintain the Content-Length correctly if
they do anything other than make a perfect binary copy.
The format already exists, and VM already supports it.

> I've had an idea to use the dot-stuffing algorithm used by SMTP: a
> message is terminated by a line containing nothing but '.'; any such
> line in the message gets another '.' added at the front.

You also need to stuff any line containing only dots, of course. That's
what From-stuffing is like, and it's evil. It should not be necessary
to mangle a message in order to store it.

Kurt Hackenberg

Feb 15, 2011, 4:53:39 PM

Julian Bradfield <j...@inf.ed.ac.uk> wrote:
>On 2011-02-15, Kurt Hackenberg <k...@pnnnnx.kom> wrote:
>> Julian Bradfield <j...@inf.ed.ac.uk> wrote:
>>>A sensible format would use content-length...
>> No, because Content-Length is the number of bytes in the message body,
>> which includes end-of-line characters and such, so it varies across
>> systems. If you move the file from one system to another, it breaks.
>
>Er, it's up to the MTA/MUA to maintain the Content-Length correctly if
>they do anything other than make a perfect binary copy.
>The format already exists, and VM already supports it.

I was talking about moving the file outside of the mail software.
Say, with FTP or similar. Useful for import/export, archive, etc.

>> I've had an idea to use the dot-stuffing algorithm used by SMTP: a
>> message is terminated by a line containing nothing but '.'; any such
>> line in the message gets another '.' added at the front.
>
>You also need to stuff any line containing only dots, of course.

That's what I said.

>That's what From-stuffing is like, and it's evil. It should not be
>necessary to mangle a message in order to store it.

No, SMTP's dot-stuffing is reversible. The original message is easily
restored. Every mail message sent across the Internet goes through
this: the sending MTA dot-stuffs the message, the receiver undoes it.

From-stuffing is not reversible, because a message line starting with
">From " doesn't get a second '>', so it becomes impossible to know
whether the '>' was there originally or added later. That information
is lost. That's the evil.
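The information loss can be shown in a few lines of Python. The helpers from_stuff and from_unstuff are hypothetical and implement the escaping as described here: only lines starting with "From " get a '>'. Two different original lines end up identical after stuffing, so no reader can restore both.

```python
def from_stuff(line: str) -> str:
    # Escape only lines that begin with "From "; ">From " is left alone.
    return ">" + line if line.startswith("From ") else line

def from_unstuff(line: str) -> str:
    # The best a reader can do: strip one '>' in front of "From ".
    return line[1:] if line.startswith(">From ") else line

a = "From the desk of the editor"
b = ">From the desk of the editor"            # already quoted in the original
stuffed = (from_stuff(a), from_stuff(b))      # both become ">From the desk ..."
restored = (from_unstuff(stuffed[0]), from_unstuff(stuffed[1]))
```

After the round trip, restored[1] is "From the desk of the editor". The original '>' in b is gone.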

SMTP's algorithm, from RFC 5321 (descendant of 821), section 4.5.2:

o Before sending a line of mail text, the SMTP client checks the
first character of the line. If it is a period, one additional
period is inserted at the beginning of the line.

o When a line of mail text is received by the SMTP server, it checks
the line. If the line is composed of a single period, it is
treated as the end of mail indicator. If the first character is a
period and there are other characters on the line, the first
character is deleted.
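The two rules above can be checked with a short Python round trip. The sketch compares them with a literal reading of the earlier one-line proposal: stuff only lines that are a single '.', and undo only that. All function names are hypothetical. Under the RFC 5321 rules every line survives, while the narrow rule turns a ".." line into ".". That loss is the case the "line containing only dots" remark is about.

```python
def stuff_rfc5321(line: str) -> str:
    # Client rule: if the first character is a period, insert one more.
    return "." + line if line.startswith(".") else line

def unstuff_rfc5321(line: str) -> str:
    # Server rule (a lone '.' is the end-of-mail marker and never reaches here):
    # a leading period on a longer line is deleted.
    return line[1:] if line.startswith(".") and len(line) > 1 else line

def stuff_single_dot_only(line: str) -> str:
    # Narrow reading: stuff only a line consisting of a single '.'.
    return ".." if line == "." else line

def unstuff_single_dot_only(line: str) -> str:
    # Its natural inverse: turn '..' back into '.'.
    return "." if line == ".." else line

body = [".", "..", "...", ".config", "plain text"]
full = [unstuff_rfc5321(stuff_rfc5321(l)) for l in body]
narrow = [unstuff_single_dot_only(stuff_single_dot_only(l)) for l in body]
```

Here full equals body, while narrow is [".", ".", "...", ".config", "plain text"]. The original ".." line cannot be told apart from a stuffed ".".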

Julian Bradfield

Feb 16, 2011, 5:15:20 AM
On 2011-02-15, Kurt Hackenberg <k...@pnnnnx.kom> wrote:
> Julian Bradfield <j...@inf.ed.ac.uk> wrote:
>>On 2011-02-15, Kurt Hackenberg <k...@pnnnnx.kom> wrote:
>>> I've had an idea to use the dot-stuffing algorithm used by SMTP: a
>>> message is terminated by a line containing nothing but '.'; any such
>>> line in the message gets another '.' added at the front.
>>
>>You also need to stuff any line containing only dots, of course.
>
> That's what I said.

It may be what you meant, but it's not what you said. Read what you
said!

> No, SMTP's dot-stuffing is reversible. The original message is easily

Yes, SMTP's is. But it still shouldn't be necessary to mangle a message in
order to store it!
