
Question on mail storage format and other improvements


peris...@gmail.com

Jun 27, 2008, 11:28:00 AM
Hi all,
first of all, sorry for my bad English; I am Italian and know only
a few words of English.
I would like to ask the TB 3 developers whether they plan to change
the TB3 mail format. The current MBOX format is a pain when you have
corporate folders holding gigabytes of email. I think a simpler format
that uses normal operating system directories, with each email stored
as a single file, would be far better: it would avoid the need to
compact folders, and it would help tools like Google Desktop Search,
Spotlight, Windows Desktop Search, etc.
Finally, it gives users more control over their own data. The same
goes for contacts: they could be stored as a set of vCard files, which
would help, among other things, with exchanging address books with
cellphones via Bluetooth, without relying on non-standard tools like
Nokia PC Suite or SyncML crap. This way the user could manage his own
data as simple standard files, like .eml and .vcf, that are understood
by every mail application on earth, giving users great freedom. By the
way, those formats are already used by Windows Mail. Finally, it would
be good to add contact synchronization and support for TB and SB to
Mozilla Weave, to have something similar to Plaxo, Apple MobileMe,
Funambol, and so on. What do you think?
Thank you for your attention.
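
To make the proposal concrete, here is a minimal sketch of the "one file per message" layout, written with Python's standard `mailbox` module rather than anything from Thunderbird itself. The function name `export_mbox_to_eml` and the `000001.eml` numbering scheme are assumptions made for illustration only.

```python
import mailbox
from pathlib import Path


def export_mbox_to_eml(mbox_path, out_dir):
    """Write every message of an mbox file to its own .eml file.

    Returns the list of files written, in mbox order. The file naming
    (000001.eml, 000002.eml, ...) is an assumption of this sketch.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for n, key in enumerate(box.iterkeys(), start=1):
            target = out / f"{n:06d}.eml"
            # get_bytes() returns the raw message without the mbox
            # "From " separator line, which is what an .eml file holds.
            target.write_bytes(box.get_bytes(key))
            written.append(target)
    finally:
        box.close()
    return written
```

With such a layout, deleting a message is a plain file removal, so there is nothing to compact, and desktop search tools can index each message as an ordinary file.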

Joshua Cranmer

Jun 27, 2008, 11:59:46 AM
peris...@gmail.com wrote:
> I wish to ask to TB 3 developers if they are planning or not to change
> the TB3 mail format. I have to say that current MBOX format is a pain
> when you have some corporate folders with GByte of emails,

Well, it's better than Outlook's "your entire mail in but a single file"
(we do it on a per-folder basis).

At this point, the change to a different file format is sufficiently
complex that I doubt it could be finished before the release of TB 3.
There are a few bugs on the topic, most notably the one on making the
mail store pluggable:
<https://bugzilla.mozilla.org/show_bug.cgi?id=402392>.

> I think
> that a simpler format that use normal operating system directories
> with emails stored as single files could be far better, it can avoid
> the need to compact folders, it can foster the work of tools like
> google desktop search, spotlight, windows desktop search etc etc.

First off, we seem to be handling the latter two sufficiently well
without moving off of mbox. Secondly, I have misgivings about the
maildir format, as I explain here:
<http://quetzalcoatal.blogspot.com/2008/03/mail-storage.html> (not that
that is a complete listing of all the problems I have with the format).

> Finally it gives more control to the user about its own data. Same
> thing for contacts, they can be stored as a bunch of vcard files, it
> can benefit, among other things, the exchange of address books with
> cellphones via bluetooth, without rely on non-standard tools like
> Nokia PC Suite or SyncML crap.

Many of my problems with maildir would extend to this system as well,
most notably my opinion that such an implementation would suffer from
performance problems.

> In this way the user would be allowed
> to manage his own data as simple standard files, like .eml and .vcf,
> that are understood from every mail application on earth, giving users
> a great freedom.

Mbox is one of the most widely-supported formats. Just not by a company
which thrives on proprietary formats.

> Finally, it would be fine to add contacts synchronization
> functionality and support for TB and SB to Mozilla Weave, just to have
> something similar to Plaxo, Apple MobileMe, Funambol, and so on. What
> do you think about?

There is currently an effort going on to synchronize TB's address book
with Google Contacts, and TB also supports using Outlook's or OS X's
address book (the latter now by default). It is also possible to
synchronize with Palm devices via an extension.

Chris Barnes

Jun 27, 2008, 1:01:55 PM
peris...@gmail.com wrote:
> Hi all,
> first of all, sorry for my bad English, since I am Italian and don't
> know English at all (at least some words).
> I wish to ask to TB 3 developers if they are planning or not to change
> the TB3 mail format. I have to say that current MBOX format is a pain
> when you have some corporate folders with GByte of emails, I think
> that a simpler format that use normal operating system directories
> with emails stored as single files could be far better,

I disagree with this idea - STRONGLY.
Each folder *should* be its own single file (and mbox format is just
fine).

The problem you describe is a USER PROBLEM - it is insane and massively
disorganized for a person to have GBs of data in a single mailbox
folder. If a person wants to do stupid things, then they deserve the
consequences of their actions.

--

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Chris Barnes AOL IM: CNBarnes
ch...@txbarnes.com Yahoo IM: chrisnbarnes

You always have freedom of choice, but you never have freedom of
consequence.

Kent James

Jun 27, 2008, 1:26:56 PM
Chris Barnes wrote:
> it is insane and massively disorganized for a person to have GBs of data
> in a single mailbox folder.

This will not be true in a world where organization is done by tags.
"Folders" are then a storage issue, and not an organization issue. Users
should not then need to artificially store email in multiple folders
simply for performance or data protection reasons.

Joshua Cranmer

Jun 27, 2008, 1:54:48 PM

True, but here's an idea I've bounced around my head for a while:

If an mbox gets above a certain size, prune it and use multiple mbox files.
E.g. Inbox, Inbox.1, Inbox.2, etc.
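
A rough sketch of that idea, using Python's standard `mailbox` module instead of Thunderbird's own code. The function name `split_mbox` and the `max_bytes` threshold are hypothetical, and as a simplification this version copies all messages into numbered parts (Inbox.1, Inbox.2, ...) and leaves the original file untouched, rather than pruning it in place.

```python
import mailbox


def split_mbox(src_path, max_bytes):
    """Copy the messages of src_path into src_path.1, src_path.2, ...

    A new part is started whenever adding the next message would push
    the current part over max_bytes. A single message larger than
    max_bytes still gets a part of its own. Returns the part paths.
    """
    parts = []
    current = None
    current_size = 0
    src = mailbox.mbox(src_path, create=False)
    try:
        for key in src.iterkeys():
            size = len(src.get_bytes(key))
            if current is None or (current_size > 0 and current_size + size > max_bytes):
                if current is not None:
                    current.close()  # close() flushes pending writes
                part_path = f"{src_path}.{len(parts) + 1}"
                current = mailbox.mbox(part_path, create=True)
                parts.append(part_path)
                current_size = 0
            current.add(src.get_message(key))
            current_size += size
    finally:
        if current is not None:
            current.close()
        src.close()
    return parts
```

The size accounting ignores the "From " separator lines, so parts can end up slightly larger than `max_bytes`; a real implementation would also need to keep the folder index in step with the parts.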

peris...@gmail.com

Jun 27, 2008, 6:00:04 PM

I actually did this over a year ago: I split all my corporate mail into
different local folders, one for each year, and it worked out fine.
Usually I keep the last month's mail in my IMAP inbox folder. I keep
both sent and received email in the inbox and set it to threaded view,
so I can follow all the threads like newsgroups. It is a very nice
function. I also use a lot of search folders, another nifty
feature of TB. Keeping all the emails together in big
folders and using saved searches is a smart approach. Computers are made
to keep track of large amounts of data, like emails, for human beings.
So, IMHO, it is not my duty to keep folders organized; computers should
do it for me. This is how I think of IT.

peris...@gmail.com

Jun 27, 2008, 6:08:05 PM
On 27 Jun, 19:01, Chris Barnes <ch...@txbarnes.com> wrote:

> perista...@gmail.com wrote:
> > Hi all,
> > first of all, sorry for my bad English, since I am Italian and don't
> > know English at all (at least some words).
> > I wish to ask to TB 3 developers if they are planning or not to change
> > the TB3 mail format. I have to say that current MBOX format is a pain
> > when you have some corporate folders with GByte of emails, I think
> > that a simpler format that use normal operating system directories
> > with emails stored as single files could be far better,
>
> I disagree with this idea - STRONGLY.
> Each folder *should* be it's on single file (and mbox format is just
> fine).

Can you explain to us why mbox is better than a maildir-like format?
I respect your opinion, but it would be nice to know the rationale
behind it.

>
> The problem you describe is a USER PROBLEM - it is insane and massively
> disorganized for a person to have GBs of data in a single mailbox
> folder. If a person wants to do stupid things, then they deserve the
> consequences of their actions.

I currently have a couple of 3 GB mailbox folders without
any problem, because I know what I am doing. Sometimes, for the
current year's folder, I have to delete the .msf files to rebuild
corrupted indexes; this happens only about once every couple of months.
This is OK for me because I know what I am doing. But the average user
doesn't know anything about TB internals. So if the TB team wants to
spread TB use, they have to deal with such issues.
Kind Regards.

peris...@gmail.com

Jun 27, 2008, 6:34:45 PM
On 27 Jun, 17:59, Joshua Cranmer <Pidgeo...@verizon.net> wrote:

> perista...@gmail.com wrote:
> > I wish to ask to TB 3 developers if they are planning or not to change
> > the TB3 mail format. I have to say that current MBOX format is a pain
> > when you have some corporate folders with GByte of emails,
>
> Well, it's better than Outlook's "your entire mail in but a single file"
> (we do it on a per-folder basis).

In my corporation my colleagues do use Outlook 2007, I use TB :)

>
> At this point, the change to a different file format is sufficiently
> complex that I doubt it could be finished before the release of TB 3.
> There are a few bugs on the topic (most notable one on making it
> pluggable), most notably
> <https://bugzilla.mozilla.org/show_bug.cgi?id=402392>.
>
> > I think
>
> > that a simpler format that use normal operating system directories
> > with emails stored as single files could be far better, it can avoid
> > the need to compact folders, it can foster the work of tools like
> > google desktop search, spotlight, windows desktop search etc etc.
>
> First off, we seem to be handling the latter two sufficiently well
> without moving off of mbox. Secondly, I have misgivings about the
> maildir format, as I explain here:
> <http://quetzalcoatal.blogspot.com/2008/03/mail-storage.html> (not that
> that is a complete listing of all the problems I have with the format).

The only real problem, IMHO, is the number of .eml files per directory:
some filesystems don't scale very well with thousands of files in one
directory. Perhaps one can address this problem by keeping mail in
separate files and the index in SQLite.
About the metadata, I think that the size overhead is acceptable: it
is not 1 KB per file, it is a few bytes. In modern systems the waste of
space is located elsewhere :)

>
> > Finally it gives more control to the user about its own data. Same
> > thing for contacts, they can be stored as a bunch of vcard files, it
> > can benefit, among other things, the exchange of address books with
> > cellphones via bluetooth, without rely on non-standard tools like
> > Nokia PC Suite or SyncML crap.
>
> Many of my problems with maildir would extend to this system as well,
> most notably my opinion that such an implementation would suffer from
> performance problems.

I think not: if you index the files with an SQLite db you should be OK
(admittedly, SQLite also needs vacuum maintenance, but it is
better to defragment a db of mail indexes than the whole mail archive
itself).
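
As a minimal sketch of what such an index could look like, the following uses Python's standard `sqlite3` and `email` modules. The table layout and the names `build_index` and `find_by_subject` are assumptions for illustration; nothing here reflects an actual Thunderbird design.

```python
import sqlite3
from email import policy
from email.parser import BytesHeaderParser
from pathlib import Path


def build_index(mail_dir, db_path):
    """Index the headers of every .eml file in mail_dir into SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " path TEXT PRIMARY KEY, subject TEXT, sender TEXT, date TEXT)"
    )
    parser = BytesHeaderParser(policy=policy.default)
    with conn:  # commit the whole scan as one transaction
        for path in sorted(Path(mail_dir).glob("*.eml")):
            with open(path, "rb") as fh:
                headers = parser.parse(fh)  # reads the headers only
            conn.execute(
                "INSERT OR REPLACE INTO messages VALUES (?, ?, ?, ?)",
                (
                    str(path),
                    str(headers.get("Subject", "")),
                    str(headers.get("From", "")),
                    str(headers.get("Date", "")),
                ),
            )
    return conn


def find_by_subject(conn, text):
    """Return the paths of messages whose subject contains text."""
    rows = conn.execute(
        "SELECT path FROM messages WHERE subject LIKE ? ORDER BY path",
        (f"%{text}%",),
    )
    return [row[0] for row in rows]
```

Listing or searching a folder then touches only the database, and the message files themselves are opened only when a message is actually displayed.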

>
> > In this way the user would be allowed
>
> > to manage his own data as simple standard files, like .eml and .vcf,
> > that are understood from every mail application on earth, giving users
> > a great freedom.
>
> Mbox is one of the most widely-supported formats. Just not by a company
> which thrives on proprietary formats.

Storing each mail as plain Unicode text is the most compatible
format you can use. In this case (Windows Mail) Microsoft, unusually,
did a good job by storing mails as single .eml files and contacts
as .vcf files.

>
> > Finally, it would be fine to add contacts synchronization
>
> > functionality and support for TB and SB to Mozilla Weave, just to have
> > something similar to Plaxo, Apple MobileMe, Funambol, and so on. What
> > do you think about?
>
> There is currently an effort going on to synchronize TB's address book
> with Google Contacts, and it also supports using Outlooks or OS X's
> address book (the latter now by default). It is also possible to
> synchronize with palms by an extension.

That is a good thing, but I would prefer to leverage Mozilla's Weave
project, to have all-round management of personal data.

Leni

Jun 27, 2008, 7:45:25 PM
to peris...@gmail.com, dev-apps-t...@lists.mozilla.org
peris...@gmail.com wrote:
> Finally, it would be fine to add contacts synchronization
> functionality and support for TB and SB to Mozilla Weave, just to have
> something similar to Plaxo, Apple MobileMe, Funambol, and so on. What
> do you think about?

The Zindus extension syncs Google Contacts and Zimbra contacts with
Thunderbird: http://www.zindus.com/

The release after next will have a "sync with multiple servers" feature.

Leni.

Robert Kaiser

Jun 28, 2008, 4:14:10 PM
peris...@gmail.com wrote:
> I have to say that current MBOX format is a pain
> when you have some corporate folders with GByte of emails, I think
> that a simpler format that use normal operating system directories
> with emails stored as single files could be far better

Hehe, that's fun. Some people want us to change to storing all the
messages in one database file as they claim the performance is better
(e.g. opening fewer files is always good for perf), and some want us to
split everything up even further and do more files (maildir format).

Maybe our "something in between" solution isn't that bad after all. :)

Robert Kaiser

Nelson Bolyard

Jun 28, 2008, 5:15:33 PM
Chris Barnes wrote, On 2008-06-27 10:01:
> peris...@gmail.com wrote:
>> Hi all,
>> first of all, sorry for my bad English, since I am Italian and don't
>> know English at all (at least some words).
>> I wish to ask to TB 3 developers if they are planning or not to change
>> the TB3 mail format. I have to say that current MBOX format is a pain
>> when you have some corporate folders with GByte of emails, I think
>> that a simpler format that use normal operating system directories
>> with emails stored as single files could be far better,
>
> I disagree with this idea - STRONGLY.
> Each folder *should* be it's on single file (and mbox format is just
> fine).

More than "just fine", it is widely understood, a "lingua franca" of the
email world. It means that numerous tools other than TB are able to handle
TB's mail folders quite well. That would surely not be true if mail
folders became sqlite files or something else (RDF, god help us!).

> The problem you describe is a USER PROBLEM - it is insane and massively
> disorganized for a person to have GBs of data in a single mailbox
> folder. If a person wants to do stupid things, then they deserve the
> consequences of their actions.

Hear! Hear!

peris...@gmail.com

Jun 30, 2008, 4:07:31 AM
On 28 Jun, 22:14, Robert Kaiser <ka...@kairo.at> wrote:

The work done on TB is fine, but, still, mbox does not scale well.
This is not an opinion; this is my direct experience.

ovidiu

Jun 30, 2008, 5:24:27 AM
peris...@gmail.com wrote:
> On 27 Giu, 17:59, Joshua Cranmer <Pidgeo...@verizon.net> wrote:
>
>> perista...@gmail.com wrote:
>>
...

>> > I think
>>
>>
>>> that a simpler format that use normal operating system directories
>>> with emails stored as single files could be far better, it can avoid
>>> the need to compact folders, it can foster the work of tools like
>>> google desktop search, spotlight, windows desktop search etc etc.
>>>
>> First off, we seem to be handling the latter two sufficiently well
>> without moving off of mbox. Secondly, I have misgivings about the
>> maildir format, as I explain here:
>> <http://quetzalcoatal.blogspot.com/2008/03/mail-storage.html> (not that
>> that is a complete listing of all the problems I have with the format).
>>
>
> The only real problem, IMHO is the number of .eml files per directory,
> some filesystems don't scale very well with thousands of files per one
> directory, perhaps one can address this problem by keeping mail in
> separate files and index files in SQLite
>
>
Here are a few others that derive from this split:
-you split mail management between OS tools and TB tools. Basically,
two ways of managing the same data, two sets of similar tools (searches,
tags, indexes) yet kept separate, so user experience issues may arise.
Eventually, why use TB, or when to use the OS? (Was I tagging that in
the OS or in TB? Where is my search?)
-TB may eventually have to understand and comply with the OS's way of
indexing, or with whatever the OS may add to the .eml file
-in this case (split) the mail program will probably be just a viewer of
.eml messages, or will have to try to share more than just the files
with the OS
-it will open another Pandora's box (or mbox :) ) considering changes in
OSes and their influence on these structures
-it could raise a whole new bunch of issues when different mail apps and
the file explorer access the same files (TB and WinMail, for example)
and the structure of the data changes (reindexing, etc.)
-nevertheless, being cross-platform is one thing that makes it
complicated enough

>>> Finally it gives more control to the user about its own data. Same
>>> thing for contacts, they can be stored as a bunch of vcard files, it
>>> can benefit, among other things, the exchange of address books with
>>> cellphones via bluetooth, without rely on non-standard tools like
>>> Nokia PC Suite or SyncML crap.
>>>
>> Many of my problems with maildir would extend to this system as well,
>> most notably my opinion that such an implementation would suffer from
>> performance problems.
>>
>
> I think not, if you index the files with a sqlite db you should be ok
> (but noticeably, sqlite needs also a vacuum maintenance, but it is
> better to defrag a db of mail indexes instead of the all mail archive
> itself).
>
>
>> > In this way the user would be allowed
>>
>>
>>> to manage his own data as simple standard files, like .eml and .vcf,
>>> that are understood from every mail application on earth, giving users
>>> a great freedom.
>>>
>> Mbox is one of the most widely-supported formats. Just not by a company
>> which thrives on proprietary formats.
>>
>
> Storing single mails as plain unicode text, it is the most compatible
> format you can use. In this case (Windows Mail) Microsoft did
> unusually a good job storing mails as single .eml files and contacts
> as .vcf files.
>
That is, of course, the trigger of this discussion, and it is something
that has drawn my attention too. But I can speculate that MS had several
reasons that may differ from the Mozilla case:
-Windows Mail is designed to integrate better with the OS's features
-there is no point in loading specific features (searches, tags) into
the mail app when Vista already presents them
-it may just be the simplest structural answer to the above
-after all, OSes are finally moving towards a more flexible,
database-like approach for users than the rigid, hands-on basic folder
style, and all bundled apps are supposed to follow

Well, in TB's case, that would be a huge effort just to make
another viewer/composer for mail, and even performance would
become very much dependent on OS performance. Integration
is great, but it comes with a price and a result that may not make
sense, for TB anyway.

As for the contacts, that is another issue: presenting them to all OS
apps. But maybe I want more than just that vCard/.contact style. [See
the discussions here about the AB hub, etc.] Well, the AB is currently
less than that, but I always like to think of more than imitating or
meeting normal expectations.


PS. I have to admit I was immediately wondering about the same thing, and
it took a while to actually draw the conclusions above; maybe I am not
even now really convinced of them. Maybe this needs some thought in a
future restructuring.
