Using maildir type mailboxes

jennyw

Nov 14, 2001, 7:00:27 PM
Is there a way to get UW IMAP (2001a, I believe) to use maildir type
mailboxes in users' home directories? I've looked through the IMAP book,
but didn't find any references to this.

I'm currently running a mail server on Debian GNU/Linux 2.2 (potato). The
mta is postfix and I'm using procmail as an mda. Currently, all mail goes to
maildir format mailboxes. The IMAP server can read the inboxes just fine,
but when it creates folders of its own, it creates them in (what I think is)
mbox format in the user's home directory. I'd rather have the folders be
created inside the Maildir directory in maildir format (which I currently
have procmail doing).

Any suggestions?

If UW IMAP cannot do maildir, what's the suggested thing to do? For
example, should I try to get everything to be mbox? Or should I leave the
inbox as maildir and just have the folders be mbox? Or are the folders even
mbox? I'm not really sure ...

Thanks!

Jen


Sam

Nov 14, 2001, 7:31:45 PM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <9sv0f9$huf$1...@news.stanford.edu>,
"jennyw" <donots...@dangerousideas.com> writes:

> Is there a way to get UW IMAP (2001a, I believe) to use maildir type

[ ... ]

> I'm currently running a mail server on Debian GNU/Linux 2.2 (potato). The
> mta is postfix and I'm using procmail as an mda. Currently, all mail goes to
> maildir format mailboxes. The IMAP server can read the inboxes just fine,

Right about now, I suspect that someone else is having a major case of
indigestion.

But, nevertheless, moving on...

> If UW IMAP cannot do maildir, what's the suggested thing to do? For

Replace UW-IMAP with Courier-IMAP. UW-IMAP does not support maildirs.
Your UW-IMAP server reads maildirs only because of a separate patch that
adds half-baked maildir support, and you are seeing some of the cracks in
that particular implementation.

Either the postfix or the courier-users mailing list should be able to help
with any issues. There are plenty of folks who are using this combination.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE78wzv3ejdWUS0ltARAj38AJ9tdc7XuWS0UwfuSuoLbnW5+SQ+sACgmaP3
o1zlCDYNWDcW9WcHuzNxT7c=
=dmeH
-----END PGP SIGNATURE-----

Mark Crispin

Nov 14, 2001, 10:53:34 PM
to jennyw
On Wed, 14 Nov 2001, jennyw wrote:
> Currently, all mail goes to
> maildir format mailboxes. The IMAP server can read the inboxes just fine,
> but when it creates folders of its own, it creates them in (what I think is)
> mbox format in the user's home directory. I'd rather have the folders be
> created inside the Maildir directory in maildir format (which I currently
> have procmail doing).

If UW imapd can read your maildir format mailboxes, this means that you
have a version of UW imapd with a third-party driver to implement maildir
support. This driver is not supported by me, so I can't help you on
details on its performance or behavior.

Fortunately, I *can* help you with your question.

The default driver for newly-created mailboxes is defined at build time by
the following line in imap-????/src/osdep/unix/Makefile:
CREATEPROTO=unixproto

This means that, by default, newly-created mailboxes will be created in
the format supported by the "unix" driver, which is the driver which
implements the traditional UNIX format.

The "proto" part of it refers to a prototype stream used to locate the
create factory method (don't worry about these OOP buzzwords if they're
not meaningful to you).

So, what you need to do is rebuild UW imapd, with CREATEPROTO redefined to
the name of your third party maildir driver. If your maildir driver is
called "maildir", then the line should be:
CREATEPROTO=maildirproto
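
As a rough illustration of the edit described above, here is a minimal Python sketch that rewrites the CREATEPROTO assignment in Makefile text. The function name `set_createproto` and the sample text are made up for the example, and `maildirproto` is only right if the third-party driver really is called "maildir".

```python
import re

def set_createproto(makefile_text: str, proto: str) -> str:
    """Return makefile_text with the CREATEPROTO assignment set to proto."""
    pattern = re.compile(r"^(CREATEPROTO\s*=\s*)\S+", re.MULTILINE)
    # A function replacement avoids any backslash-escape surprises in proto.
    new_text, count = pattern.subn(lambda m: m.group(1) + proto, makefile_text)
    if count == 0:
        raise ValueError("no CREATEPROTO line found")
    return new_text

# Hypothetical two-line Makefile fragment, not the real file.
sample = "OTHERVAR=value\nCREATEPROTO=unixproto\n"
print(set_createproto(sample, "maildirproto"))
```

The server still has to be rebuilt afterwards; the sketch only covers the one-line change.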

There is a semi-secret way that you can do this without rebuilding UW
imapd. If you can't rebuild imapd, or don't know how to do it, send me
email and I'll tell you the secret. I normally don't encourage this
particular method except in "last resort" situations; but, since you
already have an environment that's unsupported by me, what's one more bit of
unsupported? :-)

> If UW IMAP cannot do maildir, what's the suggested thing to do?

UW imapd as distributed by UW doesn't, but as I indicated above, you seem
to have a modified UW imapd that adds maildir support.

> For
> example, should I try to get everything to be mbox? Or should I leave the
> inbox as maildir and just have the folders be mbox? Or are the folders even
> mbox?

The answers to these questions depend upon what you want to accomplish.
There are various tradeoffs to the choices.

If you are not committed to maildir, or otherwise have no pressing reasons
to use it, then I can make recommendations about a format switch. If, on
the other hand, you are committed to maildir or otherwise have chosen it
as your preferred format, then it would be a waste of your time and mine
to talk about a format switch, and instead we should talk about what you
need to do to make things work best for you.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.

Mark Crispin

Nov 14, 2001, 11:03:48 PM
On Thu, 15 Nov 2001, Sam wrote:
> Replace UW-IMAP with Courier-IMAP. UW-IMAP does not support maildirs.
> Your UW-IMAP server reads maildirs only because of a separate patch that
> adds half-baked maildir support, and you are seeing some of the cracks in
> that particular implementation.

This is a remarkable series of non sequiturs.

If her UW imapd server reads maildir format mailboxes, then it supports
maildir. What my support policies are with regard to third-party
distributions are really irrelevant.

Similarly, the comment about "cracks in that particular implementation" is
nonsense.

Unlike single-format servers such as Sam's Courier-IMAP, UW imapd supports
potentially an infinite number of formats. It is not distributed with
maildir support, but it is distributed with support for several other
formats. When a new mailbox is created, it has to decide which format to
use in creating the mailbox, and this decision is usually made at build
time.

Her problem report is consistent with a UW imapd which has maildir support
added and the default create format left at the distribution setting of
traditional UNIX mailbox format. If she got her UW imapd from Debian, it
is not at all surprising that they would build it in this way; it is the
most sensible build for most sites.

It is not in any way a "crack" that it is configured that way, or is
unable to read her mind and know that she wants maildir. It has to be
told, and there are ways to tell it.

Now, whether or not she as a maildir user would be better served by
Courier-IMAP is a different matter. I don't have any strong opinions on
that question. However, such a decision should not be made on the
basis of imaginary "cracks" in UW imapd claimed by the author of
Courier-IMAP.

Sam

Nov 15, 2001, 12:05:56 AM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <Pine.NXT.4.41.01111...@tomobiki-cho.cac.washington.edu>,
Mark Crispin <m...@CAC.Washington.EDU> writes:

> On Thu, 15 Nov 2001, Sam wrote:
>> Replace UW-IMAP with Courier-IMAP. UW-IMAP does not support maildirs.
>> Your UW-IMAP server reads maildirs only because of a separate patch that
>> adds half-baked maildir support, and you are seeing some of the cracks in
>> that particular implementation.
>
> This is a remarkable series of non sequiturs.
>
> If her UW imapd server reads maildir format mailboxes, then it supports
> maildir. What my support policies are with regard to third-party
> distributions are really irrelevant.

Really?

You just wrote (and I did check the datestamps, to make sure that I had the
sequence of events correct):

# If UW imapd can read your maildir format mailboxes, this means that you
# have a version of UW imapd with a third-party driver to implement maildir
# support. This driver is not supported by me, so I can't help you on
# details on its performance or behavior.

So they do appear to be quite relevant here.

Furthermore, speaking of non-sequiturs, if you believe that any part of my
initial statement was incorrect, then feel free to point it out. The
distributed UW-IMAP does not, in fact (and unless something happened
recently that I'm not aware of) provide a maildir driver. True or
false? True. Her UW-IMAP install reads maildirs only because of an
external patch. True or false? True. Is that patch half-baked? Well, do
you really want me to trawl Google and dig up your own numerous posts that
state precisely that?

So, in the immortal words of Clara Peller: where's the beef?

> Now, whether or not she as a maildir user would be better served by
> Courier-IMAP is a different matter. I don't have any strong opinions on
> that question. However, such a decision should not be made on the
> basis of imaginary "cracks" in UW imapd claimed by the author of
> Courier-IMAP.

Neither should this kind of a decision be based on FUD regarding the
Courier-IMAP server claimed by the author of the UW-IMAP server.

I agree wholeheartedly.

Now, would you care to retract your absolute, blanket, performance claims?

Hypocrite.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE7800x3ejdWUS0ltARAiKiAJ9NByN5w7kP/mm2qpsvBJzQiryMBACgiaMu
8Ec1ge9c8czAt2LstPmS8xI=
=KSgx
-----END PGP SIGNATURE-----

jennyw

Nov 15, 2001, 10:47:46 AM
Thanks for the info! I've given up my quest for maildir because it also
offers some problems when a large number of messages are involved (disk
space, a lot of slowness when you have 120,000 messages in a single folder
(don't ask how I ended up with this situation) ...)

But I'm wondering if the maildir patch affected other stuff in UW IMAP ... I
noticed that I cannot create sub-folders -- I can only create folders on
the top level. Is this by design? I've tried this with a perl script and
with outlook express. I can, it seems, create a directory and put mailboxes
in it, but that's not quite the same thing as being able to do it through
the IMAP server. Could this be because it's using two different formats? If
this is the normal behavior of UW IMAP, then there's probably no reason to
change the config.

Thanks again!

Jen

"Mark Crispin" <m...@CAC.Washington.EDU> wrote in message
news:Pine.NXT.4.41.01111...@Tomobiki-Cho.CAC.Washington.EDU
...

jennyw

Nov 15, 2001, 10:49:26 AM
Out of curiosity, is it possible to use a database instead of files to store
messages? It seems that using either mbox or maildir has issues with high
numbers of messages (I'm using this particular IMAP server to keep an
archive of the mailing lists I'm on, some of which are very high volume).

Jen

Markus Mann

Nov 15, 2001, 2:24:10 PM
jennyw wrote:
> Out of curiosity, is it possible to use database instead of files to store
> messages? It seems that using either mbox or maildir have issues with high
> numbers of messages (I'm using this particular IMAP server to keep an
> archive of the mailing lists I'm on, some of which are very high volume).

I am using slrn and slrnpull. slrnpull is used to store the news in a
local spooldir and slrn is a very good newsreader (without gui, but who
cares ];-).

You can find it at http://slrn.sourceforge.net/; there should also be a
Windows binary.

Ciao.
--
Markus Mann . .
];-) /V\
ma...@max93.de, Homepage http://www.max93.de /m m\
Win 98 or better was required, so I installed Linux

Mark Crispin

Nov 15, 2001, 3:06:25 PM
to jennyw
On Thu, 15 Nov 2001, jennyw wrote:
> Out of curiosity, is it possible to use database instead of files to store
> messages?

A database for messages has been one of my personal holy grails for some
time. A number of people have talked about a MySQL driver, but I'm not
aware of successes in the freeware world. The closest so far is Cyrus.

The successes that I am aware of have all been with proprietary server
implementations, such as Microsoft Exchange. Exchange is certainly the
most well-known database type server, but there are others.

I suspect that, like the proprietary implementations, a successful
freeware implementation using a database would use one that has been
especially designed with IMAP in mind from the beginning, rather than
trying to do IMAP well with something like MySQL. That's also the barrier
to getting it done; it's one thing to think of "doing IMAP with MySQL",
it's quite another to have to reinvent what MySQL does! :-)

> It seems that using either mbox or maildir have issues with high
> numbers of messages (I'm using this particular IMAP server to keep an
> archive of the mailing lists I'm on, some of which are very high volume).

Yes, this is the attraction behind a database. You get big performance
hits as the number of files in a directory grows.

The mbx format of UW imapd (not to be confused with "mbox" or "traditional
UNIX") uses size counts to link to the next message, so it can locate all
the messages in the mailbox much faster than traditional UNIX format.
There's work in progress on a successor to mbx which uses index records
for the message metadata, so it doesn't have to chase after the messages
at all (that is, after all, a start towards database). Initial testing is
quite promising.
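
To make the size-count idea concrete, here is a toy Python sketch. It does not reproduce the actual mbx on-disk layout; the "length line followed by that many bytes" format is invented for the illustration, as are the function names. It contrasts hopping from size count to size count with a traditional-UNIX-style scan that has to inspect every line for a "From " separator.

```python
import io

def build_sized(messages):
    """Toy format: an ASCII length line, then that many bytes of message."""
    out = io.BytesIO()
    for msg in messages:
        out.write(b"%d\n" % len(msg))
        out.write(msg)
    return out.getvalue()

def index_sized(data):
    """Locate every message by hopping from size count to size count."""
    stream = io.BytesIO(data)
    offsets = []
    while True:
        header = stream.readline()
        if not header:
            break
        size = int(header.decode("ascii"))
        offsets.append((stream.tell(), size))
        stream.seek(size, io.SEEK_CUR)  # skip the body without reading it
    return offsets

def index_from_lines(data):
    """Traditional-UNIX-style scan: every line must be inspected."""
    offsets, pos = [], 0
    for line in io.BytesIO(data):
        if line.startswith(b"From "):
            offsets.append(pos)
        pos += len(line)
    return offsets

msgs = [b"Subject: a\n\nhello\n", b"Subject: b\n\nworld!\n"]
data = build_sized(msgs)
for offset, size in index_sized(data):
    print(offset, size, data[offset:offset + size][:10])
```

The first indexer reads only the short length lines; the second has to read the whole file, which is the difference being described.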

Mark Crispin

Nov 15, 2001, 3:25:53 PM
to jennyw
On Thu, 15 Nov 2001, jennyw wrote:
> Thanks for the info! I've given up my quest for maildir because it also
> offers some problems when a large number of messages are involved (disk
> space, a lot of slowness when you have 120,000 messages in a single folder
> (don't ask how I ended up with this situation) ...)

120,000 messages is a lot no matter how one looks at it. It would be a
hefty bite for traditional UNIX or even mbx format. On the other hand, I
know of people who've done worse than that. One of the things triggering
the mbx-successor work is to handle 5-digit message mailboxes with aplomb,
and handle 6-digit message mailboxes well.

120,000 directory entries are a lot for most filesystems to swallow. Some
can't handle that many.

> But I'm wondering if the maildir patch affected other stuff in UW IMAP ... I
> noticed that I cannot create sub-folders -- I can only create folders on
> the top level.

In both mbx and traditional unix format, a mailbox is a terminal node in
the hierarchy tree. Consequently, if you have a mailbox named "foo", you
can't also have a mailbox named "foo/bar".

However, you can have non-terminal nodes (directories) in the hierarchy
tree, and those nodes can be nested. For example, you can have a
directory named foo/, a mailbox named foo/bar, a directory named foo/zap,
a mailbox named foo/zap/zowie, and a mailbox named foo/zap/rap.
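
The rule in the two paragraphs above can be sketched in a few lines of Python. This is only an illustration: the function name `dual_use_conflicts` is hypothetical, and "/" is assumed as the hierarchy delimiter because that is what the examples use.

```python
def dual_use_conflicts(mailboxes):
    """Return (mailbox, child) pairs that would require a dual-use mailbox."""
    names = set(mailboxes)
    conflicts = []
    for name in sorted(names):
        parts = name.split("/")
        # Every proper ancestor must be a directory, so it may not also
        # be a mailbox when mailboxes are terminal nodes.
        for depth in range(1, len(parts)):
            ancestor = "/".join(parts[:depth])
            if ancestor in names:
                conflicts.append((ancestor, name))
    return conflicts

# foo/ and foo/zap/ are directories only, so nothing conflicts.
print(dual_use_conflicts(["foo/bar", "foo/zap/zowie", "foo/zap/rap"]))  # []
# "foo" is a mailbox and would also have to be a directory.
print(dual_use_conflicts(["foo", "foo/bar"]))  # [('foo', 'foo/bar')]
```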

The attraction of formats like maildir (leaving aside matters of religion)
is that these formats support so-called "dual-use" mailboxes; that is,
mailboxes which can also be directories. These people believe that the
messages are the terminal node, and don't understand why anyone would
think otherwise.

Partisans of the other way of thinking (that mailboxes should be terminal
nodes) argue that messages are not name-addressable in the hierarchy, and
therefore are not hierarchy nodes; they are just data within a node. They
too don't understand why anyone would think otherwise.

The other question comes up with "what does it mean to open a name?" If
you are of the "directory or mailbox but not both" religion, then "open"
means "open directory" in one case and "open mailbox" in the other. If
you are of the "dual-use directory and mailbox" religion, then you can't
have a single "open" in your interface; you must have separate "open as
directory" and "open as mailbox" operators.

UW imapd doesn't take a position either way. The "driver" (code that
implements the format) simply tells it what religion that particular
format has, and UW imapd does the appropriate IMAP babble to export either
way.

Because UW imapd's default format is traditional UNIX, some individuals
have alleged that "UW imapd doesn't support dual-use mailboxes". Such
statements are completely false; it supports dual-use mailboxes if the
underlying format does.

Now, with all that in mind, I personally don't like the concept of
dual-use mailboxes because of the two open operator problem. Instead of
just clicking on a name, I have to click one widget to see what names may
be under it, and a different widget to see what messages may be in it. So
that's two places that have to be clicked, and two places to look, for
something that may be in a name.

jennyw

Nov 15, 2001, 4:47:55 PM
Actually, I think of folders as a visual representation, not a data thing.
The underlying data need not be hierarchically organized. In fact, I'd prefer
if it wasn't. If we had a relational database, for example, we could have a
categories table. These could be represented as folders, but they'd point to
the same message. This could have potential advantages for folder
synchronization (you could do it off of message ids instead of traversing
folders, and picking up attribute information like category) or moving
between folders (the message doesn't move since category information is
stored elsewhere).
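
A minimal sketch of that idea, using Python's built-in sqlite3 module purely for illustration. The table and column names are invented, and no existing mail server is claimed to work this way. One message is linked to two categories, and a "move" only rewrites the link row.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, message_id TEXT UNIQUE, body TEXT);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE message_categories (
        msg INTEGER REFERENCES messages(id),
        cat INTEGER REFERENCES categories(id),
        PRIMARY KEY (msg, cat)
    );
""")
db.execute("INSERT INTO messages VALUES (1, '<abc@example.org>', 'hello')")
db.executemany("INSERT INTO categories VALUES (?, ?)",
               [(1, "lists"), (2, "imap"), (3, "archive")])
# The same message appears in two "folders" without being copied.
db.executemany("INSERT INTO message_categories VALUES (?, ?)", [(1, 1), (1, 2)])

def folder(name):
    """List the Message-IDs shown in the folder-like view for one category."""
    rows = db.execute(
        "SELECT m.message_id FROM messages m "
        "JOIN message_categories mc ON mc.msg = m.id "
        "JOIN categories c ON c.id = mc.cat WHERE c.name = ?", (name,))
    return [r[0] for r in rows]

def move(msg, old, new):
    """'Moving' rewrites one link row; the message itself is untouched."""
    db.execute(
        "UPDATE message_categories "
        "SET cat = (SELECT id FROM categories WHERE name = ?) "
        "WHERE msg = ? AND cat = (SELECT id FROM categories WHERE name = ?)",
        (new, msg, old))

print(folder("lists"), folder("imap"))
move(1, "lists", "archive")
print(folder("lists"), folder("archive"))
```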

Of course, I'm sure none of this is new ... Just my 2 cents. Your point is
well taken about MySQL or another existing database probably not being right
for a mail server ... we'll just have to invent a messagedb one day ...

Jen

P.S. Thanks for all the info on UW IMAP. I'm sticking with this for now
since it's working, which is definitely a major point in its favor.

Jeremy Howard

Nov 16, 2001, 4:00:10 AM
> A database for messages has been one of my personal holy grails for some
> time. A number of people have talked about a MySQL driver, but I'm not
> aware of successes in the freeware world. The closest so far is Cyrus.
>
> The successes that I am aware of have all been with proprietary server
> implementations, such as Microsoft Exchange. Exchange is certainly the
> most well-known database type server, but there are others.
>
Interestingly, it looks like the next major version of MS Exchange will use
the forthcoming MS SQL Server ('Yukon', IIRC) as its storage engine rather
than its existing specialised DB.

BTW, we wrote the last version of FastMail.FM using a SQL DB rather than a
traditional IMAP store. It worked pretty well--folders were not really a
hierarchy, but just used a naming convention to show how they related to
each other. Messages were stored in the DB if they were <64Kb, and stored on
the FS with a link from the DB if they were >64Kb. This was back when
FastMail.FM was only web accessible, not IMAP accessible.
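
A hedged sketch of that size-threshold scheme, in Python with sqlite3. This is not FastMail.FM's code: the schema, the names `make_store`, `put` and `get`, and reading "64Kb" as 64 KiB are all assumptions, and a message of exactly the threshold size goes to the filesystem here only because the description leaves that case open.

```python
import os
import sqlite3
import tempfile

THRESHOLD = 64 * 1024  # "64Kb" read here as 64 KiB; an assumption

def make_store(spool_dir):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE msgs (id INTEGER PRIMARY KEY, body BLOB, path TEXT)")

    def put(raw: bytes) -> int:
        if len(raw) < THRESHOLD:
            # Small message: keep the bytes in the database row itself.
            return db.execute("INSERT INTO msgs (body) VALUES (?)", (raw,)).lastrowid
        # Large message: write a file and keep only a link in the database.
        msg_id = db.execute("INSERT INTO msgs (path) VALUES ('')").lastrowid
        path = os.path.join(spool_dir, f"{msg_id}.msg")
        with open(path, "wb") as fh:
            fh.write(raw)
        db.execute("UPDATE msgs SET path = ? WHERE id = ?", (path, msg_id))
        return msg_id

    def get(msg_id: int) -> bytes:
        body, path = db.execute(
            "SELECT body, path FROM msgs WHERE id = ?", (msg_id,)).fetchone()
        if body is not None:
            return body
        with open(path, "rb") as fh:
            return fh.read()

    return put, get

with tempfile.TemporaryDirectory() as spool:
    put, get = make_store(spool)
    small = put(b"x" * 100)
    large = put(b"y" * (THRESHOLD + 1))
    print(len(get(small)), len(get(large)), os.listdir(spool))
```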

When we decided to support IMAP, we had to either rewrite our web front-end
to talk IMAP rather than SQL, or write a SQL backend to some IMAP server, or
write our own IMAP server with a SQL backend. In the end we went with the
solution about 2 years ago of rewriting the web front-end to talk IMAP (to a
Cyrus server in our case, but there's nothing Cyrus specific in the code).
But we often find ourselves still wishing that we could use SQL to search
for messages, do complex updates, add rich meta-data about folders, and so
forth. We hack around this by, for instance, synchronizing a table of user
folders (with extra meta-data) in our SQL database with the IMAP server
every time someone logs in, and by using IMAP commands and procedural code
to do searches and updates, but a lot of things would certainly be faster
and easier in SQL.

I only mention this because I've seen the SQL backend idea ridiculed on this
NG before, particularly 3 years ago when we were considering doing it (this
is what put us off at the time). With the benefit of more experience in my
opinion there's a lot to be gained in going down this route, particularly in
linking up with an established IMAP server like UW or Cyrus and just
replacing the back-end, rather than writing an IMAP server from scratch.

Ian G Batten

Nov 16, 2001, 5:04:10 AM
In article <Pine.NXT.4.41.01111...@Tomobiki-Cho.CAC.Washington.EDU>,

Mark Crispin <m...@CAC.Washington.EDU> wrote:
> A database for messages has been one of my personal holy grails for some
> time. A number of people have talked about a MySQL driver, but I'm not
> aware of successes in the freeware world. The closest so far is Cyrus.

Cyrus works very well, but has interesting performance characteristics
with very large (in terms of message count, not in terms of aggregate
message size, for large meaning some thousands of messages) mailboxes.
It's certainly noticeable that using a client which performs no
persistent client-side caching, like mutt, immediately starting a second
session up from a second client machine pointed at the same mailbox is
_far_ faster than the initial connection: I presume this is down to
inode caching in the filesystem on the server.

I think the Cyrus scaling is fantastic: we're still running 1.6.22, and
an Ultra2 with 2x167MHz processors, 768M of RAM and eight 36G spindles
in a RAID 0+1 configuration serves six hundred simultaneous IMAP users,
some POP3 users, handles all inbound and outbound mail and virus scans
all internal and external mail, with the load average rarely rising above
1. I can't help thinking that a good hashed directory structure would
help a great deal, and reiserfs on Linux would be very tempting.
Alternatively, Auspex and NetApp have hashed directory structures, but
I've heard worries about Cyrus on NFS. I don't know if these apply to
single servers, however, or to the case of hoping to have multiple Cyrus
server machines sharing a common spool.

ian
--
PGP: http://www.batten.eu.org/~igb/pgpsignatures/20011116/100405.19611.asc

Mark Crispin

Nov 16, 2001, 2:06:03 PM
On 16 Nov 2001, Ian G Batten wrote:
> Cyrus works very well, but has interesting performance characteristics
> with very large (in terms of message count, not in terms of aggregate
> message size, for large meaning some thousands of message) mailboxes.

This is a general issue with the one-message/one-file type formats, which
is why I don't think this is the way to go. I think that Cyrus probably
does the best job that is possible with that class of format (and it does
so quite well).

> I've heard worries about Cyrus on NFS.

Based on what I know about Cyrus, I doubt that it would work on NFS at
all, or at least not reliably. The same is true for the advanced formats
in UW imapd.

Sam

Nov 16, 2001, 9:41:10 PM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <9t0nvo$o16$1...@news.stanford.edu>,
"jennyw" <donots...@dangerousideas.com> writes:

> Thanks for the info! I've given up my quest for maildir because it also
> offers some problems when a large number of messages are involved (disk
> space, a lot of slowness when you have 120,000 messages in a single folder
> (don't ask how I ended up with this situation) ...)

120K messages is a bit hard to swallow no matter how you go about it. You
need a right tool for the right job. IMAP is meant to be what it's called
- a mechanism to access your mailbox on a remote server. And, people do
not typically have mailboxes with 120,000 messages. There is a big
difference between a mailbox, and a mail archive.

The right solution for your situation would probably be a local search
engine that indexes that archive, and can handle searches and retrieval.

Finding an acceptable solution for this situation using IMAP is not going
to be easy. IMAP is simply not designed to be used with such a large pile
of mail. When an IMAP client opens this folder, any IMAP client, you're
looking at the client having to download a minimum of a megabyte of data
just to be able to resynchronize with the server, EVEN if the client caches
message metadata.
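
A back-of-the-envelope Python sketch of that estimate, for a client that does resynchronize flags for every message. The assumptions are mine: the client asks for UID and FLAGS of every message, each message carries only \Seen, and UIDs happen to equal sequence numbers. Real traffic depends on the client and the flags in use.

```python
def flags_resync_bytes(message_count: int) -> int:
    """Total bytes of one untagged FETCH response line per message."""
    total = 0
    for n in range(1, message_count + 1):
        # Shape of an IMAP untagged FETCH response carrying UID and FLAGS.
        line = f"* {n} FETCH (UID {n} FLAGS (\\Seen))\r\n"
        total += len(line.encode("ascii"))
    return total

size = flags_resync_bytes(120_000)
print(f"{size} bytes, about {size / 1_000_000:.1f} MB")
```

Under these assumptions the total comes out above one megabyte, since even the shortest line is 33 bytes.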

Forget IMAP. You need a search engine.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE79c4+3ejdWUS0ltARAnRBAJ97hM+aeAHhpo0Phk554pKgPbar0ACfXTas
93Lel0bF1fMTazeB70DnYUE=
=PoI5
-----END PGP SIGNATURE-----

Mark Crispin

Nov 17, 2001, 12:19:24 AM
On Sat, 17 Nov 2001, Sam wrote:
> IMAP is simply not designed to be used with such a large pile
> of mail.

That may well be the case with maildir-based IMAP mailboxes.

However, the ability to handle mailboxes with 6-digit message counts is
definitely a design goal of IMAP.

The impetus for the mbx format successor under development is to handle
such mailboxes well, based on the observation that most of the real time
that is needed to open an mbx format file is in seek latency. The actual
amount of I/O needed is quite small; only about 7MB for 120,000 messages.
It's the 120,000 seeks that take all the time.

> When an IMAP client opens this folder, any IMAP client, you're
> looking at the client having to download a minimum of a megabyte of data
> just to be able to resynchronize with the server, EVEN if the client caches
> message metadata.

That only applies to offline and disconnected clients which wish to
maintain synchronization of metadata with the server. Online clients do
not do this, and therefore have no need to obtain the metadata of every
message. A well-designed online client can open a 120,000 message mailbox
as fast as it opens a 120 message mailbox.

> Forget IMAP. You need a search engine.

Or, perhaps, forget maildir with IMAP. The non-maildir IMAP world will
soon handle 120,000 message mailboxes with aplomb.

Vlad Lungu

Nov 17, 2001, 6:42:38 AM
Mark Crispin wrote:

> On Thu, 15 Nov 2001, jennyw wrote:
>
>>Out of curiosity, is it possible to use database instead of files to store
>>messages?
>>
>
> A database for messages has been one of my personal holy grails for some
> time. A number of people have talked about a MySQL driver, but I'm not
> aware of successes in the freeware world. The closest so far is Cyrus.


I am tinkering with an ODBC driver from time to time, and I wouldn't call it a success.

Yet. Performance-wise, it sucks, especially considering that the ODBC
implementation is far from complete, and the Postgres driver I'm using
is in an early-beta stage, IMHO, but hey, it works.
You may wonder why I chose ODBC and Postgres instead of using direct
access to MySQL or other fast, minimal SQL engine. First, with ODBC you
can change the driver and use Oracle or DB2 instead, without
recompilation. Second, ODBC has X/Open CLI as API, and there are
databases that already use CLI as their API (I believe DB2 is one of
them). With minimal modifications (connection setup/teardown, some
modifications in the Makefile), you could port the driver to a new RDBMS
in, let's say, 24 hours, and eliminate the ODBC layer, for a few
percent of performance. And the extra features provided by a real
database are really useful; disabling autocommit on particular code
paths gave me a huge performance boost, and you can't do that with MySQL.
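
The autocommit point can be illustrated with a small Python sketch. It uses the built-in sqlite3 module rather than ODBC or Postgres, so it is only an analogy for the driver described here; the table layout and the function name `insert_rows` are invented. It inserts the same rows once with a commit per statement and once as a single transaction.

```python
import os
import sqlite3
import tempfile
import time

def insert_rows(path, rows, commit_each):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS msgs (uid INTEGER, flags TEXT)")
    db.commit()
    start = time.perf_counter()
    for row in rows:
        db.execute("INSERT INTO msgs VALUES (?, ?)", row)
        if commit_each:
            db.commit()  # autocommit-like: one transaction per statement
    db.commit()          # batched case: one transaction for the whole loop
    elapsed = time.perf_counter() - start
    count = db.execute("SELECT COUNT(*) FROM msgs").fetchone()[0]
    db.close()
    return count, elapsed

rows = [(uid, "\\Seen") for uid in range(1, 501)]
with tempfile.TemporaryDirectory() as tmp:
    for commit_each in (True, False):
        path = os.path.join(tmp, f"{commit_each}.db")
        count, elapsed = insert_rows(path, rows, commit_each)
        print(f"commit_each={commit_each}: {count} rows in {elapsed:.3f}s")
```

Timings vary by disk and platform, so no figures are given; the per-statement commit is typically the slower of the two on a file-backed database.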


> The successes that I am aware of have all been with proprietary server
> implementations, such as Microsoft Exchange. Exchange is certainly the
> most well-known database type server, but there are others.
>
> I suspect that, like the proprietary implementations, a successful
> freeware implementation using a database would use one that has been
> especially designed with IMAP in mind from the beginning, rather than
> trying to do IMAP well with something like MySQL. That's also the barrier
> to getting it done; it's one thing to think of "doing IMAP with MySQL",
> it's quite another to have to reinvent what MySQL does! :-)


IMO, trying to reinvent the wheel wouldn't be quite productive. Don't
think "reinvent what MySQL does", think "reinvent what Oracle/DB2/Sybase
does". MySQL takes a lot of shortcuts by design, and they're now trying
to do the right thing and implement transactions/proper locking etc, since
people started to realise that speed ain't worth shit when your data is
gone. Think MS Access. And suppose you design a database that
functionally does the job. Then you have to do backups. Online backups.
Are you going to reinvent that too? And so on.

>>It seems that using either mbox or maildir have issues with high
>>numbers of messages (I'm using this particular IMAP server to keep an
>>archive of the mailing lists I'm on, some of which are very high volume).
>>
>
> Yes, this is the attraction behind a database. You get big performance
> hits as the number of files in a directory grows.

> The mbx format of UW imapd (not to be confused with "mbox" or "traditional
> UNIX") uses size counts to link to the next message, so it can locate all
> the messages in the mailbox much faster than traditional UNIX format.
> There's work in progress on a successor to mbx which uses index records
> for the message metadata, so it doesn't have to chase after the messages
> at all (that is, after all, a start towards database). Initial testing is
> quite promising.


A database doesn't give you speed. It gives you scalability and granularity.

If you have 100 messages per folder, you can use flat files and be happy.

A RDBMS is giving you just a 5-10x performance hit. If you have 500
messages, it's maybe 2-3x performance hit. If you have 3000 messages per
folder, it's already too much for flat files. And that's just if you
look at one folder. Let's assume you have 100,000's of users. With a SQL
driver, you can have n SQL servers storing the mailboxes and a table for
each user saying "folder x is stored on server y table WERQWERQW" and so
on, and distribute BOTH the I/O load and much of the CPU load, plus you
can do online replication/backup/folder migration; with standard formats
you either use login referrals/DNS hacks or buy a SAN and a big SUN (pun
intended) and hope it will work.
All those things don't come cheap; you need extra space for indexes,
extra CPU power and RAM for the RDBMS, but if you get it right, you
could grow almost linearly, instead of scrapping the whole system and
restarting from scratch if you hit a certain limitation.

And I didn't even start talking about extra features like message
searching, keyword indexing and so on. If you want to do funny things
with your messages, it's much easier to do it by interfacing directly
with the RDBMS than accessing the IMAP folders or (oh, the horror, the
horror!) accessing the folders directly. I wrote the mailer for my
driver in 15 minutes (lacking the APPEND method in the driver), as a
shell script, using procmail (duh) and the sql monitor. It doesn't get
any faster or easier than that, believe me.

If anyone is interested, I'm willing to share the code for the driver
in its present state, even if I'm ashamed of it. Actually, I'm not only
willing, I'm actually very interested, since I'm no SQL guru. And no
IMAP guru either, actually, so I could use some help :-). That's why I
subscribed to the NG in the first place. I was planning to contact Mark
anyway when the code would have reached alpha state (i.e. append/delete
support, minimal error handling), but this moment is as good as any other.


Vlad Lungu

Sam

Nov 17, 2001, 11:10:55 AM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <Pine.NXT.4.41.01111...@tomobiki-cho.cac.washington.edu>,
Mark Crispin <m...@CAC.Washington.EDU> writes:

> On Sat, 17 Nov 2001, Sam wrote:
>> IMAP is simply not designed to be used with such a large pile
>> of mail.
>
> That may well be the case with maildir-based IMAP mailboxes.

Neither with mbox-based mailboxes.

Or any kind of mailboxes as well.

> However, the ability to handle mailboxes with 6-digit message counts is
> definitely a design goal of IMAP.

You can design all you want, but the underlying IMAP protocol design makes
folders with hundreds of thousands of messages completely impractical.
You are certainly welcome to go ahead and try to pull this off. It's a
free world. But nobody in the free world will care. Except maybe five or
six people.

It takes two to tango: a client and a server. Even if the server can
handle the Encyclopedia Britannica, no client - worth mentioning - will be
able to use it. Hence nobody (except those five or six people) will ever
care about it.

> The impetus for the mbx format successor under development is to handle
> such mailboxes well, based on the observation that most of the real time
> that is needed to open an mbx format file is in seek latency. The actual
> amount of I/O needed is quite small; only about 7MB for 120,000 messages.
> It's the 120,000 seeks that take all the time.

And all of that needs to be pushed down to the client, when the mailbox is
opened. No matter how much you try to avoid that, you can't. And this,
right here, will make this kind of a server nothing more than an
educational exercise in theoretical server design. And nothing else.
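
The figures quoted above (about 7MB and 120,000 seeks for 120,000 messages) can be turned into a rough back-of-the-envelope estimate. The sketch below is only arithmetic: the 10 ms per seek and the 56 kbit/s client link are assumptions chosen for illustration, not numbers from this thread, and whether a comparable volume really has to reach the client is exactly what is being disputed.

```python
MESSAGES = 120_000
INDEX_BYTES = 7 * 1024 * 1024  # "about 7MB", taken here as 7 MiB

def bytes_per_message(total_bytes=INDEX_BYTES, messages=MESSAGES):
    """Average amount of index data read per message."""
    return total_bytes / messages

def seek_seconds(messages=MESSAGES, seek_ms=10.0):
    """Time spent purely on seeks if every message costs one seek.

    seek_ms is an assumed average seek latency.
    """
    return messages * seek_ms / 1000.0

def transfer_seconds(total_bytes=INDEX_BYTES, link_bits_per_s=56_000):
    """Time to move the same number of bytes over an assumed client link."""
    return total_bytes * 8 / link_bits_per_s

if __name__ == "__main__":
    print(f"{bytes_per_message():.1f} bytes of index data per message")
    print(f"{seek_seconds() / 60:.0f} minutes of seeking at 10 ms per seek")
    print(f"{transfer_seconds() / 60:.1f} minutes to transfer at 56 kbit/s")
```

With other assumed values (a faster disk, a LAN instead of a modem) the two costs shift a great deal, which is why both sides of the argument can point at the same 7MB.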

You are free to try to design a server that can handle encyclopedia-sized
mailboxes, of course. But no mainstream client, now or ever, will be able
to use it with any degree of efficiency, as long as the protocol is what it
is right now, or until clients are explicitly rewritten to support huge
folders within the existing protocol framework, which completely falls
apart on such a scale for those clients.

>> When an IMAP client opens this folder, any IMAP client, you're
>> looking at the client having to download a minimum of a megabyte of data
>> just to be able to resynchronize with the server, EVEN if the client caches
>> message metadata.
>
> That only applies to offline and disconnected clients which wish to
> maintain synchronization of metadata with the server. Online clients do

Which pretty much includes, I'd estimate, 90-95% of IMAP clients in use.
And if people do not need offline or disconnected mode, they might as well
use POP3. If you have huge piles of mail that you need to sift through, no
matter how you twist yourself it's always better to copy it to a local disk
and handle it there, instead of leaving it on the server, somewhere, and
trying to siphon it with a straw.

> not do this, and therefore have no need to obtain the metadata of every
> message.

Yes they do. An online client is simply not going to send a request to the
server, and wait for a response, each time the user clicks on the folder
index scrollbar.
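
For readers trying to picture the approach being argued over, an "online" client that fetches summary data only for the rows currently on screen would issue something like the command built below each time the view moves. This is a sketch of the idea, not code from any client named in this thread; the function names are made up, and only the FETCH syntax and the FLAGS and ENVELOPE data items come from the IMAP specification.

```python
def visible_window(exists, top_row, rows):
    """Return (first, last) sequence numbers for the rows on screen.

    exists  -- message count reported by the server when the mailbox is selected
    top_row -- 1-based sequence number of the first visible row
    rows    -- number of rows the index pane can display
    """
    if exists <= 0 or rows <= 0:
        return None
    first = max(1, min(top_row, exists))
    last = min(exists, first + rows - 1)
    return first, last

def fetch_command(tag, window):
    """Format an IMAP FETCH requesting summary data for one window."""
    first, last = window
    return f"{tag} FETCH {first}:{last} (FLAGS ENVELOPE)"
```

Whether one round trip per scroll is acceptable depends on network latency and on how much the client caches, which is the point of contention here.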

This is just a pipe dream. Pun intended. Dream on.

> A well-designed online client can open a 120,000 message mailbox
> as fast as it opens a 120 message mailbox.

Only for certain, narrow, definitions of "well-designed".

>> Forget IMAP. You need a search engine.
>
> Or, perhaps, forget maildir with IMAP. The non-maildir IMAP world will
> soon handle 120,000 message mailboxes with aplomb.

Maildir already easily handles 120,000 messages. Not at some time in the
future, as a pie-in-the-sky, but right now.

Ever heard of ReiserFS? Or XFS, perhaps?

You are intentionally wearing blindfolds, and purposefully avoiding facing
the fact that Courier-IMAP is mostly a thin mapping layer between the IMAP
protocol, and the underlying filesystem. That's all that is. If a
particular implementation cannot handle large numbers of messages in a
given folder, it's only because of limitations in the underlying filesystem.
You're still living in the past and thinking that all filesystems are
alike. They are not. Sorry, but they aren't. Some of them are better at
some things, than others. The server code itself, which is really nothing
more than an IMAP parser, has no builtin limitation on the folder size, and
does not care what filesystem lives under it, as long as it carries POSIX
semantics. So, it all comes down to whether a given filesystem can handle
the given folder's data. That's all.

Unfortunately for you, there are new filesystems, still being actively
developed, that can easily, and efficiently, handle directories with
millions of files. Your outlook on the world is exceedingly naive. You,
of course, are welcome to keep fooling yourself, if that makes you happy.

But, in any case, this specific situation is unique. Most people do not
have folders with hundreds of thousands of messages, and expecting a
typical mail client to handle it is rather silly. I believe that something
like this is better handled by other, well established technologies that
have been explicitly designed and tuned for such a task. It's just too bad
that this makes too much sense, for some.

A local, dedicated search engine will do a far better job handling this
kind of a mail archive than any IMAP server, no matter how many
pointy-headed geeks slave away all their life optimizing it. I believe in
using the right tool for the right job. Optimizing Courier-IMAP for this
extreme situation, which is atypical and does not represent typical IMAP
client usage, is only going to degrade performance for the center portion
of the IMAP client Bell curve. That would be a rather dumb thing to do.
I think I'll leave doing the dumb things to other people, who believe they
have a higher calling in life to do those dumb things.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE79owK3ejdWUS0ltARAsHTAJ4iQwk+iwes14XdzyBTLr6dPMZmyQCeMbHZ
hah18cAZ551tl0LYD5FeU2U=
=w/Pa
-----END PGP SIGNATURE-----

Mark Crispin

Nov 17, 2001, 1:46:02 PM
On Sat, 17 Nov 2001, Vlad Lungu wrote:
> I am tinkering with an ODBC driver from time to time, and I wouldn't
> call it a success.
> Yet. Performance-wise, it sucks, especially considering that the ODBC
> implementation is far from complete, and the Postgres driver I'm using
> is in an early-beta stage, IMHO, but hey, it works.

This has been the general problem with using an off-the-shelf database;
it's easy to do, but hard to do with acceptable performance.

> IMO, trying to reinvent the wheel wouldn't be quite productive.

I see it more as an unfortunate necessity than something that any sane
person would want to do.

However, the *right* database is clearly up to the job. It isn't as if
this is uncharted territory. Insurance companies were playing with
terabyte sized databases in the 1960s. What's special about mail is that
people tend to think of access to mail in a strictly hierarchical fashion
down to the lowest level of metadata, without any attempt to optimize
sideways traversals of the tree. But that's what IMAP clients do, a lot
of sideways traversals.

This is the old Computer Science axiom of "any problem is trivial if you
use the right data structures." The task we face is in using something off
the shelf that has the right data structures for this particular problem.

And yes, you're right; reinventing a database from the ground up, with all
the implications (including backup!) that has *is* a daunting task. No
wonder it hasn't been done yet in the freeware world (I'm aware of at
least three different commercial efforts). You just can't help but think
that there must be something off-the-shelf that'll work.

> If anyone is interested, I'm willing to share the code for the driver
> in its present state, even if I'm ashamed of it. Actually, I'm not only
> willing, I'm actually very interested, since I'm no SQL guru. And no
> IMAP guru either, actually, so I could use some help :-).

I hope that someone will take you up on your generous offer.

Mark Crispin

Nov 17, 2001, 2:15:56 PM
On Sat, 17 Nov 2001, Sam wrote:
> It takes two to tango: a client and a server. Even if the server can
> handle the Encyclopedia Britannica, no client - worth mentioning - will be
> able to use it.

I consider an award-winning client with a user community in the millions
to be worth mentioning. I consider another (commercial) client which is
frequently named (here and elsewhere) as the best GUI client to be worth
mentioning.

Consequently I was surprised to learn that to be "worth mentioning", a
client must be unable to handle large mailboxes well.

Well. Count a day wasted when you don't learn something.

Mike Brodbelt

Nov 17, 2001, 8:29:54 PM
On Fri, 16 Nov 2001 09:00:10 +0000, Jeremy Howard wrote:

>> A database for messages has been one of my personal holy grails for some
>> time. A number of people have talked about a MySQL driver, but I'm not aware
>> of successes in the freeware world. The closest so far is Cyrus.
>>
>> The successes that I am aware of have all been with proprietary server
>> implementations, such as Microsoft Exchange. Exchange is certainly the most
>> well-known database type server, but there are others.
>>
> Interestingly, it looks like the next major version of MS Exchange will use
> the forthcoming MS SQL Server ('Yukon', IIRC) as its storage engine rather
> than its existing specialised DB.

<snip>

> I only mention this because I've seen the SQL backend idea ridiculed on this
> NG before, particularly 3 years ago when we were considering doing it (this is
> what put us off at the time). With the benefit of more experience in my
> opinion there's a lot to be gained in going down this route, particularly in
> linking up with an established IMAP server like UW or Cyrus and just replacing
> the back-end, rather than writing an IMAP server from scratch.

Thinking about this, it might be possible to implement it independently of the
IMAP server back end. Cyrus hits a bottleneck with filesystem performance - you
end up with large directories, and this starts to slow things.

On a Linux system (and I assume most other *nix systems have a similar
concept), it would theoretically be possible to implement a database, optimised
for IMAP use, and present the interface to the database through the VFS. You
could simply have an "imapfs" filesystem type. You could even have a userland
interface to the filesystem driver that allowed SQL queries against the store.

Implementing something like this would not be easy, but it would abstract it
from the IMAP server, and if done right, could theoretically provide a
performance boost to any store that used the one file/one message metaphor.

My 2p worth.

Mike.

Sam

Nov 17, 2001, 8:55:15 PM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> On Sat, 17 Nov 2001, Sam wrote:
>> It takes two to tango: a client and a server. Even if the server can
>> handle the Encyclopedia Britannica, no client - worth mentioning - will be
>> able to use it.
>
> I consider an award-winning client with a user community in the millions
> to be worth mentioning.

Microsoft will also tell you that Exchange is "award-winning". What really
matters is what kind of an award it is, and what it's earned for.

> Consequently I was surprised to learn that to be "worth mentioning", a
> client must be unable to handle large mailboxes well.

Well, those are, indeed, the facts of life. The mainstream mail clients,
with a user community numbering hundreds of millions, are not designed to
handle folders with hundreds of thousands of messages, simply because
that's not what they were designed to do.

> Well. Count a day wasted when you don't learn something.

Life is full of surprises. It might be beneficial to step out in the real
world, once in a while, and get a breath of fresh air instead of sitting
inside windowless offices, drawing up grandiose theories on what the life
outside is supposed to be like.

Otherwise, you might be in for quite a surprise.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE79xUB3ejdWUS0ltARAsO0AKCUOz1vVDDNRaYDWhpuWqoSmvQZUQCfY6TY
zl3cG8c2720wFTrob3QpM2c=
=LkxA
-----END PGP SIGNATURE-----

Mike Montour

Nov 24, 2001, 12:29:12 AM
Sam wrote:

> > Or, perhaps, forget maildir with IMAP. The non-maildir IMAP world will
> > soon handle 120,000 message mailboxes with aplomb.
>
> Maildir already easily handles 120,000 messages. Not at some time in the
> future, as a pie-in-the-sky, but right now.
>
> Ever heard of ReiserFS? Or XFS, perhaps?

[...]


> The server code itself, which is really nothing more than an IMAP
> parser, has no builtin limitation on the folder size, and does not
> care what filesystem lives under it, as long as it carries POSIX
> semantics. So, it all comes down to whether a given filesystem can
> handle the given folder's data. That's all.

Can you give me some advice about choosing a filesystem for a server
using Maildir to store a "large" (not 120,000 per user, but at least
that many system-wide) number of messages? Design goals:

#1 - Data integrity (minimum chance of lost mail)
#2 - Availability (good system uptime; fast crash recovery)
#3 - Performance

I've been dabbling with various packages, and right now I am planning
to use Postfix for mail delivery, Maildir format for storage, and
Courier-IMAP for client access. The underlying OS/Filesystem
combinations I am considering are:

{Open,Free}BSD, native filesystem (SoftUpdates?)
Linux(SuSE 7.3), ext2/ext3/reiserfs/xfs

Now, the "long fsck on reboot" problem is a serious negative for the
non-journal filesystems (ext2, BSD?). Also I don't know what sort of
relative performance these would offer - it's my impression that the
newer filesystems like xfs and reiserfs would have more advanced
directory structures and would be faster for Maildir operations.

There appears to be a certain amount of antagonism between ReiserFS
and Postfix (at least based on some of the list archives I've read),
regarding data integrity. Specifically, operations like link() and
unlink() are assumed by the MTA to be synchronous, when in fact (at
the filesystem level) they are not. The risk here is that in a system
crash, a message could disappear or end up in the wrong directory,
even though the mail software got a "success" code. It seems to me
that this issue would also be very important to Courier-IMAP, because
it also has to do a lot of directory manipulation in the Maildir. Can
anyone comment about this?

I know less about ext3 and xfs, but it is my impression that they
are less mature (at least on Linux; I know xfs/Irix has been around
a while). Bugs aside, would one of these two be a preferred filesystem
for this application? And what about the bugs/stability, for the
revision levels in SuSE 7.3 (2.4.x kernel; I can look up the actual
version numbers of everything if it's relevant)?

Any comments would be welcome. (Not trying to start any flamewars,
just looking for experiences and opinions, and to correct any
misunderstandings I might have about the various filesystems).

Sam

Nov 24, 2001, 11:39:27 AM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <3BFF3027...@spamcop.net>,
Mike Montour <mmon...@spamcop.net> writes:

> Can you give me some advice about choosing a filesystem for a server
> using Maildir to store a "large" (not 120,000 per user, but at least
> that many system-wide) number of messages? Design goals:
>
> #1 - Data integrity (minimum chance of lost mail)
> #2 - Availability (good system uptime; fast crash recovery)
> #3 - Performance
>
> I've been dabbling with various packages, and right now I am planning
> to use Postfix for mail delivery, Maildir format for storage, and
> Courier-IMAP for client access. The underlying OS/Filesystem
> combinations I am considering are:
>
> {Open,Free}BSD, native filesystem (SoftUpdates?)
> Linux(SuSE 7.3), ext2/ext3/reiserfs/xfs

The most common way this is done is have all the mailboxes on a separate
NAS box that's NFS-mounted by the mail servers. High-end NAS boxes are
engineered with redundant hot-swappable hardware, that can be replaced
without bringing the system down. Then, you have a pair of cheap boxes up
front that are mirror twins of each other. Normally they can share the
load, and if one fails the other one will have to bear the entire load
until the broken box is fixed.

On average, BSD's NFS implementation is slightly more solid than Linux's,
although, if you do your homework, some kernel revs' NFS implementations
are as good as BSD's. It doesn't really matter what the native filesystem
is, since the mailboxes are going to be mounted over NFS.

> Now, the "long fsck on reboot" problem is a serious negative for the
> non-journal filesystems (ext2, BSD?). Also I don't know what sort of
> relative performance these would offer - it's my impression that the
> newer filesystems like xfs and reiserfs would have more advanced
> directory structures and would be faster for Maildir operations.

That's only true in the most extreme cases of tens of thousands of messages
in the same folder. For normal-sized folders the difference is not that
much. You just have to make sure that the mailboxes themselves are hashed,
and you don't have ten thousand /var/mail/$username maildirs in the same
flat /var/mail hierarchy.
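
One way to picture "hashed" mailboxes is to derive a couple of intermediate directory levels from the username, so that no single directory holds every user's maildir. The sketch below is an assumption about how this could be done, not a description of what Courier or any particular MTA does; the function name, the use of SHA-256, and the two-level layout are all illustrative choices.

```python
import hashlib
import posixpath

def hashed_maildir(username, root="/var/mail", levels=2):
    """Return a maildir path with hash-derived bucket directories.

    Each level uses two hex digits of the username's hash, so every
    bucket directory has at most 256 entries per level.
    """
    digest = hashlib.sha256(username.encode("utf-8")).hexdigest()
    buckets = [digest[2 * i : 2 * i + 2] for i in range(levels)]
    return posixpath.join(root, *buckets, username)

if __name__ == "__main__":
    for user in ("jen", "sam", "mike"):
        print(user, "->", hashed_maildir(user))
```

The same function has to be used by the delivery agent and by the IMAP server's account lookup, since both need to arrive at the same directory for a given user.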

> There appears to be a certain amount of antagonism between ReiserFS
> and Postfix (at least based on some of the list archives I've read),
> regarding data integrity. Specifically, operations like link() and
> unlink() are assumed by the MTA to be synchronous, when in fact (at
> the filesystem level) they are not. The risk here is that in a system
> crash, a message could disappear or end up in the wrong directory,
> even though the mail software got a "success" code. It seems to me
> that this issue would also be very important to Courier-IMAP, because
> it also has to do a lot of directory manipulation in the Maildir. Can
> anyone comment about this?

I've heard this old wives' tale brought up repeatedly in many contexts, and
I always thought it to be a rather dumb argument.

Someone needs to explain to me the logic in wanting a stable box, but
accepting it as a given that the box is going to crash, and worrying about
how well it will recover from the crash. Seems to me it's better to make
sure that the box isn't going to crash in the first place. Then you don't
even have to worry about this situation. Seems to me that there are two
general reasons for a typical crash - a bug in the kernel, or a hardware
failure. Right now, vendor-QAed Linux kernels, and stable BSD kernels will
run for years until it's time to bring them down for maintenance. So I
don't think there's much of a chance there. And if the disk fails, it
doesn't really matter if everything is synced, or not. The data is gone.
Instead of worrying about whether everything is synced properly, or not,
use a RAID mirror for local mail server's disks, and that's going to be the
end of it.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE7/8073ejdWUS0ltARAv4SAJ4pjrFANyPSdEH5HL17A1Q3OJG7FQCbB+XB
NzPW4T6BZeLxyblkVU1Ji38=
=nBWU
-----END PGP SIGNATURE-----

Mike Montour

Nov 24, 2001, 2:23:54 PM
Sam wrote:

Thank you for your response.


> The most common way this is done is have all the mailboxes on a separate
> NAS box that's NFS-mounted by the mail servers. High-end NAS boxes are
> engineered with redundant hot-swappable hardware, that can be replaced
> without bringing the system down.

I'll consider this for future expansion. I already have hot-swap drives
and hardware RAID on the server, and will be using that initially.

Would you see any advantage in setting up that server as an NFS server
only, then using a couple of smaller boxes to host the applications?
I.e. use one server as my own "NAS box"? Or is the only benefit to
using a NAS box that the box itself has higher-reliability hardware?

> > There appears to be a certain amount of antagonism between ReiserFS
> > and Postfix (at least based on some of the list archives I've read),
> > regarding data integrity. [...]
>
> I've heard this old wives' tale brought up repeatedly in many contexts,
> and I always thought it to be a rather dumb argument.
>
> Someone needs to explain to me the logic in wanting a stable box, but
> accepting it as a given that the box is going to crash, and worrying
> about how well it will recover from the crash. Seems to me it's better
> to make sure that the box isn't going to crash in the first place.

Some combination of Murphy's Law, and Douglas Adams' observation that:

The major difference between a thing that might go wrong and a
thing that cannot possibly go wrong is that when a thing that
cannot possibly go wrong goes wrong it usually turns out to be
impossible to get at or repair. [from 'Mostly Harmless']

I know that no system is going to be 100% reliable (mirrored drives
don't do much good if your RAID controller goes loopy and scribbles
junk onto both of them), but I still want to do what I can to reduce
the chance of losing mail. Not to the point of "space shuttle" redundancy
with several computers each performing every calculation and "voting" on
the results, but more than just "the brochure says it's
high-availability so I'm fine".

I can see the point that the Postfix people raise, that they want to be
sure that a message has been fully committed to physical disks (as
opposed to buffer RAM) before they return an SMTP "OK" code to the
upstream client. It sounded like every filesystem had slightly different
semantics about how (or even if) this could be done, and that POSIX was
ambiguous on this matter.
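
As a rough illustration of the commit-before-acknowledge concern (this is not Postfix's, qmail's, or Courier-IMAP's actual code), a Maildir delivery that explicitly forces both the file contents and the directory entries to disk might look like the sketch below. The function name deliver and the filename scheme are made up for the example. It assumes that fsync() on a directory descriptor behaves as it does on Linux; whether the data has truly reached the physical disk still depends on the filesystem and on the drive's write cache.

```python
import os
import socket
import time

def deliver(maildir, message_bytes):
    """Write a message into maildir/new and return its final path.

    The message is written under tmp/, flushed with fsync(), renamed
    into new/, and then both directories are fsync()ed so the rename
    itself is recorded before the caller acknowledges delivery.
    """
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    # Illustrative unique name: time, pid, a monotonic counter, hostname.
    unique = (f"{int(time.time())}.{os.getpid()}_"
              f"{time.monotonic_ns()}.{socket.gethostname()}")
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(message_bytes)
        while view:                      # handle short writes
            view = view[os.write(fd, view):]
        os.fsync(fd)                     # flush file data and metadata
    finally:
        os.close(fd)

    os.rename(tmp_path, new_path)        # atomic within one filesystem

    for sub in ("new", "tmp"):           # flush the directory updates
        dir_fd = os.open(os.path.join(maildir, sub), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    return new_path
```

Only after deliver() returns would an SMTP server in this model send its "OK". The extra fsync() calls are also where the performance cost discussed in this thread comes from.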

The ReiserFS site includes a patch for qmail to address this concern,
but I saw no corresponding code for Postfix or Courier-IMAP (which are
presumably affected by the same issue, as they all use Maildir storage).
That's why I was wondering if xfs or ext3 might be a better choice for
Maildir storage.

Joshua Slive

Nov 24, 2001, 3:10:11 PM
Sam <s...@email-scan.com> wrote:
> I've heard this old wives' tale brought up repeatedly in many contexts, and
> I always thought it to be a rather dumb argument.

> Someone needs to explain to me the logic in wanting a stable box, but
> accepting it as a given that the box is going to crash, and worrying about
> how well it will recover from the crash. Seems to me it's better to make
> sure that the box isn't going to crash in the first place. Then you don't
> even have to worry about this situation.

I can't claim to be as much of an expert as you in running mail
servers, but this seems way off base to me. Computers crash. There's
no way you are going to stop that. You can minimize the chances (for
example, by running a mainframe rather than a PC), but you will never
eliminate crashes entirely.

So what you are saying then is that you don't mind losing the
occasional mail message. That may be a fine thing for a user to say,
but I think it is a really terrible thing for a server administrator
or server designer to say. The first principle should be "don't lose
mail". Once you've got that down, then you can worry about other things
like speed and features.

Joshua.

Sam

Nov 24, 2001, 4:56:23 PM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <3BFFF3C9...@spamcop.net>,
Mike Montour <mmon...@spamcop.net> writes:

> Sam wrote:
>
> Thank you for your response.
>
>> The most common way this is done is have all the mailboxes on a separate
>> NAS box that's NFS-mounted by the mail servers. High-end NAS boxes are
>> engineered with redundant hot-swappable hardware, that can be replaced
>> without bringing the system down.
>
> I'll consider this for future expansion. I already have hot-swap drives
> and hardware RAID on the server, and will be using that initially.
>
> Would you see any advantage in setting up that server as an NFS server
> only, then using a couple of smaller boxes to host the applications?
> I.e. use one server as my own "NAS box"? Or is the only benefit to
> using a NAS box that the box itself has higher-reliability hardware?

It does make sense to use a box with hot-swappable disks as a first step.
It will give you some level of fault-tolerance. Even if the hardware is
not completely fault tolerant, it certainly makes sense to dedicate the
hardware to just storing the data. You don't need a hot-swappable
disk to talk to external mail clients, so there's no need to waste
fault-tolerant hardware's resources on network I/O.

> The ReiserFS site includes a patch for qmail to address this concern,
> but I saw no corresponding code for Postfix or Courier-IMAP (which are
> presumably affected by the same issue, as they all use Maildir storage).
> That's why I was wondering if xfs or ext3 might be a better choice for
> Maildir storage.

There isn't really enough of historical data about using journaling
filesystems for a mail store, in order to make a firm conclusion, but it
does look to be the case from a theoretical viewpoint. Both ext3 and xfs
have transaction logging (there are several logging options with ext3) so
if the server crashes it should be able to recover most of the data.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE8ABeD3ejdWUS0ltARAkpQAJ0QbmMRuBz0vU3dGkD/2KHEqokdhQCfXyje
s+o0e3fJIo0fZiwG6l9Q+Kk=
=f0Ma
-----END PGP SIGNATURE-----

Sam

Nov 30, 2001, 11:09:23 PM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <9tour3$lm6$1...@nntp.itservices.ubc.ca>,
"Joshua Slive" <ne...@slive.ca> writes:

> Sam <s...@email-scan.com> wrote:
>> I've heard this old wives' tale brought up repeatedly in many contexts, and
>> I always thought it to be a rather dumb argument.
>
>> Someone needs to explain to me the logic in wanting a stable box, but
>> accepting it as a given that the box is going to crash, and worrying about
>> how well it will recover from the crash. Seems to me it's better to make
>> sure that the box isn't going to crash in the first place. Then you don't
>> even have to worry about this situation.
>
> I can't claim to be as much of an expert as you in running mail
> servers, but this seems way off base to me. Computers crash. There's
> no way you are going to stop that. You can minimize the chances (for
> example, by running a mainframe rather than a PC), but you will never
> eliminate crashes entirely.
>
> So what you are saying then is that you don't mind losing the
> occasional mail message.

What I'm saying is that if the hardware explodes it doesn't matter whether
you've synced things up properly. The data is gone. Kaput. So the only
instance where syncing makes a difference is when there's a software crash.

My preferred solution to this dilemma is to make sure that the software
doesn't crash, instead of accepting it as a fact of life and trying to
mitigate the damage.

> That may be a fine thing for a user to say,
> but I think it is a really terrible thing for a server administrator
> or server designer to say. The first principle should be "don't lose
> mail". Once you've got that down, then you can worry about other things
> like speed and features.

If you are mailing nuclear secrets around and you need absolute, and the
highest level of reliability, you should probably do some other things, to
get that.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE8ABhc3ejdWUS0ltARAiyPAJ905WP/IIyB0m+AHYg6Rz8PqF78kQCeM6WO
PBA8PG9KuImFIs4mGLbWhGU=
=8opt
-----END PGP SIGNATURE-----

Joshua Slive

Dec 1, 2001, 11:12:23 AM
Sam <s...@email-scan.com> wrote:
>> So what you are saying then is that you don't mind losing the
>> occasional mail message.

> What I'm saying is that if the hardware explodes it doesn't matter whether
> you've synced things up properly. The data is gone. Kaput. So the only
> instance where syncing makes a difference is when there's a software crash.

I just don't see that. The only hardware crash that should result in
the loss of data is a disk crash. And for a well managed system, it
would need to be two simultaneous disk crashes. If that happens, you
are obviously screwed. But for any other hardware crash, you should
be able to recover gracefully (with some amount of down-time -- Note
that down-time does not result in mail-loss in a well managed system.
At worst it results in bounces, which are much better than mail loss).

> If you are mailing nuclear secrets around and you need absolute, and the
> highest level of reliability, you should probably do some other things, to
> get that.

Sure, nothing is absolute. But I think you have the wrong starting
point. I'd like to use a mail server that doesn't lose mail if
someone accidentally kicks the power cord out from behind.

Joshua.

Sam

Dec 1, 2001, 12:20:41 PM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <9uavh7$7jt$1...@nntp.itservices.ubc.ca>,
"Joshua Slive" <ne...@slive.ca> writes:

> Sam <s...@email-scan.com> wrote:
>>> So what you are saying then is that you don't mind losing the
>>> occasional mail message.
>
>> What I'm saying is that if the hardware explodes it doesn't matter whether
>> you've synced things up properly. The data is gone. Kaput. So the only
>> instance where syncing makes a difference is when there's a software crash.
>
> I just don't see that. The only hardware crash that should result in
> the loss of data is a disk crash. And for a well managed system, it
> would need to be two simultaneous disk crashes. If that happens, you
> are obviously screwed. But for any other hardware crash, you should
> be able to recover gracefully (with some amount of down-time -- Note
> that down-time does not result in mail-loss in a well managed system.
> At worst it results in bounces, which are much better than mail loss).

99.9999% of all crashes are disk crashes. That's the only moving part in
the box. Except for the fans. But you'll have redundant cooling fans, so
one fan locking up should not cause a disaster.

I'm sure that someone somewhere will sometimes have his CPU catch fire, or
the power supply explode inside the box. So, if you're in the business of
mailing nuclear secrets around, it would certainly make sense to eliminate
even that of a remote possibility. But if that's what you're doing, you're
not going to mail nuclear secrets from a gray box. You'll have
custom-designed hardware for this, that's engineered itself to survive
catastrophic events. That's a much better solution than software
workarounds for fault-susceptible hardware.

And in all other cases, you're going to hobble performance without really
anything to show for it. There's nothing wrong with striving for
perfection, of course, but I think that sometimes the price is just too
high to be worth it.

You know, my main dev box is a frankenbox that was born eight years ago,
and all of its hardware has been replaced at least a couple of times, by
now. I've yet to lose any data, or mail. If I can do it, without hobbling
performance, then you can do it too.

What's my secret? Daily, rotating, tape dumps. High quality, and
expensive, hardware. Regular maintenance, and replacement of aging
components. Keep the air conditioning on.

I've had power supplies and cooling fans beginning to wig out on me, but I
replaced them quickly, and at the first sign of a problem. Do that,
instead of playing Quake all day, and you won't have any problems. That's
all.

>> If you are mailing nuclear secrets around and you need absolute, and the
>> highest level of reliability, you should probably do some other things, to
>> get that.
>
> Sure, nothing is absolute. But I think you have the wrong starting
> point. I'd like to use a mail server that doesn't lose mail if
> someone accidentally kicks the power cord out from behind.

Keep the lusers away from your hardware. Lock them out of the room. If
you need to do maintenance behind the rack, shut everything down, or be
careful. Problem solved.

I understand that. Several times I had to forcibly mentally-realign
myself, to stop doing stupid things that might cause these kinds of
problems. These are all social or procedural issues, not software ones.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE8CRFk3ejdWUS0ltARAvJlAJ9L9hfFgKTmWlW1/0mfaUybK54zSwCbBBVi
jPuhXFvq4ewoXXMOOobPgd8=
=K/5b
-----END PGP SIGNATURE-----
