beware of "Courier-IMAP"

Mark Crispin

Mar 10, 2000, 3:00:00 AM
to IMAP Interest List
The Courier-IMAP server is non-compliant with the IMAP specification, and
its author states that he has no intention to make Courier-IMAP compliant:

> It's completely absurd. The parser in Courier-IMAP is a straightforward
> parser, so I treat [ and ] as distinct lexical units, so they are rejected
> when sent as part of an unquoted string. I'm not going to insert a bunch
> of spaghetti code, and break something, just to comply with completely
> nonsensical portions of IMAP4rev1.

-- Mark --

* RCW 19.190 notice: This email address is located in Washington State. *
* Unsolicited commercial email may be billed $500 per message. *
Science does not emerge from voting, party politics, or public debate.


Nicholas Lee

Mar 10, 2000, 3:00:00 AM

"Mark Crispin" <m...@CAC.Washington.EDU> wrote in message
news:Pine.NXT.4.30.000310...@Tomobiki-Cho.CAC.Washington.EDU...

> The Courier-IMAP server is non-compliant with the IMAP specification, and
> its author states that he has no intention to make Courier-IMAP compliant:
>
> > It's completely absurd. The parser in Courier-IMAP is a straightforward
> > parser, so I treat [ and ] as distinct lexical units, so they are rejected
> > when sent as part of an unquoted string. I'm not going to insert a bunch
> > of spaghetti code, and break something, just to comply with completely
> > nonsensical portions of IMAP4rev1.

I must say that I'm rather displeased that you have posted these comments
(sent to me by the author of Courier-IMAP, and which I was discussing with
you privately) to an open forum.

In fact I'd say it was rather irresponsible and out of context. I might even
go so far as to say you are misrepresenting the discussion and using
bully-boy tactics.

Nicholas


Mark Crispin

Mar 10, 2000, 3:00:00 AM
to Nicholas Lee
There is no misrepresentation. The facts are clear:

1) Courier-IMAP rejected an atom with a [ character.

2) The vendor of Courier-IMAP claims that it is a client bug (in my code,
no less) to send an atom with a [ character.

3) The vendor of Courier-IMAP acknowledges that the IMAP specification
permits this, but states "I'm not going to insert a bunch of spaghetti
code, and break something, just to comply with completely nonsensical
portions of IMAP4rev1."
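
To make point 1 concrete, here is a minimal Python sketch of an atom check. It assumes RFC 2060's formal syntax defines atom_specials as "(", ")", "{", SPACE, CTL, the list wildcards "%" and "*", and the quoted specials (double quote and backslash). The function name is_atom is hypothetical and is not taken from any server discussed here.

```python
# Sketch only: the atom rule as read from RFC 2060's formal syntax.
# ATOM_SPECIALS and is_atom are illustrative names, not any product's code.
ATOM_SPECIALS = set('(){ %*"\\')  # ( ) { SPACE % * " and backslash


def is_atom(token: str) -> bool:
    """Return True if token is a non-empty run of characters allowed in an atom."""
    if not token:
        return False
    for ch in token:
        code = ord(ch)
        if code > 0x7F:                    # not a 7-bit US-ASCII CHAR
            return False
        if code <= 0x1F or code == 0x7F:   # CTL (this also rejects NUL)
            return False
        if ch in ATOM_SPECIALS:
            return False
    return True
```

Under that reading, "[" and "]" are not in the excluded set, so a token such as foo[bar] passes the check, while a token containing a space or a parenthesis does not.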

I have an obligation to report non-compliant servers and defiant vendors
who refuse to implement the specification. It is unfair to the dozens of
other vendors -- all of whom implement IMAP according to specification --
to be burdened by bug reports caused by a vendor who openly defies the
specification and claims that everybody else is wrong.

It has also come to my attention that he posts a so-called "client bugs"
list, which misrepresents problems in his server (or simply his failure to
understand IMAP) as being bugs in various clients.

On Fri, 10 Mar 2000, Nicholas Lee wrote:
> I must say that I'm rather displeased that you have posted these comments (to
> myself from the author of Courier imap that I was discussing with you
> privately) to an open forum.
>
> In fact I'd say it was rather irresponsible and out of context. I might even
> go so far as to say you are misrepresenting the discussion and using
> bully-boy tactics.

-- Mark --

Nicholas Lee

Mar 10, 2000, 3:00:00 AM

"Mark Crispin" <m...@CAC.Washington.EDU> wrote in message
news:Pine.NXT.4.30.000310...@Tomobiki-Cho.CAC.Washington.EDU...

> There is no misrepresentation. The facts are clear:

I think you've completely missed the point. My email to you was private. I
feel somewhat offended that you took certain comments from a third party
that I directed to your attention and placed them in a public forum.

Ignoring delivery time, you seem to have judged and replied to my email within
an hour. Then, without waiting for a further response (I was asleep), you
posted a rather negative message about that third party's product to this
forum 11 minutes later.

I state again: whatever your motives might be, taking private communications
out of context and using them in a public forum is somewhat irresponsible.
I'm somewhat disheartened by your actions in this matter, particularly for a
member of your standing in the community.

The fact of the matter is that I was presenting an analysis for improvement
of the IMAP spec. This was given to me by the Courier-IMAP author in
response to difficulties I had discovered while installing his product and
attempting to use it with both Pine (4.21) and Outlook Express.


Nicholas

Sam

Mar 11, 2000, 3:00:00 AM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <Pine.NXT.4.30.000310...@tomobiki-cho.cac.washington.edu>,
Mark Crispin <m...@CAC.Washington.EDU> writes:

> I have an obligation to report non-compliant servers and defiant vendors
> who refuse to implement the specification. It is unfair to the dozens of
> other vendors -- all of whom implement IMAP according to specification --
> to be burdened by bug reports caused by a vendor who openly defies the
> specification and claims that everybody else is wrong.

Oh, you're full of it, Mark. There are numerous interoperability problems
between many different clients and servers, and you know it. When it comes
down to it, I think what's really eating you is that you're finally
accepting the fact that all the interoperability problems that have
surfaced over the years are simply due to RFC 2060 being a very poorly
written spec, both from a readability and a technical standpoint. Nobody
can read that and implement anything right off the bat. You do not see the
same level of interoperability problems with any other wire protocol, be it
ESMTP, POP3, or anything else for that matter.

And when someone gets caught in the middle of those interoperability
problems, and ends up agreeing with my analysis that RFC 2060 is poorly
designed, you go off the deep end. Well, Mark, that's just too bad, and I
guess you'll just have to learn how to deal with some constructive
criticism, without getting personal and going bonkers like that.

I can't help but mention another incident several weeks ago where a similar
issue cropped up with another IMAP client -- Mulberry's Mac client. But,
unlike yourself, that fella was very polite and courteous, and, after
hashing it over in E-mail, a couple of times, he made a few tweaks to his
code, and so did I, and everyone lived happily ever after.

But, when someone on a huge ego trip decides to act like a total jackass in
public, I don't think that kind of behavior is going to encourage
much cooperation.

It seems that what's really getting your goat, Mark, is your decade-old feud
with Dan Bernstein, about which I really couldn't care less. For years you've
satisfied your enormous ego by refusing numerous requests to support
maildirs, with some flimsy excuse. That's how you got your kicks. And now
that's no longer necessary -- people now have a reliable alternative to
UW-IMAP that is not the bloated monster that it is, and that just bugs the
hell out of you.

> It has also come to my attention that he posts a so-called "client bugs"
> list, which misrepresent problems in his server (or simply his failure to
> understand IMAP) as being bugs in various clients.

Grow up, Mark. Stop acting like a baby.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE4ym8O+3BFaxHnGY0RAhykAJ9EixentV/kLOMEZ72waHWX+yHBkwCgnFYJ
uDwbAf2Dlk+3zlY4KaIvLh4=
=cXU4
-----END PGP SIGNATURE-----


Lawrence Greenfield

Mar 11, 2000, 3:00:00 AM
Since I don't know the details of the e-mail exchange that is being
debated here, I'm not prepared to defend or attack Mark. However, the
technical problems he raises are real problems. The current IMAP we
have may not be the perfect IMAP in everyone's view, but it's the
specification we have after a great deal of debate and give and take.
Interoperability is important, and it is achieved by complying with
the spec.

Building an interoperable IMAP server _is_ hard---that's why we have
documents like RFC 2683, an excellent document on real world
implementation recommendations by Barry Leiba.

Personally, I'm very pleased to see another free IMAP server---I think
that's definitely for the best. I'd like to see it interoperate with
as many clients as possible.

I pointed out several interoperability gotchas with Courier IMAP
around three weeks ago in personal e-mail. Some of the issues
raised in the Courier IMAP BUGS file are legitimate client bugs. Some
are not---and most issues are clearly dealt with in the specification.

Here are some of the clearer issues (the quoted text is from the
imap/BUGS file distributed with courier-imap-0.27):

> 1) Pine chokes on whitespace between BODY and [

msg-att-static = "ENVELOPE" SP envelope / "INTERNALDATE" SP date-time /
"RFC822" [".HEADER" / ".TEXT"] SP nstring /
"RFC822.SIZE" SP number / "BODY" ["STRUCTURE"] SP body /
"BODY" section ["<" number ">"] SP nstring /
"UID" SP uniqueid

The only way of generating a "BODY" followed by a [ is the
"BODY" section [ "<" number ">"] SP nstring
rule.

section = "[" [section-spec] "]"

Since "section" MUST begin with a [, there can be NO whitespace
between "BODY" and "[".
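
As an illustration of the rule above, here is a small Python sketch that accepts the name of a "BODY" section data item only when the "[" follows "BODY" directly. The regular expression is a simplification that does not validate the section-spec itself, and the name is_body_section_item is hypothetical.

```python
import re

# "BODY" immediately followed by "[" [section-spec] "]" and an optional
# "<" number ">" origin. The section-spec contents are not validated here.
BODY_SECTION_RE = re.compile(r"BODY\[[^\]]*\](?:<\d+>)?", re.IGNORECASE)


def is_body_section_item(item: str) -> bool:
    """Return True if item looks like BODY[...] with no whitespace before '['."""
    return BODY_SECTION_RE.fullmatch(item) is not None
```

A strict client-side parser built along these lines would accept BODY[HEADER] and reject BODY [HEADER], which matches the behaviour described in the BUGS entry.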

> 3) Occasionally Pine sends a FETCH request with an invalid UID.
> This usually happens after you resume a postponed message, and
> send it. It looks like other IMAP servers simply ignore this
> error condition, however Courier-IMAP will return an error
> message, which Pine shows briefly on the status line. This is
> similar to the Netscape Communicator bug (see below), but not as
> bad.

Section 6.4.8
A non-existent unique identifier is ignored without any error
message generated. Thus it is possible for a UID FETCH command to
return OK without any data or a UID COPY or UID STORE to return OK
without performing any operations.
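
A rough Python sketch of what that paragraph asks of a server follows. It assumes a mailbox modelled as a list of UIDs in sequence-number order; the function name uid_fetch, the default tag and the response text are illustrative only.

```python
from typing import Iterable, List


def uid_fetch(mailbox_uids: List[int], requested: Iterable[int],
              tag: str = "A001") -> List[str]:
    """Build the response lines for a UID FETCH that asks only for UIDs.

    mailbox_uids holds the mailbox's UIDs in message sequence order.
    Requested UIDs that do not exist are skipped without any error.
    """
    seq_of = {uid: seq for seq, uid in enumerate(mailbox_uids, start=1)}
    lines = []
    for uid in requested:
        seq = seq_of.get(uid)
        if seq is None:
            continue  # non-existent UID: ignored silently
        lines.append(f"* {seq} FETCH (UID {uid})")
    # The command still completes with OK, even if nothing matched.
    lines.append(f"{tag} OK UID FETCH completed")
    return lines
```

With this shape, a request for a stale UID produces only the tagged OK line, so the client sees no error on its status line.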

> 1) Netscape Communicator insists that the response in HEADER.FIELDS is
> terminated by a blank line, supposedly the end of message headers.

Section 6.4.5
The HEADER, HEADER.FIELDS, and HEADER.FIELDS.NOT part
specifiers refer to the [RFC-822] header of the message or of
an encapsulated [MIME-IMT] MESSAGE/RFC822 message.
HEADER.FIELDS and HEADER.FIELDS.NOT are followed by a list of
field-name (as defined in [RFC-822]) names, and return a subset
of the header. The subset returned by HEADER.FIELDS contains
only those header fields with a field-name that matches one of
the names in the list; similarly, the subset returned by
HEADER.FIELDS.NOT contains only the header fields with a
non-matching field-name. The field-matching is
case-insensitive but otherwise exact. In all cases, the
[RFC-822] delimiting blank line between the header and the body
is always included.
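
The quoted text can be sketched in Python roughly as follows. This assumes the header arrives as one CRLF-delimited string and handles folded continuation lines in the simplest possible way; header_fields is a hypothetical helper name.

```python
from typing import Iterable


def header_fields(raw_message: str, wanted: Iterable[str]) -> str:
    """Return the HEADER.FIELDS subset, always ending with the blank line."""
    wanted_lc = {name.lower() for name in wanted}
    kept = []
    keep = False
    for line in raw_message.split("\r\n"):
        if line == "":
            break  # delimiting blank line: end of the header
        if line[0] in " \t":
            # Folded continuation line: belongs to the previous field.
            if keep:
                kept.append(line)
            continue
        name, sep, _ = line.partition(":")
        # Field-name matching is case-insensitive but otherwise exact.
        keep = bool(sep) and name.strip().lower() in wanted_lc
        if keep:
            kept.append(line)
    # The blank line is appended in all cases, even when nothing matched.
    return "".join(l + "\r\n" for l in kept) + "\r\n"
```

Note that the result for a list with no matching fields is a single CRLF, not an empty string.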

Larry


Sam

Mar 11, 2000, 3:00:00 AM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> The Courier-IMAP server is non-compliant with the IMAP specification, and
> its author states that he has no intention to make Courier-IMAP compliant:
>
>> It's completely absurd. The parser in Courier-IMAP is a straightforward
>> parser, so I treat [ and ] as distinct lexical units, so they are rejected
>> when sent as part of an unquoted string. I'm not going to insert a bunch
>> of spaghetti code, and break something, just to comply with completely
>> nonsensical portions of IMAP4rev1.

Now, now, Mark, what exactly are you trying to accomplish, here? If I was
really interested in prolonging this pissing match, I would probably go
ahead and publish the entire exchange that took place, not just a single
isolated paragraph out of context, so that everyone could see for
themselves what the fuss is all about.

But, I'm not.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE4ydL9+3BFaxHnGY0RAhLuAKCJGdBmoN6ibxViXnlzbaSwnGQIWwCgmjme
Ys5yIhGm/tOon2J4ZzGT6h8=
=mrU8
-----END PGP SIGNATURE-----


ra...@adsl-151-203-22-73.bellatlantic.net

Mar 12, 2000, 3:00:00 AM
On 11 Mar 2000 16:07:05 GMT, Sam <s...@email-scan.webcircle.com> wrote:
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>In article <Pine.NXT.4.30.000310...@tomobiki-cho.cac.washington.edu>,
> Mark Crispin <m...@CAC.Washington.EDU> writes:
>
>> I have an obligation to report non-compliant servers and defiant vendors
>> who refuse to implement the specification. It is unfair to the dozens of
>> other vendors -- all of whom implement IMAP according to specification --
>> to be burdened by bug reports caused by a vendor who openly defies the
>> specification and claims that everybody else is wrong.
>
>Oh, you're full of it, Mark. There are numerous interoperability problems
>between many different clients and servers, and you know it. When it comes

Mark and I have disagreed about technical details in the past, especially
incompatibilities.

Publishing a known-non-compliant product, refusing to fix it, and getting
pissy when the refusal gets published is a problem. While Mark may have been
rude to publish notes from a private email, it's certainly legal and may have
even been appropriate.

>down to it, I think what's really eating you is that you're finally
>accepting the fact that all the interoperability problems that have
>surfaced over the years are simply due to RFC 2060 being a very poorly
>written spec, both from a readability and a technical standpoint. Nobody
>can read that and implement anything right off the bat. You do not see the
>same level of interoperability problems with any other wire protocol, be it
>ESMTP, POP3, or anything else for that matter.

Oh, my great aunt Petunia, are you a newbie.... "Easy to read", "easy to
implement" does not mean reliable, workable, complete, or even consistent.

>But, when someone on a huge ego trip decides to act like a total jackass in
>public, I don't think that kind of behavior is going to encourage
>much cooperation.

I'm sure you've noticed this yourself....

--

Nico Kadel-Garcia
nka...@bellatlantic.net

Sam

Mar 12, 2000, 3:00:00 AM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <slrn8cm6f2...@adsl-151-203-22-73.bellatlantic.net>,
ra...@adsl-151-203-22-73.bellatlantic.net () writes:

> Mark and I have disagreed about technical details in the past, especially
> incompatibilities.
>
> Publishing a known-non-compliant product, refusing to fix it, and getting
> pissy when the refusal gets published is a problem.

I agree. The user who reported this problem concluded that it was a Pine
bug, and asked to have it fixed.

> While Mark may have been
> rude to publish notes from a private email, it's certainly legal and may have
> even been appropriate.

Mark Crispin published one paragraph out of a rather drawn out E-mail
exchange; this was totally misleading and didn't really have much to do
with anything. I don't really have a problem with hashing this out
publicly, but certainly not in this manner -- with intentional
misrepresentation, inflammatory rhetoric, and complete disregard for the
issues at hand, which quickly degenerates into a pissing match. Leave me
out of it, please.

As I wrote, it looks to me that Mark Crispin is simply looking to pick up
the age-old feud he -- and others -- have been having with someone else. I'm
not interested.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE4yyuQ+3BFaxHnGY0RAiRIAKClf9MUpuEuXYnBthobr5pSK5AIFwCg3Kc8
eO/UyicuvC5OzsNKwxnjvqQ=
=93oV
-----END PGP SIGNATURE-----


Sam

Mar 12, 2000, 3:00:00 AM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <2000031203...@smtp3.andrew.cmu.edu>,
Lawrence Greenfield <le...@andrew.cmu.edu> writes:

> Since I don't know the details of the e-mail exchange that is being
> debated here, I'm not prepared to defend or attack Mark. However, the
> technical problems he raises are real problems.

He did not raise any technical problems. He was upset simply because I
dared to question the gospel of RFC 2060; and when I explained what the
problems in that document were to someone else, that someone else agreed
with my conclusions, considered it to be a Pine bug, and asked to have it
fixed, noting that at least two other IMAP clients work just fine.

I was prepared to go over those same issues again, but, I suddenly realized
that this would merely prolong a meaningless pissing match, and I would be
wasting my breath. It doesn't really matter, in the grand scheme of things.
I have no interest in actively participating in a pissing match. I can't
help it if someone else is hell-bent on starting one, the only thing I can
do is avoid wasting my time in it.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE4yyip+3BFaxHnGY0RAqLrAJ0dxAAGuy3KcLVWWSKDMfXeNxf6hwCcCfSs
nOUu3iZLI0ggYqCaAyKgVg0=
=HgyZ
-----END PGP SIGNATURE-----


Nicholas Lee

Mar 12, 2000, 3:00:00 AM

<ra...@adsl-151-203-22-73.bellatlantic.net> wrote in message
news:slrn8cm6f2...@adsl-151-203-22-73.bellatlantic.net...

> Publishing a known-non-compliant product, refusing to fix it, and getting
> pissy when the refusal gets published is a problem. While Mark may have been
> rude to publish notes from a private email, it's certainly legal and may have
> even been appropriate.

As much as I hate to post another OT message, I feel I have to disagree
here. It's not a question of whether posting private email conversations is
legal or not. It's just not good practice or netiquette. He took something
from a private conversation out of context and used it in a public forum to
discredit someone. For a member of the community such as himself this is
just not good behaviour.

Nicholas

Yiorgos Adamopoulos

Mar 13, 2000, 3:00:00 AM
In article <95289681...@shelley.paradise.net.nz>, Nicholas Lee wrote:
>discredit someone. For a member of the community such as himself this is
>just not good behaviour.

I see two problems here:

- Problem #1 is Mark jumping the gun whenever someone posts something
"against" what he has documented as being IMAP, or NFS with the UW
toolkit, or maildir. Well, this is Mark's personality and it cannot be
changed, the same way as my or anyone else's personality cannot be changed.
Yes he is a flaming gun, but also whenever he flames, I see technical
arguments from his side (not to add that I collect his flames - I sort of
have the same behavior in our local ntua.* newsgroups for different
reasons).

- Problem #2 is the IMAP spec and how some choose to implement it. Well,
here there are two choices. Choice #1 says, you follow the spec no
matter how much you disagree with it, and certainly you do not break it.
Choice #2 says *write and implement your own spec*. Just as Bernstein
did (for example) with QMTP and Maildir. If you do not like what is
already there (and cannot convince the inventor to do otherwise) present
your alternative and let the community decide what to use. But if you
choose to implement the standard, it is inexcusable if you don't follow it
(because you *claim* to implement it). Simply stating that the IMAP RFC
has a nonsensical requirement does not prove anything. The RFC has the
requirement, so you are required to follow it no matter what.
Otherwise write (and implement) your own RFC. Nobody will stop you on
that.

I too am working on similar things and try to put my thoughts into code. I
do not think that I will ever state X is done the "wrong" way in protocol
Y. The standard is there and I either follow it, extend it or write my
own. But I do not break the existing, *especially* just because I do not
like the personality of the inventor.

--
${talks}

Sam

Mar 13, 2000, 3:00:00 AM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <slrn8coi74...@ithaca.dbnet.ece.ntua.gr>,
ad...@dblab.ece.ntua.gr (Yiorgos Adamopoulos) writes:

> In article <95289681...@shelley.paradise.net.nz>, Nicholas Lee wrote:
>>discredit someone. For a member of the community such as himself this is
>>just not good behaviour.
>
> I see two problems here:
>
> - Problem #1 is Mark jumping on the gun whenever someone posts something
> "against" what he has documented as being IMAP, or NFS with the UW
> toolkit, or maildir. Well, this is Mark's personality and it cannot be
> changed, the same way as my or any other's personality cannot be changed.
> Yes he is a flaming gun, but also whenever he flames, I see technical

Well, that's fine, but as long as you do not cross the line that he has
crossed. It's one thing to be a flamehead -- which is a fine Usenet
tradition, after all, that is beyond all reproach -- but it's a completely
different situation when you take things out of context, and completely
misrepresent someone else in a totally underhanded and cowardly manner.
This is not flaming, this is something quite different.

> - Problem #2 is the IMAP spec and how some choose to implement it. Well,
> here there are two choices. Choice #1 says, you follow the spec no
> matter how you disagree with it, and certainly you do not break it.
> Choice #2 says *write and implement your own spec*. Just as Bernstein
> did (for example) with QMTP and Maildir.

You are not entirely 100% correct. DJB did break the letter of ESMTP.
Qmail advertises 8BITMIME, but doesn't do anything about it -- it will send
8-bit mail to non-8bit mailers without downshifting it to quoted-printable.

Now, I do have my own problems with Qmail, and DJB's reasons for doing that
happen to be 100% analogous -- he's stated that the whole 8BITMIME business
is just plain dumb -- yet you do not see me making a spectacle out of it on
Usenet and on private mailing lists. I have flamed DJB in the past over
this, but I draw the line at twisting someone else's words in order to
further my own personal agenda.

> If you do not like what is
> already there (and cannot convince the inventor to do otherwise) present
> your alternative and let the community decide what to use. But if you
> stick to implement the standard, you are inexcusable if you don't
> (because you *claim* to implement it). Simply stating that the IMAP RFC
> has a nonsensical requirement does not prove anything. The RFC has the
> requirement, so you are required to follow it no matter what.
> Otherwise write (and implement) your own RFC. Nobody will stop you on
> that.

Well, this is what that crowd would _like_ for you to believe the issue is,
but it's not. It's a red herring. What's really happening is that people
are simply having a major cow because someone dared to diss IMAP4rev1,
that's all there is to it. Every time I run into something dumb in RFC
2060, and have to put in yet another workaround due to its weirdness, I
document it, and it's now grown to be quite a collection of bloopers.
Apparently, some egos got slightly bruised because of nothing more than
just a silly web page. And this latest issue will be just another footnote
in the next revision.

> I too am working on similar things and try to put my thoughts into code. I
> do not think that I will ever state X is done the "wrong" way in protocol
> Y. The standard is there and I either follow it, extend it or write my

The fact that something is a "standard" does not mean that everyone must
agree that it makes sense, and does not exempt the "standard" from being
subject to criticism. That may be what SOME people would like you to
believe, but I refuse to accept that line of thinking.

> own. But I do not break the existing, *especially* just because I do not
> like the personality of the inventor.

Congratulations -- you've completely fallen for their trap. If I were to
have actually done what you've been led to believe I've done, neither Pine,
nor Outlook Express would work at all, with Courier-IMAP. Well, to be
technically correct, Pine would break with the next revision, because I
haven't yet revved since the conflict flared up (and I was never known for
making positive comments vis-a-vis Microsoft). But, it won't. One thing
I've realized is that I don't think that I really want to earn the same
reputation as UW-IMAP. In fact, I wrote Courier-IMAP precisely because of
UW-IMAP's reputation of ignoring repeated requests for compatibility with
software written by someone who's been feuding with UW-IMAP's authors.
And I'm certainly not going to go down the same path myself.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE4zIzU+3BFaxHnGY0RApcHAJoD1Q7Wj2S7B4il9u9GitSUo+l0WQCeOiUT
TkRrFm8kdX5A+kMxPYibFi4=
=nqgh
-----END PGP SIGNATURE-----


ra...@adsl-151-203-22-73.bellatlantic.net

Mar 13, 2000, 3:00:00 AM
On 13 Mar 2000 06:38:17 GMT, Sam <s...@email-scan.webcircle.com> wrote:

>Well, this is what that crowd would _like_ for you to believe the issue is,
>but it's not. It's a red herring. What's really happening is that people
>are simply having a major cow because someone dared to diss IMAP4rev1,
>that's all there is to it. Every time I run into something dumb in RFC
>2060, and have to put in yet another workaround due to its weirdness, I
>document it, and it's now grown to be quite a collection of bloopers.

Then stop kvetching and *POST IT*. You're starting to sound like a meower....

--

Nico Kadel-Garcia
nka...@bellatlantic.net

Sam

Mar 13, 2000, 3:00:00 AM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <slrn8cpnns...@adsl-151-203-22-73.bellatlantic.net>,
ra...@adsl-151-203-22-73.bellatlantic.net () writes:

> On 13 Mar 2000 06:38:17 GMT, Sam <s...@email-scan.webcircle.com> wrote:
>
>>Well, this is what that crowd would _like_ for you to believe the issue is,
>>but it's not. It's a red herring. What's really happening is that people
>>are simply having a major cow because someone dared to diss IMAP4rev1,
>>that's all there is to it. Every time I run into something dumb in RFC
>>2060, and have to put in yet another workaround due to its weirdness, I
>>document it, and it's now grown to be quite a collection of bloopers.
>
> Then stop kvetching and *POST IT*. You're starting to sound like a meower....

I have posted it on the project's web page, and it's included in the source
tarball. It's been out there for a while.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE4zOpy+3BFaxHnGY0RAjU3AJ0eoehJXqo5qkkk1+vvtSMikVPpjgCgvwC0
ueXxTeqfHvpi6h0BJiN6h5U=
=1zAg
-----END PGP SIGNATURE-----


Vladimir A. Butenko

Mar 13, 2000, 3:00:00 AM
In article <courier.38CC...@email-scan.webcircle.com>, Sam
<s...@email-scan.webcircle.com> wrote:

> Well, this is what that crowd would _like_ for you to believe the issue is,
> but it's not. It's a red herring. What's really happening is that people
> are simply having a major cow because someone dared to diss IMAP4rev1,
> that's all there is to it. Every time I run into something dumb in RFC
> 2060, and have to put in yet another workaround due to its weirdness, I
> document it, and it's now grown to be quite a collection of bloopers.
> Apparently, some egos got slightly bruised because of nothing more than
> just a silly web page. And this latest issue will be just another footnote
> in the next revision.

Look. Calm down, please. You are not the only person Mark has attacked here
:-). And I'd agree that taking info from a personal E-mail and posting it
on a public forum is a BAD thing. But be professional. Try to separate
personal relations/customs/etc. from the technical side of things.

a) Yes, the IMAP standard is not perfect. But it is a complicated protocol,
and for its level of complexity it is written pretty well. I bet you
have never read the LDAP RFCs :-). If you want a perfect standard, find
someone who DOES NOT implement his/her own server or client and make that
person re-write the standard. Otherwise some things that are obvious to
the implementor, but completely unknown to others, inevitably slip
through. Mark did a very good job on the IMAP standard - again, just read
the LDAP RFCs, which do not even care to explain to you what the whole thing
is all about.

b) Standards are called standards because they are standards. Yes, both
client and server vendors sometimes do not get what the RFC author meant,
but if they then find that the standard really SAYS what he meant, they
should comply. Otherwise the whole point of having a standard vanishes.
For example, our implementation of the ACAP server was found to be
incompatible with a client that was designed by one of the ACAP spec
co-authors. But we did what the standard SAID, and the author himself
agreed, and had to change his code, or find a workaround, and finally we
started to talk about drafting a new version of the ACAP specs, because
the standard actually did not say what the authors meant. That's how
things work in the world of standards. And this is why all standards start
their lives as drafts, so everybody can try and comment and minimize the
risk of misunderstanding in the future.

c) I personally do not like the way the IMAP spec treats spaces. Most
client parsers simply skip spaces before any lexeme, so only when we
started to deploy our servers in universities that use Pine did the problem
of our server inserting spaces where the IMAP standard does not require
them come up. Yes, someone wrote to Mark, because Netscape, Outlook and
Mulberry had no problem - only Pine had. And Mark posted a similar note
here, called "CommuniGate Pro IMAP bug". Whatever our feelings were about
this way of handling things (instead of writing to our tech. team
directly, as all other vendors do) - from the TECHNICAL point of view Mark
was right, and the IMAP standard says nothing about a space in that place
- so we had to fix it immediately and release a new version out of
schedule. Because it was OUR fault, and if we say that we comply with
IMAP4rev1, we have to comply.

d) There are many other issues in the IMAP specs that can be discussed. But
they should be discussed in a professional manner, and not by starting
the "pissing match" that you always say you do not want to participate in.
And until those issues are formulated in some IMAP5 or whatever, you either
follow the written IMAP4rev1 standard, or you say that your server does
not support that standard.

> The fact that something is a "standard" does not mean that everyone must
> agree that it makes sense, and does not exempt the "standard" from being
> subject to criticism. That may be what SOME people would like you to
> believe, but I refuse to accept that line of thinking.

Standards are not equal to the law. One may think that he can change the
laws of one southern state by having oral sex in public, and then protesting
from the prison cell, demanding the change of that law. The RFC standards
are neither about moral standards, nor about political issues - they
are about interoperability. "Social activism" is not a working method
here. If you want to change things without waiting for a new standard,
and/or you want to push the development of a new standard - it's doable
and it's simple: if you have a client vendor who also needs a different
protocol, you can:

a) present a new keyword in your IMAP CAPABILITY response:
COURIERMODE, for example.
b) let a client issue a special command, let's say COURIERMODE ON.
c) do whatever you want with the IMAP standard after that - i.e. work in
your own protocol that your server and that client both understand.

But if the client has not issued that command, the server should work
strictly as described in RFC2060, otherwise it won't be an IMAP server.
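
As a rough illustration of steps a) to c), here is a minimal Python sketch
of a session that stays on the standard until the client opts in.
COURIERMODE is the made-up keyword from the list above; the Session class,
its method names and the response texts are assumptions for illustration,
not anything taken from Courier-IMAP or CommuniGate Pro.

```python
class Session:
    """Toy IMAP session that behaves as plain IMAP4rev1 until the client opts in."""

    # COURIERMODE is the hypothetical capability keyword from the text above.
    CAPABILITIES = ["IMAP4rev1", "COURIERMODE"]

    def __init__(self):
        self.private_mode = False  # standard behaviour by default

    def handle(self, line: str) -> str:
        tag, _, rest = line.strip().partition(" ")
        command = rest.upper()
        if command == "CAPABILITY":
            caps = " ".join(self.CAPABILITIES)
            return f"* CAPABILITY {caps}\r\n{tag} OK CAPABILITY completed\r\n"
        if command == "COURIERMODE ON":
            # Only from this point on may the server deviate from RFC 2060.
            self.private_mode = True
            return f"{tag} OK private protocol enabled\r\n"
        # Placeholder: a real server would dispatch to one of two parsers here.
        mode = "private" if self.private_mode else "rfc2060"
        return f"{tag} BAD not implemented in this sketch ({mode} mode)\r\n"
```

A client that never sends the opt-in command only ever sees the
standard-mode branch, which is the whole point of the scheme.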

> Congratulations -- you've completely fallen for their trap. If I were to
> have actually done what you've been led to believe I've done, neither Pine,
> nor Outlook Express would work at all, with Courier-IMAP.

That would be a bad thing. The result would be inferior popularity of
your server - compared to a standard-compliant IMAP4rev1 server.

> making positive comments vis-a-vis Microsoft). But, it won't. One thing
> I've realized is that I don't think that I really want to earn the same
> reputation as UW-IMAP. In fact, I wrote Courier-IMAP precisely because of
> UW-IMAP's reputation of ignoring repeated requests for compatibility with
> software written by someone who's been feuding with UW-IMAP's authors.

Could you please list those requests? We had approx. 5 million seats sold
last year. Approx. 10% of those are using IMAP. We would hear about any
problem with any IMAP software. And I should tell you - we have no
"improvements" of IMAP4rev1 in our servers. They follow the standard
strictly, and those clients do follow the standard. So I'd be very
interested in learning about the problems the current standard presents to
the current clients.

> And I'm certainly not going to go down the same path myself.

If you want to DEVELOP a BETTER standard - let's discuss it. Right here.
As you can see, there are many IMAP4rev1 extensions that are documented in
RFCs that do not have Mark's name on them. So, I do not understand why you
think that the only way to implement a better standard is to screw up the
existing one: that definitely has nothing to do with Mark's attitude.

Hydrogen fuel is better than gas. So, let's add hydrogen pumps on the gas
stations and encourage car manufacturers to build H-powered cars
(clients). But if you start to put hydrogen instead of gas into all cars
that stop by because your sign reads "gas station" - I would seriously
doubt that you will be able to earn a better reputation than UW-IMAP.

--
Vladimir Butenko
Stalker Software, Inc.

Mark Crispin

Mar 13, 2000, 3:00:00 AM3/13/00
to Vladimir A. Butenko
On Mon, 13 Mar 2000, Vladimir A. Butenko wrote:
> Whatever our feels were about
> this way of handling things (instead of writing to our tech. team
> directly, as all other vendors do) - from the TECHNICAL point of view Mark
> was right, and the IMAP standard says nothing about a space in that place
> - so we had to fix it immediately and release a new version out of
> schedule. Because it was OUR fault, and if we say that we comply with
> IMAP4rev1, we have to comply.

If it makes you feel any better, I've been on the other side, and with
MUCH more embarrassing problems. It's not a pleasant situation.
Unfortunately, as you've discovered, the only way out is to make an
emergency release.

I strongly recommend that you go to the periodic IMC IMAP interoperability
bakeoffs. This is the best way to avoid such problems in the future. I
don't know when the next one will be held, but it should be announced
here. It's always better to get interoperability problems discovered and
resolved in pre-release code!

> Could you please list those requests?

I think that what he is talking about is that I don't want to get into the
business of supporting the "maildir" format. There are at least three
third-party c-client drivers available for maildir. If someone uses
maildir, they can go to one of those third parties for code and support.

I believe that it is infeasible to build maildir support that scales well
(e.g. does not exhibit performance problems with a moderately large
mailbox of 2000 messages) and also does not violate a major rule of either
maildir or IMAP. It's a no-win situation for me; and therefore I choose
to allow the maildir enthusiast community to do their own development,
distribution, and support of maildir IMAP code.

Mark Crispin

Mar 13, 2000, 3:00:00 AM3/13/00
to
On 13 Mar 2000, Sam wrote:
> > Then stop kvetching and *POST IT*. You're starting to sound like a meower....
> I have posted it on the project's web page, and it's included in the source
> tarball. It's been out there for a while.

That's where my material came from, and your statements are incorrect.

Please implement the specification, not your notion of what the
specification should be.

If you want to see a specification changed to conform to your views (such
as forbidding "[" in atoms), please follow the process for doing so.

Please make sure you have your facts right before you attack other
people's software.

Vladimir A. Butenko

Mar 13, 2000, 3:00:00 AM3/13/00
to
In article
<Pine.NXT.4.30.000313...@Tomobiki-Cho.CAC.Washington.EDU>,
Mark Crispin <m...@CAC.Washington.EDU> wrote:

> On Mon, 13 Mar 2000, Vladimir A. Butenko wrote:

> > Whatever our feels were about
> > this way of handling things (instead of writing to our tech. team
> > directly, as all other vendors do) - from the TECHNICAL point of view Mark
> > was right, and the IMAP standard says nothing about a space in that place
> > - so we had to fix it immediately and release a new version out of
> > schedule. Because it was OUR fault, and if we say that we comply with
> > IMAP4rev1, we have to comply.
>

> If it makes you feel any better, I've been on the other side, and with
> MUCH more embarrassing problems. It's not a pleasant situation.
> Unfortunately, as you've discovered, the only way out is to make an
> emergency release.

I do not see anything unpleasant in the situation itself: it's our job.
It's virtually impossible that anyone (including the author :-) can
implement EVERYTHING exactly as outlined in the docs. Some things can
be misunderstood, some can be simply overlooked. And when we get the
problem report, we investigate and fix the problem.

I grepped the http://www.stalker.com/CommuniGatePro/History.html file
for "Bug" and "IMAP". There are 16 of them there (over the last 2 years),
though there has been only 1 since that "CGatePro IMAP Bug" report you
posted many months ago, and I doubt if it counts as a bug :-). But who
knows - one can run into some problem in the future, and that's what
CGatePro Logs are for, and that's what we are for: to fix things if
something is wrong.
I do not see anything unpleasant in these things: bugs do happen, the
problem is how many of them are there, how important they are and how
quickly they are fixed.


> I strongly recommend that you go to the periodic IMC IMAP interoperability
> bakeoffs. This is the best way to avoid such problems in the future. I
> don't know when the next one will be held, but it should be announced
> here. It's always better to get interoperability problems discovered and
> resolved in pre-release code!

While I'd enjoy going to such a forum myself (if time permits :( ), I do
not think that this is the best way to fix things. We work slightly
differently: when a problem is reported, it's investigated immediately, and
the fix appears in the next release - at least, in the next beta release;
those go out every 1-2 weeks. I think that you do the same, and the main
purpose of the forum is to settle the ideological misunderstandings, chart
a way for new development, and solve those interoperability problems that
go beyond the written specs. For example, I'd like to see the eyes of those
Microsoft fellows who made their clients open 20 connections for one
session... :-) and :-(


> > Could you please list those requests?
>

> I think that what he is talking about is that I don't want to get into the
> business of supporting the "maildir" format. There are at least three
> third-party c-client drivers available for maildir. If someone uses
> maildir, they can go to one of those third parties for code and support.
>
> I believe that it is infeasible to build maildir support that scales well
> (e.g. does not exhibit performance problems with a moderately large
> mailbox of 2000 messages) and also does not violate a major rule of either
> maildir or IMAP. It's a no-win situation for me; and therefore I choose
> to allow the maildir enthusiast community to do their own development,
> distribution, and support of maildir IMAP code.

I can argue with you that directory-based solutions can be made scalable,
but this is not the point: the internal structure of some IMAP server has
nothing to do with the IMAP protocol specs themselves. That, I'm afraid, is
Sam's problem here: he had some disagreements with you as the designer
of one of the IMAP servers, and that's completely up to him - he can create
an IMAP server that has nothing in common with your server, and if it is a
better server - so be it.

But then he passes that disagreement with Mark-designer on to
Mark-protocol-maintainer, and this is a completely different issue. One can
use a completely different approach and code to develop a server, but as
long as it complies with IMAP4rev1 specs, it's an IMAP4rev1 server. On the
other hand, if one takes imapd-uw code and changes just one line of it so
it would not comply (be that person named Sam or Mark) - that server will
not be an IMAP4rev1 server, and that's it.

What I wanted to see posted is not some problems of implementing this or
that in someone's code, but the problems in the IMAP4rev1 protocol itself,
and the requests for improvements of that protocol. Improvements of a
particular server's code are a completely different issue, and should be
discussed on that server's support mailing list/forum.

> -- Mark --

Lyndon Nerenberg

Mar 13, 2000, 3:00:00 AM3/13/00
to
>>>>> "Vladimir" == Vladimir A Butenko <but...@stalker.com> writes:

Vladimir> permits :( ), I do not think that this is the best way
Vladimir> to fix the things. We work slightly different: when a
Vladimir> problem is reported, it's investigated immediately, and
Vladimir> the fix appears in the next release - at least, the next
Vladimir> beta release that go out every 1-2 weeks. I think that
Vladimir> you do the same, and the main purpose of the forum is to
Vladimir> settle the ideological misunderstandings, draw a way for
Vladimir> new development, and solve those interoperability
Vladimir> problems that go beyond the written specs. For example,

Which means you're fixing things in a reactive mode, and without
the hive knowledge present when you get a bunch of experienced
engineers together. Being able to discuss problems in a group,
especially when there is more than one "right answer" to the
problem, is invaluable. You will also find your software taking
a much more severe beating at the interop than it *ever* will in
the field. For our SMS server product, about 80% of the bug fixes
that went in were a direct result of IMC interop testing. A lot
of these were edge cases that you won't likely run into in the
field, but *will* see when you're testing against other people's
alpha software. And then there are those of us mercenaries who
while away the time telneting to the servers and doing "unexpected"
things. (My t/golf shirt collection has grown considerably as a
result of these activities ;-)

Vladimir> I'd like to see the eyes of those Microsoft fellows who
Vladimir> made their clients open 20 connections for one
Vladimir> session... :-) and :-(

Even more reason to attend.

IMAP is a complex and subtle protocol. Any vendor of IMAP
products wouldn't be taking their job seriously if they
didn't attend the IMC interop events on at least a semi-
regular basis.

--lyndon

Vladimir A. Butenko

Mar 13, 2000, 3:00:00 AM3/13/00
to
In article <s9u7lf6...@zappa.esys.ca>, Lyndon Nerenberg
<lyn...@MessagingDirect.COM> wrote:

OK, I agree with all that, but I still cannot get the point.

a) there are already several thousand CGatePro servers installed, some of
them at major ISPs, so it's not a problem to test any alpha-mode software
against it.

b) the CGatePro server software is available for free testing for 18
platforms directly from http://www.stalker.com/CommuniGatePro/ and I do
not see the problem in testing against it. Several E-mail client vendors
already use it as a "testbed" not only for their alpha versions, but for
the development process, too.

What difference will it make if, say, I appear at some forum? I'm not
an IMAP server and will break as soon as someone puts a single IMAP
command into any of my "outside world connections". If someone wants to
test something against our servers and for any reason cannot deploy it on
one of their own machines - just call 800-262-4722, ask for Philip, and
ask him to set up a test account on one of our own CGatePro servers - there
are at least 3 available on the "visible" Internet, and we can even create
an account on a Dynamic Cluster, too - but that's unlikely to make any
difference for the tester. Last time I looked, mail.stalker.com had about
half a dozen accounts created for various mail-client developers.

If there is a PROBLEM, and that problem has to be discussed with client
vendors - then, of course, a forum of some kind is a must. But how, for
God's sake, can we find interoperability problems if we just sit together
somewhere and start to chat? I really do not get it, please explain.

> Vladimir> I'd like to see the eyes of those Microsoft fellows who
> Vladimir> made their clients open 20 connections for one
> Vladimir> session... :-) and :-(
>
> Even more reason to attend.

Yes, because there is a KNOWN problem. And it is of some interest for all
server vendors, not to Stalker only.

> IMAP is a complex and subtle protocol. Any vendor of IMAP
> products wouldn't be taking their job seriously if they
> didn't attend the IMC interop events on at least a semi-
> regular basis.

I do understand that it's kinda strange that we do not attend your
meetings, and I would take it as a good pitch for that conference, but
while you may say that we do not take our job seriously, I must say that
IMAP code is.. let me check.. - just 5% of CGatePro code. And it does not
create any problem either for us or for our clients. Meanwhile there are
much more serious issues and portions of the code that are much more
complicated than IMAP with all its extensions - and those things do
require our full attention - S/MIME incorporation, distributed LDAP,
Calendaring, WML - that's a huge list of things under active development
now.

This is all not to say that I do not see a reason for such meetings, but I
just want to know - what exactly we want to discuss there, what problems
do exist now, and - since we are in E-mail messaging business - why all
this cannot be discussed via E-mail or on Usenet?

Call me old and lazy, but if someone has to cross 11 time zones every 2
months and to participate in several Expos all over the globe, that person
inevitably develops a habit of using E-mail and Usenet and asking
carefully what one more flight is needed for...

If I only could explain this to the cops, too - that I HAVE to drive fast,
because I'm tired of planes.... %-)


> --lyndon

Nicholas Lee

Mar 13, 2000, 3:00:00 AM3/13/00
to

"Vladimir A. Butenko" <but...@stalker.com> wrote in message
news:butenko-1303...@stalker.gamma.ru...

The issue that has given rise to this thread is essentially to provide
explicit tokenisation for at least the password field, but I might suggest
given the "Which clients canNOT handle '@' in authentication identifiers"
thread that tokenising this might be worthwhile as well.

I was having a problem because '[]' are used as token delimiters (MIME
stuff) elsewhere in the protocol spec, whereas they are not 'special'
characters in the password auth field. For various reasons Sam at this
stage treats [] as distinct token delimiters everywhere. (Personally I
think his reasons, that it reduces code bloat, are extremely valid.)

The fix he gave me for including [] in the password auth field is to
stringify the field, i.e. "foo[]bar". Once this was done I managed to get
my "telnet localhost imap" testing to work; Outlook Express worked
regardless, but Pine, using the complete spec, doesn't.

I can only say IMO that having a token delimiter in one part of the
protocol stream but not in another part would seem to increase the code
bloat and maintenance requirements. Of course, not being an IMAP server
author, I can't say this for certain. Adding four bytes (tokenising the
login id and password auth fields) certainly seems worthwhile in this case.


Of course, back to the topic at hand, the thing that pissed me off about
Mark's actions is that he took a paragraph out of context and used it to
further his agenda. I don't care if he thinks he's doing it for the public
good or if other people accept this. It's bad behaviour and I'm going to
tell him off for it. There are more civil ways of doing this without
resorting to being a bully-boy.

Nicholas

Lyndon Nerenberg

Mar 13, 2000, 3:00:00 AM3/13/00
to
>>>>> "Vladimir" == Vladimir A Butenko <but...@stalker.com> writes:
Vladimir> If there is a PROBLEM, and that problem has to be
Vladimir> discussed with client vendors - then, of course, a forum
Vladimir> of some kind is a must. But how for God sake can we find
Vladimir> interoperability problems if we just sit together
Vladimir> somewhere and start to chat? I really do not get it,
Vladimir> please explain.

Vladimir, it's an interop event. 80 engineers in a large room with
a large ethernet and a large number of computers running a large
number of client and server implementations, all making sure they
can talk to each other. [Note that sales, marketing, press, and other
non-engineering scum are explicitly forbidden from attending ;-)]

There is very little chatting going on. In fact, if you hear
someone talking it means they found a bug.

Since these are all engineers, you're almost guaranteed that they
have source to their products with them. Thus, things get fixed
in minutes, and the fixes can be tested against everyone else's code
very rapidly.

One example of this involved early deployments of the SASL DIGEST-MD5
mechanism. At the interop last March there were three or four vendors
working on this. We discovered quite quickly that there were some
vast differences in interpretation of some parts of the spec. Having
a group of engineers in the same room we were able to quickly break out
and identify the issues, propose a solution, implement that, and see
if it got us any further along. All inside of an hour. If we had tried
to do that by email it a) probably never would have happened to begin
with and b) would still be a work in progress.

Interop events are invaluable for this sort of thing, and I stand
completely behind my statement about serious vendors attending
them. And that's not a bullet aimed at you. It sounds like you've
not been to one, so it's hard to appreciate just how valuable they
are. (And for those of us who have attended the IMC IMAP events
over the years, we have noticed a *direct* correlation between participation
and product quality. It's amazing to see how the quality of a product
has improved by the second time a vendor shows up at the event. This
is a Good Thing for everyone.)

--lyndon

Nicholas Lee

Mar 13, 2000, 3:00:00 AM3/13/00
to

"Mark Crispin" <m...@CAC.Washington.EDU> wrote in message
news:Pine.NXT.4.30.000313...@Tomobiki-Cho.CAC.Washington.EDU...


> I believe that it is infeasible to build maildir support that scales well
> (e.g. does not exhibit performance problems with a moderately large
> mailbox of 2000 messages) and also does not violate a major rule of either

As a point of interest, I have several mailboxes at this stage with nearly
1000 messages, many with large attachments. I merged (copied) a few of them
into one mailbox, giving me 2060 messages, about 34 megs worth.

Both Outlook Express and Pine (4.21) have no issue with this mailbox at all.
Pine opens the mailbox instantly having never seen it before, and of course
Outlook Express goes through the process of caching the headers, which
takes a little while.

It might be noted that this is currently a semi-loaded K6-2 400 with
128 megs of RAM, with only my mailbox being open. However, I can't see how
having to parse a 2000-message mbox-format file of size 25-34 megs is going
to be faster (and safer) than the file system and maildir format. Worst case
you use something like ReiserFS to deal with all the small files.

Of course I'm using courier-imap.

Nicholas


Lyndon Nerenberg

Mar 13, 2000, 3:00:00 AM3/13/00
to
>>>>> "Nicholas" == Nicholas Lee <nj.le...@kiwa.co.nz> writes:

Nicholas> The issue that has given rise to this thread is
Nicholas> essentially to provide explicit tokenisation for at
Nicholas> least the password field, but I might suggest given the
Nicholas> "Which clients canNOT handle '@' in authentication
Nicholas> identifiers" thread that tokenising this might be
Nicholas> worthwhile as well.

Sorry, I wasn't clear about this. It's not a protocol issue, it's
a UI issue. The UIs of some (very) popular IMAP and POP clients
arbitrarily throw away any authentication string data entered
after an '@' character, along with the '@' character itself.

Thus, (imagine you're looking at a GUI login/password dialog
box) if I enter 'lyn...@messagingdirect.com' in the login
field of the dialog, the client throws out the '@messagingdirect.com'
part and tries to log me in as 'lyndon' (which is not my authentication
id on the server).

--lyndon

Mark Crispin

Mar 13, 2000, 3:00:00 AM3/13/00
to
On Tue, 14 Mar 2000, Vladimir A. Butenko wrote:
> You forgot to mention one thing - the OS you are using. I guess that you
> are doing this on Linux, and Mark spends most of his time on - let me
> guess - Solaris :-). The speed of directory scan varies a lot on those
> platforms.

Not Solaris, but your point is correct. On some platforms, doing a stat()
on 2000 files in a directory takes 30 seconds, not 3 seconds. Ouch.

It quickly gets worse. If you are lucky, you can get the IMAP
internaldate and rfc822.size from the stat() call. But, that assumes that
the messages are stored in RFC822 format, not UNIX format, meaning CR/LF
newlines instead of LF newlines. If you don't have some sort of index
file (ala Cyrus), you end up having to open and read the file to count the
newlines. A lot of clients do "FETCH 1:* FAST"...oh dear...oh my...
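
To make the newline-counting cost concrete, a small Python sketch, assuming
the message is stored with bare LF newlines and that RFC822.SIZE must be
reported as if every newline were CR/LF. The function name is invented for
illustration.

```python
def rfc822_size(raw: bytes) -> int:
    """Size of a stored message once every bare LF is expanded to CR/LF.

    Each LF that is not already preceded by CR costs one extra byte.
    """
    bare_lf = raw.count(b"\n") - raw.count(b"\r\n")
    return len(raw) + bare_lf
```

The count needs the whole file content, so a stat() alone is not enough
for LF-stored messages; that is the work an index file would cache.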

You also should have an index file to store envelopes and body structures,
so you don't have to open the file. Many filesystems serialize path
references, so lots of consecutive opens are a problem. Yes (shudder),
there are clients which do "FETCH 1:* ALL" or "FETCH 1:* FULL".

So, you want to avoid doing a stat() at all; meaning that you have some
other way of discovering which objects from readdir() are files (messages)
and which are directories (subfolders). Again, the index file.

That's a lot of work for the index file to do. One reason that my mx
format is not encouraged is that it didn't go far enough and failed to
eliminate the stat(). mx and mbx were simultaneous experiments, and mbx
kicked mx's rear end by two orders of magnitude. That's why work on mx
was abandoned with it half-finished.

The problem with an index file in the maildir context is that it defeats
one of the primary advertised capabilities of maildir: the use of
filesystem primitives to do locking (and hence NFS safety). So, an index
file, which is precisely what maildir tries to avoid, is probably out of
the question. What maildir does is store lots of stuff in the file names,
taking advantage of readdir() being much cheaper than the stat(). But you
can only cram so much here; and you have to interact with other maildir
software.
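
A minimal Python sketch of the file-name trick, assuming the usual maildir
convention where a name in cur/ ends in ":2," followed by single-letter
flags. The function names and the flag mapping table are illustrative
assumptions, not code from any particular maildir driver.

```python
import os

# Standard maildir info flags and the IMAP system flags they usually map to.
MAILDIR_TO_IMAP = {
    "D": "\\Draft",
    "F": "\\Flagged",
    "R": "\\Answered",
    "S": "\\Seen",
    "T": "\\Deleted",
}

def flags_from_name(filename: str) -> set:
    """Read IMAP flags out of a maildir file name such as 'uniq:2,RS'."""
    _, sep, info = filename.rpartition(":2,")
    if not sep:
        return set()  # no info section, e.g. a file still sitting in new/
    return {MAILDIR_TO_IMAP[c] for c in info if c in MAILDIR_TO_IMAP}

def scan_cur(maildir: str) -> dict:
    """Map each name in cur/ to its flags using one directory listing, no stat()."""
    cur = os.path.join(maildir, "cur")
    return {name: flags_from_name(name) for name in os.listdir(cur)}
```

Note what the listing does not give: sizes, internal dates, envelopes. That
is the "only cram so much" limit described above.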

It is a set of difficult design tradeoff decisions that I don't wish to
make; no matter which one I choose, someone will be unhappy. That's why
it's better to let the maildir enthusiasts develop their own c-client
maildir drivers.

Mark Crispin

Mar 13, 2000, 3:00:00 AM3/13/00
to
On Mon, 13 Mar 2000, Mark Crispin wrote:
> Not Solaris, but your point is correct. On some platforms, doing a stat()
> on 2000 files in a directory takes 30 seconds, not 3 seconds. Ouch.

I should add that there's worse than just stat(). I just ran some quick
tests.

What really takes forever is copying 2000 files, because of the
serialization of open() on most filesystems. On most systems, it's much
faster to copy a 2MB file than to copy 2000 1KB files. On my system, it
came out to be 2 seconds vs. 103 seconds!

Deleting 2000 files took 80 seconds. The more messages you expunge, the
slower it is with a one-file/one-message format, and the faster it is with
a flat file format.

Renaming 2000 files, such as what you would have to do in maildir with a
bulk flags change if you store flags in the file name, took 93 seconds.

Big ouch.
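
For anyone who wants to repeat this kind of measurement, a rough Python
sketch follows. It is an assumed reconstruction, not the test that produced
the numbers above: the file counts, sizes and helper names are
placeholders, and results depend heavily on the filesystem and its caches.

```python
import os
import shutil
import tempfile
import time

def timed(fn) -> float:
    """Run fn() once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def rename_all(directory: str) -> None:
    """Rename every file, as a flag change stored in the file name would."""
    for name in os.listdir(directory):
        os.rename(os.path.join(directory, name),
                  os.path.join(directory, name + ",S"))

def compare(count: int = 2000, size: int = 1024) -> dict:
    """Time flat-file versus one-file-per-message operations in a temp dir."""
    root = tempfile.mkdtemp()
    try:
        many = os.path.join(root, "many")
        os.mkdir(many)
        for i in range(count):
            with open(os.path.join(many, f"{i}.msg"), "wb") as f:
                f.write(b"x" * size)
        flat = os.path.join(root, "flat")
        with open(flat, "wb") as f:
            f.write(b"x" * (count * size))
        # Dict entries are evaluated in order: copy, copy, rename, delete.
        return {
            "copy_flat": timed(lambda: shutil.copyfile(flat, flat + ".copy")),
            "copy_many": timed(lambda: shutil.copytree(many, many + ".copy")),
            "rename_many": timed(lambda: rename_all(many)),
            "delete_many": timed(lambda: shutil.rmtree(many)),
        }
    finally:
        shutil.rmtree(root, ignore_errors=True)
```

Calling compare() with the defaults mirrors the 2000 x 1KB versus 2MB case;
on a modern machine with a warm cache the gap may look very different.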

Mark Crispin

Mar 13, 2000, 3:00:00 AM3/13/00
to Vladimir A. Butenko
On Tue, 14 Mar 2000, Vladimir A. Butenko wrote:
> But that's not the point. Even if you can get the RFC size directly from
> the file, it's still a "stat" call. And almost all mail clients want
> to know the size of the message when they scan the mailbox.

Yes, which is why I made a mistake in my mx experiment by depending upon
stat() to get the size instead of storing it in the index. I didn't
realize how slow stat() could be.

Cyrus did not make that mistake; it also stores envelopes and body
structures in the index file. I think that Cyrus pretty much maximizes
the performance that you can squeeze out of one-file/one-message formats,
and they did a great job.

> And this still does not solve the problem of using a cluster when several
> clients can access the store from different servers

And, of course, Cyrus just punts on that entirely by outlawing NFS.

> mdir helps to avoid
> file locks (if it does not use index files), but does not help to
> synchronise changes.

Which is why we use locks in the first place! ;-)

> That's the problem only for the servers that rely on other software for
> mail delivery/automated processing. But the point is clear, and I hope that
> all of us should agree:

Yes, I agree completely with your (a)-(e).

> There is, though, one case that came to my mind, and it can become more and
> more important these days. Some of our clients said that they chose mdir
> for only one feature. It has nothing to do with what we have discussed here
> before. This feature is complete transparency.

Yes, transparency is very important, and often is neglected.

c-client is "mostly" transparent with traditional UNIX mailbox format; the
">" is not needed unless the line really looks like a UNIX mailbox header
line.
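
A small Python sketch of the difference between quoting every line that
starts with "From " and quoting only lines that look like a full mailbox
header line. The regular expression is a rough approximation chosen for
illustration; it is not c-client's actual test.

```python
import re

# Approximate shape of a UNIX mailbox separator line:
#   From sender Day Mon DD HH:MM[:SS] [zone] YYYY
FROM_LINE = re.compile(
    r"^From \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d\d:\d\d(:\d\d)? .*\d{4}$"
)

def quote_body_line(line: str) -> str:
    """Prefix '>' only when a body line could be mistaken for a separator."""
    if FROM_LINE.match(line.rstrip("\r\n")):
        return ">" + line
    return line
```

Under this stricter test an ordinary sentence beginning with "From " is
stored untouched, which is what "mostly" transparent means here.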

However, I agree this is an issue with traditional UNIX mailbox format,
and it's one of the big reasons why I never liked it.

mbx (the current favorite) and mtx (the old Tenex/TOPS-20 format) are
guaranteed fully transparent, even for binary data. I expect to be doing
some work soon to bum extra performance out of mbx format. It used to be
much faster than traditional UNIX format, but after the work I did to bum
performance in the traditional UNIX format it's now only "somewhat
faster". So I need to do some hacking to restore the big performance
advantage of my favorite format... ;-)

tenex format is transparent for normal text; in spite of the name, it's
actually a UNIXified mtx format (originally used by the UNIX port of MM)
that uses UNIX-style bare LF newlines instead of CR/LF. So it's not
transparent if you care about CR and LF transparency and/or binary.

mbx, mtx, and tenex all allow shared read/write access, but they require
the ability to synchronize updates and that means no NFS. Even when you
get locking out of the way, the inode vs. data cache problem over NFS will
still bite you.

mbx has the additional win that it allows shared expunge. That's great
for people who leave an IMAP client running 24/7, and then want to run an
IMAP client on their mail someplace else. For many folks at UW, it's
their office system that runs an IMAP client 24/7, for me, it's my home
system...

Vladimir A. Butenko

Mar 14, 2000, 3:00:00 AM3/14/00
to
In article <95298168...@shelley.paradise.net.nz>, "Nicholas Lee"
<nj.le...@kiwa.co.nz> wrote:

> "Vladimir A. Butenko" <but...@stalker.com> wrote in message
> news:butenko-1303...@stalker.gamma.ru...
> > In article
> >
> <Pine.NXT.4.30.000313...@Tomobiki-Cho.CAC.Washington.EDU>,
> > Mark Crispin <m...@CAC.Washington.EDU> wrote:
>
> > What I wanted to see posted is not some problems of implementing this or
> > that in someone's code, but the problems in the IMAP4rev1 protocol itself,
> > and the requests for improvements of that protocol. Improvements of a
> > particular server code is a completely different issue, and should be
> > discussed on that server support mailing list/forum.
>

> The issue that has given rise to this thread is essentially to provide
> explicit tokenisation for at least the password field, but I might suggest
> given the "Which clients canNOT handle '@' in authentication identifiers"
> thread that tokenising this might be worthwhile as well.

Hmm. Try to look at it from the other side, OK. How long did it take for
major client developers to avoid stripping off the "@xxxx" things from the
account name? As has been already noted here, even 4.5 versions of those
products still do it (though they do not do it for me, I have to say). Now
you want to change the way the password field is sent. Look - they did not
fix the problem that at least 100,000 people have (and those people did
write to them, not only to the server vendors), and you want them to
change something to make their clients work with just one server that has
no installed base (at least, not now).

That simply won't work. You cannot EXTEND a protocol by breaking its
compatibility with the original one. All you can do is to create a NEW
version of the protocol, or a completely new protocol. Not a bad thing,
but the question is - is anybody gonna support it? So, the changes you are
talking about are changes that can be made in a NEW protocol only, and
given the time it took for major players to implement at least SOME IMAP
functionality, I would say that we can discuss the NEW protocol, but we
are unlikely to see it deployed during the next 5-10 years.

This is why my position is: stay with whatever is specified in the current
standard, and invest the development effort in extensions, not in a
completely new protocol. I'd say that designing a new, not
backward-compatible protocol is worth doing only when it is realized that
the current one prohibits further development. It was like the switch from
POP to IMAP - it's hardly possible to "extend" the POP protocol to give it
IMAP functionality, so a new protocol was worth the development effort and
all the troubles of deploying it. Mark was doing IMAP for over 10 years
(correct me if I'm wrong here), and IMAP started to play SOME role in the
marketplace only 2-3 years ago. When I made a presentation of the IMAP
protocol (thanks, Terry, for the papers) at Macworld'97, most of the
audience heard about IMAP for the first time.

And now you (and Sam) suggest to, actually, develop a NEW protocol - i.e.
enter the same 10-year cycle - and for what? For better parsing of some
lexemes in IMAP?! Please, get real.


> I was having a problem because '[]' are used as token delimiters (MIME
> stuff) elsewhere in the protocol spec, whereas they are not 'special'
> characters in the password auth field. For various reasons Sam at this
> stage treats [] as distinct token delimiters everywhere. (Personally I
> think his reasons, that it reduces code bloat, are extremely valid.)

IMAP is NOT a language. While one can try to deploy a "traditional"
yacc-style parser for it, there are many catches there, because it is
place-dependent. There are no "atoms" and "special symbols" there, in the
strict sense of those terms. A regular parser that just calls a "getLexem"
function would fail in many places there, as it would fail for most
Internet protocols - SMTP, for example.

For some people - it's a very bad thing, because it's not what they EXPECT
to see. That's just a question of selecting the right tool for the job. A
generic, programming-language-style lexical analyzer is not the right tool
for dealing with IMAP protocol command parsing.


> The fix he gave me for including [] in the password auth field is to
> stringify the field, ie. "foo[]bar". Once this is done I managed to get my
> "telnet localhost imap" testing to work, outlook express worked regards,
> but pine using the complete spec doesn't.

It's a good idea to put the password in quotes in any case. Otherwise you
have to check that the password does not contain a quote mark. Moreover,
passwords can contain characters that are not allowed in a q-string,
either. Some clients (I do not remember their names) always send passwords
as LITERALs.


> I can only say IMO that having a token limiter in one part of the protocol
> stream but not in another other part, would seem to increase the code bloat
> and maintance requirements. Of course not being a imap server authour I
> can't say this exactly. Adding four bytes (tokinising the login id and
> password auth fields) certainly in this case seems worth while.

Any CLIENT vendor can do that. But any SERVER is supposed to accept
anything there - atom, q-string or a LITERAL.
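
To make that concrete, here is a minimal sketch (Python, not taken from any
real server) of reading one argument that may arrive as an atom, a q-string
or a LITERAL. The names parse_astring and parse_login_args are made up for
illustration. The atom rule follows my reading of the IMAP4rev1 grammar,
where [ and ] are not atom specials. The literal branch assumes the whole
command is already in the buffer, i.e. it skips the "+" continuation
exchange that a real server has to perform.

```python
# Octets that end an atom: ( ) { SP % * " \  (control characters are
# excluded separately below).
ATOM_SPECIALS = b'(){ %*"\\'


def _is_atom_char(octet):
    return 0x20 < octet < 0x7f and octet not in ATOM_SPECIALS


def parse_astring(buf, pos=0):
    """Return (value, next_pos) for one argument starting at buf[pos]."""
    first = buf[pos:pos + 1]
    if first == b'"':                          # q-string
        out = bytearray()
        i = pos + 1
        while i < len(buf):
            c = buf[i:i + 1]
            if c == b'\\':                     # backslash escapes " or \
                i += 1
                if buf[i:i + 1] not in (b'"', b'\\'):
                    raise ValueError("bad escape in quoted string")
                out += buf[i:i + 1]
            elif c == b'"':
                return bytes(out), i + 1
            elif c in (b'\r', b'\n'):
                raise ValueError("line break in quoted string")
            else:
                out += c
            i += 1
        raise ValueError("unterminated quoted string")
    if first == b'{':                          # LITERAL: {n} CRLF, n octets
        end = buf.index(b'}', pos)
        n = int(buf[pos + 1:end])
        if buf[end + 1:end + 3] != b'\r\n':
            raise ValueError("CRLF expected after literal size")
        start = end + 3
        if start + n > len(buf):
            raise ValueError("literal truncated")
        return buf[start:start + n], start + n
    i = pos                                    # atom
    while i < len(buf) and _is_atom_char(buf[i]):
        i += 1
    if i == pos:
        raise ValueError("argument expected")
    return buf[pos:i], i


def parse_login_args(args):
    """Parse 'userid SP password'; both may use any of the three forms."""
    user, pos = parse_astring(args, 0)
    if args[pos:pos + 1] != b' ':
        raise ValueError("space expected between userid and password")
    password, _ = parse_astring(args, pos + 1)
    return user, password
```

With this, foo[]bar, "foo[]bar" and a {8} literal carrying the same eight
octets all come out as the same password, which is what the client side is
entitled to expect.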

> Of course back to the topic on hand, the thing that pissed me off about
> Mark's actions is he took a paragraph out context and used it to extend his
> agenda. I don't care if he thinks he's doing in the public good or other
> people accept this . It's bad behaviour and I'm going to tell him off for
> it. There are more civil ways of doing this without resorting to being a
> bully-boy.

I think that there was already some agreement here and things, hopefully,
calmed down with the lesson taken by all parties. Let's move forward.


> Nicholas

Vladimir A. Butenko

Mar 14, 2000, 3:00:00 AM3/14/00
to
In article <s9u4saah9...@zappa.esys.ca>, Lyndon Nerenberg
<lyn...@MessagingDirect.COM> wrote:

> Vladimir> of some kind is a must. But how for God sake can we find
> Vladimir> interoperability problems if we just sit together
> Vladimir> somewhere and start to chat? I really do not get it,
> Vladimir> please explain.
>
> Vladimir, it's an interop event. 80 engineers in a large room with
> a large ethernet and a large number of computers running a large
> number of client and server implementations, all making sure they
> can talk to each other. [Note that sales, marketing, press, and other
> non-engineering scum are explicitly forbidden from attending ;-)]

Then I have to admit that my position as not only the CTO, but also the
president of Stalker completely disqualifies me from joining, since such
scum as myself is not welcome there :-(. Poor me.

> One example of this involved early deployments of the SASL DIGEST-MD5
> mechanism.

You got one more point. DIGEST-MD5 is not implemented correctly in
CGatePro :-). But since no popular client supported it, it was a forgotten
issue waiting for an RFC standard to appear. OK, one more point to go (if
such scum as myself is allowed there).

> At the interop last March there were three or four vendors
> working on this. We discovered quite quickly that there were some
> vast differences in interpretation of some parts of the spec.

You bet.


> and identify the issues, propose a solution, implement that, and see
> if it got us any further along. All inside of an hour. If we had tried
> to do that by email it a) probably never would have happened to begin
> with and b) still be a work in progress.

While I'm talking to you, the source code of CGatePro is open in an IDE 2
feet from my desk. And it's always open and accessible to the tech staff
24x7. If a problem arises and there is a solution for it, the code is
changed the same day, if not immediately. But, OK, you got the point.



> Interop events are invaluable for this sort of thing, and I stand
> completely behind my statement about serious vendors attending
> them. And that's not a bullet aimed at you.

C'mon. I always wear a vest. Bullets are very welcome here :-)

> It sounds like you've
> not been to one, so it's hard to appreciate just how valuable they
> are.

Heh... Things are more complicated for my poor soul. AFAIR, last time you
had a meeting during one of the shows (LinuxWorld?). I was about to go
there and at least see what it was all about (guys from Cyrusoft told me
about it and convinced me), but my scum part had to stay at our booth,
since there were some large clients or press people coming with whom I had
to talk. Not an excuse, but an explanation. I think if you hold that
meeting during a show next time, we will have to schedule at least two
people to attend, so if I cannot go, someone from Stalker will.

> (And for those of us who have attended the IMC IMAP events
> over the years, we have noticed a *direct* correlation between participation
> and product quality. It's amazing to see how the quality of a product
> has improved by the second time a vendor shows up at the event. This
> is a Good Thing for everyone.)

OK, OK, you sold me. When will the next IMC meeting happen? I think there
is some fee to attend - to whom and when should we pay it? I hope that it
will be posted here, as Mark has said, and if some of you drop a message
to my E-mail address, that would work too. Just make it happen in the Bay
area, pleeese... :-)

Vladimir A. Butenko

Mar 14, 2000, 3:00:00 AM3/14/00
to
In article <95298367...@shelley.paradise.net.nz>, "Nicholas Lee"
<nj.le...@kiwa.co.nz> wrote:

You forgot to mention one thing - the OS you are using. I guess that you
are doing this on Linux, and Mark spends most of his time on - let me
guess - Solaris :-). The speed of directory scan varies a lot on those
platforms.

CGatePro in the "native" mode scans all its account directories to read
the names of available accounts. And on some sites, it takes up to 10
minutes. That's too bad because they cannot afford 10 minutes of downtime
to upgrade the software. Yes, there are not 2000 files in that directory
(several orders of magnitude more) and they are not in a flat directory,
and there is a "stat" call issued for each of the found files, but that's
still too slow. So, a 3 second delay to scan a 2000-file directory can be
easily observed on some platforms. And with 2000 files, you usually have
a mailbox of about 5MB that can be easily read and parsed in 3 seconds.

So, it all depends, and things are not as simple as they seem on the surface.

> Nicholas

Vladimir A. Butenko

Mar 14, 2000, 3:00:00 AM3/14/00
to

> On Tue, 14 Mar 2000, Vladimir A. Butenko wrote:

> > You forgot to mention one thing - the OS you are using. I guess that you
> > are doing this on Linux, and Mark spends most of his time on - let me
> > guess - Solaris :-). The speed of directory scan varies a lot on those
> > platforms.
>

> Not Solaris, but your point is correct. On some platforms, doing a stat()
> on 2000 files in a directory takes 30 seconds, not 3 seconds. Ouch.

Hmmmm. AIX? :-)


> It quickly gets worse. If you are lucky, you can get the IMAP
> internaldate and rfc822.size from the stat() call. But, that assumes that
> the messages are stored in RFC822 format, not UNIX format, meaning CR/LF
> newlines instead of LF newlines. If you don't have some sort of index
> file (ala Cyrus), you end up having to open and read the file to count the
> newlines. A lot of clients do "FETCH 1:* FAST"...oh dear...oh my...

I'd even say most of them.
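
A small sketch of why the file size from stat() is not enough when the
messages are stored with LF newlines: the RFC822 size counts CR/LF, so
every bare LF adds one octet. The function name rfc822_size is made up for
illustration, and reading the whole message into memory is only for
brevity.

```python
def rfc822_size(raw):
    """Size of the message once every bare LF is sent as CR/LF.

    'raw' holds the stored octets of one message. Newlines that are
    already CR/LF are left alone, so this also works for messages
    stored in RFC822 format.
    """
    bare_lf = raw.count(b'\n') - raw.count(b'\r\n')
    return len(raw) + bare_lf


stored = b'Subject: hi\n\nbody\n'        # 18 octets on disk, 3 newlines
print(len(stored), rfc822_size(stored))
```

The point is that the second number requires reading the file (or an
index that remembers the newline count), while the first one comes from
stat() alone.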


> You also should have an index file to store envelopes and body structures,
> so you don't have to open the file. Many filesystems serialize path
> references, so lots of consecutive opens are a problem. Yes (shudder),
> there are clients which do "FETCH 1:* ALL" or "FETCH 1:* FULL".

Again, not just "there are", there are pretty popular IMAP clients that do
that. Not ALL, not FULL, but at least ENVELOPE and BODYSTRUCTURE (just to
show if there is an attachment in the file). But even if they need just
the "envelope" - that's enough to demand file opening for all files or you
need an index file.



> So, you want to avoid doing a stat() at all; meaning that you have some
> other way of discovering which objects from readdir() are files (messages)
> and which are directories (subfolders). Again, the index file.

Not exactly. There is no need to store the messages and the subfolders in
the same physical directory. In CGatePro, for example, the mailboxes are
name.mbox or name.mdir files/directories, while submailboxes are stored
inside the name.folder directory.

But that's not the point. Even if you can get the RFC size directly from
the file, it's still a "stat" call. And almost all mail clients want to
know the size of the message when they scan the mailbox.

> That's a lot of work for the index file to do. One reason that my mx
> format is not encouraged is that it didn't go far enough and failed to
> elimate the stat(). mx and mbx were simultaneous experiments, and mbx
> kicked mx's rear end by two orders of magnitude. That's why work on mx
> was abandoned with it half-finished.
>
> The problem with an index file in the maildir context is that it defeats
> one of the primary advertised capabilities of maildir: the use of
> filesystem primitives to do locking (and hence NFS safety).

And this still does not solve the problem of using a cluster, where several
clients can access the store from different servers: mdir helps to avoid
file locks (if it does not use index files), but does not help to
synchronise changes. At least, it requires the server to check what has
changed in the directory each time (to issue the IMAP unilateral
FETCH/EXPUNGE messages), and for a mailbox with 2000 messages it means
constant delays.
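
A rough sketch of what "check what has changed in the directory" means in
practice: compare the previous listing with the current one. It assumes
the usual maildir naming, where the part before ":" identifies the message
and the part after it carries the flags, so a flag change shows up as a
rename. The function names are made up for illustration.

```python
def index_by_base(names):
    """Map the unique part of each file name (before ':') to the full name."""
    return {name.split(':', 1)[0]: name for name in names}


def diff_listings(old_names, new_names):
    """Compare two directory listings of the same mailbox.

    Returns (added, removed, changed) as sorted lists of unique parts:
    'added' would be announced as new messages, 'removed' as EXPUNGE,
    and 'changed' (same message, different file name) as a FETCH with
    the new flags.
    """
    old = index_by_base(old_names)
    new = index_by_base(new_names)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed
```

The cost is the point here: each check is a full readdir() pass over the
mailbox, for every session that has it open.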

> So, an index
> file, which is precisely what maildir tries to avoid, is probably out of
> the question. What maildir does is store lots of stuff in the file names,
> taking advantage of readdir() being much cheaper than the stat(). But you
> can only cram so much here; and you have to interact with other maildir
> software.
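
For readers who have not looked at maildir names, a minimal sketch of
"storing stuff in the file names". The ":2," flag suffix and its letters
are the standard maildir ones. The ",S=<size>" field is an assumption
based on the Maildir++ naming convention; I have not checked what any
particular server writes there. The function name is made up.

```python
# Maildir flag letters that have an IMAP counterpart. 'P' (passed) has
# none and is ignored in this sketch.
MAILDIR_TO_IMAP = {
    'D': '\\Draft',
    'F': '\\Flagged',
    'R': '\\Answered',
    'S': '\\Seen',
    'T': '\\Deleted',
}


def parse_maildir_name(name):
    """Return (unique_part, imap_flags, size_or_None) from a file name.

    Nothing here touches the file itself, so the data is available
    from readdir() alone, without a stat() or an open().
    """
    base, sep, info = name.partition(':')
    letters = info[2:] if sep and info.startswith('2,') else ''
    flags = sorted(MAILDIR_TO_IMAP[c] for c in letters if c in MAILDIR_TO_IMAP)
    size = None
    for field in base.split(',')[1:]:
        # Assumed convention: ",S=<octets>" appended to the unique part.
        if field.startswith('S=') and field[2:].isdigit():
            size = int(field[2:])
    return base, flags, size
```

For example, a name like 952983670.1234.host,S=2048:2,RS would yield the
\Answered and \Seen flags and a size of 2048 without looking at the file.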

That's a problem only for the servers that rely on other software for
mail delivery/automated processing. But the point is clear, and I hope that
all of us can agree:

a) there are many more problems with the .mdir format than one can see at
first glance.

b) there are rather small narrow areas where .mdir really provides an
improvement over .mbox, while in most other cases it's slower (and always
- more resource hungry)

c) in those narrow areas (storing a small number of large (MB+) messages)
where .mdir has a plus.

d) as a result of a)-c), it's very reasonable for a server implementor
not to support the .mdir format, if he does not see great demand (in those
narrow areas)

e) as a result of a)-c), it's very unreasonable for any server
implementor to provide .mdir as the ONLY way to store messages.


> It is a set of difficult design tradeoff decisions that I don't wish to
> make; no matter which one I choose, someone will be unhappy. That's why
> it's better to let the maildir enthusiasts develop their own c-client
> maildir drivers.

Exactly. Actually, I know only a handful of our clients that actually use
the .mdir format on their servers. I can never say for sure, because it's
not even in the admin's hands - any CGatePro user can create mailboxes of
any type inside his/her account, but at least we have not heard about
.mdir being too popular - on a server that provides BOTH.

There is, though, one case that came to my mind, and it can become more and
more important these days. Some of our clients said that they chose mdir
for only one feature. It has nothing to do with what we have discussed here
before. This feature is complete transparency.

In "classic" .mbox mailbox managers, you have to add a ">" to any line that
starts with "From " and has an empty line in front of it. Not a big deal,
one would say - but not these days. S/MIME becomes more and more popular,
and if the message is not encrypted (and thus is in base-64), but is just
signed, then adding that ">" to the message by the server invalidates the
digital signature.

I'm not 100% sure - possibly it's not an issue with S/MIME, but only with
PGP - but it does exist, and we know some business customers who use
secure mail a lot and thus had to choose .mdir over .mbox.

Again, it has nothing to do with .mdir as the format; any single-file
format that provides transparency will do this job, too.
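
To illustrate the transparency problem with a few lines of Python: the
sketch below applies the classic "From " quoting rule described above and
shows that the stored octets no longer hash to the same value. SHA-1 over
the body is only a stand-in for whatever a real signature covers, and
mbox_escape is a made-up name, not a function of any mail library.

```python
import hashlib


def mbox_escape(body):
    """Prefix '>' to every 'From ' line that follows an empty line.

    The start of the body counts as following an empty line, since the
    blank header/body separator precedes it.
    """
    out = []
    prev_blank = True
    for line in body.split(b'\n'):
        if prev_blank and line.startswith(b'From '):
            line = b'>' + line
        out.append(line)
        prev_blank = (line == b'')
    return b'\n'.join(out)


signed = b'Dear Bob,\n\nFrom now on we ship weekly.\n'
stored = mbox_escape(signed)

# The server changed one octet, so a digest computed by the sender no
# longer matches what the recipient reads back from the mailbox.
print(hashlib.sha1(signed).hexdigest() == hashlib.sha1(stored).hexdigest())
```

A format that stores the octets untouched (one file per message, or a
single file with length-prefixed records) never has to do this rewrite.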


> -- Mark --

Sam

Mar 14, 2000, 3:00:00 AM3/14/00
to
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <butenko-1303...@stalker.gamma.ru>,


but...@stalker.com (Vladimir A. Butenko) writes:

> Look. Calm down, please. You are not the only person Mark attacted here
>:-). And I'd agree that taking info from a personal E-mail and posting it
> on a public forum is a BAD thing.

No, I have no problem with that, but only as long my position is accurately
represented, and not twisted around in order to further one's own petty
disputes with third parties.

> a) Yes, IMAP standard is not perfect. But it is a complicated protocol,

It's not much more complicated than SMTP with all the ESMTP extensions; yet
you do not find the same level of interoperability problems there.

> d) there are many other issues in IMAP specs that can be discussed. But
> they should be discussed in the professional manner, and not in starting
> the "pissing match" that you always say you do not want to participate in.

Well, you don't have much choice there when someone's deliberately
misrepresenting yourself, in order to further his own agenda. If I see
Mark Crispin, or anyone else for that matter, behaving in a totally
unprofessional and unethical manner -- again, that has nothing to do with
publishing personal mail -- I am going to challenge that no matter who he
or she is.

>> making positive comments vis-a-vis Microsoft). But, it won't. One thing
>> I've realized is that I don't think that I really want to earn the same
>> reputation as UW-IMAP. In fact, I wrote Courier-IMAP precisely because of
>> UW-IMAP's reputation of ignoring repeated requests for compatibility with
>> software written by someone who's been feuding with UW-IMAP's authors.
>
> Could you please list those requests? We have now aprx. 5mln seats sold
> last year. Aprx 10% of those are using IMAP. We would hear about any
> problem with any IMAP software. And I should tell you - we have no
> "improvements" of the IMAP4rev1 in our servers. It's strictly on the
> standard, and those clients do follow the standard. So I'd be very
> interested in learning about the problems the current standard presents to
> the current clients.

Not those kind of requests -- I've watched for a couple of years as
repeated requests to add maildir support to c-client (UW IMAP and Pine)
were refused. The initial excuses given were that there is no need for
maildir support in the UW, and this software is really for UW's use only.
I suppose that at some point that no longer seemed to be very credible;
eventually c-client grew to support a large assortment of back end mail
formats, and it would've been a stretch to ask people to believe that every
one of them was in use in the University of Washington.

So, I suppose the current excuse has something to do with performance, and
I see that elsewhere that FUD has been thoroughly debunked, so I don't need
to go over that. But the bottom line is that I got tired of watching this
constant bickering, and decided to do the job myself, and did it.

>> And I'm certainly not going to go down the same path myself.
>
> If you want to DEVELOP a BETTER standard - let's discuss it. Right here.
> As you can see, there are many IMAP4rev1 extensions that are documented in
> RFCs that do not have Mark's name on them. So, I do not understand why you

Oh, although I do believe that a better remote mail access protocol would
be very welcome, and very useful, that's nothing more than a nice fantasy.
It's not going to happen.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE4zbfq+3BFaxHnGY0RApovAKCPfnnjkQoc7vfy5X4VbKY7IMk3OQCeL68p
GTNgGGe86mRVe5HQBwj5z2M=
=s161
-----END PGP SIGNATURE-----


Sam

Mar 14, 2000, 3:00:00 AM3/14/00
to
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <butenko-1403...@stalker.gamma.ru>,


but...@stalker.com (Vladimir A. Butenko) writes:

> In article <95298367...@shelley.paradise.net.nz>, "Nicholas Lee"
> <nj.le...@kiwa.co.nz> wrote:
>
>> "Mark Crispin" <m...@CAC.Washington.EDU> wrote in message
>> news:Pine.NXT.4.30.000313...@Tomobiki-Cho.CAC.Washington.ED
>> U...
>>
>>
>> > I believe that it is infeasible to build maildir support that scales well
>> > (e.g. does not exhibit performance problems with a moderately large
>> > mailbox of 2000 messages) and also does not violate a major rule of either
>>
>> As a point of interest, I have several mailbox at stage with near a 1000
>> message. Many with large attactments. I merged (copied) a few them into one
>> mailbox giving me 2060 messages, about 34 megs worth.
>>
>> Both Outlook express and pine (4.21) have no issue with this mailbox at all.
>> Pine opens the mailbox instantly having never seen it before and of course
>> outlook express goes though the process of caching the headers which takes a
>> little while.
>>
>> It might be noted that this is currently an semi-loaded K6-2 400 with
>> 128megs of ram. With only my mail box being open. However I can't see how
>> having too parse a 2000 mbox format message of size 25-34 megs is going to
>> be fast (and saver) than the file system and maildir format. Worse case you
>> use something liek Resifs (sp?) to deal with all the small files.
>>
>> Of course I'm using courier-imap.
>

> You forgot to mention one thing - the OS you are using. I guess that you
> are doing this on Linux, and Mark spends most of his time on - let me
> guess - Solaris :-). The speed of directory scan varies a lot on those
> platforms.

Really? There's such a huge performance difference in the speed of
opendir() and readdir() on different platforms?

> upgrade the software. Yes, there are not 2000 files in that directory
> (several orderes of magnitude more) and they are not in a flat directory,
> and there is a "stat" call issued for each of the found files, but that's
> still too slow.

Well, I guess I'm lucky, because I do not need to stat each file when
opening a maildir. Just opendir() and readdir(). That's all.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE4zbf8+3BFaxHnGY0RAqE4AJ4jiA3WfhZiEAGCY+/zlrXFwjHGBACgkIHx
c9KJy4WoXNWknedqKUde31A=
=jxqf
-----END PGP SIGNATURE-----


Sam

Mar 14, 2000, 3:00:00 AM3/14/00
to
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <butenko-1303...@stalker.gamma.ru>,


but...@stalker.com (Vladimir A. Butenko) writes:

> I can argue with you that directory-based solutions can be made scalable,
> but this is not the point: the internal structure of some IMAP server has
> nothing to do with the IMAP protocol specs themselves. That's I'm afraid,
> the Sam's problem here: he had some disagreements with you as the designer
> of one of IMAP servers, and that's completely up to him - he can create an

Please don't misrepresent myself like Mark did. The only issues I've ever
had were some issues with IMAP4rev1 itself, and some problems with several
IMAP clients. Go ahead and search Deja, or any other search engine, and
try to catch me badmouthing either Mark Crispin, or UW-IMAP, before last
week. Perhaps you are referring to my comments regarding the repeated
refusals to add maildir support to UW-IMAP, but was only a personal
observation that I haven't really discussed with anyone, in fact I don't
even mention it in the release notes. That was only a personal motivation
for me, nothing more.

> IMAP server that has nothing in common with your server, and if it is a
> better server -so let it be so.
>
> But then he passes that disagreement with Mark-designer to
> Mark-protocol-maintainer, and this is completely different issue. One can

No. The only "disagreement", if you want to call it that, is with Mark
period. He decided to take a purely technical disagreement on the merits
of IMAP4rev1, and turn it into a personal attack and a smear. I don't care
who he is, a "designer" or "maintainer".

> What I wanted to see posted is not some problems of implementing this or
> that in someone's code, but the problems in the IMAP4rev1 protocol itself,

Well, yes, I think I wouldn't have any problems coming up with a list;
maybe tomorrow.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE4zbqv+3BFaxHnGY0RArDgAJsFQWUPHhSgQMiOOPA2c4k6E28SiwCeJ8gO
hnkJCnRy+Mx0tNpQKQwkXes=
=tWp8
-----END PGP SIGNATURE-----


Sam

Mar 14, 2000, 3:00:00 AM3/14/00
to
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <butenko-1403...@stalker.gamma.ru>,


but...@stalker.com (Vladimir A. Butenko) writes:

> And now you (and Sam) suggest to, actually, develop a NEW protocol - i.e.

No, I don't believe I've ever made any such suggestion.

> IMAP is NOT a laguage. While one can try to deploy a "traditional"

Well, it's not a language in the traditional sense, like C or Pascal.
Still, I think it's more complicated to be described as a mere protocol.
There's a lot of sophistication there, more than just meets the eye.

> yacc-style parser for it, there are many catches there, because it is
> place-dependent. There are no "atoms" and "special symbols" there, in the
> strict sense of those terms. A regular parser that just calls "getLexem"
> function would fail in many places there, as it would fail for most of
> Inet protocols - SMTP, for example.

And that's precisely the problem. Anything with the syntactical complexity
of IMAP should have a distinct separation between its lexical and
grammatical constructors. There's no such thing here. Everything is just
a large writhing glob of spaghetti. And that is what I believe is the root
of most of the problems with IMAP.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE4zbvL+3BFaxHnGY0RAkYVAKCXyUuJCSF27G7FjAQQpN9movIPpQCdG6Nu
inQ00pVyMe7AuXxUiQufMlA=
=dBOl
-----END PGP SIGNATURE-----


Steve Sobol

Mar 14, 2000, 3:00:00 AM3/14/00
to
From 'Sam':

>> You forgot to mention one thing - the OS you are using. I guess that you
>> are doing this on Linux, and Mark spends most of his time on - let me
>> guess - Solaris :-). The speed of directory scan varies a lot on those
>> platforms.
>
>Really? There's such a huge performance difference in the speed of
>opendir() and readdir() on different platforms?

I would bet that it's not the performance within the system libraries,
but rather the layout of the filesystem that makes the difference.

Linux boxen and Sun Microsystems' products both use the same hard drives.
I could take the IBM SCSI hard drive connected to the SUN SCSI controller
in the SparcStation 20 I revived back at the end of 1998, and stick it in
a PC running Linux using a BT958, and it would work just as well.

But Suns do tend to handle large directories more efficiently, from what
I understand.


--
North Shore Technologies, Cleveland, OH http://NorthShoreTechnologies.net
Steve Sobol, President, Chief Website Architect and Janitor
sjs...@NorthShoreTechnologies.net - 888.480.4NET - 216.619.2NET

Yiorgos Adamopoulos

Mar 14, 2000, 3:00:00 AM3/14/00
to
In article <courier.38CD...@email-scan.webcircle.com>, Sam wrote:
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>In article <butenko-1403...@stalker.gamma.ru>,
> but...@stalker.com (Vladimir A. Butenko) writes:
>
>> And now you (and Sam) suggest to, actually, develop a NEW protocol - i.e.
>
>No, I don't believe I've ever made any such suggestion.

No, that was me ;-)

Yiorgos Adamopoulos

Mar 14, 2000, 3:00:00 AM3/14/00
to
>It's not going to happen.

If I am ever to finish my PhD, another (maybe not better) protocol is going
to happen ;-)

--
${talks} /* money talks */

Yiorgos Adamopoulos

Mar 14, 2000, 3:00:00 AM3/14/00
to
>Really? There's such a huge performance difference in the speed of
>opendir() and readdir() on different platforms?

Yes, because you execute them on different filesystems. On Linux this
is most probably ext2fs whereas on Solaris this is UFS.

Vladimir A. Butenko

Mar 14, 2000, 3:00:00 AM3/14/00
to
<s...@email-scan.webcircle.com> wrote:

> >
> > You forgot to mention one thing - the OS you are using. I guess that you
> > are doing this on Linux, and Mark spends most of his time on - let me
> > guess - Solaris :-). The speed of directory scan varies a lot on those
> > platforms.
>

> Really? There's such a huge performance difference in the speed of
> opendir() and readdir() on different platforms?

Yes, really. C'est la vie.



> > upgrade the software. Yes, there are not 2000 files in that directory
> > (several orderes of magnitude more) and they are not in a flat directory,
> > and there is a "stat" call issued for each of the found files, but that's
> > still too slow.
>
> Well, I guess I'm lucky, because I do not need to stat each file when
> opening a maildir. Just opendir() and readdir(). That's all.

Actually, in THAT operation (account scan, not mailbox scan and not
maildir mailbox scan) CGatePro in the pre-3.2 version did not use "stat" at
all. The results were still very slow under Solaris. OK, it was not 10
minutes, it was 7, but that's still too long for a start-up of even a
major site.

So - the short answer is: really.

Vladimir A. Butenko

Mar 14, 2000, 3:00:00 AM3/14/00
to


> > IMAP is NOT a laguage. While one can try to deploy a "traditional"
>

> Well, it's not a language in the traditional sense, like C or Pascal.
> Still, I think it's more complicated to be described as a mere protocol.
> There's a lot of sophistication there, more than just meets the eye.
>

> > yacc-style parser for it, there are many catches there, because it is
> > place-dependent. There are no "atoms" and "special symbols" there, in the
> > strict sense of those terms. A regular parser that just calls "getLexem"
> > function would fail in many places there, as it would fail for most of
> > Inet protocols - SMTP, for example.
>

> And that's precisely the problem. Anything with the syntactical complexity
> of IMAP should have a distinct separation between its lexical and
> grammatical constructors. There's no such thing here. Everything is just
> a large writhing glob of spaghetti. And that is what I believe is the root
> of most of the problems with IMAP.

Sam, while I do understand your frustration, let me politely remind you
that there are many more things in the world than one can find on a farm
in Kansas. If you attempt to use a yacc-type parser with languages like
Fortran, Snobol4 or even APL, you will run into even more problems than
you have with IMAP. It's just not the right tool. Not all languages are
based on the same design, and some bear the marks of a long evolution...

Yes, I'd agree with you that today, after 40+ years of compiler
development and a more or less standardized process of parsing, it would be
nice if everywhere we need a parser, we could use the same algorithms. But
- IMAP appeared many years ago, making it Pascal-language-like was not (I
think) one of Mark's goals, and while we may all say - yes, it would
be better, IF... - we have what we have now, and I do not think anyone can
BLAME Mark for that.

If you are willing to accept some humble advice, I'd risk giving it. I
developed about 8 compilers during the '80s. In NONE of them did I use a
table-driven parser, even in those that were designed to be table-parsed:
Pascal, Ada, for example. Plain SWITCH and IF/THEN parsing proved (at
least, in my experience) to be much more readable, customizable and easier
to support than all those table-driven parsers one can read about in all
those thick and clever books. One of the advantages of that design is
context-sensitivity - you do not have to rely on a single "getLexem"
function (though you can rely on it for languages like Pascal/Ada), but
you can, instead, call getSymbol, or getAString, etc. in the places where
it is needed. This allows one to parse languages like IMAP as easily as
one can parse languages like Pascal.

That's just my humble opinion. In no way do I want to say that this is how
all IMAP parsers SHOULD be built. I just say that there is a method that
allows one to parse IMAP easily, quickly, and w/o any spaghetti code.
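
A toy sketch of that idea in Python (not CGatePro code, and far from a
complete IMAP parser): the reader offers several "get" routines, and the
command handler decides which one to call at each position. All names
here are made up for illustration, get_atom and get_fetch_item stand in
for the getSymbol/getAString style of routines, and only unquoted
arguments are handled.

```python
class Reader:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos:self.pos + 1]

    def expect(self, ch):
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def take_while(self, ok):
        start = self.pos
        while self.pos < len(self.text) and ok(self.text[self.pos]):
            self.pos += 1
        if self.pos == start:
            raise ValueError(f"unexpected input at position {start}")
        return self.text[start:self.pos]

    # Context 1: an atom, e.g. an unquoted LOGIN argument.
    # Here '[' and ']' are ordinary characters.
    def get_atom(self):
        return self.take_while(
            lambda c: ' ' < c < '\x7f' and c not in '(){%*"\\')

    # Context 2: a FETCH data item name. Here '[' opens a section spec.
    def get_fetch_item(self):
        name = self.take_while(lambda c: c.isalnum() or c == '.')
        section = None
        if self.peek() == '[':
            self.pos += 1
            end = self.text.find(']', self.pos)
            if end < 0:
                raise ValueError("']' expected")
            section = self.text[self.pos:end]
            self.pos = end + 1
        return name.upper(), section


def parse_command(line):
    r = Reader(line)
    tag = r.get_atom()
    r.expect(' ')
    cmd = r.get_atom().upper()
    r.expect(' ')
    if cmd == 'LOGIN':                 # plain IF/THEN dispatch, no tables
        user = r.get_atom()
        r.expect(' ')
        return tag, cmd, (user, r.get_atom())
    if cmd == 'FETCH':
        seq = r.take_while(lambda c: c.isdigit() or c in ':*,')
        r.expect(' ')
        return tag, cmd, (seq, r.get_fetch_item())
    raise ValueError(f"unsupported command {cmd}")
```

The same "[" is just part of the password in "a1 LOGIN nick foo[]bar" and
a delimiter in "a2 FETCH 1:* BODY[HEADER]", and neither routine needs to
know about the other.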

> -----END PGP SIGNATURE-----

Vladimir A. Butenko

Mar 14, 2000, 3:00:00 AM3/14/00
to
I think we should not keep the post subject unchanged, since the subject
of the discussion has changed, and since I do not think anyone should be
"beware of Courier-IMAP" - I think that we all should encourage Sam to
create a good IMAP server, and that you, Mark, as the creator and
evangelist of the IMAP protocol, would join me in wishing Sam the best of
luck. Good parents can even kick their girl if she is acting wrong with her
friends, but they are not supposed to say "beware of her" to her potential,
hmmm, customers. And you are in the position of such a parent, whether you
like it or not :-)

> On Tue, 14 Mar 2000, Vladimir A. Butenko wrote:

> > mdir helps to avoid
> > file locks (if it does not use index files), but does not help to
> > synchronise changes.
>

> Which is why we use locks in the first place! ;-)

Yes, but with a reminder: locks just ensure consistency of the mailbox
data, but they do not help to inform other "clients" (client agents
within the server) that the change has occurred. Thus - a need to check the
mailbox datafile/indexfile/directory periodically - thus an overhead.



> Yes, I agree completely with your (a)-(e).

Good! :-)



> Yes, transparency is very important, and often is neglected.
>
> c-client is "mostly" transparent with traditional UNIX mailbox format; the
> ">" is not needed unless the line really looks like a UNIX mailbox header
> line.

Mark, I'm pretty sure you get lots of support E-mail. And some of those
mails can say "what happened to my mailbox?" and contain the data from the
mailbox file. Copy-pasted. With exactly that separator line. So, to get it
inside a message is not such an improbable event...



> However, I agree this is an issue with traditional UNIX mailbox format,
> and it's one of the big reasons why I never liked it.

There it's just more probable. BTW, on some systems (AIX?) the Unix mailbox
format was modified to include one (or 4?) 0x01 (^A) characters in front
of the From line - I guess in an attempt to make it more transparent.



> mbx, mtx, and tenex all allow shared read/write access, but they require
> the ability to synchoronize updates and that means no NFS. Even when you
> get locking out of the way, the inode vs. data cache problem over NFS will
> still bite you.

Nope :-). We do it differently: we do not access data directly on NFS from
different sources, so there is no problem if something is cached.



> mbx has the additional win that it allows shared expunge. That's great
> for people who leave an IMAP client running 24/7, and then want to run an
> IMAP client on their mail someplace else. For many folks at UW, it's
> their office system that runs an IMAP client 24/7, for me, it's my home
> system...

So, you separate shared read/write and shared expunge, right? We treat
them all as read/write, and shared means shared - i.e. all types of
mailboxes support all types of shared operations - read, write (flag
modification), and expunge.

Yiorgos Adamopoulos

Mar 14, 2000, 3:00:00 AM3/14/00
to
In article <butenko-1403...@stalker.gamma.ru>, Vladimir A. Butenko wrote:
>There it's just more probable. BTW, on some systems (AIX?) Unix mailbox
>format was modified to include one (or 4?) 0x01 (^A) characters in front
>of the From line - I guess in an attempt to make it more transparent.

IIRC, 4 ctrl-As means using MMDF.

--
${talks}

Mark Crispin

Mar 14, 2000, 3:00:00 AM3/14/00
to Vladimir A. Butenko
On Tue, 14 Mar 2000, Vladimir A. Butenko wrote:
> Mark, I'm pretty sure you get lots of support E-mail. And some of those
> mails can say "what happened to my mailbox?" and contain the data from the
> mailbox file. Copy-pasted. With exactly that separator line. So, to get it
> inside a message is not such an improbable event...

Well, since I don't use traditional UNIX mailbox format (and have not for
many years), that has not been an issue for me. ;-)

Nevertheless, I can't remember the last time that I received a mailbox
that was copy-pasted. If people send me a mailbox to analyze, they
usually send it as a MIME attachment. About once or twice a year
someone sends me a mailbox by uuencode, but I just don't see copy-pasted
mailboxes.

> BTW, on some systems (AIX?) Unix mailbox
> format was modified to include one (or 4?) 0x01 (^A) characters in front
> of the From line - I guess in an attempt to make it more transparent.

That is MMDF format, which is the standard on SCO systems. It was quite
popular about 15-20 years ago. The CTRL/A characters replace the "From "
line, although some MMDF mailboxes have both.

> > mbx, mtx, and tenex all allow shared read/write access, but they require
> > the ability to synchronize updates and that means no NFS. Even when you
> > get locking out of the way, the inode vs. data cache problem over NFS will
> > still bite you.
> Nope :-). We do it differently, we do not access data directly on NFS from
> different sources, so there is no problem if something is cached.

Could you explain this? Do you access data on NFS? Do you access data
from different sources simultaneously? If the answer to these is "yes",
then how do you avoid doing both? That is, without telling your users
"don't do both", which is effectively what I do when I say "don't use
mbx/mtx/tenex formats over NFS"

> So, you separate shared read/write and shared expunge, right?

Not exactly, but for the purposes of this discussion, "yes".

More specifically, there are two lock states in mbx (mtx/tenex and
unix/mmdf also have two lock states, but these work differently so I'm not
discussing them here).

One lock state indicates that a process has the mailbox selected. Every
process that has the mailbox selected owns a share lock on this lock
state. If a process can acquire an exclusive lock on this lock
state, it can compress out expunged messages during a CHECK or EXPUNGE;
otherwise, EXPUNGE will just mark deleted messages as invisible and allow
the other sharing processes to discover that (and percolate the untagged
EXPUNGE event to the MUA) on their own.

The other lock state governs the ability to parse the mailbox (meaning
discover new messages) with a share lock, or to append to the mailbox with
an exclusive lock. Unlike the first lock state, this is transient; a
process holds this lock (share or exclusive) as long as is necessary to do
the task at hand, then it releases it. The normal effect is that any
number of processes can be parsing the mailbox (this is done at select
time for the entire mailbox, and for new messages when they arrive), but
the MDA must be able to shut out all parsing (and other MDAs) when
delivering mail.

"Parse" in this context does not mean RFC822/MIME parsing. It just means
locating internal headers and acquiring the IMAP "fast" data for all the
messages reported with an untagged EXISTS.
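
For readers who want to see the first lock state in code, here is a minimal sketch. It assumes flock()-style advisory locks (the description above does not say which call is used), and the names select_mailbox, try_exclusive and end_exclusive are invented for illustration; this is not c-client code.

```cpp
#include <cassert>
#include <fcntl.h>      // open
#include <sys/file.h>   // flock, LOCK_SH, LOCK_EX, LOCK_NB
#include <unistd.h>     // close

// Take the long-lived "mailbox is selected" share lock.
// Returns the locked descriptor, or -1 on failure.
int select_mailbox(const char* path) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return -1;
    if (flock(fd, LOCK_SH) != 0) { close(fd); return -1; }
    return fd;
}

// Try to become the only holder without blocking.
// true  -> nobody else has the mailbox selected; compaction is safe.
// false -> others are sharing; only mark messages invisible.
bool try_exclusive(int fd) {
    assert(fd >= 0);
    if (flock(fd, LOCK_EX | LOCK_NB) == 0) return true;
    // On some systems a failed conversion drops the lock we already
    // held, so take the share lock again before carrying on.
    flock(fd, LOCK_SH);
    return false;
}

// Drop back to the share lock once compaction is finished.
void end_exclusive(int fd) { flock(fd, LOCK_SH); }
```

An EXPUNGE handler built on this would call try_exclusive() and compact only when it returns true, which is the "shared expunge" behaviour described above.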

> We treat
> them all as read/write, and shared means shared - i.e. all types of
> mailboxes support all types of shared operations - read, write (flag
> modification), and expunge.

If I remember correctly, you accomplish this by having a multi-threaded
server, so you don't have to worry about process/process interaction. You
also essentially assume that other software isn't going to be operating on
your files, right?

That's certainly the right thing to do if you can make those assumptions.
Unfortunately, I can not in UW imapd; I *must* have process/process
interaction and external software that is completely out of my control.

chris ulrich

Mar 15, 2000, 3:00:00 AM3/15/00
to
%%On Mon, 13 Mar 2000, Vladimir A. Butenko wrote:
%%
%%I believe that it is infeasible to build maildir support that scales well
%%(e.g. does not exhibit performance problems with a moderately large
%%mailbox of 2000 messages) and also does not violate a major rule of either
%%maildir or IMAP. It's a no-win situation for me; and therefore I choose
%%to allow the maildir enthusiast community to do their own development,
%%distribution, and support of maildir IMAP code.
%%-- Mark --

I just did a trial run with a folder with 9000+ messages in it, mostly
between 1 and 4k in size. I used the Netscape 4 mail client from a
Pentium II to connect to an E250 with an older CPU (250? MHz). It took
about one minute to open the entire folder. Moving about 2000 of these
messages from the middle of the folder to another folder that already had
stuff in it took about 2 minutes. The Sun is running Solaris 2.6 and is
using normal disks.

While this is hardly speedy, it is also not terrible given the size of
the folders. Given that this is a pretty evil boundary case, and given the
advantages of having an NFS-safe mailstore, I'd consider this to be
acceptable performance. I'm using courier-imap version 0.25a.

(And for the record, UW imapd performed better opening a different 8000
message folder on another machine; I didn't test moving messages from one
folder to another; UW imapd was using mbox formatted folders.)
chris

Sam

Mar 15, 2000, 3:00:00 AM3/15/00
to
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <8amm0s$bie$1...@pravda.ucr.edu>,
cdu@jawa. (chris ulrich) writes:

> Moving about 2000 of these
> messages from the middle of the folder to another folder that already had
> stuff in it took about 2 minutes.

That's not bad, when you consider that 2000 messages were physically
duplicated there. They were not simply hardlinked into a different folder,
they were physically copied. Then, Netscape probably went in and marked
the originals as deleted, which translates to a filesystem rename. I'm
pretty sure there's no expunge here, though.

Your BUFSIZ is probably 8K, so all your messages required only one read and
one write to be copied over. So, that's 4000 opens, 4000 closes, 2000
reads, 2000 writes; and 2000 renames, in two minutes.

I don't think it would matter whether you've taken them from the beginning
or from the end of the folder. The order in which the messages were
displayed had absolutely nothing to do with their physical order in the
directory.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: http://www.geocities.com/SiliconValley/Peaks/5799/GPGKEY.txt

iD8DBQE4zvTd+3BFaxHnGY0RAinRAKCoEkH8HPAtCE1ObNbjwXrLLLIBEwCbBRij
pcqO2n2abfOZpI5+/TRcNgY=
=kGMG
-----END PGP SIGNATURE-----
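
The call count in the post above can be checked with a few lines of arithmetic. This is only a sketch of that estimate, under the post's own assumption that each message fits in one buffer; CopyCost and the function names are made up here.

```cpp
// Filesystem calls needed to copy messages between maildir folders
// and then mark the originals deleted (a rename per message).
struct CopyCost {
    long opens, closes, reads, writes, renames;
    long total() const { return opens + closes + reads + writes + renames; }
};

// Per message: open+close the source, open+close the destination,
// one read, one write, and one rename to flag the original.
CopyCost copy_and_mark_deleted(long messages) {
    return CopyCost{2 * messages, 2 * messages, messages, messages, messages};
}

// Average call rate over the measured wall-clock time.
double calls_per_second(const CopyCost& cost, double seconds) {
    return cost.total() / seconds;
}
```

For 2000 messages in 120 seconds that is 14000 calls, or roughly 117 calls per second, not counting directory scans or whatever else the client and server were doing.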


Villy Kruse

Mar 15, 2000, 3:00:00 AM3/15/00
to
On Tue, 14 Mar 2000 18:44:58 +0300,
Vladimir A. Butenko <but...@stalker.com> wrote:

> ... taken out of context ...

>> However, I agree this is an issue with traditional UNIX mailbox format,
>> and it's one of the big reasons why I never liked it.
>
>There it's just more probable. BTW, on some systems (AIX?) Unix mailbox
>format was modified to include one (or 4?) 0x01 (^A) characters in front
>of the From line - I guess in an attempt to make it more transparent.

FYI this is MMDF format, which has been used for a long time on SCO
systems, and which, by the way, is also supported by c-client, and
therefore by UW imapd and pine.

However, I've seen a problem (very rarely) when some POP server on a SCO
system treats this mailbox as a regular unix mailbox, and therefore
the ^A^A^A^A sneaks into the reply. Pine will then take this as a
message separator and the message is cut in half. This problem will
probably go away as the old SCO systems (now 5 to 10 years old) are
phased out and the MMDF mailbox format with them.


Villy

Vladimir A. Butenko

Mar 15, 2000, 3:00:00 AM3/15/00
to
In article
<Pine.NXT.4.30.000314...@Tomobiki-Cho.CAC.Washington.EDU>,
Mark Crispin <m...@CAC.Washington.EDU> wrote:

> That is MMDF format, which is the standard on SCO systems. It was quite
> popular about 15-20 years ago. The CTRL/A characters replace the "From "
> line, although some MMDF mailboxes have both.

Thank you all for educating me - I did not know that.

> > > mbx, mtx, and tenex all allow shared read/write access, but they require
> > > the ability to synchronize updates and that means no NFS. Even when you
> > > get locking out of the way, the inode vs. data cache problem over NFS will
> > > still bite you.
> > Nope :-). We do it differently, we do not access data directly on NFS from
> > different sources, so there is no problem if something is cached.
>
> Could you explain this?

A bit :-)

> Do you access data on NFS?

On back-end Mail Servers - yes.

> Do you access data from different sources simultaneously?

If you look at this from the client point of view - yes. If you look from
the NFS server point of view you will see that any mailbox can be accessed
by only one back-end server at a time. It's the CommuniGate Pro
Dynamic Cluster software that does the trick.

> If the answer to these is "yes", then how do you avoid doing both?
> That is, without telling your users
> "don't do both", which is effectively what I do when I say "don't use
> mbx/mtx/tenex formats over NFS"

That's the point. Imagine a simpler cluster (w/o frontends). Just 5
back-end servers running with the same set of NFS boxes. With a load
balancer in front of them. BTW, regular load balancers are not that good at
balancing MAIL load, since mail load != mail traffic, but for a smaller
(1,000,000 users) site a regular load balancer is still OK.

So, you hit the server A and open an IMAP/POP session with account X
mailbox Y (WebMail sessions are processed in a slightly different manner,
so I'm talking about POP/IMAP only). The server A accesses the mailbox
data on NFS directly. Other connections can be opened, and as long as they
hit the same server, all works OK, just as on a single-server
CommuniGate Pro installation - with the server software doing all
synchronization INSIDE the server.

Then a new connection is established (from the same user or from a
different location) - to the same account X, mailbox Y. Besides POP and
IMAP, it can be, for example, an incoming message that came from SMTP or
that was generated locally by some Automated Rule, etc.

So, that connection was directed by the load balancer to a DIFFERENT
server B. What the CommuniGate Pro Dynamic Cluster component does is
detect the conflict and make server B access that mailbox VIA
SERVER A, where server B is treated as just one more client -
like the real POP, IMAP, etc. clients connected to server A and
accessing that mailbox.
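
The routing rule just described can be sketched as a small ownership table. This is purely illustrative: the names OwnershipTable and Route are hypothetical, the table lives in one process, and nothing here shows how the real cluster shares this state between machines.

```cpp
#include <map>
#include <string>

// Where a new session should go: straight to the mail store,
// or through the server that already has the mailbox open.
struct Route {
    bool direct;
    std::string via;   // the owning server
};

class OwnershipTable {
    struct Entry { std::string owner; int sessions; };
    std::map<std::string, Entry> owners_;   // mailbox -> current owner
public:
    // A session for `mailbox` has arrived at `server`.
    Route open(const std::string& mailbox, const std::string& server) {
        auto it = owners_.find(mailbox);
        if (it == owners_.end()) {
            // Nobody has it open: this server becomes the owner.
            owners_.emplace(mailbox, Entry{server, 1});
            return Route{true, server};
        }
        // Already open somewhere: go direct only on the owning server.
        ++it->second.sessions;
        return Route{it->second.owner == server, it->second.owner};
    }

    // A session ended; ownership lapses when the last one closes.
    void close(const std::string& mailbox) {
        auto it = owners_.find(mailbox);
        if (it != owners_.end() && --it->second.sessions == 0)
            owners_.erase(it);
    }
};
```

With this rule only the owning server ever touches the mailbox files, which is why cached NFS data stops being a concern in this design.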

Advantages of this design are obvious, disadvantages are few:
a) overhead caused by that conflict-catching mechanism
b) overhead caused by indirect access to mailbox if the conflict is detected.

a) - I just ask you to believe me that it's small - we have designed it so
that it can cause additional latency, but not decrease the throughput.
For IMAP and WebMail the overhead is almost zero; for POP it is larger,
since there are many more POP *sessions* going up and down every second
compared to IMAP/Web sessions that stay open for a long time. Since major
sites are still mostly POP, and the overhead there is small, we think that as
IMAP and Web take over from POP the overhead will only decrease.

b1) happens ONLY when the conflict is detected, and on a site with just 10,000
accounts, the probability is already very small (<<1%), and on a site with
1,000,000 accounts - it's VERY small.

b2) b1 is not a point at all :-). Because if you build a REALLY big
CGatePro Cluster, you do use front-ends - for reliability of the site
(they protect backends from all types of attacks), for REAL load
balancing, etc. And if you use frontends, ALL connections are "indirect",
so there is no additional overhead.

CommuniGate Pro 3.3 now has a built-in SNMP agent, which lets you grab
that statistical data from all servers. Since it is in beta now, no major
Cluster site is using it, but as soon as it is released (hopefully, in
mid-April), we will have much more statistical data from the REAL servers,
not the test clusters in our labs. If you are interested, drop me a note
in May - I hope we'll have more precise data by that time.

CGatePro Cluster ensures that you do not have indirection loops. Imagine
that server B tries to connect to some mailbox via server A, but
while it is doing that server A closes all its sessions to that
mailbox and server C opens it; if special measures are not taken,
server A will have to connect to server C on server B's behalf,
etc. - you can even model a situation where you get loops. CGatePro
Cluster ensures that there are no indirection loops and also takes special
care to minimize the probability of the "wrong hit". On one of the
clusters there were just 345 "wrong hits" during 11,000,000 IMAP sessions.

> One lock state indicates that a process has the mailbox selected. Every
> process that has the mailbox selected owns a share lock on this lock
> state. If a process can acquire an exclusive lock on this lock
> state, it can compress out expunged messages during a CHECK or EXPUNGE;
> otherwise, EXPUNGE will just mark deleted messages as invisible and allow
> the other sharing processes to discover that (and percolate the untagged
> EXPUNGE event to the MUA) on their own.

But it will discover that by some [expensive] file-level operation that it
has to do periodically, right?

> The other lock state governs the ability to parse the mailbox (meaning
> discover new messages) with a share lock, or to append to the mailbox with
> an exclusive lock.

BTW, the design we use requires only one mailbox parsing, since the
Mailbox Object stays open as long as at least one client needs it, and
parsing happens only during the mailbox opening process. But, as I said, on
large sites the probability of simultaneous access to a mailbox is very
small - so there is no big win here.

> Unlike the first lock state, this is transient; a
> process holds this lock (share or exclusive) as long as is necessary to do
> the task at hand, then it releases it.

What happens if the process fails at that time? Will the lock be released
automatically?


> The normal effect is that any
> number of processes can be parsing the mailbox (this is done at select
> time for the entire mailbox, and for new messages when they arrive), but
> the MDA must be able to shut out all parsing (and other MDAs) when
> delivering mail.

Sure. But we do not need to re-parse the mailbox when a new message
arrives (or added by IMAP APPEND, for example): if the mailbox is in the
"parsed" state, the message is added, and the mailbox "parsed" data is
updated to include the new info (already available at the time of message
arrival).

> "Parse" in this context does not mean RFC822/MIME parsing. It just means
> locating internal headers and acquiring the IMAP "fast" data for all the
> messages reported with an untagged EXISTS.

Sure.


> > We treat
> > them all as read/write, and shared means shared - i.e. all types of
> > mailboxes support all types of shared operations - read, write (flag
> > modification), and expunge.
>
> If I remember correctly, you accomplish this by having a multi-threaded
> server, so you don't have to worry about process/process interaction. You
> also essentially assume that other software isn't going to be operating on
> your files, right?

Yes and no. You can specify that some mailboxes are "EXTERNAL", and thus
exposed to "legacy applications" - delivery agents, "local mailers", etc.
Those mailboxes are treated differently, of course, - but usually they
are used on small sites only (mostly - in Universities and during the
migration process).


> That's certainly the right thing to do if you can make those assumptions.
> Unfortunately, I can not in UW imapd; I *must* have process/process
> interaction and external software that is completely out of my control.

Yes, there are different markets for our products.

If someone needs to continue to use some legacy mailers and delivery
agents that deal directly with mailboxes (and deal differently and not
always correctly) - it is NOT wise to try to install CommuniGate Pro
there, and we always say that those sites should stay with UW imapd, since
UW imapd is designed for that.

If someone needs to build either a very large server, or a completely
"new" server, so there is no need to continue to support legacy delivery
agents and "local mailers" (i.e. all access will be via POP/IMAP/Web),
then that's the CGatePro market. As well as other vendors' market, of
course. But that is the field where we hope to continue to lead the pack
:-)

Mark Crispin

Mar 15, 2000, 3:00:00 AM3/15/00
to Vladimir A. Butenko
On Wed, 15 Mar 2000, Vladimir A. Butenko wrote:
> > [Discovery of flag changes in shared read/write mbx mailboxes.]

> But it will discover that by some [expensive] file-level operation that it
> has to do periodically, right?

It's the same operation that's done to discover new mail: stat().
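
As a rough illustration of how cheap that check is, here is a sketch of stat()-based change detection. The MailboxStamp type and mailbox_changed() are invented names, and comparing size and modification time is an assumption about what a poller would look at, not a description of c-client's code.

```cpp
#include <sys/stat.h>    // stat, struct stat
#include <sys/types.h>   // off_t, time_t

// What we remembered about the mailbox file at the last check.
struct MailboxStamp {
    off_t  size  = -1;
    time_t mtime = 0;
};

// One stat() call: true if the file's size or modification time
// differs from the remembered stamp, which is then refreshed.
bool mailbox_changed(const char* path, MailboxStamp& last) {
    struct stat st;
    if (stat(path, &st) != 0) return false;   // cannot stat: report no news
    bool changed = st.st_size != last.size || st.st_mtime != last.mtime;
    last.size  = st.st_size;
    last.mtime = st.st_mtime;
    return changed;
}
```

A growing size would point at new mail, while a changed modification time with the same size would suggest that another process rewrote something in place, such as flags.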

> > Unlike the first lock state, this is transient; a
> > process holds this lock (share or exclusive) as long as is necessary to do
> > the task at hand, then it releases it.
> What happens if the process fails at that time? Will the lock be released
> automatically?

System call locks are used, so the locks are automatically released in the
event of process failure.

> Sure. But we do not need to re-parse the mailbox when a new message
> arrives (or added by IMAP APPEND, for example): if the mailbox is in the
> "parsed" state, the message is added, and the mailbox "parsed" data is
> updated to include the new info (already available at the time of message
> arrival).

We don't "re-parse" either; once the SELECT is done, the only parsing is
on new messages. The only thing that is ever touched on old messages
after the SELECT-time parse is done are the flags. Flags can be changed,
of course. Also, when some other process changes the flags of a message,
we only know that some message's flags were changed; we don't know which
so we have to do a global flags sweep. Fortunately, that sweep is
relatively fast, and doesn't need to be done often in any case.

> Those mailboxes are treated differently, of course, - but usually they
> are used on small sites only (mostly - in Universities and during the
> migration process).

UW has 80K users, so isn't quite a "small" site. ;-)

> If someone needs to continue to use some legacy mailers and delivery
> agents that deal directly with mailboxes (and deal differently and not
> always correctly) - it is NOT wise to try to install CommuniGate Pro
> there, and we always say that those sites should stay with UW imapd, since
> UW imapd is designed for that.

Exactly. However, given that CommuniGate Pro effectively runs in a
"sealed" server ala Cyrus and Exchange, I'm surprised that you stayed with
the UNIX mbox format instead of something that is much more optimized for
IMAP (like what Cyrus and especially Exchange did).

Vladimir A. Butenko

Mar 16, 2000, 3:00:00 AM3/16/00
to
In article
<Pine.NXT.4.30.00031...@Tomobiki-Cho.CAC.Washington.EDU>,
Mark Crispin <m...@CAC.Washington.EDU> wrote:

> > > Unlike the first lock state, this is transient; a
> > > process holds this lock (share or exclusive) as long as is necessary to do
> > > the task at hand, then it releases it.
> > What happens if the process fails at that time? Will the lock be released
> > automatically?
>

> System call locks are used, so the locks are automatically released in the
> event of process failure.

Hm. But you've said you use two locks per mailbox. Where mailbox is a
file. So, how do you put two different locks on one file using OS-level
locks?



> > Sure. But we do not need to re-parse the mailbox when a new message
> > arrives (or added by IMAP APPEND, for example): if the mailbox is in the
> > "parsed" state, the message is added, and the mailbox "parsed" data is
> > updated to include the new info (already available at the time of message
> > arrival).
>

> We don't "re-parse" either; once the SELECT is done, the only parsing is
> on new messages. The only thing that is ever touched on old messages
> after the SELECT-time parse is done are the flags. Flags can be changed,
> of course. Also, when some other process changes the flags of a message,
> we only know that some message's flags were changed; we don't know which
> so we have to do a global flags sweep. Fortunately, that sweep is
> relatively fast, and doesn't need to be done often in any case.

OK, so you have either a small additional file with "indices" (that keeps
all message flags) or you keep them in the original mailbox file and then
you have to re-read it.



> > Those mailboxes are treated differently, of course, - but usually they
> > are used on small sites only (mostly - in Universities and during the
> > migration process).
>

> UW has 80K users, so isn't quite a "small" site. ;-)

While I can believe that there is ONE server @ UW that handles all those
users, it's hard to imagine that all of those users have and USE shell
accounts on that very server. For example, at Stanford (which has started
to switch its mail services to CGatePro from Sun's SIMS and other
solutions) an average server handles just 1000-5000 users, though they have
many servers installed. A system with 5000 shell accounts is a large site
from the Unix OS point of view, but a small site in terms of a CGatePro mail
server.


> > If someone needs to continue to use some legacy mailers and delivery
> > agents that deal directly with mailboxes (and deal differently and not
> > always correctly) - it is NOT wise to try to install CommuniGate Pro
> > there, and we always say that those sites should stay with UW imapd, since
> > UW imapd is designed for that.
>

> Exactly. However, given that CommuniGate Pro effectively runs in a
> "sealed" server ala Cyrus and Exchange, I'm surprised that you stayed with
> the UNIX mbox format instead of something that is much more optimized for
> IMAP (like what Cyrus and especially Exchange did).

We do not "stay" :-) CGatePro mailbox management is completely modular.
Hopefully this will be of some interest to this group's readers - here is the
almost complete interface of the abstract Mailbox "object" as it is seen
from the CGatePro kernel (I left only the virtual methods):

class VMailbox : public STObject {
  virtual int getUIDValidity(void) = NIL;
  virtual bool getFirstRecent(VMailboxMessageID& theID,
                              bool resetRecent) = NIL;
  virtual bool parse(void) = NIL;
  virtual mailboxView* createPhysicalView(void) = NIL;
  virtual STErrorCode addMessage(ReadableSource* theSource,
                                 VString theReturnPath,
                                 SBData* additionalHeaders,
                                 messageView* pNewView) = NIL;
  virtual bool getPhysicalMessageView(VMailboxMessageID theID,
                                      messageView* pView) = NIL;
  virtual STFileOffset getPhysicalMessageSize(VMailboxMessageID theID,
                                              bool withCRLF) = NIL;
  virtual STErrorCode readPhysicalMessage(SBMutableData& theBuffer,
                                          VMailboxMessageID theID,
                                          STFileOffset offset,
                                          size_t maxLength) = NIL;
  virtual void physicallyRemoveMarkedMessages(mailboxView* theView) = NIL;
  virtual int storeMessageFlags(VMailboxMessageID theID,
                                messageFlags* updateFlags,
                                flagsOperation operation) = NIL;
};

The synchronization and other boring tasks are performed by the Mailbox Manager
itself, and implementations of particular mailbox formats should not care
about it, they can be written (and they are written) in the same manner they
would be written for a server that handles just one client at a time.

As you can see, to support a new mailbox format one would just write 10
functions/methods.
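
To make the idea concrete, here is a toy version of the pattern: a much-reduced abstract interface and one trivial format behind it. MailboxFormat and InMemoryMailbox are hypothetical names, and the four methods are a simplification chosen for illustration, not the VMailbox interface shown above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, reduced stand-in for an abstract mailbox format.
// No locking here: a manager layer is assumed to serialize callers.
class MailboxFormat {
public:
    virtual ~MailboxFormat() = default;
    virtual std::size_t addMessage(const std::string& text) = 0;   // returns new ID
    virtual bool readMessage(std::size_t id, std::string& out) const = 0;
    virtual bool markDeleted(std::size_t id) = 0;
    virtual std::size_t removeMarkedMessages() = 0;                // returns count removed
};

// The simplest possible "format": messages kept in a vector.
class InMemoryMailbox : public MailboxFormat {
    struct Slot { std::size_t id; std::string text; bool deleted; };
    std::vector<Slot> slots_;
    std::size_t nextId_ = 1;
public:
    std::size_t addMessage(const std::string& text) override {
        slots_.push_back(Slot{nextId_, text, false});
        return nextId_++;
    }
    bool readMessage(std::size_t id, std::string& out) const override {
        for (const Slot& s : slots_)
            if (s.id == id) { out = s.text; return true; }
        return false;
    }
    bool markDeleted(std::size_t id) override {
        for (Slot& s : slots_)
            if (s.id == id) { s.deleted = true; return true; }
        return false;
    }
    std::size_t removeMarkedMessages() override {
        std::vector<Slot> kept;
        for (const Slot& s : slots_)
            if (!s.deleted) kept.push_back(s);
        std::size_t removed = slots_.size() - kept.size();
        slots_.swap(kept);
        return removed;
    }
};
```

The format class is written as if it served a single client; keeping synchronization out of it is the design choice the post describes.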

We can support, for example, a BSD mailbox with an index, if anyone would
need it. But as you saw in the Log files I posted here a few months ago,
parsing even a multi-MB "BSD" mailbox with 10,000 messages is a matter of
seconds in CGatePro, so why should we implement something (an additional
mailbox format) that no one is likely to use? .mdir is different - and as
we've agreed, there are several situations when it is useful, also there
are plenty of "mdir fans" running around - so we have .mdir support.

A much more interesting thing is "database"-based mailboxes. This was long
on our "To Do" list, but putting all those ORACLE client-side utilities
into CGatePro code seemed to be a VERY bad idea. Fortunately, better,
standardized interfaces to various DB sources are now emerging, and we plan
to support them.

So, one will be able to store messages in some SQL database, but still access
them via POP, IMAP, WebMail, and AT THE SAME time - make advanced SQL
operations on those messages. This feature is becoming more and more popular
on many corporate sites.

Yiorgos Adamopoulos

Mar 16, 2000, 3:00:00 AM3/16/00
to
In article <butenko-1603...@stalker.gamma.ru>, Vladimir A. Butenko wrote:
>A much more interesting thing is "database"-based mailboxes. This was long
>on our "To Do" list, but putting all those ORACLE client-side utilities
>into CGatePro code seemed to be a VERY bad idea. Fortunately, better,
>standardized interfaces to various DB sources are now emerging, and we plan
>to support them.
>
>So, one will be able to store messages in some SQL database, but still access
>them via POP, IMAP, WebMail, and AT THE SAME time - make advanced SQL
>operations on those messages. This feature is becoming more and more popular
>on many corporate sites.

You pretty much describe some of what I am trying to do. Although you can
act upon emails as BLOBs and have a simple design where every user is a
table on the system and every email a record in the table with attributes
serial number, headers and body, if you need subfolders it gets trickier
(but not hard): add OO - but hey, Object Databases are still very slow.
You need to play with the native interface of every database (and version)
to achieve speed. For the large number of transactions that you will need,
ODBC (and JDBC, DBI/DBD) won't be faster than what you have today.

It would be nice to devise a *storage manager* that deals only with email
(and multimedia data - since all those cool .jpg and .avi are sent around).
You could then have filters to do feature extraction on the message and
flag it as needed, so that you could select "all the videos" from the mail
store, etc. A storage manager, not a complete DB with all the bells and
whistles.

--
Yiorgos Adamopoulos -- #include <std/disclaimer.h>
ad...@dblab.ece.ntua.gr -- Knowledge and Data Base Systems Laboratory, NTUA

Mark Crispin

Mar 16, 2000, 3:00:00 AM3/16/00
to Vladimir A. Butenko
On Thu, 16 Mar 2000, Vladimir A. Butenko wrote:
> > System call locks are used, so the locks are automatically released in the
> > event of process failure.
> Hm. But you've said you use two locks per mailbox. Where mailbox is a
> file. So, how do you put two different locks on one file using OS-level
> locks?

Ha! That's a trade secret! ;-)

I use an auxiliary file whose sole purpose is to hold the second system
call lock. This is different from .lock locking, in which the second file
is the lock.

I wish that UNIX offered thawed vs. frozen opens; and that it offered
multiple named locks on a file. But it doesn't, so one has to do what he
can within UNIX's limitations.
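
A minimal sketch of the auxiliary-file idea follows, assuming flock() as the system call lock. The ".aux" suffix, the MailboxLocks struct and the function names are all made up for illustration, and the non-blocking calls are only there to keep the example easy to follow; a real server would normally wait for the lock.

```cpp
#include <cassert>
#include <fcntl.h>      // open
#include <string>
#include <sys/file.h>   // flock, LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN
#include <unistd.h>     // close

struct MailboxLocks {
    int select_fd = -1;   // the mailbox file: long-lived "selected" lock
    int aux_fd    = -1;   // auxiliary file: transient parse/append lock
};

void close_locks(MailboxLocks& l) {
    if (l.select_fd >= 0) close(l.select_fd);   // closing releases the locks
    if (l.aux_fd >= 0) close(l.aux_fd);
    l.select_fd = l.aux_fd = -1;
}

// Open both files and take the share lock on the mailbox itself.
bool open_locks(const std::string& mailbox, MailboxLocks& out) {
    out.select_fd = open(mailbox.c_str(), O_RDWR | O_CREAT, 0600);
    out.aux_fd = open((mailbox + ".aux").c_str(), O_RDWR | O_CREAT, 0600);
    if (out.select_fd < 0 || out.aux_fd < 0 ||
        flock(out.select_fd, LOCK_SH) != 0) {
        close_locks(out);
        return false;
    }
    return true;
}

// The second, independent lock lives on the auxiliary descriptor.
bool begin_parse(const MailboxLocks& l) {
    assert(l.aux_fd >= 0);
    return flock(l.aux_fd, LOCK_SH | LOCK_NB) == 0;
}
bool begin_append(const MailboxLocks& l) {
    assert(l.aux_fd >= 0);
    return flock(l.aux_fd, LOCK_EX | LOCK_NB) == 0;
}
void end_transient(const MailboxLocks& l) { flock(l.aux_fd, LOCK_UN); }
```

Unlike a .lock file, the auxiliary file's existence means nothing; the kernel holds the lock and drops it when the descriptor is closed or the process dies.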

> OK, so you have either a small additional file with "indices" (that keeps
> all message flags) or you keep them in the original mailbox file and then
> you have to re-read it.

You don't have to re-read the file; just the flags. Random access I/O is
wonderful. A lot of people don't understand it; and NFS does its best to
discourage you from doing read/write random access I/O. But the
functionality is there.
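
A sketch of what "just the flags" can look like with random access I/O. It assumes a hypothetical layout in which every message header carries a 4-hex-digit flag field at an offset recorded during the initial parse; sweep_flags is an invented name and this is not the actual mbx on-disk format.

```cpp
#include <cstdlib>       // strtoul
#include <sys/types.h>   // off_t
#include <unistd.h>      // pread
#include <vector>

// Re-read only the flag fields, one small pread() per message,
// leaving the rest of the mailbox file untouched.
// Returns false if any field could not be read in full.
bool sweep_flags(int fd, const std::vector<off_t>& flag_offsets,
                 std::vector<unsigned>& flags_out) {
    flags_out.clear();
    for (off_t off : flag_offsets) {
        char field[5] = {0};                        // 4 hex digits + NUL
        if (pread(fd, field, 4, off) != 4) return false;
        flags_out.push_back(
            static_cast<unsigned>(std::strtoul(field, nullptr, 16)));
    }
    return true;
}
```

For a mailbox of N messages the sweep reads 4*N bytes no matter how large the message bodies are, which is why a global flags sweep can stay relatively fast.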

> While I can believe that there is ONE server @ UW that handles all those
> users, it's hard to imagine that all of those users have and USE shell
> accounts on that very server.

We do not have shell accounts on our mail servers, and we have many mail
servers.

> We do not "stay" :-) CGatePro mailbox management is completely modular.

As is c-client. But the point is, if it's a sealed server why bother with
legacy formats?

> As you can see, to support a new mailbox format one would just write 10
> functions/methods.

Do you support IMAP as one of your mailbox formats (e.g. can you proxy)?
If so, I think that you need more than 10 methods. c-client has 33
methods per driver, mostly because of IMAP which uses most of them.
Fortunately, many of these can be null (= use the default) and some others
are boilerplate, so only about a dozen are significant. So we're in the
same ballpark.

The actual c-client API closely matches IMAP (no big surprise there), so
from the application's perspective it looks like all the world is IMAP.

This also means that the c-client based IMAP server is a very simple
program. All it does is parse IMAP commands into c-client API calls.

Vladimir A. Butenko

Mar 17, 2000, 3:00:00 AM3/17/00
to

> On Thu, 16 Mar 2000, Vladimir A. Butenko wrote:
> > > System call locks are used, so the locks are automatically released in the
> > > event of process failure.
> > Hm. But you've said you use two locks per mailbox. Where mailbox is a
> > file. So, how do you put two different locks on one file using OS-level
> > locks?
>

> Ha! That's a trade secret! ;-)
>
> I use an auxiliary file whose sole purpose is to hold the second system
> call lock. This is different from .lock locking, in which the second file
> is the lock.

So, if the system crashes, you do not have to care about all those .lock
files that, hmm, some large-scale servers use. And that make them "close
down for a clean-up" for many hours....



> > OK, so you have either a small additional file with "indices" (that keeps
> > all message flags) or you keep them in the original mailbox file and then
> > you have to re-read it.
>

> You don't have to re-read the file; just the flags. Random access I/O is
> wonderful. A lot of people don't understand it; and NFS does its best to
> discourage you from doing read/write random access I/O. But the
> functionality is there.

NFS has nothing to do with it. ANY file access is slow, and random is just
a bit slower. When we optimized the flag update algorithm in CGatePro 5
months ago, on heavily loaded sites the disk I/O subsystem load dropped 2-3 fold.
Mostly for POP operations, but it improved IMAP, too.

Speaking about the flags: there is a Q for you. I think you should give us
a definite answer, since you were the person who created the IMAP
specs, so you must know better ;-).

What SHOULD happen if I copy a message from one mailbox to a different
one? Should that message appear as "RECENT" in that mailbox or not? We
have a huge fight among our customers - what is the "right way" to do
this.


> > While I can believe that there is ONE server @ UW that handles all those
> > users, it's hard to imagine that all of those users have and USE shell
> > accounts on that very server.
>

> We do not have shell accounts on our mail servers, and we have many mail
> servers.

If you do not have shell accounts there, then what's the sense in using a
server design that limits itself intentionally to be compatible with tools
used from shell accounts on the same server?


> > We do not "stay" :-) CGatePro mailbox management is completely modular.
>

> As is c-client. But the point is, if it's a sealed server why bother with
> legacy formats?

Because the format itself is not bad enough. We could, of course, change
"From " to something like ^A^A^A^A, to make the mailbox more transparent,
but that's not a big deal. And the idea of one text file used as a mailbox
is not bad at all - if the manager handling that format is designed in an
efficient way.


> > As you can see, to support a new mailbox format one would just write 10
> > functions/methods.
>

> Do you support IMAP as one of your mailbox formats (e.g. can you proxy)?

Yes.

> If so, I think that you need more than 10 methods.

It happened that we need less, and I showed you all of them :-) Do not
forget that those are just the internal objects of the Mailbox Manager,
and the Manager itself has some brains. IMAP, POP, WebMail and Delivery
modules do not talk directly to those objects - they talk to the Manager.
That does a lot (like synching, flag handling optimization, etc.) - but
does not care about the physical implementation of the mail store.

> c-client has 33
> methods per driver, mostly because of IMAP which uses most of them.

That's the point: we have an additional layer that simplifies the things.


> Fortunately, many of these can be null (= use the default) and some others
> are boilerplate, so only about a dozen are significant. So we're in the
> same ballpark.

Then, it's very good.



> The actual c-client API closely matches IMAP (no big surprise there), so
> from the application's perspective it looks like all the world is IMAP.

As you can see, the VMailbox methods I showed are not quite IMAP-like, but
if you look at the mailbox manager methods, they do resemble IMAP to a
certain extent.



> This also means that the c-client based IMAP server is a very simple
> program. All it does is parse IMAP commands into c-client API calls.

Sure. As I said, the IMAP server is a very small portion of CGatePro code
- even with all those ACL, QUOTA, STARTTLS, etc extensions it has to
handle.

Vladimir A. Butenko

Mar 17, 2000, 3:00:00 AM
In article <slrn8d27p4...@ithaca.dbnet.ece.ntua.gr>,
ad...@MyRealBox.com wrote:

> In article <butenko-1603...@stalker.gamma.ru>, Vladimir A. Butenko wrote:

> >A much more interesting thing is "database"-based mailboxes. This was long
> >on our "To Do" list, but putting all those ORACLE client-side utilities
> >into CGatePro code seemed to be a VERY bad idea. Fortunately, now a
> >better, standardized interfaces to various DB sources are emerging, and we plan
> >to support them.
> >
> >So, one is able to store messages in some SQL database, but still access
> >them via POP,IMAP, WebMail, and AT THE SAME time - make advanced SQL
> >operations on those messages. This feature becomes more and more popular
> >on many corporate sites.
>

> You pretty much describe some of what I am trying to do.

No surprise. I have not posted here anything about something
extraordinary. Most of the things we do other companies do, too. The
difference is - how.

> Although you can
> act upon emails as BLOBs and have a simple design where every user is a
> table on the system and every email a record on the table with attributes
> serial number, headers and body.

a) every mailbox, not user
b) add an attribute called "mailbox ID", and add a table that matches user
name/mailbox name pair to that "mailbox ID", and you can store ALL
messages in one table.
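
The two-table layout described in (a) and (b) can be sketched with
SQLite. This is only an illustration under assumptions: the table names,
column names, and helper functions are hypothetical, and SQLite stands in
for whatever RDBMS would really be used. A submailbox is simply another
row in the mapping table with its own mailbox ID.

```python
import sqlite3


def build_store():
    """Create an in-memory store: one mapping table, one table for ALL messages."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE mailboxes (
            mailbox_id   INTEGER PRIMARY KEY,
            user_name    TEXT NOT NULL,
            mailbox_name TEXT NOT NULL,
            UNIQUE (user_name, mailbox_name)
        );
        CREATE TABLE messages (
            mailbox_id INTEGER NOT NULL REFERENCES mailboxes(mailbox_id),
            serial     INTEGER NOT NULL,
            headers    TEXT NOT NULL,
            body       BLOB NOT NULL,
            PRIMARY KEY (mailbox_id, serial)
        );
    """)
    return db


def add_mailbox(db, user, name):
    """Register a user name/mailbox name pair and return its mailbox ID."""
    cur = db.execute(
        "INSERT INTO mailboxes (user_name, mailbox_name) VALUES (?, ?)",
        (user, name),
    )
    return cur.lastrowid


def add_message(db, mailbox_id, headers, body):
    """Store one message under the next serial number of its mailbox."""
    row = db.execute(
        "SELECT COALESCE(MAX(serial), 0) + 1 FROM messages WHERE mailbox_id = ?",
        (mailbox_id,),
    ).fetchone()
    db.execute(
        "INSERT INTO messages VALUES (?, ?, ?, ?)",
        (mailbox_id, row[0], headers, body),
    )
    return row[0]


def list_serials(db, user, name):
    """List the serial numbers in one mailbox, looked up by user and name."""
    rows = db.execute(
        """SELECT m.serial FROM messages m
           JOIN mailboxes b ON b.mailbox_id = m.mailbox_id
           WHERE b.user_name = ? AND b.mailbox_name = ?
           ORDER BY m.serial""",
        (user, name),
    )
    return [r[0] for r in rows]
```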

>If you need subfolders it gets trickier

No. Submailbox is just something with a different "mailbox ID".

> (but not hard) add OO - but hey Object Databases are still very slow.

No need for OO here - regular SQL can handle it well.

> You need to play with the native interface of every database (and version)
> to achieve speed. For the large number of transactions that you will need,
> ODBC (and JDBC, DBI/DBD) won't be faster than what you have today.

That's what I was talking about - we do not want to put the ORACLE native
client libs into CGatePro.

> It would be nice to devise a *storage manager* that deals only with email
> (and multimedia data - since all those cool .jpg and .avi are sent around).
> You could then have filters to do feature extraction on the message and
> flag it as needed, so that you could select "all the videos" from the mail
> store, etc. A storage manager, not a complete DB with all the bells and
> whistles.

You can do this with RDBMS, too. Design a hierarchy you need, then convert
it into the "3rd Normal Form", and any SQL server will handle it well.
More or less :-)

> Yiorgos Adamopoulos -- #include <std/disclaimer.h>

--

Yiorgos Adamopoulos

Mar 17, 2000, 3:00:00 AM
In article <butenko-1703...@stalker.gamma.ru>, Vladimir A. Butenko wrote:
>a) every mailbox, not user

I was roughly sketching something in a post ;-)

>b) add an attribute called "mailbox ID", and add a table that matches user
>name/mailbox name pair to that "mailbox ID", and you can store ALL
>messages in one table.

You pretty much describe .mbox stored in a DBMS.

>No need for OO here - regular SQL can handle it well.

Not always. You program using OO. Wouldn't querying the storage in OO be
nicer? You could store mail objects at once. Of course I am not in a
position to know the internals of your design so I am only speculating.

>You can do this with RDBMS, too. Design a hierarchy you need, then convert

Yes. But having Informix, Oracle, whatever running just to play the
storage deposit is overkill (and maybe if not on performance, it surely is
a waste of money). Using say, Berkeley NEWDB with what you described above
would be faster.

>it into the "3rd Normal Form", and any SQL server will handle it well.
>More or less :-)


--

Mark Crispin

Mar 17, 2000, 3:00:00 AM
to Vladimir A. Butenko
On Fri, 17 Mar 2000, Vladimir A. Butenko wrote:
> So, if the system crashes, you do not have to care about all those .lock
> files that, hmm, some large-scale servers use. And that make them "close
> down for a clean-up" for many hours....

Even with .lock files, that shouldn't ever be necessary. You're supposed
to break a .lock file that is more than 5 minutes old, or if you've been
waiting for longer than 5 minutes.
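
The "break a .lock file that is more than 5 minutes old" rule can be
sketched like this. It is a simplified illustration, not the code of any
particular mail program: the function names and the `now` parameter
(used to simulate elapsed time) are hypothetical, and the "waiting for
longer than 5 minutes" half of the rule is left out. The
unlink-then-recreate step is inherently racy, which is part of what is
being argued about in this thread.

```python
import os
import time

STALE_AFTER = 5 * 60  # seconds; the 5-minute rule described above


def try_dot_lock(path, now=None, stale_after=STALE_AFTER):
    """Try once to create path + '.lock'. Return True if we now hold the lock."""
    lock_path = path + ".lock"
    now = time.time() if now is None else now
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        try:
            age = now - os.stat(lock_path).st_mtime
        except FileNotFoundError:
            return False  # the holder released it just now; caller may retry
        if age <= stale_after:
            return False  # lock is fresh; respect it
        # The lock looks abandoned: break it and try to take it ourselves.
        try:
            os.unlink(lock_path)
        except FileNotFoundError:
            pass
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            return False  # someone else grabbed it first
    os.close(fd)
    return True


def release_dot_lock(path):
    """Remove the lock file for path."""
    os.unlink(path + ".lock")
```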

> NFS has nothing to do with it. ANY file access is slow, and random is just
> a bit slower. When we optimized the flag updates algorithm in CGatePro 5
> months ago, on heavily loaded sites the disk I/O subsystem load dropped 2-3 fold.
> Mostly for POP operations, but it improved IMAP, too.

Actually, NFS does have something to do with it. Update mode doesn't work
right with NFS, particularly when multiple NFS clients are involved. The
inode and the data caches get out of synch and you end up having to do
something to flush the buffer cache to fix it.

As far as the "optimizing flag updates algorithm", I assume that you mean
that you have the flags together in one place in the database so you can
read/update multiple flags with fewer I/O operations?

> What SHOULD happen if I copy a message from one mailbox to a different
> one? Should that message appear as "RECENT" in that mailbox or not? We
> have a huge fight among our customers - what is the "right way" to do
> this.

In my implementation, COPY never sets \Recent in the destination, and
APPEND always sets \Recent.

Rationale:

COPY is required to preserve flags; and thus what is copied is no longer a
"virgin" message. So COPY should not set \Recent.

APPEND takes an optional flags argument, but isn't defined to allow
setting \Recent (this is arguably a design flaw in IMAP). Since one of
the uses of APPEND can be for new mail delivery, it's better to set
\Recent than not to set it.

I wouldn't go so far as to say this is the only acceptable way to do
things; just that this seems to make the most sense in spite of being
obtuse.

The real "right thing" would probably be for COPY never to set \Recent,
and APPEND to allow \Recent as an argument.
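
The behavior described above (COPY preserves flags and never sets
\Recent; APPEND takes optional flags and always sets \Recent) can be
modelled in a few lines. This is a toy model, not c-client code: the
`Message` class and the `append`/`copy` helpers are hypothetical, and a
mailbox is just a Python list.

```python
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    flags: frozenset = frozenset()
    recent: bool = False


def append(mailbox, body, flags=()):
    """APPEND: store caller-supplied flags and always mark the message recent."""
    mailbox.append(Message(body, frozenset(flags), recent=True))


def copy(src, index, dest):
    """COPY: preserve the original flags and never mark the copy recent."""
    original = src[index]
    dest.append(Message(original.body, original.flags, recent=False))
```

For example, appending a message with \Seen to one list and copying it
to another yields a copy that still carries \Seen but is not recent.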

> If you do not have shell account there, then what's the sense to use a
> server design that limits itself intentionally to be compatible with tools
> used from shell accounts on the same server?

Not server design. Driver design. And we don't use that driver!

We do have a few IMAP servers which run on shell systems, but those are
generally small workgroups, not the main facility.

> It happened that we need less, and I showed you all of them :-) Do not
> forget that those are just the internal objects of the Mailbox Manager,
> and the Manager itself has some brains. IMAP, POP, WebMail and Delivery
> modules do not talk directly to those objects - they talk to the Manager.
> That does a lot (like synching, flag handling optimization, etc.) - but
> does not care about the physical implementation of the mail store.

c-client works in the same way. Applications never talk directly to
drivers.

The main methods in c-client are:
validate mailbox (is this driver right for this mailbox?)
create mailbox (in this format)
delete mailbox (format-specific considerations)
rename mailbox (format-specific considerations)
open mailbox
close mailbox (format-specific considerations)
alter flags
fetch header
fetch text
check
expunge
copy
append

There are a number of additional methods which a driver can also supply in
lieu of c-client's standard handling. For example, the IMAP driver is the
only one which uses the search method, since in IMAP you want to send a
SEARCH command to the server rather than doing it locally in c-client.
IMAP is the hairiest driver, since most c-client operations need to be
turned into IMAP driver method calls, and in turn to IMAP commands. Local
file drivers tend to be much smaller.

> > c-client has 33
> > methods per driver, mostly because of IMAP which uses most of them.
> That's the point: we have an additional layer that simplifies the things.

I don't think that you understand. c-client provides a standard search
method; therefore most drivers do not have their own search method. Only
the IMAP driver has a search method. But that counts as one of the 33
methods. So does the sort method, the thread method, the "fetch fast"
method, the "fetch flags" method, the "fetch partial" method,... all of
which only the IMAP driver has and count towards those 33 methods.

c-client is an additional layer too, and for most of these IMAP-only
methods c-client has standard handling. But c-client doesn't know
anything about IMAP. It doesn't know how to send an IMAP SEARCH command.
All it does is either search itself, or see that the driver has a search
method and invoke the driver method.
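
The dispatch rule described here (use the driver's own search method if
it has one, otherwise fall back to a generic search built on the
required methods) can be sketched as follows. The class and function
names are hypothetical and are not c-client's actual API; the "remote"
driver only records the command it would send instead of talking to a
server.

```python
class Driver:
    """Base driver: fetch_text is required; search is optional."""

    search = None  # a driver may replace this with its own search method

    def __init__(self, messages):
        self.messages = messages

    def fetch_text(self, msgno):
        return self.messages[msgno]


class LocalFileDriver(Driver):
    """A local driver that relies entirely on the generic search."""


class RemoteDriver(Driver):
    """A driver that delegates searching to a (simulated) remote server."""

    def __init__(self, messages):
        super().__init__(messages)
        self.sent = []

    def search(self, needle):
        # Stand-in for a server round trip: record the command we would send.
        self.sent.append("SEARCH TEXT " + needle)
        return [n for n, text in enumerate(self.messages) if needle in text]


def generic_search(driver, needle):
    """Library layer: it knows nothing about the protocol behind the driver."""
    if driver.search is not None:
        return driver.search(needle)
    return [
        n for n in range(len(driver.messages))
        if needle in driver.fetch_text(n)
    ]
```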

Yes, I could have had many fewer methods, if c-client had special
knowledge about IMAP.

Vladimir A. Butenko

Mar 19, 2000, 3:00:00 AM

> On Fri, 17 Mar 2000, Vladimir A. Butenko wrote:
> > So, if the system crashes, you do not have to care about all those .lock
> > files that, hmm, some large-scale servers use. And that make them "close
> > down for a clean-up" for many hours....
>
> Even with .lock files, that shouldn't ever be necessary. You're supposed
> to break a .lock file that is more than 5 minutes old, or if you've been
> waiting for longer than 5 minutes.

This can be acceptable on a small private site, but completely
unacceptable on a real production server. Recently, one of our clients had
a problem with their disk I/O infrastructure that suspended 15% of disk
operations for more than 10 minutes. If they had been using locks, and if
the software using those locks had such easily breakable "locking" rules,
then most of their mailboxes would already be corrupted.

This is the same as having a mutex/lock in a programming language: if you
say "lock the resource", and the system would wait for 5 minutes and then
pretend to lock it (breaking the existing lock), then this is definitely
not the system to be employed in a production environment.

> > NFS has nothing to do with it. ANY file access is slow, and random is just
> > a bit slower. When we optimized the flag updates algorithm in CGatePro 5
> > months ago, on heavily loaded sites the disk I/O subsystem load dropped 2-3 fold.
> > Mostly for POP operations, but it improved IMAP, too.
>
> Actually, NFS does have something to do with it. Update mode doesn't work
> right with NFS, particularly when multiple NFS clients are involved. The
> inode and the data caches get out of synch and you end up having to do
> something to flush the buffer cache to fix it.

What I meant was - ANY disk i/o operation is VERY expensive - NFS or
local. As far as I can see based on your postings, your entire site has
80K users, and they are distributed between several servers, with
something like 5K-10K users per server. That's not a big load, esp. if you
have a fast disk i/o system. On large sites you would quickly realize the
importance of disk i/o optimization at ANY cost.


> As far as the "optimizing flag updates algorithm", I assume that you mean
> that you have the flags together in one place in the database so you can
> read/update multiple flags with fewer I/O operations?

You do not HAVE to keep them together to minimize the i/o. You do know
that our default and most used format is plain old .mbox, where all flags
are stored together with a message. So, no - I was not talking about
changing the formats to minimize I/O operations needed to perform certain
mailbox operations. I meant designing the mailbox operations so that they
do not always result in disk i/o operations - no matter what the mailbox
physical format is.
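
One generic way to keep mailbox operations from always turning into disk
operations is to collect flag changes in memory and write them out in a
single pass. The post does not say how CGatePro actually does this, so
the sketch below is only an assumption-laden illustration of that
general technique: the `CachedFlags` class is hypothetical and a plain
dict stands in for the on-disk store.

```python
class CachedFlags:
    """Collect flag changes in memory and write them out in one pass."""

    def __init__(self, backend):
        self.backend = backend  # dict-like stand-in for disk: msgno -> set of flags
        self.dirty = {}         # msgno -> latest flag set not yet written
        self.writes = 0         # number of write passes to the backend

    def get(self, msgno):
        """Return the current flags, preferring unwritten changes."""
        if msgno in self.dirty:
            return set(self.dirty[msgno])
        return set(self.backend.get(msgno, ()))

    def set_flag(self, msgno, flag):
        """Record a flag change in memory only."""
        flags = self.get(msgno)
        flags.add(flag)
        self.dirty[msgno] = flags

    def flush(self):
        """Write all pending changes to the backend in a single pass."""
        if not self.dirty:
            return
        self.backend.update(self.dirty)
        self.dirty.clear()
        self.writes += 1
```

The trade-off is that changes not yet flushed are lost if the process
dies, so a real server would have to decide when a flush is mandatory.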

> I wouldn't go so far as to say this is the only acceptable way to do
> things; just that this seems to make the most sense in spite of being
> obtuse.
>
> The real "right thing" would probably be for COPY never to set \Recent,
> and APPEND to allow \Recent as an argument.

Yeah... We always set \RECENT for new messages, no matter how they
were added - COPY, APPEND, DELIVERY. Probably it's better to switch to
what you have outlined. We should do it in the current betas and see if it
causes any problem for any users.

> > If you do not have shell account there, then what's the sense to use a
> > server design that limits itself intentionally to be compatible with tools
> > used from shell accounts on the same server?
>
> Not server design. Driver design. And we don't use that driver!

You used to say that you had to use process-based server design (as
opposed to much more effective threads-based design) only because you had
to deal with other processes and building a server on fast inter-thread
interactions would not allow you to support external processes. So, I'm
talking about the server design at this moment, not about the mailbox
"driver" design. The "driver" design completely depends on the server
design.


> We do have a few IMAP servers which run on shell systems, but those are
> generally small workgroups, not the main facility.

Yes, and there, since inter-process interactions would exist in any case -
process-based server like UW imapd is a wise choice.

> > That does a lot (like synching, flag handling optimization, etc.) - but
> > does not care about the physical implementation of the mail store.
>
> c-client works in the same way. Applications never talk directly to
> drivers.
>
> The main methods in c-client are:
> validate mailbox (is this driver right for this mailbox?)
> create mailbox (in this format)
> delete mailbox (format-specific considerations)
> rename mailbox (format-specific considerations)
> open mailbox

Excuse me, but in any object-oriented language these could not be "methods" of a
virtual mailbox object. These ops would be methods of the mailbox "class",
etc. - but in no way methods of the mailbox itself. At least, not the
create/open methods. BTW, what language do you use for UW IMAPd?

> close mailbox (format-specific considerations)
> alter flags
> fetch header
> fetch text
> check
> expunge
> copy
> append
>
> There are a number of additional methods which a driver can also supply in
> lieu of c-client's standard handling. For example, the IMAP driver is the
> only one which uses the search method, since in IMAP you want to send a
> SEARCH command to the server rather than doing it locally in c-client.
> IMAP is the hairiest driver, since most c-client operations need to be
> turned into IMAP driver method calls, and in turn to IMAP commands. Local
> file drivers tend to be much smaller.

Yes, but I do see the difference: since your design is completely
"light-linked", inter-process based - you have to have methods like
"check". Which, I assume, you have to call every time a mailbox
modification can be reported. And this call results in file i/o operations
- stat()/read()/etc. In CGatePro design you did not see that call. Instead
you saw that each mailbox client uses a mailbox "view", not the mailbox
directly, and the mailbox view contains all info about mailbox updates. No
need to use an additional method, no need to call i/o subsystem every so
often.
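
The "view" idea described above can be sketched as follows: every change
made through the shared mailbox object is pushed to each open view, so a
client learns about updates from memory instead of polling the file.
The class and method names are hypothetical, not CGatePro's, and the
sketch assumes all access to the mailbox goes through one process.

```python
class MailboxView:
    """One client's view of a shared mailbox, with its pending updates."""

    def __init__(self, mailbox):
        self.mailbox = mailbox
        self.pending = []

    def take_updates(self):
        """Return and clear the updates queued for this client; no disk I/O."""
        updates, self.pending = self.pending, []
        return updates


class SharedMailbox:
    """The single in-process owner of a mailbox's state."""

    def __init__(self):
        self.messages = []
        self.views = []

    def open_view(self):
        view = MailboxView(self)
        self.views.append(view)
        return view

    def deliver(self, text):
        """Add a message and notify every open view."""
        self.messages.append(text)
        for view in self.views:
            view.pending.append(("EXISTS", len(self.messages)))
```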

> > > c-client has 33
> > > methods per driver, mostly because of IMAP which uses most of them.
> > That's the point: we have an additional layer that simplifies the things.
>
> I don't think that you understand. c-client provides a standard search
> method; therefore most drivers do not have their own search method. Only
> the IMAP driver has a search method. But that counts as one of the 33
> methods. So does the sort method, the thread method, the "fetch fast"
> method, the "fetch flags" method, the "fetch partial" method,... all of
> which only the IMAP driver has and count towards those 33 methods.

OK, understood.



> Yes, I could have had many fewer methods, if c-client had special
> knowledge about IMAP.

Nope, we do not have any "c-clients", and our Mailbox Manager has no
special knowledge of IMAP - but it's unrelated to the topic. Which has
already shifted from Mailbox formats and areas of their pluses/minuses to
server designs and areas of their (server design) pluses/minuses.

Vladimir A. Butenko

Mar 19, 2000, 3:00:00 AM
In article <slrn8d4q2v...@ithaca.dbnet.ece.ntua.gr>,
ad...@MyRealBox.com wrote:

> >No need for OO here - regular SQL can handle it well.
>
> Not always. You program using OO. Wouldn't querrying the storage in OO be
> nicer? You could store mail objects at once. Of course I am not in a
> position to know the internals of your design so I am only speculating.

What I was saying was: ALL operations one might need to control DB-based
mailbox store could be expressed in SQL (to be precise - in a good RDBMS
language).

The question is - how difficult it is to map mailbox operations onto RDBMS
operations - but that's a different question. I'm saying that it is
doable, and in a pretty effective manner, so regular RDBMS can be used,
and there is no need to use slow and ineffective OO-databases available on
the market today.

I do not want to say that it is BAD to use OO databases - as soon as they
become real, they will become a better candidate for mail store, too.



> >You can do this with RDBMS, too. Design a hierarchy you need, then convert
>
> Yes. But having Informix, Oracle, whatever running just to play the
> storage deposit is overkill (and maybe if not on performance, it surely is
> a waste of money). Using say, Berkeley NEWDB with what you described above
> would be faster.

If you need something, you pay. Either in cash, or in your own time, or in
consequences of not having what you really need. If you need mailbox store
- you need a mailbox store that can effectively handle lots of data. If
you do not need a mailbox store itself, but just want to play with it -
then you may use all those "newDB" or whatever.

That's all about the question of what is REALLY needed: a toy or a
product. When someone buys an expensive product to play with an idea for a
couple of weeks - that's not wise (call it an overkill). When someone buys
a toy to do a real job - that's not wise either. The first situation is
recognized much more quickly (since the money for that unneeded expensive
product disappears immediately); the second situation is much harder to
recognize, since all large things start as small games, and for some time
toys used instead of tools look like they are working.

Then they start to consume more and more of their owner's time ("you start
to pay"), and if they do not break completely (as some freeware "DB" would
break on a 100GB store with some 10MB elements), they can continue to
pretend to "almost work" - for quite some time. By the day one finally
realizes that the investment was not made wisely, the expenses to support
the "cheap" toy-product exceed the cost of the "expensive" product by an
order of magnitude.

So, on a large site using a product as cheap as ORACLE is not an overkill,
if this feature (DB store) is really needed. On a small university/lab
test site using a product as expensive as ORACLE can be an overkill. But
in this case I would ask ORACLE for a test license, rather than waste my
time learning "newDB". In case a real thing starts to emerge from my
games, I'd just pay ORACLE - and that's it. I would not have to re-design
the entire thing for a new DB platform, and I would not have to implement
a real thing using toy tools.

That's just my opinion. YMMV.

tmar...@andrew.cmu.edu

Mar 20, 2000, 3:00:00 AM
> This can be acceptable on a small private site, but completely
> unacceptable on a real production server. Recently, one of our clients had
> a problem with their disk I/o infrastructure that suspended 15% of disk
> operations for more than 10 minutes. If they were using locks and if

Is this normal behavior for your product? Does every operation take 10
minutes? If my client was waiting on a FETCH for more than 5 minutes
I'd probably just give up and call the person on the telephone.


> If you need something, you pay. Either in cash, or in your own time, or in
> consequences of not having what you really need. If you need mailbox store
> - you need a mailbox store that can effectively handle lots of data. If
> you do not need a mailbox store itself, but just want to play with it -
> then you may use all those "newDB" or whatever.

Are you claiming a product is inferior simply because it's open
source? You should probably email those apache guys to let them
know. Maybe you were sleeping while companies based upon open source
products (redhat, va research, etc..) got huge market capitalizations.

Or maybe you're insinuating that the most popular/expensive product is
always the best? I'll let you think of your own examples of why this is a
silly concept.

Berkeley db (the "toy" you refer to) I believe is a fairly good
product. It has its problem areas but so do all the expensive
alternatives. If you run into troubles because "(as some freeware "DB"
would break on a 100GB store with some 10MB elements)" I'd argue your
system is broken not the database. Try a distributed approach.


-Tim

Andrew davison

Mar 20, 2000, 3:00:00 AM
Man did you manage to grab the wrong end of the stick on EVERY point. I hope
Vlad doesn't dignify your comments with a response. So let me attempt...

<tmar...@andrew.cmu.edu> wrote in message
news:2000032007...@smtp1.andrew.cmu.edu...


> This can be acceptable on a small private site, but completely
> unacceptable on a real production server. Recently, one of our clients had
> a problem with their disk I/o infrastructure that suspended 15% of disk
> operations for more than 10 minutes. If they were using locks and if
>

> Is this normal behavior for your product? Does every operation take 10
> minutes? If my client was waiting on a FETCH for more than 5 minutes
> I'd probably just give up and call the person on the telephone.
>

He was highlighting a hardware failure that caused I/O suspensions for
long periods. I've seen similar problems with storage units in the
past. I have no experience with Stalker mail software but I believe it to be
a quality high-performance product.

> If you need something, you pay. Either in cash, or in your own time, or in
> consequences of not having what you really need. If you need mailbox store
> - you need a mailbox store that can effectively handle lots of data. If
> you do not need a mailbox store itself, but just want to play with it -
> then you may use all those "newDB" or whatever.
>
> Are you claiming a product is inferior simply because it's open
> source? You should probably email those apache guys to let them
> know. Maybe you were sleeping while companies based upon open source
> products (redhat, va research, etc..) got huge market capitalizations.

No, he was pointing out the obvious that everything has a price. Whether
money, effort, time or whatever. Free (as in speech) software often has a
non-free (as in beer) price. You seem to be confused about what constitutes
Open Source and/or Free Software. Sometimes they are the same, sometimes
not.

> Or maybe you're insinuating that the most popular/expensive product is
> always the best? I'll let you think of your own examples of why this is a
> silly concept.

You are very silly.

> Berkeley db (the "toy" you refer to) I believe is a fairly good
> product. It has its problem areas but so do all the expensive
> alternatives. If you run into troubles because "(as some freeware "DB"
> would break on a 100GB store with some 10MB elements)" I'd argue your
> system is broken not the database. Try a distributed approach.
>

Very few freeware database products are of mission-critical standard. Most
are very good, but from experience can be quite buggy. It can often be
better to develop something yourself that has just the required features (and
no more) so as not to distract effort.

>
> -Tim
>
>


Yiorgos Adamopoulos

Mar 20, 2000, 3:00:00 AM
In article <2000032007...@smtp1.andrew.cmu.edu>, tmar...@andrew.cmu.edu wrote:
>Are you claiming a product is inferior simply because it's open
>source? You should probably email those apache guys to let them
>know. Maybe you were sleeping while companies based upon open source
>products (redhat, va research, etc..) got huge market capitalizations.

This is not what he wrote

>Berkeley db (the "toy" you refer to) I believe is a fairly good
>product. It has its problem areas but so do all the expensive
>alternatives. If you run into troubles because "(as some freeware "DB"
>would break on a 100GB store with some 10MB elements)" I'd argue your
>system is broken not the database. Try a distributed approach.

Tim, again this is not what he wrote


--

bos...@bostic.com

Mar 20, 2000, 3:00:00 AM
In article <butenko-1903...@stalker.gamma.ru>,

but...@stalker.com (Vladimir A. Butenko) wrote:
> If you need something, you pay. Either in cash, or in your own
> time, or in consequences of not having what you really need. If
> you need mailbox store - you need a mailbox store that can
> effectively handle lots of data. If you do not need a mailbox
> store itself, but just want to play with it - then you may use
> all those "newDB" or whatever.

I don't completely disagree with your argument, although I would
say that a better rule is that a group usually needs a revenue
stream in order to create a real product and support
organization. I strongly disagree with your categorization of
Berkeley DB. Berkeley DB is the message store behind several
commercial products, as well as the database behind the
AOL/Netscape portal site. It supports hundred-GB databases,
applications with thousands of simultaneous threads, and we can
often give you an order of magnitude increase in your
transactional throughput. As Tim said, "Open Source" doesn't
mean small or weak.

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic
Sleepycat Software Inc. bos...@sleepycat.com
394 E. Riding Dr. +1-978-371-0408
Carlisle, MA 01741-1601 http://www.sleepycat.com



Mark Crispin

Mar 28, 2000, 3:00:00 AM
to Vladimir A. Butenko
On Sun, 19 Mar 2000, Vladimir A. Butenko wrote:
> > Even with .lock files, that shouldn't ever be necessary. You're supposed
> > to break a .lock file that is more than 5 minutes old, or if you've been
> > waiting for longer than 5 minutes.
> This can be acceptable on a small private site, but completely
> unacceptable on a real production server.

I don't disagree with you, but since it's a UNIX standard rule, it's like
saying "UNIX can be acceptable on a small private site, but completely
unacceptable on a real production server." But since people want to use
UNIX systems as real production servers, you have to accommodate its quirks
as best you can, and program around the rest.

> This is the same as having a mutex/lock in a programming language: if you
> say "lock the resource", and the system would wait for 5 minutes and then
> pretend to lock it (breaking the existing lock), then this is definitely
> not the system to be employed in a production environment.

As I said above, this is how .lock files on UNIX work.

Not my design. Not my idea of a good design either.

> You used to say that you had to use process-based server design (as
> opposed to much more effective threads-based design) only because you had
> to deal with other processes and building a server on fast inter-thread
> interactions would not allow you to support external processes.

There are other reasons:

Not all UNIX systems have threads. Dependency on threads == less
portability. This is the #1 reason.

Robustness. A single false pointer reference in a multi-threaded server
can take down all the threads, unless the threading implementation offers
a way to protect threads from each other.

Leverage on kernel facilities. Separate threads in the same process can
not be logged in as different UNIX uids.

> > The main methods in c-client are:
> > validate mailbox (is this driver right for this mailbox?)
> > create mailbox (in this format)
> > delete mailbox (format-specific considerations)
> > rename mailbox (format-specific considerations)
> > open mailbox
>
> Excuse me, but in any object-oriented language these could not be "methods" of a
> virtual mailbox object. These ops would be methods of mailbox "class"
> method, etc, - but in no way the mailbox methods themselves.

This is an unimportant distinction for this discussion; it's a distinction
of various OOP languages but irrelevant here.

> BTW, what
> language do you use for UW IMAPd? At least, not the create/open methods.

C. Straight, unadorned, traditional C, as can be found on even the oldest
UNIX systems. Half the systems that it's been ported to don't have a C++
compiler.

> Yes, but I do see the difference: since your design is completely
> "light-linked", inter-process based - you have to have methods like
> "check".

No. "check" is a method because CHECK is an IMAP operation. It has
nothing to do with how c-client is implemented.

> In CGatePro design you did not see that call. Instead
> you saw that each mailbox client uses a mailbox "view", not the mailbox
> directly, and the mailbox view contains all info about mailbox updates.

That only works as long as you only support mail stores which can be
implemented within CGatePro. But as soon as you have external mail stores
(e.g. MAPI, SQL, etc.), you can't have a view that automatically reflects
current state.

For example: consider an IMAP-proxy feature in CGatePro, where CGatePro is
simultaneously an IMAP server and client to another server. If the
CGatePro server gets a CHECK command, it wants to issue a CHECK command as
IMAP client to the other server. It doesn't want to use its view, since
its view may not be current.

Since you don't want to special-case the IMAP-proxy, you just add the
methods needed for it. That's what causes method bloat. But since
c-client is not just an IMAP server library, but also a library for IMAP
clients, this is a feature, not a bug.

With the exception of IMAP, most c-client drivers don't need most of the
methods, and they use an internal view.

Vladimir Butenko

Mar 29, 2000, 3:00:00 AM

> This can be acceptable on a small private site, but completely
> unacceptable on a real production server. Recently, one of our clients had
> a problem with their disk I/o infrastructure that suspended 15% of disk
> operations for more than 10 minutes. If they were using locks and if
>

> Is this normal behavior for your product? Does every operation take 10
> minutes? If my client was waiting on a FETCH for more than 5 minutes
> I'd probably just give up and call the person on the telephone.

Please do not try to make us laugh. Of course, it's not normal behavior. I
did explicitly specify that they had a hiccup in the disk i/o
infrastructure. If you want the details, as far as I remember, their file
server LAN connection was misconfigured by mistake making all connections
to the file store do heavy retries. Of course, the users were frustrated
at that time, and of course they got complaints. As I said, it's a production
server on a large ISP site with most of their customers being business
customers, and it WAS a problem - though not our software problem. The
point here was that even with that infrastructure problem, the system did
NOT corrupt the mailboxes. And it would DEFINITELY corrupt them in this
situation when someone uses a "lock" mechanism that times out in 5
minutes.

Why? Imagine that situation yourself:

You have opened a connection to the server. You have started to read some
mail, mark messages and delete some of them. Suddenly the system stopped
responding, though no connection was dropped. What would a regular user
do? "Damn!" - and start a new session. Since the server was still working
somehow, some of those users were able to establish connections and open the
mailboxes again. And do the same operations. If the system were
.lock-based, when everything is brought back to normal in 10 minutes, all
those mailboxes would be corrupted.

OF COURSE that's not a USUAL situation, OF COURSE it does not happen too
often - but this is the difference between the production-grade system and
the other one - the production system should survive in any situation that
can appear during the operations, and this one was a rare one, but still a
very predictable one.

If you think that you can discard these situations (disk I/O subsystem
hiccups), let me present you a much more common one: the CPU becomes
unavailable for 5 minutes. How? Easily. A failure is detected on a
multi-CPU server, and in the process of phasing out that CPU all processes
(and processors) were effectively stopped. Yes, it usually takes much less
than 5 minutes. But it can take more than 5 - because of an operator error
or an incomplete phase-out procedure and repeated attempts to
replace/remove the failed CPU.

> If you need something, you pay. Either in cash, or in your own time, or in
> consequences of not having what you really need. If you need mailbox store
> - you need a mailbox store that can effectively handle lots of data. If
> you do not need a mailbox store itself, but just want to play with it -
> then you may use all those "newDB" or whatever.
>

> Are you claiming a product is inferior simply because it's open
> source? You should probably email those apache guys to let them
> know. Maybe you were sleeping while companies based upon open source
> products (redhat, va research, etc..) got huge market capitalizations.

a) I did not say that any product was inferior just because it was open source.

b) Are you saying that the market capitalization is the indication of the
product quality rather than the quality of the marketing team and the
experience of the company underwriters?

c) what's so exciting about apache other than it is free?

d) have you checked the market cap of Software.com lately? :-)

I do repeat the thing I've said: you ALWAYS pay. If I need a small server
in an environment that requires integration with legacy mailers, I'd
install UW imapd, because it is built for that thing, and I'd pay - but
not much - in my time - to install it and configure it (you do know that
some configuration is needed there). That will be my price for that
environment. Something like $100.

If I do not mind that any user on that system can do SELECT /etc/passwd, 1 FETCH 1
RFC822.TEXT - then that will be all I pay. If I do not want any user to be
able to read that file and any other file on my Unix server (I have not
checked UW imapd lately, and probably Mark has already fixed that), I
would have to dig into the server code and put some limitations on the
files regular IMAP users can read. Then my price jumps up - well over
$500.

The same applies to hardware: I can get a PC for almost nothing ($1000),
put a server on it, and it will run happily. If I'm a small ISP with 3000
subscribers, why should I pay for a $20,000 Sparc server that "has less
MHz than my PC"? At the moment when I get 20,000 users and the PC just
crashes, because its cheap component design - bus, disk I/O - just
cannot handle a large load over long periods of time - at that moment I
will realize that the $19,000 I "saved" on hardware actually put me out of
business because those customers turned to my competitor.

> Or maybe you're insinuating that the most popular/expensive product is
> always the best? I'll let you think of your own examples of why this is a
> silly concept.


Neither most popular, nor most expensive is the best. The best product is
the product that:

a) does what the product in YOUR particular environment MUST do: some need zero
downtime, some need just fail-over, some are OK to say - "please wait till we
restore your mail from last week's backup". Needs vary. Some ISPs need high
performance, while a school site with just 500 users does not care about it
that much.

b) provides the set of features YOU need. Again, needs vary. Some need basic
IMAP, some need IMAP with ACL, some need WebMail, etc.

c) has the smallest product COST (which includes the up-front cost, installation /
customization cost, and maintenance cost)

If you need to deliver goods to an island, you need an airplane or a
boat, and you will not get a VW Bug, even if it is offered to you for
free: it simply cannot get the job done.

If this island is 100 feet off-shore, you can get a cheap/free 15-footer,
and that will do the job perfectly. If the island is 3000 miles away, this
"inexpensive" solution can be as pricy as your life is.

Finally, when you have decided to get a real ship to get there, someone
offers you a ship for "a very good deal", but then you have to spend a
fortune to bring it up to the task, and then - spend a decent amount of
effort/money every day to keep it afloat.

Believe me, we saw (and continue to see) a lot of companies where the
management was so excited that they could "pay nothing for software" and
heard all that hype about "open source", that they wanted to build their
sites only on the "open source" software. So, they hire 5-10 people, approx.
$100K/year each, and for several months, if not years, they try to build a
site based on all those sendmails, qmails, etc. And the management gets
frustrated when the guys still cannot deliver, because they heard all
those stories that Hotmail runs qmail, other sites use sendmail, etc. What
they do not understand is that those "qmails" and "sendmails" are deeply
customized versions of those products, with many millions of bucks spent on
developing custom clustering and other software "enhancements", and
that those sites still pay a very high toll because of the additional hardware
they have to install and maintain. The "free" open source became the most
expensive thing for many-many sites that tried to make "open source"
software do the things it was not designed to do.

Bottom line: you always pay. You need a simple thing - you pay a small
amount, you need a big thing - you pay more.


> Berkeley db (the "toy" you refer to) I believe is a fairly good
> product. It has its problem areas but so do all the expensive
> alternatives. If you run into troubles because "(as some freeware "DB"
> would break on a 100GB store with some 10MB elements)" I'd argue your
> system is broken not the database. Try a distributed approach.

Our system (DB) is not broken, because it does not exist yet. What we do
first is try the alternatives we have. I do not believe that you think we
just love to throw away a lot of R&D money to do something that exists in
open source form. Moreover, it's still uncertain whether we will ever have our
own DB-like engine inside the product. So, I'm not biased here at all. But
when some customer installs some open source software and it turns out that
it's not up to the task, and does not work well when our server is under
heavy load - whom do they call? Berkeley University? RedHat? (BTW, have
you ever tried to call the RedHat support line?) No. They call us. And while,
thank God, they do understand that this is not our fault - we still have
to handle those calls.

If someone needs a directory server for 1000 people, even for 10,000
people - I think that OpenLDAP is OK - it will cost you just a few hundred
$$ to install and configure it properly. If you want to handle more - you
may want to buy the Netscape Directory server, but it will cost you MUCH
more, and out of the box it will not handle more than 60,000 records well
- but it CAN be tuned to handle more. If you need several million records
- I do not have an answer. Most likely you would try to use Oracle and
LDAP adapter on top of it. Probably.

So, it's always the question of verifying what this or that product can do
- point (a) above. This MAY discard some products from further
consideration. Then you check the features, and only THEN - the price.
Looking at the price tag as the first step is not wise. Whether it is a
"free" product or if it is a $6mln/site product - does not mean that it is
good or bad. It's the tag. First, check what it can do and how it can do
the things you need.


> -Tim

Vladimir Butenko

Mar 29, 2000
In article <8b5gj2$f8l$1...@nnrp1.deja.com>, bos...@bostic.com wrote:


> I don't completely disagree with your argument, although I would
> say that a better rule is that a group usually needs a revenue
> stream in order to create a real product and support
> organization. I strongly disagree with your categorization of
> Berkeley DB. Berkeley DB is the message store behind several
> commercial products, as well as the database behind the
> AOL/Netscape portal site. It supports hundred-GB databases,
> applications with thousands of simultaneous threads, and we can
> often give you an order of magnitude increase in your
> transactional throughput. As Tim said, "Open Source" doesn't
> mean small or weak.

I was talking about ONE particular example - from OUR situation: records
of huge size. I did not claim that your product is good or bad. My ONLY
point was: check how this or that thing works for what YOUR situation
requires. If the demands can be handled by software A and software B, then
other things - like price tag, support costs, feature sets - can be used
to decide what product you want to select.

Nothing else than that, and sorry if my comments could be interpreted as
a claim that your product is "bad". If someone needs an IMAP server that
has complete integration with legacy systems - I'd recommend UW imapd rather
than any other product - though this would definitely not mean that I call our
product "bad" :-)



> Regards,
> --keith
>
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Keith Bostic
> Sleepycat Software Inc. bos...@sleepycat.com

--

Vladimir Butenko

Mar 29, 2000
In article
<Pine.NXT.4.30.00032...@Tomobiki-Cho.CAC.Washington.EDU>,
Mark Crispin <m...@CAC.Washington.EDU> wrote:

> On Sun, 19 Mar 2000, Vladimir A. Butenko wrote:
> > > Even with .lock files, that shouldn't ever be necessary. You're supposed
> > > to break a .lock file that is more than 5 minutes old, or if you've been
> > > waiting for longer than 5 minutes.
> > This can be acceptable on a small private site, but completely
> > unacceptable on a real production server.
>

> I don't disagree with you, but since it's a UNIX standard rule, it's like
> saying "UNIX can be acceptable on a small private site, but completely
> unacceptable on a real production server." But since people want to use
> UNIX systems as real production servers, you have to accommodate its quirks
> as best you can, and program around the rest.

Sorry, Mark, that's not a correct interpretation of my words. Unix, like any
other OS, has a lot of things and features. Things that can be used for one
task are not appropriate for other tasks, and it is completely up to the
software designer to select the right tool for the task. I do understand
that at the time UW-imapd was conceived, the design you selected
(lock, multi-process) was the only one available on Unix. But the
situation is different today than it was 10 years ago.

> > This is the same as having a mutex/lock in a programming language: if you
> > say "lock the resource", and the system would wait for 5 minutes and then
> > pretend to lock it (breaking the existing lock), then this is definitely
> > not the system to be employed in a production environment.
>

> As I said above, this is how .lock files on UNIX work.
>
> Not my design. Not my idea of a good design either.

Lock file semantics is Unix design, not yours. BTW - that semantics varies
on different types of Unix. But a decision to rely on lock files was your
design, and let me repeat it again - THAT time it was - to the best of my
knowledge - the only design that could be used. THAT time.


> > You used to say that you had to use process-based server design (as
> > opposed to much more effective threads-based design) only because you had
> > to deal with other processes and building a server on fast inter-thread
> > interactions would not allow you to support external processes.
>

> There are other reasons:
>
> Not all UNIX systems have threads. Dependency on threads == less
> portability. This is the #1 reason.

Could you please list a Unix that is still alive and does not support
threads TODAY? I know only one - SCO OpenServer (SCO UnixWare is OK). We
support 18 platforms, with AS/400 (wow! ;-) to become the 19th this/next
week - and since we do support them, they all have threads.

So, this is not an issue TODAY. It was an issue, though, even in 1998. All
Unixes have gotten threads since then.


> Robustness. A single false pointer reference in a multi-threaded server
> can take down all the threads, unless the threading implementation offers
> a way to protect threads from each other.

You say that a bug is a bug. It's true. But the bugs have to be fixed. The
crashing bugs have to be fixed the same day. Actually, robustness issues
have been addressed in language development lately :-). We DO assume that
our software has bugs, and we do assume that we may have crashing bugs.
But the first option you see on the CGatePro setup page is "Crash
Recovery". It does work. You can connect to the server as an admin and
issue some special commands that perform a zero divide, NULL-pointer
addressing, etc. With crash recovery on, the server should survive and get
out of the problem gracefully - while producing Crash-level records in both
the CGatePro and system logs.

Note: do not try this on Linux and some other platforms where we have to
use gcc - free software is great, but we are still waiting (for more than
18 months) for gcc code/libs to be able to handle hardware exceptions, as
commercial compilers/libs do :-(.


> Leverage on kernel facilities. Separate threads in the same process can
> not be logged in as different UNIX uids.

a) this is needed ONLY if you need to support legacy applications - and we
have closed that topic - we are talking about the high-end servers now.

b) If a CGatePro user has a Unix account, and specifies an "automatic
rule" that executes an external program - that program is executed under
that user ID, in that user's home directory, etc.

So, that's not a point here.


> > > The main methods in c-client are:
> > > validate mailbox (is this driver right for this mailbox?)
> > > create mailbox (in this format)
> > > delete mailbox (format-specific considerations)
> > > rename mailbox (format-specific considerations)
> > > open mailbox
> >
> > Excuse me, but in any Objective Language these could not be "methods" of a
> > virtual mailbox object. These ops would be methods of mailbox "class"
> > method, etc, - but in no way the mailbox methods themselves.
>

> This is an unimportant distinction for this discussion; it's a distinction
> of various OOP languages but irrelevant here.

Absolutely irrelevant, sure. I was just curious, sorry.

> > BTW, what
> > language do you use for UW IMAPd? At least, not the create/open methods.
>

> C. Straight, unadorned, traditional C, as can be found on even the oldest
> UNIX systems. Half the systems that it's been ported to don't have a C++
> compiler.

At that time - yes. Again, all surviving systems have decent C++ compilers now.

> > Yes, but I do see the difference: since your design is completely
> > "light-linked", inter-process based - you have to have methods like
> > "check".
>

> No. "check" is a method because CHECK is an IMAP operation. It has
> nothing to do with how c-client is implemented.

Let me clarify: if you have an integrated design, then changes to a
mailbox view could be delivered to all views at the moment they happen: no
action is required to "read" those changes. If clients/agents using the
same mailbox have no means to communicate, they have to implement some
kind of "check" operation that involves mailbox i/o to get the changes.


> > In CGatePro design you did not see that call. Instead
> > you saw that each mailbox client uses a mailbox "view", not the mailbox
> > directly, and the mailbox view contains all info about mailbox updates.
>

> That only works as long as you only support mail stores which can be
> implemented within CGatePro. But as soon as you have external mail stores
> (e.g. MAPI, SQL, etc.), you can't have a view that automatically reflects
> current state.

I do see your point, but that's not exactly the case. If the mailbox store
is designed so that changes cannot be detected by the server, then you do
not have the "automatic view update". This has nothing to do with SQL: if
all access to SQL-based mailboxes happens via the server, then all will
work OK. On the other hand, if some "external" party is allowed to modify
the mailbox store "behind the server's back" then you get this problem, and
you get this problem even w/o SQL - for example, with any plain BSD
mailbox format that some legacy application accesses directly.

So, how can this problem be solved? The idea is simple: the Mailbox
Manager of that type knows best how and when to detect the changes. In the
SQL case, you can use some "modification notifiers" that some servers
support, in other cases other mechanisms can be used. In the WORST case,
the mailbox manager can execute the same type of "check" operation
periodically. So, only in the WORST case we get the same performance hit
as when using the "check" operation on the agent level, and even in that
case the overhead is less: the mailbox manager does that check for the
MAILBOX, not for each view, so if we have 2-3 sessions using that mailbox,
the overhead is 2-3 times less - even in the worst case.
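The worst-case arithmetic can be sketched as follows. This is a toy model, not CGatePro or c-client code: `Store`, `MailboxManager`, `open_view`, `poll` and `per_view_check` are hypothetical names, and a "view" is reduced to a plain list of messages.

```python
class Store:
    """Stand-in for a mailbox that an outside party may modify."""

    def __init__(self):
        self.messages = []
        self.reads = 0  # counts how often the store had to be re-read

    def read(self):
        self.reads += 1
        return list(self.messages)


class MailboxManager:
    """One manager per mailbox; it polls once for all open views."""

    def __init__(self, store):
        self.store = store
        self.views = []

    def open_view(self):
        view = []
        self.views.append(view)
        return view

    def poll(self):
        # A single store read, fanned out to every view of this mailbox.
        current = self.store.read()
        for view in self.views:
            view[:] = current


def per_view_check(store, views):
    """Agent-level "check": every session re-reads the store by itself."""
    for view in views:
        view[:] = store.read()
```

With three open sessions, one `poll()` costs one store read, while one round of `per_view_check()` costs three, which is the 2-3 times difference described above.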



> For example: consider an IMAP-proxy feature in CGatePro, where CGatePro is
> simultaneously an IMAP server and client to another server. If the
> CGatePro server gets a CHECK command, it wants to issue a CHECK command as
> IMAP client to the other server. It doesn't want to use its view, since
> its view may not be current.

The guy who designed the IMAP protocol (do you happen to know him?) was
smart enough to realize that the change in the mailbox state can happen at
any moment, thus the IMAP protocol explicitly specifies when the changes
can be reported. If I issue the CHECK command, and it returns nothing, it
does not mean that nothing in the mailbox changes before I issue the next
command, let's say "FETCH". So, the real question is - how current is the
mailbox state reported to the client.

Now, back to your question: the IMAP-type Mailbox manager has nothing to
do between the agent calls. So, it can issue the IDLE command and read all
the changes in the remote mailbox state asynchronously. As soon as a
change is reported, it updates all the views and continues to read the
modification reports or a command sent by the agent.

In this situation, there are no unnecessary I/O operations at all. Even if
a client sends CHECK every 10 seconds instead of using IDLE, the only
damage it does is unneeded TCP traffic between the client and the
front-end server. The back-end server does not get any command at all,
since the front-end has issued the IDLE command and sits and waits.
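A toy model of that front-end/back-end arrangement, to make the command counting concrete. It does not speak real IMAP: `BackEnd`, `FrontEnd`, `deliver` and `check` are hypothetical names, and the back end "pushes" changes through a plain callback where a real server would send untagged responses while the connection is idling.

```python
class BackEnd:
    """Stand-in for the remote IMAP server; records the commands it receives."""

    def __init__(self):
        self.commands = []
        self.listener = None

    def idle(self, listener):
        self.commands.append("IDLE")
        self.listener = listener

    def deliver(self, message):
        # A change on the back end is pushed to whoever is idling.
        if self.listener is not None:
            self.listener(message)


class FrontEnd:
    """Proxy that idles on the back end and answers CHECK from local state."""

    def __init__(self, backend):
        self.known = []      # everything the back end has pushed so far
        self.reported = 0    # how much of it the client has already seen
        backend.idle(self.known.append)

    def check(self):
        # No back-end command is issued here; only local state is consulted.
        fresh = self.known[self.reported:]
        self.reported = len(self.known)
        return fresh
```

However many times the client calls `check()`, `backend.commands` stays at `["IDLE"]`; the repeated CHECKs cost only client-to-front-end traffic.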

As you see, this design not only improves the performance in the most
common one-server situations, but suggests more effective designs for
multi-server configs, too.

> Since you don't want to special-case the IMAP-proxy, you just add the
> methods needed for it. That's what causes method bloat. But since
> c-client is not just an IMAP server library, but also a library for IMAP
> clients, this is a feature, not a bug.

Nobody said anything about the bugs. I'm just trying to explain the
differences in the designs and ideologies.


> With the exception of IMAP, most c-client drivers don't need most of the
> methods, and they use an internal view.

Good. But let me clarify one more thing here, since I'm afraid I did not
explain this well: in CGatePro a "view" is NOT a "mailbox". The mailbox object
is a representation of the real mailbox. There's only one "mailbox object"
for any open mailbox. A "view" is created for each agent (POP, IMAP,
WebMail) when a mailbox is opened. The first "open" (actually, "parse")
operation creates the mailbox object and the first "view", all other
"open"s for that mailbox just create additional "views". Generally
speaking, "views" are different. They represent how the "agent" sees that
mailbox. If a change is made to the mailbox by one agent, views are updated,
but then each agent handles them differently.
An IMAP agent in the IDLE state would immediately report the change to the
client and update its view to reflect the fact of client notification, an
IMAP agent in another state will not touch the view till it reaches the
point when it can report the changes to the client, a POP agent simply
ignores all the changes in the view and does not commit them, and the WebMail
agent has its own procedure of handling the modification reports in its
view. So, at any given moment "views" of the same mailbox can be
different.
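The one-mailbox, many-views arrangement might be sketched like this. It is an illustration of the idea only: the `Mailbox` and `View` classes, the policy strings and the `pending`/`reported` lists are assumptions made for the example, and the WebMail agent is left out.

```python
class View:
    """How one agent session sees the mailbox."""

    def __init__(self, name, policy):
        self.name = name
        self.policy = policy   # "imap-idle", "imap-busy" or "pop" (made-up labels)
        self.pending = []      # changes not yet shown to the client
        self.reported = []     # changes already sent to the client

    def notify(self, change):
        if self.policy == "pop":
            return             # a POP session ignores changes made by others
        self.pending.append(change)
        if self.policy == "imap-idle":
            self.flush()       # an idling IMAP client is told right away

    def flush(self):
        """Called when the agent reaches a point where it may report changes."""
        self.reported.extend(self.pending)
        self.pending.clear()


class Mailbox:
    """One object per open mailbox; one View per agent session."""

    def __init__(self):
        self.views = []

    def open(self, name, policy):
        view = View(name, policy)
        self.views.append(view)
        return view

    def change(self, what):
        for view in self.views:
            view.notify(what)
```

After a single `change()`, the idle view has already reported it, the busy IMAP view holds it in `pending` until `flush()` is called, and the POP view has recorded nothing, so the three views of the same mailbox differ at that moment.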


> -- Mark --
>
> * RCW 19.190 notice: This email address is located in Washington State. *
> * Unsolicited commercial email may be billed $500 per message. *
> Science does not emerge from voting, party politics, or public debate.

--

Mark Crispin

Mar 29, 2000
On Wed, 29 Mar 2000, Vladimir Butenko wrote:
> Lock file semantics is Unix design, not yours. BTW - that semantics varies
> on different types of Unix. But a decision to rely on lock files was your
> design, and let me repeat it again - THAT time it was - to the best of my
> knowledge - the only design that could be used. THAT time.

Until such time as I am proclaimed King of the World and can decree the
extermination of all sendmail, /bin/mail, /usr/ucb/Mail, mail.local, etc.
ad nauseam, .lock files are the reality on the vast majority of UNIX
systems. And if I tell someone that to install my IMAP server they must
change the locking semantics for all their other mail programs, they'll
say "Oh, that's nice. Good bye."

I don't hold out much hope for becoming KotW. That's probably a good
thing. So, I support the .lock files. They're usually unnecessary for
communication between two instances of my software, since I have other
locking concurrently that does a better job. But on anything except a
sealed server, they're essential.

Since I choose to produce a general server and not one that is specialized
for sealed applications, I need to worry about this. As far as I know, my
server is the only such general server in existence.

Mark Crispin

Mar 29, 2000
On Wed, 29 Mar 2000, Vladimir Butenko wrote:
> If I do not mind that any user on that system can do SELECT /etc/passwd, 1 FETCH 1
> RFC822.TEXT - then that will be all I pay. If I do not want any user to be
> able to read that file and any other file on my Unix server (I have not
> checked UW imapd lately, and probably Mark has already fixed that)

It's not a bug. It's a feature of the "timeshared system server"
model. You can SELECT /etc/passwd for the same reason that you can cat it
from the shell.

There are two ways to switch it off:

1) Create a sealed server, with a restricted namespace. That's all under
the control of the mailboxfile() function in env_unix.c. It's either
an easy one-line hack; or better to rewrite it entirely to implement
precisely what is desired.

Our sealed servers certainly do not allow access to /etc/passwd.

2) Disable the phile driver, which would have the effect of disabling all
IMAP access to non-mailbox files.

Either one suffices.
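For readers who have not seen one, the restricted namespace of option 1 amounts to a name-to-path mapping that refuses anything outside a per-user root. The sketch below shows the idea only: `resolve_mailbox` and the `/srv/mail/alice` root are hypothetical, POSIX paths are assumed, and this is not the mailboxfile() code from env_unix.c.

```python
import os


def resolve_mailbox(root, name):
    """Map a client-supplied mailbox name to a path under root, or return None."""
    # Absolute names and home-directory shortcuts are refused outright.
    if not name or name.startswith(("/", "~")):
        return None
    root = os.path.realpath(root)
    path = os.path.realpath(os.path.join(root, name))
    # After resolving ".." and symlinks the result must still sit below root.
    if path == root or os.path.commonpath([root, path]) != root:
        return None
    return path
```

Under this rule `resolve_mailbox("/srv/mail/alice", "/etc/passwd")` and `resolve_mailbox("/srv/mail/alice", "../bob/INBOX")` both return `None`, while `"INBOX"` and `"lists/imap"` resolve to paths inside the user's root.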

Yiorgos Adamopoulos

Mar 29, 2000
In article <Pine.NXT.4.30.00032...@Tomobiki-Cho.CAC.Washington.EDU>, Mark Crispin wrote:
>2) Disable the phile driver, which would have the effect of disabling all
> IMAP access to non-mailbox files.

Can't this be the default behavior?

--
ad...@ieee.org
