In this issue:
Re: Email push and pull (was Re: matthew dillon)
Re: Email push and pull (was Re: matthew dillon)
Re: Email push and pull (was Re: matthew dillon)
Re: Email push and pull (was Re: matthew dillon)
Re: Email push and pull (was Re: matthew dillon)
Re: Email push and pull (was Re: matthew dillon)
----------------------------------------------------------------------
Date: Fri, 14 Feb 2003 01:53:06 +0100
From: Brad Knowles <brad.k...@skynet.be>
Subject: Re: Email push and pull (was Re: matthew dillon)
At 7:14 AM -0800 2003/02/13, Terry Lambert wrote:
>> Okay, what parts of the problem doesn't Perdition solve?
>
> Replication and failover.
True. But is the POP3/IMAP4 proxy really the best place to try
to solve this problem?
> The back-end stores that the Perdition proxy accesses have to have
> the content locally available.
Yup, that's a back-end issue, not one that Perdition can solve.
> The result is that you provide a unified view onto a backend farm,
> but you lack replication and failover in the back-end, and it does
> not magically appear, merely because you are running Perdition.
Fair enough. But how does this relate to the domain problem?
That's all you had mentioned previously.
> There are other POP3 and IMAP4 proxies that can do the same things
> Perdition can: it's no big deal.
I've done some research in this area. I'd be interested to know
which ones you're talking about.
> In fact, it doesn't deal with
> LDAP, which is probably where the routing to the back end store will
> occur.
Do I really need to quote the relevant sections of
perdition/db/ldap/perdition.schema, dated Mar 27, 2002?
- --
Brad Knowles, <brad.k...@skynet.be>
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
------------------------------
Date: Fri, 14 Feb 2003 03:44:16 +0100
From: Brad Knowles <brad.k...@skynet.be>
Subject: Re: Email push and pull (was Re: matthew dillon)
At 8:09 AM -0800 2003/02/13, Terry Lambert wrote:
> OK, then why do you keep talking about I/O throughput? Do you
> mean *network I/O*? Why the hell would you care about disk I/O
> on a properly designed message store, when the bottleneck is
> going to first be network I/O, followed closely by bus bandwidth?
Disk I/O is many orders of magnitude slower than any other thing
on the system. Moreover, disk I/O suffers from issues with
synchronous meta-data updates where entire directories must be locked
for the entire period of time during which an update is occurring,
thus reducing by many more orders of magnitude the number of small
operations (e.g., file creation and deletion, renaming, updating of
other file attributes, etc...) that we can perform in a given unit of
time.
This is an issue for MTAs, and is an issue for message stores,
especially when the message stores use a meta-data intensive storage
mechanism such as found in Maildir and Cyrus (to a lesser degree).
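The metadata cost can be made concrete with a small sketch (illustrative operation counts only, not a benchmark; a careful MTA would also fsync the directory entry, which is omitted here) contrasting one-file-per-message delivery, Maildir-style, with appending to a single mbox-style file:

```python
import os
import tempfile

def deliver_maildir_style(root, messages):
    """One file per message: each delivery costs a file creation plus
    an fsync -- a metadata update per message."""
    ops = 0
    for i, body in enumerate(messages):
        path = os.path.join(root, "msg.%d" % i)
        with open(path, "wb") as f:
            f.write(body)
            os.fsync(f.fileno())    # force the file data out
        ops += 2                    # create + fsync per message
    return ops

def deliver_mbox_style(path, messages):
    """All messages appended to one file: a single creation total,
    then one write+fsync per delivery -- far fewer metadata updates."""
    ops = 1                         # the single file creation
    with open(path, "wb") as f:
        for body in messages:
            f.write(b"From -\n" + body + b"\n")
            f.flush()
            os.fsync(f.fileno())
            ops += 1
    return ops

msgs = [b"hello %d" % i for i in range(50)]
with tempfile.TemporaryDirectory() as d:
    maildir_ops = deliver_maildir_style(d, msgs)
    mbox_ops = deliver_mbox_style(os.path.join(d, "mbox"), msgs)
print(maildir_ops, mbox_ops)
```

The point is only the ratio: per-message files roughly double the number of synchronous operations the file system must order, which is where the throughput goes.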
> So what's the difference between not enforcing a quota, and ending
> up with the email sitting on your disks in a user maildrop, or
> enforcing a quota, and ending up with the email sitting on your
> disks in an MTA queue?
In "free" systems, quotas are frequently set ridiculously low.
In systems with a sustainable business model, you pay for the storage
you use. If you want a higher quota, you pay for it (one way or
another). In those situations, quotas rarely need to be enforced,
and this problem is not one that is faced very often. In the case
where you do have this issue, at the very least you can hold the
message in the queue for a while, in the hope that the user will come
and clean out their mailbox.
> Quotas are actually a strong argument for single image storage.
SIS increases SPOFs, reduces reliability, increases complexity,
increases the probability of hot-spots and other forms of contention,
and all for very little possible benefit.
> Obviously, unless setting the quota low on purpose is your revenue
> model (HotMail, Yahoo Mail).
As I said above, "free" systems frequently set quotas
ridiculously low. They are not of interest for this discussion.
> How? It's going to sit on your disks, no matter what, the only
> choice you really have on it is *which* disk it's going to sit on.
True, but it's easier for me to deal with multiple gigabytes of
DOS crap in the mail queue than it is for the user to try to deal
with multiple gigabytes of crap in their mailbox. There are things
that they need to be protected from, because they don't have the
access or the power on their end. If they did, they wouldn't need us.
>> If 95-99% of all users never even notice that there is a quota,
>> then I've solved the part of the problem that is feasible to solve.
>> The remainder cannot possibly be solved with any quota at any level,
>> and these users need to be dealt with separately.
>
> Again, how?
Outside of the DOS problem, they need education and proper
management of their expectations. TANSTAAFL.
> Flood fill will only work as part of an individual infrastructure,
> not as part of a shared infrastructure, if what you are trying to
> sell is to be any different from what everyone else is giving away
> for free.
Ahh, something akin to the Yasushi model. See
<http://www.shub-internet.org/brad/papers/dihses/lisa2000/sld038.htm>.
When restricted to the network internal to the mail system,
replicating the mailbox over multiple servers is not a bad idea,
although I don't think it matters so much what replication model you
use.
>> If you store them on the recipient system, you have what exists
>> today for e-mail. Of the three, this is the only one that has proved
>> sustainable (so far) and sufficiently reliable.
>
> This argument is flawed. Messages are not stored on recipient
> systems, they are stored on the systems of the ISP that the
> recipient subscribes to.
That's what I was calling the "recipient system". It is the
system where the message was received.
> Yet those same guarantees are specifically disclaimed by HotMail
> and other "free" providers, even though there is no technological
> difference between a POP3 maildrop hosted at EarthLink and accessed
> via a mail client, and a POP3/IMAP4 maildrop hosted at HotMail and
> accessed via a mail client.
Again, you're referencing situations that I consider to be
irrelevant to the discussion. I don't give a flying flip about the
poor business model they employ. I care about real systems that are
paid for by real people and real companies.
> Who the hell uses IDE on servers?!? Get real! You can't detach an
> IDE drive during the data transfer on a write, so tagged command
> queueing only works for *reading* data. For a server that does writes,
> you use *SCSI* (or something else, but *not* IDE).
Okay, so two 15kRPM SCSI hard drives, or FibreChannel. The type
of interface doesn't matter when you're talking about a number of
disks that is grossly inadequate to the task.
> I think I see the misunderstanding here. You think IDE disks are
> server parts. 8-).
No, not at all. I think that focusing on disk storage capacity
and not paying attention to disk I/O latency and I/O capacity is pure
folly.
> Use SCSI, or divide the load between a number of IDE spindles
> equal to the tagged command queue depth for a single SCSI drive
> (hmmm... should I buy five SCSI drives, or should I buy 500 IDE
> drives?).
See above. Regardless of the drive interface technology, what's
important is the I/O latency and the I/O capacity.
> It gets rid of the quota problem.
No, not at all. You eliminate damn few duplicate messages, you
greatly increase system complexity, you increase SPOFs, you increase
system hot-spots, you reduce system reliability (and replication,
something which you seem to be so fond of), and all for very, very
little benefit.
Try taking a real-world mail server and processing the logs.
Count the number of recipients per message and see just how much
space you'd actually save. I did that, and included my numbers in
the previous message -- an average of ~1.3 recipients per message.
You want to do all this for about 30% savings?!?
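The arithmetic behind that figure can be sketched as follows (synthetic recipient counts standing in for parsed MTA logs; with an average of r recipients per message, the unshared store keeps r copies where SIS keeps one, so at r = 1.3 the unshared store is 30% larger, equivalently SIS saves about 23% of the unshared total):

```python
# Synthetic per-message recipient counts, averaging 1.3 per message.
recipient_counts = [1] * 7 + [2] * 3

avg = sum(recipient_counts) / len(recipient_counts)

copies_without_sis = sum(recipient_counts)   # one copy per recipient
copies_with_sis = len(recipient_counts)      # one copy per message

extra = copies_without_sis / copies_with_sis - 1   # overhead vs. SIS
saved = 1 - copies_with_sis / copies_without_sis   # savings vs. unshared

print(avg, extra, saved)
```

Either way it is a modest fraction of total storage, which is the point being argued above.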
> Heck, you could even store your indices on a SCSI drive, and then
> store your SIS on an IDE drive, if you wanted.
See above. This is pointless.
> Mark's wrong. His assumptions are incorrect, and based on the
> idea that metadata updates are not synchronous in all systems.
Meta-data updates are at least partially synchronous on all
systems I know of. Well, unless you are running with asynchronous
mounts, but if you're doing that then you shouldn't be running a mail
system until you understand why that's a bad idea.
Even if they're not synchronous, they're still bottlenecks to be
avoided if possible.
> Cyrus is much closer to commercial usability, but it has its own
> set of problems, too.
It is somewhat closer. If you want real commercial usability,
you have to start with the MessagingDirect code, which is based on
Cyrus but with lots of bug fixes, increased reliability and
robustness, etc.... Then you graduate to Sendmail Advanced Message
Server, which takes that to the next level.
>> Either way, locking is a very important issue that has to be
>> solved, one way or the other.
>
> No, it's a very important issue that has to be designed around,
> rather than implemented.
Somebody said that when they invented Maildir. I didn't believe
it then, and I don't believe it now.
> Yes, and no. It's very easy to paint a rosy picture in a technical
> paper, particularly when you are in a position to need to obtain
> funding.
Nick didn't need any funding. He was describing a project that
was largely complete, and which he had already left by that time. He
definitely made use of that design at various customer sites while
working for Sendmail, but he couldn't possibly have known that at the
time.
> You are unlikely to ever find someone using NFS in this capacity,
> except as a back end for a single server message store.
Show me an IMAP server that actually implements SIS. I don't know of any.
> The point was that, without making changes requiring an in depth
> understanding of the code of the components involved, which Nick's
> solution doesn't really demonstrate, you're never going to get more
> than "marginally better" numbers.
Could be. In that case, we may have to find an alternative
message store solution. If I can prove that this really is a
problem, then I'll try to help them find a suitable SAN solution and
then drop in SAMS. If not, I may end up writing a paper or doing
another invited talk.
> It works on NFS. You just have to run the delivery agent on the
> same machine that's running the access agent, and not try to mix
> multiple hosts accessing the same data.
Nope. mmap on NFS doesn't work.
> I understand you want a distributed, replicated message store, or
> at least the appearance of one, but in order to get that, well,
> you have to "write a distributed, replicated message store".
A distributed, replicated message store would be nice, but is not
strictly a requirement of this solution. One thing that was
originally given as an absolute requirement was to find a way to put
an e-mail front end on NFS. The distributed, replicated message
store was a side-effect.
Indeed, the architecture already has a concept of a primary
server for a particular mailbox (as determined by LDAP), the only
thing we'd have to change is whether or not that mailbox was also
accessible from the other servers. However, we do have only one
message store mount point at the moment.
> The part of Netscape that Sun bought used to provide an IMAP4
> server (based on heavily modified UW IMAP code). Is there a
> reason you can't use that? I guess the answer must be "I have
> been directed to use Open Source". 8-).
Actually, no. They would much prefer commercial software.
However, they don't have any money to spend on software, and I know
from personal experience that the Netscape/iPlanet stuff doesn't
scale. Indeed, we're already in the process of scrapping all other
Netscape/iPlanet software because we've had excessive problems with
it.
> This should be no problem. You should be able to handle this
> with a single machine, IMO, without worrying about locking, at
> all.
Remember, Maildir doesn't do locking.
> 10,000 client machines is nothing.
10,000 LAN clients? With 44MB messages and 200MB mailboxes? On
NFS? Sorry, my testing so far indicates that this is a significant
load and we need to take care to make sure that it is handled
properly.
> At worst, you should
> separate inbound and outbound SMTP servers,
Already planned.
> so you can treat the
> inbound one as a bastion host, and keep the outbound entirely
> inside, and the inbound server should use a transport protocol
> for internal delivery to the machine running the IMAP4 server,
> which makes locking go away.
How does locking go away? Through Maildir? Or did you have
something else in mind?
> At worst, you can limit the number
> of bastion to internal server connections, which will make things
> queue up at the bastion, if you get a large activity burst, and
> let it drain out to the internal server, over time.
I'm not worried about internal SMTP connections. But we have to
be careful to make sure we don't put any additional limits on POP3 or
IMAP connections.
> At most,
> you are well under 40,000 simultaneous TCP connections to the
> IMAP4 server host, even if you are using OutLook, people have
> two mailboxes open, each, and are monitoring incoming mail in
> several folders.
Sorry, I am still not convinced.
- --
Brad Knowles, <brad.k...@skynet.be>
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.
------------------------------
Date: Thu, 13 Feb 2003 23:09:06 -0800
From: Terry Lambert <tlam...@mindspring.com>
Subject: Re: Email push and pull (was Re: matthew dillon)
Brad Knowles wrote:
> At 7:14 AM -0800 2003/02/13, Terry Lambert wrote:
> >> Okay, what parts of the problem doesn't Perdition solve?
> >
> > Replication and failover.
>
> True. But is the POP3/IMAP4 proxy really the best place to try
> to solve this problem?
No... but does proxy really solve anything, then, more than
a DNS rotor solves? All it really does is add a single point
of failure. Unless you can target a subset of back end content
servers, you might as well use DNS round-robin. Using a proxy
implies the back end replica problem is *already* solved.
> > The result is that you provide a unified view onto a backend farm,
> > but you lack replication and failover in the back-end, and it does
> > not magically appear, merely because you are running Perdition.
>
> Fair enough. But how does this relate to the domain problem?
> That's all you had mentioned previously.
A proxy server doesn't solve the domain problem; Perdition was
*your* answer to the domain problem. 8-).
> > There are other POP3 and IMAP4 proxies that can do the same things
> > Perdition can: it's no big deal.
>
> I've done some research in this area. I'd be interested to know
> which ones you're talking about.
The Cyrus one seems OK. Personally, I'd never use a proxy for
this, except to front-end the authentication. Even then, it's
somewhat of a tossup as to whether it really has any utility,
unless it's capable of targeting a subset of the back end (in
other words, it has a priori knowledge of where the replica
lives; maybe it does LDAP lookups to select a backend server to
point the client to). At that point, you are better taking the
LARD/CARD approach, and adding "referral" to the IMAP4 protocol,
and just handling it at the server level as a peering relationship,
so the reason you'd do it is to avoid modifying client programs.
> > In fact, it doesn't deal with
> > LDAP, which is probably where the routing to the back end store will
> > occur.
>
> Do I really need to quote the relevant sections of
> perdition/db/ldap/perdition.schema, dated Mar 27, 2002?
Maybe I should say "doesn't deal with LDAP the way it should"
instead?
- -- Terry
------------------------------
Date: Fri, 14 Feb 2003 01:40:53 -0800
From: Terry Lambert <tlam...@mindspring.com>
Subject: Re: Email push and pull (was Re: matthew dillon)
Brad Knowles wrote:
> At 8:09 AM -0800 2003/02/13, Terry Lambert wrote:
> > OK, then why do you keep talking about I/O throughput? Do you
> > mean *network I/O*? Why the hell would you care about disk I/O
> > on a properly designed message store, when the bottleneck is
> > going to first be network I/O, followed closely by bus bandwidth?
>
> Disk I/O is many orders of magnitude slower than any other thing
> on the system.
If you can saturate 50% of a PCI bus, and the other half of that
goes to networking, you are set, as far as disk I/O speed. If you
are using an NFS server (which you are), then it's based on your
ability to saturate your network device.
> Moreover, disk I/O suffers from issues with synchronous meta-data
> updates where entire directories must be locked for the entire
> period of time during which an update is occurring, thus reducing
> by many more orders of magnitude the number of small operations
> (e.g., file creation and deletion, renaming, updating of other file
> attributes, etc...) that we can perform in a given unit of time.
Disagree. These locking issues are an artifact of the system
design (FS, application, or both).
> This is an issue for MTAs, and is an issue for message stores,
> especially when the message stores use a meta-data intensive storage
> mechanism such as found in Maildir and Cyrus (to a lesser degree).
Simple answer: Don't use a metadata intensive storage mechanism.
> > So what's the difference between not enforcing a quota, and ending
> > up with the email sitting on your disks in a user maildrop, or
> > enforcing a quota, and ending up with the email sitting on your
> > disks in an MTA queue?
[ ... bogus business model ... ]
> In the case
> where you do have this issue, at the very least you can hold the
> message in the queue for a while, in the hope that the user will come
> and clean out their mailbox.
In other words, the message takes up your disk space, no matter
what.
> > Quotas are actually a strong argument for single image storage.
>
> SIS increases SPOFs, reduces reliability, increases complexity,
> increases the probability of hot-spots and other forms of contention,
> and all for very little possible benefit.
The only one of these I agree with is that it increases complexity.
> > Obviously, unless setting the quota low on purpose is your revenue
> > model (HotMail, Yahoo Mail).
>
> As I said above, "free" systems frequently set quotas
> ridiculously low. They are not of interest for this discussion.
This discussion *started* because there was a set of list floods,
and someone made a stupid remark about an important researcher
indicating he was cancelling his subscription to the -hackers
mailing list over it, and I pointed out to the person belittling
the important researcher that such flooding has consequences that
depend on the mail transport technology over and above "just having
to delete a bunch of identical email".
> > How? It's going to sit on your disks, no matter what, the only
> > choice you really have on it is *which* disk it's going to sit on.
>
> True, but it's easier for me to deal with multiple gigabyes of
> DOS crap in the mail queue than it is for the user to try to deal
> with multiple gigabytes of crap in their mailbox. There are things
> that they need to be protected from, because they don't have the
> access or the power on their end. If they did, they wouldn't need us.
They need the middlemen because there are antidisintermediation
strategies in use on most leaf node connections to the Internet,
not because the middlemen have some inherent value that can be
obtained no other way. 8-|.
As far as "dealing with DOS", in for a penny, in for a pound: if
you are willing to burn CPU cycles, then implement Sieve or some
other technology to permit server-side filtering.
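The flavor of server-side filtering being suggested can be sketched like so (a toy rule shape invented for illustration; real Sieve has its own syntax and is defined in its own RFC, and this stands in for it only conceptually):

```python
# A toy server-side filter in the spirit of Sieve: decide, at delivery
# time, whether a message reaches the maildrop at all, so the user
# never has to download the junk. Rules are (test, action) pairs.
def sieve_like(rules, message):
    for test, action in rules:
        if test(message):
            return action
    return "keep"           # implicit default: deliver normally

rules = [
    # Hypothetical flood source: discard before it hits the mailbox.
    (lambda m: m["from"].endswith("@flood.example"), "discard"),
    (lambda m: "urgent" in m["subject"].lower(), "keep"),
]

spam = {"from": "bot@flood.example", "subject": "hi"}
ham = {"from": "friend@example.com", "subject": "Urgent: lunch?"}
print(sieve_like(rules, spam), sieve_like(rules, ham))
```

The CPU cost Terry mentions is exactly this: every rule runs against every inbound message, on the ISP's hardware.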
In reality, we both know that at some point it becomes too
computationally expensive to deal with this sort of thing on
the ISP side of things, and that there's an impedance mismatch
in the transport mechanism vs. the bandwidth reduction point.
That's exactly the niche that "value added email services"
attempt to exploit (fee for compute resources on the fat side
of the pipe).
We also know that, for most DOS cases on maildrops, the user
simply loses, and that's that.
> >> If 95-99% of all users never even notice that there is a quota,
> >> then I've solved the part of the problem that is feasible to solve.
> >> The remainder cannot possibly be solved with any quota at any level,
> >> and these users need to be dealt with separately.
> >
> > Again, how?
>
> Outside of the DOS problem, they need education and proper
> management of their expectations. TANSTAAFL.
Let's quit talking about the free services. Outside of funneling
idiots into the Microsoft Passport or competing Yahoo "single
signon" mechanisms, the free mail services are loss-leaders. The
business model is simply unsustainable. Neither service offers
that they will *guarantee* commitment to stable storage before
sending the "250 OK" response and taking ultimate responsibility
that the message will not be lost/dropped/dumped/hacked prior to
final delivery at *any* level of payment.
So let's limit ourselves to the realm of LWCYM - "Lunches Which
Cost You Money".
> > Flood fill will only work as part of an individual infrastructure,
> not as part of a shared infrastructure, if what you are trying to
> > sell is to be any different from what everyone else is giving away
> > for free.
>
> Ahh, something akin to the Yasushi model. See
> <http://www.shub-internet.org/brad/papers/dihses/lisa2000/sld038.htm>.
>
> When restricted to the network internal to the mail system,
> replicating the mailbox over multiple servers is not a bad idea,
> although I don't think it matters so much what replication model you
> use.
The replication model is actually a pretty profound issue. Prior
to replication, if you connect to one of the replicas, the message
can be seen as "in transit". Post deletion on an original prior to
the replication, and the deletion can be seen as "in transit". The
worst case failure modes are that a message has increased apparent
delivery latency, or the message "comes back" after it's deleted.
Both of these are acceptable, in terms of failure modes, particularly
if you compare them to the alternatives.
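Both failure modes can be shown with a toy in-memory model (two replicas plus an operation log standing in for whatever replication transport is used; this is a sketch of the semantics, not a real store):

```python
# Two mailbox replicas with asynchronous replication via a shared
# operation log that a lagging replica applies when it catches up.
class Replica:
    def __init__(self):
        self.msgs = set()

a, b = Replica(), Replica()
oplog = []

def deliver(replica, msg):
    replica.msgs.add(msg)
    oplog.append(("add", msg))

def delete(replica, msg):
    replica.msgs.discard(msg)
    oplog.append(("del", msg))

def replicate(dst):
    # Apply the pending operation log to a lagging replica.
    for op, msg in oplog:
        if op == "add":
            dst.msgs.add(msg)
        else:
            dst.msgs.discard(msg)

deliver(a, "msg1")
in_transit = "msg1" not in b.msgs  # replica lags: higher apparent latency
replicate(b)                       # replication catches up
delete(a, "msg1")                  # user deletes on the original...
comes_back = "msg1" in b.msgs      # ...the replica still shows it
replicate(b)                       # once the deletion propagates, it's gone
gone = "msg1" not in b.msgs
print(in_transit, comes_back, gone)
```

Delayed visibility and a briefly resurrected message are the worst cases, and both resolve themselves once replication catches up.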
> >> If you store them on the recipient system, you have what exists
> >> today for e-mail. Of the three, this is the only one that has proved
> >> sustainable (so far) and sufficiently reliable.
> >
> > This argument is flawed. Messages are not stored on recipient
> > systems, they are stored on the systems of the ISP that the
> > recipient subscribes to.
>
> That's what I was calling the "recipient system". It is the
> system where the message was received.
This is not useful to talk about in terms of a POP3 maildrop.
To all intents and purposes, messages in a POP3 maildrop are
"in transit on a point to point mail transport". That's really
the whole point of acknowledging a "pull" technology exists, in
the first place.
> > Yet those same guarantees are specifically disclaimed by HotMail
> > and other "free" providers, even though there is no technological
> > difference between a POP3 maildrop hosted at EarthLink and accessed
> > via a mail client, and a POP3/IMAP4 maildrop hosted at HotMail and
> > accessed via a mail client.
>
> Again, you're referencing situations that I consider to be
> irrelevant to the discussion. I don't give a flying flip about the
> poor business model they employ. I care about real systems that are
> paid for by real people and real companies.
Good, then we are in agreement that we will not reference things
like quotas and so on, which are artifacts of their business model,
and not things which actually save anyone total disk space. 8-).
[ ... ]
> > I think I see the misunderstanding here. You think IDE disks are
> > server parts. 8-).
>
> No, not at all. I think that focusing on disk storage capacity
> and not paying attention to disk I/O latency and I/O capacity is pure
> folly.
The majority of that latency is an artifact of the FS technology,
not an artifact of the disk technology, except as it impacts the
ability of the FS technology to be implemented without stall
barriers (e.g. IDE write data transfers not permitting disconnect
ruin your whole day).
> > It gets rid of the quota problem.
>
> No, not at all. You eliminate damn few duplicate messages, you
> greatly increase system complexity, you increase SPOFs, you increase
> system hot-spots, you reduce system reliability (and replication,
> something which you seem to be so fond of), and all for very, very
> little benefit.
Unless I can use someone else's stored copy of the message to
recover my corrupted stored copy of the message, that's not
replication, it's duplication.
The reason I brought up SIS again is that you seemed more than
willing to let a message sit in the main mail queue, but almost
panicked at the idea of throwing it into the user's mailbox instead.
The only legitimate reason for such a panic is if you felt that
moving it into the user's mailbox would result in amplification
of the disk space being used. Otherwise, you've already accepted
responsibility for delivery of the message, and deleting it out
of the mail queue is not really an option.
> Try taking a real-world mail server and processing the logs.
> Count the number of recipients per message and see just how much
> space you'd actually save. I did that, and included my numbers in
> the previous message -- an average of ~1.3 recipients per message.
>
> You want to do all this for about 30% savings?!?
Nope; I want to do it to get you to agree to turn off quotas,
if your business model is not based on the idea that it's OK
to drop email into /dev/null for customers who don't pay you
more money.
> > Mark's wrong. His assumptions are incorrect, and based on the
> > idea that metadata updates are not synchronous in all systems.
>
> Meta-data updates are at least partially synchronous on all
> systems I know of. Well, unless you are running with asynchronous
> mounts, but if you're doing that then you shouldn't be running a mail
> system until you understand why that's a bad idea.
>
> Even if they're not synchronous, they're still bottlenecks to be
> avoided if possible.
FS design issue. And metadata updates in FreeBSD (with soft
updates) or SVR4.2 or Solaris (with delayed ordered writes) are
*NOT* synchronous, they are merely ordered.
> > Cyrus is much closer to commercial usability, but it has its own
> > set of problems, too.
>
> It is somewhat closer. If you want real commercial usability,
> you have to start with the MessagingDirect code, which is based on
> Cyrus but with lots of bug fixes, increased reliability and
> robustness, etc.... Then you graduate to Sendmail Advanced Message
> Server, which takes that to the next level.
You limited my options to Open Source, however.
> >> Either way, locking is a very important issue that has to be
> >> solved, one way or the other.
> >
> > No, it's a very important issue that has to be designed around,
> > rather than implemented.
>
> Somebody said that when they invented Maildir. I didn't believe
> it then, and I don't believe it now.
Maildir is a kludge around NFS locking. Nothing more, and nothing
less.
> > You are unlikely to ever find someone using NFS in this capacity,
> > except as a back end for a single server message store.
>
> Show me an IMAP server that actually implements SIS. I don't know of any.
MS Exchange does, and so does Lotus Notes. I know they suck, but
they are examples.
In the Open Source world, you're not going to find one: another
problem that Open Source has is an inability to tackle problems
above a certain level of complexity.
> > The point was that, without making changes requiring an in depth
> > understanding of the code of the components involved, which Nick's
> > solution doesn't really demonstrate, you're never going to get more
> > than "marginally better" numbers.
>
> Could be. In that case, we may have to find an alternative
> message store solution. If I can prove that this really is a
> problem, then I'll try to help them find a suitable SAN solution and
> then drop in SAMS. If not, I may end up writing a paper or doing
> another invited talk.
8-).
> > It works on NFS. You just have to run the delivery agent on the
> > same machine that's running the access agent, and not try to mix
> > multiple hosts accessing the same data.
>
> Nope. mmap on NFS doesn't work.
Who's using mmap?!?
[ ... ]
> > The part of Netscape that Sun bought used to provide an IMAP4
> > server (based on heavily modified UW IMAP code). Is there a
> > reason you can't use that? I guess the answer must be "I have
> > been directed to use Open Source". 8-).
>
> Actually, no. They would much prefer commercial software.
> However, they don't have any money to spend on software, and I know
> from personal experience that the Netscape/iPlanet stuff doesn't
> scale. Indeed, we're already in the process of scrapping all other
> Netscape/iPlanet software because we've had excessive problems with
> it.
This is interesting to know; from the documentation available,
they imply they scale, and a single instance of one seems to
match their claims for a single instance. I guess it's always
worse than the marketing literature, when you deploy it. 8-(.
> > This should be no problem. You should be able to handle this
> > with a single machine, IMO, without worrying about locking, at
> > all.
>
> Remember, Maildir doesn't do locking.
>
> > 10,000 client machines is nothing.
>
> 10,000 LAN clients? With 44MB messages and 200MB mailboxes? On
> NFS? Sorry, my testing so far indicates that this is a significant
> load and we need to take care to make sure that it is handled
> properly.
40 seconds to transfer on a Gigabit ethernet... assuming you can get
it off the disks. 8-). Do you really expect them all simultaneously?
> > so you can treat the
> > inbound one as a bastion host, and keep the outbound entirely
> > inside, and the inbound server should use a transport protocol
> > for internal delivery to the machine running the IMAP4 server,
> > which makes locking go away.
>
> How does locking go away? Through Maildir? Or did you have
> something else in mind?
You don't need to assert a lock over NFS, if the only machine doing
the reading is the one doing the writing, and it asserts the lock
locally (this was more talking about the Cyrus cache files, not
maildir).
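The pattern being described is: if the only machine reading the mailbox is the machine writing it, a plain local advisory lock suffices and nothing need be asserted over NFS. A minimal sketch, assuming a Unix-like system where flock() is available (the two opens simulate the delivery agent and the access agent on the same host):

```python
import fcntl
import os
import tempfile

# A lock file standing in for the per-mailbox lock.
path = os.path.join(tempfile.mkdtemp(), "mailbox.lock")
open(path, "w").close()

writer = open(path, "r+")
fcntl.flock(writer, fcntl.LOCK_EX)      # "delivery agent" holds the lock

reader = open(path, "r+")               # a second open file description
try:
    fcntl.flock(reader, fcntl.LOCK_EX | fcntl.LOCK_NB)
    contended = False
except BlockingIOError:
    contended = True                    # "access agent" must wait its turn

fcntl.flock(writer, fcntl.LOCK_UN)      # release; the reader may proceed
print(contended)
```

Since both agents run on the same kernel, the lock is arbitrated locally and correctly, which is exactly what NFS advisory locking historically failed to guarantee.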
> > At worst, you can limit the number
> > of bastion to internal server connections, which will make things
> > queue up at the bastion, if you get a large activity burst, and
> > let it drain out to the internal server, over time.
>
> I'm not worried about internal SMTP connections. But we have to
> be careful to make sure we don't put any additional limits on POP3 or
> IMAP connections.
I was talking about machine capacity for connections. POP3 is one
at a time, IMAP4 is (usually, worst case) 4 per client.
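The worst-case figure is back-of-the-envelope arithmetic (a sketch of the reasoning, not a measurement; the 4-connections-per-client number is the assumption stated above):

```python
clients = 10_000

pop3_conns_per_client = 1   # POP3: one sequential session at a time
imap_conns_per_client = 4   # assumed worst case: two open mailboxes
                            # plus folders being monitored

pop3_worst = clients * pop3_conns_per_client
imap_worst = clients * imap_conns_per_client
print(pop3_worst, imap_worst)
```

Even the IMAP4 ceiling of 40,000 simultaneous TCP connections is within reach of a single well-configured server host of that era, which is the claim being made.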
- -- Terry
------------------------------
Date: Fri, 14 Feb 2003 14:03:47 +0100
From: Brad Knowles <brad.k...@skynet.be>
Subject: Re: Email push and pull (was Re: matthew dillon)
At 11:09 PM -0800 2003/02/13, Terry Lambert wrote:
> No... but does proxy really solve anything, then, more than
> a DNS rotor solves? All it really does is add a single point
> of failure. Unless you can target a subset of back end content
> servers, you might as well use DNS round-robin. Using a proxy
> implies the back end replica problem is *already* solved.
Yes, the proxy does solve the domain problem. The user logs in
with "user@domain", the proxy looks this up in the LDAP database,
which then tells it which back-end server to contact. You can
decide, on a user-by-user basis, which back-end server they will be
using for their mail. If one back-end server gets overloaded, you
can choose individual users to shift off to another machine.
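The routing step described above is simple to sketch. Here a plain
in-memory table stands in for the LDAP lookup (cf. perdition's
perdition.schema); all host and user names are hypothetical.

```python
# Sketch of the proxy's per-user routing decision: "user@domain" is
# looked up (here in a dict standing in for LDAP) to choose a back-end
# store. Moving a user is just updating their directory entry.

ROUTING_TABLE = {
    ("alice", "example.be"): "backend1.internal",
    ("bob", "example.be"): "backend2.internal",
}
DEFAULT_BACKEND = "backend1.internal"

def route_login(login):
    """Split 'user@domain' and return the back-end server for that user."""
    user, _, domain = login.partition("@")
    return ROUTING_TABLE.get((user, domain), DEFAULT_BACKEND)
```

Shifting an overloaded user to another machine is then a directory
update plus a mailbox move, with no client-visible change.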
Besides, you don't use just one front-end proxy. You use them in
sets of at least two, and you drop L3/L4 load balancing switches in
front of them, and the L3/L4 switches get DNS round-robin. The
switches handle balancing the connection load, and the proxy+database
handles the balancing of user mailboxes over the set of potentially
asymmetric back-end servers.
The issue of replication is a totally different matter.
> Maybe I should say "doesn't deal with LDAP the way it should"
> instead?
In what way?
- --
Brad Knowles, <brad.k...@skynet.be>
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
------------------------------
Date: Fri, 14 Feb 2003 15:58:09 +0100
From: Brad Knowles <brad.k...@skynet.be>
Subject: Re: Email push and pull (was Re: matthew dillon)
At 1:40 AM -0800 2003/02/14, Terry Lambert wrote:
> If you
> are using an NFS server (which you are), then it's based on your
> ability to saturate your network device.
You're still limited by disk devices that may be used temporarily
on the local server, as well as the disk devices on the other end of
that network connection. Putting them on the network does not
magically solve the problem that disk I/O is still many orders of
magnitude slower than any other thing we ever do on computer systems.
> Disagree. These locking issues are an artifact of the system
> design (FS, application, or both).
And you have magically solved all these problems in what way?
> Simple answer: Don't use a metadata intensive storage mechanism.
So, use what -- a pure memory-based file system for hundreds of
gigabytes or even multiple terabytes of storage? Even that will
still have synchronous meta-data update issues with regards to the
in-memory directory structure, even if those operations do take place
much faster.
> In other words, the message takes up your disk space, no matter
> what.
In other words, I can protect the entire system from being taken
down by a concerted DOS attack on a single user. They're going to
have to work harder than that if they want to take down my entire
system.
>> SIS increases SPOFs, reduces reliability, increases complexity,
>> increases the probability of hot-spots and other forms of contention,
>> and all for very little possible benefit.
>
> The only one of these I agree with is that it increases complexity.
In what way does SIS *not* increase SPOFs, reduce reliability,
increase the probability of hot-spots and other forms of contention,
and in what way does it magically solve all the storage problems of
the system?
> This discussion *started* because there was a set of list floods,
> and someone made a stupid remark about an important researcher
> indicating he was cancelling his subscription to the -hackers
> mailing list over it, and I pointed out to the person belittling
> the important researcher that such flooding has consequences that
> depend on the mail transport technology over and above "just having
> to delete a bunch of identical email".
Okay, so let's say that you've got this magical SIS which solves
all storage problems, and you let your users have unlimited disk
space. All it takes is someone applying trivial changes to the
messages so that they are not all actually identical, and you're back
to storing at least one copy of each.
Such transformations are typically found in message headers
(message-ids are supposed to be unique, and combinations of date/time
stamps and process ids will probably be unique, especially when taken
over the entire message and the multiple hops it might have
traversed).
Such transformations are becoming much more typical with spam,
where the recipient's name is part of the message body.
So, you're right back where you started, and yet you've paid such
a very high price.
> As far as "dealing with DOS", in for a penny, in for a pound: if
> you are willing to burn CPU cycles, then implement Sieve or some
> other technology to permit server-side filtering.
We're doing that, too. However, server-side filtering can only
do so much. Yes, it can eliminate duplicates that have the same
message-id (although there is some risk that you'll eliminate unique
messages that have colliding ids), and it can be programmed to
inspect the content and eliminate additional messages whose message
body fingerprint matches one previously seen.
But even that can only go so far. See above.
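The body-fingerprint idea, and why trivial personalisation defeats
it, can be shown in a few lines. This is a sketch under the
assumption that "body" means everything after the first blank line;
the function name is hypothetical.

```python
# Sketch: hash the message body, skipping headers, so duplicates that
# differ only in Message-ID/Received headers collapse to one
# fingerprint. Any body-level change (e.g. the recipient's name
# spliced into spam) produces a new fingerprint, defeating the dedup.
import hashlib

def body_fingerprint(raw_message: bytes) -> str:
    """SHA-256 of the part after the header/body blank line."""
    _, sep, body = raw_message.partition(b"\r\n\r\n")
    if not sep:
        _, _, body = raw_message.partition(b"\n\n")
    return hashlib.sha256(body).hexdigest()
```

Two copies of a list flood with different Message-IDs hash the same;
two spam messages with different "Dear <name>," lines do not.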
> We also know that, for most DOS cases on maildrops, the user
> simply loses, and that's that.
True enough. But I don't have to throw out all of my users
simply because just one of them was the target of a DOS.
> Let's quit talking about the free services.
Yes, please.
> So let's limit ourselves to the realm of LWCYM - "Lunches Which
> Cost You Money".
Sounds good.
> The replication model is actually a pretty profound issue. Prior
> to replication, if you connect to one of the replicas, the message
> can be seen as "in transit". Post deletion on an original prior to
> the replication, and the deletion can be seen as "in transit". The
> worst case failure modes are that a message has increased apparent
> delivery latency, or the message "comes back" after it's deleted.
Yes, at another level, the particular replication model chosen
will be important. However, at this level what we really care about
is the fact that the message/mailbox is replicated, and we don't
really care how.
>> That's what I was calling the "recipient system". It is the
>> system where the message was received.
>
> This is not useful to talk about in terms of a POP3 maildrop.
Sure it is. I've got limited disk space that I can afford to
give each user, in accordance to the amount of money that they are
paying for their service (or is being paid on their behalf). But
their local disk storage is limited only by their own budget (or the
budget of their group), and is not an expense that I have to account
for.
So, when defining "recipient system", it makes perfect sense that
this would be the point at which the mail is accumulated into some
sort of a mailbox or queue and held on their behalf, regardless of
whether that mailbox/queue is downloaded/retrieved with UUCP, POP3,
IMAP4, or some other protocol.
> To all intents and purposes, message in a POP3 maildrop are
> "in transit on a point to point mail transport". That's really
> the whole point of acknowledging a "pull" technology exists, in
> the first place.
Yes, there is another component to the system, which comprises
the system of the end user, their bandwidth to the server that holds
their mail, etc.... But this is not the "recipient system". This is
the "end-user system". It's an important system in the overall
scheme of things, but is different from the one we're talking about
- -- they manage their own end-user system, but I manage the recipient
system(s).
> The majority of that latency is an artifact of the FS technology,
> not an artifact of the disk technology, except as it impacts the
> ability of the FS technology to be implemented without stall
> barriers (e.g. IDE write data transfers not permitting disconnect
> ruin your whole day).
Again, I'd like to know where you get this magic filesystem
technology that solves all disk I/O performance issues and makes them
as fast as a RAM disk, while also being 100% perfectly safe.
> Unless I can use someone else's stored copy of the message to
> recover my corrupted stored copy of the message, that's not
> replication, it's duplication.
Correct. But with only ~1.3 recipients per message (on average),
there isn't much duplication to be had anyway. The whole replication
issue is a different matter.
> The reason I brought up SIS again is that you seemed more than
> willing to let a message sit in the main mail queue, but almost
> panicked at the idea of throwing it into the user mailbox instead.
No, I don't panic "...at the idea of throwing it into the user
mailbox...". I have defined queueing & buffering mechanisms that
function system-wide, which help me resist problems with even
large-scale DOS attacks, and help ensure that all the rest of my
customers continue to receive service even if a single user has an
overflowing mailbox.
But it's easier to solve this problem at the system-wide level
where I can allocate relatively large buffers, as opposed to
inflicting it on the end user and letting them try to deal with it
across their slow dial-up line (or whatever).
> Nope; I want to do it to get you to agree to turn off quotas,
> if your business model is not based on the idea that it's OK
> to drop email into /dev/null for customers who don't pay you
> more money.
Bait not taken. The customer is paying me to implement quotas.
This is a basic requirement.
Moreover, even if it wasn't a basic requirement, I'd go back to
the customer and make sure that they understood that they're placing
the entire mail system for all thousands of users at risk if there is
a single mail loop or a large DOS attack on a single user, where I
have better tools to constrain these issues at a system-wide level.
If they still said that they didn't want quotas, then I'd let
someone else build the system for them -- I wouldn't want my name on
it.
I don't drop the stuff in /dev/null. I just put some limits on
things so that I've got brakes that will automatically kick in and
start slowing the train down if there is an excessive overspeed
problem for an excessive period of time.
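The "brakes" amount to a per-user check at delivery time: mail for
one flooded mailbox gets deferred (left in the MTA queue) rather
than filling the shared spool for everyone. A minimal sketch; the
quota figure is illustrative, not a number from this thread.

```python
# Sketch of a delivery-time quota brake: over-quota mail for one user
# is deferred in the system-wide queue instead of being dropped, so a
# DOS on one mailbox does not consume the shared spool.

QUOTA_BYTES = 200 * 1024 * 1024   # e.g. a 200MB per-user mailbox quota

def delivery_action(mailbox_bytes, message_bytes, quota=QUOTA_BYTES):
    """Return 'deliver' if the message fits under quota, else 'defer'."""
    if mailbox_bytes + message_bytes <= quota:
        return "deliver"
    return "defer"
```

Deferral rather than /dev/null is the point: the train slows down,
the mail is not thrown away.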
> FS design issue. And metadata updates in FreeBSD (with soft
> updates) or SVR4.2 or Solaris (with delayed ordered writes) are
> *NOT* synchronous, they are merely ordered.
Well, we're not talking about FreeBSD. I wish we were. However,
I can assure you that UFS+Logging definitely has synchronous
meta-data update issues -- making them ordered or putting them into a
commit log and doing them in larger chunks does not eliminate them.
Fortunately, in this case I have architected the system so that
we shouldn't run into those problems very often.
However, there's nothing I can do about synchronous meta-data
issues with the network & filesystem implementation of the NFS
server, and any related problems with the NFS client.
> You limited my options to Open Source, however.
Because there is no additional money to spend, open source is
really the only practical choice. However, neither UW-IMAP nor Cyrus
will work on NFS, thus leaving us with either the complete Courier
package, or just the Courier-IMAP component.
> Maildir is a kludge around NFS locking. Nothing more, and nothing
> less.
Yup. And I'm convinced that it introduces more problems than it
solves. But I still don't have much choice.
> MS Exchange does, and so does Lotus Notes. I know they suck, but
> they are examples.
They're not IMAP servers. They are proprietary LAN e-mail
systems that may happen to have an interface to this alien IMAP
protocol.
>> Nope. mmap on NFS doesn't work.
>
> Who's using mmap?!?
Cyrus does. All the databases it keeps to track the status of the
various messages, etc. use mmap to access the information inside the
database files. Or are you not familiar with how tools like Berkeley
DB operate?
> This is interesting to know; from the documentation available,
> they imply they scale, and a single instance of one seems to
> match their claims for a single instance. I guess it's always
> worse than the marketing literature, when you deploy it. 8-(.
Actually, the Netscape/iPlanet e-mail server is just a re-badged
SIMS, which is itself a partial port of PMDF from Vax/VMS to Unix,
which was a port of the original MMDF from Unix to Vax/VMS.
While I have a lot of respect for PMDF and the work that Innosoft
did, we know from practical experience that SIMS can't scale beyond
~60,000 POP3 users with 5MB mailbox quotas, if you're using a Sun
Enterprise 5500. At that point, if you want to add any new users,
you must first delete some old ones.
Belgacom Skynet bought a small ISP co-op in southern Belgium that
was using SIMS as their mail system, and one of the reasons they were
selling themselves to us was the fact that their mail system couldn't
scale. We moved their users over to a system on a Sun E420R with an
external Comparex D1400/Hitachi Data Systems DF400 RAID array which
was already serving several hundred thousand users, and we didn't
even notice.
SIMS and Netscape/iPlanet mail server are dead-end products.
Scott McNealy was very unpleasantly surprised when the Sun Europe
guys sprung SIMS on him, and it is definitely going the way of the
dodo. Note that Sun is a major investor in Sendmail, Inc. and they
have on their payroll one of the key members of the Sendmail
Consortium.
> 40 seconds to transfer on a Gigabit Ethernet... assuming you can get
> it off the disks. 8-). Do you really expect them all simultaneously?
Not a one of these machines has Gigabit Ethernet. They all have
100Base-TX FastEthernet, and the front-end machines may also have a
second 100Base-TX FastEthernet interface (if I can scrounge a couple
of NICs).
The big problem is that most of the users will also have
100Base-TX FastEthernet. It won't take too many of them trying to
access the server at once to completely swamp it.
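The swamping concern is easy to put numbers on: a 100Base-TX link
moves roughly 12.5 MB/s before protocol overhead, and n clients
pulling at once share it, so per-client transfer time grows linearly
with n. A back-of-the-envelope sketch (idealized, no overhead):

```python
# Rough capacity arithmetic for a server on a single 100Base-TX link:
# n simultaneous clients share ~12.5 MB/s, so each client's transfer
# time for a given mailbox scales with n.

LINK_MBPS = 100
LINK_MBYTES_PER_S = LINK_MBPS / 8.0   # ~12.5 MB/s, ignoring overhead

def transfer_seconds(mailbox_mb, simultaneous_clients):
    """Idealized per-client time to pull a mailbox of mailbox_mb MB."""
    per_client_rate = LINK_MBYTES_PER_S / simultaneous_clients
    return mailbox_mb / per_client_rate
```

One client pulling a 200MB mailbox already holds the link for ~16
seconds; ten at once stretch each transfer to nearly three minutes.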
> You don't need to assert a lock over NFS, if the only machine doing
> the reading is the one doing the writing, and it asserts the lock
> locally (this was more talking about the Cyrus cache files, not
> maildir).
This assumes that there is only one machine ever writing to a
particular mailbox. This is not a valid assumption.
- --
Brad Knowles, <brad.k...@skynet.be>
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
------------------------------
End of freebsd-chat-digest V5 #704
**********************************
To Unsubscribe: send mail to majo...@FreeBSD.org
with unsubscribe freebsd-chat-digest in the body of the message