
freebsd-chat-digest V5 #703

freebsd-chat-digest Thursday, February 13 2003 Volume 05 : Number 703

In this issue:
Re: Email push and pull (was Re: matthew dillon)
Re: Email push and pull (was Re: matthew dillon)
Re: Email push and pull (was Re: matthew dillon)
Sanhubao Group E is founded -- big New Year special, limited places, request the new prospectus now!
Re: Email push and pull (was Re: matthew dillon)
Re: Email push and pull (was Re: matthew dillon)
Re: Email push and pull (was Re: matthew dillon)
Re: Email push and pull (was Re: matthew dillon)

----------------------------------------------------------------------

Date: Thu, 13 Feb 2003 00:14:33 +0100
From: Brad Knowles <brad.k...@skynet.be>
Subject: Re: Email push and pull (was Re: matthew dillon)

At 9:26 AM -0800 2003/02/12, Terry Lambert wrote:

>> In what way does IMAP4 need to be changed?
>
> Add domain support. There is no support in it in the login
> process, which asks only username and password. The trick of
> specifying "username@domain" will not work for a number of mail
> clients; some versions of Netscape, for example, strip the trailing "@*"
> before sending it to the server, on the assumption that they
> know better than you do.

Certainly, you could pass pretty much whatever you want. The
issue is whether or not the IMAP server understands the @domain
portion.

This problem is already solved -- check out Perdition.
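The kind of routing a proxy like Perdition performs can be sketched in a few lines: split a "user@domain" login and map the domain to a backend IMAP server. This is an illustrative sketch, not Perdition's actual code; the server names and map are invented.

```python
# Hypothetical sketch of proxy-side domain routing: split a
# "user@domain" IMAP login and choose a backend server by domain.
# Backend hostnames here are invented examples.

def route_login(login: str, backend_map: dict, default: str) -> tuple:
    """Return (username, backend_host) for an IMAP login string."""
    user, sep, domain = login.partition("@")
    if not sep:                      # bare username: no domain given
        return login, default
    return user, backend_map.get(domain.lower(), default)

backends = {"example.com": "imap1.internal", "example.org": "imap2.internal"}
print(route_login("alice@example.org", backends, "imap0.internal"))
# -> ('alice', 'imap2.internal')
```

The hard part Terry alludes to is not this string handling, but clients that strip the "@domain" before the server or proxy ever sees it.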

> Well, the "OP" in this case is actually a reference to a DJB
> document, and what we're interested in here is solving the
> problem. We don't have to accept both the data store and the
> "don't send messages" parts of the document, all or nothing.

Well, if we're interested in solving the problem and not
necessarily accepting the entire document as specified, then we might
as well throw the whole thing away.

I can't see any way to apply any part of this proposal and make it work.

> No, you'd direct-send notices. You'd flood-fill the messages
> the notices referred to, to address the scalability and availability
> problems, and to partially address the "revisionist history" and
> "delete before receipt" problems.

You can do this today. Send an e-mail and tell people to go read
a specific USENET news message. Doesn't work too well.

--
Brad Knowles, <brad.k...@skynet.be>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)

------------------------------

Date: Wed, 12 Feb 2003 19:32:10 -0800
From: Terry Lambert <tlam...@mindspring.com>
Subject: Re: Email push and pull (was Re: matthew dillon)

Brad Knowles wrote:
> At 9:17 AM -0800 2003/02/12, Terry Lambert wrote:
> > In terms of I/O throughput, you are right.
> >
> > But we are not interested in I/O throughput, in this case, we
> > are interested in minimizing dynamic pool size, for a given
> > pool retention time function, over a given input and output
> > volume.
>
> Under what circumstances are you not interested in I/O throughput?!?

When the problem is recipient maildrop overflow, rather than
inability to handle load. Since a single RS/6000 with 2 166MHz
CPUs and a modified Sendmail can handle 500,000 average-sized
email messages in a 24 hour period, load isn't really the problem:
it's something you can "throw hardware at", and otherwise ignore.
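As a back-of-the-envelope check, the figure above works out to only a few messages per second on average:

```python
# Back-of-the-envelope check on the throughput figure above:
# 500,000 messages per 24-hour day is a modest per-second rate.
msgs_per_day = 500_000
per_sec = msgs_per_day / (24 * 3600)
print(round(per_sec, 2))   # -> 5.79
```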


> I have seen some mail systems that were short of disk space, but
> when we looked carefully at the number of messages in the mailboxes
> and the number of recipients per message, there just wasn't a whole
> lot of disk space that we could potentially have recovered. This was
> across 100,000+ POP3/dialup users at an earlier time in the life of
> Belgacom Skynet, the largest ISP in Belgium.

The issue is not real limits, it is administrative limits, and, if
you care about being DOS'ed, it's about aggregate limits not
resulting in overcommit.


> Virtually all other mail systems I've ever seen have not had disk
> space problems (unless they didn't enforce quotas), but were instead
> critically short of I/O capacity in the form of synchronous meta-data
> updates. This was true for both the MTA and the mailbox storage
> mechanism.

You are looking at the problem from the wrong end. A quota is good
for you, but it sucks for your user, who loses legitimate traffic,
if illegitimate traffic pushed them over their quota.

What this comes down to is the level of service you are offering
your customer. Your definition of "adequate" and their definition
of "adequate" are likely not the same.

If we take two examples: HotMail and Yahoo Mail (formerly Rocket
Mail), it's fairly easy to see that the "quota squeeze" was
originally intended to act as a circuit break for the disk space
issue.

However, we now see that it's being used as a lever to attempt to
extract revenue from a broken business model ("buy more disk space
for only $9.95/month!").

The user convenience being sold here lies in the ability for the
user to request what is, in effect, a larger queue size, in
exchange for money.

If this queue size were not an issue, then we would not be having
this discussion: it would not have value to users, and, not having
any value, it would not have a market cost associated with its
reduction.


> > The Usenet parallel is probably not that apt. Usenet provides
> > an "Expires:" header, which bounds the pool retention time to a
> fixed interval, regardless of volume.
>
> Almost no one ever uses the Expires: header anymore. If they do,
> it's in an attempt to ensure that the message stays around much, much
> longer than normal as opposed to the reverse.

Whether the expires is enforced by default, self, or administratively
is irrelevant to the mere fact that there is a limited lifetime in
the distributed persistent queueing system that is Usenet.


> No, what I was talking about was the fundamental fact that you
> cannot possibly handle a full USENET feed of 650GB/day or more, if
> you don't have enough spindles going for you. It doesn't matter how
> much disk space you have, if you don't have the I/O capacity to
> handle the input.

This is a transport issue -- or, more properly, a queue management
and data replication issue. It would be very easy to envision a
system that could handle this, with "merely" enough spindles to
hold 650GB/day. An OS with a log structured or journalling FS,
or even soft updates, which exported a transaction dependency
interface to user space, could handle this, no problem.

Surely, you aren't saying an Oracle Database would need a significant
number of spindles in order to replicate another Oracle Database,
when all bottlenecks between the two machines, down to the disks,
are directly managed by a unified software set, written by Oracle?


> > Again, I disagree. Poor design is why they don't scale.
>
> Right, and another outcome of poor design is their stupid choice
> of single-instance store -- a false economy.

I'm not positive that it matters, one way or the other, in the
long run, if things are implemented correctly. However, it is
aesthetically pleasing, on many levels.


> >> These slides have absolutely nothing whatsoever to do with the
> >> MTA. They have to do with the mailbox, mailbox delivery, mailbox
> >> access, etc.... You need message locking in some fashion, you may
> >> need mailbox locking, and most schemes for handling mailboxes involve
> >> either re-writing the mailbox file, or changing file names of
> >> individual messages (or changing their location), etc.... These are
> >> all synchronous meta-data operations.
> >
> > You do not need all the locking you state you need, at least not
> > at that low a granularity.
>
> Really? I'd like to hear your explanation for that claim.

Why the heck are you locking at a mailbox granularity, instead
of a message granularity, for either of these operations?


> I would be very interested to know at what time they have ever
> used any Vax/VMS systems anywhere in the entire history of the
> company. I have plenty of contacts that I can use to verify any
> claims.

Sorry, I was thinking of Compuserve, who had switched over to
FreeBSD for some of its systems, at one point.


> > Sendmail performance tuning is not the issue, although if you
> > are a transit server for virtual domains, you should rewrite the
> > queueing algorithm.
>
> My point is not that Sendmail is the issue. My point is that
> Nick has designed and built some of the largest open-source based
> mail systems in the world, and he and I worked extensively to create
> the architecture laid out in my LISA 2002 talk.

And I was the architect for the IBM Web Connections NOC, and
for an unannounced IBM Services product. This isn't a size
contest... 8-).


> > See:
> >
> > ftp://ftp.whistle.com/pub/misc/sendmail/
>
> This was written for sendmail 8.9.3, way before the advent of
> multiple queues and all other sorts of new features. It is no longer
> relevant to modern sendmail.

I was at the Sendmail MOTM (Meeting Of The Minds) architecture
discussion in 2000. I was one of about 8 outside people in the
world who was invited to the thing. I am aware of Sendmail. I
still have the thermal underwear.

The answer is that it *is* relevant to modern sendmail, because
the multiple queues in the modern sendmail are, effectively,
hashed traversal domains. If you read the presentation that
David Wolfskill did for BayLISA (the "mpg" in that directory),
you will see the difference.

The point of the queue modification, in this system, which was a
large transit mail server designed for 50,000 virtual domains on
a single server instance, was to ensure a 100% hit rate in all
queue runs.

The main problem sendmail faces, when performing fractional queue
runs, is that it must open each queue file and examine its contents,
in order to know whether or not a given queue element should be
operated upon.

Breaking up the queue runs into multiple hash directories, and
running each queue with a separate process avoids the global
queue lock issue, but only *statistically* decreases the chance of
a run-collision between two runners. It does *not* increase the
probability of a "hit" for a given queue element, *at all*.

Even if you did "the logical thing", and ensured that all domain
destinations ended up in the same hash bucket (I would be rather
amazed if you could do that, and simultaneously balance queue
depth between queues, given a fixed hash selection algorithm!),
the increase in "hit" probability will only go up by the average
total queue depth divided by the average number of queue entries
per queue. This number is *negligible*, until the number of
queues approaches the number of domains.
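The hit-rate argument can be illustrated with a small simulation (not sendmail code): scatter queue entries for many destination domains across a number of hash queues, then count what fraction of the queue files opened during a run for one target domain are actual hits. The hash and sizes below are invented for illustration.

```python
# Illustrative simulation of the point above: splitting the queue into
# hash directories does not raise the per-entry "hit" probability for a
# given destination domain -- it stays near 1/domains until the number
# of queues approaches the number of domains.
import random

def hit_rate(entries: int, domains: int, queues: int, target: int,
             seed: int = 1) -> float:
    """Fraction of opened queue files that belong to `target` domain,
    scanning only the hash queues that contain target entries."""
    random.seed(seed)
    buckets = [[] for _ in range(queues)]
    for i in range(entries):
        dom = random.randrange(domains)     # destination of this entry
        buckets[hash((i, dom)) % queues].append(dom)
    opened = hits = 0
    for b in buckets:
        if target in b:                     # a run must touch this queue
            opened += len(b)                # ... and open every file in it
            hits += b.count(target)
    return hits / opened if opened else 0.0

print(hit_rate(entries=50_000, domains=1_000, queues=100, target=7))
```

With 100 queues and 1,000 domains the printed rate stays in the vicinity of 1/1000, matching the claim that the increase in hit probability is negligible.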

Compare: before modification, the maximum load on a SPARC Center
10 was 300 client machines making a queue run every hour. After
modification, the maximum load on an RS6000/50 was 50,000 client
machines making a queue run every hour. Before modification, the
number of messages which could transit the system was 30,000 8K
messages in a 24 hour period. After, it was 5,000,000 8K messages
in a 24 hour period.

Assuming 100 hash queues... the increase in processing over 30,000
messages is *negligible*. The reason for this is that we have
reached queue saturation, since a triggered run must take place in
all queues, anyway.


> > The Open Source book is wrong. You can not build such a system
> > without significant modification. My source tree, for example,
> > contains more than 6 million lines of code at this point, and
> > about 250,000 of those are mine, modifying Cyrus, modifying
> > OpenLDAP, modifying Sendmail, modifying BIND, etc..
>
> IIRC, Nick talks about the changes that were required -- some,
> but not excessive. Read the paper.

I have read it. The modifications he proposes are small ones,
which deal with impedance issues. They are low-hanging fruit,
available to a system administrator, not an in-depth modification
by a software engineer.


> > Because Open Source projects are inherently incapable of doing
> > productization work.
>
> True enough. However, this would imply that the sort of thing
> that Nick has done is not possible. He has demonstrated that this is
> not true.

*You've* demonstrated it, or you would just adopt his solution
wholesale. The issue is that his solution doesn't scale nearly
as well as is possible, it only scales "much better than Open
Source on its own".

Try an experiment for me: tune the bejesus out of a FreeBSD box
with 4G of RAM. Do everything you can think of doing to it, in
the same time frame, and at the same level of depth of understanding
that Nick applied to his system. Then come back, and tell me two
numbers: (1) Maximum number of new connections per second before
and after, and (2) Total number of simultaneous connections, before
and after.


> Yes, using open source to do this sort of thing can be difficult
> (as I am finding out), but it doesn't have to be impossible.

It doesn't have to be, that's agreed, but it takes substantially
more investment than it would cost to build out using multiple
instances of commercial software, plus the machines to run it, to
"brute force" the problem. Or the resulting system ends up being
fragile.


> > $0 is not really true. They are paying for you, in the hopes
> > that it will end up costing less than a prebuilt system.
>
> It's a different color of money. They had already signed the
> contract stating that I would be working for them through April (at
> the earliest), before this project was dumped in my lap. So, that's
> not anything extra. Buying new machines, or buying software, now
> that's spending extra.

I can't imagine a business which did not run on a cost accounting
basis. 8-).


> > Contact Stanford, MIT, or other large institutions which have
> > already deployed such a system.
>
> I've already read much of Mark Crispin's writings. I know how
> they did it at UW, and they didn't use NFS. I've read the Cyrus
> documentation, and they didn't use NFS either.

UW is not the place you should look. Stanford (as I said)
already has a deployed system, and they are generally helpful
when people want to copy what they have done.


> That only leaves Courier-IMAP, and while I've read the
> documentation they have available, I am finding it difficult to find
> anyone who has actually built a larger-scale system using
> Courier-IMAP on NFS. Plenty of people say they've heard of it being
> done, or it should be easily do-able, but I'm not finding the people
> themselves who've actually done it.

If you are looking at IMAP4, then Cyrus or a commercial product
are your only options, IMO, and neither will work well enough, if
used "as is".


> > Not in Open Source; Open Source does not perform productization or
> > systems integration.
>
> Therein lies the problem. You may be able to write or re-write
> all of the open source systems in existence, but that sort of thing
> is not within my capabilities, and would not be accepted for this
> project. They're looking askance at my modifications to procmail to
> get it to use Maildir format and hashed mailboxes -- there's no way
> they'd accept actual source code changes.

How many maildrops does this need to support? I will tell you if
your project will fail. 8-(.

-- Terry

------------------------------

Date: Wed, 12 Feb 2003 19:37:25 -0800
From: Terry Lambert <tlam...@mindspring.com>
Subject: Re: Email push and pull (was Re: matthew dillon)

Brad Knowles wrote:
> At 9:26 AM -0800 2003/02/12, Terry Lambert wrote:
> >> In what way does IMAP4 need to be changed?
> >
> > Add domain support.
>
> Certainly, you could pass pretty much whatever you want. The
> issue is whether or not the IMAP server understands the @domain
> portion.
>
> This problem is already solved -- check out Perdition.

The problem is *not* solved by Perdition. Perdition does not even
*begin* to solve the problem.


> > Well, the "OP" in this case is actually a reference to a DJB
> > document, and what we're interested in here is solving the
> > problem. We don't have to accept both the data store and the
> > "don't send messages" parts of the document, all or nothing.
>
> Well, if we're interested in solving the problem and not
> necessarily accepting the entire document as specified, then we might
> as well throw the whole thing away.

I agree. It is a poor specification.


> > No, you'd direct-send notices. You'd flood-fill the messages
> > the notices referred to, to address the scalability and availability
> > problems, and to partially address the "revisionist history" and
> > "delete before receipt" problems.
>
> You can do this today. Send an e-mail and tell people to go read
> a specific USENET news message. Doesn't work too well.

Doesn't address any privacy issues. Even encrypted, your data is
out there for anyone to perform traffic analysis upon.

-- Terry

------------------------------

Date: Thu, 13 Feb 2003 13:25:47
From: xxxxxxxxx...@163.com
Subject: Sanhubao Group E is founded -- big New Year special, limited places, request the new prospectus now!

Sanhubao Group E is founded -- big New Year special, limited places, request the new prospectus now!

Profit first, pay later!

Steady profits: Sanhubao has once again used our strength to
effectively avoid the risks of the broader market. Operations for the
second half of 2002 have ended; in a deep bear market where the index
fell 20.8%, the average gain across Sanhubao accounts for the half-year
still reached 8.92%, with the best account gaining 17.4%. The outlook
for 2003 remains far from optimistic; only an accurate grasp of market
news and a thorough understanding of the technicals can deliver steady
growth in account value. Four years of member service, and a firm
grasp of the Chinese stock market, all show that Sanhubao genuinely
serves ordinary retail investors and is a steward you can trust.
Choosing us means choosing steady profits and a future of worry-free
trading! We truly put "making money is the hard truth" into practice!
Profit first, pay later -- that is the promise our strength makes!

Sanhubao Group E is founded -- big New Year special, limited places, request the new prospectus now!

Profit first, pay later!

The stock market is volatile; for you, at a disadvantage in capital,
technique, and information, guidance from Sanhubao Golden Bull
Consulting is undoubtedly a good choice. Down-to-earth analysis and
clear guidance, an investment philosophy of trading time for steady
returns, is the purpose of this studio; we believe our professional
standards will free you from "spending effort, wasting energy, and
losing your capital" and truly put "making money is the hard truth"
into practice! Profit first, pay later -- that is the promise our
strength makes!

Profit first, pay later!

The same market, different returns. Come: choose us, and choose a
future of worry-free profits! Join our members in the pleasure of
worry-free, easy-profit trading!

If interested, write immediately to sanhub...@vip.163.com to
request the membership prospectus (including Sanhubao's track record).


Sanhubao Golden Bull Consulting

------------------------------

Date: Thu, 13 Feb 2003 14:00:08 +0100
From: Brad Knowles <brad.k...@skynet.be>
Subject: Re: Email push and pull (was Re: matthew dillon)

At 7:37 PM -0800 2003/02/12, Terry Lambert wrote:

>> This problem is already solved -- check out Perdition.
>
> The problem is *not* solved by Perdition. Perdition does not even
> *begin* to solve the problem.

Okay, what parts of the problem doesn't Perdition solve?

>> You can do this today. Send an e-mail and tell people to go read
>> a specific USENET news message. Doesn't work too well.
>
> Doesn't address any privacy issues. Even encrypted, your data is
> out there for anyone to perform traffic analysis upon.

I don't see how you can do a flood-fill mechanism without having
the message accessible to anyone who'd want to read it. Of course,
it should be public-key encrypted, so that the only traffic analysis
that could be performed was the path that it took over the flood-fill
servers, which could be obscured.

--
Brad Knowles, <brad.k...@skynet.be>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)

------------------------------

Date: Thu, 13 Feb 2003 14:13:55 +0100
From: Brad Knowles <brad.k...@skynet.be>
Subject: Re: Email push and pull (was Re: matthew dillon)

At 7:32 PM -0800 2003/02/12, Terry Lambert wrote:

>> Under what circumstances are you not interested in I/O throughput?!?
>
> When the problem is recipient maildrop overflow, rather than
> inability to handle load. Since a single RS/6000 with 2 166MHz
> CPUs and a modified Sendmail can handle 500,000 average-sized
> email messages in a 24 hour period, load isn't really the problem:
> it's something you can "throw hardware at", and otherwise ignore.

Again, you're talking about the MTA. For this discussion, I
couldn't give a flying flip about the MTA. I care about the message
store and mailbox access methods. I know how to solve MTA problems.
Solving message store and mailbox access problems tends to be more
difficult, especially if they're dependent on an underlying technology
that you can't touch or change.

> The issue is not real limits, it is administrative limits, and, if
> you care about being DOS'ed, it's about aggregate limits not
> resulting in overcommit.

Quotas and making sure you have enough disk space are
well-understood problems with well-understood solutions.

> You are looking at the problem from the wrong end. A quota is good
> for you, but it sucks for your user, who loses legitimate traffic,
> if illegitimate traffic pushed them over their quota.

There's no way around this issue. If you don't set quotas then
the entire system can be trivially taken down by a DOS attack, and
this affects thousands, hundreds of thousands, or millions of other
users. If you do set quotas, the entire system can still be taken
down, but it takes a more concerted effort aimed at more than just
one user.

You have to have quotas. There simply is no other viable
alternative. The key is setting them high enough that 95-99% of
your users never hit them, and the remainder that do would probably
have hit *any* quota that you set, and therefore they need to be
dealt with in a different manner.


For dealing with DOS attacks that take a single user over their
quota, that's a different issue that has to be addressed in a
different manner.
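The enforcement point described above can be sketched as a trivial delivery-time check: accept a message only if it fits under the recipient's byte quota. The quota value and size accounting are simplified assumptions, not any particular MTA's implementation.

```python
# Minimal sketch of delivery-time quota enforcement: accept a message
# only if it fits under the recipient's byte quota. Values and the
# accounting model are simplified assumptions for illustration.

def check_delivery(mailbox_bytes: int, message_bytes: int,
                   quota_bytes: int) -> str:
    """Return an SMTP-style verdict for a proposed delivery."""
    if mailbox_bytes + message_bytes > quota_bytes:
        # 552 is the SMTP "exceeded storage allocation" reply code
        return "552 mailbox full"
    return "250 ok"

print(check_delivery(9_000_000, 2_000_000, quota_bytes=10_000_000))
# -> 552 mailbox full
```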

> What this comes down to is the level of service you are offering
> your customer. Your definition of "adequate" and their definition
> of "adequate" are likely not the same.

If 95-99% of all users never even notice that there is a quota,
then I've solved the part of the problem that is feasible to solve.
The remainder cannot possibly be solved with any quota at any level,
and these users need to be dealt with separately.

> If we take two examples: HotMail and Yahoo Mail (formerly Rocket
> Mail), it's fairly easy to see that the "quota squeeze" was
> originally intended to act as a circuit break for the disk space
> issue.

Right. A valid use of quotas, especially when you're talking
about a free service.

> However, we now see that it's being used as a lever to attempt to
> extract revenue from a broken business model ("buy more disk space
> for only $9.95/month!").

Another valid use, in this case allowing you to have an actual
sustainable business model.

Or would you prefer for everyone to offer all their services for
"free" only to go bankrupt six months later, forcing you to go
somewhere else for your next fix of "free" service? That way lies
madness.

> The user convenience being sold here lies in the ability for the
> user to request what is, in effect, a larger queue size, in
> exchange for money.
>
> If this queue size were not an issue, then we would not be having
> this discussion: it would not have value to users, and, not having
> any value, it would not have a market cost associated with its
> reduction.

You have to pay for storage somehow.

If you store it all on the sender's system, then you run into
SPOFs, overload when a billion people all check their e-mail and read
a copy of the same message, backups, etc....

If you use a flood-fill mechanism, then everyone pays to store
everyone's messages all the time, and then you run into problems of
not enough shared storage space so old messages get tossed away very
quickly and then they just re-post them again. Look at what's
happening to USENET today.

If you store them on the recipient system, you have what exists
today for e-mail. Of the three, this is the only one that has proved
sustainable (so far) and sufficiently reliable.

> Whether the expires is enforced by default, self, or administratively
> is irrelevant to the mere fact that there is a limited lifetime in
> the distributed persistent queueing system that is Usenet.

Yeah, at 650GB/day for a full feed, it's called not having enough
disk space for an entire day's full feed. At ~2GB/day for text only,
it's called not having enough disk space for a week's traffic. And
you still lose messages that somehow never managed to flood over to
your system. For USENET, this doesn't really matter. But for
personal e-mail that needs some reasonable guarantees, this just
doesn't fly.

> This is a transport issue -- or, more properly, a queue management
> and data replication issue. It would be very easy to envision a
> system that could handle this, with "merely" enough spindles to
> hold 650GB/day.

Two IDE 320GB disks are not going to cut it. They cannot
possibly get the data in and out fast enough.

> An OS with a log structured or journalling FS,
> or even soft updates, which exported a transaction dependency
> interface to user space, could handle this, no problem.

Bullshit. You have to have sufficient underlying I/O capacity to
move a given amount of data in a given amount of time, regardless of
what magic you try to work at a higher level.
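The arithmetic behind this is worth spelling out: the sustained byte rate of a 650GB/day feed is modest, but the per-article operation rate is what demands spindles. The average article size used below is an invented assumption for illustration.

```python
# Rough numbers behind the 650GB/day figure: the sustained byte rate
# is modest; the per-article operation rate (the 250KB average article
# size is an invented assumption) is what drives synchronous meta-data
# load, and hence the number of spindles required.
feed_bytes_per_day = 650e9
byte_rate = feed_bytes_per_day / 86_400          # bytes/second, sustained
articles_per_sec = byte_rate / 250_000           # assumed avg article size
print(f"{byte_rate/1e6:.1f} MB/s, ~{articles_per_sec:.0f} articles/s")
# -> 7.5 MB/s, ~30 articles/s
```

Even ~30 article arrivals per second, each implying several synchronous meta-data operations, is far beyond what a pair of IDE disks can sustain.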

> Surely, you aren't saying an Oracle Database would need a significant
> number of spindles in order to replicate another Oracle Database,
> when all bottlenecks between the two machines, down to the disks,
> are directly managed by a unified software set, written by Oracle?

Yup. Indeed, this is *precisely* what is needed. Just try doing
this on a single 320GB hard drive. Or even a pair of 320GB hard
drives.

Large-capacity hard drives don't do us any good for applications
like this. If they did, then companies like EMC, Hitachi Data
Systems, Auspex, Network Appliance, etc... wouldn't exist.


We need enough drives with enough I/O capacity to handle the
transaction rates. We worry about disk space secondarily, because we
know that we can always buy the next size up.

> I'm not positive that it matters, one way or the other, in the
> long run, if things are implemented correctly. However, it is
> aesthetically pleasing, on many levels.

Aesthetically pleasing or not, it is not practical. SIS causes
way too many problems and only solves issues that we don't really
care about.

>> > You do not need all the locking you state you need, at least not
>> > at that low a granularity.
>>
>> Really? I'd like to hear your explanation for that claim.
>
> Why the heck are you locking at a mailbox granularity, instead
> of a message granularity, for either of these operations?

For IMAP, you need to lock at message granularity. But your
ability to do that will be dependent on your mailbox format.
Choosing a mailbox directory format has a whole host of associated
problems, as well understood and explained by Mark Crispin at
<http://www.washington.edu/imap/documentation/formats.txt.html>.

Either way, locking is a very important issue that has to be
solved, one way or the other.
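One of the formats Crispin surveys, Maildir (mentioned elsewhere in this thread in connection with procmail), sidesteps mailbox locking entirely by relying on atomic rename. A minimal delivery sketch, assuming a local filesystem where rename() is atomic -- the very guarantee that gets murky over NFS:

```python
# Sketch of Maildir-style delivery, which avoids mailbox locking:
# write the message to tmp/ under a unique name, then rename() it
# into new/ -- atomic on a local filesystem (this atomicity is
# exactly what is hard to guarantee over NFS).
import os, socket, tempfile, time

def maildir_deliver(maildir: str, body: bytes) -> str:
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # unique name from time, pid, host: the conventional Maildir recipe
    name = f"{time.time_ns()}.{os.getpid()}.{socket.gethostname()}"
    tmp_path = os.path.join(maildir, "tmp", name)
    with open(tmp_path, "wb") as f:
        f.write(body)
        f.flush()
        os.fsync(f.fileno())          # durable before it becomes visible
    new_path = os.path.join(maildir, "new", name)
    os.rename(tmp_path, new_path)     # atomic handoff; no locks needed
    return new_path

d = tempfile.mkdtemp()
p = maildir_deliver(d, b"Subject: test\r\n\r\nhello\r\n")
print(os.path.basename(os.path.dirname(p)))   # -> new
```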

> Sorry, I was thinking of Compuserve, who had switched over to
> FreeBSD for some of its systems, at one point.

I don't have any direct knowledge of the Compuserve systems.

I can tell you that the guys at Compuserve appeared to be
blissfully unaware of many scaling issues when they had one million
customers and AOL had five million. I don't understand why, but
somewhere between those two numbers, a change in scale had become a
change in kind.

> I have read it. The modifications he proposes are small ones,
> which deal with impedance issues. They are low-hanging fruit,
> available to a system administrator, not an in-depth modification
> by a software engineer.

The point is that these low-hanging fruit were enough to get Nick
to a point where he could serve multiple millions of customers using
this technology, and he didn't need to go any further.

That same design was used by Nick and the other consultants at
Sendmail for a number of early customers, the largest publicly known
member of which was FlashNet with about ten million customers. There
were others, even larger, but their names have been withheld at their
request.


Sendmail has since moved on to SAMS, which is much more
full-featured, scalable, etc.... But the original starting point was
all Nick's work, and it did quite a lot for what little was done.

>> True enough. However, this would imply that the sort of thing
>> that Nick has done is not possible. He has demonstrated that this is
>> not true.
>
> *You've* demonstrated it, or you would just adopt his solution
> wholesale. The issue is that his solution doesn't scale nearly
> as well as is possible, it only scales "much better than Open
> Source on its own".

I can't adopt his solution. He did POP3, I'm doing IMAP.

The mailbox formats have to change, because we have to assume
multiple simultaneous processes accessing it (unlike POP3). He did
just fine with mailbox locking (or methods to work around that
problem). I need message locking (or methods to work around that
problem). There are a whole series of other domino-effect changes
that end up making the end solution totally different.

Simply put, there just aren't that many medium-scale IMAP
implementations in the world, period. Even after my LISA 2000 paper,
there still haven't been *any* truly large-scale IMAP
implementations, despite things like
<http://www-1.ibm.com/servers/esdd/articles/sendmail/>,
<http://www.networkcomputing.com/1117/1117f1.html?ls=NCJS_1117bt>,
<http://store.sendmail.com/pdfs/whitepapers/wp_samscapacity.pdf>,
<http://store.sendmail.com/pdfs/whitepapers/wp_samscapacity.zseries.pdf>,
and <http://www.dell.com/downloads/global/topics/linux/sendmail.doc>.

Certainly, so far as I can tell, none of them have used NFS as
the underlying mailbox storage method.

> Try an experiment for me: tune the bejesus out of a FreeBSD box
> with 4G of RAM. Do everything you can think of doing to it, in
> the same time frame, and at the same level of depth of understanding
> that Nick applied to his system. Then come back, and tell me two
> numbers: (1) Maximum number of new connections per second before
> and after, and (2) Total number of simultaneous connections, before
> and after.

Give me such a box and wait until I've gotten this project out of
the way, and I'll be glad to do this sort of thing. I'm setting up
my own consulting business, and a large part of the work I want to do
is in relation to research on scaling issues. This would be right up
my alley.

But, I can't buy boxes like this for myself.

> It doesn't have to be, that's agreed, but it takes substantially
> more investment than it would cost to build out using multiple
> instances of commercial software, plus the machines to run it, to
> "brute force" the problem. Or the resulting system ends up being
> fragile.

Operations and maintenance is going to be significantly higher,
that much I can guarantee you.

> UW is not the place you should look. Stanford (as I said)
> already has a deployed system, and they are generally helpful
> when people want to copy what they have done.

I'll check and see what they've done.

> If you are looking at IMAP4, then Cyrus or a commercial product
> are your only options, IMO, and neither will work well enough, if
> used "as is".

Cyrus doesn't work on NFS. Most of the commercial products I've
been able to find are based on Cyrus or Cyrus-like technology and
don't support NFS, either. The ones I've been able to find that
would (theoretically) support NFS are based on Courier-IMAP, and run
on Linux on PCs.

One of the other can't-change criteria for this system is that it
has to run on SPARC/Solaris, so for example Bynari Insight Server is
not an option.

> How many maildrops does this need to support? I will tell you if
> your project will fail. 8-(.

~1800 LAN e-mail clients initially, quickly growing to
~3000-4000, and possible growth to ~6,000-10,000.

Not counting headers, during one week of fairly typical activity
for the initial ~1800 users, message size distributions were
(measured in terms of bytes):

Minimum: 0
5th Percentile: 328
10th Percentile: 541
25th Percentile: 623
Median: 1424
75th Percentile: 4266
90th Percentile: 41743
95th Percentile: 159314
Maximum: 41915955
Mean: 66502
Sample Std. Deviation: 553042

For the initial ~1800 users, the mailbox distributions are (bytes):

Minimum: 0
5th Percentile: 318
10th Percentile: 318
25th Percentile: 318
Median: 595430
75th Percentile: 919726.25
90th Percentile: 9026371
95th Percentile: 25530278
Maximum: 200702940
Mean: 4.02673e+06
Sample Std. Deviation: 1.32811e+07

For the initial ~1800 users, during the same sample time above,
message arrival rates per second were:

Minimum: 1
5th Percentile: 1
10th Percentile: 1
25th Percentile: 1
Median: 1
75th Percentile: 1
90th Percentile: 2
95th Percentile: 2
Maximum: 28
Mean: 1.20909
Sample Std. Deviation: 0.577905

For the initial ~1800 users, during the same sample time above,
message arrival rates per minute were:

Minimum: 1
5th Percentile: 1
10th Percentile: 2
25th Percentile: 3
Median: 6
75th Percentile: 17
90th Percentile: 25
95th Percentile: 28
Maximum: 419
Mean: 10.4627
Sample Std. Deviation: 11.2166

For the initial ~1800 users, during the same sample time above,
message arrival rates per hour were:

Minimum: 113
5th Percentile: 153
10th Percentile: 186
25th Percentile: 240
Median: 360.5
75th Percentile: 1134
90th Percentile: 1388
95th Percentile: 1498
Maximum: 1844
Mean: 614.102
Sample Std. Deviation: 489.391

For the initial ~1800 users, during the same sample time above,
message arrival rates per day were:

Minimum: 4883
5th Percentile: 4883
10th Percentile: 4883
25th Percentile: 7763
Median: 17047
75th Percentile: 17467
90th Percentile: 21458
95th Percentile: 21458
Maximum: 21458
Mean: 14056.1
Sample Std. Deviation: 6333.29

For the initial ~1800 users, during the same sample time above,
the distribution of number of recipients per message was:

Minimum: 0
5th Percentile: 1
10th Percentile: 1
25th Percentile: 1
Median: 1
75th Percentile: 1
90th Percentile: 2
95th Percentile: 3
Maximum: 294
Mean: 1.33054
Sample Std. Deviation: 3.03305
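(As an aside, figures like the ones above can be computed from a raw
sample with a few lines of code. The following is an illustrative
Python sketch using the nearest-rank percentile method and the sample
standard deviation; it is not the actual script used to gather these
numbers.)

```python
def percentile(sorted_data, p):
    """Nearest-rank percentile of an ascending-sorted sample."""
    if not sorted_data:
        raise ValueError("empty sample")
    # Nearest-rank method: ceil(p/100 * n), 1-indexed.
    k = max(1, -(-p * len(sorted_data) // 100))
    return sorted_data[int(k) - 1]

def summarize(sample):
    """Return summary stats like those reported above."""
    data = sorted(sample)
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (n - 1 denominator).
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    return {
        "min": data[0],
        "p5": percentile(data, 5),
        "median": percentile(data, 50),
        "p95": percentile(data, 95),
        "max": data[-1],
        "mean": mean,
        "stddev": var ** 0.5,
    }
```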

- --
Brad Knowles, <brad.k...@skynet.be>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)

------------------------------

Date: Thu, 13 Feb 2003 07:14:22 -0800
From: Terry Lambert <tlam...@mindspring.com>
Subject: Re: Email push and pull (was Re: matthew dillon)

Brad Knowles wrote:
> At 7:37 PM -0800 2003/02/12, Terry Lambert wrote:
> >> This problem is already solved -- check out Perdition.
> >
> > The problem is *not* solved by Perdition. Perdition does not even
> > *begin* to solve the problem.
>
> Okay, what parts of the problem doesn't Perdition solve?

Replication and failover.

Perdition provides distributed load balancing.

The back-end stores that the Perdition proxy accesses have to have
the content locally available.

Even if you are able to load-balance, you are only able to do so
between back-end servers that contain the actual content in question.

The result is that you provide a unified view onto a backend farm,
but you lack replication and failover in the back-end, and it does
not magically appear, merely because you are running Perdition.

There are other POP3 and IMAP4 proxies that can do the same things
Perdition can: it's no big deal. In fact, it doesn't deal with
LDAP, which is probably where the routing to the back end store will
occur.
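To sketch the point: the kind of routing such a proxy does amounts to
a per-user lookup onto a single back-end. The table, host names, and
logic below are invented for illustration (a real deployment would do
the lookup against a database or directory service), but they show why
the proxy alone gives you distribution, not failover: no second server
holds the user's mailbox.

```python
# Hypothetical user -> back-end routing table. In a real deployment
# this lookup would come from a database or directory, not a dict.
ROUTES = {
    "alice": "imap-be1.example.org",
    "bob": "imap-be2.example.org",
}

# Set of back-ends currently reachable.
UP = {"imap-be1.example.org", "imap-be2.example.org"}

def route(user):
    """Pick the one back-end that holds this user's maildrop."""
    backend = ROUTES[user]  # exactly one server has the data
    if backend not in UP:
        # The proxy can detect the outage, but it cannot fail over:
        # no other back-end holds a replica of this mailbox.
        raise ConnectionError(backend + " is down; no replica to use")
    return backend
```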


> >> You can do this today. Send an e-mail and tell people to go read
> >> a specific USENET news message. Doesn't work too well.
> >
> > Doesn't address any privacy issues. Even encrypted, your data is
> > out there for anyone to perform traffic analysis upon.
>
> I don't see how you can do a flood-fill mechanism without having
> the message accessible to anyone who'd want to read it. Of course,
> it should be public-key encrypted, so that the only traffic analysis
> that could be performed was the path that it took over the flood-fill
> servers, which could be obscured.

The primary use of such a thing is statistical client location
anonymity. Basically, it's useful for terrorists and other
covert communications networks (e.g. "Blacknets"), but not for a
lot else, unless you put more effort into it. Insertion points
are always known.

Basically, it's only useful as a replication technology, and then,
only behind the scenes at a particular provider, as part of their
provider network.

You could address these issues, but since Kazaa and Gnutella both
failed to solve the scaling problems when they tried to tackle the
same issues, it's a complex enough problem that it's probably not
going to be solved by Open Source.

- -- Terry

------------------------------

Date: Thu, 13 Feb 2003 08:09:14 -0800
From: Terry Lambert <tlam...@mindspring.com>
Subject: Re: Email push and pull (was Re: matthew dillon)

Brad Knowles wrote:
> >> Under what circumstances are you not interested in I/O throughput?!?
>
> Again, you're talking about the MTA. For this discussion, I
> couldn't give a flying flip about the MTA. I care about the message
> store and mailbox access methods. I know how to solve MTA problems.
> Solving message store and mailbox access methods tends to be more
> difficult, especially if they're dependent on an underlying technology
> that you can't touch or change.

OK, then why do you keep talking about I/O throughput? Do you
mean *network I/O*? Why the hell would you care about disk I/O
on a properly designed message store, when the bottleneck is
going to first be network I/O, followed closely by bus bandwidth?


> > The issue is not real limits, it is administrative limits, and, if
> > you care about being DOS'ed, it's about aggregate limits not
> > resulting in overcommit.
>
> Quotas and making sure you have enough disk space are
> well-understood problems with well understood solutions.

The consequences of quotas are (apparently) not well understood.


> > You are looking at the problem from the wrong end. A quota is good
> > for you, but it sucks for your user, who loses legitimate traffic,
> > if illegitimate traffic pushed them over their quota.
>
> There's no way around this issue. If you don't set quotas then
> the entire system can be trivially taken down by a DOS attack, and
> this affects thousands, hundreds of thousands, or millions of other
> users. If you do set quotas, the entire system can still be taken
> down, but it takes a more concerted effort aimed at more than just
> one user.

So what's the difference between not enforcing a quota, and ending
up with the email sitting on your disks in a user maildrop, or
enforcing a quota, and ending up with the email sitting on your
disks in an MTA queue?

Quotas are actually a strong argument for single image storage.
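The point can be made concrete with a toy delivery step (purely
illustrative, not any particular MTA's logic): whether or not the
quota is enforced, the same bytes end up on the provider's disks;
the quota only decides *which* spool they occupy.

```python
def deliver(message_size, mailbox_used, quota, maildrop, mta_queue):
    """Toy delivery step enforcing a byte quota at delivery time.

    Either way the message occupies local disk; the quota only
    chooses which spool (maildrop vs. MTA queue) it lands in.
    """
    if mailbox_used + message_size <= quota:
        maildrop.append(message_size)   # accepted into the maildrop
    else:
        mta_queue.append(message_size)  # deferred: sits in the MTA queue
    return maildrop, mta_queue
```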


> You have to have quotas. There simply is no other viable
> alternative. The key is setting them high enough that 95-99% of
> your users never hit them, and the remainder that do would probably
> have hit *any* quota that you set, and therefore they need to be
> dealt with in a different manner.

Obviously, unless setting the quota low on purpose is your revenue
model (HotMail, Yahoo Mail).


> For dealing with DOS attacks that take a single user over their
> quota, that's a different issue that has to be addressed in a
> different manner.

How? It's going to sit on your disks, no matter what, the only
choice you really have on it is *which* disk it's going to sit on.


> > What this comes down to is the level of service you are offering
> > your customer. Your definition of "adequate" and their definition
> > of "adequate" are likely not the same.
>
> If 95-99% of all users never even notice that there is a quota,
> then I've solved the part of the problem that is feasible to solve.
> The remainder cannot possibly be solved with any quota at any level,
> and these users need to be dealt with separately.

Again, how?


> > However, we now see that it's being used as a lever to attempt to
> > extract revenue from a broken business model ("buy more disk space
> > for only $9.95/month!").
>
> Another valid use, in this case allowing you to have an actual
> sustainable business model.
>
> Or would you prefer for everyone to offer all their services for
> "free" only to go bankrupt six months later, and forcing you to go
> somewhere else for your next fix of "free" service? That way lies
> madness.

No, you are misunderstanding. Their business model is:

1) Attract people who are unwilling to pay for service

2) try to sell things to the people who will not pay for
things in the first place

3) Profit!!!

It's a losing proposition, entirely. It's like the "whitebox" sellers
in Computer Shopper, whose businesses all go under in ~3 months when
the run out of capital from trying to "undercut the market to establish
a customer base, then raise prices to cash in".

I call this "The Chinese Restaurant Model": they expect to attract
people who have no brand/vendor loyalty, and then they expect them
to stay, out of brand/vendor loyalty.


> > The user convenience being sold here lies in the ability for the
> > user to request what is, in effect, a larger queue size, in
> > exchange for money.
> >
> > If this queue size were not an issue, then we would not be having
> > this discussion: it would not have value to users, and, not having
> > any value, it would not have a market cost associated with its
> > reduction.
>
> You have to pay for storage somehow.

I understand. I'm saying that the business model is fundamentally
flawed, because it depends on something to get users, and then it
depends on the logical NOT of that same something, in order to keep
them.


> If you store it all on the sender's system, then you run into
> SPOFs, overload when a billion people all check their e-mail and read
> a copy of the same message, backups, etc....

You mean like storing content on HTTP servers?


> If you use a flood-fill mechanism, then everyone pays to store
> everyone's messages all the time, and then you run into problems of
> not enough shared storage space so old messages get tossed away very
> quickly and then they just re-post them again. Look at what's
> happening to USENET today.

Flood fill will only work as part of an individual infrastructure,
not as part of a shared infrastructure, if what you are trying to
sell is to be any different from what everyone else is giving away
for free. You can't have a general "the Internet is a big disk"
mentality. At best, you can have peering arrangements, and then
only between peers within half an order of magnitude in size.


> If you store them on the recipient system, you have what exists
> today for e-mail. Of the three, this is the only one that has proved
> sustainable (so far) and sufficiently reliable.

This argument is flawed. Messages are not stored on recipient
systems, they are stored on the systems of the ISP that the
recipient subscribes to. Users, with the exception of some bearded
weirdos (Hi, guys!) do not run their own mail servers. That's
where quotas become an issue.


> > Whether the expires is enforced by default, self, or administratively
> > is irrelevant to the mere fact that there is a limited lifetime in
> > the distributed persistent queueing system that is Usenet.
>
> Yeah, at 650GB/day for a full feed, it's called not having enough
> disk space for an entire day's full feed. At ~2GB/day for text only,
it's called not having enough disk space for a week's traffic. And
> you still lose messages that never somehow managed to flood over to
> your system. For USENET, this doesn't really matter. But for
> personal e-mail that needs some reasonable guarantees, this just
> doesn't fly.

Yet those same guarantees are specifically disclaimed by HotMail
and other "free" providers, even though there is no technological
difference between a POP3 maildrop hosted at EarthLink and accessed
via a mail client, and a POP3/IMAP4 maildrop hosted at HotMail and
accessed via a mail client.

*This* is what you are supposedly paying for, but a quota is in
place in both cases.


> > This is a transport issue -- or, more properly, a queue management
> > and data replication issue. It would be very easy to envision a
> > system that could handle this, with "merely" enough spindles to
> > hold 650GB/day.
>
> Two IDE 320GB disks are not going to cut it. They cannot
> possibly get the data in and out fast enough.

Who the hell uses IDE on servers?!? Get real! You can't detach an
IDE drive during the data transfer on a write, so tagged command
queueing only works for *reading* data. For a server that does writes,
you use *SCSI* (or something else, but *not* IDE).


> > An OS with a log structured or journalling FS,
> > or even soft updates, which exported a transaction dependency
> > interface to user space, could handle this, no problem.
>
> Bullshit. You have to have sufficient underlying I/O capacity to
> move a given amount of data in a given amount of time, regardless of
> what magic you try to work at a higher level.

I think I see the misunderstanding here. You think IDE disks are
server parts. 8-).


> > Surely, you aren't saying an Oracle Database would need a significant
> > number of spindles in order to replicate another Oracle Database,
> > when all bottlenecks between the two machines, down to the disks,
> > are directly managed by a unified software set, written by Oracle?
>
> Yup. Indeed, this is *precisely* what is needed. Just try doing
> this on a single 320GB hard drive. Or even a pair of 320GB hard
> drives.

IDE again.

> We need enough drives with enough I/O capacity to handle the
> transaction rates. We worry about disk space secondarily, because we
> know that we can always buy the next size up.

Use SCSI, or divide the load between a number of IDE spindles
equal to the tagged command queue depth for a single SCSI drive
(hmmm... should I buy five SCSI drives, or should I buy 500 IDE
drives?).


> > I'm not positive that it matters, one way or the other, in the
> > long run, if things are implemented correctly. However, it is
> > aesthetically pleasing, on many levels.
>
> Aesthetically pleasing or not, it is not practical. SIS causes
> way too many problems and only solves issues that we don't really
> care about.

It gets rid of the quota problem.

Heck, you could even store your indices on a SCSI drive, and then
store your SIS on an IDE drive, if you wanted.


> > Why the heck are you locking at a mailbox granularity, instead
> > of a message granularity, for either of these operations?
>
> For IMAP, you need to lock at message granularity. But your
> ability to do that will be dependent on your mailbox format.
> Choosing a mailbox directory format has a whole host of associated
> problems, as well understood and explained by Mark Crispin at
> <http://www.washington.edu/imap/documentation/formats.txt.html>.

Mark's wrong. His assumptions are incorrect, and based on the
idea that metadata updates are not synchronous in all systems.
He's worrying about a problem that only exists on some platforms,
and he has to do that, because his software *may* have to run on
those platforms.

If you want me to get into criticizing his code, I can; at one
point, I converted the UW IMAP server to C++, with a pure virtual
base class for the driver interfaces, and then implemented each
driver as an implementation class. There are tons of places that
you would get runtime errors that doing this converts to compile
time errors (e.g. potential NULL pointer dereferences turn into
compilation errors about not having implementations for member
functions in the pure virtual base class).

At best, UW IMAP is an academic project.

Cyrus is much closer to commercial usability, but it has its own
set of problems, too. Most of them, though, are solvable by
adding depth to the mail directory, so that you can separate out
the metadata, and remove the "." separator restriction.


> Either way, locking is a very important issue that has to be
> solved, one way or the other.

No, it's a very important issue that has to be designed around,
rather than implemented.

FreeBSD has this same problem: global resources with more than
one accessor automatically require addition of locking.


> I can tell you that the guys at Compuserve appeared to be
> blissfully unaware of many scaling issues when they had one million
> customers and AOL had five million. I don't understand why, but
> somewhere between those two numbers, a change in scale had become a
> change in kind.

Amen.


> > I have read it. The modifications he proposes are small ones,
> > which deal with impedance issues. They are low hanging fruit,
> > available to a system administrator, not an in depth modification
> > by a software engineer.
>
> The point is that these low-hanging fruit were enough to get Nick
> to a point where he could serve multiple millions of customers using
> this technology, and he didn't need to go any further.

Yes, and no. It's very easy to paint a rosy picture in a technical
paper, particularly when you are in a position to need to obtain
funding. 8-). It's something else entirely to deal with support
and scalability issues, to the point where you "just throw hardware"
at the problem. Nick's solution seems to require a lot of manual
load distribution, or a lot of proactive capacity planning, both of
which are damaging, in terms of not locking up cash flow. 8-(.


[ ... Nick's Magic Mail ... ]

> I can't adopt his solution. He did POP3, I'm doing IMAP.
>
> The mailbox formats have to change, because we have to assume
> multiple simultaneous processes accessing it (unlike POP3). He did
> just fine with mailbox locking (or methods to work around that
> problem). I need message locking (or methods to work around that
> problem). There are a whole series of other domino-effect changes
> that end up making the end solution totally different.
>
> Simply put, there just aren't that many medium-scale IMAP
> implementations in the world, period. Even after my LISA 2000 paper,
> there still haven't been *any* truly large-scale IMAP
> implementations, despite things like
> <http://www-1.ibm.com/servers/esdd/articles/sendmail/>,
> <http://www.networkcomputing.com/1117/1117f1.html?ls=NCJS_1117bt>,
> <http://store.sendmail.com/pdfs/whitepapers/wp_samscapacity.pdf>,
> <http://store.sendmail.com/pdfs/whitepapers/wp_samscapacity.zseries.pdf>,
> and <http://www.dell.com/downloads/global/topics/linux/sendmail.doc>.
>
> Certainly, so far as I can tell, none of them have used NFS as
> the underlying mailbox storage method.

You are unlikely to ever find someone using NFS in this capacity,
except as a back end for a single server message store. What you
appear to be asking for is a way to store all the mail on a big
NetApp Filer, and then have a bunch of front end machines accessing
the same mailboxes (inbound SMTP servers and outbound and inbound
IMAP4 accessors).

I submit that you've got a lot of work ahead of you. I've personally
got code that can do it, but I have six months into it, and I value it
at over $3M.


[ ... level of depth of understanding ... ]

> Give me such a box and wait until I've gotten this project out of
> the way, and I'll be glad to do this sort of thing. I'm setting up
> my own consulting business, and a large part of the work I want to do
> is in relation to research on scaling issues. This would be right up
> my alley.

The point was that, without making changes requiring an in depth
understanding of the code of the components involved, which Nick's
solution doesn't really demonstrate, you're never going to get more
than "marginally better" numbers.

[ ... ]

> Cyrus doesn't work on NFS. Most of the commercial products I've
> been able to find are based on Cyrus or Cyrus-like technology and
> don't support NFS, either. The ones I've been able to find that
> would (theoretically) support NFS are based on Courier-IMAP, and run
> on Linux on PCs.

It works on NFS. You just have to run the delivery agent on the
same machine that's running the access agent, and not try to mix
multiple hosts accessing the same data.

I understand you want a distributed, replicated message store, or
at least the appearance of one, but in order to get that, well,
you have to "write a distributed, replicated message store".


> One of the other can't-change criteria for this system is that it
> has to run on SPARC/Solaris, so for example Bynari Insight Server is
> not an option.

The part of Netscape that Sun bought used to provide an IMAP4
server (based on heavily modified UW IMAP code). Is there a
reason you can't use that? I guess the answer must be "I have
been directed to use Open Source". 8-).


> > How many maildrops does this need to support? I will tell you if
> > your project will fail. 8-(.
>
> ~1800 LAN e-mail clients initially, quickly growing to
> ~3000-4000, and possible growth to ~6,000-10,000.

[ ... lot of stats ... ]

This should be no problem. You should be able to handle this
with a single machine, IMO, without worrying about locking, at
all. 10,000 client machines is nothing. At worst, you should
separate inbound and outbound SMTP servers, so you can treat the
inbound one as a bastion host, and keep the outbound entirely
inside, and the inbound server should use a transport protocol
for internal delivery to the machine running the IMAP4 server,
which makes locking go away. At worst, you can limit the number
of bastion to internal server connections, which will make things
queue up at the bastion, if you get a large activity burst, and
let it drain out to the internal server, over time. At most,
you are well under 40,000 simultaneous TCP connections to the
IMAP4 server host, even if you are using OutLook, people have
two mailboxes open, each, and are monitoring incoming mail in
several folders.
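The bastion behavior described here (cap the number of simultaneous
deliveries to the internal server, and let bursts queue up at the
bastion and drain over time) can be modeled in a few lines. This is
a toy sketch of the queueing discipline only, with invented names,
not a real SMTP implementation.

```python
import collections

class Bastion:
    """Toy model of the inbound bastion host: at most max_conns
    simultaneous deliveries to the internal server; excess mail
    queues at the bastion and drains as deliveries complete."""

    def __init__(self, max_conns):
        self.max_conns = max_conns
        self.active = 0
        self.queue = collections.deque()

    def accept(self, msg):
        if self.active < self.max_conns:
            self.active += 1       # deliver to the internal server now
            return "delivering"
        self.queue.append(msg)     # burst: hold it on the bastion
        return "queued"

    def delivery_done(self):
        self.active -= 1
        if self.queue:             # drain one queued message
            self.active += 1
            return self.queue.popleft()
        return None
```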

- -- Terry

------------------------------

End of freebsd-chat-digest V5 #703
**********************************

To Unsubscribe: send mail to majo...@FreeBSD.org
with unsubscribe freebsd-chat-digest in the body of the message
