
freebsd-chat-digest V5 #705


owner-freebs...@freebsd.org

Feb 14, 2003, 10:40:08 PM

freebsd-chat-digest Friday, February 14 2003 Volume 05 : Number 705

In this issue:
Re: Email push and pull (was Re: matthew dillon)
Re: Email push and pull (was Re: matthew dillon)
Re: Email push and pull (was Re: matthew dillon)
Re: Email push and pull (was Re: matthew dillon)
2 Misc questions
Re: 2 Misc questions
Re: 2 Misc questions
Re: Email push and pull (was Re: matthew dillon)

----------------------------------------------------------------------

Date: Fri, 14 Feb 2003 11:03:12 -0800
From: Terry Lambert <tlam...@mindspring.com>
Subject: Re: Email push and pull (was Re: matthew dillon)

Brad Knowles wrote:
> At 11:09 PM -0800 2003/02/13, Terry Lambert wrote:
> > No... but does proxy really solve anything, then, more than
> > a DNS rotor solves? All it really does is add a single point
> > of failure. Unless you can target a subset of back end content
> > servers, you might as well use DNS round-robin. Using a proxy
> > implies the back end replica problem is *already* solved.
>
> Yes, the proxy does solve the domain problem. The user logs in
> with "user@domain", the proxy looks this up in the LDAP database,
> which then tells it which back-end server to contact. You can
> decide, on a user-by-user basis, which back-end server they will be
> using for their mail. If one back-end server gets overloaded, you
> can choose individual users to shift off to another machine.

I solved this particular problem by modifying Cyrus to know
about domains. But the point was kind of that there are IMAP4
clients that strip "@.*" off logins, if they have "@" in them.


> > Maybe I should say "doesn't deal with LDAP the way it should"
> > instead?
>
> In what way?

The LDAP should be used to determine the set of back end servers,
not to strip the domain name and use it to pick a domain-specific
back end server. The reason this is true should be obvious: if the
lookup is keyed on the domain, then the number of domains you can
support is limited to, at most, the number of back end servers.

- -- Terry

------------------------------

Date: Fri, 14 Feb 2003 20:29:41 +0100
From: Brad Knowles <brad.k...@skynet.be>
Subject: Re: Email push and pull (was Re: matthew dillon)

At 11:03 AM -0800 2003/02/14, Terry Lambert wrote:

> I solved this particular problem by modifying Cyrus to know
> about domains. But the point was kind of that there are IMAP4
> clients that strip "@.*" off logins, if they have "@" in them.

Sorry, client problems are things I can't fix.

> The LDAP should be used to determine the set of back end servers,
> not to strip the domain name, and use it to pick a domain-specific
> back end server. The reason this is true should be obvious: if it
> doesn't, then the number of domains you can support is limited to,
> at most, the number of back end servers.

I believe that Perdition maps a given user@domain string to a
unique username on a given back-end machine. Therefore, you should
be able to support an indefinite number of domains on a small number
of back-end machines.

- --
Brad Knowles, <brad.k...@skynet.be>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)

------------------------------

Date: Fri, 14 Feb 2003 15:08:50 -0800
From: Terry Lambert <tlam...@mindspring.com>
Subject: Re: Email push and pull (was Re: matthew dillon)

Brad Knowles wrote:
> You're still limited by disk devices that may be used temporarily
> on the local server, as well as the disk devices on the other end of
> that network connection. Putting them on the network does not
> magically solve the problem that disk I/O is still many orders of
> magnitude slower than any other thing we ever do on computer systems.

I've got to say that on any mail server I've ever worked on, the
limitation on what it could handle was *never* disk I/O, unless
it was before trivial tuning had been done. It was always network
I/O, bus bandwidth, or CPU.

As far as I'm concerned, for most applications, "disks are fast
enough"; even an IDE disk doing thermal recalibration can keep
up with full frame rate digitized video for simultaneous record
and playback. 5 of those, and you're at the limit of 100Mbit
ethernet in and out, and you need 5 for RAID.

FWIW, since you really don't give a damn about timestamps on the
queue files in question, etc., you can relax POSIX guarantees on
certain metadata updates that were put there to make the DEC VMS
engineers happy by slowing down UNIX relative to VMS, so they
would not file a lawsuit over POSIX being required for government
contracts.


> > Disagree. These locking issues are an artifact of the system
> > design (FS, application, or both).
>
> And you have magically solved all these problems in what way?

By writing an appropriate filesystem for the task at hand, and
by using the stacking proxy described in Heidemann's Master's
Thesis from the FICUS project, the source of FreeBSD stacking
vnode code.


> > Simple answer: Don't use a metadata intensive storage mechanism.
>
> So, use what -- a pure memory-based file system for hundreds of
> gigabytes or even multiple terabytes of storage? Even that will
> still have synchronous meta-data update issues with regards to the
> in-memory directory structure, even if those operations do take place
> much faster.

No, not "pure memory". Survey all the metadata you update. Then
survey all the metadata that you *need* to update, subtract the
one from the other, and turn the rest off. Trivially, look at the
"noasync" mount option, or the "inode FS".


> > In other words, the message takes up your disk space, no matter
> > what.
>
> In other words, I can protect the entire system from being taken
> down by a concerted DOS attack on a single user. They're going to
> have to work harder than that if they want to take down my entire
> system.

Like that's frigging hard.


> >> SIS increases SPOFs, reduces reliability, increases complexity,
> >> increases the probability of hot-spots and other forms of contention,
> >> and all for very little possible benefit.
> >
> > The only one of these I agree with is that it increases complexity.
>
> In what way does SIS *not* increase SPOFs, reduce reliability,
> increase the probability of hot-spots and other forms of contention,

Because those are not magically a consequence of increased complexity.
Complexity can be managed.

> and in what way does it magically solve all the storage problems of
> the system?

It doesn't solve *all* of them. As I stated, you have to do in
depth modification of the software involved. Turning off mailboxes
and turning on maildirs in the software hardly qualifies as "in depth".


> > This discussion *started* because there was a set of list floods,
> > and someone made a stupid remark about an important researcher
> > indicating he was cancelling his subscription to the -hackers
> > mailing list over it, and I pointed out to the person belittling
> > the important researcher that such flooding has consequences that
> > depend on the mail transport technology over and above "just having
> > to delete a bunch of identical email".
>
> Okay, so let's say that you've got this magical SIS which solves
> all storage problems, and you let your users have unlimited disk
> space. All it takes is someone applying trivial changes to the
> messages so that they are not all actually identical, and you're back
> to storing at least one copy of each.

And they are back to transmitting 1 copy each, and they lose their
amplification effect in any attack.


> Such transformations are typically found in message headers
> (message-ids are supposed to be unique, and combinations of date/time
> stamps and process ids will probably be unique, especially when taken
> over the entire message and the multiple hops it might have
> traversed).

Only if the messages came in on separate sessions, and they are back
to transmitting 1 copy each, and they lose their amplification effect
in any attack.

> Such transformations are becoming much more typical with spam,
> where the recipient's name is part of the message body.

And they are back to transmitting 1 copy each, and they lose their
amplification effect in any attack.


> So, you're right back where you started, and yet you've paid such
> a very high price.

It's a price you have to pay anyway.

What's the difference between a hard RT system and a soft RT system?
The major difference is that a hard RT system achieves bounded time
processing for kernel operations, and does so by supporting kernel
preemption, which requires function reentrancy.

What's the difference between a UP system and a single system image
shared memory SMP system? The major difference is that a shared
memory SMP system supports kernel reentrancy, which requires function
reentrancy.

Solve 100% of one problem, you solve 90% of the other.

You have to solve 90% of the SIS problem anyway, you might as well
solve the remaining 10%.


> > As far as "dealing with DOS", in for a penny, in for a pound: if
> > you are willing to burn CPU cycles, then implement Sieve or some
> > other technology to permit server-side filtering.
>
> We're doing that, too. However, server-side filtering can only
> do so much. Yes, it can eliminate duplicates that have the same
> message-id (although there is some risk that you'll eliminate unique
> messages that have colliding ids), and there is the possibility to
> program it so that it can actually inspect the content and eliminate
> additional messages that have the same message body fingerprint as
> previously seen.
>
> But even that can only go so far. See above.

And it can do all the SPAM filtering that people keep saying
the user's mail client should do, because they think everyone
has broadband. If a customer has broadband, and sets their
polling interval at the default for Outlook, then all of the
problems with server side storage for both the customer and
the provider move to the customer's machine, instead.


> > We also know that, for most DOS cases on maildrops, the user
> > simply loses, and that's that.
>
> True enough. But I don't have to throw out all of my users
> simply because just one of them was the target of a DOS.

You mean "simply because we, the provider, failed to protect them
from a DOS".


> > The replication model is actually a pretty profound issue. Prior
> > to replication, if you connect to one of the replicas, the message
> > can be seen as "in transit". Post deletion on an original prior to
> the replication, and the deletion can be seen as "in transit". The
> > worst case failure modes are that a message has increased apparent
> > delivery latency, or the message "comes back" after it's deleted.
>
> Yes, at another level, the particular replication model chosen
> will be important. However, at this level what we really care about
> is the fact that the message/mailbox is replicated, and we don't
> really care how.

I think you still care how. I think you care because create
event propagation has to be more reliable than delete event
propagation, because of the failure cases.

[ ... ]
> So, when defining "recipient system", it makes perfect sense that
> this would be the point at which the mail is accumulated into some
> sort of a mailbox or queue and held on their behalf, regardless of
> whether that mailbox/queue is downloaded/retrieved with UUCP, POP3,
> IMAP4, or some other protocol.

By this definition, DJB is right, and the SMTP server that the
original sender contacts to send the mail is a "recipient system".
I don't buy this definition.

I think the problem here is that you think your customer is the
person who will own the mail server, while I'm thinking the
customer is the person for whom the mail is being transported.


> > The majority of that latency is an artifact of the FS technology,
> > not an artifact of the disk technology, except as it impacts the
> > ability of the FS technology to be implemented without stall
> > barriers (e.g. IDE write data transfers not permitting disconnect
> > ruin your whole day).
>
> Again, I'd like to know where you get this magic filesystem
> technology that solves all disk I/O performance issues and makes them
> as fast as a RAM disk, while also being 100% perfectly safe.

At one point Matt Dillon was working on a system that did replication
into RAM on multiple nodes, and defined that as "stable storage",
since a system failure or two would not damage the ability to take
responsibility for final delivery; that's one potential implementation.

But the easiest implementation is to use an inode FS.


[ ... ]
> Correct. But with only ~1.3 recipients per message (on average),
> there isn't much duplication to be had anyway. The whole replication
> issue is a different matter.

OK, that's a 25% reduction in the metadata overhead required,
which is what you claim is the bottleneck. That doesn't look
insignificant to me.


> No, I don't panic "...at the idea of throwing it into the user
> mailbox...". I have defined queueing & buffering mechanisms that
> function system-wide, which help me resist problems with even
> large-scale DOS attacks, and help ensure that all the rest of my
> customers continue to receive service even if a single user has an
> overflowing mailbox.

My argument would be that this should be handled out-of-band
via a feedback mechanism, rather than in-band via an EQUOTA,
with the quota as the feedback mechanism.

IMO, quotas are useful in IMAP4 servers, where the tendency is
to leave data on the server. But the value in the quota applies
only to *old* mail, not to unread mail, or newly arrived mail.


> But it's easier to solve this problem at the system-wide level
> where I can allocate relatively large buffers, as opposed to
> inflicting it on the end user and letting them try to deal with it
> across their slow dial-up line (or whatever).

You're going to do that to the user anyway. Worse, you are going
to give them a mailbox full of DOS crap, and drop good messages
in the toilet (you've taken responsibility for the delivery, so
the sender may not even have them any more, so when you drop them
after the 4 days, they are screwed; you are especially screwed if
the things you are dropping are DSN's from someone *just like you*).

> Bait not taken. The customer is paying me to implement quotas.
> This is a basic requirement.

This is likely the source of the disconnect. I view the person
whose mail I'm taking responsibility for, as the customer.


> Moreover, even if it wasn't a basic requirement, I'd go back to
> the customer and make sure that they understood that they're placing
> the entire mail system for all thousands of users at risk if there is
> a single mail loop or a large DOS attack on a single user, where I
> have better tools to constrain these issues at a system-wide level.

But you don't. You are relying on the feedback from an EQUOTA.
Worse, the tools you are using don't turn a quota overage into
a protocol-level refusal, e.g. "451 recipient over quota", on
attempts to send the user messages.

Actually, that error would be incredibly telling: by returning it
to the remote system, you are blaming the user for being over quota,
when it's probably not the user who's at fault.

Instead, what happens is the messages pile up in your queue.


> If they still said that they didn't want quotas, then I'd let
> someone else build the system for them -- I wouldn't want my name on
> it.

You wouldn't implement an out-of-band mechanism instead? You'd
insist on the in-band mechanism of a MDA error, after you've
already accepted responsibility for the message you aren't going
to be able to deliver?


> I don't drop the stuff in /dev/null. I just put some limits on
> things so that I've got brakes that will automatically kick in and
> start slowing the train down if there is an excessive overspeed
> problem for an excessive period of time.

You *will* drop stuff in /dev/null. Any queue entries you remove
are dropped in /dev/null. You've accepted responsibility for the
delivery. In some cases, you'll be able to generate a bounce
message, but not for DSN's. Basically, if you are talking to
someone who implements as you do, then information gets lost.

[ ... ]
> Well, we're not talking about FreeBSD. I wish we were. However,

Probably ought to take the discussion off this mailing list, then.
;^).

> I can assure you that UFS+Logging definitely has synchronous
> meta-data update issues -- making them ordered or putting them into a
> commit log and doing them in larger chunks does not eliminate them.

Matt Dillon was working on this problem at one point in time;
he defined "committed to stable storage" as "replicated in RAM
on some number of hosts with fault tolerant features". Even if
you lost one, you didn't lose the data. That's one approach.

My recommendation would be to use an inode FS as a variable
granularity block store, and use that for storing messages.


> However, there's nothing I can do about synchronous meta-data
> issues with the network & filesystem implementation of the NFS
> server, and any related problems with the NFS client.

Not if you constrain yourself to NFS, there isn't, I agree.

[ ... ]
> > Maildir is a kludge aound NFS locking. Nothing more, and nothing
> > less.
>
> Yup. And I'm convinced that it introduces more problems than it
> solves. But I still don't have much choice.

If you're convinced, then you should be doing something else. 8-(.


> > MS Exchange does, and so does Lotus Notes. I know they suck, but
> > they are examples.
>
> They're not IMAP servers. They are proprietary LAN e-mail
> systems that may happen to have an interface to this alien IMAP
> protocol.


They both have "IMAP connectors", actually.


> > Who's using mmap?!?
>
> Cyrus. All those databases it keeps to help inform it what the
> status is of the various messages, etc... are using mmap to access
> the information inside the database files. Or are you not familiar
> with the method of operation of tools like Berkeley DB?

This is an artifact of using the new Sleepycat code. You can
actually compile it to use the older code, which can be made to
not use mmap.

[ ... I, too, have fond/nightmarish memories of MMDF ... ]

> SIMS and Netscape/iPlanet mail server are dead-end products.
> Scott McNealy was very unpleasantly surprised when the Sun Europe
> guys sprung SIMS on him, and it is definitely going the way of the
> dodo. Note that Sun is a major investor in Sendmail, Inc. and they
> have on their payroll one of the key members of the Sendmail
> Consortium.

I like sendmail, and I like their people. In general, though, I
would say that they are still looking for their commercial market,
so this is less impressive to me than it would be otherwise.


> > 40 seconds to transfer on a Gigabit ethernet... assuming you can get
> > it of the disks. 8-). Do you really expect them all simultaneously?
>
> Not a one of these machines has GigaBit Ethernet. They all have
> 100Base-TX FastEthernet, and the front-end machines may also have a
> second 100Base-TX FastEthernet interface (if I can scrounge a couple
> of NICs).

That's all to the good: by pushing it from 40 seconds to ~8 minutes,
you favor my argument that the operation is network bound.


> The big problem is that most of the users will also have
> 100Base-TX FastEthernet. It won't take too many of them trying to
> access the server at once to completely swamp it.

That's a server stack implementation issue, if it's an issue for
you. There are boxes you can buy or build to perform QoS that
will deal with that issue.


> > You don't need to assert a lock over NFS, if the only machine doing
> > the reading is the one doing the writing, and it asserts the lock
> > locally (this was more talking about the Cyrus cache files, not
> > maildir).
>
> This assumes that there is only one machine ever writing to a
> particular mailbox. This is not a valid assumption.

Yes, it is. If you read previous postings, I suggested that the
bastion SMTP server would forward the messages to the IMAP server
that will in the future serve them, in order to permit local
delivery. It doesn't solve the replication issue, but it solves
your locking issue. 8-).

- -- Terry

------------------------------

Date: Sat, 15 Feb 2003 02:55:54 +0100
From: Brad Knowles <brad.k...@skynet.be>
Subject: Re: Email push and pull (was Re: matthew dillon)

At 3:08 PM -0800 2003/02/14, Terry Lambert wrote:

> I've got to say that on any mail server I've ever worked on, the
> limitation on what it could handle was *never* disk I/O, unless
> it was before trivial tuning had been done. It was always network
> I/O, bus bandwidth, or CPU.

I have yet to see a mail system that had its limitations in any of
these areas, and other than the ones you've seen, I have yet to hear
of such a mail system from any other mail systems expert I have
talked to -- including the AOL mail system.

> As far as I'm concerned, for most applications, "disks are fast
> enough"; even an IDE disk doing thermal recalibration can keep
> up with full frame rate digitized video for simultaneous record
> and playback. 5 of those, and you're at the limit of 100Mbit
> ethernet in and out, and you need 5 for RAID.

If all you ever care about is bandwidth and not latency, and you
really do get the bandwidth numbers claimed by the manufacturer as
opposed to the ones that we tend to see in the real world, I might
possibly believe this. Of course, I have yet to hear of a
theoretical application where this would be the case, but I am
willing to concede the point that such a thing might perhaps exist.

> FWIW, since you really don't give a damn about timestamps on the
> queue files in question, etc., you can relax POSIX guarantees on
> certain metadata updates that were put there to make the DEC VMS
> engineers happy by slowing down UNIX, relative to VMS, so they
> would not file a lawsuit over POSIX being required for government
> contracts.

In what way don't you care about timestamps? And which
timestamps don't we care about? Are you talking about noatime, or
something else? Note that noatime doesn't exist for NFS mounts, at
least it's not one I have been able to specify on our systems.

> Because those are not magically a consequence of increased complexity.
> Complexity can be managed.

At what cost?

> And they are back to transmitting 1 copy each,

If they're putting the recipient name into the body of the e-mail
message, then they're doing that anyway. Since they don't care about
whether any of their spam is lost, they can run from memory-based
filesystems. They can generate orders of magnitude more traffic than
you could handle on the same hardware, simply because they don't have
to worry what happens if the system crashes. Moreover, they can use
open relays and high-volume spam-sending networks to further increase
their amplification.

> Only if the messages came in on separate sessions, and they are back
> to transmitting 1 copy each, and they lose their amplification effect
> in any attack.

See above. Using SIS hurts you far more than it could possibly hurt them.

>> So, you're right back where you started, and yet you've paid such
>> a very high price.
>
> It's a price you have to pay anyway.

No, you don't.

> You mean "simply because we, the provider, failed to protect them
> from a DOS".

One user's DOS is another user's normal level of e-mail. It's
impossible to protect them from DOS at that level, because you cannot
possibly know, a priori, what is a DOS for which person. At higher
levels, you can detect a DOS and at least delay it by having circuit
breakers, such as quotas.

> OK, that's a 25% reduction in the metadata overhead required,
> which is what you claim is the bottleneck. That doesn't look
> insignificant to me.

Read the slides again. It doesn't reduce the meta-data overhead
at all, only the data bandwidth required. Using ln to create a hard
link to another file requires just as much synchronous meta-data
overhead as it does to create the file in the first place -- the only
difference is that you didn't have to also store another copy of the
file contents.

However, as we said before, storing a copy of the file contents
is cheap -- what kills us is the synchronous meta-data overhead.

> My argument would be that this should be handled out-of-band
> via a feedback mechanism, rather than in-band via an EQUOTA,
> using the quota as a ffeedback mechanism.

What kind of out-of-band mechanism did you have in mind? Are we
re-inventing the entirety of Internet e-mail yet once again?

> You're going to do that to the user anyway. Worse, you are going
> to give them a mailbox full of DOS crap, and drop good messages
> in the toilet (you've taken responsibility for the delivery, so
> the sender may not even have them any more, so when you drop them
> after the 4 days, they are screwed;

As soon as the user notices the overflowing mailbox, they can
call the helpdesk, and the people on the helpdesk have tools available
to them to do mass cleanup, sparing the user from having to deal
with the problem themselves. That gives them seven days to notice the
problem and fix it, before things might start bouncing. We will
likewise have daily monitoring processes that will set off alarms if
a mailbox overflows, so that we can go take a look at it immediately.

>> Bait not taken. The customer is paying me to implement quotas.
>> This is a basic requirement.
>
> This is likely the source of the disconnect. I view the person
> whose mail I'm taking responsibility for, as the customer.

The users don't pay my salary. The customer does. I do
everything I can to help the users in every way possible, but when it
comes down to a choice of whether to do A or B, the customer decides
- -- not the users.

> You wouldn't implement an out-of-band mechanism instead?

Not at the price of re-inventing the entirety of Internet e-mail, no.

> You'd
> insist on the in-band mechanism of a MDA error, after you've
> already accepted responsibility for the message you aren't going
> to be able to deliver?

The message was accepted and delivered to stable storage, and we
would have done the best we could possibly do to actually deliver it
to the user's mailbox. However, that's a gateway function -- the
user's mailbox doesn't speak SMTP, and therefore we would have
fulfilled all of our required duties, to the best of our ability. No
one has any right to expect any better.

> You *will* drop stuff in /dev/null. Any queue entries you remove
> are dropped in /dev/null.

They're not removed or dropped in /dev/null. I don't know where
you pulled that out of your hat, but on our real-world mail systems
we would generate a bounce message.

> My recommendation would be to use an inode FS as a variable
> granularity block store, and use that for storing messages.

It must be nice to be in a place where you can afford the luxury
of contemplating completely re-writing the filesystem code, or even
the entire OS.

> Not if you constrain yourself to NFS, there isn't, I agree.

Not my decision. I wasn't given a choice.

> If you're convinced, then you should be doing something else. 8-(.

I wish I could be. Not my decision. I wasn't given a choice.

> This is an artifact of using the new Sleepycat code. You can
> actually compile it to use the older code, which can be made to
> not use mmap.

As of what version is this still possible? How far back do you
have to go? And are you sure that Cyrus would still work with that?

Certainly, when it comes to SIMS, all this stuff is pre-compiled
and you don't get the option of building Berkeley DB in a different
manner, etc....

> I like sendmail, and I like their people. In general, though, I
> would say that they are still looking for their commercial market,
> so this is less impressive to me than it would be otherwise.

They're going after the large ISP/ASP and the large
corporate/Enterprise markets. However, their marketing & public
relations could use some improvement.

> That's all to the good: by pushing it from 40 seconds to ~8 minutes,
> you favor my argument that the operation is network bound.

Indirectly, perhaps. The real limitation is in the NFS
implementation on the server, including how it handles synchronous
meta-data updates. A major secondary factor is the client NFS
implementation.

> Yes, it is. If you read previous postings, I suggested that the
> bastion SMTP server would forward the messages to the IMAP server
> that will in the future serve them, in order to permit local
> delivery.

There will be a designated primary server for a given mailbox, but
any of the other back-end servers could potentially also receive a
request for delivery or access to the same mailbox. Our hope is that
99% of all requests will go through the designated primary (for
obvious performance reasons), but we cannot currently design the
system so that *only* the designated back-end server is allowed to
serve that particular mailbox.

- --
Brad Knowles, <brad.k...@skynet.be>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)

------------------------------

Date: Fri, 14 Feb 2003 20:04:38 -0600
From: "pura life CR" <pural...@hotmail.com>
Subject: 2 Misc questions

Hi, I have a couple of questions that don't allow me to sleep properly.
Here we go:
1. Is this suid root code exploitable with a buffer overflow technique:
/* foo.c */
main(int argc, char *argv[]){
...
setuid(0);
...
if ( ((strcmp(argv[i],"foo")) == 0)
|| ((strcmp(argv[i],"bar")) == 0) )
....
}

2. How can I redirect stderr to /dev/null? For example, when I am 'finding' a
file in the whole dir tree I don't want to look at the "permission denied"
warnings.
e.g.: find / -name "foo" -print > /dev/null & <--- how to redirect stderr


that's all for now....


_________________________________________________________________

------------------------------

Date: Fri, 14 Feb 2003 18:41:17 -0800
From: Terry Lambert <tlam...@mindspring.com>
Subject: Re: 2 Misc questions

pura life CR wrote:
> 1. is this suid root code exploitable with a buffer overflow technique:
> /* foo.c */
> main(int argc, char *argv[]){
> ...
> setuid(0)
> ...
> if ( ((strcmp(argv[i],"foo")) == 0)
> || ((strcmp(argv[i],"bar")) == 0) )
> ....
> }


It depends on what's in the second "..." or "....". 8-) 8-). The
strcmp's alone are not exploitable, since both compares stop at the
4th byte in.


> 2. How can I redirect stderr to /dev/null? For example, when I am 'finding' a
> file in the whole dir tree I don't want to look at the "permission denied"
> warnings.
> e.g.: find / -name "foo" -print > /dev/null & <--- how to redirect stderr
>
> that's all for now....

Depends on the shell. For /bin/sh, for example, it's:

find / -name "foo" -print > /dev/null 2>&1 &

See the man page for the shell you are using for information specific
to that shell.

- -- Terry

------------------------------

Date: Fri, 14 Feb 2003 18:54:37 -0800
From: David Schultz <dsch...@uclink.Berkeley.EDU>
Subject: Re: 2 Misc questions

Thus spake pura life CR <pural...@hotmail.com>:
> Hi, I have a couple of questions that don't allow me to sleep properly.
> here we go:
> 1. is this suid root code exploitable with a buffer overflow technique:
> /* foo.c */
> main(int argc, char *argv[]){
> ...
> setuid(0);
> ...
> if ( ((strcmp(argv[i],"foo")) == 0)
> || ((strcmp(argv[i],"bar")) == 0) )
> ....
> }

No, but write another few thousand lines and we'll see...

> 2. How can I redirect stderr to /dev/null? For example, when I am 'finding'
> a file in the whole dir tree I don't want to look at the "permission denied"
> warnings.
> e.g.: find / -name "foo" -print > /dev/null & <--- how to redirect stderr

In the C shell, you can't do it in a direct way. You have to say:

( my-command > /dev/tty ) >& /dev/null

In the Bourne shell, you just say:

my-command 2>/dev/null
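A quick Bourne-shell demonstration that "2>/dev/null" really does separate the two streams (the echo commands just stand in for any program, like find, that writes to both stdout and stderr):

```shell
# stdout survives; stderr is discarded.
sh -c 'echo kept; echo dropped >&2' 2>/dev/null
```

Under csh, the `( my-command > /dev/tty ) >& /dev/null` trick above achieves the same effect.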

------------------------------

Date: Fri, 14 Feb 2003 19:37:02 -0800
From: Terry Lambert <tlam...@mindspring.com>
Subject: Re: Email push and pull (was Re: matthew dillon)

Brad Knowles wrote:
> At 3:08 PM -0800 2003/02/14, Terry Lambert wrote:
> > I've got to say that on any mail server I've ever worked on, the
> > limitation on what it could handle was *never* disk I/O, unless
> > it was before trivial tuning had been done. It was always network
> > I/O, bus bandwidth, or CPU.
>
> I have yet to see a mail system that had limitations in any of
> these areas, and other than the ones you've seen, I have yet to hear
> of such a mail system that any other mail-systems expert I have
> talked to has ever seen -- including the AOL mail system.

I like to build systems that are inherently scalable, so that you
can just throw more resources at them. That means I'm usually
more concerned with breaking the ties between where the data resides
and where it gets read/written from. For the 1 machine per every
50,000 domain units case we dealt with, the machines in question were
2x166MHz PPC 604's. I guess if the machines were 3 GHz P4's or
something, then my perspective might be different... though probably
not much, since such systems would not have had a crossbar bus, and
there are stalls that happen from CPU clock multipliers which I did
not have to deal with.

The pipes in and out have always been on the order of T1 to T3, as
well, with an assumption of 50% bandwidth in, and 50% bandwidth
out. In no case was I interested in the leaf node server, unless
it was on the other end of a (comparatively) low bandwidth link.

The only thing that concerned me about the disk was pool retention
time for time-in-queue for the main queue depth. I was only
concerned about that for iteration, not lookup: for lookup, the
system in question had a btree structured directory implementation,
so lookups were always O(log2(N)+1). Even then, since the data was
being moved to per domain queues, the main mail queue never ended
up getting very deep, even when the server was saturated at 100Mbit,
which was "worst case" (if the queue had started growing and continued
growing at that point, I would have had to throttle input rate to
avoid a queue livelock state, and ensure the drain rate was >= the
fill rate).

The FS's were AIX JFS, striped, on six 10,000 RPM SCSI spindles.

I can imagine there being problems with UFS in a similar circumstance,
unless you carefully designed your usage semantics to ensure that you
would not introduce stalls through your choice of FS. Even so,
however, I think that it's possible to implement in such a way as
to avoid the stalls.

But my problems were *always* network saturation before hitting
any other limit (even CPU).


> > As far as I'm concerned, for most applications, "disks are fast
> > enough"; even an IDE disk doing thermal recalibration can keep
> > up with full frame rate digitized video for simultaneous record
> > and playback. 5 of those, and you're at the limit of 100Mbit
> > ethernet in and out, and you need 5 for RAID.
>
> If all you ever care about is bandwidth and not latency, and you
> really do get the bandwidth numbers claimed by the manufacturer as
> opposed to the ones that we tend to see in the real world, I might
> possibly believe this. Of course, I have yet to hear of a
> theoretical application where this would be the case, but I am
> willing to concede the point that such a thing might perhaps exist.

8-).

> > FWIW, since you really don't give a damn about timestamps on the
> > queue files in question, etc., you can relax POSIX guarantees on
> > certain metadata updates that were put there to make the DEC VMS
> > engineers happy by slowing down UNIX, relative to VMS, so they
> > would not file lawsuit from POSIX being required for gvernment
> > contracts.
>
> In what way don't you care about timestamps? And which
> timestamps don't we care about? Are you talking about noatime, or
> something else? Note that noatime doesn't exist for NFS mounts, at
> least it's not one I have been able to specify on our systems.

You have to specify it on the NFS server system; or, if you care
about the request going over the wire only to be ignored, rather than
just being ignored locally, you specify it on the NFS client system
as well, so the transactions are never attempted. Since the issue
you are personally fighting is write latency for requests you should
probably not be making (8-)), you ought to work hard on this.

If your NFS client systems are FreeBSD, then it's a fairly minor
hack to add the option to the NFS client code. If your NFS server
is FreeBSD, then turning it off on the exported mount is probably
sufficient.

If neither is FreeBSD... well, can you switch to FreeBSD?
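As a concrete (hypothetical) example, on a FreeBSD NFS server the exported spool could be marked noatime in /etc/fstab; the device name and mount point here are made up:

```
# /etc/fstab on the NFS server: noatime keeps reads from generating
# synchronous access-time metadata updates on the exported filesystem.
/dev/da0s1e  /var/spool/mail  ufs  rw,noatime  2  2
```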


> > Because those are not magically a consequence of increased complexity.
> > Complexity can be managed.
>
> At what cost?

Engineering.


> > And they are back to transmitting 1 copy each,
>
> If they're putting the recipient name into the body of the e-mail
> message, then they're doing that anyway. Since they don't care about
> whether any of their spam is lost, they can run from memory-based
> filesystems. They can generate orders of magnitude more traffic than
> you could handle on the same hardware, simply because they don't have
> to worry what happens if the system crashes. Moreover, they can use
> open relays and high-volume spam-sending networks to further increase
> their amplitude.

The point is not what they can do, it's what you can do. You've
already admitted a 1.3x multiple-recipient rate.


> > Only if the messages came in on separate sessions, and they are back
> > to transmitting 1 copy each, and they lose their amplification effect
> > in any attack.
>
> See above. Using SIS hurts you far more than it could possibly hurt them.

It's not intended to "hurt them". It's only intended to deal with
multiple recipients for a single message. SPAM is almost the only
type of mail that's externally generated that gets multiple recipient
targets. The point is not to "hurt" them (if you wanted that, you
would run RBL or ORBS or SPEWS or ... and not accept connections from
their servers in the first place), but to mitigate their effect on
your storage costs. Note that this is the same philosophy you've been
espousing all along, with quotas: you don't care if it causes a problem
for your users, only if it causes a problem for you.

Internally, you have a higher connectedness between users, so you
get much larger than your 1.3 multiplier, and for email lists, it's
higher still. In fact, I would go so far as to say that DJB's idea
of sending a reference is applicable to email list messages, only
the messages would be stored on the list server, instead of on the
sender machine. In fact, there are MIME types for this, and it
would be really useful for any list which intends to archive its
content anyway. 8-).


> >> So, you're right back where you started, and yet you've paid such
> >> a very high price.
> >
> > It's a price you have to pay anyway.
>
> No, you don't.

At the point that you no longer care which machine you send a user
connection to in order to retrieve their mail, then you no longer care
where you send the mail, or whether the mail is a single instance stored
multiple times, a real replica, or a virtual replica (SIS). It takes a
small amount of additional work.


> > You mean "simply because we, the provider, failed to protect them
> > from a DOS".
>
> One user's DOS is another user's normal level of e-mail. It's
> impossible to protect them from DOS at that level, because you cannot
> possibly know, a priori, what is a DOS for which person. At higher
> levels, you can detect a DOS and at least delay it by having circuit
> breakers, such as quotas.

The repeated mailing ("mail bombing") that started this thread is,
or should be, simple to detect.

Yes, it's a trivial case, but it's the most common case. You don't
have to go to a compute-intensive technique to deal with it.


> > OK, that's a 25% reduction in the metadata overhead required,
> > which is what you claim is the bottleneck. That doesn't look
> > insignificant to me.
>
> Read the slides again. It doesn't reduce the meta-data overhead
> at all, only the data bandwidth required. Using ln to create a hard
> link to another file requires just as much synchronous meta-data
> overhead as it does to create the file in the first place -- the only
> difference is that you didn't have to also store another copy of the
> file contents.

You are storing the reference wrong. Use an encapsulated reference,
not a hard link. That will permit the metadata operations to occur
simultaneously, instead of constraining them to occur serially, like
a link does. In many of the systems I've seen, where the domain
name is used as an index into a hashed directory structure, you would
not be able to hard link in any case, since the link targets would be
on different physical FS's.


> However, as we said before, storing a copy of the file contents
> is cheap -- what kills us is the synchronous meta-data overhead.

You keep saying this, and then you keep arranging the situation
(order of operations, FS backing store, network transport protocol,
etc.) so that it's true, instead of trying to arrange them so it
isn't.


> > My argument would be that this should be handled out-of-band
> > via a feedback mechanism, rather than in-band via an EQUOTA,
> using the quota as a feedback mechanism.
>
> What kind of out-of-band mechanism did you have in mind? Are we
> re-inventing the entirety of Internet e-mail yet once again?

No, we are not. The transport protocols are the transport protocols,
and you are constrained to implement to the transport protocols, no
matter what else you do. But you are not constrained to depend on
rename-based two phase commits (for example), if your FS or data
store exports a transaction interface for use by applications: you
can use that transaction interface instead.


> > You're going to do that to the user anyway. Worse, you are going
> > to give them a mailbox full of DOS crap, and drop good messages
> > in the toilet (you've taken responsibility for the delivery, so
> > the sender may not even have them any more, so when you drop them
> > after the 4 days, they are screwed;
>
> As soon as the user notices the overflowing mailbox, they can
> call the helpdesk and the people on the helpdesk have tools available
> to them to do mass cleanup, and avoid the problem for the user to
> deal with this problem. That gives them seven days to notice the
> problem and fix it, before things might start bouncing. We will
> likewise have daily monitoring processes that will set off alarms if
> a mailbox overflows, so that we can go take a look at it immediately.

So your queue return time is 7 days.

I have to say, I've personally dealt with "help desk" escalations
for problems like this, and it's incredibly labor intensive. You
should always design as if you were going to have to deal with
100,000 customers or more, putting yourself in a position where
manual processes will not scale, and then think about the
problem.


> >> Bait not taken. The customer is paying me to implement quotas.
> >> This is a basic requirement.
> >
> > This is likely the source of the disconnect. I view the person
> > whose mail I'm taking responsibility for, as the customer.
>
> The users don't pay my salary. The customer does. I do
> everything I can to help the users in every way possible, but when it
> comes down to a choice of whether to do A or B, the customer decides
> -- not the users.

Which explains the general level of user satisfaction with this
industry, according to a recent survey, I think. 8-) 8-).

> > You wouldn't implement an out-of-band mechanism instead?
>
> Not at the price of re-inventing the entirety of Internet e-mail, no.

Something simple like recognizing repetitive size/sender/subject
pairing on the SMTP transit server.


> > You'd
> > insist on the in-band mechanism of a MDA error, after you've
> > already accepted responsibility for the message you aren't going
> > to be able to deliver?
>
> The message was accepted and delivered to stable storage, and we
> would have done the best we could possibly do to actually deliver it
> to the user's mailbox. However, that's a gateway function -- the
> users mailbox doesn't speak SMTP, and therefore we would have
> fulfilled all of our required duties, to the best of our ability. No
> one has any right to expect any better.

Ugh. Would you, as a user, bet your company on that level of service?


> > You *will* drop stuff in /dev/null. Any queue entries you remove
> > are dropped in /dev/null.
>
> They're not removed or dropped in /dev/null. I don't know where
> you pulled that out of your hat, but on our real-world mail systems
> we would generate a bounce message.

And send it to "<>", if it were a bounce for a DSN?


> > My recommendation would be to use an inode FS as a variable
> > granularity block store, and use that for storing messages.
>
> It must be nice to be in a place where you can afford the luxury
> of contemplating completely re-writing the filesystem code, or even
> the entire OS.

You mean the FreeBSD-chat mailing list? 8-) 8-). That capability
is one of the reasons people participate in the FreeBSD project.


> Not my decision. I wasn't given a choice.
[ ... ]
> I wish I could be. Not my decision. I wasn't given a choice.

So the cowboy tells his friend he'll be right back, and rides to town
to talk to the doctor. The doctor is in the middle of a delicate
surgery, but pauses long enough to tell the cowboy that he'll have to
cut X's over the bite marks, and then suck the poison out. The cowboy
rushes back to his friend and says "Bad news, Clem; Doc says you're
going to die!".

8-) 8-).


> > This is an artifact of using the new Sleepycat code. You can
> > actually compile it to use the older code, which can be made to
> > not use mmap.
>
> As of what version is this still possible? How far back do you
> have to go? And are you sure that Cyrus would still work with that?

2.8. It's not like OpenLDAP, which needs the transactioning interfaces;
it's pretty straightforward code.


> Certainly, when it comes to SAMS, all this stuff is pre-compiled
> and you don't get the option of building Berkeley DB in a different
> manner, etc....

Yes, you end up having to compile things yourself.


> > That's all to the good: by pushing it from 40 seconds to ~8 minutes,
> > you favor my argument that the operation is network bound.
>
> Indirectly, perhaps. The real limitation is in the NFS
> implementation on the server, including how it handles synchronous
> meta-data updates. A major secondary factor is the client NFS
> implementation.

If you have control over the clients, you can avoid making update
requests. If you have no control over either, well, "Bad news, Clem".


> > Yes, it is. If you read previous postings, I suggested that the
> > bastion SMTP server would forward the messages to the IMAP server
> > that will in the future serve them, in order to permit local
> > delivery.
>
> There will be a designated primary server for a given mailbox, but
> any of the other back-end servers could potentially also receive a
> request for delivery or access to the same mailbox. Our hope is that
> 99% of all requests will go through the designated primary (for
> obvious performance reasons), but we cannot currently design the
> system so that *only* the designated back-end server is allowed to
> serve that particular mailbox.

Not unless you are willing to accept "hot fail over" as a strategy to
use in place of replication.

Though... you *could* allow any of the replicas to accept and queue
on behalf of the primary, but then deliver only to the primary;
presumably you'd be able to replace a primary in 7 days.

- -- Terry

------------------------------

End of freebsd-chat-digest V5 #705
**********************************

To Unsubscribe: send mail to majo...@FreeBSD.org
with unsubscribe freebsd-chat-digest in the body of the message
