Outbound email rate slows when inbound rate is high

Dave Sill

Sep 30, 2003, 8:35:40 AM

> Here is what I'm seeing: I have a large email list (220K emails) that
> I am sending mail to. This is my company's weekly email newsletter to
> customers all over the net. Currently we use an email client to send
> the mail into qmail like any other client, then qmail relays the mail
> out to the net.

Before I help you, can you convince me you're not a spammer?

--
Dave Sill Oak Ridge National Lab, Workstation Support
Author, The qmail Handbook <http://web.infoave.net/~dsill>
<http://lifewithqmail.org/>: Almost everything you always wanted to know.

Mike

Sep 30, 2003, 4:36:21 PM

Dave Sill <MaxFr...@sws5.ornl.gov> wrote in message news:<wx03cee...@sws5.ornl.gov>...

> t4m6...@sneakemail.com (Mike) writes:
>
> > Here is what I'm seeing: I have a large email list (220K emails) that
> > I am sending mail to. This is my company's weekly email newsletter to
> > customers all over the net. Currently we use an email client to send
> > the mail into qmail like any other client, then qmail relays the mail
> > out to the net.
>
> Before I help you, can you convince me you're not a spammer?

Sure... sign up for our newsletter! ;-)

I use sneakemail because of the instantaneous spam that I get when
posting to groups... here is my contact info, just remove the dashes
m-m-c-c-a-r-n-@-d-w-r-.-c-o-m
The company I work for is Design within Reach, we sell furniture:
www.dwr.com

Folks sign up for our newsletter that we send out every wednesday.

If you need more proof, give me a call: 510.433.3022.

FYI: I have set my conf-split to 503, up from 200. I suspect this may
help a bit. I have also been trying to figure out how to change my
ext3 filesystem from journaling mode ordered to writeback. My research
has indicated that this should improve filesystem response time. I
haven't been able to get this to work yet... using RedHat 9. So if
anyone can help out there, I'd appreciate it as well.

Thanks,
Mike

Dave Sill

Oct 1, 2003, 9:50:08 AM

t4m6...@sneakemail.com (Mike) writes:

> Dave Sill <MaxFr...@sws5.ornl.gov> wrote in message news:<wx03cee...@sws5.ornl.gov>...
>>
>> Before I help you, can you convince me you're not a spammer?
>
> Sure...

OK, thanks. Sorry I had to do that.

> FYI: I have set my conf-split to 503, up from 200. I suspect this may
> help a bit.

You mean conf-spawn? I presume you raised concurrencyremote,
too. Either way, I don't think that'll help much.

> I have also been trying to figure out how to change my
> ext3 filesystem from journaling mode Ordered to writeback. My research
> has indicated that this should improve filesystem response time. I
> haven't been able to get this to work yet... using RedHat 9. So if
> anyone can help out there, I'd appreciate it as well.

The data=writeback mount option should do it, but I don't think you
want to use that on the queue because it can cause problems if the
system crashes. See:

http://lifewithqmail.org/lwq.html#system-requirements

for more information about filesystem requirements for reliable qmail
operation.

Also, have you seen:

http://untroubled.org/benchmarking/

Now, back to your previous message. You wrote:

> What I am seeing is that the rate at which qmail is sending outbound
> email is very low, about 1/10 the rate when all incoming mail has
> ceased. It takes around 2.5 hours for all the emails to get into the
> system right now. So this slower rate goes on for over two hours right
> now. The CPU load is not maxed out and disk I/O looks to be OK as
> well. Any suggestions on what the cause and cure?

Sounds like you're suffering from "silly qmail syndrome": the fact
that qmail-send is single-threaded and handles both the preprocessing
of new messages and the dispatching of queued messages. There's a
patch called qmail-todo that separates these two functions. You might
want to give it a try.

> I know this is not the optimal way of doing this task. Down the road I
> am thinking about writing a front end to format and inject the mail
> into qmail, or maybe into ezmlm... any suggestions about this?

So you're currently sending a separate message to each
recipient? Ouch. That's worse than suboptimal, it's pessimal. Using
ezmlm would be a huge performance win. Even injecting the message in
1000- or 100-recipient batches would help tremendously.
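
A minimal sketch of what batched injection could look like, assuming
qmail-inject lives at the usual /var/qmail/bin path and that the
recipient list is already available as plain addresses. The function
names and the batch size are made up for illustration; this is not a
tested mailer, just the shape of the idea: one queued message per
batch instead of one per recipient.

```python
import subprocess
from typing import Iterator, List, Sequence

# Assumption: default qmail install location; adjust for your system.
QMAIL_INJECT = "/var/qmail/bin/qmail-inject"


def batches(recipients: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` recipients."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(recipients), size):
        yield list(recipients[start:start + size])


def inject_command(sender: str, batch: Sequence[str]) -> List[str]:
    """Build the argv for one qmail-inject run covering a whole batch."""
    # -f sets the envelope sender. Recipients given as arguments are
    # used as the envelope recipients instead of the header addresses.
    return [QMAIL_INJECT, "-f" + sender, *batch]


def send_newsletter(message: bytes, sender: str,
                    recipients: Sequence[str], size: int = 1000) -> int:
    """Queue `message` once per batch; return the number of queue entries."""
    queued = 0
    for batch in batches(recipients, size):
        # The message (headers and body) is fed to qmail-inject on stdin.
        subprocess.run(inject_command(sender, batch), input=message, check=True)
        queued += 1
    return queued
```

With 220,000 recipients and a batch size of 1000, this would put 220
messages in the queue rather than 220,000. Very large batches may run
into the operating system's limit on command-line length, so the batch
size is worth checking on the target machine.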

> Looking at the qmail.org section on patches for high-volume servers, I
> see the section about servers with queues that get over 23,000. It
> mentions changing conf-split and recompiling. My value for conf-split
> right now is 200... should I change this value to something like 499?

The rule of thumb is that for a non-directory-hashing filesystem, you
don't want much more than 1000 files/directory. With the default
conf-split of 23, that means ~23,000 messages in the queue. You're
queueing 220K messages, so something in the neighborhood of 200 should
be fine. However, to ensure even distribution of files in the split
directories, conf-split should be a prime number. The next time you
send a message, take a look at the number of files in each of the
queue/mess/* directories. If they're not fairly level, you might want
to fix that. Whether it's a problem or not depends upon how your
filesystem allocates inodes.
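
The arithmetic above can be sketched as follows. This is only an
illustration of the rule of thumb (about 1000 files per directory,
prime conf-split), plus a helper that counts the files in each split
directory so you can see whether they are level. The function names
are hypothetical, and the queue path in the usage note is the usual
default, which may differ on your system.

```python
import os
from typing import Dict


def is_prime(n: int) -> bool:
    """Trial division; fine for the small values conf-split takes."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def suggest_conf_split(queued_messages: int, files_per_dir: int = 1000) -> int:
    """Smallest prime p such that p * files_per_dir >= queued_messages."""
    candidate = max(2, -(-queued_messages // files_per_dir))  # ceiling division
    while not is_prime(candidate):
        candidate += 1
    return candidate


def files_per_split_dir(mess_dir: str) -> Dict[str, int]:
    """Count the entries in each subdirectory of a queue/mess directory."""
    counts = {}
    for entry in sorted(os.listdir(mess_dir)):
        path = os.path.join(mess_dir, entry)
        if os.path.isdir(path):
            counts[entry] = len(os.listdir(path))
    return counts
```

For 220,000 queued messages, suggest_conf_split(220000) gives 223, and
for 23,000 it gives the default of 23. Running
files_per_split_dir("/var/qmail/queue/mess") during a mailing (it
normally needs root or queue-owner privileges) and comparing the
smallest and largest counts shows how even the distribution is.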

The real fix, of course, is to queue one message with 220k recipients
instead of 220K messages with one recipient.

> Thanks for the help gang... I'm clearly a newbie with qmail and I
> really appreciate the help.

As always, I'm glad to help.

Mike

Oct 6, 2003, 8:28:54 PM

> The data=writeback mount option should do it, but I don't think you
> want to use that on the queue because it can cause problems if the
> system crashes. See:
>
> http://lifewithqmail.org/lwq.html#system-requirements
>
> for more information about filesystem requirements for reliable qmail
> operation.
>
> Also, have you seen:
>
> http://untroubled.org/benchmarking/
>

I will leave ext3 in ordered mode for reliability as you suggest.

> Now, back to your previous message. You wrote:
>
> > What I am seeing is that the rate at which qmail is sending outbound
> > email is very low, about 1/10 the rate when all incoming mail has
> > ceased. It takes around 2.5 hours for all the emails to get into the
> > system right now. So this slower rate goes on for over two hours right
> > now. The CPU load is not maxed out and disk I/O looks to be OK as
> > well. Any suggestions on what the cause and cure?
>
> Sounds like you're suffering from "silly qmail syndrome": the fact
> that qmail-send is single-threaded and handles both the preprocessing
> of new messages and the dispatching of queued messages. There's a
> patch called qmail-todo that separates these two functions. You might
> want to give it a try.

I have installed the patch called exttodo patch. It fits the bill as
far as the patch you described... but the name is different. Is this
the correct patch?

The following line is in the directions, but I don't think I need to
do anything more than run patch and re-compile (make setup check)
qmail, without defining EXTERNAL_TODO since I am not using LDAP with
qmail. Am I correct in this assumption?

+ To enable the exttodo patch you need to define EXTERNAL_TODO while compiling
+ qmail(-ldap) this can be done with the -D flag of cc (e.g. cc -DEXTERNAL_TODO).

The patch seems to have taken. When I ps aux | grep qmail, I see
qmail-todo in the list.

I'll send emails out on Wed using this config, so I'll let you all
know how it worked.

>stuff deleted...

> The real fix, of course, is to queue one message with 220k recipients
> instead of 220K messages with one recipient.
>
> > Thanks for the help gang... I'm clearly a newbie with qmail and I
> > really appreciate the help.
>
> As always, I'm glad to help.

I'll be looking into messing around with using qmail-inject and
running some more tests in the coming weeks. Clearly our current
method is not very efficient.

Thanks again Dave!
I appreciate all the help.

-Mike

Dave Sill

Oct 7, 2003, 9:37:34 AM

t4m6...@sneakemail.com (Mike) writes:

> I have installed the patch called exttodo patch. It fits the bill as
> far as the patch you described... but the name is different. Is this
> the correct patch?

Yes, sorry, forgot the name of the patch was different than the name
of the module it adds.

> The following line is in the directions, but I don't think I need to
> do anything more than run patch and re-compile (make setup check)
> qmail, without defining EXTERNAL_TODO since I am not using LDAP with
> qmail. Am I correct in this assumption?
>
> + To enable the exttodo patch you need to define EXTERNAL_TODO while compiling
> + qmail(-ldap) this can be done with the -D flag of cc (e.g. cc -DEXTERNAL_TODO).
>
> The patch seems to have taken. When I ps aux | grep qmail, I see
> qmail-todo in the list.

I think you need EXTERNAL_TODO defined in order to get qmail-todo, with or
without qmail-ldap. But since you see qmail-todo running, it must be
defined by default with the patch.

Mike

Oct 9, 2003, 5:25:52 PM

Dave Sill <MaxFr...@sws5.ornl.gov> wrote in message news:<wx0smm5...@sws5.ornl.gov>...

> t4m6...@sneakemail.com (Mike) writes:
>
> > I have installed the patch called exttodo patch. It fits the bill as
> > far as the patch you described... but the name is different. Is this
> > the correct patch?

> I think you need EXTERNAL_TODO defined in order to get qmail-todo, with or
> without qmail-ldap. But since you see qmail-todo running, it must be
> defined by default with the patch.

I defined this, just in case and recompiled.

The performance I saw did lead me to believe that the patch worked
somewhat as outbound rates were higher this time. However, the data
doesn't look like I thought it would/should. See a snapshot of
qmailMRTG:
http://www.dwr.com/images/misc/qmail_stats.gif

Looking at the charts, the first thing is that the Queue data is way
off due to the patch ext-todo. This is documented, so don't worry
about that, although I think it shows in general what is going on with
the queue. It levels off at under 160K messages, when we had 220K in
at the peak.

Looking at the Bytes transferred graph, it appears to me that most
messages were done going out in bulk around 4am. However, the
"messages" and "Local/Remote Concurrency" graphs don't seem to agree
with that. They look like most bulk mailing was done around 7:30??

Am I reading this wrong, have a config issue, or just plain don't know
what I'm doing?? ;-)

To conclude, I believe the patch ext-todo did help out as I am seeing
a higher outbound rate on the graph at start time, 1700-2000, which is
the time it took the lame client we are currently using to load up the
messages. Looking at the "Messages" graph I see a higher rate than
before... but not a rate that is as good as the best rate. Same thing
goes for the Concurrency graph.

I could explain this performance as limitations of the disk I/O. All
that mail coming in, being processed... just a physical limitation. I
can live with that until I write something that uses qmail-inject. The
word is that we won't use EZMLM because our marketing team doesn't
want the list controlled in that manner.

Let me know if anyone has any comments.

Thanks again for all the help from Dave

Dave Sill

Oct 10, 2003, 8:47:06 AM

t4m6...@sneakemail.com (Mike) writes:

> http://www.dwr.com/images/misc/qmail_stats.gif
>
> Looking at the charts, the first thing is that the Queue data is way
> off due to the patch ext-todo. This is documented, so don't worry
> about that, although I think it shows in general what is going on with
> the queue. It levels off at under 160K messages, when we had 220K in
> at the peak.
>
> Looking at the Bytes transferred graph, it appears to me that most
> messages were done going out in bulk around 4am. However, the
> "messages" and "Local/Remote Concurrency" graphs don't seem to agree
> with that. They look like most bulk mailing was done around 7:30??

I don't know what "messages" is showing--is it successful deliveries,
delivery attempts, or what? I'd expect "bytes transferred" to drop off
pretty quickly once each message has had a delivery attempt because,
obviously, only deferred messages remain. Of course, by the time the
last of your outgoing messages is tried the first time, many earlier
messages have multiple retries--and the retries will continue at a
pretty high rate for a while. That could explain the difference
between "bytes transferred" and "local/remote concurrency".
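
To put rough numbers on the retry behaviour, here is a small sketch.
It assumes qmail-send schedules the next remote attempt at
birth + (sqrt(age) + 20)^2 seconds, which is my recollection of the
source and appears to match the retry table in Life with qmail; treat
the constant and the formula as assumptions to verify. The function
names are made up for illustration.

```python
import math
from typing import List


def next_retry(age_seconds: int, skip: int = 20) -> int:
    """Age (seconds since queueing) of the next attempt after one at `age_seconds`.

    Assumed rule: next = (isqrt(age) + skip) ** 2, with skip = 20 for
    remote deliveries.
    """
    return (math.isqrt(age_seconds) + skip) ** 2


def retry_schedule(attempts: int) -> List[int]:
    """Ages at which the first `attempts` delivery attempts would happen."""
    ages = []
    age = 0
    for _ in range(attempts):
        ages.append(age)
        age = next_retry(age)
    return ages


def attempts_within(window_seconds: int) -> int:
    """How many attempts a continually deferred message gets in the window."""
    count = 0
    age = 0
    while age <= window_seconds:
        count += 1
        age = next_retry(age)
    return count
```

Under that assumption the first attempts fall at 0, 400, 1600, 3600
and 6400 seconds, so a message queued at the start of a 2.5-hour
injection run that keeps getting deferred would already have been
tried five times before the last new message is queued.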

To tell how successful the ext-todo patch was, you'd have to compare
this graph with one done without that patch applied. I'd expect the
difference to be most obvious in the "local/remote concurrency" during
the phase when you're still injecting new messages, from 1700-1930.
With the patch, you maintained a concurrency of about 70, which isn't
too bad, really. I'd say your disk I/O was the limiting factor, here.

> To conclude, I believe the patch ext-todo did help out as I am seeing
> a higher outbound rate on the graph at start time, 1700-2000, which is
> the time it took the lame client we are currently using to load up the
> messages. Looking at the "Messages" graph I see a higher rate than
> before... but not a rate that is as good as the best rate. Same thing
> goes for the Concurrency graph.

Surely you're hitting your concurrency limit. Why don't you raise it?
You'll probably need the big-concurrency patch--I use it on my own
list server with a concurrencyremote of 500. A delivery to a
2000-subscriber list (tiny by your standards) makes spikes on the
graphs like those on your SMTP graphs.

> I could explain this performance as limitations of the disk I/O. All
> that mail coming in, being processed... just a physical limitation. I
> can live with that until I write something that uses qmail-inject.

Yeah, I don't think you can expect much improvement during the
injection phase until you cut down the number of messages queued.

> The word is that we won't use EZMLM because our marketing team
> doesn't want the list controlled in that manner.

What does "in that manner" mean? What are the marketing team's
requirements/objections?

Mike

Oct 10, 2003, 3:05:23 PM

> I don't know what "messages" is showing--is it successful deliveries,
> delivery attempts, or what? I'd expect "bytes transferred" to drop off
> pretty quickly once each message has had a delivery attempt because,
> obviously, only deferred messages remain. Of course, by the time the
> last of your outgoing messages is tried the first time, many earlier
> messages have multiple retries--and the retries will continue at a
> pretty high rate for a while. That could explain the difference
> between "bytes transferred" and "local/remote concurrency".
>

I believe that the "Messages" graph represents successfully delivered
messages. It also shows failed attempts in the drill down, but I
didn't supply that.

The odd thing to me about the graphs is that Messages, which I take to
be successful deliveries, remains high even after the total bytes
numbers drop off. This could be bounce messages being delivered...
but that would be more than I would expect. In any case, I'll muster
up the raw data and look at it; it should be more telling than the
graphs.

> To tell how successful the ext-todo patch was, you'd have to compare
> this graph with one done without that patch applied. I'd expect the
> difference to be most obvious in the "local/remote concurrency" during
> the phase when you're still injecting new messages, from 1700-1930.
> With the patch, you maintained a concurrency of about 70, which isn't
> too bad, really. I'd say your disk I/O was the limiting factor, here.

I agree. And outgoing performance is up by about 120% during the
initial phase.

> Surely you're hitting your concurrency limit. Why don't you raise it?
> You'll probably need the big-concurrency patch--I use it on my own
> list server with a concurrencyremote of 500. A delivery to a
> 2000-subscriber list (tiny by your standards) makes spikes on the
> graphs like those on your SMTP graphs.
>

I'm currently maxing out my bandwidth. If I move to a bigger pipe I
will increase concurrencyremote. I am seeing a lot of
"Connected_to_xx.xx.xx.xx_but_connection_died" errors. I wonder
whether this is due to my concurrency limit being higher than my
bandwidth will support. Or does qmail have a mechanism to control this?

> Yeah, I don't think you can expect much improvement during the
> injection phase until you cut down the number of messages queued.

Agreed.

>
> What does "in that manner" mean? What are the marketing team's
> requirements/objections?

Our Marketing team wants to make sure that any customer concern is
looked at by a real person. They don't want any complaints to fall
through the cracks or have anyone accidentally remove themselves from
our newsletter. They feel more comfortable with our current method of
processing removes and complaints. We also manually remove permanent
failures that bounce back from two separate databases. If I could
create a system to do this via EZMLM they might change their mind.
This would require keeping remove requests, bounce-back info etc. in
an Oracle database. I haven't run across any Oracle implementation for
EZMLM yet and I don't have the time right now or the understanding to
write it myself.

So, it might be the case that I have not presented EZMLM correctly due
to my misunderstanding its capabilities. If that is the case, I would
be happy to learn more about it.

Thanks again Dave!