Throughput tests with multiple nodes. Question about UDP data loss.


Frank San Miguel

Jan 23, 2011, 5:55:54 PM
to s4-project
Hi All,

I did some throughput testing and am surprised at the amount of data
loss even at low CPU utilization. I ran my test on various
configurations on Amazon EC2, including 2, 4, 8, 10, and 20 "large"
instances. All in redbutton mode. I've documented a few of the tests
below.

I would have expected that setting the max queue size for the listener
and PE container to 2,000,000 would reduce the error rate to very
small numbers, but I'm still seeing 25% data loss even while CPU
utilization is only at 15%.

I have never used UDP before. Is this to be expected? Any advice
would be appreciated.

Thanks,
Frank

=======================================
Experiment Summary:
* 6 PEs organized in more or less a pipeline, with one splitter PE.
* one adapter feeds 100,000 messages at rates ranging from
1,000 messages per second to 3,500 messages per second.
* redbutton mode (no zookeeper)
* I set the max queue size for listener and pe container to 2,000,000

Test 1:
* all on local machine
* adapter feeds 100k messages at 3.5k/sec
* 65% CPU utilization
Results
* 444135 total internal messages
* no data loss
Totals:
PE 0, key=* inCount 100000 outCount 100000 peCount 1
PE 1, key=a inCount 100000 outCount 100000 peCount 13757
PE 2, key=b inCount 100000 outCount 48045 peCount 51955
PE 3, key=a inCount 48045 outCount 48045 peCount 4354
PE 4, key=a inCount 48045 outCount 48045 peCount 4354
PE 5, key=c inCount 48045 outCount 16649 peCount 19897

Test 2:
* four amazon large instances
* adapter feeds 100k messages at 3.5k/sec
* 15% CPU utilization
Results
* 335988 total internal messages
* 25% data loss
* No errors reported

Test 3:
* same as Test2, but feed at 1k/sec
* very low machine CPU utilization
Results
* 438791 total internal messages
* 1% data loss
* No errors reported

Other notes:
* The largest test was with 19 machines. 10 adapters, each feeding
100k inputs at 1000 inputs/sec for a total of 1M messages at 10k
messages/sec. Resulted in about 30% data loss while nodes were at
about 15% CPU utilization. No errors reported.
* I calculated totals by having each PE record messages in/out with a
singleton bean in each node's application context. Each singleton
periodically writes the totals to the log file.
* The fastest that a single adapter can feed messages (no sleeps) is
around 3500/sec.

Bruce Robbins

Jan 24, 2011, 5:08:50 PM
to s4-project
It's very interesting that you have no data loss when you run the 3.5k
test on a single machine, but you have data loss when you run the same
test on multiple machines. Spreading amongst multiple nodes should
help, in the usual case.

It would be interesting to know a few more things:

- Was that single machine test run on an EC2 instance or a dedicated
box?
- Try running the command "netstat -su" on each instance before the
test and again after the test. The difference between the two "packet
receive errors" counts will indicate your UDP packet loss due to full
UDP buffer.
- In your logs, check for the strings "lll_dr" and "pec_dr" anywhere
in any of the logs (the two will never be on the same line). These
lines have the count of messages that were successfully received by
the listener thread but could not be used because some internal queue
filled up. Or you could just send us your logs.
- The queue sizes would be interesting. Look for the strings "lll_qsz"
and "pec_qsz" in the logs (again, the two will not be on the same
line).
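A quick way to compute that "packet receive errors" difference, sketched in Python (the helper, regex, and sample strings are illustrative, not from your logs):

```python
import re

def receive_errors(netstat_output: str) -> int:
    """Pull the UDP 'packet receive errors' counter out of `netstat -su` text."""
    m = re.search(r"(\d+)\s+packet receive errors", netstat_output)
    return int(m.group(1)) if m else 0

# Illustrative captures taken before and after a test run.
before = "Udp:\n    500 packets received\n    100 packet receive errors\n"
after  = "Udp:\n    900 packets received\n    350 packet receive errors\n"

# Drops caused by a full UDP receive buffer during the test:
print(receive_errors(after) - receive_errors(before))  # 250
```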

> I have never used UDP before.  Is this to be expected?

Depends. To use a painfully inaccurate characterization, UDP is kind
of a push model in the sense that if the target node is not ready to
accept the message, the message could get lost. The chance it could
get lost depends on the message rate, message size, size of the UDP
buffer, and the maximum time the target node's listener thread is
starved for CPU (or otherwise busy doing something else, like putting
the previously read message on a queue).

I assume EC2 instances are not given dedicated resources, so it's
possible that resources are taken away from the listener thread often
enough or long enough to allow the UDP buffer to fill.

This can potentially be mitigated by increasing the size of the UDP
buffer to help with those cases where the listener thread is not given
enough resources. To increase the buffer size in the kernel, first
follow the instructions here:

http://www.29west.com/docs/THPM/udp-buffer-sizing.html

But then you also have to tell S4 to create the socket with that
buffer size as well. Currently, S4 will try to use a buffer size of
4194302. You can try changing the kernel size to match that and see if
it helps.
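As a quick sanity check of what the kernel actually grants, here's a sketch in Python of the same idea (illustrative only; S4 itself does this from Java):

```python
import socket

REQUESTED = 4194302  # the buffer size S4 requests by default

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)
# Linux silently caps the request at net.core.rmem_max (and reports
# roughly double the granted value for bookkeeping overhead), so a
# value far below the request means the kernel limit needs raising.
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()

print(granted)
```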

If you want to make the buffer size even bigger, you will need to set
the system property udp.buffer.size on the java call. If you're using
the binary distribution, you will need to edit the script to set this
property. If you are using the latest from github, you can pass
arbitrary java options via the -j option.

You could also send us your application and we can try it on our
boxes.

It's possible that for a virtualized environment, we'll need a
different commlayer, that is, one that uses TCP/IP. But I won't make
that judgment yet.



Bruce Robbins

Jan 24, 2011, 6:37:37 PM
to s4-project
By the way, also check for things that might cause the S4 nodes to
pause, like:

- swapping on the instances. When we've seen data loss during testing,
it's because we accidentally defined a maximum JVM size that was
bigger than or close to the real memory size of the machine. The JVM
sometimes grows to that maximum size even when there's lots of free
heap space, and that larger process caused swapping on the machine.
- Concurrent garbage collection mode failure, which would force a full
GC. This seems doubtful since you had available CPU resources, but it
could happen if the garbage collection thread cannot run fast enough
to reclaim memory before the heap fills, resulting in a full
collection. A full collection would pause the JVM. You have to turn on
GC logging to see if such a failure is happening: -Xloggc:<filename>

Frank San Miguel

Jan 26, 2011, 6:26:46 PM
to s4-project
Bruce,

Thanks for your suggestions.

I've done more experiments and I think you are right that this is a
result of virtualization. Here's a recent research paper that talks
about this issue (http://www.ece.rice.edu/~pjv/mclock.pdf). It notes
that none of the current hypervisors make reservations, guarantees,
or limits for I/O.

I played around with ttcp and wrote a simple UDP client/server to test
this theory and it clearly shows ec2 has high packet loss and a very
wide variability (See Experiment 1 below).

Then I re-ran the original 6 PE test on three non-virtual nodes in a
lightly loaded network. With three machines I was able to achieve a
throughput of 10,000 messages/sec with 0.7% packet loss. No packet
loss at 3400 messages/sec.

All of this makes me think that you may be correct that TCP/IP is more
appropriate for virtual environments.

I've answered your other questions below.

Frank

===============================
Experiment 1 UDP packet loss comparison
* client sends 10,000 1k blocks of test data as fast as it can
* server listens for UDP requests, compares with expected results and
prints the error rate
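For reference, a stripped-down localhost sketch of such a probe (hypothetical; the real udp.py ran client and server on separate machines):

```python
import socket, threading

COUNT, BUFSIZE = 1000, 1000
received = 0

srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("127.0.0.1", 0))   # any free port
addr = srv.getsockname()
srv.settimeout(1.0)

def serve():
    # Count datagrams until a full second passes with nothing arriving.
    global received
    try:
        while True:
            srv.recvfrom(BUFSIZE)
            received += 1
    except socket.timeout:
        pass

t = threading.Thread(target=serve)
t.start()

cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for _ in range(COUNT):
    cli.sendto(b"x" * BUFSIZE, addr)
cli.close()
t.join()
srv.close()

# On quiet loopback most datagrams arrive; across a busy or
# virtualized network the shortfall is the interesting number.
print("Expected %d, got %d, error rate %.4f"
      % (COUNT, received, (COUNT - received) / float(COUNT)))
```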

Test 1:
* two large instances on amazon EC2, us-east-1b region
* repeatedly run tests in sequence
* nothing else running on these machines

Results:
* very inconsistent packet loss

./udp.py --hostname ec2-72-44-54-70.compute-1.amazonaws.com -c client
--bufsize=1000 --count 10000 --delay 0
Expected 10000, got 4557, error rate 0.5443
Expected 10000, got 7588, error rate 0.2412
Expected 10000, got 7515, error rate 0.2485
Expected 10000, got 7104, error rate 0.2896
Expected 10000, got 592, error rate 0.9408
Expected 10000, got 5502, error rate 0.4498
Expected 10000, got 7727, error rate 0.2273
Expected 10000, got 7142, error rate 0.2858
Expected 10000, got 5796, error rate 0.4204
Expected 10000, got 5541, error rate 0.4459
Expected 10000, got 5810, error rate 0.419
Expected 10000, got 7314, error rate 0.2686

Test 2:
* two non-virtual Intel dual-core 64-bit 3GHz "developer" machines
* quiet network
* low CPU utilization
* repeatedly run tests in sequence

Results:
* very little loss

Expected 10000, got 9999, error rate 0.0001
finished, count: 10000
Expected 10000, got 9999, error rate 0.0001
finished, count: 10000
finished, count: 10000
finished, count: 10000
finished, count: 10000
finished, count: 10000

Test 3:
* two dual-core AMD 2GHz machines, circa 2008
* busy network
* server with low CPU utilization (~99% free)
* repeatedly run tests in sequence

Results:
* very consistent packet loss ~ 17%

./udp.py -c server --hostname=10.19.136.80
listening on host 10.19.136.80, port 5077
Expected 10000, got 8152, error rate 0.1848
Expected 10000, got 8330, error rate 0.167
Expected 10000, got 8303, error rate 0.1697
Expected 10000, got 8310, error rate 0.169
Expected 10000, got 8308, error rate 0.1692
Expected 10000, got 8287, error rate 0.1713
Expected 10000, got 8285, error rate 0.1715
Expected 10000, got 8348, error rate 0.1652
Expected 10000, got 8070, error rate 0.193
Expected 10000, got 8119, error rate 0.1881


===============================
Answers to questions:
> - Was that single machine test run on an EC2 instance or a dedicated
> box?
A: the single machine test was on a dual-core Dell laptop running
WinXP with a VirtualBox Ubuntu guest OS.

> - Try running the command "netstat -su" on each instance before the
> test and again after the test. The difference between the two "packet
> receive errors" counts will indicate your UDP packet loss due to full
> UDP buffer.
A: I did that and saw packet loss (see NETSTAT Test 1).

> - In your logs, check for the strings "lll_dr" and "pec_dr" anywhere
> in any of the logs (the two will never be on the same line). These
> lines have the count of messages that were successfully received by
> the listener thread but could not be used because some internal queue
> filled up. Or you could just send us your logs.
A: yes, there was lost data (see NETSTAT Test 1).

> - The queue sizes would be interesting. Look for the strings "lll_qsz"
> and "pec_qsz" in the logs (again, the two will not be on the same
> line).
A: see the grep output in NETSTAT Test 1 below.

========================================================
NETSTAT Test 1:
* 1 EC2 instance
* 3400 messages/sec
* 100,000 total messages
* queue size 8000

Before test: ---------------------------------
[ec2-user@ip-10-114-234-81 bin]$ netstat -su
IcmpMsg:
InType3: 8
OutType3: 2
Udp:
698702 packets received
2 packets to unknown port received.
13248 packet receive errors
757342 packets sent
RcvbufErrors: 13248
UdpLite:
IpExt:
InOctets: 348825372
OutOctets: 370021990

TEST RESULTS
PE 1 93843
PE 2 90368
PE 3 87267
PE 4 36025
PE 5 35032
PE 6 33880
TOTAL 376415, 15% loss

After test: ---------------------------------
[ec2-user@ip-10-114-234-81 bin]$ netstat -su
IcmpMsg:
InType3: 8
OutType3: 2
Udp:
1083037 packets received
2 packets to unknown port received.
21674 packet receive errors
1150104 packets sent
RcvbufErrors: 21674
UdpLite:
IpExt:
InOctets: 542270871
OutOctets: 564602787


Error stats:
grep pec_dr ../s4_core/logs/s4_core/*
../s4_core/logs/s4_core/s4_core.log:2011-01-25 02:12:29,177 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:pec_dr = 7890
../s4_core/logs/s4_core/s4_core.mon:2011-01-25 02:12:29,177 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:pec_dr = 7890

grep pec_qsz ../s4_core/logs/s4_core/*
../s4_core/logs/s4_core/s4_core.log:2011-01-25 02:12:29,179 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:pec_qsz = 6622
../s4_core/logs/s4_core/s4_core.log:2011-01-25 02:12:29,185 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:pec_qsz_w = 482
../s4_core/logs/s4_core/s4_core.log:2011-01-25 02:12:59,168 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:pec_qsz = 127
../s4_core/logs/s4_core/s4_core.log:2011-01-25 02:12:59,169 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:pec_qsz_w = 2958
../s4_core/logs/s4_core/s4_core.log:2011-01-25 02:13:29,166 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:pec_qsz_w = 0
../s4_core/logs/s4_core/s4_core.mon:2011-01-25 02:12:29,179 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:pec_qsz = 6622
../s4_core/logs/s4_core/s4_core.mon:2011-01-25 02:12:29,185 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:pec_qsz_w = 482
../s4_core/logs/s4_core/s4_core.mon:2011-01-25 02:12:59,168 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:pec_qsz = 127
../s4_core/logs/s4_core/s4_core.mon:2011-01-25 02:12:59,169 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:pec_qsz_w = 2958
../s4_core/logs/s4_core/s4_core.mon:2011-01-25 02:13:29,166 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:pec_qsz_w = 0

grep lll_qsz ../s4_core/logs/s4_core/*
../s4_core/logs/s4_core/s4_core.log:2011-01-25 02:12:29,186 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:lll_qsz = 16
../s4_core/logs/s4_core/s4_core.log:2011-01-25 02:12:59,170 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:lll_qsz = 1
../s4_core/logs/s4_core/s4_core.mon:2011-01-25 02:12:29,186 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:lll_qsz = 16
../s4_core/logs/s4_core/s4_core.mon:2011-01-25 02:12:59,170 monitor
INFO (Log4jMonitor.java:57) S4::S4CoreMetrics:lll_qsz = 1


Bruce Robbins

Feb 10, 2011, 1:29:34 PM
to s4-project
Thanks for comparing UDP performance between various machines and
virtualized environments.

> All of this makes me think that you may be correct that TCP/IP is more
> appropriate for virtual environments.

My guess is that similar issues could occur when using TCP/IP, except
the messages would get dropped due to our own queues filling rather
than the kernel buffer filling. Say, for example, we have a sender
process and a receiver process. The sender process has a thread that
reads from a queue and writes to a socket. It also has other threads
for producing the messages that go on that queue. If the receiver is
slow to receive messages, then the sender's socket thread will block
(and I am speaking as a novice regarding TCP/IP, so correct me if I am
wrong). The other threads will continue to add messages to the queue.
If the receiver is very slow, the queue could grow so large as to eat
up all the sender's memory. If the queue has a maximum size, then
messages will get dropped once the queue hits that size.

That is, the problem moves from one queue to another.

In fact, something similar can happen with UDP as well. If you are
able to fix the issue with the packets getting lost due to a full
kernel buffer, for example by increasing the size of the kernel
buffer, then that often solves the problem. However, that might also
just move the problem to the receiver's queue (if the receiver's
listener thread simply moves messages to a queue for processing, but
the processing thread cannot keep up). Eventually messages may get
dropped because the receiver's queue is full.

It might be worth rerunning your simple UDP tests in the EC2
environment with different kernel buffer sizes (again, see
http://www.29west.com/docs/THPM/udp-buffer-sizing.html). Remember, if
you change the kernel buffer size, you need to tell the listener
socket as well, e.g.:

socket.setReceiveBufferSize(bufferSize);

Of course, I am not sure what EC2 does if a UDP packet is destined for
an instance, but that instance's kernel is not responding within a
reasonable period. If packets get lost there, then increasing the
buffer size may not help completely.



Gianmarco

Feb 10, 2011, 1:43:22 PM
to s4-project

> My guess is that similar issues could occur when using TCP/IP, except
> the messages would get dropped due to our own queues filling rather
> than the kernel buffer filling. Say, for example, we have a sender
> process and a receiver process. The sender process has a thread that
> reads from a queue and writes to a socket. It also has other threads
> for producing the messages that go on that queue. If the receiver is
> slow to receive messages, then the sender's socket thread will block
> (and I am speaking as a novice regarding TCP/IP, so correct me if I am
> wrong). The other threads will continue to add messages to the queue.
> If the receiver is very slow, the queue could grow so large as to eat
> up all the sender's memory. If the queue has a maximum size, then
> messages will get dropped once the queue hits that size.
>
> That is, the problem moves from one queue to another.
>

I think the problem you are describing here is the same that TCP flow
control solves already.
http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Flow_control

Given an internal queue with finite size, the thread writing to the
queue should block when it is full. At this point it will not read
from the source socket, and this will create some back pressure that
should regulate the system automatically. Eventually, packets will be
dropped, but only at the input point of the system, not inside it.
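That blocking behavior can be sketched with a bounded queue (a toy illustration, not S4 code):

```python
import queue, threading, time

q = queue.Queue(maxsize=10)   # finite internal queue
consumed = []

def slow_consumer():
    # Deliberately slower than the producer.
    for _ in range(100):
        consumed.append(q.get())
        time.sleep(0.001)

t = threading.Thread(target=slow_consumer)
t.start()
for i in range(100):
    q.put(i)   # blocks when the queue is full -> producer is throttled
t.join()

print(len(consumed))  # 100: nothing was dropped inside the system
```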

Leonardo Neumeyer

Feb 10, 2011, 5:21:59 PM
to s4-pr...@googlegroups.com
Our thinking is that if you run out of CPU cycles to keep up with real-time, something will have to give. Ideally, the platform will intelligently distribute load so as to minimize impact on app performance. Using TCP will add overhead and complexity since now you need connections from every node to every node. TCP-based messages are still desirable for non-time critical, low bandwidth, control messages but not for massive message passing as long as the transmission error is kept below an acceptable minimum.

In this case, we think that the problem is that the kernel queue is too small. A larger queue should be able to absorb the short-term variations in capacity in a virtualized environment resulting in a much lower error.

-leo


Frank San Miguel

Feb 11, 2011, 9:30:15 AM
to s4-project
OK. I'll try to repeat the experiment with a larger queue.

While I'm at it, I may add some sort of test to establish whether
packet loss is caused by hypervisor context switching or CPU
starvation. I still need to read up on this issue, but I believe if I do two identical
tests, one with UDP loss and one with TCP/IP and then compare the
overall CPU time, I'll get a sense of what is going on. If the two
tests take the same amount of CPU time, then I think I can infer that
the problem is at the kernel or hypervisor. Does that sound right?

Also, I'm interested to know how s4_ext might be used to add my own
comm layer. For example, if I'm already using a queuing engine (e.g.
HornetQ or QPID) in other parts of an application, how might I "plug
in" that queuing system to replace the default UDP layer? I admit
that this is a much more complex solution, but it could offer a few
nice capabilities:
* configure large queues to handle momentary spikes
* track exact counts of message loss
* dead letter queue for recording which messages were lost and doing
"post-mortem" analysis.

Just an idea.

Frank



kishore g

Feb 11, 2011, 12:55:06 PM
to s4-pr...@googlegroups.com
Hi,

As of now it's not possible to use s4_ext to add tcp/ip. The comm layer was started with the intention of supporting any transport mechanism, but we are not there yet. Currently it supports unicast and multicast. The code needs a little change to support tcp/ip or pub/sub.

The comm layer decides the transport mechanism based on the task configuration in ZooKeeper.


Here is the short-term solution: change this class to support a tcp/ip or pub/sub mode:
https://github.com/s4/comm/blob/master/src/main/java/io/s4/comm/core/GenericSender.java

Long term, we need sender and listener interfaces, with implementations for multicast, unicast, and tcp/ip pub/sub, and the implementation class specified in the task configuration in ZooKeeper. That way it will be easier to plug in multiple transport mechanisms.

Currently I cannot work on this as I am injured, but I hope you can quickly tweak the Sender to support tcp/ip and pub/sub. Getting tcp/ip completely right is quite a task; getting pub/sub working should be easier.

Let me know if you have questions.

thanks,
Kishore G

Bruce Robbins

Feb 11, 2011, 1:39:53 PM
to s4-project
Alternatively, you can

- subclass io.s4.emitter.CommLayerEmitter and
io.s4.listener.CommLayerListener. Override the functionality.
- stick those extensions into s4_ext (say emitterlistener/
emitterlistener.jar).
- edit the s4_core config file and change the bean definitions for the
beans commLayerEmitter and rawListener to use your new classes (and to
set the appropriate properties). The references to those beans won't
need to change, just the definitions of those two beans.

It's a little yucky that you have to change a core config file. But you
don't need to modify any commlayer code (or any code, for that matter)
since you're essentially bypassing the commlayer.

This is what I've done in the past to bypass commlayer for testing. It
may still work :).


Frank San Miguel

Feb 16, 2011, 6:09:34 PM
to s4-pr...@googlegroups.com
I reran the UDP tests with an eye to characterizing the issue more fully and I think S4 can work fine on Amazon EC2, but it will require adding a way to throttle UDP packets.
 
Thanks for the input on extending S4 to use queues.  Based on my test results, I'm going to hold off on any attempt to add tcp/ip messages for a bit.  I'd rather work on a UDP throttle if you agree with my conclusions.

Summary:
==========================================
1. UDP requires throttling
The UDP loss rate is much higher on EC2 than in my local test environment because EC2 has much larger network bandwidth (1G or 10G vs 100M). This is somewhat counterintuitive, but I believe the explanation is that with a 1G network, it is much easier for the writer to outpace the reader. With a constrained network (100 Mbits) and a fast computer (3GHz), the reader has no problem keeping up. I verified this theory by throttling the sender. I also showed that increasing the UDP kernel buffer yielded a modest improvement.
 
In short:
* Increasing the UDP kernel buffers reduces the message loss from 22% to 18%
* Throttling from 335 mbps to 283 mbps reduces the message loss rate from 18% to 0.8%
* Throttling from 283 mbps to 211 mbps reduces the message loss rate to 0.004%
 
UDP based applications often use network throttling.  For instance, Oracle Coherence, which is a distributed cache based on UDP, ships with a tuning/throttling application that allows you to set udp throttling to match your installation (network + CPUs).  
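The throttle itself is simple. Roughly what I did, as a sketch (hypothetical helper; the real logic lives in udp-tcp.py):

```python
import time

def throttled_send(messages, batch=100, delay=0.002, send=lambda m: None):
    """Send messages, pausing `delay` seconds after every `batch` of them."""
    for i, m in enumerate(messages, 1):
        send(m)
        if i % batch == 0:
            time.sleep(delay)

# 1000 messages in batches of 100 with a 2 ms pause -> at least
# 10 * 2 ms of sleep, which caps the effective send rate.
start = time.time()
throttled_send(range(1000))
print(time.time() - start >= 0.015)  # True
```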
 
2. TCP/IP connections would be a bad idea for Amazon EC2
Contrary to my earlier statement, tcp/ip would be really bad:
* On the Amazon us-east-1b region, TCP/IP bandwidth was 100x worse than UDP (from 335 MBit/sec to 3 MBit/sec)
* On my own 100 mbit network, I saw a 7x reduction in throughput (from 84 Mbit/sec to 12 MBit/sec )
 
I'm not sure why the UDP vs TCP/IP gap is 7x locally but 100x on Amazon EC2, but I suspect there are a number of factors:
* Physical hardware buffers.  UDP is fire-and-forget and so it requires less of the hardware.  TCP needs a lot more support, and the buffers probably get overloaded when there are many VMs on a single physical machine.  From the virtual machine's point of view, a restriction on physical buffers would have the appearance of simple network latency.
* Network latency.  Traffic on EC2 is sometimes very heavy and always variable.  
* Traffic shaping.  Amazon may control the kinds of network I/O it allows each instance to use.  I saw a consistent 3 mbps in five different tests at different times on two different days (Sunday, Wednesday).
 
 
 Details:
==========================================
The script that I used to run these tests is at https://github.com/sanmi/scripts/blob/master/udp-tcp.py
  (I'm not a python wizard so please forgive any python faux pas).
 
Test 1.  Effect of kernel buffers on UDP loss
Summary: increasing kernel buffers reduces average message loss from 22% to 18%
 
Setup:
* two EC2 large instances
* 1k bytes per message
* send 100k messages via UDP and TCP
* count number of lost messages
 
Test 1.1 - default udp buffer size
* net.core.rmem_max = 131071
* net.core.rmem_default = 124928
* Average 22% message loss from 8 consecutive runs
 
Test 1.2 - 1G udp buffer size
* net.core.rmem_max = 1073741824.
* net.core.rmem_default = 124928
* Average 18% message loss from 8 consecutive runs
* Average 2.1 sec to send an average 88,000 messages
 
==========================================
Test 2.  Comparison of TCP and UDP on EC2 and local
Test Setup:
* 1k bytes per message
* send 100k messages via UDP and TCP
* time the sender and receiver
 
Results:
Test 2.1. UDP between two EC2 Large instances
* Average 2.1 sec to send an average 88,000 messages.  42k messages/sec = 335 MBits/sec
 
Test 2.2. TCP between two EC2 Large instances
* 253 sec to send 100,000 messages. 395 messages/sec = 3MBits/sec
 
Test 2.3. UDP between two 3Ghz machines on an unloaded 100Mbps network
* 9.5 sec to send 100,000 messages. 10.5k messages/sec = 84 MBits/sec
 
Test 2.4. TCP between two 3Ghz machines on an unloaded 100Mbps network
* 67 sec to send 100,000 messages. 1479 messages/sec = 12 MBits/sec
 
==========================================
Test 3.  Effect of throttling
Test 3.1 - 1G udp buffer size, 2x throttling
* insert a 2 msec delay every 100 messages
* Average .004% message loss from 7 consecutive runs
* Average 3.8 sec to send 100,000 messages = 26k messages/sec = 211 MBit/sec
 
Test 3.2 - 1G udp buffer size, 1x throttling
* insert a 1 msec delay every 100 messages
* Average .8% message loss from 8 consecutive runs
* Average 2.8 sec to send an average of 99200 messages = 35k messages/sec = 283 MBit/sec
 

Bruce Robbins

Feb 17, 2011, 12:46:29 AM
to s4-project
> also showed that increasing the UDP kernel buffer yielded a modest
> improvement.

That makes sense if the rate is maintained at a level in which the
receiver cannot keep up. The buffer exists to handle 1) spikes in the
message rate 2) temporary contention for CPU resources on the reader.
If messages were being dropped because the reader was occasionally and
briefly denied CPU resources, then an increased buffer size could
help. If the long term average rate is too high for the reader, an
increased buffer size will not help much.
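To put rough numbers on that (the drain rate here is an assumption for illustration, not a measurement):

```python
# A buffer only absorbs a rate mismatch for a bounded time.
BUFFER_BYTES = 4194302   # S4's default requested UDP buffer
MSG_BYTES = 1000         # 1k messages, as in the tests
IN_RATE = 42000          # msgs/sec arriving (measured EC2 UDP rate)
OUT_RATE = 35000         # msgs/sec the reader drains (assumed)

surplus = (IN_RATE - OUT_RATE) * MSG_BYTES   # bytes/sec accumulating
seconds_to_fill = BUFFER_BYTES / surplus
print(round(seconds_to_fill, 2))  # 0.6 -- the buffer rides out
                                  # sub-second stalls, not a
                                  # sustained overload
```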

There may be an extra dimension in the EC2 environment, where I
imagine the kernel itself is denied CPU resources because the virtual
machine cannot be scheduled to run. In that case, the buffer would not
help much because the kernel cannot receive the message and also
cannot put it in the reader's UDP buffer.

> 1. UDP requires throttling

It seems like you were throttling in your initial tests of S4 on EC2,
true? You had results for different rates.

For testing only, we also use throttling. generate_load.sh includes a
-r option with which you specify a desired event rate.

In production, things are different. If your S4 application is
listening to external events generated by actions in the real world
(stock trades, page views, clicked links), you don't have much control
over the rate. The events come in when they come in. Hopefully, you
have enough S4 nodes to handle the rate and enough buffer and queue
space to handle occasional spikes.

If some future version of the commlayer uses TCP/IP, it will have more
control over what gets discarded from the queues during sustained
spikes: instead of letting the kernel discard arbitrary messages
because the buffer is full, S4 can discard messages that have, say,
lesser priority. Your observation that TCP/IP seems slower will of
course be a factor in choosing which commlayer implementation to use
(I didn't notice that difference in a recent test where I compared
ZeroMQ to UDP: I saw comparable maximum rates. Probably has something
to do with the network in which my machines live).



Frank San Miguel

Feb 17, 2011, 4:38:27 PM
to s4-pr...@googlegroups.com

> 1. UDP requires throttling

> It seems like you were throttling in your initial tests of S4 on EC2,
> true? You had results for different rates.

Yes, I throttle the inputs, but not PE-to-PE messages.

 
> For testing only, we also use throttling. generate_load.sh includes a
> -r option with which you specify a desired event rate.
>
> In production, things are different. If your S4 application is
> listening to external events generated by actions in the real world
> (stock trades, page views, clicked links), you don't have much control
> over the rate. The events come in when they come in. Hopefully, you
> have enough S4 nodes to handle the rate and enough buffer and queue
> space to handle occasional spikes.

I agree, but something is still weird about this situation.  My PE test runs with no loss in my local environment (100M network), but the same test with the same input rate has major packet loss in Amazon's environment (1G network).  My UDP test shows that I can reliably send a lot more data on EC2 than on my local network, so I should be seeing the opposite behavior.

For instance, with two High-CPU Extra Large instances, I lose about 2% between each PE for a cumulative total of 9% loss (<10% CPU utilization).

When I posted earlier, I was thinking it would be nice to have a small "throttled queue" for PE-to-PE communications, allowing the application developer to trade some latency for lower message loss (this assumes bursty behavior between PEs), but now I feel I don't understand enough to have an opinion.  I'll have to think about it some more.