Fast path / slow path

Peter Phaal

Mar 2, 2012, 1:50:44 PM
to sf...@googlegroups.com
I was thinking about a performance problem someone was having with their switch. It turned out to be an older switch with limited CAM space, and they were running traffic from their wireless access points through it. The large number of client MAC addresses filled the CAM, packets were switched in software, and the result was high CPU utilization and poor performance.

This is one example of a broader class of problems that can be very hard to diagnose - switches provide very high throughput and low latency as long as the packet is handled in hardware. Understanding which packets are being kicked up to the management CPU and taking the slow path would be very helpful for diagnosing performance problems and should be relatively straightforward to report on using sFlow.

If the switch added an extended flow structure to packet samples taking the slow path, then it would be easy for sFlow analysis tools to report on the ratio of slow to fast forwarded packets (a bit like a cache hit ratio). Analyzing the packet headers for the misses would give a good idea of which applications and users were affected by, or responsible for, the performance problem. A reason code could provide additional information.

Reporting slow path packets is related to the previous "sFlow for queue length monitoring" proposal - both give ways to understand how packets are being treated by the switch and why particular services may be running slow.

Peter

------

enum slow_path_reason {
   UNKNOWN       = 0,
   OTHER         = 1,
   CAM_MISS      = 2,
   CAM_FULL      = 3,
   NO_HW_SUPPORT = 4, /* protocols that must be forwarded in software because the switch lacks hardware support */
   CNTRL         = 5  /* spanning tree, LLDP, etc. messages intended for the control plane */
}

/* Slow packet path data */
/* Used to indicate that the sampled packet was not handled on the fast path */

/* opaque = flow_data; enterprise = 0; format = 1020 */
struct extended_slow_path {
   slow_path_reason reason;  /* reason the packet took the slow path */
}
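
As a rough illustration of the reporting this would enable, a collector
could compute a cache hit style ratio from its decoded samples. A minimal
sketch in Python, assuming each sample has already been decoded into a
dict that carries a 'slow_path_reason' key when the proposed record is
present (the field and function names are illustrative, not part of any
existing sFlow library):

from collections import Counter

def slow_path_report(samples):
    # Summarize decoded packet samples into a slow/fast split and a
    # per-reason breakdown. 'samples' is a list of dicts; a sample that
    # took the slow path is assumed to carry a 'slow_path_reason' key.
    reasons = Counter()
    slow = 0
    for sample in samples:
        reason = sample.get('slow_path_reason')
        if reason is not None:
            slow += 1
            reasons[reason] += 1
    total = len(samples)
    return {
        'fast': total - slow,
        'slow': slow,
        'slow_ratio': slow / total if total else 0.0,
        'by_reason': dict(reasons),
    }

# Example: two of five sampled packets took the slow path because the CAM was full
samples = [
    {'src_mac': '00:11:22:33:44:55'},
    {'src_mac': '00:11:22:33:44:66', 'slow_path_reason': 'CAM_FULL'},
    {'src_mac': '00:11:22:33:44:55'},
    {'src_mac': '00:11:22:33:44:77', 'slow_path_reason': 'CAM_FULL'},
    {'src_mac': '00:11:22:33:44:55'},
]
print(slow_path_report(samples))
# {'fast': 3, 'slow': 2, 'slow_ratio': 0.4, 'by_reason': {'CAM_FULL': 2}}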

rick jones

Mar 2, 2012, 4:31:47 PM
to sf...@googlegroups.com

Any particular reason to think the distribution of traffic taking the slow path will often be anything but (pseudo) random? Why not simply have stats for those reasons instead of making it an extension to the flow samples?

rick jones

Peter Phaal

Mar 2, 2012, 5:03:58 PM
to sf...@googlegroups.com
On Fri, Mar 2, 2012 at 1:31 PM, rick jones <perf...@gmail.com> wrote:
>
> Any particular reason to think the distribution of traffic taking the slow path will often be anything but (pseudo) random?

I can't think of cases where the distributions would appear random in
all dimensions. In the example, traffic from hosts that already have a
slot in the CAM would always take the fast path; the remaining traffic
would take the slow path. If you were plotting slow vs. fast path by
protocol the values might look random, but if you broke the traffic
out by MAC address you would see a clear pattern. Similarly, if the
packets were hitting the slow path because the hardware didn't support
a particular feature (IP multicast, for example), then there would be
a clear difference that would show up when you broke the traffic out
by protocol, but maybe not by source address.
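
To make that concrete, a tool could break the slow path samples out
along different dimensions. A rough sketch in Python (field names are
illustrative), reusing the decoded-sample convention from the earlier
example:

from collections import Counter

def slow_path_breakdown(samples, key):
    # Count slow path samples grouped by an arbitrary decoded field,
    # e.g. 'src_mac' or 'protocol'. A sample that took the slow path is
    # assumed to carry the proposed 'slow_path_reason' field.
    counts = Counter()
    for sample in samples:
        if 'slow_path_reason' in sample:
            counts[sample.get(key, 'unknown')] += 1
    return counts.most_common()

# slow_path_breakdown(samples, 'src_mac')   -> clear pattern for CAM problems
# slow_path_breakdown(samples, 'protocol')  -> clear pattern for missing hardware support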

> Why not simply have stats for those reasons instead of making it an extension to the flow samples?

A matching counter block would be a useful addition. Counters are more
sensitive (they count every slow path packet rather than a sample of
them) and they should be easy to maintain since the packets are hitting
the management CPU anyway.

/* opaque = counter_data; enterprise = 0; format = 8 */
struct slow_path_counts {
   unsigned int unknown;
   unsigned int other;
   unsigned int cam_miss;
   unsigned int cam_full;
   unsigned int no_hw_support;
   unsigned int cntrl;
}
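
A collector would typically difference successive exports of this block
to get per-interval slow path rates. A minimal sketch in Python, assuming
the record has been decoded into a dict of 32 bit counter values (names
are illustrative):

def slow_path_rates(prev, curr, interval_seconds):
    # Convert two successive slow_path_counts exports into per-second
    # rates, handling 32-bit counter wrap.
    rates = {}
    for name, value in curr.items():
        delta = (value - prev.get(name, 0)) % (1 << 32)
        rates[name] = delta / interval_seconds
    return rates

# Example with a 30 second polling interval:
prev = {'cam_full': 1200, 'no_hw_support': 10, 'cntrl': 500}
curr = {'cam_full': 46200, 'no_hw_support': 10, 'cntrl': 980}
print(slow_path_rates(prev, curr, 30))
# {'cam_full': 1500.0, 'no_hw_support': 0.0, 'cntrl': 16.0}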

rick jones

Mar 2, 2012, 5:24:54 PM
to sf...@googlegroups.com

On Mar 2, 2012, at 2:03 PM, Peter Phaal wrote:

> On Fri, Mar 2, 2012 at 1:31 PM, rick jones <perf...@gmail.com> wrote:
>>
>> Any particular reason to think the distribution of traffic taking the slow path will often be anything but (pseudo) random?
>
> I can't think of cases where the distributions would appear random in
> all dimensions. In the example, traffic from hosts that already have a
> slot in the CAM would always take the fast path; the remaining traffic
> would take the slow path. If you were plotting slow vs. fast path by
> protocol the values might look random, but if you broke the traffic
> out by MAC address you would see a clear pattern. Similarly, if the
> packets were hitting the slow path because the hardware didn't support
> a particular feature (IP multicast, for example), then there would be
> a clear difference that would show up when you broke the traffic out
> by protocol, but maybe not by source address.

I can see where over some short(ish) timescale it would not distribute among the MACs, but are switches "once in the CAM, forever in the CAM"? I was assuming that over a timescale longer than the CAM ageing time the distribution among MACs would be "better."

Hardware not supporting a specific feature is not going to be all that dynamic, is it? The incrementing of the "no_hw_support" counter you define below would serve as the proverbial canary, prompting the question "What doesn't the hardware support?" if one didn't already know.

>> Why not simply have stats for those reasons instead of making it an extension to the flow samples?
>
> A matching counter block would be a useful addition. Counters are more
> sensitive (they count every slow path packet rather than a sample of
> them) and they should be easy to maintain since the packets are hitting
> the management CPU anyway.
>
> /* opaque = counter_data; enterprise = 0; format = 8 */
> struct slow_path_counts {
>    unsigned int unknown;
>    unsigned int other;
>    unsigned int cam_miss;
>    unsigned int cam_full;
>    unsigned int no_hw_support;
>    unsigned int cntrl;
> }

A drift question - 10 GbE is something like 14.9M PPS each way at minimum frame size, right? With 100 GbE on the horizon, are we approaching the day that 32 bit packet counters are insufficient? 1 GbE (125 million octets per second) wraps a 32 bit octet counter in about 34 seconds, so a 100 GbE link at roughly 149M PPS could wrap a 32 bit packet counter in under 30 seconds. Yes, that does presume something about the packet size distribution, but even then isn't the goal of most counters that they don't wrap over a period of minutes?
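
A quick back-of-the-envelope check of those wrap times (assuming
minimum-size 64 byte frames, i.e. 84 bytes on the wire including preamble
and inter-frame gap):

WRAP = 2 ** 32

def wrap_seconds(events_per_second):
    # Seconds until a 32-bit counter rolls over at the given rate.
    return WRAP / events_per_second

print(wrap_seconds(125e6))             # 1 GbE octet counter: ~34 s
print(wrap_seconds(10e9 / (84 * 8)))   # 10 GbE min-size packet counter: ~289 s
print(wrap_seconds(100e9 / (84 * 8)))  # 100 GbE min-size packet counter: ~29 s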

rick jones

Anoop Ghanwani

Mar 3, 2012, 10:22:06 AM
to sf...@googlegroups.com
Hi Peter,

A few comments:
- Hardware tables can be either TCAM or exact-match (EXM) hash tables,
so if we were to do this we should probably just say something
like "HW_TABLE_FULL", etc.
- If the goal is to find out whether hardware tables are full, then
this may not work, because in the L2 world a full table,
which basically translates to a MAC DA lookup miss,
usually results in packets being flooded in hardware rather than
handled in software.
- The control messages identified in the proposal are not forwarded
in software; they are consumed by software. That may result
in software creating another packet to send to its other
neighbors, but it would look different from the sample.
- If you're looking at IP packets as well, then packets
that are forwarded by software include those where:
  - ARP is not resolved;
  - fragmentation is needed;
  - IPv4 options or IPv6 extension headers are present in the packet
(a possible extension of the reason code along these lines is sketched
below).
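
If the additional cases listed above were folded into the proposed reason
code, a collector-side view might look something like this (Python; the
names and numeric values beyond the original proposal are illustrative
only):

from enum import IntEnum

class SlowPathReason(IntEnum):
    # Values from the original proposal
    UNKNOWN        = 0
    OTHER          = 1
    CAM_MISS       = 2
    CAM_FULL       = 3
    NO_HW_SUPPORT  = 4
    CNTRL          = 5
    # Possible additions for software-forwarded IP packets (illustrative)
    ARP_UNRESOLVED = 6   # next-hop ARP not resolved
    FRAG_REQUIRED  = 7   # fragmentation needed
    IP_OPTIONS     = 8   # IPv4 options / IPv6 extension headers present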

Anoop
