Simple sFlow configuration profile

Peter Phaal

Jul 25, 2013, 2:49:28 PM
to sf...@googlegroups.com
Creating a simple sFlow configuration profile for large scale deployments would reduce operational complexity and ensure consistent settings across all the devices in the network. One way to greatly simplify the configuration is to avoid interface level options and configure sFlow at the device level instead. Here are some thoughts:

Sensible defaults reduce the number of configuration parameters and provide a useful out-of-the-box configuration. The default counter polling interval is 30 seconds. The following formula generates reasonable default sampling rates:

default_sampling_rate =  ifSpeed / 1000000, where ifSpeed is expressed in bits per second.

This calculation yields the following sampling rates for typical data center link speeds:

sampling.1G = 1 in 1,000
sampling.10G = 1 in 10,000
sampling.40G = 1 in 40,000
sampling.100G = 1 in 100,000
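
As a minimal sketch of the formula above (in Python; the function name `default_sampling_rate` and the `LINK_SPEEDS` table are illustrative and not taken from any sFlow agent), the default rates for the listed link speeds can be computed like this:

```python
def default_sampling_rate(if_speed_bps: int) -> int:
    """Return N for 1-in-N sampling, given ifSpeed in bits per second."""
    return if_speed_bps // 1_000_000

# Typical data center link speeds in bits per second (illustrative table).
LINK_SPEEDS = {
    "1G": 1_000_000_000,
    "10G": 10_000_000_000,
    "40G": 40_000_000_000,
    "100G": 100_000_000_000,
}

for name, speed in LINK_SPEEDS.items():
    print(f"sampling.{name} = 1 in {default_sampling_rate(speed):,}")
```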

These default settings ensure that any traffic flow consuming 10% of link bandwidth will be detectable within seconds. The basis for this calculation and expected response times are given in the following article:


These exact sampling rates may not be realizable on all devices, but the sFlow standard allows the agent to pick the nearest achievable sampling rate when processing a configuration setting, so the device should simply implement the value that would apply had the sampling rate been set through the sFlow MIB or device CLI.

sFlow also has an IANA registered port (6343), so enabling sFlow on all ports on a switch could be as simple as the following two top-level commands:

sflow destination 10.0.0.1
sflow enable

If the defaults aren't suitable in a particular environment, they can be selectively overridden, e.g. to change the sampling rate for 1G links:

sflow sampling 1G 2000

or to disable sampling for 40G links:

sflow sampling 40G 0

The particular syntax of the commands will vary from CLI to CLI, but it would be helpful if the semantics of the configurations were as uniform as possible between vendors.
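
The override semantics described above might look like the following in a programmatic setting. This is only a sketch: the `resolve_sampling_rate` function and the `overrides` mapping (keyed by ifSpeed in bits per second) are hypothetical, and it assumes, as in the 40G example, that a value of 0 disables sampling.

```python
from typing import Dict, Optional

def resolve_sampling_rate(if_speed_bps: int,
                          overrides: Optional[Dict[int, int]] = None) -> int:
    """Return N for 1-in-N sampling on a port; 0 means sampling is disabled.

    An override for the port's speed wins; otherwise fall back to the
    ifSpeed / 1,000,000 default.
    """
    if overrides and if_speed_bps in overrides:
        return overrides[if_speed_bps]
    return if_speed_bps // 1_000_000

# Equivalent of "sflow sampling 1G 2000" and "sflow sampling 40G 0".
overrides = {1_000_000_000: 2000, 40_000_000_000: 0}

for speed in (1_000_000_000, 10_000_000_000, 40_000_000_000):
    print(speed, resolve_sampling_rate(speed, overrides))
```

Because the rate is derived from the port's current speed, calling the function again after a link speed change picks up the setting for the new speed.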

However, for large scale deployments it is more important to agree on standard settings for programmatic configuration of devices, since programmatic mechanisms simplify configuration at scale. Reducing the number of configuration options makes it much easier to configure sFlow through a variety of mechanisms, including NETCONF, REST, DHCP, and DNS-SD (http://www.slideshare.net/netvis/dnssd).

Please comment.

Peter

DJ Spry

Mar 3, 2015, 11:18:56 PM
to sf...@googlegroups.com
Peter,
I am trying to understand best practice for a large scale deployment. The architecture uses 40G links between spine and leaf switches.

I have seen many people reference the numbers below, which seem quite different from the chart here and at http://blog.sflow.com/2013/06/large-flow-detection.html


polling = 20
sampling.1G = 2048
sampling.10G = 4096
sampling.40G = 8192

Curious which chart is recommended?

Cheers!
DJ

Peter Phaal

Mar 4, 2015, 12:25:02 AM
to sf...@googlegroups.com
The numbers I gave are adjusted to the nearest achievable rate. The settings would result in the following sampling rates on hardware that only allows sampling rates that are powers of two:

sampling.1G = 1024
sampling.10G = 8192
sampling.40G = 32768
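
For illustration, the adjustment to power-of-two hardware can be sketched as below. The helper `nearest_power_of_two` is a hypothetical name, and choosing the power of two closest in absolute terms is an assumption about how an agent might define "nearest"; it does reproduce the three values listed here.

```python
def nearest_power_of_two(rate: int) -> int:
    """Return the power of two closest to rate (ties go to the lower value)."""
    if rate < 1:
        raise ValueError("rate must be a positive integer")
    lower = 1 << (rate.bit_length() - 1)   # largest power of two <= rate
    upper = lower << 1                     # smallest power of two > rate
    return lower if rate - lower <= upper - rate else upper

for name, rate in (("1G", 1_000), ("10G", 10_000), ("40G", 40_000)):
    print(f"sampling.{name} = {nearest_power_of_two(rate)}")
```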

The goal of these settings is to resolve significant traffic flows (10% of bandwidth) within around a second in order to give timely notifications to SDN applications. The settings also assume sFlow is enabled on all switch ports on all switches. In a leaf and spine network, traffic enters through the lower speed 1G / 10G ports, and so is sampled more aggressively than on the 40G inter-switch links. The sampling rates at the edge give good accuracy for traffic to/from hosts and can be used to detect large "Elephant" flows quickly enough to re-mark or drop them. Sampling rates at the core detect congested links and flow collisions quickly enough to load balance the traffic using hybrid OpenFlow, I2RS, PBR, etc.
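
To make the timing argument concrete, here is a rough back-of-the-envelope sketch. It assumes an average packet size of 1,000 bytes, which is not a figure from this thread, and the function name is illustrative. It estimates how many samples per second a flow using 10% of a link would generate at the default ifSpeed / 1,000,000 sampling rates:

```python
def expected_samples_per_second(if_speed_bps: int,
                                sampling_rate: int,
                                flow_fraction: float = 0.10,
                                avg_packet_bytes: int = 1_000) -> float:
    """Estimate samples/second for a flow using flow_fraction of the link.

    avg_packet_bytes is an assumed average packet size, not a measured value.
    """
    flow_bps = if_speed_bps * flow_fraction
    flow_pps = flow_bps / (avg_packet_bytes * 8)
    return flow_pps / sampling_rate

for name, speed in (("1G", 10**9), ("10G", 10**10), ("40G", 4 * 10**10)):
    rate = speed // 1_000_000          # default 1-in-N setting
    print(name, expected_samples_per_second(speed, rate))
```

Under these assumptions the estimate comes out the same at every link speed, about 12.5 samples per second, because the sampling rate scales with ifSpeed. Larger average packets would give proportionally fewer samples.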

These settings are conservative and focus primarily on managing bandwidth. If you have lower traffic levels and want increased accuracy, for example to drive a usage based billing application, then you might want to shift the policy on the edge to a more aggressive setting.


However, in practice sampling rates can vary by a factor of 2 and it is unlikely that anyone would notice any difference in the results obtained. The numbers you provided are within a factor of two at the 1G and 10G speeds of the access layer but are much more aggressive in monitoring the 40G inter-switch links. I am not sure what the rationale would be for monitoring the inter-switch traffic so aggressively when it has already been monitored at the edge. If additional accuracy is required, you might lower the 1-in-N sampling rate on the 10G ports so that more packets are sampled.

The bigger point is to shift from explicit port based configuration of sFlow sampling rates to policy based settings that are applied network wide.

DJ Spry

Mar 4, 2015, 8:30:52 AM
to sf...@googlegroups.com
Peter,
Thank you for the excellent response, that is much clearer now. I presume a polling interval of 20 seconds is still quite adequate.

WRT policy based settings, we are creating a policy "group" where all 10G and 40G interfaces automatically inherit sFlow at their appropriate rate.  Is this style of implementation in line with policy based settings or do you envision or suggest some other technique?

Again, many thanks. 

Peter Phaal

Mar 4, 2015, 11:17:53 AM
to sFlow
The default polling interval in the sFlow v5 spec is 30 seconds. A
value of 20 makes counter based reporting somewhat more responsive and
provides an additional reading per minute which helps maintain
accuracy of per minute reporting under packet loss. InMon generally
sets the polling interval to 20 seconds.

Your strategy for creating policy groups sounds right. Does it ensure
that if a link speed changes the appropriate setting is applied?

For an example, the Host sFlow agent
(http://host-sflow.sourceforge.net/) implements port speed based
sampling settings, and the following article describes how the
settings are applied when running the Host sFlow agent on Cumulus
Linux:

http://blog.sflow.com/2014/06/cumulus-networks-sflow-and-data-center.html