Re: sFlow-RT prometheus extension overestimating incoming bytes

Peter Phaal

May 28, 2024, 11:56:11 AM

to sFlow-RT

1. Could you please send the Prometheus scrape config you are using?

2. What are the sFlow configuration settings on your switches (polling interval / packet sampling rate)?

3. How much traffic is flowing over a typical port (packets per second / bits per second)?

Do you intend to plot flows by switch port, or do you want to plot the sum over all switch ports? When you query top flows you get one of two things: the maximum N flows across the ports (for TCP port 80, say, you would get the value from the port that reported the largest value, ignoring the values from all other ports); or the N largest flows computed by summing the values for each flow across all ports (so you would get the total TCP port 80 traffic seen across all ports). The aggMode values 'max' and 'sum' control this behavior, and max is the default. If you include agent and inputifindex in the flow key you will be able to see all ports.
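
To make the difference between the two modes concrete, here is a minimal Python sketch. It is not sFlow-RT code: the `readings` data and the `top_flows` helper are hypothetical, and it only mimics the max-versus-sum behavior described above on made-up per-port values.

```python
from collections import defaultdict

# Hypothetical per-port readings: (agent, inputifindex, TCP port) -> bytes/s.
readings = {
    ("10.0.0.1", "1", "80"): 400.0,
    ("10.0.0.1", "2", "80"): 300.0,
    ("10.0.0.2", "1", "80"): 100.0,
    ("10.0.0.2", "2", "443"): 500.0,
    ("10.0.0.1", "3", "53"): 50.0,
    ("10.0.0.2", "3", "53"): 60.0,
}

def top_flows(readings, agg_mode="max", n=2):
    """Return the n largest flows, combining ports by 'max' or 'sum'."""
    combined = defaultdict(float)
    for (_agent, _ifindex, key), value in readings.items():
        if agg_mode == "sum":
            # Total traffic for this flow key over all ports.
            combined[key] += value
        else:
            # Keep only the largest single-port value for this flow key.
            combined[key] = max(combined[key], value)
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:n]

max_result = top_flows(readings, "max")
sum_result = top_flows(readings, "sum")
print("max:", max_result)
print("sum:", sum_result)
```

With this data, 'max' ranks TCP port 443 first because one port carries 500 bytes/s, while 'sum' ranks TCP port 80 first because its three ports add up to 800 bytes/s. Keeping the agent and interface in the key, as in the `readings` dictionary itself, avoids the choice altogether.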

Are you trying to monitor relatively small amounts of traffic with a large sampling rate (1-in-N with a large N)? In that case, many ports would underreport (since there wasn't enough traffic to generate a sample) and a few would overreport, since a single sample is given a large weight. The problem gets worse if you add keys to the flow definition, because each key subdivides the traffic further. If this is the issue, here are a couple of suggestions:

1. Set a large value for t in the flow definition (t:300) to smooth out the data.

2. Increase the packet sampling rate, i.e. sample packets more frequently (a smaller N in 1-in-N sampling).
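
The effect behind both suggestions can be illustrated with a rough Python simulation. This is only a sketch under stated assumptions: it models sampling as an independent 1-in-N coin flip per packet, treats `t` as a plain averaging window, and uses made-up traffic figures (50 packets/s of 1000-byte packets, N = 1000). It does not reproduce how sFlow-RT actually computes or smooths flow values.

```python
import random

TRUE_PPS = 50          # hypothetical packets per second on one port
PACKET_BYTES = 1000    # hypothetical fixed packet size
SAMPLING_N = 1000      # 1-in-N packet sampling
TRUE_BPS = TRUE_PPS * PACKET_BYTES  # true rate in bytes per second

def estimate_bps(window_s, rng):
    """Estimate bytes/s from sampled packets seen during one window."""
    packets = TRUE_PPS * window_s
    # Assumption: each packet is sampled independently with probability 1/N.
    samples = sum(1 for _ in range(packets) if rng.random() < 1.0 / SAMPLING_N)
    # Each sample stands in for N packets, so it is scaled up by N.
    return samples * SAMPLING_N * PACKET_BYTES / window_s

def spread(window_s, trials=200, seed=1):
    """Smallest and largest estimate over many independent windows."""
    rng = random.Random(seed)
    estimates = [estimate_bps(window_s, rng) for _ in range(trials)]
    return min(estimates), max(estimates)

lo_short, hi_short = spread(window_s=20)    # about 1 expected sample per window
lo_long, hi_long = spread(window_s=300)     # about 15 expected samples per window
print("true rate:", TRUE_BPS)
print("20 s windows: ", lo_short, hi_short)
print("300 s windows:", lo_long, hi_long)
```

With about one expected sample per window, an estimate is either zero or a multiple of the true rate, which matches the under- and overreporting pattern described above. A longer window or a smaller N raises the number of samples per estimate and narrows the spread.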

On Tuesday, May 28, 2024 at 8:27:57 AM UTC-7 Michael McArthur wrote:

Dear everyone,

I have set up sFlow-host and sFlow-RT prometheus to analyze traffic. The first step is to measure the bytes per second for each port. (I would rather have the total sampled bytes coming in per flow, but I did not find that option among the value functions at https://sflow-rt.com/define_flow.php.) However, after validating the results, I noticed that my Prometheus queries give wrong graphs. I know exactly what kind of traffic is flowing over this network. Certain ports are overestimated over certain time intervals.

For example, I graph avg_over_time(tcp_port[5m]) + avg_over_time(udp_port[5m]), summing the results for matching ports. Here I see that certain ports are up to 20 times larger than they should be. I can fix that by reducing the time window to [1m], but that leads to underestimation of other ports. I do not know the source of this problem, or whether it lies in the Prometheus scrape config or in sFlow-RT itself. I do know that my colleagues noticed this issue on ports that have a higher average bytes per flow in the Prometheus database.
