Exporting sampled packet transit delay and queue depth


Peter Phaal

Nov 30, 2020, 11:59:49 AM
to sFlow
The following proposal defines a method of exporting packet transit delay and queue depth for each sampled packet:


This extension complements inband telemetry (INT) efforts, leveraging common instrumentation built into the hardware forwarding plane, but using sFlow's out of band transport.

Using sFlow as the telemetry transport has a number of benefits:
  1. Simple to deploy since there is no modification of packets (no issues with encapsulations, MTU, number of measurements, path length, incremental deployment, etc.)
  2. The extensibility of the sFlow protocol allows additional forwarding plane measurements to augment standard sFlow measurements, fully integrating these new measurements with sFlow data exported from other switches in the network.
  3. sFlow is a unidirectional telemetry transport protocol that originates from the device management plane and can be sent out of band, limiting possible attack surfaces.
These performance metrics support proactive traffic management, allowing actions to reduce congestion and delay before packets are lost. 

The new measurements complement the recently published sFlow Dropped Packet Notification Structures extension that provides visibility into dropped packets.

Please review and comment.

Peter Phaal

Dec 1, 2020, 12:30:04 PM
to sFlow
The draft has been updated based on off-list feedback:


The suggestion was to separate the measurements in the extended_delay structure since they are independent and separating them makes it easy to drop unsupported measurements.
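
As a rough illustration of why separate structures make unsupported measurements easy to drop, here is a minimal agent-side sketch. It assumes the standard sFlow version 5 flow record framing (a data_format word built from enterprise and format number, then an opaque length, then the payload) and one 32-bit value per record. The format numbers, function names, and the single-integer payload layout are assumptions for illustration and should be checked against the published structure definitions.

```python
import struct

# Assumed format numbers under enterprise 0; confirm against the published spec.
FORMAT_TRANSIT_DELAY = 1039
FORMAT_QUEUE_DEPTH = 1040


def encode_record(enterprise, fmt, payload):
    """Frame one flow record: data_format, opaque length, then the payload."""
    data_format = (enterprise << 12) | fmt
    return struct.pack(">II", data_format, len(payload)) + payload


def encode_measurements(delay=None, depth=None):
    """Return one record per available measurement.

    A measurement the hardware cannot supply is passed as None and its
    record is simply left out; the other record is unaffected.
    """
    records = []
    if delay is not None:
        records.append(encode_record(0, FORMAT_TRANSIT_DELAY, struct.pack(">I", delay)))
    if depth is not None:
        records.append(encode_record(0, FORMAT_QUEUE_DEPTH, struct.pack(">I", depth)))
    return records
```

With a single combined structure the agent would need a sentinel value for a missing field; with independent records, a pipeline that only measures delay calls encode_measurements(delay=value) and nothing else changes.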

Anoop Ghanwani

Dec 1, 2020, 1:32:17 PM
to sf...@googlegroups.com
Hi Peter,

Does this require exact measurements for the queue and delay or are estimates considered acceptable?  If the latter, adding such a clarification would be helpful.

It might also be helpful to add that these structures are only valid if the sample is an egress sFlow sample.  (If this is not true, then we need to clarify how they can be provided for an ingress sFlow sample.)

Thanks,
Anoop


Peter Phaal

Dec 1, 2020, 2:43:44 PM
to sFlow
These are exact measurements provided by the ASIC. Depending on the hardware pipeline, one or both measurements (transit delay / queue depth) may be available. It's also possible that the set of available measurements may vary depending on the sampling point (ingress or egress).

Anoop Ghanwani

Dec 1, 2020, 3:40:13 PM
to sf...@googlegroups.com
Hi Peter,

Thanks for the clarification on the measurement accuracy.  To clarify, if an implementation cannot provide exact data, then it should treat these as unsupported, rather than trying to come up with an estimation, right?

Also, how would this work for an ingress sFlow sample (where the output port is not known since sampling is done before forwarding lookups)?  

Anoop

Peter Phaal

Dec 1, 2020, 6:42:57 PM
to sFlow
Comments inline:

On Tue, Dec 1, 2020 at 12:40 PM Anoop Ghanwani <an...@alumni.duke.edu> wrote:
To clarify, if an implementation cannot provide exact data, then it should treat these as unsupported, rather than trying to come up with an estimation, right?

Correct. Either the hardware is able to provide an accurate measurement or the record must be omitted.
 

Also, how would this work for an ingress sFlow sample (where the output port is not known since sampling is done before forwarding lookups)? 

sFlow deployments typically use ingress sampling. Most current hardware delays reporting the sample until the egress port has been selected so that it can be included with the ingress packet sample. The results are returned late enough in the packet processing pipeline that additional forwarding plane transit metrics can be included. Which metrics can be included is likely to be architecture-specific.
 

Anoop Ghanwani

Dec 1, 2020, 10:32:05 PM
to sf...@googlegroups.com

Anoop Ghanwani

Dec 2, 2020, 1:57:07 PM
to sf...@googlegroups.com
Peter,

One additional question -- how do systems where we have multiple queues report this?  For example, in a chassis system, the packet can be queued at the input line card (most likely a VoQ), get scheduled and transmitted through the fabric, and then get queued at the output port on the output line card.  Is the queue structure expected to report the sum of all the queue lengths encountered?

Anoop

Peter Phaal

Dec 2, 2020, 4:06:27 PM
to sFlow
In a chassis switch, the queue depth measurement is the depth of the egress port packet queue on the egress line card (the extended_egress_queue structure indicates the specific queue on the egress port).

The capacity of current generation switch ASICs allows for single ASIC chassis switches:


In practice, given the hardware dependency for making accurate latency / queue depth measurements, I expect that they will only be implemented in single ASIC devices.

Anoop Ghanwani

Dec 2, 2020, 7:39:56 PM
to sf...@googlegroups.com
Hi Peter,

Even in a single switch ASIC, there can be a VoQ implementation with separate queues on ingress and egress.

Are you saying that this would be limited only to output queued switches?

Anoop

Peter Phaal

Dec 3, 2020, 10:10:30 AM
to sFlow
I am not an expert on the details of VoQ implementations, so I consulted with my co-author (Chen) and here is his response:

This is a very good question.
When looking at centralized solutions, i.e. one box, there is no problem when talking about queue occupancy.
The expected value is the VOQ Occupancy.

On a chassis based system, I agree that VOQ occupancy can be tricky.
I would say that in some architectures, it might be possible to get the value of the real VOQ, i.e. sum of all VOQ directed to the port. It will not be 100% accurate, but it would be good and close enough for collector analysis.
If the architecture does not support this ability, I would say that in a chassis based system this value is not very useful.

One more note on value estimation.
Regarding latency, the value will be very accurate.
For queue occupancy, even if the value is not 100% accurate (as in chassis based systems), it is still good enough for analysis and represents the load on the VOQ.

Anoop Ghanwani

Dec 3, 2020, 1:12:46 PM
to sf...@googlegroups.com
Hi Peter,

I'm not sure about an accurate value being available even with a single chip design because the internal chip architectures can be quite different.  I think some sort of clarification regarding VoQ implementations would be useful, even if it's just what is noted in your response below.

I can see that the latency value would be exact provided that the sampling is done at the egress at dequeue time.  It might be good to add that clarification to the spec.

Thanks,
Anoop

Chen Rozenbaum

Dec 7, 2020, 6:05:43 PM
to sf...@googlegroups.com
Thanks Anoop
We will add clarification to the spec.

Regards,
Chen

Anoop Ghanwani

Dec 7, 2020, 6:14:22 PM
to sf...@googlegroups.com

Peter Phaal

Dec 8, 2020, 5:44:45 PM
to sFlow
The latest draft has been posted:

Anoop Ghanwani

Dec 10, 2020, 3:45:02 PM
to sf...@googlegroups.com
Hi Peter and Chen,

I would suggest the following change.  
>>>
For VOQ based architectures, queue depth is the number of bytes already in the selected VOQ when the sampled packet is enqueued. For chassis based systems, the value should be the sum of all VOQs directed to the output port.
>>>
to
>>>
For VOQ based architectures, the queue depth value reported should be the sum of all VOQs that feed a particular queue at the output port.
>>>
This is to take care of the case where a fixed system (could be single or multi-chip but not a chassis) is also covered.
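
A small sketch of the proposed wording may help make it concrete. It assumes a hypothetical model in which each VOQ is identified by its ingress source, the output port, and the output queue it feeds, with occupancy in bytes; the dictionary layout and function name are illustrative only and do not correspond to any particular ASIC or SDK.

```python
def reported_queue_depth(voq_depths, egress_port, egress_queue):
    """Sum the occupancy (bytes) of every VOQ feeding one output queue.

    voq_depths maps (ingress_source, egress_port, egress_queue) -> bytes.
    The same rule covers a single chip, a fixed multi-chip box, or a chassis.
    """
    return sum(
        depth
        for (_source, port, queue), depth in voq_depths.items()
        if port == egress_port and queue == egress_queue
    )


# Two ingress sources feed queue 0 on port 7; other queues are ignored.
voqs = {
    ("ingress-a", 7, 0): 3000,
    ("ingress-b", 7, 0): 1500,
    ("ingress-a", 7, 1): 400,
    ("ingress-b", 8, 0): 9000,
}
depth = reported_queue_depth(voqs, egress_port=7, egress_queue=0)
```

Because the sum is keyed on the output queue rather than on the system type, the sketch needs no special case for chassis versus fixed systems, which is the point of the suggested change.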

Thanks,
Anoop

Peter Phaal

Dec 12, 2020, 3:22:54 PM
to sFlow
That's a good suggestion. A new draft has been uploaded that includes the proposed change:

Anoop Ghanwani

Dec 14, 2020, 3:02:56 PM
to sf...@googlegroups.com
Thanks Peter.

Another thought -- would it make sense to include an optional queue id for both the latency and queue structures?  This could be a system-specific number and would allow the collector to analyze the behavior of particular queues, as opposed to just tracking the queue length and latency experienced at a flow level.

Anoop

Peter Phaal

Dec 14, 2020, 6:19:22 PM
to sFlow
The extended_egress_queue structure is required when queue length is reported and can optionally be included when latency is reported. The egress queue number and egress port number together uniquely identify the queue.
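
On the collector side, the rule above could be applied roughly as follows. This is a sketch only: the decoded sample is assumed to be a plain dictionary, and the key names (egress_port, egress_queue, queue_depth, transit_delay) and helper functions are hypothetical, not taken from any existing collector.

```python
from collections import defaultdict


def new_stats():
    """Per-queue statistics keyed by (egress port, egress queue number)."""
    return defaultdict(lambda: {"max_depth": 0, "max_delay": 0})


def index_sample(stats, sample):
    """Attribute one decoded sample to its queue and return the key used.

    Queue depth without an egress queue is rejected, since the queue
    structure is required in that case. A delay-only sample without a
    queue is valid but cannot be attributed to a queue, so None is returned.
    """
    queue = sample.get("egress_queue")
    if "queue_depth" in sample and queue is None:
        raise ValueError("queue depth reported without egress queue")
    if queue is None:
        return None
    key = (sample["egress_port"], queue)
    entry = stats[key]
    entry["max_depth"] = max(entry["max_depth"], sample.get("queue_depth", 0))
    entry["max_delay"] = max(entry["max_delay"], sample.get("transit_delay", 0))
    return key
```

Keying on the (port, queue) pair lets the collector track the behavior of individual queues as well as per-flow latency and queue length.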

Peter Phaal

Mar 17, 2021, 3:20:53 PM
to sFlow
The final draft has been posted:

There is now support for the new metrics in the Linux kernel, Host sFlow, sflowtool, and sFlow-RT: