Exporting sampled packet transit delay and queue depth

Peter Phaal

Nov 30, 2020, 11:59:49 AM

to sFlow

The following proposal defines a method of exporting packet transit delay and queue depth for each sampled packet:

https://sflow.org/draft_sflow_transit.txt

This extension complements in-band telemetry (INT) efforts, leveraging common instrumentation built into the hardware forwarding plane, but using sFlow's out-of-band transport.

Using sFlow as the telemetry transport has a number of benefits:

- Simple to deploy, since there is no modification of packets (no issues with encapsulations, MTU, number of measurements, path length, incremental deployment, etc.)
- The extensibility of the sFlow protocol allows additional forwarding plane measurements to augment standard sFlow measurements, fully integrating these new measurements with sFlow data exported from other switches in the network.
- sFlow is a unidirectional telemetry transport protocol that originates from the device management plane and can be sent out of band, limiting possible attack surfaces.

These performance metrics support proactive traffic management, allowing actions to reduce congestion and delay before packets are lost.

The new measurements complement the recently published sFlow Dropped Packet Notification Structures extension that provides visibility into dropped packets.

Please review and comment.

Peter Phaal

Dec 1, 2020, 12:30:04 PM

to sFlow

The draft has been updated based on off-list feedback:

https://sflow.org/draft2_sflow_transit.txt

The suggestion was to separate the measurements in the extended_delay structure since they are independent and separating them makes it easy to drop unsupported measurements.
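
As a rough illustration of why separate structures help, the sketch below builds the list of optional records for one sampled packet and leaves out any measurement the hardware did not supply. The record names and the HardwareReading type are hypothetical, chosen for this example; they are not taken from the draft.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HardwareReading:
    """Hypothetical per-packet readings; None means the ASIC gave no value."""
    transit_delay_ns: Optional[int] = None
    queue_depth_bytes: Optional[int] = None


def build_optional_records(reading: HardwareReading) -> List[Tuple[str, int]]:
    """Return one (name, value) record per measurement that is available."""
    records: List[Tuple[str, int]] = []
    if reading.transit_delay_ns is not None:
        records.append(("transit_delay", reading.transit_delay_ns))
    if reading.queue_depth_bytes is not None:
        records.append(("queue_depth", reading.queue_depth_bytes))
    return records
```

Because each measurement travels in its own record, a device that supports only one of them simply emits a shorter list, and a value of zero is still reported as a real measurement.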

Anoop Ghanwani

Dec 1, 2020, 1:32:17 PM

to sf...@googlegroups.com

Hi Peter,

Does this require exact measurements for the queue and delay or are estimates considered acceptable? If the latter, adding such a clarification would be helpful.

It might also be helpful to add that these structures are only valid if the sample is an egress sFlow sample. (If this is not true, then we need to clarify how they can be provided for an ingress sFlow sample.)

Thanks,

Anoop

--
You received this message because you are subscribed to the Google Groups "sFlow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sflow+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sflow/613daa34-93e3-454f-a64f-be44e1229b17n%40googlegroups.com.

Peter Phaal

Dec 1, 2020, 2:43:44 PM

to sFlow

These are exact measurements provided by the ASIC. Depending on the hardware pipeline, one or both measurements (transit delay / queue depth) may be available. It's also possible that the set of available measurements may vary depending on the sampling point (ingress or egress).

Anoop Ghanwani

Dec 1, 2020, 3:40:13 PM

to sf...@googlegroups.com

Hi Peter,

Thanks for the clarification on the measurement accuracy. To clarify, if an implementation cannot provide exact data, then it should treat these as unsupported, rather than trying to come up with an estimation, right?

Also, how would this work for an ingress sFlow sample (where the output port is not known since sampling is done before forwarding lookups)?

Anoop


Peter Phaal

Dec 1, 2020, 6:42:57 PM

to sFlow

Comments inline:

On Tue, Dec 1, 2020 at 12:40 PM Anoop Ghanwani <an...@alumni.duke.edu> wrote:

> To clarify, if an implementation cannot provide exact data, then it should treat these as unsupported, rather than trying to come up with an estimation, right?

Correct. Either the hardware is able to provide an accurate measurement or the record must be omitted.

> Also, how would this work for an ingress sFlow sample (where the output port is not known since sampling is done before forwarding lookups)?

sFlow deployments typically use ingress sampling. Most current hardware delays reporting the sample until the egress port has been selected so that it can be included with the ingress packet sample. The results are returned late enough in the packet processing pipeline that additional forwarding plane transit metrics can be included. Which metrics can be included is likely to be architecture specific.


Anoop Ghanwani

Dec 1, 2020, 10:32:05 PM

to sf...@googlegroups.com

Thanks Peter.


Anoop Ghanwani

Dec 2, 2020, 1:57:07 PM

to sf...@googlegroups.com

Peter,

One additional question -- how do systems where we have multiple queues report this? For example, in a chassis system, the packet can be queued at the input line card (most likely a VoQ), get scheduled and transmitted through the fabric, and then get queued at the output port on the output line card. Is the queue structure expected to report the sum of all the queue lengths encountered?

Anoop

Peter Phaal

Dec 2, 2020, 4:06:27 PM

to sFlow

In a chassis switch, the queue depth measurement is the depth of the egress port packet queue on the egress line card (the extended_egress_queue structure indicates the specific queue on the egress port).

The capacity of current generation switch ASICs allows for single ASIC chassis switches:

https://engineering.fb.com/2019/03/14/data-center-engineering/f16-minipack/

In practice, given the hardware dependency for making accurate latency / queue depth measurements, I expect that they will only be implemented in single ASIC devices.

Anoop Ghanwani

Dec 2, 2020, 7:39:56 PM

to sf...@googlegroups.com

Hi Peter,

Even in a single switch ASIC, there can be a VoQ implementation with separate queues on ingress and egress.

Are you saying that this would be limited only to output queued switches?

Anoop


Peter Phaal

Dec 3, 2020, 10:10:30 AM

to sFlow

I am not an expert on the details of VoQ implementations, so I consulted with my co-author (Chen) and here is his response:

This is a very good question.

When looking at centralized solutions, i.e. one box, there is no problem when talking about queue occupancy.

The expected value is the VOQ Occupancy.

On a chassis based system, I agree that VOQ occupancy can be tricky.

I would say that in some architectures it might be possible to get the value of the real VOQ, i.e. the sum of all VOQs directed to the port. It will not be 100% accurate, but it would be close enough for collector analysis.

If the architecture does not support this ability, I would say that in a chassis based system this value is not very useful.

One more note on value estimation.

Regarding latency, the value will be very accurate.

For queue occupancy, even if the value is not 100% accurate (as in chassis-based systems), it is still good enough for analysis and represents the load on the VOQ.

Anoop Ghanwani

Dec 3, 2020, 1:12:46 PM

to sf...@googlegroups.com

Hi Peter,

I'm not sure about an accurate value being available even with a single chip design because the internal chip architectures can be quite different. I think some sort of clarification regarding VoQ implementations would be useful, even if it's just what is noted in your response below.

I can see that the latency value would be exact provided that the sampling is done at the egress at dequeue time. It might be good to add that clarification to the spec.

Thanks,

Anoop


Chen Rozenbaum

Dec 7, 2020, 6:05:43 PM

to sf...@googlegroups.com

Thanks Anoop

We will add clarification to the spec.

Regards,

Chen


Anoop Ghanwani

Dec 7, 2020, 6:14:22 PM

to sf...@googlegroups.com

Thanks Chen.


Peter Phaal

Dec 8, 2020, 5:44:45 PM

to sFlow

The latest draft has been posted:

https://sflow.org/draft3_sflow_transit.txt

Anoop Ghanwani

Dec 10, 2020, 3:45:02 PM

to sf...@googlegroups.com

Hi Peter and Chen,

I would suggest the following change.

>>>

For VOQ based architectures, queue depth is the number of bytes already in the selected VOQ when the sampled packet is enqueued. For chassis based systems, the value should be the sum of all VOQs directed to the output port.

>>>

to

>>>

For VOQ based architectures, the queue depth value reported should be the sum of all VOQs that feed a particular queue at the output port.

>>>

This is to take care of the case where a fixed system (could be single or multi-chip but not a chassis) is also covered.

Thanks,

Anoop


Peter Phaal

Dec 12, 2020, 3:22:54 PM

to sFlow

That's a good suggestion. A new draft has been uploaded that includes the proposed change:

https://sflow.org/draft4_sflow_transit.txt
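
The reworded rule for VOQ architectures can be sketched as a simple sum. This is only an illustration under assumed inputs: the VOQ table layout, keyed by (ingress source, egress port, egress queue), and the function name are hypothetical and not part of the draft.

```python
from typing import Dict, Tuple

# Hypothetical key: (ingress source, egress port, egress queue number)
VoqKey = Tuple[int, int, int]


def reported_queue_depth(voq_bytes: Dict[VoqKey, int],
                         egress_port: int,
                         egress_queue: int) -> int:
    """Sum the bytes held in every VOQ that feeds one output-port queue."""
    return sum(
        depth
        for (_source, port, queue), depth in voq_bytes.items()
        if port == egress_port and queue == egress_queue
    )


voqs = {
    (0, 7, 3): 4000,  # ingress source 0 -> port 7, queue 3
    (1, 7, 3): 1500,  # ingress source 1 -> port 7, queue 3
    (1, 7, 0): 9000,  # same port, different queue: not counted
    (0, 8, 3): 2500,  # different port: not counted
}
print(reported_queue_depth(voqs, egress_port=7, egress_queue=3))  # 5500
```

The same calculation applies whether the ingress sources are line cards in a chassis or separate pipelines in a fixed system, which is the case the revised wording is meant to cover.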

Anoop Ghanwani

Dec 14, 2020, 3:02:56 PM

to sf...@googlegroups.com

Thanks Peter.

Another thought -- would it make sense to include an optional queue id for both the latency and queue structures? This could be a system-specific number and would allow the collector to analyze the behavior of particular queues, as opposed to just tracking the queue length and latency experienced at a flow level.

Anoop


Peter Phaal

Dec 14, 2020, 6:19:22 PM

to sFlow

The extended_egress_queue structure is required when queue length is reported and can optionally be included when latency is reported. The egress queue number and egress port number together uniquely identify the queue.
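
A collector could apply these rules roughly as follows. This is a sketch under assumptions: a decoded sample is modeled as a plain dict, and the keys "egress_port", "egress_queue", "queue_depth" and "transit_delay" are hypothetical names for this example rather than field names from the specification.

```python
from typing import Dict, Optional, Tuple


def is_consistent(sample: Dict[str, int]) -> bool:
    """Queue depth needs the egress queue record; transit delay does not."""
    if "queue_depth" in sample and "egress_queue" not in sample:
        return False
    return True


def queue_id(sample: Dict[str, int]) -> Optional[Tuple[int, int]]:
    """Identify a queue by (egress port, egress queue number), if both are known."""
    if "egress_port" in sample and "egress_queue" in sample:
        return (sample["egress_port"], sample["egress_queue"])
    return None
```

Keying per-queue statistics on the pair returned by queue_id would let a collector follow the depth and latency of individual queues, not just of flows.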

Peter Phaal

Mar 17, 2021, 3:20:53 PM

to sFlow

The final draft has been posted:

https://sflow.org/sflow_transit.txt

There is now support for the new metrics in the Linux kernel, Host sFlow, sflowtool, and sFlow-RT:

https://blog.sflow.com/2021/03/transit-delay-and-queuing.html
