sFlow export of packet drop notifications

Peter Phaal

Sep 2, 2020, 1:26:36 PM

to sFlow

The following proposal defines a method of exporting packet headers, metadata, and drop reason using sFlow:

https://sflow.org/draft_sflow_drops_6.txt

Extending sFlow to provide visibility into dropped packets offers significant benefits for network troubleshooting, providing real-time, network-wide visibility into the specific packets that were dropped as well as the reason each packet was dropped. This visibility instantly reveals the root cause of drops and the impacted connections.

Packet discard records complement existing counter polling and packet sampling mechanisms and share a common data model so that all three sources of data can be correlated. For example, if packets are being discarded because of buffer exhaustion, the discard records don't necessarily tell the whole story. The discarded packets may represent mice flows that are victims of an elephant flow. Packet samples will reveal the traffic that isn't being dropped and provide a more complete picture. Counter data adds information such as interface speed, utilization, and packet and discard rates, which further completes the picture.

The Host sFlow agent implements the draft spec and has been tested using Linux drop_monitor instrumentation on a high performance compute cluster:

https://blog.sflow.com/2020/07/using-sflow-to-monitor-dropped-packets.html

Seeing drops on the host combines with the network instrumentation to give an end-to-end view of packet drops. The host can report on layer 2/3/4 drops that complement the layer 2/3 drop visibility from the network devices.

Please review and comment.

Anoop Ghanwani

Sep 2, 2020, 2:58:50 PM

to sf...@googlegroups.com

Hi Peter,

This looks like a much-needed extension to sFlow. A few high-level questions for now. I can send more detailed editorial comments later.

- What does this mean for switch hardware? From my read of the draft, it sounds like the switch needs to be able to "sample" discarded packets. Sample may not be the right word because we are trying to report all discarded packets subject to the 10 pps limit?

- Would this document need to be updated each time IANA has a new ICMP unreachable error code?

- I think implementers will need more detail on what exactly each of the error codes mean, either a reference to a document where the discard type is defined or a few lines describing what causes such a discard. For example, it's not clear to me what VLAN tag mismatch means.

Thanks,

Anoop

--
You received this message because you are subscribed to the Google Groups "sFlow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sflow+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sflow/6718fd1c-529c-4077-8ad8-064b569374b1n%40googlegroups.com.

Peter Phaal

Sep 2, 2020, 5:43:46 PM

to sFlow

Hi Anoop,

Thanks for the initial review. Comments inline.

Peter

On Wednesday, September 2, 2020 at 11:58:50 AM UTC-7 Anoop Ghanwani wrote:

- What does this mean for switch hardware? From my read of the draft, it sounds like the switch needs to be able to "sample" discarded packets. Sample may not be the right word because we are trying to report all discarded packets subject to the 10 pps limit?

The mechanism being described does not sample discarded packets. All discarded packets will be reported (provided that the number of notifications per second falls below the rate limit). The switch ASIC needs to have a mechanism to copy discarded packets / packet headers and packet drop reasons to the sFlow agent, along with hardware rate limiting, in order to implement the spec.
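As a rough illustration of "report everything, subject to a rate limit", the behaviour could be sketched as below. This is only a software sketch under stated assumptions: the class name, the `suppressed` counter, the dictionary layout, and the default of 10 notifications per second (the figure quoted in the question) are hypothetical and are not taken from the spec, which expects the rate limiting to happen in hardware.

```python
class DropNotifier:
    """Hypothetical limiter: export every drop until the per-second budget is spent."""

    def __init__(self, max_per_second=10):
        self.max_per_second = max_per_second
        self.window = None    # one-second window currently being counted
        self.sent = 0         # notifications exported in the current window
        self.suppressed = 0   # drops seen but not exported because of the limit

    def on_drop(self, now, header, reason):
        """Return a notification dict, or None when this second's budget is used up."""
        window = int(now)
        if window != self.window:
            # A new one-second window starts, so the budget is reset.
            self.window, self.sent = window, 0
        if self.sent >= self.max_per_second:
            self.suppressed += 1
            return None
        self.sent += 1
        return {"reason": reason, "header": header, "suppressed": self.suppressed}
```

Nothing is selected at random here, which is the difference from packet sampling: every drop is exported until the budget for the current second runs out, and the remainder is only counted.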

- Would this document need to be updated each time IANA has a new ICMP unreachable error code?

The document doesn't need to change provided that the new ICMP codes fall in the currently reserved 0-255 range. The spec is consistent with the drop reason codes in the sFlow Version 5 document and extends the drop reason codes that can be used with sampled packets. In the future we can keep adding reason codes to the enumeration without re-publishing the full spec, since the spec states "The drop_reason enumeration may be expanded over time. sFlow collectors must be prepared to receive discard_packet structures with unknown drop_reason values." and "The authoritative list of drop reasons will be maintained at sflow.org".
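On the collector side, being "prepared to receive unknown drop_reason values" might look something like the following sketch. The table is deliberately tiny and its entries are illustrative placeholders, not copied from the spec; the authoritative list is the one maintained at sflow.org. The function name and the idea of tallying unknown codes are assumptions made for this example.

```python
from collections import Counter

# Illustrative placeholder table; a real collector would load the list from sflow.org.
KNOWN_REASONS = {257: "ttl_exceeded", 258: "acl"}


def describe_reason(code, unknown_counts):
    """Map a drop_reason code to a name, tallying codes the collector does not know."""
    name = KNOWN_REASONS.get(code)
    if name is None:
        # Keep the record and count the code instead of rejecting the datagram.
        unknown_counts[code] += 1
        return f"unknown({code})"
    return name


unknown_counts = Counter()
labels = [describe_reason(c, unknown_counts) for c in (257, 9999, 9999)]
```

The point of the fallback is that a newly allocated code degrades to an "unknown" label on older collectors rather than causing the record to be discarded.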

- I think implementers will need more detail on what exactly each of the error codes mean, either a reference to a document where the discard type is defined or a few lines describing what causes such a discard. For example, it's not clear to me what VLAN tag mismatch means.

The general philosophy with sFlow is to defer to authoritative sources (IEEE, IETF, IANA etc.) for basic definitions. Codes 0-255 defer to IANA for their definitions. Additional codes are drawn from Devlink Trap [2]:

https://www.kernel.org/doc/html/latest/networking/devlink/devlink-trap.html

For example, vlan_tag_mismatch, is defined as "Traps incoming packets that the device decided to drop in case of VLAN tag mismatch: The ingress bridge port is not configured with a PVID and the packet is untagged or prio-tagged"

The definitions are only included by reference to ensure that we don't create ambiguity if the definition in the primary source changes.
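To make the quoted vlan_tag_mismatch definition concrete, the condition can be written as a small predicate. This is only a reading of the sentence quoted above, not code from the kernel or the spec; the function and parameter names are hypothetical, and it assumes that a priority-tagged frame is one whose 802.1Q tag carries VLAN ID 0.

```python
def is_vlan_tag_mismatch(port_pvid, packet_vid):
    """Return True when the quoted devlink-trap condition holds.

    port_pvid:  PVID configured on the ingress bridge port, or None if there is none.
    packet_vid: VLAN ID from the packet's 802.1Q tag, or None if the packet is untagged.
    """
    untagged_or_prio_tagged = packet_vid is None or packet_vid == 0
    return port_pvid is None and untagged_or_prio_tagged
```

For example, an untagged packet arriving on a port with no PVID matches, while the same packet on a port with PVID 10 does not.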

Anoop Ghanwani

Sep 2, 2020, 6:09:49 PM

to sf...@googlegroups.com

Hi Peter,

Comments inline.

Thanks,

Anoop

On Wed, Sep 2, 2020 at 2:43 PM Peter Phaal <peter...@gmail.com> wrote:

Hi Anoop,

Thanks for the initial review. Comments inline.

Peter

On Wednesday, September 2, 2020 at 11:58:50 AM UTC-7 Anoop Ghanwani wrote:
- What does this mean for switch hardware? From my read of the draft, it sounds like the switch needs to be able to "sample" discarded packets. Sample may not be the right word because we are trying to report all discarded packets subject to the 10 pps limit?

The mechanism being described does not sample discarded packets. All discarded packets will be reported (provided that the number of notifications per second falls below the rate limit). The switch ASIC needs to have a mechanism to copy discarded packets / packet headers and packet drop reasons to the sFlow agent, along with hardware rate limiting, in order to implement the spec.

OK.

- Would this document need to be updated each time IANA has a new ICMP unreachable error code?

The document doesn't need to change provided that the new ICMP codes fall in the currently reserved 0-255 range. The spec is consistent with the drop reason codes in the sFlow Version 5 document and extends the drop reason codes that can be used with sampled packets. In the future we can keep adding reason codes to the enumeration without re-publishing the full spec, since the spec states "The drop_reason enumeration may be expanded over time. sFlow collectors must be prepared to receive discard_packet structures with unknown drop_reason values." and "The authoritative list of drop reasons will be maintained at sflow.org".

OK.

- I think implementers will need more detail on what exactly each of the error codes mean, either a reference to a document where the discard type is defined or a few lines describing what causes such a discard. For example, it's not clear to me what VLAN tag mismatch means.

The general philosophy with sFlow is to defer to authoritative sources (IEEE, IETF, IANA etc.) for basic definitions. Codes 0-255 defer to IANA for their definitions. Additional codes are drawn from Devlink Trap [2]:
https://www.kernel.org/doc/html/latest/networking/devlink/devlink-trap.html
For example, vlan_tag_mismatch, is defined as "Traps incoming packets that the device decided to drop in case of VLAN tag mismatch: The ingress bridge port is not configured with a PVID and the packet is untagged or prio-tagged"

The definitions are only included by reference to ensure that we don't create ambiguity if the definition in the primary source changes.

I think it would be helpful to list the source for each number or set of numbers. I also don't see anything on the Devlink Trap page for some entries, e.g. overlay_smac_is_dmac, and even with a quick scan of the table I couldn't see something that would correspond to these. Is there some way to make the cross referencing easier?

Peter Phaal

Sep 2, 2020, 8:47:45 PM

to sFlow

Hi Anoop,

Comments inline.

Peter

On Wednesday, September 2, 2020 at 3:09:49 PM UTC-7 Anoop Ghanwani wrote:

I think it would be helpful to list the source for each number or set of numbers. I also don't see anything on the Devlink Trap page for some entries, e.g. overlay_smac_is_dmac, and even with a quick scan of the table I couldn't see something that would correspond to these. Is there some way to make the cross referencing easier?

There are three sources of numbers currently referenced: the IANA ICMP codes, which are noted in the spec as occupying the reserved range 0-255; the sFlow Version 5 spec, which defined the codes 0-262; and the Linux kernel document, which covers the rest.

Some of the Linux reasons have been mapped into existing codes, for example, ttl_value_is_too_small was mapped to the sFlow Version 5 ttl_exceeded code. This is indicated by the comment in the drop_reason enum. The codes 289 and up are reasons that have yet to be upstreamed to the Linux kernel, but the intent is that they will eventually be upstreamed and documented as part of the Linux API, and so they have been reserved.
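The allocation described above can be summarised as a lookup by range. This is a sketch that uses only the boundaries mentioned in this message (0-255, up to 262, and 289 and up); the function name and the returned labels are made up for illustration, and the boundaries may move as further codes are allocated.

```python
def drop_reason_source(code):
    """Return which source defines a drop_reason code, per the ranges described above."""
    if code < 0:
        raise ValueError("drop_reason codes are non-negative")
    if code <= 255:
        return "IANA ICMP code"
    if code <= 262:
        return "sFlow Version 5"
    if code <= 288:
        return "Linux devlink trap documentation"
    # 289 and up: reserved for reasons that have yet to be upstreamed to Linux.
    return "reserved (pending upstreaming to the Linux kernel)"
```

Note that the mapping is by numeric range only. Linux reasons that were folded into existing codes, such as ttl_value_is_too_small, show up under the sFlow Version 5 range.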

We could add a /* reserved */ comment against the codes in the document.

Are you aware of any other source of drop codes we might want to consider referencing and adding?

For the purpose of the sFlow Dropped Packet Notification Structures spec, the drop_reason enum is an open-ended list that will be added to over time. It doesn't affect the measurement architecture or the structure definitions in the spec. Maintaining the enum on sflow.org ensures that agreement can be reached on any additional codes so that we can ensure interoperability and avoid the creation of overlapping codes and the resulting ambiguity.

I will add some text to the specification to clarify how the current numbers were allocated.

Anoop Ghanwani

Sep 3, 2020, 12:07:59 AM

to sf...@googlegroups.com

Hi Peter,

Please see inline.

Thanks,

Anoop

On Wed, Sep 2, 2020 at 5:47 PM Peter Phaal <peter...@gmail.com> wrote:

Hi Anoop,

Comments inline.

Peter

On Wednesday, September 2, 2020 at 3:09:49 PM UTC-7 Anoop Ghanwani wrote:
I think it would be helpful to list the source for each number or set of numbers. I also don't see anything on the Devlink Trap page for some entries, e.g. overlay_smac_is_dmac, and even with a quick scan of the table I couldn't see something that would correspond to these. Is there some way to make the cross referencing easier?

There are three sources of numbers currently referenced: the IANA ICMP codes, which are noted in the spec as occupying the reserved range 0-255; the sFlow Version 5 spec, which defined the codes 0-262; and the Linux kernel document, which covers the rest.

Some of the Linux reasons have been mapped into existing codes, for example, ttl_value_is_too_small was mapped to the sFlow Version 5 ttl_exceeded code. This is indicated by the comment in the drop_reason enum. The codes 289 and up are reasons that have yet to be upstreamed to the Linux kernel, but the intent is that they will eventually be upstreamed and documented as part of the Linux API, and so they have been reserved.

We could add a /* reserved */ comment against the codes in the document.

Are you aware of any other source of drop codes we might want to consider referencing and adding?

For the kinds of protocols we deal with, the list looks pretty comprehensive. Is there a way to mark whether the packet was discarded on ingress or egress? Reasons for discarding on egress include egress VLAN filtering (802.1Q) and split horizon suppression for multi-chassis LAG and variants or for EVPN multihoming.

For the purpose of the sFlow Dropped Packet Notification Structures spec, the drop_reason enum is an open-ended list that will be added to over time. It doesn't affect the measurement architecture or the structure definitions in the spec. Maintaining the enum on sflow.org ensures that agreement can be reached on any additional codes so that we can ensure interoperability and avoid the creation of overlapping codes and the resulting ambiguity.

I will add some text to the specification to clarify how the current numbers were allocated.

I think that would be valuable.

Peter Phaal

Oct 6, 2020, 7:24:29 PM

to sFlow

The final version of the spec has been published:

https://sflow.org/sflow_drops.txt

Anoop Ghanwani

Oct 6, 2020, 7:58:12 PM

to sf...@googlegroups.com

Hi Peter,

A few comments, mostly editorial.

- The reference to "management plane" should probably say "control plane". (Management plane is used to refer to the part of the software that deals with interfacing with network management. Based on the context, it appears that control plane would fit better.)

pg 3

"sFlow collectors" -> "sFlow Collectors"

"A discarded_packet must" -> "A discarded_packet record must"

"...sampled_ipv6 may be used" -> "...sampled_ipv6 formats may be used"

"The discarded packets may represent mice flows" -> "The discarded packets may be part of mice flows"

"sFlow collectors must be prepared to receive discard_packet structures with unknown drop_reason values."

This should probably say "unsupported drop_reason values" and also say how these should be accounted for, e.g. have a separate counter that counts all unsupported drop_reason values.

pg 5

"reports the total number of drops detected"

extra space between drops and detected.

pg 6

Is there a standard way to report queue number?

Is there a standard way to report ACL/ACL rules? Should the entry include not just the ACL but the specific ACL entry/rule that was hit within a given ACL?

Since we have a software function being reported, does it make sense to include the OS & OS version information, or is that information included somewhere else?

Thanks,

Anoop

Peter Phaal

Oct 6, 2020, 11:21:01 PM

to sFlow

Hi Anoop,

Thanks for the editorial comments. I have fixed the issues you identified. Additional comments inline.

Peter

On Tuesday, October 6, 2020 at 4:58:12 PM UTC-7 Anoop Ghanwani wrote:

Is there a standard way to report queue number?

I am not aware of a standard way to identify queues. The extended_egress_queue structure reports a numeric queue index number. This allows packets reported by an agent to be grouped by egress port and queue - the queue number isn't expected to have global significance.
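A collector could use the queue index along the lines of the sketch below, grouping drop records per agent, egress port, and queue. The record layout (plain dictionaries with `agent`, `egress_port`, and `queue` keys) and the sample addresses are hypothetical stand-ins for decoded sFlow records, not field names from the spec.

```python
from collections import Counter


def drops_by_queue(records):
    """Count drop records per (agent, egress port, queue index)."""
    counts = Counter()
    for rec in records:
        # The queue index only has meaning relative to its agent and egress port.
        counts[(rec["agent"], rec["egress_port"], rec["queue"])] += 1
    return counts


records = [
    {"agent": "10.0.0.1", "egress_port": 3, "queue": 0},
    {"agent": "10.0.0.1", "egress_port": 3, "queue": 0},
    {"agent": "10.0.0.2", "egress_port": 3, "queue": 0},
]
counts = drops_by_queue(records)
```

Keying on the full tuple keeps queue 0 on one switch separate from queue 0 on another, which matches the point that the number has no global significance.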

Is there a standard way to report ACL/ACL rules? Should the entry include not just the ACL but the specific ACL entry/rule that was hit within a given ACL?

I am not aware of a standard way to identify ACL lists / rules across network operating systems. The list name and/or number is intended to provide additional level identifiers for records with ACL drop reason that have applicability across a range of systems. I am not sure that the process of compiling ACL rules to hardware is reversible - i.e. would the hardware have enough information to let you know the specific source rule that triggered the drop?

Since we have a software function being reported, does it make sense to include the OS & OS version information, or is that information included somewhere else?

The sFlow Host Structures (https://sflow.org/sflow_host.txt) spec allows the OS name and version to be reported.

Anoop Ghanwani

Oct 7, 2020, 11:29:30 AM

to sf...@googlegroups.com

Hi Peter,

Please see inline.

Thanks,

Anoop

On Tue, Oct 6, 2020 at 8:21 PM Peter Phaal <peter...@gmail.com> wrote:

Hi Anoop,

Thanks for the editorial comments. I have fixed the issues you identified. Additional comments inline.

Peter

On Tuesday, October 6, 2020 at 4:58:12 PM UTC-7 Anoop Ghanwani wrote:
Is there a standard way to report queue number?

I am not aware of a standard way to identify queues. The extended_egress_queue structure reports a numeric queue index number. This allows packets reported by an agent to be grouped by egress port and queue - the queue number isn't expected to have global significance.

I think it would be good to mention this in the spec, i.e. that the queue index number is in the context of a port and is system-specific.

Is there a standard way to report ACL/ACL rules? Should the entry include not just the ACL but the specific ACL entry/rule that was hit within a given ACL?

I am not aware of a standard way to identify ACL lists / rules across network operating systems. The list name and/or number is intended to provide additional level identifiers for records with ACL drop reason that have applicability across a range of systems. I am not sure that the process of compiling ACL rules to hardware is reversible - i.e. would the hardware have enough information to let you know the specific source rule that triggered the drop?

Some silicon is capable of reporting the ACL entry that was hit for a given packet if logging is enabled. I think it might be useful for the spec to allow it so that implementations that are capable of providing it can do so.

Since we have a software function being reported, does it make sense to include the OS & OS version information, or is that information included somewhere else?

The sFlow Host Structures (https://sflow.org/sflow_host.txt) spec allows the OS name and version to be reported.

I think it would be useful to mention that the function being reported can be used in context with this information so that an implementation reports those as well.
