sFlow samples reporting truncated packet size


Christian Svensson

Dec 27, 2021, 3:43:57 PM
to sonicproject
Hi,

I am setting up an sFlow collector for my SONiC switch running 202021.
Everything works fine, except that my bandwidth measurements are off by about 10x.
Looking at the sFlow samples, I found something that looks wrong.

Inside the sample I can see the following:
        Raw packet header
            0000 0000 0000 0000 0000 .... .... .... = Enterprise: standard sFlow (0)
            Flow data length (byte): 144
            Frame Length: 132
            Payload removed: 4
            Original packet length: 128
            Header of sampled packet: …
                Ethernet II, Src: ..
                802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 1991
                Internet Protocol Version 4, Src: X.Y.Z.A, Dst: X.Y.Z.B
                    0100 .... = Version: 4
                    .... 0101 = Header Length: 20 bytes (5)
                    Total Length: 1480

Notice that the IP header reports a "Total Length" of 1480 (which I know for a fact is the size of 99% of the packets in transit), yet "Original packet length" is reported as 128 and "Payload removed" as 4.
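
To put a number on the mismatch, here is a rough sketch of the arithmetic. It assumes the collector derives byte counts by scaling the reported frame length by the sampling rate, so the error factor is simply the ratio between the real and the reported frame size. The function names are made up for illustration, and the frame size is reconstructed from the IP header plus the Ethernet and 802.1Q headers (FCS excluded).

```python
# Sizes come from the decoded sample above; header sizes are the standard
# Ethernet II and 802.1Q values. Function names are hypothetical.
ETHERNET_HEADER = 14  # bytes
VLAN_TAG = 4          # bytes, 802.1Q


def expected_frame_length(ip_total_length: int) -> int:
    """Frame size implied by the IP header, excluding the FCS."""
    return ETHERNET_HEADER + VLAN_TAG + ip_total_length


def underestimation_factor(ip_total_length: int, reported_frame_length: int) -> float:
    """How many times too small a byte estimate based on the reported length is."""
    return expected_frame_length(ip_total_length) / reported_frame_length


factor = underestimation_factor(ip_total_length=1480, reported_frame_length=132)
print(f"expected frame: {expected_frame_length(1480)} bytes")
print(f"collector undercounts by roughly {factor:.1f}x")
```

The sampling rate cancels out of the ratio, so the result (a bit over 11x) lines up with the roughly 10x error in the bandwidth graphs.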

My hypothesis is that hsflowd gets the sample from the ASIC but the actual frame size is not propagated correctly, so the truncated sample is treated as the full packet.
I know nothing about how hsflowd is set up to work with SONiC, so this might be totally wrong, but it seems reasonable to me.

Can anyone else reproduce or disprove this behavior?

Thanks,

Christian Svensson

Dec 27, 2021, 4:06:18 PM
to sonicproject
To add more to my own debugging, it seems the original length should be attached as PSAMPLE_ATTR_ORIGSIZE.
The tool "psample" seems to be able to read sampling data as it happens. It too reports 128 as the original size:

(vrf:mgmt)bluecmd@xxx:~$ psample -m
group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107703
group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107704
group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107705
group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107706
group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107707
group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107708
group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107709
group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107710
group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107711
group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107712
group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107713
group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107714
group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107715
group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107716
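
For anyone who wants to check a longer capture, a small sketch like the following tallies the origsize values. It only assumes the line format shown in the output above; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the "origsize <n>" field in lines like the psample output above.
ORIGSIZE_RE = re.compile(r"\borigsize (\d+)\b")


def origsize_histogram(lines):
    """Count how often each origsize value appears in psample output lines."""
    counts = Counter()
    for line in lines:
        match = ORIGSIZE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


captured = [
    "group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107703",
    "group 1 in-ifindex 28 out-ifindex 27 origsize 128 sample-rate 100000 seq 107704",
]
for size, count in sorted(origsize_histogram(captured).items()):
    print(f"origsize {size}: {count} samples")
```

The function accepts any iterable of lines, so it can also be fed from a saved capture or from sys.stdin. If every sample lands in a single bucket equal to the truncation size, the original length is being lost before it reaches user space.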

The debugging continues...

Christian Svensson

Dec 27, 2021, 4:45:47 PM
to sonicproject
I enabled psample debugging in the Broadcom kernel psample driver. The packet from the ASIC is recognized with the proper size (1502 bytes in the log below), but following the code, the original length appears to be silently discarded.

[596946.149392] linux-knet-cb (0): psample_filter_cb: pkt size 1502, kf->dest_id 1, kf->cb_user_data 1
[596946.149399] linux-knet-cb (0): psample_meta_get: psample pkt metadata
[..]
[596946.149439] linux-knet-cb (0): psample_meta_get: srcport 17, dstport 13, src_ifindex 0x1c, dst_ifindex 0x1b, trunc_size 128, sample_rate 100000
[596946.149445] linux-knet-cb (0): psample_filter_cb: group 0x1, trunc_size 128, src_ifdx 0x1c, dst_ifdx 0x1b, sample_rate 100000
[596946.149463] linux-knet-cb (0): psample_meta_sample_reason: DCB36 sample_rx_reason_mask: 0x00000008, reason: 0x00000008, reason_hi: 0x00000000
[596946.149522] linux-knet-cb (1081096): psample_task: group 0x1, trunc_size 128, src_ifdx 0x1c, dst_ifdx 0x1b, sample_rate 100000

Looking at the function psample_filter_cb in ./platform/broadcom/saibcm-modules/sdklt/linux/knetcb/psample-cb.c, it seems to me that the SKB is created with the truncated length, which seems reasonable. However, upstream psample seems to use the SKB's len as the source of truth for the packet length.
Reading the code for psample_sample_packet, it appears to never read more than trunc_size anyway, so maybe setting skb->len to the original length is the correct thing to do?
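
To make that reasoning concrete, here is a simplified Python model of the behavior described above. It is not kernel code and not the actual fix. It only assumes what the paragraph states: the reported original size is taken from the SKB length, and the copied data is capped at trunc_size. The function and field names are made up for illustration.

```python
def model_psample_report(skb_len: int, trunc_size: int) -> dict:
    """Simplified model: origsize follows skb_len, copied data is capped."""
    data_len = min(skb_len, trunc_size)  # never copy more than trunc_size
    return {"origsize": skb_len, "data_len": data_len}


# Current behavior: the SKB is created with the truncated length.
current = model_psample_report(skb_len=128, trunc_size=128)

# Proposed idea: the SKB length reflects the size seen by the driver (1502).
proposed = model_psample_report(skb_len=1502, trunc_size=128)

print("current: ", current)
print("proposed:", proposed)
```

In this model the copied header stays at 128 bytes either way and only the reported original size changes. Whether giving an SKB a length larger than its actual buffer is safe in the real driver is a separate question that the model does not answer.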

Regards,

Christian Svensson

Dec 28, 2021, 6:57:49 PM
to sonicproject
To close the loop on this, I created and tested a fix that I submitted as https://github.com/Azure/sonic-buildimage/pull/9650.

Hopefully this is useful for other people.

Regards,