[stratum-dev] Is there a limit on the # of ECMP groups stratum_bmv2 can handle?

A Sydney

May 10, 2021, 4:46:51 PM
to strat...@lists.stratumproject.org
Hi Stratum folks,

# Context:                          
I have created a pipeline that enables IPv4-ECMP (P4 snippet shown in attached file action_selector*) and I am able to build and push p4info.txt and bmv2.json to the switches. 

# Issue:
When I attempt to add groups, the first one moves to the "ADDED" state. When I attempt to add a second, the first group and the second both move to the "PENDING_ADD_RETRY" state.

# ONOS logs:
At the same time, P4RuntimeClientImpl throws the error "Error while performing READ on device:spine2...Unexpected error in RPC handling". ONOS then enters a cycle of attempting to reinstall all flows on the switch (see debug.log for details).

# Logs on one of the bmv2 switches:
I then see the following error on the switch:
[libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/repeated_field.h:1694] CHECK failed: (index) < (current_size_):

# Question:
Have you come across this one before? Am I making an obvious mistake (perhaps the attached P4 ECMP definition is incorrect)? Any suggestions on how to debug further?

Thanks!
-Syd

debug.log
action_selector_snippet

Maximilian Pudelko

May 10, 2021, 8:36:16 PM
to A Sydney, strat...@lists.stratumproject.org
Hi Syd,

I have not observed this error before. Stratum-bmv2 is backed by PI (https://github.com/p4lang/PI).
All P4RT messages are passed to it nearly unmodified, so the issue must lie there.

Max

_______________________________________________
stratum-dev mailing list
strat...@lists.stratumproject.org
https://lists.stratumproject.org/listinfo/stratum-dev

Maximilian Pudelko

May 11, 2021, 11:38:02 PM
to Maximilian Pudelko, A Sydney, strat...@lists.stratumproject.org
Hi Syd,

After some investigation, I found the root cause of this issue: https://github.com/stratum/stratum/pull/687
I expect this to get merged soon.

Max

A Sydney

May 12, 2021, 12:53:50 PM
to strat...@lists.stratumproject.org
Hi Stratum folks,
@Max, thanks a bunch! I updated the package and, indeed, this particular error has been fixed.

However, I'm now getting a new error where LLDP packet-ins cause a buffer underflow exception in ONOS. When I disable the lldpprovider in ONOS, the error goes away. Could you take a look at the following log and provide feedback (i.e., have you seen this one before, is Stratum the source of the issue, or should I take it up with the ONOS folks)?


Cheers!
-Syd





Maximilian Pudelko

May 12, 2021, 1:05:50 PM
to A Sydney, strat...@lists.stratumproject.org
Hi Syd,

We fixed something similar in our internal version very recently. In short, byte strings returned by
Stratum are not zero-padded and ONOS has to handle them accordingly.
Here is a snippet of the new code:

        if (packetMetadata.isPresent()) {
            try {
                // Stratum returns the port as a byte string without zero padding.
                // fit() pads (or safely trims) it to the declared bit width, so
                // getShort() below has enough bytes to read.
                ImmutableByteSequence portByteSequence = packetMetadata.get()
                        .value().fit(P4InfoConstants.INGRESS_PORT_BITWIDTH);
                short s = portByteSequence.asReadOnlyBuffer().getShort();
                ConnectPoint receivedFrom = new ConnectPoint(deviceId, PortNumber.portNumber(s));
                ByteBuffer rawData = ByteBuffer.wrap(packetIn.data().asArray());
                return new DefaultInboundPacket(receivedFrom, ethPkt, rawData);
            } catch (ImmutableByteSequence.ByteSequenceTrimException e) {
                // The value does not fit in the declared bit width.
                throw new PiInterpreterException(format(
                        "Malformed metadata '%s' in packet-in received from '%s': %s",
                        P4InfoConstants.INGRESS_PORT, deviceId, packetIn));
            }
        } else {
            // The packet-in carried no ingress port metadata at all.
            throw new PiInterpreterException(format(
                    "Missing metadata '%s' in packet-in received from '%s': %s",
                    P4InfoConstants.INGRESS_PORT, deviceId, packetIn));
        }

Feel free to submit this as a fix to ONOS.
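
For readers who want to see the failure mode in isolation, here is a minimal sketch that uses only java.nio and no ONOS classes. It assumes the port arrives as a single byte (0x01) because the byte string is not zero-padded, and it shows why reading a short from it underflows and how left-padding avoids that. The class and method names (PortDecodeSketch, zeroPad, naiveDecode, paddedDecode) are hypothetical and are not part of ONOS or Stratum. Unlike fit() in the snippet above, this helper only pads and never trims.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

final class PortDecodeSketch {

    // Hypothetical helper: left-pads a big-endian value with zero bytes
    // until it is exactly `size` bytes long.
    static byte[] zeroPad(byte[] value, int size) {
        if (value.length > size) {
            throw new IllegalArgumentException("value wider than " + size + " bytes");
        }
        byte[] padded = new byte[size];
        System.arraycopy(value, 0, padded, size - value.length, value.length);
        return padded;
    }

    // Naive decode: assumes at least two bytes are present.
    // Throws BufferUnderflowException when given a single byte.
    static short naiveDecode(byte[] value) {
        return ByteBuffer.wrap(value).getShort();
    }

    // Tolerant decode: pads to two bytes before reading the short.
    static short paddedDecode(byte[] value) {
        return ByteBuffer.wrap(zeroPad(value, Short.BYTES)).getShort();
    }

    public static void main(String[] args) {
        byte[] port = {0x01}; // assumed unpadded value for port 1
        try {
            naiveDecode(port);
        } catch (BufferUnderflowException e) {
            System.out.println("naive decode failed: " + e);
        }
        System.out.println("padded decode: " + paddedDecode(port));
    }
}
```

In the real interpreter the padding is done by ImmutableByteSequence.fit(), as in the snippet above; the sketch only illustrates the underlying ByteBuffer behavior.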

Max



A Sydney

May 12, 2021, 3:06:05 PM
to Maximilian Pudelko, strat...@lists.stratumproject.org
I'll take this one up with ONOS folks, but in the meantime, I'm attempting to insert this snippet in my codebase for a quick test. 

It appears that ONOS has a FabricConstants.java class (as opposed to P4InfoConstants.java) which does have "INGRESS_PORT" as follows:
'''
    public static final PiPacketMetadataId INGRESS_PORT =
            PiPacketMetadataId.of("ingress_port");
'''
I'm assuming that I can replace your "P4InfoConstants.INGRESS_PORT" with "FabricConstants.INGRESS_PORT"? But what about P4InfoConstants.INGRESS_PORT_BITWIDTH? Could you share how it is defined?

Thanks,
-Syd

Maximilian Pudelko

May 12, 2021, 3:27:19 PM
to A Sydney, strat...@lists.stratumproject.org
Sure.

We parse the p4info file of the fabric pipeline and look for the `controller_packet_metadata` fields.
Currently it is 9 bits:

  metadata {
    id: 1
    name: "ingress_port"
    bitwidth: 9
  }
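
As a rough illustration of that lookup, the sketch below pulls the bitwidth of a named metadata field out of p4info text and derives the number of bytes needed to hold it. It is a simplification under stated assumptions: it uses a regular expression rather than a protobuf text-format parser, it assumes `name` appears before `bitwidth` inside the `metadata` block, and the class and method names (P4InfoBitwidthSketch, metadataBitwidth, bytesFor) are hypothetical, not the generator used for P4InfoConstants.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class P4InfoBitwidthSketch {

    // Finds the first `metadata { ... }` block whose name equals fieldName
    // and returns its bitwidth, or empty if no such block exists.
    static OptionalInt metadataBitwidth(String p4infoText, String fieldName) {
        Pattern pattern = Pattern.compile(
                "\\bmetadata\\s*\\{[^}]*?name:\\s*\"" + Pattern.quote(fieldName)
                        + "\"[^}]*?bitwidth:\\s*(\\d+)[^}]*?\\}");
        Matcher matcher = pattern.matcher(p4infoText);
        return matcher.find()
                ? OptionalInt.of(Integer.parseInt(matcher.group(1)))
                : OptionalInt.empty();
    }

    // Number of whole bytes needed to hold a value of the given bit width.
    static int bytesFor(int bitwidth) {
        return (bitwidth + 7) / 8;
    }

    public static void main(String[] args) {
        // Sample text modeled on the metadata entry quoted above.
        String p4info =
                "controller_packet_metadata {\n"
              + "  metadata {\n"
              + "    id: 1\n"
              + "    name: \"ingress_port\"\n"
              + "    bitwidth: 9\n"
              + "  }\n"
              + "}\n";
        int bits = metadataBitwidth(p4info, "ingress_port").getAsInt();
        System.out.println("bitwidth=" + bits + ", bytes=" + bytesFor(bits));
    }
}
```

A 9-bit field needs two bytes, which is why an unpadded one-byte port value has to be widened before it is read as a short. In a pipeconf, the result could simply be kept as a constant set to 9 for this pipeline.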


Max

A Sydney

May 12, 2021, 4:51:44 PM
to Maximilian Pudelko, strat...@lists.stratumproject.org
So I've updated my pipeconf class with this info, and the error no longer occurs.

Thanks!
-Syd