First of all, I apologize if this is not the appropriate forum. I assume there is a mailing list for ONOS, but I can't figure out which one it is.
In the context of Akraino IEC, I've been investigating a problem on a SEBA-in-a-Box installation. It is only roughly SEBA-in-a-Box, because I deploy SEBA using the cord-platform -> seba -> att-workflow set of charts and then install PONSim. All of these use manually built Docker images that work on aarch64.
Right now I can complete the authentication step, but DHCP fails, although I can see the DHCP replies coming from the BNG in the mininet POD. That revealed that ONOS is not catching them.
The most obvious problem is that Xconnect Manager crashes during what looks like the operation of adding the corresponding flows in ONOS. Not all the flows are lost: compared with a working setup on x86, 6 new flows were supposed to be added, but only 4 got there.
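To pin down exactly which flows are missing, the flow dumps from the two setups can be diffed. The following is only a minimal sketch, under the assumption that each dump is the JSON returned by the ONOS REST endpoint `GET /onos/v1/flows`, saved to a file, with a top-level `flows` list whose entries carry `deviceId`, `priority`, `selector.criteria` and `treatment.instructions`. The function names and file names are hypothetical.

```python
import json
import sys


def flow_key(flow):
    """Build a comparable identity for a flow.

    Per-install fields such as the flow id and counters are ignored,
    since they are expected to differ between two deployments.
    Field names are assumed from the ONOS REST flow JSON.
    """
    return json.dumps(
        {
            "deviceId": flow.get("deviceId"),
            "priority": flow.get("priority"),
            "criteria": flow.get("selector", {}).get("criteria", []),
            "instructions": flow.get("treatment", {}).get("instructions", []),
        },
        sort_keys=True,
    )


def missing_flows(reference_dump, actual_dump):
    """Return the flows present in reference_dump but absent from actual_dump."""
    actual_keys = {flow_key(f) for f in actual_dump.get("flows", [])}
    return [
        f for f in reference_dump.get("flows", [])
        if flow_key(f) not in actual_keys
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python diff_flows.py x86.json aarch64.json
    with open(sys.argv[1]) as ref_file, open(sys.argv[2]) as act_file:
        for flow in missing_flows(json.load(ref_file), json.load(act_file)):
            print(json.dumps(flow, indent=2))
```

The comparison is order-sensitive for the criteria and instruction lists, so a flow whose criteria are merely listed in a different order would be reported as missing.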
The problem is reproducible, with the same outcome every time. Here is the relevant backtrace in the ONOS log:
2019-09-13 15:09:55,807 | INFO | AAA-radius-0 | AaaManager | 185 - org.opencord.aaa - 1.8.0 | Auth event APPROVED for of:0000aabbccddeeff/128
2019-09-13 15:10:23,278 | INFO | qtp300471503-38 | Olt | 187 - org.opencord.olt-app - 2.1.0 | Programming vlans for subscriber: [id:PSMO12345678,cTag:111,sTag:222,nasPortId:,uplinkPort:-1,slot:-1,hardwareIdentifier:null,ipaddress:null,nasId:null,circuitId:,remoteId:]
2019-09-13 15:10:23,287 | INFO | ispatch-default0 | AccessDeviceKafkaIntegration | 189 - org.opencord.kafka - 1.0.0 | Got AccessDeviceEvent: SUBSCRIBER_REGISTERED
2019-09-13 15:10:23,350 | INFO | ce-operations-29 | Olt | 187 - org.opencord.olt-app - 2.1.0 | DHCP v4 filter for device of:0000aabbccddeeff on port 128 installed.
2019-09-13 15:10:23,350 | WARN | tive-installer-3 | OltPipeline | 161 - org.onosproject.onos-drivers-default - 1.13.5 |
Only the following are Supported in OLT for filter ->
ETH TYPE : EAPOL, LLDP and IPV4
IPV4 TYPE: IGMP and UDP (for DHCP)
2019-09-13 15:10:23,351 | WARN | tive-installer-3 | InOrderFlowObjectiveManager | 130 - org.onosproject.onos-core-net - 1.13.5 | Flow objective onError DefaultFilteringObjective{id=1084928967, type=PERMIT, op=ADD, priority=10000, key=IN_PORT:128, conditions=[ETH_TYPE:ipv6, IP_PROTO:17, UDP_SRC:547, UDP_DST:546], meta=DefaultTrafficTreatment{immediate=[OUTPUT:CONTROLLER], deferred=[], transition=None, meter=[], cleared=false, StatTrigger=null, metadata=null}, appId=DefaultApplicationId{id=176, name=org.opencord.olt}, permanent=true, timeout=0}. Reason = UNSUPPORTED
2019-09-13 15:10:23,351 | INFO | tive-installer-3 | Olt | 187 - org.opencord.olt-app - 2.1.0 | DHCP v6 filter for device of:0000aabbccddeeff on port 128 failed installation because UNSUPPORTED
2019-09-13 15:10:25,097 | INFO | qtp300471503-38 | XconnectManager | 177 - org.onosproject.onos-apps-segmentrouting-app - 1.13.5 | Adding or updating xconnect. deviceId=of:0000000000000001, vlanId=222, ports=[1, 2]
2019-09-13 15:10:40,118 | ERROR | nt-partition-1-0 | ThreadPoolContext | 90 - io.atomix - 2.0.23 | An uncaught exception occurred
java.lang.IllegalStateException: org.onosproject.store.service.ConsistentMapException$Timeout: onos-sr-xconnect-next
at org.onosproject.store.primitives.impl.MeteredAsyncConsistentMap$InternalMeteredMapEventListener.event(MeteredAsyncConsistentMap.java:310)
at org.onosproject.store.primitives.impl.CachingAsyncConsistentMap.lambda$null$1(CachingAsyncConsistentMap.java:93)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:399)[95:com.google.guava:22.0.0]
at org.onosproject.store.primitives.impl.CachingAsyncConsistentMap.lambda$null$2(CachingAsyncConsistentMap.java:93)
at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)[:1.8.0_201]
at org.onosproject.store.primitives.impl.CachingAsyncConsistentMap.lambda$new$3(CachingAsyncConsistentMap.java:93)
at org.onosproject.store.primitives.impl.TranscodingAsyncConsistentMap$InternalBackingMapEventListener.event(TranscodingAsyncConsistentMap.java:366)
at org.onosproject.store.primitives.impl.TranscodingAsyncConsistentMap$InternalBackingMapEventListener.event(TranscodingAsyncConsistentMap.java:366)
at org.onosproject.store.primitives.resources.impl.AtomixConsistentMap.lambda$null$1(AtomixConsistentMap.java:128)[133:org.onosproject.onos-core-primitives:1.13.5]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:399)[95:com.google.guava:22.0.0]
at org.onosproject.store.primitives.resources.impl.AtomixConsistentMap.lambda$null$2(AtomixConsistentMap.java:128)[133:org.onosproject.onos-core-primitives:1.13.5]
at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)[:1.8.0_201]
at org.onosproject.store.primitives.resources.impl.AtomixConsistentMap.lambda$handleEvent$3(AtomixConsistentMap.java:128)[133:org.onosproject.onos-core-primitives:1.13.5]
at java.util.ArrayList.forEach(ArrayList.java:1257)[:1.8.0_201]
at org.onosproject.store.primitives.resources.impl.AtomixConsistentMap.handleEvent(AtomixConsistentMap.java:127)[133:org.onosproject.onos-core-primitives:1.13.5]
at io.atomix.protocols.raft.proxy.impl.DelegatingRaftProxy.lambda$addEventListener$4(DelegatingRaftProxy.java:122)
at io.atomix.protocols.raft.proxy.impl.BlockingAwareRaftProxyClient.lambda$null$2(BlockingAwareRaftProxyClient.java:67)
at io.atomix.utils.concurrent.ThreadPoolContext.lambda$new$0(ThreadPoolContext.java:81)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)[:1.8.0_201]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)[:1.8.0_201]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)[:1.8.0_201]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)[:1.8.0_201]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)[:1.8.0_201]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)[:1.8.0_201]
at java.lang.Thread.run(Thread.java:748)[:1.8.0_201]
Caused by: org.onosproject.store.service.ConsistentMapException$Timeout: onos-sr-xconnect-next
at org.onosproject.store.primitives.DefaultConsistentMap.complete(DefaultConsistentMap.java:258)
at org.onosproject.store.primitives.DefaultConsistentMap.containsKey(DefaultConsistentMap.java:77)
at org.onosproject.segmentrouting.xconnect.impl.XconnectManager.populateNext(XconnectManager.java:485)
at org.onosproject.segmentrouting.xconnect.impl.XconnectManager.populateXConnect(XconnectManager.java:455)
at org.onosproject.segmentrouting.xconnect.impl.XconnectManager.access$300(XconnectManager.java:98)
at org.onosproject.segmentrouting.xconnect.impl.XconnectManager$XconnectMapListener.event(XconnectManager.java:297)
at org.onosproject.store.primitives.impl.MeteredAsyncConsistentMap$InternalMeteredMapEventListener.event(MeteredAsyncConsistentMap.java:306)
... 24 more
At this point I can't match the trace against Xconnect version 1.13.5, and I have no idea how the app is built and installed in the Docker image.
From the information I have received so far, the aarch64 Docker image was built using this Dockerfile:
Any help is greatly appreciated, as this points to a serious problem on aarch64 that needs to be fixed.