Hi,
As for timestamping packets, I agree that if two switches have to stamp the same probe packet, each using its own clock, then synchronizing their clocks is both necessary and difficult.
Besides, timestamping packets places higher demands on switches, because most switches/routers/vSwitches only perform table lookup and packet forwarding. By the way, protocols such as OpenFlow have no message for reporting time data from a switch to the controller.
Unfortunately, I think timestamping plus clock synchronization, via PTP or something similar, might be necessary on the switches. You can, of course, install PTP-aware interfaces on hosts and measure end-to-end delay for a particular path without involving the switches in the timestamping task. I sometimes hear other researchers and engineers say that the queueing delay inside a forwarding device can be significantly higher than the delay for a packet to traverse a link, which makes sense. I'd say you can measure this with P4, In-band Network Telemetry (INT), and BMv2 or Tofino targets. If the Tofino devices support PTP, there might be a way to carry timestamps in the packet using INT (we would have to check which timestamps are available in the metadata). Then, at the end of the path, this information is extracted from the packets and collected.
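To make the INT idea a bit more concrete, here is a rough sketch of what the collector at the end of the path could compute, assuming each hop appends a record with its switch id and its ingress/egress timestamps. The record layout and the names HopRecord, per_hop_latency and link_latency are hypothetical and are not the actual INT metadata format. The time spent inside each switch only needs that switch's own clock, while the link delay between two hops is only meaningful if their clocks are synchronized (e.g. with PTP).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HopRecord:
    # Hypothetical per-hop metadata; real INT headers are laid out differently.
    switch_id: int
    ingress_ns: int  # timestamp when the packet entered the switch
    egress_ns: int   # timestamp when the packet left the switch


def per_hop_latency(records: List[HopRecord]) -> List[Tuple[int, int]]:
    """Time spent inside each switch (queueing + processing).

    Uses only each switch's local clock, so no synchronization is needed.
    """
    return [(r.switch_id, r.egress_ns - r.ingress_ns) for r in records]


def link_latency(records: List[HopRecord]) -> List[Tuple[int, int, int]]:
    """Delay between leaving hop i and entering hop i+1.

    Compares timestamps from two different switches, so the result is only
    valid if their clocks are synchronized.
    """
    return [
        (a.switch_id, b.switch_id, b.ingress_ns - a.egress_ns)
        for a, b in zip(records, records[1:])
    ]
```

For example, with records [HopRecord(1, 1000, 1500), HopRecord(2, 2500, 2600)], the first function gives the in-switch time for Sw1 and Sw2, and the second gives the Sw1 -> Sw2 link delay under the synchronized-clock assumption.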
As for switch stats, they can provide some raw data to calculate bandwidth and packet loss ratio, but they do not provide any timing information for latency/jitter.
Yes, bandwidth should be fine to calculate because the controller receives the bytes sent/received on ports. I think ONOS already reports this in the web UI. You can get more specific by calculating bandwidth per flow instead of per port. And as we discussed, you may be able to get some information on dropped packets, but I don't know whether the drop counters reflect a flow rule whose treatment is to drop or congestion (we'd have to look into this). As you said, there is nothing about delay or jitter, which makes sense given the tools and protocols needed to measure them properly.
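As a rough illustration of the counter-based calculation, the sketch below derives throughput from two samples of a byte counter and a loss ratio from packet counters at both ends of a link. The function names and arguments are my own and do not correspond to any ONOS API; it also assumes the counters did not wrap or reset between the two samples.

```python
def throughput_bps(bytes_prev: int, bytes_now: int,
                   t_prev_s: float, t_now_s: float) -> float:
    """Average throughput in bits per second between two counter samples."""
    dt = t_now_s - t_prev_s
    if dt <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (bytes_now - bytes_prev) * 8 / dt


def loss_ratio(tx_packets_delta: int, rx_packets_delta: int) -> float:
    """Fraction of packets sent on one end of a link but not received on the other.

    Both deltas are assumed to cover the same polling interval.
    """
    if tx_packets_delta <= 0:
        return 0.0
    lost = max(tx_packets_delta - rx_packets_delta, 0)
    return lost / tx_packets_delta
```

The same two-sample approach works per flow if you use the flow entry byte counters instead of the port counters.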
As for ICMP, it is an end-to-end method between a host client and server. If we use it to measure RTT between switches, it requires a loopback or interface address to be configured on each switch, plus timestamping ability. And if ICMP runs between switches, there is no way to report the RTT result to the controller.
When I mentioned ICMP, I was thinking about measuring the approximate delay of a packet traversing a particular path, which involves one or more switches and where queueing might matter more than the time the packet takes to traverse a link. I think I have seen some papers that calculate the value you are looking for by sending a packet to a switch, forwarding it to another switch, and then back to the controller: Controller -> Sw1 -> Sw2 -> Controller. I think they use statistical methods to estimate the control-plane link and processing delay (I cannot remember how) and then subtract this from the overall delay they measured. However, this might not be as exact as expected, and it also requires the controller to query all switches at regular intervals.
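I don't remember the exact method from those papers, but one simple way to do the subtraction would look like the sketch below. It assumes the controller has measured the controller-to-switch round-trip time for both switches (for example with OpenFlow echo requests), that the control channel is roughly symmetric so half the RTT approximates the one-way delay, and that the median is a good enough summary of repeated samples. The function name and parameters are hypothetical.

```python
from statistics import median
from typing import Sequence


def estimate_link_delay(probe_total_s: Sequence[float],
                        rtt_ctrl_sw1_s: Sequence[float],
                        rtt_ctrl_sw2_s: Sequence[float]) -> float:
    """Estimate the Sw1 -> Sw2 delay from controller-side measurements.

    probe_total_s:  times for Controller -> Sw1 -> Sw2 -> Controller probes
    rtt_ctrl_sw1_s: Controller <-> Sw1 round-trip times
    rtt_ctrl_sw2_s: Controller <-> Sw2 round-trip times
    """
    # Assumption: one-way control-channel delay is half the round-trip time.
    ctrl_to_sw1 = median(rtt_ctrl_sw1_s) / 2
    sw2_to_ctrl = median(rtt_ctrl_sw2_s) / 2
    # What remains after removing both control-channel legs is attributed
    # to the Sw1 -> Sw2 hop (link plus switch processing and queueing).
    return median(probe_total_s) - ctrl_to_sw1 - sw2_to_ctrl
```

Note that the result still includes processing and queueing in both switches, so it is an upper bound on the pure link delay rather than an exact value.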
Additionally, it is true that I don't think there is any way to report the RTT to the controller, but I remember that OpenFlow accommodates experimenter messages; I don't know whether they could be used for this.
As for iperf, it is a good way to measure link capability and device performance, but if we use it for routine measurements, it may affect network performance.
This is true, and it is also end-to-end, so you could test throughput but might not know which device/link (if any) is performing worse.
So maybe we need a new active measurement method in the control plane, i.e. performed by ONOS, without timestamping at switches and without performance pressure on the whole network. What do you think? :)
This seems like a good idea. Keep in mind that without timestamping, measuring delay gets complicated. Extending the work in the link I posted earlier with P4 could be a good use case, but still a difficult one, as it might require specialized hardware. If you use BMv2, then Andy and Antonin make a really good point at the very end of this file. If you'd prefer to involve the controller and use OpenFlow switches, then you might be looking for work similar to this or this.
Cheers,