Selecting hardware for measuring real-time MachineKit/PreemptRT latencies using LTTng


Michel Dagenais

Apr 28, 2015, 9:37:18 PM
to machi...@googlegroups.com
Many people interested in LinuxCNC/MachineKit wonder which control platform to select, from the most inexpensive (Raspberry Pi), with more latency issues, to the more expensive (Intel with FPGA card). The big question is always the real-time response for a typical challenging target such as a 5-axis servo plus spindle (3-phase variable frequency). With LTTng we have the ability to diagnose such real-time problems, and thus to properly tune and then evaluate these different solutions. At the same time, it provides a challenging testbed for the LTTng toolchain. Your comments and suggestions on the hardware setup to test are most welcome!

Here is my current list of popular and interesting hardware that we would like to try out and properly assess this summer:

- Raspberry Pi 2, low end and most inexpensive.

- BeagleBone Black and eventually BeagleBoard X15, slightly more expensive but very interesting for control applications with PRUs, PWM and eQEP.

- Combined ARM and FPGA with Xilinx chips, like the ZedBoard. The Adapteva Parallella board comes with a Xilinx Z-7010 or Z-7020. You get a dual-core A9 and an FPGA for custom logic for encoder inputs and PWM or step generation.

- Intel board (Atom or better, suggestions for model welcome) with parallel port I/O or any of the available FPGA I/O cards.

Indeed, while some of the platforms may work without additional hardware (BeagleBone or Zedboard), an FPGA board provides a robust and high performance solution. I have a MesaNet 5I20 but the PCI bus is becoming legacy. Many newer models are offered using high speed data connections which are very convenient but not normally relied upon for real-time work (USB and Ethernet). However, given their high speed and with a dedicated link, it may be workable. Again an excellent testbed for the LTTng toolchain and some hardware measurements. We could test:

- Mesanet 6I24 FPGA PCIe card with 72 GPIO (baseline for low latency)

- Mesanet 7I80HD FPGA Ethernet with 72 GPIO

- Mesanet 7I61 FPGA USB with 96 GPIO

Chris Morley

Apr 28, 2015, 10:36:34 PM
to machi...@googlegroups.com


On Tuesday, April 28, 2015 at 6:37:18 PM UTC-7, Michel Dagenais wrote:


> - Mesanet 6I24 FPGA PCIe card with 72 GPIO (baseline for low latency)
>
> - Mesanet 7I80HD FPGA Ethernet with 72 GPIO
>
> - Mesanet 7I61 FPGA USB with 96 GPIO


On the LinuxCNC side, real-time Ethernet using Mesa products has been tested extensively and successfully.
I am not sure if anyone has built an Ethernet-based CNC machine (but I would bet yes).

I believe the J1800 and J1900 are the new 'preferred' motherboards (much faster than the Atom boards).
It will be interesting to see your results.

Chris M

Kent A. Reed

Apr 29, 2015, 12:10:09 AM
to machi...@googlegroups.com
On 04/28/2015 09:37 PM, Michel Dagenais wrote:
> - Raspberry PI 2, low end and most inexpensive.

Better to characterize one low-end board well than a bunch of them
poorly, and the RPi2 Model B clearly is the marketing winner, but if you
find time then it would be interesting to compare the real-time
performance of the RPi2 Model B with the comparably priced HardKernel
Odroid C1. The RPi2 runs 10M/100M Ethernet via one port on its USB
controller whereas the Odroid C1 has a dedicated 10M/100M/1G Ethernet
port (but cannot reach a full 1G according to their tech data).

There are other differences which would seem to give the performance
edge to the Odroid---see for example the two summary comments at
http://www.element14.com/community/thread/41374/l/odroid-c1-vs-raspberry-pi-2-speed?displayFullThread=true

I look forward to seeing LTTng do its stuff. For too long, we've been
like the drunkard who loses his car keys in a dark alley but is found
looking for them under a light post out on the street because that's
where he can see.

Regards,
Kent

Claudio Lorini

Apr 29, 2015, 2:42:37 AM
to machi...@googlegroups.com
I'm doing quite a lot of work on the ZedBoard platform in the fields of 'robotics' and 'multichannel audio',
both in real-time (Xenomai) and standard Linux environments. My colleagues are setting up a debug/trace
setup with Lauterbach emulators, and I think it will be very interesting to compare the results of a SW vs. HW
approach...
C.

Dave Cole

Apr 29, 2015, 10:32:44 AM
to machi...@googlegroups.com
On 4/29/2015 12:10 AM, Kent A. Reed wrote:
> For too long, we've been like the drunkard who loses his car keys in a
> dark alley but is found looking for them under a light post out on the
> street because that's where he can see.

Ahh... the streetlight effect! Yeah... that happens.

And it is amusing... as well. :-)

Dave


Michael Haberler

Apr 29, 2015, 11:14:18 AM
to Michel Dagenais, Machinekit List, Peter C. Wallace
Michel,

> On 29.04.2015 at 03:37, Michel Dagenais <michelr...@gmail.com> wrote:
>
> Many people interested in LinuxCNC/MachineKit wonder about the control platform to select, from the most inexpensive (Raspberry PI) with more latency issues to the more expensive (Intel with FPGA card). The big question is always the real-time response for a typical challenging target such as 5 axis servo + spindle (3 phase variable frequency). We have the ability with LTTng to diagnose such real-time problems and thus to properly tune and then evaluate these different solutions. At the same time, it provides a challenging testbed for the LTTng toolchain. Your comments and suggestions on the hardware setup to test are most welcome!
>
> Here is my current list of popular and interesting hardware that we would like to try out and properly assess this Summer:
>
> - Raspberry PI 2, low end and most inexpensive.
>
> - BeagleBone Black and eventually BeagleBoard X15, slightly more expensive but very interesting for control applications with PRUs, PWM and eQEP.
>
> - Combined ARM and FPGA with Xilinx chips like Zedboards. The Adapteva Parallela board comes with a Xilinx Z-7010 or Z-7020. You get a dual-core A9, and FPGA for custom logic for encoder inputs and PWM or step generation.

The above two options look very reasonable to me; in fact, with some VHDL background it might be possible to transplant the hostmot2 firmware to those FPGA-based boards.
The VHDL sources are here for the daring: http://git.linuxcnc.org/gitweb?p=hostmot2-firmware.git;a=tree;h=13bce0f420c95490155d9d6b451b72d4ceff6372;hb=13bce0f420c95490155d9d6b451b72d4ceff6372

>
> - Intel board (Atom or better, suggestions for model welcome) with parallel port I/O or any of the available FPGA I/O cards.

I only have an old Atom D525, which performs fine (I have a 5i25 Mesanet card in there), and an AMD Brazos-chipset board (kind of the AMD counterpart of the Atom), which also performs reasonably well.

As Chris suggested - the newer Celerons seem to be the choice du jour in motherboard land

> Indeed, while some of the platforms may work without additional hardware (BeagleBone or Zedboard), an FPGA board provides a robust and high performance solution. I have a MesaNet 5I20 but the PCI bus is becoming legacy. Many newer models are offered using high speed data connections which are very convenient but not normally relied upon for real-time work (USB and Ethernet). However, given their high speed and with a dedicated link, it may be workable. Again an excellent testbed for the LTTng toolchain and some hardware measurements. We could test:
>
> - Mesanet 6I24 FPGA PCIe card with 72 GPIO (baseline for low latency)

I have no data point personally but generally Mesanet driver support is excellent, so very likely good

> - Mesanet 7I80HD FPGA Ethernet with 72GPIO

As Chris said, it works great with RT-PREEMPT and a crossover cable (no switches, since those add delay). It is UDP-based and as such vanilla Ethernet, without any real-time time slotting like EtherCAT, EtherNet/IP or Powerlink. If a point-to-point link on an extra dedicated Ethernet interface is acceptable, it is a great choice for motion control as well.

I understand Peter has it running stable with servo rates of several kHz, which is very high for machinekit/linuxcnc setups (normally at 1kHz)

There are also SPI variants of these boards, but those are better explained by Peter Wallace of Mesanet; they would be an option for ARM boards, most of which have SPI.

>
> - Mesanet 7I61 FPGA USB with 96 GPIO

The general wisdom is that USB will introduce a lot of jitter, and with the Machinekit/LinuxCNC timing model the host is normally the timing source, so any jitter introduced by the link is bad news.

That said, Peter has come up with a clever feature: the card has a digital PLL that corrects for host/link jitter, but I can't say whether this applies to this card.

- Michael



Dave Cole

Apr 29, 2015, 11:35:54 AM
to machi...@googlegroups.com
For a PC platform, I'd look hard at the J1900 solutions, for instance
the Gigabyte J1900N-D3V dual Ethernet port board and the Mesa 7i80HD.
I think that combo has tremendous potential at a low cost. I have both
devices here but have lacked the time needed for experimentation.

I have Raspberry Pis and BBBs here and I think the minimum worthwhile
board to work with is the BBB. IMO, the cost difference between the
Raspberry Pi and the BBB is insignificant.

Dave

Charles Steinkuehler

Apr 29, 2015, 11:59:39 AM
to machi...@googlegroups.com
On 4/29/2015 10:14 AM, Michael Haberler wrote:
>
>> On 29.04.2015 at 03:37, Michel Dagenais <michelr...@gmail.com> wrote:
>>
>> - Combined ARM and FPGA with Xilinx chips like Zedboards. The
>> Adapteva Parallela board comes with a Xilinx Z-7010 or Z-7020. You
>> get a dual-core A9, and FPGA for custom logic for encoder inputs
>> and PWM or step generation.
>
> the above two options look very reasonable to me; in fact with some
> VHDL background it might be possible to transplant the hostmot2
> firmwares to those FPGA-based boards the VHDL sources are here for
> the daring:
> http://git.linuxcnc.org/gitweb?p=hostmot2-firmware.git;a=tree;h=13bce0f420c95490155d9d6b451b72d4ceff6372;hb=13bce0f420c95490155d9d6b451b72d4ceff6372

The VHDL in the LinuxCNC git repository is typically out of date. You're likely
better off grabbing a current *.zip file with the firmware direct from
Mesanet. AFAIK, there is no "official" upstream code repository, as I
don't think Peter uses git/svn/cvs/etc.

I intend to port the hm2 VHDL code to one of the ARM+FPGA SoC devices
when a suitable board becomes available (sub $100 or so with easily
accessible I/O). The Parallella board is close, but doesn't quite seem
like the right fit for motion control applications, mostly due to lack
of I/O.

--
Charles Steinkuehler
cha...@steinkuehler.net


Michael Haberler

Apr 29, 2015, 6:52:54 PM
to Machinekit List
[Peter Wallace suggested I forward this]

> Begin forwarded message:
>
> From: "Peter C. Wallace" <p...@mesanet.com>
> Subject: Re: Fwd: [lttng-dev] introduction & usage of LTTNG in machinekit
> Date: April 29, 2015, 16:16:09 CEST
> To: Michael Haberler <mai...@mah.priv.at>
>
>
> OK, not sure how valuable my generic arm-waving info is, but I have spent a bit of time running our Ethernet FPGA boards on LinuxCNC/PC hardware and have some general observations on Preempt-RT/LinuxCNC performance with hm2_eth. hm2_eth uses 3 packets per servo thread cycle (read request from host, read data from FPGA, and write data from host).
>
> 1. CPU speed seems the best indicator of performance
> 2. Cache size is a close second
>
> Slow CPUs like the Atom only allow ~1 kHz servo thread rates.
> Mid-speed CPUs like the old Core 2 Duos allow about 3 kHz
> (an E8500 = 6M cache, 3.16 GHz dual core, is good for almost 4 kHz
> with about one error per week of uptime).
> High-speed CPUs like the G3258 (or i3, i5, i7) work the best.
>
> The G3258 on an ASRock H97M Pro4 is the best-performing
> Preempt-RT/Ethernet MB I have (solid 4 kHz operation regardless of various tortures).
>
> Here's an 18-hour or so latency plot of the H97/G3258 with Preempt-RT:
>
> http://freeby.mesanet.com/h97-g3258-preemt-rt.png
>
>
> And here's a couple hour latency plot of my old Dell laptop:
>
> http://freeby.mesanet.com/e6420.png
>
> It will run LinuxCNC/hm2_eth reliably at 2 kHz or so if you don't pull the power cord when running :-)
>
> One other observation (probably a caching issue) is that the faster you go, the faster you can go (if you run the servo thread faster, the worst-case delay spikes are shorter).
>
> I have tested with Intel and Realtek MACs.
>
> Intel MACs need to have IRQ coalescing turned off.
>
> Never got Atheros MACs to work very well.
>
>
> I don't expect USB has much hope without specially tailored device hardware
> (maybe USB3?)
>
> As an aside, I have run LinuxCNC/hm2_eth over a USB/Ethernet adapter and was surprised that it actually works pretty well (for an hour or so).
>
>
> Peter Wallace
> Mesa Electronics
>
> (\__/)
> (='.'=) This is Bunny. Copy and paste bunny into your
> (")_(") signature to help him gain world domination.
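
For readers who want the servo rates quoted in Peter's message as time budgets, here is a small sketch of the arithmetic. Only the rates and the three-packets-per-cycle figure come from the message; the function names and the choice of a one-week window (to match "one error per week of uptime") are illustrative assumptions.

```python
# Sketch only: turns the servo-thread rates quoted above into per-cycle budgets.
# The three-packets-per-cycle figure comes from the message; the names and the
# one-week window are illustrative.

SECONDS_PER_WEEK = 7 * 24 * 3600
PACKETS_PER_CYCLE = 3  # read request, read data, write data (per the message)


def period_us(rate_hz):
    """Time available for one servo cycle, in microseconds."""
    return 1_000_000 / rate_hz


def cycles_per_week(rate_hz):
    """Number of servo cycles executed in one week of uptime."""
    return rate_hz * SECONDS_PER_WEEK


for rate_hz in (1000, 2000, 3000, 4000):
    print(
        f"{rate_hz} Hz: period {period_us(rate_hz):.1f} us, "
        f"{rate_hz * PACKETS_PER_CYCLE} packets/s, "
        f"{cycles_per_week(rate_hz):,} cycles/week"
    )
```

At 4 kHz the whole request/response/write exchange has to fit in 250 microseconds, and one error per week works out to roughly one bad cycle in 2.4 billion.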

Michel Dagenais

Apr 30, 2015, 6:03:30 PM
to machi...@googlegroups.com

> I believe j1800 and j1900 are the new 'preferred' motherboard ( much faster then the Atom board )
> Will be interesting to see your results.

Yes, these boards look very interesting. I will add a J1900 board to my list. Thanks!

Michel Dagenais

Apr 30, 2015, 6:06:59 PM
to machi...@googlegroups.com
> Better to characterize one low-end board well than a bunch of them
> poorly, and the RPi2 Model B clearly is the marketing winner, but if you
> find time then it would be interesting to compare the real-time
> performance of the RPi2 Model B with the comparably priced HardKernel
> Odroid C1.

Yes, Odroid deserves to be on the list. Thanks!
 
> I look forward to seeing LTTng do its stuff. For too long, we've been
> like the drunkard who loses his car keys in a dark alley but is found
> looking for them under a light post out on the street because that's
> where he can see.

It always takes a little time to learn a new tool, but indeed it makes all the difference to see exactly what your system is doing and when, with every interrupt and scheduling switch event!

Michel Dagenais

Apr 30, 2015, 6:12:37 PM
to machi...@googlegroups.com, p...@mesanet.com, michelr...@gmail.com

> - Mesanet 7I61 FPGA USB with 96 GPIO

> The general wisdom is - USB will introduce much jitter, and with the machinekit/linuxCNC timing model normally the host is the timing source. So any jitter introduced by the link is bad news.
>
> That said, Peter has come up with a clever feature - the card has a digital PLL and corrects for host/link jitter, but I can't say if this is applicable for this card

More reason to try it. If any jitter is associated with Linux Preempt-RT, we should have a clear picture with LTTng and may be able to improve it. 

Michel Dagenais

May 1, 2015, 10:07:22 AM
to machi...@googlegroups.com


On Tuesday, April 28, 2015 at 9:37:18 PM UTC-4, Michel Dagenais wrote:
> Many people interested in LinuxCNC/MachineKit wonder about the control platform to select, from the most inexpensive (Raspberry PI) with more latency issues to the more expensive (Intel with FPGA card). The big question is always the real-time response for a typical challenging target such as 5 axis servo + spindle (3 phase variable frequency). We have the ability with LTTng to diagnose such real-time problems and thus to properly tune and then evaluate these different solutions. At the same time, it provides a challenging testbed for the LTTng toolchain. Your comments and suggestions on the hardware setup to test are most welcome!

The plan is twofold. First, run a demanding real-time task while monitoring the real-time deadlines (actual wakeup time versus the scheduled one, plus service time). With a trace of all important kernel events, it is already possible to detect any violation and to see what the competing tasks are. In addition, it is possible to detect a latency violation at runtime and log even more information in that case, for instance stack dumps. This was described in an interesting blog post.
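
To make the first part concrete, here is a rough user-space sketch of what "actual wakeup time versus scheduled one and service time" means in practice. It is not the LTTng-based setup and not the code from the blog post; `run_periodic` and `violations` are hypothetical names, and Python with a relative `time.sleep` adds far more jitter than a real-time task in C using an absolute-time sleep would see. It only shows which quantities are recorded per cycle.

```python
import time


def run_periodic(period_ns, cycles, work=lambda: None):
    """Run `work` once per period and record timing for each cycle.

    Returns a list of (wakeup_latency_ns, service_time_ns) tuples, where
    wakeup latency is the actual wakeup time minus the scheduled one.
    """
    samples = []
    next_deadline = time.monotonic_ns() + period_ns
    for _ in range(cycles):
        # Relative sleep until the scheduled wakeup. A real RT task would use
        # an absolute-time sleep (e.g. clock_nanosleep in C) instead.
        remaining = next_deadline - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)
        woke = time.monotonic_ns()
        work()
        done = time.monotonic_ns()
        samples.append((woke - next_deadline, done - woke))
        next_deadline += period_ns
    return samples


def violations(samples, period_ns):
    """Indices of cycles where wakeup latency plus service time overran the period."""
    return [i for i, (lat, svc) in enumerate(samples) if lat + svc > period_ns]


PERIOD_NS = 1_000_000  # 1 kHz, the usual servo rate mentioned in this thread
samples = run_periodic(PERIOD_NS, cycles=200)
print("worst wakeup latency (ns):", max(lat for lat, _ in samples))
print("overrun cycles:", violations(samples, PERIOD_NS))
```

In the real setup the overrun check would be the trigger for logging the extra information (for instance a stack dump), and the kernel trace would show which competing task or interrupt caused the delay.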


The second part of the plan is to measure the I/O latency of the various solutions. First we need to determine and evaluate the most direct path to the exterior, for example a GPIO on the ARM core or some other pin that can easily be toggled and monitored. We can then connect an output pin to an input pin, send a transition and read it back, measuring the delay in cycles. We can then hopefully assume that the delay is mostly symmetrical and does not vary much. From there, we can use this as a baseline to measure the latency of other I/O (GPIO of a Mesanet card connected via PCIe, Ethernet, USB2 or USB3). A signal can be connected both to the direct input pin and to the input pin whose latency is to be measured; a program can then record when the transition is seen on each input and measure the difference in cycles. The same can be done for outputs.
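
The arithmetic behind this second part can be sketched as follows. The function names and the numbers are made up for illustration; the only assumptions taken from the plan are that the direct loopback delay splits evenly between the output and input halves and that it does not vary much.

```python
def one_way_baseline(loopback_cycles):
    """Latency of the direct input path, assuming the loopback is symmetric."""
    return loopback_cycles / 2


def input_latency(direct_seen, test_seen, loopback_cycles):
    """Estimate the latency of the input under test, in cycles.

    direct_seen and test_seen are the cycle-counter values at which the
    program saw the same transition on the direct pin and on the input under
    test. Their difference is the extra delay of the tested path; adding the
    baseline gives an absolute estimate.
    """
    return (test_seen - direct_seen) + one_way_baseline(loopback_cycles)


# Made-up numbers, for illustration only.
estimate = input_latency(direct_seen=1_000, test_seen=1_450, loopback_cycles=200)
print("estimated input latency (cycles):", estimate)
```

If the symmetry assumption turns out to be wrong, only the baseline term is affected; the difference between the two inputs is measured directly.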

Kent A. Reed

May 1, 2015, 10:27:45 AM
to machi...@googlegroups.com
Michel:

This is *very* exciting. I look forward to your findings.

From the earliest times of Machinekit's ancestral projects, EMC, then
EMC2/LinuxCNC, we tried various hacks to determine what really matters.
In the end, most folks satisfied themselves with the gross latency
numbers reported by the tools in the underlying-RT package or by a
primitive test included with EMC2/LinuxCNC and now Machinekit. Once the
x86 world (especially its myriad BIOSes) settled down the interest
waned, although occasionally a new technology causes the email lists to
light up, but the ARM jungle is a whole new ballgame.

Regards,
Kent

Michel Dagenais

May 1, 2015, 2:42:18 PM
to machi...@googlegroups.com
> Once the x86 world (especially its myriad BIOSes) settled down the interest
> waned, although occasionally a new technology causes the email lists to
> light up, but the ARM jungle is a whole new ballgame.

We once saw a nasty case of latency on an x86 system. The trace would not show anything, but there were gaps of 30 ms once in a while. We used the hardware latency detector Linux module to check the problem. This module basically disables interrupts for 500 ms each second and reads the cycle counter in a tight loop, checking that the elapsed time for each iteration is only a few cycles. If it is longer, it notes the k longest latencies encountered as well as their time of occurrence. In the end, it prints the offending values. This uncovers evidence of non-maskable interrupts outside the control of the operating system. In that case it was an SMI (System Management Interrupt) that could not be disabled in any way. However, you do have the possibility of configuring how interrupts are distributed among cores. Multi-core is your friend: a lot of things become much simpler when one core can serve as the non-real-time do-it-all and you reserve the other cores for time-critical tasks.
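
The idea of the detector (read a clock in a tight loop, keep the k longest gaps and when they occurred) can be imitated in user space. The sketch below is not the kernel module: it cannot disable interrupts, so ordinary preemption shows up as gaps too, and Python's own overhead is far above "a few cycles". The function names and the threshold are assumptions chosen for illustration.

```python
import heapq
import time


def sample_clock(duration_ns):
    """Read the monotonic clock in a tight loop for roughly duration_ns."""
    stamps = [time.monotonic_ns()]
    end = stamps[0] + duration_ns
    while stamps[-1] < end:
        stamps.append(time.monotonic_ns())
    return stamps


def longest_gaps(timestamps, k, threshold_ns):
    """Return up to k gaps above threshold_ns as (gap_ns, start_ns), largest first."""
    gaps = (
        (later - earlier, earlier)
        for earlier, later in zip(timestamps, timestamps[1:])
        if later - earlier > threshold_ns
    )
    return heapq.nlargest(k, gaps)


stamps = sample_clock(10_000_000)  # sample for about 10 ms
for gap_ns, start_ns in longest_gaps(stamps, k=5, threshold_ns=50_000):
    print(f"gap of {gap_ns} ns starting at {start_ns} ns")
```

A gap that appears here but has no matching event in a kernel trace is the kind of evidence described above: something took the CPU away without the operating system knowing about it.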

Kent A. Reed

May 1, 2015, 3:04:41 PM
to machi...@googlegroups.com
On 05/01/2015 02:42 PM, Michel Dagenais wrote:
> a SMI interrupt that could not be disabled in any way.

That bit us more than once too, e.g., see
http://rtai.dk/cgi-bin/gratiswiki.pl?Latency_Killer (dating from 2006)
and http://wiki.linuxcnc.org/cgi-bin/wiki.pl?FixingSMIIssues (which has
been around for years but I see it was last updated in 2014).

Regards,
Kent
