router on 0, 0 has a non-zero error status


Vaggelis Ntouros

Mar 15, 2022, 7:20:06 AM
to SpiNNaker Users Group
Hello all,

I have a setup with an MCU injecting controlled spikes into SpiNNaker through an FPGA. I've previously managed to send a lot of spikes successfully with my setup on a breadboard.

After building the main board, though, some spikes get lost (counting the incoming and outgoing messages, there are fewer outgoing ones), and the following error message appears:

The router on 0, 0 has a non-zero error status.  This could indicate a hardware fault.  The errors set are [<RouterError.ERROR: 2147483648>, <RouterError.OVERFLOW: 1073741824>, <RouterError.PARITY: 536870912>], and the error count is 5

I am trying to understand whether it is an issue with the board I built after the breadboard solution, or something in SpiNNaker that occurred in between.

Any ideas?

Thank you,
Vaggelis

PS1 I attach a photo of the spinnaker ack signal before and after the level shifter to be sure that its small jitter does not produce any issues.

PS2 Tried to test spinnaker link 1 but could not set up my script correctly as I do not get any spikes. I attach the script, maybe there is something wrong in it. 
 
275714439_1032111354057854_4132097514791104964_n.jpg
retina_to_snn.py

Andrew Rowley

Mar 15, 2022, 7:55:33 AM
to Vaggelis Ntouros, SpiNNaker Users Group

Hi,

 

If there was an error on SpiNNaker, you would likely get these errors just running a normal simulation since that also sends packets through the routers.  If you only see it with your connected device, I would suspect the device. 

 

Note that the error is generated in software, and was created without thinking about connected devices, hence the “this could indicate a hardware fault” message; in a non-device simulation, if this happens it would likely mean something was broken on the SpiNNaker board since software on SpiNNaker can’t generate these sorts of errors.  I think they can happen with multi-board simulations if the cables between boards are loose for example.

 

For information, the routers have an error status register, which has multiple flags and a single counter.  The flags are sticky in that once set they don’t clear until read.  The count then tells you the total number of errors, but doesn’t indicate how many of each.  However, ERROR means any error was detected, OVERFLOW means that more than one error was detected before the register was read, so that leaves only PARITY, which indicates that you had 5 parity errors.
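
As a rough illustration of the flag decoding described above: the three values printed in the error message are single bits (bits 31, 30 and 29), so each can be tested with a bitwise AND. This is only a minimal sketch. The dictionary and function names are made up for the example and are not part of the SpiNNaker tools, and only the flag values quoted in this thread are included.

```python
# Flag values exactly as printed in the error message in this thread.
# The names of this dict and function are hypothetical (illustration only).
ROUTER_ERROR_FLAGS = {
    "ERROR": 2147483648,     # 1 << 31: some error was detected
    "OVERFLOW": 1073741824,  # 1 << 30: more than one error before the read
    "PARITY": 536870912,     # 1 << 29: a packet arrived with wrong parity
}


def decode_router_flags(status):
    """Return the names of the flags whose bit is set in `status`."""
    return [name for name, mask in ROUTER_ERROR_FLAGS.items() if status & mask]


# The combination reported in the original error message:
reported = 2147483648 | 1073741824 | 536870912
print(decode_router_flags(reported))
```

With ERROR and OVERFLOW acting as summary flags, PARITY is the only remaining cause in that output, which matches the reasoning above.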

 

Hope that helps,

 

Andrew :) 

 


Luis Plana Cabrera

Mar 15, 2022, 7:56:27 AM
to Vaggelis Ntouros, SpiNNaker Users Group
<RouterError.PARITY: 536870912>
This indicates that the router received a packet with the wrong parity. This can be the result of the parity being incorrectly computed at the origin or electrical noise causing a bit in the packet to change value.

<RouterError.OVERFLOW: 1073741824>
This indicates that more than one parity error was received.

If this wasn't happening on the breadboard, I suggest that the parity bit is being computed correctly and you have a noise problem.
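
For checking the sender side, a sketch of the parity computation may be useful. It assumes the convention I understand from the SpiNNaker datasheet, namely that the complete packet has odd parity and that the parity bit is bit 0 of the 8-bit header. Both points are assumptions here and should be verified against the datasheet and your FPGA code. The function name is hypothetical, and only a packet without payload (header plus 32-bit key) is handled.

```python
def with_odd_parity(header, key):
    """Return `header` with bit 0 chosen so header + key has odd parity.

    Assumption (not confirmed in this thread): parity lives in bit 0 of
    the 8-bit header and covers the whole 40-bit packet.
    """
    header &= 0xFE                      # clear the assumed parity bit
    key &= 0xFFFFFFFF                   # keep the key to 32 bits
    ones = bin(header).count("1") + bin(key).count("1")
    parity = 0 if ones % 2 == 1 else 1  # make the total number of 1s odd
    return header | parity


# Example: an all-zero header and key needs the parity bit set.
print(with_odd_parity(0x00, 0x00000000))
```

If the sender already produces the same bit as a check like this for every key, the remaining explanation is noise on the link, as suggested above.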




Vaggelis Ntouros

Mar 15, 2022, 12:47:09 PM
to SpiNNaker Users Group
I fixed the noise and now I do not get any errors. I only get warnings that some packets arrived late. Due to 

[Simulation]
drop_late_spikes = False

the packets are not dropped. What I notice though is a very small packet loss. I inject about 40000 packets and with live output I see about 10 packets fewer. Is this normal? If not, since there are no other warnings, how should I approach this?

Vaggelis

Vaggelis Ntouros

Mar 16, 2022, 5:23:25 AM
to SpiNNaker Users Group
After some tests I did today I wish to provide new data on the above. 

1) There is no packet loss. The number of incoming packets is the same as outgoing.
2) The issue that was observed is that some packets were dropped by the FPGA, apparently because SpiNNaker delayed the acknowledge of the previous packet for long enough.

So my question is the following: is it normal for SpiNNaker to occasionally delay the transfer that much? The incoming event rate is relatively low, 1 kHz, so if packets are being dropped it implies that SpiNNaker has delayed the transfer by something on the order of milliseconds.

Vaggelis

Andrew Rowley

Mar 16, 2022, 7:22:50 AM
to Vaggelis Ntouros, SpiNNaker Users Group

Hi,

 

Things that can hold up the delivery of packets include whatever is receiving them at the other end.  If a core is failing to pull packets from the router, the router can be held up while this happens.  There is a queue in the router of course, but if this becomes full due to waiting for down-stream routers and cores, this can end up backing up over the whole fabric of the network.

 

If you can do a run of the network for a fixed time, you should get some reports from the cores if they couldn’t keep up with the spikes.  If you are sending only 1000 packets per second, I would have expected them to keep up, but I can’t be sure!

 

Andrew :)

 

Andrew Rowley

Mar 16, 2022, 7:24:26 AM
to Andrew Rowley, Vaggelis Ntouros, SpiNNaker Users Group

Hi,

 

I should add that the peak instantaneous rate is likely to be important here.  If you try to send 1000 packets very quickly, then wait until the next second and do this again, the receiver might not be able to pull the packets quickly enough so some will get lost.
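
To make the difference between average and peak rate concrete, here is a small sketch that computes both from a list of packet timestamps. The function name, the use of integer microsecond timestamps and the 1 ms window are all assumptions chosen for the illustration; none of this is taken from the SpiNNaker software.

```python
def average_and_peak_rate(timestamps_us, window_us=1000):
    """Return (average, peak) packet rates in packets per second.

    `timestamps_us` are packet send times in microseconds. The peak is
    the largest number of packets seen in any `window_us` window.
    """
    ts = sorted(timestamps_us)
    if len(ts) < 2 or ts[-1] == ts[0]:
        raise ValueError("need at least two distinct timestamps")
    average = len(ts) / ((ts[-1] - ts[0]) / 1e6)

    peak_count = 0
    start = 0
    for end in range(len(ts)):
        # Shrink the window until it spans less than window_us.
        while ts[end] - ts[start] >= window_us:
            start += 1
        peak_count = max(peak_count, end - start + 1)
    return average, peak_count / (window_us / 1e6)


# Two bursts of 1000 packets, 1 us apart inside a burst, 1 s between bursts.
bursts = list(range(1000)) + [1_000_000 + i for i in range(1000)]
print(average_and_peak_rate(bursts))
```

For this bursty pattern the average is roughly 2000 packets per second, while the peak within a 1 ms window corresponds to about one million packets per second, and it is the peak that the receiver has to cope with.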

 

Andrew :)

Vaggelis Ntouros

Mar 17, 2022, 5:15:14 AM
to SpiNNaker Users Group
Hello,

Thank you for the information. I can replicate what you describe: without any errors, all of the packets are reported correctly. Currently I am testing the limits of the system to find the maximum event rate.

Reminder: 
1) The goal is to connect a DVS to SpiNNaker using an FPGA.
2) The DVS emits spikes at a maximum rate of around 1 MHz.
3) Currently I simulate the DVS using an MCU feeding a controllable spike train.
4) The script implements just a dummy 8192-neuron population that maps the spikes from the DVS directly, with weights=5 (each spike from the retina results in one spike in the population).

So, I noticed the following:
1) Feeding spikes to the same neuron at 100 Hz was ok. No packet loss.
2) Tested also with a 50 kHz event rate (a different neuron spikes each time) and everything is ok. The script reports about 50 spikes per millisecond.
3) Increasing the frequency to approximately 77 kHz, I can see that every spike is reported but the rate is not stable. One step reports 12 spikes, another 63, and so on...
4) Increasing the frequency even more (125 kHz) results in the following warning appearing multiple times, and a lot of spikes (30%) are lost:
A maximum of 2 background tasks were queued on pop_1:7936:8191 on 1, 1, 9.  Try increasing the time_scale_factor located within the .spynnaker.cfg file or in the pynn.setup() method.

So the questions are the following:
1) Is there any hard limit on the injected spikes? If not, could there be an application specific limit close to the frequencies I discuss above? Or am I missing anything?
2) About the timestep, does working on a 1 msec timestep mean, that spikes coming from a dvs "lose" their microsecond resolution? In other words, a spike arriving at 10 usec and one that arrives at 990 usec, are processed by the first population the same time?
3) Again, regarding the timestep, I saw in other threads, that increasing it, could result in issues in real time systems. I understand that this is related to the size of the SNN. So, the way one should approach this is to test with timesteps < 1 msec and see if any warnings/errors are reported, right?

Vaggelis   

Andrew Rowley

Mar 17, 2022, 6:25:47 AM
to Vaggelis Ntouros, SpiNNaker Users Group

Hi,

 

1) Is there any hard limit on the injected spikes? If not, could there be an application specific limit close to the frequencies I discuss above? Or am I missing anything?

 

The hardware limit for transmitting on a SpiNNaker link is ~600 million packets per second.  The application running on a core will affect the actual limit in the number of packets, as the packets need to go somewhere of course.  A core runs at 200 MHz and takes ~17 clock cycles to receive a packet, so the absolute limit for a single core would be around 11.7 million packets per second.  That would assume that the core only stores the packets, but of course it also needs to process them.  Unfortunately it is much harder to work out the limits for this, as it depends on what is happening to each spike in the simulation.  Each spike will likely require a DMA of a synaptic row from SDRAM, and then the size of that row (i.e. the number of synapses it contains) will dictate the processing duration for that spike.  In the current default operation of the master branch, the cores also execute the neuron state updates, the cost of which depends on how many neurons are running on each core.  With e.g. 256 neurons on a core (the default), state update might take 30% of the CPU time, leaving 70% for spike reception and synaptic processing.
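
The arithmetic in the paragraph above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted there (200 MHz clock, ~17 cycles per received packet, ~70% of CPU time left for spikes). The function names are hypothetical, and the cycle budget is an upper bound rather than a measurement of the real code.

```python
CLOCK_HZ = 200_000_000      # core clock quoted above
CYCLES_PER_PACKET_RX = 17   # approximate cycles to receive one packet


def max_receive_rate(clock_hz=CLOCK_HZ, cycles_per_packet=CYCLES_PER_PACKET_RX):
    """Packets per second if the core did nothing but receive packets."""
    return clock_hz / cycles_per_packet


def cycle_budget_per_spike(spike_rate_hz, cpu_fraction=0.7, clock_hz=CLOCK_HZ):
    """Clock cycles available per incoming spike at a given spike rate,
    assuming `cpu_fraction` of the CPU is left for spike handling."""
    return clock_hz * cpu_fraction / spike_rate_hz


print(round(max_receive_rate()))               # reception-only limit
print(round(cycle_budget_per_spike(125_000)))  # budget per spike at 125 kHz
```

The first figure is about 11.76 million packets per second. The second shows that at 125 kHz each spike has a budget of roughly 1120 cycles for reception, the DMA of the synaptic row and the synaptic processing together.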

 

If you are finding that ~100 kHz still allows everything to run, but 125 kHz doesn’t, it suggests that this is about the limit that the current code can cope with in terms of incoming spikes.  It is also worth noting that, again in the current code, every spike is likely being received by every core that is being used to simulate the Population targeted by the Projection from the FPGA, meaning that the limit for one core is the limit for all.

 

The above assumes the basic model we have been using for some time, but it is worth noting that we now have two things available that could help with the situation to a certain extent, though there are some potential shortfalls for this exact situation which are discussed below:

 

  1. The current release code can be set up so that the processing of spikes is separated from the processing of neurons, and furthermore each neuron core is fed by multiple synapse processing cores, where each synapse core handles a subset of the incoming spikes.  The main issue with this is that we haven’t yet tried it with spikes from external devices.  I believe that this may not help because I don’t think the code knows how to split up the external device into subsets that can target each synapse core separately, so all spikes will likely be received by a single synapse core, which won’t help that much.  That said, even the separation into neuron and synapse cores will reduce the processing required on each, so may help a bit.

 

Examples of this in operation can be seen in the PyNN Examples e.g.:

https://github.com/SpiNNakerManchester/PyNN8Examples/blob/master/examples/split_examples/va_benchmark_split.py#L162-L169

This example shows two different Populations being created with “splitters”, one with 2 synapse cores per neuron and one with 3 (for no particular reason other than to show the example in this case).

 

  2. We have been working on code that can split up an incoming device into multiple sub-devices, allowing different parts of the device to then be received by different cores.  Additionally, if the input is a DVS, this can be represented as a 2D input, and connected to Populations that are 2D in nature.  Depending on the connectivity patterns it may be possible to reduce the number of spikes received by each core.  This is currently in a set of git branches, though these are quite close to the “git master” code, so should work without issue. 

 

To use these branches, you would need to checkout the git master software using these instructions:

https://spinnakermanchester.github.io/development/gitinstall.html

 

You would then need to change the following modules to use the “extdev_fpgas” branch:

SpiNNMachine, PACMAN, SpiNNFrontEndCommon, sPyNNaker and JavaSpiNNaker (if you have this, if not don’t worry).

 

You will then need to rebuild the application binaries by running:

SupportScripts/automatic_make.sh

 

We have then been concentrating on a specific retina device, which is then described here:

https://github.com/SpiNNakerManchester/sPyNNaker/blob/extdev_fpgas/spynnaker/pyNN/external_devices_models/spif_retina_device.py

This is specific in that it is connected to an FPGA on a 48-node board, which means that it uses the Application2DFPGAVertex, where you would likely need to try to use the Application2DSpiNNakerLinkVertex.  This is clearly less tested, though I am happy to try to fix any errors you find!  These both take a width, height, sub_width and sub_height.  The latter two items tell the code to split the input into several smaller squares or rectangles, which then affects what the receiver will have to receive.
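
To give a feel for what width, height, sub_width and sub_height imply, the sketch below tiles a 2D input into sub-rectangles and finds which tile a pixel falls in. It is purely illustrative: the function names and the row-major tile numbering are assumptions, not how sPyNNaker actually numbers its sub-devices, and the 128 x 64 geometry is just an assumed shape that gives 8192 pixels like the population in this thread.

```python
def n_sub_rectangles(width, height, sub_width, sub_height):
    """Number of sub-rectangles needed to cover a width x height input."""
    cols = -(-width // sub_width)    # ceiling division
    rows = -(-height // sub_height)
    return cols * rows


def sub_rectangle_index(x, y, width, sub_width, sub_height):
    """Row-major index of the sub-rectangle containing pixel (x, y).

    The numbering scheme is an assumption made for this illustration.
    """
    cols = -(-width // sub_width)
    return (y // sub_height) * cols + (x // sub_width)


# Assumed 128 x 64 input split into 32 x 32 squares.
print(n_sub_rectangles(128, 64, 32, 32))
print(sub_rectangle_index(127, 63, 128, 32, 32))
```

With 8 tiles, a receiving core that only needs one tile would see spikes from 1024 pixels rather than all 8192, which is where the reduction in per-core spike rate would come from.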

 

  3. It may be possible to combine the above things; note that we haven’t really tried this, though I can’t immediately see why it wouldn’t work.

 

 

2) About the timestep, does working on a 1 msec timestep mean, that spikes coming from a dvs "lose" their microsecond resolution? In other words, a spike arriving at 10 usec and one that arrives at 990 usec, are processed by the first population the same time?

 

The spikes will only lose their resolution in the sense that the target Population will process the received spikes within the same time step and so they will only affect the neuron membrane as a group rather than individually.  The spikes are received by the Population essentially instantaneously, and are then put in a queue from which they are processed.  The processing will start as soon as the first spike is received at 10 usec; it doesn’t wait until the end of the time step to do this.  The spike received at 990 usec might end up being processed in the next time step, so in that sense the effect will be separate from the first.
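
The nominal grouping of spikes into time steps can be sketched as an integer division of the arrival time by the time step. This only shows the idealised binning; as noted above, a spike that arrives late in a step may in practice be processed in the next one. The function name is hypothetical.

```python
def timestep_index(arrival_us, timestep_us=1000):
    """Index of the simulation time step in which a spike nominally arrives."""
    return arrival_us // timestep_us


# Spikes at 10 us and 990 us land in the same 1 ms step; 1000 us is the next.
print(timestep_index(10), timestep_index(990), timestep_index(1000))
```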

 

 

3) Again, regarding the timestep, I saw in other threads, that increasing it, could result in issues in real time systems. I understand that this is related to the size of the SNN. So, the way one should approach this is to test with timesteps < 1 msec and see if any warnings/errors are reported, right?

 

Reducing the time step to smaller values will definitely affect the ability for the network to perform in real-time.  As described above, the neuron processing here will be increased.  It is unlikely that 256 neurons will be able to be processed in real-time on a core with a 0.1ms time step for example, since this requires 10x the amount of processing.  This was really what the split neuron-synapse core model was designed for though, since the neuron core is then separate.  Note that there will also be a smaller knock-on effect on synapse cores too since these also do some once-per-timestep operations; this will be much less though, especially if the expected spike reception rate is to be similar regardless of the time step (though note that simulated neurons can only spike once per timestep, so clearly reducing the time step could increase the spike rate of neurons if they are already saturated, which would affect any simulated neurons to which they are connected).
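
A rough way to see why a 0.1 ms time step is a problem is to scale the neuron update cost linearly with the number of steps per second. The sketch below takes the "might take 30% of the CPU time" figure for 256 neurons at a 1 ms time step from the earlier answer and assumes the cost per update stays the same; both the linear scaling and the function name are assumptions for illustration.

```python
def neuron_update_cpu_fraction(timestep_ms, fraction_at_1ms=0.3):
    """Estimated fraction of CPU time spent on neuron state updates.

    Assumes the cost scales linearly with the number of time steps per
    second, starting from `fraction_at_1ms` at a 1 ms time step.
    """
    return fraction_at_1ms * (1.0 / timestep_ms)


for dt in (1.0, 0.5, 0.1):
    print(dt, neuron_update_cpu_fraction(dt))
```

Anything above 1.0 means the neuron updates alone would need more than the whole core, so at 0.1 ms the estimate of about 3.0 indicates that 256 neurons per core cannot run in real time without fewer neurons per core or the split neuron-synapse model.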

 

I hope that all helps you to decide what to do, but I understand I have given you a lot of information there, so feel free to ask more questions!

 

Andrew :)

 

ntouev

Mar 21, 2022, 2:44:10 PM
to Andrew Rowley, SpiNNaker Users Group
Hello,

Thanks a lot for the support. I managed to determine the maximum event rate supported by my SNN. I will study your suggestions in case the system demands a higher temporal resolution.

One more question though. Mr Cabrera informed me in a previous reply that <RouterError.PARITY: 536870912> indicates a rejected packet due to a parity error. Occasionally I see some parity errors and some <RouterError.FRAMING> ones, sometimes both. So first of all, what does a framing error mean? And secondly, should I worry about these occasional errors? I should mention that they are rare, meaning I get 0-30 of them in tens of thousands of packets overall.

Vaggelis

Andrew Rowley

Mar 28, 2022, 9:33:34 AM
to ntouev, SpiNNaker Users Group

Hi,

 

I think a framing error is a packet that doesn’t have the correct number of bytes.  In general, these sorts of errors are generated by the external side of the interface; if they are generated inside SpiNNaker it would suggest a more serious hardware error, but this is much less likely if you have been using the board otherwise without issue.  Whether they are of concern is application dependent.  It means that something isn’t quite working correctly in your setup, but if you don’t mind a few packets going missing, you could presumably ignore them.  If it is important that every packet gets through, you should try to work out what is causing the issue.

 

Andrew :)
