Hi,
If there were a fault on SpiNNaker itself, you would likely see these errors when running a normal simulation too, since that also sends packets through the routers. If you only see them with your connected device, I would suspect the device.
Note that the error is generated in software, and was created without thinking about connected devices, hence the “this could indicate a hardware fault” message; in a non-device simulation, if this happens it would likely mean something was broken on the SpiNNaker board since software on SpiNNaker can’t generate these sorts of errors. I think they can happen with multi-board simulations if the cables between boards are loose for example.
For information, the routers have an error status register, which has multiple flags and a single counter. The flags are sticky in that once set they don’t clear until read. The count tells you the total number of errors, but doesn’t indicate how many of each type. However, ERROR means that any error was detected, and OVERFLOW means that more than one error was detected before the register was read. That leaves PARITY as the only specific error type flagged, which indicates that you had 5 parity errors.
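To make the reading of the register concrete, here is a minimal Python sketch of the decoding described above. The PARITY and OVERFLOW values are the ones printed in the error report (536870912 and 1073741824). The ERROR bit position (bit 31), the 16-bit width of the counter, and the names RouterErrorFlag and decode_error_status are assumptions made for illustration; this is not the tools' own API.

```python
from enum import IntFlag


class RouterErrorFlag(IntFlag):
    """Sticky flags in the router error status register (illustrative)."""
    ERROR = 1 << 31     # assumed bit position: any error was detected
    OVERFLOW = 1 << 30  # 1073741824: more than one error before the read
    PARITY = 1 << 29    # 536870912: at least one parity error


# Assumption: the single error counter lives in the low 16 bits.
COUNT_MASK = 0xFFFF


def decode_error_status(status):
    """Split a raw status word into the flags that are set and the count."""
    flags = [flag for flag in RouterErrorFlag if status & flag.value]
    count = status & COUNT_MASK
    return flags, count


# A status word with ERROR, OVERFLOW and PARITY set and a count of 5.
example_status = (
    RouterErrorFlag.ERROR.value
    | RouterErrorFlag.OVERFLOW.value
    | RouterErrorFlag.PARITY.value
    | 5
)
flags, count = decode_error_status(example_status)
print([flag.name for flag in flags], count)
```

Since PARITY is the only specific error type among the flags, the whole count can be attributed to parity errors; with two specific types set, the count alone could not be split between them.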
Hope that helps,
Andrew :)
--
You received this message because you are subscribed to the Google Groups "SpiNNaker Users Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
spinnakeruser...@googlegroups.com.
To view this discussion on the web, visit
https://groups.google.com/d/msgid/spinnakerusers/e8640fe2-ed8a-47c9-b830-4559a3271de4n%40googlegroups.com.
<RouterError.PARITY: 536870912>
<RouterError.OVERFLOW: 1073741824>
Hi,
Things that can hold up the delivery of packets include whatever is receiving them at the other end. If a core is failing to pull packets from the router, the router can be held up while this happens. There is a queue in the router of course, but if this becomes full due to waiting for downstream routers and cores, this can end up backing up over the whole fabric of the network.
If you can do a run of the network for a fixed time, you should get some reports from the cores if they couldn’t keep up with the spikes. If you are sending only 1000 packets per second, I would have expected them to keep up, but I can’t be sure!
Andrew :)
Hi,
I should add that the peak instantaneous rate is likely to be important here. If you try to send 1000 packets very quickly, then wait until the next second and do this again, the receiver might not be able to pull the packets quickly enough so some will get lost.
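As a rough illustration of why the peak rate matters, here is a toy Python model of a receiver with a finite queue. It is a sketch only: the queue capacity, the drain rate, the tick length and the function name count_dropped are all made-up values for illustration, and the real router holds packets up before dropping them rather than discarding them immediately as this model does.

```python
def count_dropped(arrivals_per_tick, queue_capacity, drained_per_tick):
    """Count packets lost when arrivals overflow a finite receive queue."""
    queued = 0
    dropped = 0
    for arriving in arrivals_per_tick:
        queued += arriving
        if queued > queue_capacity:
            # Anything that does not fit in the queue is lost.
            dropped += queued - queue_capacity
            queued = queue_capacity
        # The receiver pulls a fixed number of packets each tick.
        queued = max(0, queued - drained_per_tick)
    return dropped


TICKS_PER_SECOND = 1000  # hypothetical: one tick per millisecond

# Same average rate (1000 packets per second), different peak rate.
evenly_spread = [1] * TICKS_PER_SECOND
single_burst = [1000] + [0] * (TICKS_PER_SECOND - 1)

print(count_dropped(evenly_spread, queue_capacity=256, drained_per_tick=2))
print(count_dropped(single_burst, queue_capacity=256, drained_per_tick=2))
```

With these made-up numbers the evenly spread stream loses nothing, while the burst overflows the queue even though both send 1000 packets in the second.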
Andrew :)
Hi,
1) Is there any hard limit on the injected spikes? If not, could there be an application specific limit close to the frequencies I discuss above? Or am I missing anything?
The hardware limit for transmitting on a SpiNNaker link is ~600 million packets per second. The application running on a core will affect the actual limit on the number of packets, as the packets need to go somewhere of course. A core runs at 200 MHz and takes ~17 clock cycles to receive a packet, so the absolute limit for a single core would be around 11.8 million packets per second (200 MHz / 17). That assumes that the core only stores the packets, but of course it also needs to process them. Unfortunately it is much harder to work out the limits for this, as it depends on what happens to each spike in the simulation. Each spike will likely require a DMA of a synaptic row from SDRAM, and the size of that row (i.e. the number of synapses it contains) will then dictate the processing duration for that spike. In the current default operation of the master branch, the cores also execute the neuron state updates, the cost of which depends on how many neurons are running on each core. With e.g. 256 neurons on a core (the default), the state update might take 30% of the CPU time, leaving 70% for spike reception and synaptic processing.
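The arithmetic above can be sketched in a few lines of Python. The clock rate, the ~17 cycles per packet and the 70% figure come from the paragraph above; the function names are hypothetical, and the 70% share is only the example value for 256 neurons, not a measured number.

```python
def max_receive_rate(clock_hz, cycles_per_packet):
    """Upper bound on packets per second if a core only stores packets."""
    return clock_hz / cycles_per_packet


def cycle_budget_per_spike(clock_hz, cpu_fraction_for_spikes, spikes_per_second):
    """Average clock cycles available to receive and process each spike."""
    return clock_hz * cpu_fraction_for_spikes / spikes_per_second


CLOCK_HZ = 200_000_000  # 200 MHz core clock

# Roughly 11.76 million packets per second for reception alone.
print(max_receive_rate(CLOCK_HZ, 17))

# Cycle budget per spike with 70% of the CPU left for spike handling.
print(cycle_budget_per_spike(CLOCK_HZ, 0.7, 100_000))
print(cycle_budget_per_spike(CLOCK_HZ, 0.7, 125_000))
```

Under these assumptions the budget is about 1400 cycles per spike at 100 kHz and about 1120 at 125 kHz, and that budget has to cover the ~17 cycles of reception, the DMA of the synaptic row and the processing of every synapse in it.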
If you are finding that ~100 kHz still allows everything to run but 125 kHz doesn’t, it suggests that this is about the limit that the current code can cope with in terms of incoming spikes. It is also worth noting that, in the current code, every spike is likely being received by every core used to simulate the Population targeted by the Projection from the FPGA, meaning that the limit for one core is the limit for all.
The above assumes the basic model we have been using for some time, but it is worth noting that we now have two things available that could help with the situation to a certain extent, though there are some potential shortfalls for this exact situation, which are discussed below:
Examples of this in operation can be seen in the PyNN Examples e.g.:
This example shows two different Populations being created with “splitters”, one with 2 synapse cores per neuron and one with 3 (for no particular reason other than to show the example in this case).
To use these branches, you would need to checkout the git master software using these instructions:
https://spinnakermanchester.github.io/development/gitinstall.html
You would then need to change the following modules to use the “extdev_fpgas” branch:
SpiNNMachine, PACMAN, SpiNNFrontEndCommon, sPyNNaker and JavaSpiNNaker (if you have this, if not don’t worry).
You will then need to rebuild the application binaries by running:
SupportScripts/automatic_make.sh
We have then been concentrating on a specific retina device, which is then described here:
This is specific in that it is connected to an FPGA on a 48-node board, which means that it uses the Application2DFPGAVertex, whereas you would likely need to try the Application2DSpiNNakerLinkVertex. This is clearly less tested, though I am happy to try to fix any errors you find! Both take a width, height, sub_width and sub_height. The latter two items tell the code to split the input into several smaller squares or rectangles, which then affects what each receiver will have to receive.
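To give a feel for what sub_width and sub_height do, here is a small Python sketch of the splitting idea. It does not call the real vertex classes; the function names and the 640 x 480 example size are hypothetical, and the sketch does not reproduce any constraints the real vertices may place on these values.

```python
import math


def count_sub_rectangles(width, height, sub_width, sub_height):
    """Number of sub-rectangles needed to tile a width x height input."""
    return math.ceil(width / sub_width) * math.ceil(height / sub_height)


def sub_rectangle_of(x, y, sub_width, sub_height):
    """(column, row) index of the sub-rectangle containing pixel (x, y)."""
    return (x // sub_width, y // sub_height)


# Hypothetical 640 x 480 retina split into 32 x 16 pixel blocks.
print(count_sub_rectangles(640, 480, 32, 16))
print(sub_rectangle_of(100, 50, 32, 16))
```

Smaller sub-rectangles mean more pieces, each covering fewer pixels, so each receiver only has to handle the events that fall inside its own piece.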
2) About the timestep, does working on a 1 msec timestep mean, that spikes coming from a dvs "lose" their microsecond resolution? In other words, a spike arriving at 10 usec and one that arrives at 990 usec, are processed by the first population the same time?
The spikes will only lose their resolution in the sense that the target Population will process the received spikes within the same time step and so they will only affect the neuron membrane as a group rather than individually. The spikes are received by the Population essentially instantaneously, and are then put in a queue from which they are processed. The processing will start as soon as the first spike is received at 10 usec; it doesn’t wait until the end of the time step to do this. The spike received at 990 usec might end up being processed in the next time step, so in that sense the effect will be separate from the first.
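As a small sketch of the grouping, the time step in which a spike nominally arrives can be computed like this in Python. The function name is hypothetical, and as noted above a spike arriving late in a step may in practice be processed in the following step, which this simple calculation does not model.

```python
def nominal_time_step(arrival_us, time_step_us=1000):
    """Index of the simulation time step in which a spike arrives."""
    return arrival_us // time_step_us


# Spikes at 10 usec and 990 usec both arrive within the first 1 msec step.
print(nominal_time_step(10), nominal_time_step(990))

# A spike at 1010 usec arrives in the next step.
print(nominal_time_step(1010))
```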
3) Again, regarding the timestep, I saw in other threads, that increasing it, could result in issues in real time systems. I understand that this is related to the size of the SNN. So, the way one should approach this is to test with timesteps < 1 msec and see if any warnings/errors are reported, right?
Reducing the time step to smaller values will definitely affect the ability of the network to perform in real time. As described above, the neuron processing will increase. It is unlikely that 256 neurons could be processed in real time on a core with a 0.1 ms time step, for example, since this requires 10x the amount of processing. This was really what the split neuron-synapse core model was designed for though, since the neuron core is then separate. Note that there will also be a smaller knock-on effect on synapse cores, since these also do some once-per-timestep operations; this will be much less though, especially if the expected spike reception rate is similar regardless of the time step (though note that simulated neurons can only spike once per time step, so reducing the time step could increase the spike rate of neurons if they are already saturated, which would affect any simulated neurons to which they are connected).
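The 10x figure can be put next to the earlier example of a state update taking around 30% of the CPU at a 1 ms time step. The Python sketch below assumes the neuron update cost scales linearly with the number of time steps per second, which is a simplification; the function name is hypothetical and the 30% figure is only the example value used above.

```python
def neuron_update_cpu_fraction(fraction_at_reference, reference_time_step_ms,
                               time_step_ms):
    """Scale the CPU share of neuron updates to a different time step."""
    return fraction_at_reference * (reference_time_step_ms / time_step_ms)


# 30% of the CPU at 1 ms becomes roughly 300% at 0.1 ms.
needed = neuron_update_cpu_fraction(0.3, 1.0, 0.1)
print(needed, "real-time feasible on one core:", needed <= 1.0)
```

A required share above 1.0 means a single core cannot keep up in real time, which is the case here before any spike processing is even counted.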
I hope that all helps you to decide what to do, but I understand I have given you a lot of information there, so feel free to ask more questions!
Andrew :)
Hi,
I think a framing error is a packet that doesn’t have the correct number of bytes. In general, these sorts of errors are generated by the external side of the interface; if they are generated inside SpiNNaker it would suggest a more serious hardware error, but this is much less likely if you have been using the board otherwise without issue. Whether they are of concern is application dependent. It means that something isn’t quite working correctly in your setup, but if you don’t mind a few packets going missing, you could presumably ignore them. If it is important that every packet gets through, you should try to work out what is causing the issue.
Andrew :)