SpiNNaker Scaling Studies?


Bernhard Vogginger

23 Nov 2022, 2:55:30 PM
to SpiNNaker Users Group, Leon Peto

Dear SpiNNaker team and community,

I was wondering whether scaling studies as described below are available for the SpiNNaker machine:

When scaling up a task, like increasing the size of an SNN, how do execution time and power/energy consumption behave?

Of course, for the case of a real-time SNN simulation, I would expect that the actual network simulation time is constant, given a fixed duration. However, I would also expect that with increasing network size the time for data generation and loading increases. At the same time, I expect that more 48-node boards are required and thus the power consumption goes up.
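
To make this concrete, below is a rough sketch of the kind of weak-scaling measurement I have in mind, using sPyNNaker's PyNN interface. The network parameters and sizes are just placeholders, and the wall-clock split is only approximate since much of the mapping and loading happens inside run():

# Rough weak-scaling probe (not an official benchmark); parameters are placeholders.
import time
import pyNN.spiNNaker as sim

def run_once(n_neurons, sim_time_ms=1000.0):
    start = time.time()
    sim.setup(timestep=1.0)
    pop = sim.Population(n_neurons, sim.IF_curr_exp(), label="excitatory")
    noise = sim.Population(n_neurons, sim.SpikeSourcePoisson(rate=10.0))
    sim.Projection(noise, pop, sim.FixedProbabilityConnector(0.1),
                   synapse_type=sim.StaticSynapse(weight=0.5, delay=1.0))
    sim.run(sim_time_ms)   # the simulation itself runs in real time
    sim.end()
    return time.time() - start

# Scale the network; the run phase should stay ~1 s of biological time,
# so any growth in wall-clock time comes from generation/mapping/loading.
for n in (1000, 10000, 100000):
    print(n, "neurons:", round(run_once(n), 1), "s wall clock")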

Are there any papers reporting such scaling studies? Ideally going beyond single boards ...

The only works I'm aware of are the Neuromorphic Sampling papers by [Mendat et al. 2016, 2019], which scale the network size at a fixed hardware size.


In the same way, I'd be interested in the case with a fixed problem size. Are there studies where -- when increasing the number of cores or SpiNNaker chips -- the task can be accelerated?
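
And for the fixed-problem-size case, something like the following is what I mean, using sPyNNaker's option to change how many neurons are placed on each core. Again, all numbers are just placeholders; with a real-time simulation the run itself cannot get faster, so the interesting quantities are the mapping/loading time and whether the machine keeps up:

# Rough strong-scaling probe: fixed network, vary neurons per core (placeholders).
import time
import pyNN.spiNNaker as sim

N = 20000   # fixed problem size

for neurons_per_core in (256, 128, 64, 32):
    start = time.time()
    sim.setup(timestep=1.0)
    # Fewer neurons per core -> the same population is spread over more cores.
    sim.set_number_of_neurons_per_core(sim.IF_curr_exp, neurons_per_core)
    pop = sim.Population(N, sim.IF_curr_exp())
    stim = sim.Population(N, sim.SpikeSourcePoisson(rate=10.0))
    sim.Projection(stim, pop, sim.OneToOneConnector(),
                   synapse_type=sim.StaticSynapse(weight=2.0, delay=1.0))
    sim.run(1000.0)
    sim.end()
    print(neurons_per_core, "neurons/core:", round(time.time() - start, 1), "s wall clock")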


I'd be really happy to know whether such studies exist and have a look at the results. Any reply is helpful!


Many thanks in advance and kind regards,

Bernhard


References:

-- 
Dipl.-Phys. Bernhard Vogginger

Technische Universität Dresden
Faculty of Electrical and Computer Engineering
Institute of Principles of Electrical and Electronic Engineering
Chair of Highly-Parallel VLSI-Systems and Neuro-Microelectronics

D-01062 Dresden, Germany 

---------------------------------------------------------
E-mail:     Bernhard....@tu-dresden.de
Tel.:       +49 (351) 463-34372
Mobile:     +49 176 38097278
Fax.:       +49 (351) 463-37794
WWW.:       https://tu-dresden.de/ing/elektrotechnik/iee/hpsn

Andrew Gait

24 Nov 2022, 12:02:57 PM
to Bernhard Vogginger, SpiNNaker Users Group, Leon Peto
Hi Bernhard,


You might be interested in these papers, https://www.frontiersin.org/articles/10.3389/fnins.2016.00420/full and https://royalsocietypublishing.org/doi/10.1098/rsta.2019.0160, at least with regard to your second question on fixed problem sizes and increasing usage of chips/cores. We now have a similar implementation running in the main toolchain, but we don't yet have any papers specifically using or describing it, as far as I am aware.


I'm not aware of any particular scaling studies across multiple boards - in general we have been more interested in making models run in real time - but someone else can jump in and say otherwise. I have worked on our own implementation of Markov Chain Monte Carlo methods (as yet unpublished - other things have taken priority in recent years), and I can say that there's definitely much more time required when running jobs over hundreds of boards on SpiNNaker simultaneously, even without the added complication of using an SNN. I don't really have a feel for how this scales in particular, and I haven't tried it much recently; since we have been doing some relatively intensive work on reducing the overheads of data generation and loading, it would likely be quite different now from how it was when I previously ran these tests. If there is any interest then I could possibly run some tests over the next few weeks and report back...


Andy





Andrew Rowley

25 Nov 2022, 4:52:31 AM
to Andrew Gait, Bernhard Vogginger, SpiNNaker Users Group, Leon Peto

Hi,

 

Indeed, the papers pointed to are the best that I know about in terms of power measurement and scaling.  We have tried to make sure the software isn't too bad on this front, but there is definitely more that would need to be done to claim this complete.

Even in terms of a running network, there can be issues related to the number of inputs that any group of neurons might receive; I say "group of neurons" because the current software supports this by having a configurable number of synapse cores (up to the limit of the cores available on a chip, since they communicate via SDRAM).  It is easy to think about scaling in terms of the number of neurons in the network, but many of the networks we encounter more likely scale with the number of incoming spikes per group of neurons, and this can make them harder to realise on the hardware (currently this is stated without much proof; it is more of a feeling based on how the software works on the hardware).

This is at least in part due to how the current routing works: the spikes from every source group-of-cores in a population reach every target group-of-cores, so even very sparse connections can end up adding a very large number of spikes, only for some of those to be useless on the receiving side (which still has to do the work of receiving them and throwing them away).  If we could reduce the number of spikes in such sparse connections, they might become easier to handle.
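
As a back-of-envelope illustration of this effect (the numbers here are purely made up, not measurements):

# Every spike from the source population reaches every target core; under
# sparse connectivity only a fraction of them matter to any given core.
n_source = 10000        # neurons in the source population (made up)
rate_hz = 10.0          # mean firing rate (made up)
p_connect = 0.001       # connection probability (made up, "very sparse")
neurons_per_core = 64   # post-synaptic neurons handled by one core (made up)

received_per_s = n_source * rate_hz   # all source spikes arrive at the core
# A spike only matters if its source neuron connects to at least one of
# this core's neurons:
p_useful = 1.0 - (1.0 - p_connect) ** neurons_per_core
print("received:", received_per_s, "/s, useful:",
      round(received_per_s * p_useful), "/s,",
      round(100 * (1 - p_useful)), "% thrown away")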

 

In terms of the scaling of mapping, this is probably much worse at present than the theoretical best, as we still do many things on the host computer, like placement and routing.  These have been greatly improved recently but are still likely a bottleneck.  For major improvements in this area, we would need to look more at using SpiNNaker itself to do the mapping directly, e.g. passing the problem to one part of SpiNNaker and letting it divide and conquer over the machine.  The recent improvements in placement and routing were done with that in mind, so it might not be too much effort to move in that direction, but we haven't done so yet.

 

One area where we already do better is data generation and expansion on the machine.  The generation has recently been improved to scale more with the number of populations in the network than with the number of cores, though loading of the data is still done core-by-core.  The population-level data is generated once and then reused on each core; this includes the descriptions of the connectivity, which are then expanded on the machine.  An improvement here, similar to the mapping improvement above, would be to load the data for each population once and then have the machine pass it around the cores of the population.
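
In rough pseudocode, the idea is something like the following (this is not the actual data format or toolchain code, just an illustration of generating one compact, population-level descriptor and expanding it per core):

import numpy as np

# Host side: one small descriptor per projection, generated once.
descriptor = {"p_connect": 0.05, "weight": 0.5, "delay_ms": 1.0,
              "seed": 12345, "n_pre": 10000}

def expand_on_core(desc, post_lo, post_hi):
    # What one core would do for its slice [post_lo, post_hi) of neurons,
    # deterministically from the shared seed.
    rng = np.random.default_rng((desc["seed"], post_lo))
    rows = []
    for post in range(post_lo, post_hi):
        pre = np.flatnonzero(rng.random(desc["n_pre"]) < desc["p_connect"])
        rows.append((post, pre, desc["weight"], desc["delay_ms"]))
    return rows

# Two cores expand different slices of the same small descriptor:
core_a = expand_on_core(descriptor, 0, 64)
core_b = expand_on_core(descriptor, 64, 128)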

 

I hope that helps with some thoughts on this!

 

Andrew :)

Bernhard Vogginger

28 Nov 2022, 4:54:24 PM
to Andrew Rowley, Andrew Gait, SpiNNaker Users Group, Leon Peto

Dear Andy and Andrew,

Many thanks for your detailed replies!

Thanks for pointing again to the "heterogeneous parallelization", which really seems to be very effective for making the simulation faster and more scalable while maintaining the real-time constraint. I'm also happy to read that this is now in the main toolchain.

Thanks, Andrew, also for sharing your view on how the mapping and data generation scale and how you are working to improve that.

Thanks, Andy, for offering to run some tests with the latest software for the MCMC. While I'm very interested in looking at the results, I don't want to create additional workload for you ;-) There is no hurry or specific need for such results.

In general, I still think performing scaling studies as I sketched in my e-mail could be very helpful to prove one of SpiNNaker's unique selling points: its scalability at real-time operation. We'll also take this into consideration for SpiNNaker2.

Of course, this requires the availability of scalable benchmarks ...

Thanks again for sharing your thoughts, it was very helpful!

best, Bernhard
