Matlab is faster?


Michael Rule

Jul 2, 2010, 12:58:15 PM7/2/10
to ahh-d...@googlegroups.com
I'm trying to benchmark against Izhikevich's demo script
http://www.izhikevich.org/publications/spikes.htm

and the MATLAB script is consistently beating SpikeStream.
Here is my SS script:

from SpikeStream import *
import numpy

numpy.random.seed(9876413)
magnitude = 1000
Ne, Ni = 8 * magnitude, 2 * magnitude   # 80% excitatory, 20% inhibitory
re, ri = numpy.random.random_sample(Ne), numpy.random.random_sample(Ni)
# Izhikevich (2003) heterogeneous parameters: RS-like excitatory, FS-like inhibitory
a_data = numpy.array([0.02]*Ne + [0.02 + 0.08*x for x in ri], numpy.float32)
b_data = numpy.array([0.2]*Ne + [0.25 - 0.05*x for x in ri], numpy.float32)
c_data = numpy.array([-65 + 15*x**2 for x in re] + [-65]*Ni, numpy.float32)
d_data = numpy.array([8 - 6*x**2 for x in re] + [2]*Ni, numpy.float32)
v_data = numpy.array([-10]*(Ne + Ni), numpy.float32)
u_data = numpy.array([-65*x for x in b_data], numpy.float32)
sim = newSim(1, 1, False, 0.5)
izh = Model("izh")
izh.newVar('a', '', lambda n: a_data)
izh.newVar('b', '', lambda n: b_data)
izh.newVar('c', '', lambda n: c_data)
izh.newVar('d', '', lambda n: d_data)
izh.newVar('u', deqn('u', 'a*(b*v-u)'), lambda n: u_data)
izh.newVar('v', deqn('v', '0.04*v*v+5*v+140-u+input_current'), lambda n: v_data)
izh.spike_condition = 'v>=3'   # NB: Izhikevich's model fires at 30 mV
izh.reset = seqn('v', 'c') + seqn('u', 'u+d')
izh.newInput(CustomCurrent("input_current+=20*RANDN;"))
c = sim.newModel(Ne + Ni, izh)
e = ExponentialSynapse('e', 5, -10, 0.01)
i = ExponentialSynapse('i', 5, -70, 0.01)
izh.synapses.extend([e, i])
izh.target_function = SubsetTargets(sim, izh, [((0, Ne), e), ((Ne, Ne + Ni), i)])
sim.joinModelsRandomly(izh, izh, 0.5)   # 50% random connectivity
spikes = AllSpikesProbe(sim, izh)
sim.prepare_and_run(20000)
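For reference, the MATLAB demo being benchmarked can be sketched in plain NumPy (a hedged translation of Izhikevich's published script at the URL above; the 30 mV threshold, the two 0.5 ms half-steps, and the random thalamic input all come from that script, while the variable names follow his paper):

```python
import numpy as np

np.random.seed(9876413)
Ne, Ni = 800, 200                      # excitatory / inhibitory counts
re, ri = np.random.rand(Ne), np.random.rand(Ni)
a = np.concatenate([0.02 * np.ones(Ne), 0.02 + 0.08 * ri])
b = np.concatenate([0.2 * np.ones(Ne), 0.25 - 0.05 * ri])
c = np.concatenate([-65 + 15 * re**2, -65 * np.ones(Ni)])
d = np.concatenate([8 - 6 * re**2, 2 * np.ones(Ni)])
# all-to-all weights: excitatory uniform in [0, 0.5], inhibitory uniform in [-1, 0]
S = np.hstack([0.5 * np.random.rand(Ne + Ni, Ne),
               -np.random.rand(Ne + Ni, Ni)])
v = -65.0 * np.ones(Ne + Ni)
u = b * v
firings = []                           # (time, neuron) pairs
for t in range(1000):                  # 1000 ms of simulation, 1 ms steps
    I = np.concatenate([5 * np.random.randn(Ne), 2 * np.random.randn(Ni)])
    fired = np.nonzero(v >= 30)[0]     # Izhikevich's spike threshold is 30 mV
    firings.extend((t, n) for n in fired)
    v[fired] = c[fired]                # reset v to c, bump u by d
    u[fired] += d[fired]
    I += S[:, fired].sum(axis=1)       # deliver spikes from fired neurons
    # two 0.5 ms half-steps for numerical stability, as in the demo script
    v += 0.5 * (0.04 * v**2 + 5 * v + 140 - u + I)
    v += 0.5 * (0.04 * v**2 + 5 * v + 140 - u + I)
    u += a * (b * v - u)
```

This is the 1000-neuron case; the benchmark scales `Ne`/`Ni` up by `magnitude`, which is where the dense `S` matrix eventually exhausts memory.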


and here are the tentative benchmarks

units    100     1000   10000   100000
MATLAB   0.004s  0.12s  106s    out of memory
SS       1.54s   42s    339s    fails to build connectivity matrix


So,

that's really... good... news. I can't get on CMU's VPN at the moment, so if anyone wants to e-mail me the Brette et al. paper so I can check whether SS still beats that, that would be good.

--mrule

Michael Rule

Jul 2, 2010, 1:08:49 PM7/2/10
to ahh-d...@googlegroups.com
See my spike condition... it's v>=3 (mV), where Izhikevich's model fires at 30 mV,
so I'm getting far too many spikes.

so, no panic yet.

Rick Gerkin

Jul 6, 2010, 3:32:19 PM7/6/10
to ahh-d...@googlegroups.com
update?

--
You received this message because you are subscribed to the Google Groups "ahh-discuss" group.
To post to this group, send email to ahh-d...@googlegroups.com.
To unsubscribe from this group, send email to ahh-discuss...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/ahh-discuss?hl=en.



--
Richard C. Gerkin
Postdoctoral Fellow, Dept. of Biological Sciences
Carnegie Mellon University

Michael Rule

Jul 6, 2010, 3:40:06 PM7/6/10
to ahh-d...@googlegroups.com
just got back from camping.

MATLAB is not faster if the spike rate is low,

but it is hard (maybe impossible) to exactly duplicate the Izh model in SS/AHH
without writing a new algorithm for spike transfer.

The CMU web VPN is down, so I can't access the Brette et al. paper. The last time
Cyrus ran that benchmark, SS was faster. I probably don't need to
actually see the paper; I can just grab the old benchmark and tweak it for
any API changes.

Michael Rule

Jul 7, 2010, 2:12:52 AM7/7/10
to ahh-d...@googlegroups.com
http://dl.dropbox.com/u/4345112/bdata.ods

Benchmark data comparing against Izhikevich's demo model.
-- SS uses 50% connectivity, but Izh uses all-to-all connectivity with weights uniform in [0,1]
-- I didn't scale total synaptic input; I just realized this is a problem. Scaling up either model quickly makes them spike far too much (way... way too much... 1000 Hz?). BUT... at least the SS and MATLAB behaviors are similar.

I guess, TODO:
-- redo this with scaling to preserve realistic rates
-- redo the Brette et al. benchmark
-- benchmark by scaling copies/replicas rather than net size

--mrule

Rick Gerkin

Jul 7, 2010, 10:35:54 AM7/7/10
to ahh-d...@googlegroups.com
Looking at the slopes of time vs. size, the asymptotic speedup is about 6.57x (8.53x GPU only), although it looks like maybe MATLAB failed for the larger sims? Now, what exactly was the comparison? On what machine was each running, and was MATLAB using Jacket, or .mex files, or anything like that?

I think with similar numbers of spikes the comparison is fair, but it might be good to check in one of the smaller sims how these spikes are distributed. For example, the more they are concentrated in a few neurons, the greater the bottleneck, right?

Whatever the speedup for medium-size single instances, if MATLAB chokes on large networks and doesn't parallelize replicas, then the two main selling points are still there. Comparing one instance of a small/medium network is really the least generous comparison from SS's perspective, and you still get 7x, although I'm not sure what MATLAB optimizations could have been taken and how easy they are to implement.

Rick   

Cyrus Omar

Jul 7, 2010, 11:08:36 AM7/7/10
to ahh-d...@googlegroups.com
It's pretty shifty to compare performance on two different networks. A
fairer comparison would be the Brette et al., 2007 model. They have
code available for that on ModelDB for a variety of simulators,
including Scilab (which should work directly in MATLAB or with very
few modifications) and Brian. I compared unoptimized Brian and
SpikeStream and saw a 2.5x speedup on a single realization
containing 4000 neurons. It would probably even out a bit more if I
figured out how to enable Brian's C code generation.

However, there is no way to speed up multiple realizations, and Brian
was 70x slower at 1000 realizations. This should be equivalent to a
4-million-neuron network controlled for rate. So I don't think we are in
any danger as long as you are working on sufficiently large problems.

Cyrus


Michael Rule

Jul 7, 2010, 11:45:44 AM7/7/10
to ahh-d...@googlegroups.com
Both on tesla.
Plain MATLAB, no .mex or Jacket.
MATLAB ran out of memory for 1000 units. It might have been able to pull off 900, but I didn't have the patience to wait for it.

If you look at the number of spikes, both the MATLAB and SS networks are behaving pathologically, with essentially a spike every other timestep for all neurons. So the spikes are similarly distributed.

One problem, though: for the smaller networks, oscillations are more synchronous in the SS simulation, so the probability of spike collisions is higher.

I should note that achieving a speedup only in terms of being able to run several networks at once really isn't an achievement, since this can be done on a cluster with a simple bash script, and frankly, for MATLAB users this is probably the easier way to go.
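The "simple bash script" really is about this simple (a hypothetical sketch; `run_one` is a placeholder standing in for whatever actually invokes the MATLAB or SS simulation with a given seed):

```shell
#!/bin/sh
# Launch independent realizations in parallel; each gets its own seed.
# run_one is a placeholder for the real MATLAB/SS invocation, e.g.
#   matlab -nodisplay -r "seed=$1; run_net; exit"
run_one() {
    echo "realization with seed $1 finished"
}

for seed in 1 2 3 4; do
    run_one "$seed" &     # background each realization
done
wait                      # block until all realizations complete
```

On a cluster you would replace the backgrounding with your scheduler's submit command, but the embarrassingly-parallel structure is the same.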

Cyrus Omar

Jul 7, 2010, 11:47:38 AM7/7/10
to ahh-d...@googlegroups.com
The best you'll get on a desktop, though, is a speedup of 8x, or however many cores you have.

Michael Rule

Jul 7, 2010, 11:49:38 AM7/7/10
to ahh-d...@googlegroups.com
Yeah, but we have that new cluster,
and theoretically you have hundreds of machines over at linux.andrew.cmu.edu,
though it might be some sort of policy violation to use them in this manner.

Cyrus Omar

Jul 7, 2010, 11:50:54 AM7/7/10
to ahh-d...@googlegroups.com
Well, yes, I agree running on a cluster is going to be faster than a single GPU, but if that's the comparison that has to be made, I think the GPU has already won, eh?

Rick Gerkin

Jul 7, 2010, 11:55:12 AM7/7/10
to ahh-d...@googlegroups.com
> if you look at the number of spikes, both the matlab and SS networks are behaving pathologically, with essentially a spike every other timestep for all neurons. so, the spikes are similarly distributed.

The vanilla Izhikevich MATLAB script behaves like this? Are you sure? That seems like a dumb model to put on your website, then.
 
> I should note that achieving a speedup only in terms of being able to run several networks at once really isn't an achievement, since this can be done on a cluster with a simple bash script and frankly for Matlab users this probably the easier way to go.

Plus, price of nvidia card(s) << price of cluster and admin hassles of cards << admin hassles of cluster (getting CUDA to work on any given machine notwithstanding).  

Last night I had a dream that Nathan had a computer laying around where the GPU was fused to the CPU (not like it is in real projects, but like it would be in a stupid dream).  I was telling him we could use this machine to increase the speed and volume of our probes written back to main memory.  Given that you guys think about this stuff 10x as much as me, I can only imagine what kind of dreams you've been having.  




Cyrus Omar

Jul 7, 2010, 11:57:40 AM7/7/10
to ahh-d...@googlegroups.com

Haha, I think I've had an odd dream or two, but mostly my dreams are more insane.

But perhaps your dream was prescient; this was made public very recently: www.mellanox.com/pdf/whitepapers/TB_GPU_Direct.pdf

Michael Rule

Jul 7, 2010, 12:32:24 PM7/7/10
to ahh-d...@googlegroups.com
The Izh model only behaves like that if you scale it up without normalizing total synaptic input (divide by sqrt(network size) or something)? The default model just oscillates.

Yeah, increasing memory bandwidth is a major open area of hardware research.




Rick Gerkin

Jul 7, 2010, 2:16:54 PM7/7/10
to ahh-d...@googlegroups.com
Scaling synaptic amplitude by N^a with constant connection probability P has the following problem.  Suppose you intend to keep the mean firing rate R constant for all network sizes and across all time.  The mean number of inputs is N*P and so the mean synaptic input current per neuron is N*P*R*N^a ~ N^(1+a) and the variance of the synaptic input current per neuron is N*P*(1-P)*N^2a = (P-P^2)*N^(1+2a).  So there is no value of a (e.g. -1/2 as you suggested) for arbitrary P that keeps both the mean and the variance independent of N, which is probably what you need in order to keep R independent of N.  However, for the special case of all-to-all connectivity (P=1, so P-P^2=0), you can get away with a=-1 (scaled synaptic amplitude to N^-1).  

Suppose you scale synaptic amplitude by N^a and connection probability by N^b, so that P=P_0 now corresponds to the connection probability when N=1.  Then mean = N*P*N^b*R*N^a ~ N^(1+a+b) and variance = N*P*N^b*(1-P*N^b)*N^2a ~ P*N^(1+2a+b) - P^2*N^(1+2a+2b).  We need a + b = -1 to make the mean independent of N, and so variance ~ P*N^a - P^2*N^-1.  Again, you are left with only P = 1 and a=-1 to achieve independence of N, which means b=0 so scaling connection probability is not helpful.  

So try scaling amplitude by N^-1, not N^-1/2 as you proposed, and use all-to-all connectivity.
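That argument is easy to check numerically (my own sketch, not from the thread): draw Bernoulli(P) connections onto one neuron, give each an amplitude of N^a, hold presynaptic rates at 1, and watch how the mean and variance of the summed input move with N.

```python
import numpy as np

rng = np.random.default_rng(0)

def input_stats(N, P, a, trials=5000):
    # Each of N potential inputs connects with probability P and contributes
    # amplitude N**a per spike; presynaptic rate is held fixed at 1.
    conn = rng.random((trials, N)) < P           # Bernoulli connectivity
    current = conn.sum(axis=1) * float(N) ** a   # summed input per trial
    return current.mean(), current.var()

for N in (100, 400, 1600):
    m_half, v_half = input_stats(N, 0.5, -0.5)   # a = -1/2: variance flat, mean grows ~ sqrt(N)
    m_one,  v_one  = input_stats(N, 0.5, -1.0)   # a = -1:   mean flat, variance shrinks ~ 1/N
    print(N, m_half, v_half, m_one, v_one)
```

With P = 0.5 neither exponent fixes both moments at once, exactly as derived above; only the degenerate P = 1 case (where the connection-count variance P - P^2 vanishes) lets a = -1 hold both constant.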

Rick   



Michael Rule

Jul 7, 2010, 5:10:25 PM7/7/10
to ahh-d...@googlegroups.com
This is getting somewhat awkward, as the Izh model is all-to-all with weights uniform in [0,1], while the SS model has 50% connectivity with uniform weights manually tuned so that the population rate tracks the rate of a same-sized Izh model.

I'll look into the Brette et al. benchmarks, since those seem to raise fewer questions.

Michael Rule

Jul 7, 2010, 5:48:40 PM7/7/10
to ahh-d...@googlegroups.com
OK, well, I tweaked the old Brette et al. benchmark to work with the API changes, but haven't run a benchmark yet.

Cyrus says AHH works now, so maybe I can stop working on this?

--mrule

Cyrus Omar

Jul 7, 2010, 8:07:10 PM7/7/10
to ahh-d...@googlegroups.com
The latest commit should have a working Brette et al. benchmark for AHH. The OpenCL is a bit slower for various reasons; at some point, probably soon, I'll do a CUDA backend. But we can benchmark against the cl.egans code now if we want. As of the latest commit, cl.egans should be a proper superset of SpikeStream; it is not yet fully documented or tested, though. Give me another day for that.

Oh, the random number generator is also not implemented, but that shouldn't take long; I'll do it tomorrow.

Cyrus