Spatial vs. Temporal whitening

Stephen Whitmarsh

Aug 30, 2018, 5:53:24 AM

to SpyKING CIRCUS

Dear Pierre, and others,

First of all, these are naive beginner-in-spike-sorting questions, and I know that my linear algebra is also not at the appropriate level.

However, I am still trying to understand whitening in general, and its use in spike sorting / SC specifically. I found this basic explanation of whitening helpful, and understand that whitening reduces correlation between dimensions/variables. And since one can base it on data where there should not be any correlation, this removes spurious rather than actual correlation. So far, so good. However, I don't understand the following:

1) Assuming the correlation (covariance) can only be calculated over both time and space (electrodes), I don't understand why there are two options (temporal and spatial) in the [whitening] section of the params file. I understand the need for decorrelating electrodes (space), but what does the temporal whitening option mean?

2) To calculate the covariance, how much data is used, and is it calculated equally over the whole time period? That seems important, since the covariance might not be equal over time (which would be a problem if it changes too much), right?

3) Does the [whitening] chunk_size setting specify the total or single size of the data used for calculating the covariance? It makes sense to have a minimum stretch of data without signal (I understand that the covariance is calculated on 'silent data'), but theoretically (and empirically), I do not expect my data/neurons to be silent for such long stretches, nor to have artifacts that last that long. Secondly, how is this 'silence' defined? Is it based on whether data (spikes) can be fitted, or just on whether or not the spike threshold is exceeded? In general, how will artifacts influence the whitening, since they are part of the spurious correlation?

4) I don't understand why the max_elts and nb_elts are included under [whitening] - shouldn't that be under [clustering]? (I tried placing them under [clustering], but that gives an error, so SC expects them there).

5) The output_dim is set by default to 10 (I think). I can imagine the appropriate setting is very dependent on the structure of the data. In my case, with only a couple of electrodes (1-8), would this setting make sense? The option to use % explained variance instead might be more consistent across numbers of electrodes, but what would be an appropriate setting?

Sorry for all those questions. If a skype is more convenient you know how to find me :-)

Cheers,

stephen

Pierre Yger

Aug 31, 2018, 8:40:59 AM

to Stephen Whitmarsh, SpyKING CIRCUS

Dear Stephen

Here are some answers

1) The spatial whitening is the most important one, and the temporal one is turned off by default. It was an attempt, but we decided to turn it off as it was not very useful. The idea was that you may have temporally correlated noise within electrodes. But this temporal whitening either barely affects your data or distorts them, so my advice would be not to activate this option. It should be removed from the parameters for the sake of clarity.

2) We use n_cpu chunks of 30 s each, randomly selected across the whole recording.

3) In fact, we load chunks of data, and then, within these chunks, we search for periods of silence, i.e. periods where no spikes are detected on the channels (so no threshold crossings). We then concatenate all the silent periods, and the whitening is performed on the final vector of silent periods. The duration of this vector is displayed in the log.

4) These values are here because, after whitening, the code searches for spikes in order to construct a PCA basis to represent the waveforms. To be more precise, after whitening, we search for spikes over all electrodes (with respect to max_elts and nb_elts, as in the clustering). Then we gather them, and perform PCA to get a projection matrix that is used afterwards in the clustering. So again, for the sake of clarity, we kept the parameters duplicated and separated for the two sections ([clustering] and [whitening]).

5) output_dim is set by default to 5. You can, I think, set it as a percentage of the variance explained, but the key here is that you do not want to perform the clustering in a space with too many dimensions. I would recommend not changing this value, or starting to play with it only if you are really not happy with the clustering (use the sanity plots to get a feeling of what is done).
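
To make points 2 and 3 a bit more concrete, here is a minimal NumPy sketch of the idea: pick a few random chunks, keep only the samples without threshold crossings, concatenate them, and build a spatial whitening matrix from their covariance. This is not the SpyKING CIRCUS code. The function names, the MAD-based threshold, the safety margin around crossings, and the symmetric (ZCA-style) form of the whitening matrix are all assumptions made for illustration.

```python
import numpy as np


def random_chunk_starts(n_samples, chunk_len, n_chunks, rng):
    """Start indices of non-overlapping chunks picked at random (hypothetical helper)."""
    n_available = n_samples // chunk_len
    picked = rng.choice(n_available, size=min(n_chunks, n_available), replace=False)
    return np.sort(picked) * chunk_len


def silent_mask(data, thresholds, safety):
    """True for samples with no threshold crossing on any channel.

    data has shape (n_samples, n_channels); thresholds has shape (n_channels,).
    `safety` samples on each side of a crossing are excluded as well (assumption).
    """
    crossing = (np.abs(data) > thresholds).any(axis=1)
    # Dilate the crossings so that the flanks of a spike are not counted as silence.
    kernel = np.ones(2 * safety + 1)
    dilated = np.convolve(crossing.astype(float), kernel, mode="same") > 0
    return ~dilated


def spatial_whitening_matrix(silence):
    """Symmetric whitening matrix from silent samples of shape (n_samples, n_channels)."""
    cov = np.atleast_2d(np.cov(silence, rowvar=False))  # also works for one channel
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 1000  # toy sampling rate in Hz, kept small so the demo runs quickly
    mixing = rng.standard_normal((4, 4))
    data = rng.standard_normal((120 * fs, 4)) @ mixing.T  # spatially correlated noise

    # Assumed threshold: 6 times the median absolute deviation of each channel.
    mad = np.median(np.abs(data - np.median(data, axis=0)), axis=0)
    thresholds = 6 * mad

    chunk_len = 30 * fs  # 30 s chunks, as in point 2
    starts = random_chunk_starts(len(data), chunk_len, 2, rng)
    chunks = [data[s:s + chunk_len] for s in starts]
    silence = np.concatenate([c[silent_mask(c, thresholds, safety=5)] for c in chunks])

    whitening = spatial_whitening_matrix(silence)
    print("duration of silent periods (s):", len(silence) / fs)
    print(np.round(np.cov(silence @ whitening, rowvar=False), 3))
```

After multiplying by the whitening matrix, the covariance of the silent samples across channels is the identity, which is what "decorrelating the electrodes" means here. Spikes are left out of the estimate so that real, shared signal does not get treated as noise correlation.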

Best

Pierre
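
For point 4 above, the PCA step can be sketched as follows: cut snippets around detected peaks, stack them, and keep the first output_dim principal axes as a projection matrix. Again, this is only an illustration under assumptions, not the actual implementation. The helper names, the snippet half-width, and the mean-centering before the SVD are all hypothetical choices.

```python
import numpy as np


def extract_waveforms(trace, peak_times, half_width):
    """Cut fixed-length snippets around each peak on one channel (hypothetical helper).

    Peaks too close to the edges of the trace are skipped.
    """
    n = len(trace)
    snippets = [
        trace[t - half_width:t + half_width + 1]
        for t in peak_times
        if half_width <= t < n - half_width
    ]
    return np.array(snippets)


def pca_basis(waveforms, output_dim=5):
    """Projection matrix (n_timepoints, output_dim) and explained-variance ratios.

    waveforms has shape (n_spikes, n_timepoints).
    """
    centered = waveforms - waveforms.mean(axis=0)
    # Rows of `components` are principal axes, sorted by decreasing variance.
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return components[:output_dim].T, explained


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    trace = rng.standard_normal(20000)
    peak_times = rng.choice(len(trace), size=300, replace=False)

    waveforms = extract_waveforms(trace, peak_times, half_width=15)
    basis, explained = pca_basis(waveforms, output_dim=5)

    features = waveforms @ basis  # low-dimensional features passed on to the clustering
    print(features.shape)
    print("variance explained by 5 dimensions:", explained[:5].sum())
```

The cumulative sum of the explained-variance ratios is one way to see how a "percentage of variance" setting would translate into a number of dimensions for a given dataset.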

Stephen Whitmarsh

Aug 31, 2018, 11:08:08 AM

to Pierre Yger, SpyKING CIRCUS

Dear Pierre,

Thanks a lot for answering all my questions! Also for your recommendation for output_dim – for some reason I was working with 10 – which seems to work fine BTW, but I’ll explore it further.