Dear Pierre, and others,
First of all, these are naive beginner-in-spike-sorting questions, and I know that my linear algebra is also not at the appropriate level.
However, I am still trying to understand whitening in general, and its use in spike sorting / SC specifically. I found this basic explanation of whitening helpful, and understand that whitening reduces correlation between dimensions/variables. And since one can base it on data where there should not be any correlation, this should remove the spurious correlation while leaving the actual correlation intact. So far, so good. However, I don't understand the following:
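To make my current understanding concrete, here is a minimal NumPy sketch of what I picture as whitening across electrodes. It is only my own illustration on made-up noise (the variable names and the ZCA-style whitening matrix are my assumptions), not what SC actually does:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up "silent" data: 4 electrodes that all pick up the same shared noise
n_channels, n_samples = 4, 20000
shared = rng.standard_normal(n_samples)
noise = rng.standard_normal((n_channels, n_samples)) + 0.8 * shared

# Covariance between electrodes, estimated on the silent data only
cov = np.cov(noise)

# ZCA-style whitening matrix W = C^(-1/2), built from the eigendecomposition
eigvals, eigvecs = np.linalg.eigh(cov)
whitening = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T

# After whitening, the electrode-by-electrode covariance is the identity
whitened = whitening @ noise
cov_whitened = np.cov(whitened)
```

If I understand correctly, the same matrix would then be applied to the full recording, so that only the noise correlation is removed and correlations caused by real spikes remain.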
1) Assuming the correlation (covariance) can only be calculated over both time and space (electrodes), I don't understand why there are those two options (temporal and spatial) in the params [whitening] section. I understand the need for decorrelating electrodes (space), but what does the temporal whitening option mean?
2) To calculate the covariance, how much data is used, and is it sampled evenly over the whole recording? That seems important, since the covariance might not be constant over time (which would be a problem if it changes too much), right?
3) Does the [whitening] chunk_size setting specify the total amount of data used for calculating the covariance, or the size of a single chunk? It makes sense to have a minimum stretch of data without signal (I understand that the covariance is calculated on 'silent data'), but theoretically (and empirically), I do not expect my data/neurons to be silent for such long stretches, nor to have artifacts that last that long. Secondly, how is this 'silence' defined? Is it based on whether data (spikes) can be fitted, or just on whether or not the spike threshold is exceeded? In general, how will artifacts influence the whitening, since they are part of the spurious correlation?
4) I don't understand why max_elts and nb_elts are included under [whitening]. Shouldn't they be under [clustering]? (I tried placing them under [clustering], but that gives an error, so SC expects them under [whitening].)
5) The output_dim is set to 10 by default (I think). I can imagine the appropriate setting depends strongly on the structure of the data. In my case, with only a few electrodes (1-8), would this setting make sense? The option to use % explained variance instead might be more consistent across different numbers of electrodes, but what would be an appropriate setting?
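To illustrate what I mean by the explained-variance option in 5), here is a small sketch on synthetic waveforms. The data, the 95% target and all names are made up for illustration, and this is plain PCA via SVD rather than SC's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up spike snippets: 500 waveforms of 30 samples, mixed from 2 shapes
n_spikes, n_points = 500, 30
t = np.linspace(0.0, 1.0, n_points)
shapes = np.stack([np.sin(2 * np.pi * t), np.exp(-((t - 0.3) / 0.1) ** 2)])
amplitudes = rng.standard_normal((n_spikes, 2))
waveforms = amplitudes @ shapes + 0.05 * rng.standard_normal((n_spikes, n_points))

# PCA via SVD: squared singular values are proportional to component variances
centered = waveforms - waveforms.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained = singular_values ** 2 / np.sum(singular_values ** 2)
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative variance reaches the target
target = 0.95  # hypothetical target, not an SC default
n_components = int(np.searchsorted(cumulative, target)) + 1
print(n_components, cumulative[:n_components])
```

With a fixed output_dim the fraction of variance kept would differ from dataset to dataset, whereas with a target fraction the number of dimensions adapts to the data, which is why I wonder which is the better choice for few electrodes.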
Sorry for all those questions. If a Skype call is more convenient, you know how to find me :-)
Cheers,
stephen