Question of the day: hard refractory period violation threshold?

Kevin Bolding

Sep 3, 2015, 4:05:36 PM

to KlustaViewas

I'd really appreciate it if as many people as possible could answer these simple questions directly:

Do you apply a numerical threshold for refractory period violations to eliminate units from further analysis at the end of the clustering process?
If so, how is it calculated and what is the final threshold?

I am aware of some theoretical subtleties. A small percentage of spikes in the refractory period does not necessarily mean a well-isolated unit. I have calculated a false-positive rate according to Hill et al.'s cluster quality metrics paper, but I don't know which value to use as a cutoff. There are papers that proudly announce that their units had fewer than 0.5% or 1% of spikes in the refractory period. But I looked in the papers of such luminaries as Lin, Okun, Carandini, and Harris, and I can only find "The data were spike-sorted and only stable, well-isolated single units were used for further analysis." I followed the chain of previously described methods back to Busse et al. 2009 and Hazan et al. 2006. Hazan has a detailed discussion of refractory periods, but in the end it comes down to a subjective judgement: "noisy" or "clear".
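For concreteness, here is a minimal Python sketch of the two quantities mentioned above: the raw fraction of inter-spike intervals shorter than the refractory period, and a contamination estimate in the spirit of Hill et al. The 2 ms refractory period, the function names, and the exact form of the equation (violations = 2 (tau_r - tau_c) N^2 f (1 - f) / T, solved for the smaller root f) are assumptions made for illustration and should be checked against the paper before use.

```python
import numpy as np

def refractory_violation_fraction(spike_times, refractory_s=0.002):
    """Fraction of inter-spike intervals shorter than the refractory period."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    isi = np.diff(t)
    if isi.size == 0:
        return 0.0
    return float(np.mean(isi < refractory_s))

def hill_contamination(n_violations, n_spikes, duration_s,
                       refractory_s=0.002, censored_s=0.0):
    """Estimated contamination fraction f (assumed model, see lead-in).

    Solves f * (1 - f) = k for the smaller root, where
    k = n_violations * T / (2 * (tau_r - tau_c) * N**2).
    Returns NaN when there are more violations than the model can explain.
    """
    k = n_violations * duration_s / (
        2.0 * (refractory_s - censored_s) * n_spikes ** 2)
    disc = 1.0 - 4.0 * k
    if disc < 0:
        return float("nan")
    return float((1.0 - np.sqrt(disc)) / 2.0)
```

Neither number settles the question of a cutoff; they only make explicit what a reported "<1% violations" would be measuring.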

I can dig into the literature more, but I would really like to get an idea what people who are actively dealing with this issue day in and day out are using. I've finally got to make a decision and finish my analysis with a certain set of clusters.

Your input is appreciated,
Kevin Bolding
Franks Lab
Duke University

Harris, Kenneth

Sep 3, 2015, 5:29:25 PM

to Kevin Bolding, KlustaViewas

Hi Kevin,

Thanks for raising an important point.

Although refractory violations indicate poor isolation, the converse is not true. You can have clusters with perfectly clear refractory periods that actually consist of multiple cells. For example, a unit containing two hippocampal neurons with non-overlapping place fields would show a clean refractory period, simply because these two cells never fire at the same moment. You can also see this behavior in Figure 6 of this paper, where we knew the cluster contained multiple cells because one of them was definitively identified by intracellular recording.

My personal view is that it is fine to use refractory periods to exclude poor cells, and that you might as well do this using ACG cleanness (in combination with other factors) during the manual spike sorting stage. A rule of thumb is that if the zero-lag ACG is not substantially below the value expected for a Poisson process of the same rate (shown by a dotted line in KlustaViewa / phy), you should throw it out.
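The rule of thumb above can be sketched numerically as follows. This is an illustrative calculation, not what KlustaViewa or phy compute internally: the function name, the 2 ms window, and the use of a homogeneous Poisson process over the whole recording as the baseline are all assumptions.

```python
import numpy as np

def central_acg_ratio(spike_times, duration_s, window_s=0.002):
    """Observed / Poisson-expected number of spike pairs closer than window_s.

    A ratio well below 1 means the centre of the autocorrelogram sits
    well below the level expected for a Poisson process of the same rate.
    """
    t = np.sort(np.asarray(spike_times, dtype=float))
    n = t.size
    if n < 2:
        return float("nan")
    # For each spike, count the later spikes that fall within window_s of it.
    upper = np.searchsorted(t, t + window_s, side="left")
    observed = int(np.sum(upper - np.arange(n) - 1))
    # Expected number of unordered pairs with |dt| < window_s when n spikes
    # are scattered uniformly over duration_s (edge effects ignored).
    expected = n * (n - 1) * window_s / duration_s
    return observed / expected
```

How far below 1 counts as "substantially below" is still a judgement call, which is the point made in the post.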

However, accepting cells as good requires a different metric. There are several; the one we tend to use in the lab is isolation distance (defined here and here), and a threshold of 20 is usually considered acceptable for tetrode recordings. Even better than using a threshold, however, is to make a scatter plot with one point per cell, with isolation quality on the x-axis and your quantity of scientific interest on the y-axis. If the quantity converges to an asymptote for cells of high isolation quality, that asymptote is the correct value.
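As a rough illustration of isolation distance, the sketch below uses the commonly cited definition: for a cluster with n spikes, it is the squared Mahalanobis distance (under the cluster's own mean and covariance) of the n-th closest spike that does not belong to the cluster. The function name and the array layout (one row per spike, one column per feature) are assumptions, and this is not the MClust implementation.

```python
import numpy as np

def isolation_distance(features, labels, cluster_id):
    """Squared Mahalanobis distance of the n-th nearest non-cluster spike.

    features : array of shape (n_spikes, n_features)
    labels   : array of shape (n_spikes,) with a cluster id per spike
    Returns NaN when the measure is undefined (too few spikes in the
    cluster to estimate a covariance, or fewer outside spikes than inside).
    """
    X = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    in_cluster = labels == cluster_id
    inside, outside = X[in_cluster], X[~in_cluster]
    n_in, n_feat = inside.shape
    if n_in <= n_feat or outside.shape[0] < n_in:
        return float("nan")
    mean = inside.mean(axis=0)
    cov = np.atleast_2d(np.cov(inside, rowvar=False))
    diff = outside - mean
    # d2[i] = diff[i] @ inv(cov) @ diff[i], computed without an explicit inverse
    d2 = np.einsum("ij,ji->i", diff, np.linalg.solve(cov, diff.T))
    return float(np.sort(d2)[n_in - 1])
```

Because the distance is measured in feature space, the value depends on how many features are used, so a number computed this way is only comparable across recordings with the same feature dimensionality.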

All the best,

Kenneth.

--
To subscribe, send email to klustaviewa...@googlegroups.com
---
You received this message because you are subscribed to the Google Groups "KlustaViewas" group.
To unsubscribe from this group and stop receiving emails from it, send an email to klustaviewas...@googlegroups.com.
To post to this group, send email to klusta...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Kevin Bolding

Sep 4, 2015, 4:59:37 PM

to KlustaViewas, k.k.b...@gmail.com

Hi Kenneth,

Thank you for your thoughtful response. Your discussion of isolation distance inspired me to finally compute it for my data. I pulled the features from the features_masks entry in my KWX file and then fed these numbers to the IsolationDistance function from the MClust package. Clusters with an isolation distance below 20 are very rare. I will have to think some more about whether there is a variable I would expect to be correlated with poor isolation (perhaps Lifetime Sparseness) to use in the procedure you described.

Best,
Kevin B.

Harris, Kenneth

Sep 4, 2015, 5:32:03 PM

to Kevin Bolding, KlustaViewas

You raise another important point.

With isolation distance, the values you get are very dependent on dimensionality (i.e., how many channels there are). So while 20 is a reasonably good score for tetrode recordings, it isn't so impressive for octrodes etc. If few of your cells are below 20 with tetrodes, I would be surprised. With high-count probes, not so surprised.

I’m not sure what a good threshold is for higher channel counts. This is why plotting a scatter with quality on the x-axis and a measured biological variable on the y-axis is the best approach. If you see that above a certain quality level your biological variable no longer depends on quality, that level is the best choice of threshold.
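One way to read a trend off such a scatter is to sort the cells by quality, split them into equal-sized groups, and look at the mean of the biological variable in each group. The sketch below does only that; the function name, the number of bins, and the choice of mean and standard error as summaries are assumptions for illustration.

```python
import numpy as np

def binned_trend(quality, value, n_bins=5):
    """Summarise value as a function of quality in equal-count bins.

    Returns (median quality, mean value, standard error of the mean)
    for each bin, ordered from lowest to highest quality.
    """
    q = np.asarray(quality, dtype=float)
    v = np.asarray(value, dtype=float)
    order = np.argsort(q)
    chunks = np.array_split(order, n_bins)
    centers = np.array([np.median(q[c]) for c in chunks])
    means = np.array([v[c].mean() for c in chunks])
    sems = np.array([v[c].std(ddof=1) / np.sqrt(len(c)) for c in chunks])
    return centers, means, sems
```

If the means of the highest-quality bins agree within their standard errors while the lower bins drift away, the quality level where the curve flattens is a candidate threshold. Each bin needs several cells for the standard errors to mean anything.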

More generally, with really high count probes (64 channels per shank, etc), it is not clear that isolation distance will work at all, due to the “curse of dimensionality”. There may be a need to come up with new metrics for this case.

Kevin Bolding

Sep 7, 2015, 11:29:56 AM

to KlustaViewas, k.k.b...@gmail.com

Hi Kenneth,

Yes, now that you mention it, it makes sense that you would expect to get larger isolation distances more easily with more channels. Since you mentioned the curse of dimensionality, it seems that one might want to reuse your masking solution for calculating isolation. You wouldn't really want to measure distances between clusters that are on completely different sets of sites, so distances could be calculated on a subset of, say, the top (most unmasked?) 12 features for a given cluster, with the noise spikes restricted to those spikes that are also unmasked on those features.
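The idea in the paragraph above could be prototyped roughly as follows. This is an untested proposal rather than an established metric: the function name, the ranking of features by mean mask value within the cluster, and the rule that a noise spike qualifies if at least one of the selected features is unmasked for it are all assumptions. Features and masks are assumed to be already loaded into two arrays of shape (n_spikes, n_features).

```python
import numpy as np

def masked_isolation_distance(features, masks, labels, cluster_id, n_top=12):
    """Isolation distance restricted to a cluster's most-unmasked features.

    Only spikes outside the cluster that are unmasked (mask > 0) on at
    least one of the selected features are treated as candidate noise.
    Returns NaN when the measure is undefined.
    """
    X = np.asarray(features, dtype=float)
    M = np.asarray(masks, dtype=float)
    labels = np.asarray(labels)
    in_cluster = labels == cluster_id
    # Rank features by their mean mask value within the cluster.
    top = np.argsort(M[in_cluster].mean(axis=0))[::-1][:n_top]
    candidate = (~in_cluster) & (M[:, top] > 0).any(axis=1)
    inside = X[in_cluster][:, top]
    outside = X[candidate][:, top]
    n_in = inside.shape[0]
    if n_in <= top.size or outside.shape[0] < n_in:
        return float("nan")
    mean = inside.mean(axis=0)
    cov = np.atleast_2d(np.cov(inside, rowvar=False))
    diff = outside - mean
    # Squared Mahalanobis distance of each candidate noise spike.
    d2 = np.einsum("ij,ji->i", diff, np.linalg.solve(cov, diff.T))
    return float(np.sort(d2)[n_in - 1])
```

Whether a value from this function can be compared against the usual tetrode threshold of 20 is an open question, since both the feature count and the noise population differ from the standard definition.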

While I can appreciate the sensibility of the scatterplot solution you suggested, I'm a bit concerned that it will be knocked around by unrelated factors (the distribution/variance structure of your biological variable). It appears to be searching for a threshold at which, if you eliminate enough data, you no longer get a significant correlation. I suppose ideally you would at the very least want to eliminate the same number of randomly selected units and see when the correlation goes away.
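The control suggested in the last sentence above can be sketched like this: compute the quality-versus-variable correlation among the units that pass a threshold, then compare it with the correlations obtained when the same number of units is kept at random. The function name, the use of Pearson correlation, and the number of random draws are assumptions for illustration.

```python
import numpy as np

def thresholded_vs_random(quality, value, threshold, n_shuffles=1000, seed=0):
    """Correlation after thresholding vs. after random removal of units.

    Returns (r_thresholded, r_random), where r_random holds one Pearson
    correlation per random subset of the same size as the thresholded set.
    """
    q = np.asarray(quality, dtype=float)
    v = np.asarray(value, dtype=float)
    keep = q >= threshold
    n_keep = int(keep.sum())
    if n_keep < 3:
        raise ValueError("too few units pass the threshold")
    r_thresholded = float(np.corrcoef(q[keep], v[keep])[0, 1])
    rng = np.random.default_rng(seed)
    r_random = np.empty(n_shuffles)
    for i in range(n_shuffles):
        idx = rng.choice(q.size, size=n_keep, replace=False)
        r_random[i] = np.corrcoef(q[idx], v[idx])[0, 1]
    return r_thresholded, r_random
```

If the thresholded correlation falls well outside the range of the random-removal correlations, the loss of correlation is unlikely to be explained by the reduced sample size alone.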

Also, while most of my recordings use a single probe type such that I may have enough data to smooth out the scatter plot, there are recordings here and there that use novel probe types with fewer channels. These will be very sparse datasets for recalculating a new probe-dependent isolation threshold.

Of course, I'm not expecting the kwikteam to whip me up an adaptive cluster quality measure in their spare time, and I appreciate the discussion. It seems that one could continuously get deeper and deeper into measuring separation in multidimensional space and never get back to relating neural spike trains to events in the real world. On the other hand, people routinely do reach a conclusion that their spike sorting is 'good enough' and continue on with their analysis, but very few explicitly describe how they made this decision and I wonder if it's all coming back down to a gut feeling or an arbitrary threshold in the end.

Best,
Kevin Bolding

Harris, Kenneth

Sep 13, 2015, 1:37:48 PM

to Kevin Bolding, KlustaViewas

Hi Kevin,

Sorry for the slow response (have been travelling).

1. Yes, there probably is a way to make a masked version of isolation distance. It would need a bit of thinking about and testing however: this would be a research project. If anyone is interested, let’s talk!

2. In the meantime, the quality measure used by the Wizard may be alright for now. I guess we should export this to a file, at the very least.

3. The scatterplot solution will work if you have enough cells that you can average out biological variability of all cells at a given isolation quality. If you can’t see a correlation of quality with your biological variable and you have at least 100 cells, quality probably has only a minor effect on this variable!

4. Regarding the novel probe type, it’s going to be hard to find a quality metric that you can safely compare across geometries to say which design is better. For that, you would need (simulated) ground truth.
