t-statistics in the topLabels of Spatial Shrunken centroids


Olga Gavrilenko

May 26, 2019, 6:57:25 AM
to Cardinal MSI Help
Dear Kylie,

I have a question regarding the topLabels output for the Spatial Shrunken Centroids class. It calculates t-statistics, but does it check whether the intensities are normally distributed? As far as I know, Student's t-test is only valid for normally distributed data.

I manually checked a couple of features in my data that were assigned to a cluster with a p-value of 0. A Shapiro-Wilk test showed that the feature distribution was not normal (p-value ~10^(-22)), although a Mann-Whitney U test confirmed the feature as highly cluster-specific. Still, is it correct to speak of t-statistics if the data are not normally distributed?

The second question, or rather a remark: in the same topLabels output, there is a column named "adjusted p values" which, as far as I understand, refers to the p-value calculated from the t-statistics minus the user-set threshold s. Perhaps it would also be useful to have a column with p-values corrected for multiple testing (e.g. Bonferroni correction)? For a large output (on the order of a thousand features, for example) this would be very useful.

The third question: I have noticed that in topLabels a feature can be assigned to several clusters with a p-value of 0. Does this happen because the t-statistics compare the mean intensity in the cluster with the mean intensity in the whole area outside of it? Wouldn't it be more reasonable to compare the mean intensities pairwise between this cluster and each of the other clusters separately?

And the final question: do negative t-statistics values in the topLabels mean that the feature is depleted in the cluster?


Thank you very much for taking the time to reply!
With very best regards,
Olga

kbemis

May 26, 2019, 4:35:22 PM
to Cardi...@googlegroups.com
1. Normality is not checked. In the paper we published about the method (https://www.mcponline.org/content/15/5/1761), we recommend using the t-statistics as a ranking of the relative importance of the mass features in distinguishing different segments. The p-values should not really be considered for a number of reasons. Statistical testing in the presence of regularization is somewhat questionable already.

In newer versions of Cardinal (>=2.2, when applied to an MSImagingExperiment rather than an MSImageSet), the p-values are not calculated at all and only t-statistics are returned (for this reason and others).

We call them t-statistics because they are calculated by dividing a measure of deviation from a mean by a standard error, and they are described that way in the original nearest shrunken centroids paper that inspired spatial shrunken centroids. There is no guarantee they actually follow a t-distribution.

Since spatial shrunken centroids is designed for either clustering or classification anyway, it is not truly intended for class comparison or hypothesis testing. The t-statistics are a useful heuristic for ranking the most important features distinguishing each segment, while using regularization to eliminate unimportant ones.

2. The "adjusted p-values" are FDR-adjusted; the non-adjusted p-values given are already computed from the shrunken t-statistics. As mentioned above, though, newer versions of the method (when applied to the newer MSImagingExperiment class) do not calculate or display p-values at all when they are not appropriate.

3. The t-statistics are calculated from the differences between the mean spectrum of the segment and the global mean spectrum. They are then regularized (via the "s" parameter), and the mean spectra are themselves shrunken "toward" the global mean spectrum. All of these details are discussed in more depth in the linked paper (which is also cited in the documentation for the method). The t-statistics are used as part of the segmentation itself, rather than as a post hoc calculation. They are not intended for pairwise comparisons between segments, but rather to characterize a particular segment as compared to the whole dataset.

4. Yes, negative t-statistics indicate that the mass feature is under-represented in the segment compared to the whole dataset, and positive t-statistics indicate the feature is over-represented in the segment.
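
Points 3 and 4 can be illustrated with a minimal sketch for a single mass feature. It assumes the formulation of the nearest shrunken centroids method (pooled within-segment standard deviation, soft thresholding by s) and leaves out the spatial weighting entirely, so it is not Cardinal's actual implementation. The function names and the toy intensities are made up for illustration.

```python
import math
from collections import defaultdict


def soft_threshold(t, s):
    """Shrink t toward zero by s; values with |t| <= s become zero."""
    return math.copysign(max(abs(t) - s, 0.0), t)


def shrunken_t_statistics(intensities, labels, s, s0=0.0):
    """Per-segment shrunken t-statistics for ONE mass feature.

    Assumes at least two segments and non-constant intensities.
    s  : shrinkage threshold (the "s" parameter)
    s0 : optional offset added to the standard deviation
    """
    n = len(intensities)
    groups = defaultdict(list)
    for x, k in zip(intensities, labels):
        groups[k].append(x)

    global_mean = sum(intensities) / n
    means = {k: sum(v) / len(v) for k, v in groups.items()}

    # Pooled within-segment standard deviation.
    ss = sum((x - means[k]) ** 2 for x, k in zip(intensities, labels))
    pooled_sd = math.sqrt(ss / (n - len(groups)))

    result = {}
    for k, v in groups.items():
        # Standard error factor for (segment mean - global mean).
        m_k = math.sqrt(1.0 / len(v) - 1.0 / n)
        t = (means[k] - global_mean) / (m_k * (pooled_sd + s0))
        result[k] = soft_threshold(t, s)
    return result


# Toy data: the feature is low in segment "a" and high in segment "b".
intensities = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["a", "a", "a", "b", "b", "b"]

moderate = shrunken_t_statistics(intensities, labels, s=3.0)
heavy = shrunken_t_statistics(intensities, labels, s=100.0)
print(moderate)  # negative for "a" (depleted), positive for "b" (enriched)
print(heavy)     # both shrunk to zero: the feature is dropped
```

Each segment is compared against the global mean rather than against the other segments, which is why one feature can stand out for several segments at once.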

* If you are interested in class comparison and statistical testing rather than segmentation/classification, then the newest Cardinal version (2.2) includes new methods for class comparison (see http://bioconductor.org/packages/release/bioc/vignettes/Cardinal/inst/doc/Cardinal-2-stats.html#class-comparison). Note these methods naturally require multiple replicates per condition for statistical validity, so they cannot be used if there are insufficient samples. We don't have a detailed paper or vignette on these methods yet, but we have an upcoming ISMB paper (Guo, D., et al. "Unsupervised segmentation of mass spectrometric ion images characterizes morphology of tissues." ISMB/ECCB 2019) that discusses some aspects of them. We are working on updating the CardinalWorkflows vignettes for our October release.
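
On the FDR adjustment mentioned in point 2 and the Bonferroni correction raised in the question: the thread does not say which FDR procedure is used, so the sketch below assumes Benjamini-Hochberg, a common choice, and shows Bonferroni next to it for comparison. The p-values are invented for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    # Walk from the largest p-value down, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni_adjust(pvalues):
    """Bonferroni adjusted p-values: multiply by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values
fdr = bh_adjust(raw)
bonf = bonferroni_adjust(raw)
print(fdr)
print(bonf)
```

Bonferroni is never less conservative than Benjamini-Hochberg, so with around a thousand features it removes many more of them. Neither adjustment addresses the deeper concern raised above, that the p-values themselves are questionable under regularization.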

I hope this is helpful.

-Kylie

vitek...@gmail.com

May 26, 2019, 5:41:56 PM
to Cardinal MSI Help
Let me also add that hypothesis testing that compares these classes post hoc (with a t-test, a Mann-Whitney test, or any other test) is not appropriate. In that case the data are used twice: first to determine the classes, and second to test for differences between these classes. The results will be too optimistic (i.e., the p-values will be too small) and not necessarily reproducible.

In contrast, the t-statistics in Cardinal are by-products of class finding. And as Kylie said, their p-values are not necessarily meaningful. Using the t-statistics for ranking only is best.
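
The effect of using the data twice can be seen in a small simulation. This is only a sketch under stated assumptions: the "feature" is pure Gaussian noise with no real structure, and a split at the median stands in for a clustering step. It does not involve Cardinal.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch two-sample t-statistic."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        va / len(a) + vb / len(b)
    )


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]  # no true groups exist

# "Clustering": groups are defined FROM the same values that get tested.
cut = statistics.median(noise)
low = [x for x in noise if x <= cut]
high = [x for x in noise if x > cut]
t_double_dip = welch_t(high, low)

# Control: groups assigned without looking at the values.
shuffled = noise[:]
rng.shuffle(shuffled)
t_independent = welch_t(shuffled[:100], shuffled[100:])

print(f"groups found from the data: t = {t_double_dip:.1f}")
print(f"groups assigned blindly:    t = {t_independent:.1f}")
```

The data-derived groups produce a very large t-statistic even though the values are pure noise, which is the over-optimism described above.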

HTH
OV

Noel Park

Dec 18, 2020, 1:04:41 PM
to Cardinal MSI Help
Hi all, reviving this old thread. How would you explain a cluster that is pulled out but has no feature t-statistics above 0? Why is there even a cluster?

Olga Vitek

Dec 18, 2020, 11:28:55 PM
to Cardinal MSI Help
Dear Noel: with the t-statistics, the important aspect is not the sign but the magnitude of the absolute value. If the t-statistics of all the ions in the cluster are negative, it means that all the ions are depleted (i.e., have systematically lower intensity) in this cluster as compared to their average intensity in the entire tissue.
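
As a small illustration of ranking by magnitude rather than sign, here is a sketch with invented feature names and t-statistics for a cluster in which nothing is above 0. The helper name is hypothetical and not part of Cardinal.

```python
def rank_by_magnitude(tstats):
    """Sort (feature, t-statistic) pairs by |t|, largest first."""
    return sorted(tstats.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Made-up values: every feature is depleted or shrunk to zero.
cluster_tstats = {
    "mz_500.1": -12.4,
    "mz_610.3": -3.2,
    "mz_700.8": -0.5,
    "mz_815.6": 0.0,
}

ranked = rank_by_magnitude(cluster_tstats)
for feature, t in ranked:
    print(feature, t)
```

The strongly negative features come first: the cluster is characterized by what is missing from it.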

HTH
OV 
