Hello everyone,
I am running a clustering analysis with the Spatial Shrunken Centroids (SSC) method. My sample is nervous tissue, so it is very rich in lipids. When I apply SSC, I get clusters characterized by ions whose m/z values most likely correspond to lipids. When I look at which ions contribute to a specific cluster, the same ions appear to contribute to several clusters, and (at least with my datasets) I cannot find a cluster that is described mostly by a single ion. Here is an example:
> topFeatures(mse_tot_ssc_mean, model=list(r=2, s=0), class==1)
Top-ranked features:
         mz r  k s class  centers statistic
1  888.6329 2 20 0     1 7360.356  288.8289
2  889.6364 2 20 0     1 3918.336  271.9897
3  890.6475 2 20 0     1 8066.914  268.1355
4  892.6499 2 20 0     1 2587.962  252.9644
5  891.6520 2 20 0     1 4109.068  251.9183
6  862.6159 2 20 0     1 2478.923  245.5485
7  878.6146 2 20 0     1 2233.640  235.3656
8  906.6454 2 20 0     1 2822.483  233.3550
9  864.6236 2 20 0     1 1192.637  230.4855
10 863.6157 2 20 0     1 1514.358  226.4598
> topFeatures(mse_tot_ssc_mean, model=list(r=2, s=0), class==2)
Top-ranked features:
         mz r  k s class   centers statistic
1  890.6475 2 20 0     2 7667.4032  188.5471
2  891.6520 2 20 0     2 3926.4879  179.7538
3  888.6329 2 20 0     2 6588.2490  174.5789
4  726.5536 2 20 0     2 5242.1476  172.0922
5  462.3054 2 20 0     2  817.5346  169.2137
6  892.6499 2 20 0     2 2408.9569  168.1051
7  889.6364 2 20 0     2 3524.6678  165.9360
8  727.5567 2 20 0     2 2484.8955  161.3878
9  862.6159 2 20 0     2 2275.3719  155.9419
10 744.5618 2 20 0     2 1017.0777  144.2835
I believe that in my case a specific cluster is defined by the co-presence of two or more ions. However, this type of result does not let me identify a single ion that describes the cluster; I can only conclude that several ions together determine it. Am I right?
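To make the overlap concrete, here is a small base-R sketch that compares the two top-10 lists above. The vector names are my own, and the m/z values are copied by hand from the output, so this only illustrates the comparison and is not part of the SSC workflow itself.

```r
# m/z values copied from the two topFeatures() outputs above
top_class1 <- c(888.6329, 889.6364, 890.6475, 892.6499, 891.6520,
                862.6159, 878.6146, 906.6454, 864.6236, 863.6157)
top_class2 <- c(890.6475, 891.6520, 888.6329, 726.5536, 462.3054,
                892.6499, 889.6364, 727.5567, 862.6159, 744.5618)

# Ions ranked in the top 10 of both classes
shared <- intersect(top_class1, top_class2)

# Ions ranked in the top 10 of only one class
only_class1 <- setdiff(top_class1, top_class2)
only_class2 <- setdiff(top_class2, top_class1)

print(shared)
print(only_class1)
print(only_class2)
```

With these lists, six of the ten top-ranked ions are shared between class 1 and class 2, and four are specific to each class.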
Additionally, I would like to exclude lipids from my clustering analysis. How can I do this? I suppose that, instead of considering all ions, I could select only a portion of the spectrum and run the clustering on that portion. When I summarized the entire dataset with summarizeFeatures, I tried several statistics (mean, min, max), but I obtained more or less the same ions in each cluster. So, if I restrict the analysis to a portion of the spectrum, would the results still be reliable, or could I introduce artifacts? Are there other ways to exclude classes of molecules?
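What I have in mind for restricting the spectrum is roughly the following base-R sketch. The m/z axis reuses a few values from the output above, while the intensity matrix and the 700 to 950 exclusion window are placeholders I made up for illustration; they are not a recommendation for where lipids actually end.

```r
# A few m/z values taken from the output above (illustration only)
mz_axis <- c(462.3054, 726.5536, 744.5618, 862.6159, 888.6329, 906.6454)

# Toy intensity matrix (features x pixels); the numbers are placeholders
intensities <- matrix(seq_len(length(mz_axis) * 3),
                      nrow = length(mz_axis),
                      dimnames = list(as.character(mz_axis),
                                      paste0("px", 1:3)))

# Hypothetical m/z window to exclude (an assumption, not a validated cutoff)
exclude_range <- c(700, 950)

# Keep only features that fall outside the excluded window
keep <- mz_axis < exclude_range[1] | mz_axis > exclude_range[2]
mz_kept <- mz_axis[keep]
intensities_kept <- intensities[keep, , drop = FALSE]

print(mz_kept)
print(dim(intensities_kept))
```

I believe Cardinal's subsetFeatures() accepts a condition on mz that would do the same thing directly on the imaging object before clustering, but I have not confirmed that this is the recommended approach.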
Thanks in advance,
M