Possible bias due to matrix distribution in unsupervised analysis?


Davide G. Franchina

Aug 19, 2022, 10:01:35 PM
to Cardinal MSI Help

Hi Cardinal community,

I would like to bring to your attention something I have been struggling to understand.
Please correct me if I am wrong or have misunderstood something.

Starting from raw MALDI imaging data (unprocessed .imzML files), suppose you run an unsupervised pipeline (PCA, segmentation, etc.) to discover regions of interest within your sample:

1. How would you tell whether the segmentation has been partially biased or driven by unknown matrix artifacts (say the matrix forms adducts in one tissue-specific area, or has a higher affinity for certain areas of the tissue)?

2. Do you assume that matrix peaks are excluded during pre-processing (e.g. while filtering with peakFilter(freq.min=0.05, rm.zero=TRUE))?

3. Do you check topFeatures() for each class and assume that any matrix peaks that are present will not be picked up later (e.g. during peak annotation)?

4. Is there a way, as part of your analysis pipeline, to identify matrix peaks and exclude them beforehand (e.g. before starting the unsupervised analysis)?

thank you all in advance for the awesome support!

Frankie

Marina Zavolskova

Aug 25, 2022, 4:53:56 AM
to Cardinal MSI Help
Hi, Davide

This is how I usually remove matrix peaks:

After some preprocessing steps (peak picking, peak alignment, peak filtering, normalization) I use the spatial shrunken centroids function to find the area outside the tissue slice:
file_ssc_5 <- spatialShrunkenCentroids(file, r = 2, k = 5, s = 1)

Then I make a data frame where each column is a cluster and each row is an m/z:
df_file_ssc_5 <- data.frame(file_ssc_5@resultData@listData[[1]][["centers"]])

Then we can compare the mean intensities of each m/z across the clusters of interest. For example, I see that the matrix is in clusters 1, 2 and 3:
df_file_ssc_5$Matrix <- rowMeans(df_file_ssc_5[, c(1, 2, 3)])

Then I compare the mean "matrix" intensity with the other clusters:

df_file_ssc_5_mz <- df_file_ssc_5[df_file_ssc_5$Matrix<df_file_ssc_5$X4 | df_file_ssc_5$Matrix<df_file_ssc_5$X5,] 
df_file_ssc_5_sample_mz <- df_file_ssc_5_mz$mz 

Then I find the indexes of the obtained m/z values and subset the initial file:
file_No_Matrix_MZ <- file[which(file@featureData@mz %in% df_file_ssc_5_sample_mz), ]

I'm not sure it's the best way, but it works for me.
I hope it helps.

best, 
Marina

Davide G. Franchina

Aug 26, 2022, 6:41:02 PM
to Cardinal MSI Help
Hi Marina,

thank you! Your approach sounds reasonable to me.
However, I tried to run the lines you provided on my data and could not get them to work.

In particular, I am having trouble understanding the line which aims at comparing the mean matrix column to the other clusters:
(in your code:)
df_file_ssc_5_mz <- df_file_ssc_5[df_file_ssc_5$Matrix<df_file_ssc_5$X4 | df_file_ssc_5$Matrix<df_file_ssc_5$X5,]

The resulting data frame df_file_ssc_5_mz is identical to df_file_ssc_5.
Could you please help me debug this?

thank you very much in advance,
Best

Marina Zavolskova

Aug 31, 2022, 3:17:32 AM
to Cardinal MSI Help
Hi Davide
Sorry for the late reply

after this command 
df_file_ssc_5 <- data.frame(file_ssc_5@resultData@listData[[1]][["centers"]])
we have a data frame (if you have 5 clusters):
X1 | X2 | X3 | X4 | X5
------------------------
 2 |  3 |  6 | 11 | 22
 1 |  5 |  8 | 15 | 21
....
From the segmentation image (image(file_ssc_5)) I see that clusters 1, 2 and 3 are matrix, so I take the mean of these three columns: df_file_ssc_5$Matrix <- rowMeans(df_file_ssc_5[, c(1, 2, 3)])

Now the data frame looks like:

X1 | X2 | X3 | X4 | X5 | Matrix
---------------------------------
 2 |  3 |  6 | 11 | 22 |  3.67
 1 |  5 |  8 | 15 | 21 |  4.67
....

After that I compare the other columns (X4 and X5) with the new one (Matrix).
I keep all the columns, but only those rows (m/z values) where either the X4 intensity or the X5 intensity is higher than the Matrix one:
df_file_ssc_5_mz <- df_file_ssc_5[df_file_ssc_5$Matrix < df_file_ssc_5$X4 | df_file_ssc_5$Matrix < df_file_ssc_5$X5, ] 
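The comparison described above can be tried on a toy data frame in base R, without Cardinal. This is only a minimal sketch: the name `centers` is a stand-in for df_file_ssc_5, the first two rows reuse the example values from this post, and the third row is made up to show a feature that gets dropped.

```r
# Toy cluster centers: rows are m/z features, columns are clusters.
# Rows 1 and 2 reuse the example values above; row 3 is invented.
centers <- data.frame(
  X1 = c(2, 1, 9),
  X2 = c(3, 5, 8),
  X3 = c(6, 8, 10),
  X4 = c(11, 15, 1),
  X5 = c(22, 21, 2)
)

# Mean intensity over the clusters judged to be matrix (here 1, 2 and 3).
centers$Matrix <- rowMeans(centers[, c("X1", "X2", "X3")])

# Keep a feature if at least one tissue cluster is brighter than the matrix mean.
keep <- centers$Matrix < centers$X4 | centers$Matrix < centers$X5
tissue_features <- centers[keep, ]

print(tissue_features)
```

Here the third row has a matrix mean of 9 against 1 and 2 in the tissue clusters, so it is the only one removed.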

Is that clearer?

best,
Marina

Davide G. Franchina

Sep 19, 2022, 3:16:14 PM
to Cardinal MSI Help
Hi,

Yes, thanks! In my case your code worked after setting the row names of df_file_ssc_5 to the actual m/z values in file@featureData@mz,
so that both share the same indexes (rownames(df_file_ssc_5) <- file@featureData@mz).
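For anyone following along, here is a minimal base R sketch of that row-name step on toy data. The vector `mz_axis` is a hypothetical stand-in for file@featureData@mz, `centers` stands in for df_file_ssc_5, and all values are made up.

```r
# Hypothetical m/z axis standing in for file@featureData@mz.
mz_axis <- c(700.5, 760.6, 782.6)

# Toy cluster centers, one row per m/z feature (made-up values).
centers <- data.frame(X1 = c(2, 1, 9), X2 = c(3, 5, 8), X3 = c(6, 8, 10),
                      X4 = c(11, 15, 1), X5 = c(22, 21, 2))

# Label each row with its m/z so filtered rows can be traced back.
rownames(centers) <- mz_axis

centers$Matrix <- rowMeans(centers[, c("X1", "X2", "X3")])
keep <- centers$Matrix < centers$X4 | centers$Matrix < centers$X5
kept <- centers[keep, ]

# Option A: recover the retained m/z values from the row names.
kept_mz <- as.numeric(rownames(kept))
idx_from_names <- which(mz_axis %in% kept_mz)

# Option B: use the logical filter directly as a feature index.
idx_from_filter <- which(keep)
```

Both options give the same indexes here. Option B skips converting m/z values to text and back, which may be the safer choice if the m/z values carry many decimal places.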

thank you!