Can you use gl.assign.mahalanobis() to identify outliers?

Chiara Duijser

Mar 30, 2023, 11:53:25 PM
to dartR

Hi everyone,

I have a question about the Mahalanobis distance function gl.assign.mahalanobis(), which assigns an individual of unknown provenance to a population based on Mahalanobis distance.

I don’t have any individuals that belong to unknown populations but I have two individuals that seem like potential outliers based on the PCA plot. 

I was wondering whether the gl.assign.mahalanobis() function can also flag outliers, if there are any. I ran the function on my genlight object using gl.assign.mahalanobis(gl5, unknown="21").
The output says “yes” under assign to population [x] and “no” for my other populations. I’m not sure whether that is simply because the individual is already labelled as belonging to that population (since there aren’t any unknowns), or whether I can interpret it as meaning that this individual is not an outlier and does belong to population [x].

Is this the correct way to check for outliers in my populations? 

Thanks in advance! 
Best wishes, 
Chiara

Arthur Georges

Mar 31, 2023, 9:42:49 PM
to dartR
Hi Chiara,

The gl.assign.mahalanobis script was written as part of a series of scripts for population assignment. It is used in an exploratory sense to assign the most likely population membership, together with other indicators. It was not specifically designed to identify outliers, but I guess it could be used that way.

It is a bit tricky because it standardizes the axes, so that a confidence ellipse in multivariate space is transformed into a confidence sphere. Outliers are points outside that sphere. The problem when you apply this to ordinated space is that the deeper, noisier dimensions are given equal weight to the leading, more informative dimensions, or at least that is my understanding (someone please correct me if I am wrong). So I guess you should apply it to a restricted space of the leading dimensions. The script allows you to do that with the dim.limit parameter. If dim.limit = 2, there is not much added value over plotting confidence ellipses on your PCA or PCoA plots and seeing where your suspect values lie, although gl.assign.mahalanobis will give you a probability.
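To make the standardization point concrete, here is a minimal base R sketch. It is not the internals of gl.assign.mahalanobis: the genotypes are simulated, prcomp() stands in for whatever ordination you actually use, and the chi-square reference distribution and the 0.999 level are assumptions made for illustration. All object names are hypothetical.

```r
# Sketch only: simulated genotypes stand in for a real genlight object.
set.seed(1)
n_ind <- 40   # hypothetical number of individuals
n_loc <- 200  # hypothetical number of SNP loci
geno <- matrix(rbinom(n_ind * n_loc, size = 2, prob = 0.3), nrow = n_ind)

# Ordinate, then keep only the first k axes (the role dim.limit plays in the thread)
pca <- prcomp(geno, center = TRUE, scale. = FALSE)
k <- 2
scores <- pca$x[, seq_len(k), drop = FALSE]

# Squared Mahalanobis distance of every individual from the centroid
d2 <- mahalanobis(scores, center = colMeans(scores), cov = cov(scores))

# PCA axes are uncorrelated, so the same distance is obtained by summing the
# squared z-scores per axis: every retained axis gets equal weight, however
# little variance it explains
d2_standardised <- rowSums(scale(scores)^2)

# Assumption: judge distances against a chi-square distribution with k d.f.
p_value <- pchisq(d2, df = k, lower.tail = FALSE)
flagged <- which(d2 > qchisq(0.999, df = k))
```

Increasing k adds further axes with the same weight as the first two, which is why restricting the number of dimensions matters.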

If you are looking for outliers in a deeper dimensional space, then the script will handle that. Points that are not outliers in a two-dimensional plot can still be outliers relative to a three-dimensional confidence ellipsoid, for example.
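A small constructed example of that last point, again in base R rather than through gl.assign.mahalanobis. The scores are made up (29 individuals on a circle in the first two axes, plus one individual that sits at the centre of that circle but is extreme on the third axis), and the chi-square cutoff at 0.999 is an assumption.

```r
# Hypothetical ordination scores for 30 individuals on three axes
n <- 29
theta <- 2 * pi * seq_len(n) / n
scores <- rbind(
  cbind(cos(theta), sin(theta), 0.1 * (-1)^seq_len(n)),  # 29 typical individuals
  c(0, 0, 5)                                             # individual 30: extreme on axis 3 only
)
colnames(scores) <- c("Axis1", "Axis2", "Axis3")

# Squared Mahalanobis distance from the centroid, using the first k axes
d2_for <- function(k) {
  s <- scores[, seq_len(k), drop = FALSE]
  mahalanobis(s, center = colMeans(s), cov = cov(s))
}

plevel <- 0.999  # assumed confidence level
out_2d <- which(d2_for(2) > qchisq(plevel, df = 2))  # no individual is flagged
out_3d <- which(d2_for(3) > qchisq(plevel, df = 3))  # individual 30 is flagged
```

With two axes individual 30 lies exactly at the centroid, so it only shows up once the third axis is included.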

Maybe have a play and let us know how it goes.

Arthur

Peter Unmack

Mar 31, 2023, 11:57:27 PM
to da...@googlegroups.com
Outliers often tend to be samples that are high in heterozygosity,
likely as a result of contamination, in which case they should be removed.
Check the heterozygosity of all your samples. This should really be done
with all datasets.

# Proportion of heterozygous calls (coded 1) per individual, ignoring missing genotypes
het <- rowMeans(as.matrix(gl) == 1, na.rm = TRUE)
write.csv(het, file = "het.csv")
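As a rough sketch of how the resulting values might be screened, the example here builds a small made-up genotype matrix in place of as.matrix(gl) and flags individuals whose heterozygosity is more than two standard deviations above the mean. The matrix, the object names and the two-SD cutoff are all assumptions for illustration; a sensible threshold depends on the dataset.

```r
# Hypothetical genotype matrix (individuals x loci), coded 0/1/2 like as.matrix() of a genlight
n_het <- c(18:26, 60)   # heterozygous calls per individual; the last one is inflated
geno <- matrix(0L, nrow = length(n_het), ncol = 100,
               dimnames = list(paste0("ind", seq_along(n_het)), NULL))
for (i in seq_along(n_het)) geno[i, seq_len(n_het[i])] <- 1L
geno[1, 100] <- NA      # one missing call, excluded by na.rm = TRUE

# Same calculation as in the post: proportion of calls coded 1 per individual
het <- rowMeans(geno == 1, na.rm = TRUE)

# Assumed rule of thumb, not a dartR default: more than 2 SD above the mean
threshold <- mean(het) + 2 * sd(het)
suspects <- names(het)[het > threshold]

# Ranked view, highest heterozygosity first
sort(het, decreasing = TRUE)
```

Here only the last individual ends up in suspects. Simply sorting the values and looking at the top of the list is often enough to spot a contaminated sample.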

Cheers
Peter