Nested MLGs in a maximum likelihood tree

Ollie White

Aug 6, 2020, 9:18:12 AM

to poppr

Hello,

First off, thank you for sharing poppr; we are really enjoying using it for our research. I just have a few questions regarding mlg.filter().

I have used poppr to identify MLGs in a GBS dataset using mlg.filter() with bitwise.dist() and the farthest-neighbor algorithm.

For the same dataset we have also performed a maximum likelihood phylogenetic analysis using raxml.

We mapped the MLGs onto the maximum likelihood tree and found one instance where an MLG appears to be nested within another.

The screen grab below shows the clade in question; an "X" on a branch denotes that the clade can be collapsed as an MLG.

After looking at the MLGs in more detail it seems that two separate MLGs were identified.

MLG1: 000364 and 000149

MLG2: 000548, 000469, 000357, 000396, 000441, and 000297 (all remaining)

Am I correct in assuming this is simply due to differences in the relative relationships inferred by distance vs maximum likelihood?

I would be slightly more confident in the relative relationships inferred by our maximum likelihood tree. With this in mind, would it be appropriate to use pairwise branch lengths from the maximum likelihood as an input for mlg.filter?

I tried this using pairwise branch lengths calculated using cophenetic.phylo() from the R package ape.

On my first attempt, the plot didn't seem to work (see below).

Comparing the cophenetic.phylo() distance matrix with the one produced by bitwise.dist(), the cophenetic.phylo() distances were much lower:

> mean(bitwise.dist(genlight))

[1] 0.09135868

> mean(as.dist(cophenetic.phylo(tre)))

[1] 0.001694099

If I crudely multiply the cophenetic.phylo() distance matrix values by 10 and repeat, I get a plot more similar to what I would expect, along with reasonable MLGs.

Do the distance thresholds used need to be above a certain threshold to be used?

I hope this makes sense, and I am happy to send example code if that is easier.

Best wishes,

Ollie

Zhian Kamvar

Aug 6, 2020, 8:15:06 PM

to poppr

Hello Ollie,

> Am I correct in assuming this is simply due to differences in the relative relationships inferred by distance vs maximum likelihood?

You are correct that there is a distinct difference between using naive genetic distance and maximum likelihood, which takes into account the state of the basepairs.

> With this in mind, would it be appropriate to use pairwise branch lengths from the maximum likelihood as an input for mlg.filter?

I would imagine so. mlg.filter() does not really care where the distances come from, so it's fair game.
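A minimal sketch of how tree-based distances could be passed in, assuming the tree tips carry the same labels as the samples. The object names `gl` (a genlight or snpclone object) and `tre` (a phylo object) and both helper functions are hypothetical. cophenetic.phylo() returns its matrix in tip-label order, which need not match the sample order, so the sketch reorders the matrix first. This is on the assumption that mlg.filter() reads a supplied matrix by position rather than by name.

```r
# Hypothetical helper: reorder a tip-by-tip distance matrix so that its
# rows and columns follow the order of the sample names in `ids`.
align_tree_dist <- function(d, ids) {
  d <- as.matrix(d)
  missing_ids <- setdiff(ids, rownames(d))
  if (length(missing_ids) > 0) {
    stop("samples not found in tree: ", paste(missing_ids, collapse = ", "))
  }
  d[ids, ids, drop = FALSE]
}

# Hypothetical wrapper: `gl` and `tre` are assumed to exist in your session.
# Requires the ape, adegenet and poppr packages when it is called.
mlg_from_tree <- function(gl, tre, threshold) {
  pd <- align_tree_dist(ape::cophenetic.phylo(tre), adegenet::indNames(gl))
  poppr::mlg.filter(gl, threshold = threshold, distance = pd,
                    algorithm = "farthest_neighbor")
}
```

The threshold would need to be chosen on the scale of the branch lengths rather than reused from the bitwise.dist() analysis.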

> Do the distance thresholds used need to be above a certain threshold to be used?

Short answer: I'm not sure. Longer: I don't *think* so, but you have shown that there is some strangeness regarding minute distance measures. I don't know if it's reasonable to add a multiplier to the distances because they may represent some weird multidimensional space that gets stretched in odd ways when scaled by a single factor (but then again, you would probably want someone with a deeper maths background to chime in).
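One rough way to probe the scaling question is with base R's hclust() using complete linkage as a stand-in for farthest-neighbor clustering. This is an assumption: it is not poppr's implementation, and the toy values and the `cluster_at()` helper are made up for illustration. The idea is to check whether multiplying both the distances and the threshold by the same factor changes the groups.

```r
# Toy positions for five samples (made-up values on a very small scale).
pts <- c(a = 0.0010, b = 0.0012, c = 0.0030, d = 0.0031, e = 0.0090)
d <- dist(pts)

# Hypothetical helper: complete linkage joins clusters by their farthest
# pair of members; cutree() then cuts the tree at the given height.
cluster_at <- function(d, threshold) {
  cutree(hclust(d, method = "complete"), h = threshold)
}

small  <- cluster_at(d, threshold = 0.0005)
scaled <- cluster_at(d * 10, threshold = 0.005)

# Compare the group assignments before and after scaling.
identical(small, scaled)
```

In this toy case the two groupings should match, which suggests that for this kind of clustering a uniform multiplier alone does not change the groups as long as the threshold is scaled with it.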

I wonder if the first plot is simply just a weird plotting edge case that I didn't anticipate.

Sorry I couldn't answer all of your questions, but I hope some of the answers I gave are useful.

Best,

Zhian

o.willi...@gmail.com

Aug 7, 2020, 5:11:51 AM

to poppr

Hi Zhian,

Thank you for your replies; this is really helpful.

Best wishes,

Ollie
