basic question


Zak Jones

Jan 15, 2018, 11:03:28 PM
to DendroPy Users
Hello,

I'm trying to build a tree dynamically from Y-DNA SNP mutation data for a haplogroup project.
We have hundreds to thousands of participants in our study, and each participant is likely to have a good many mutations that are unique compared to the others in the study. As a legacy process, we've been building the tree manually, using a spreadsheet and a couple of shell and Python scripts to keep us moving forward.

The project is now looking for better ways to automate the process, though.

Phylogenetic software seems the way to go here, and in that regard DendroPy seems like the right package to use for Python work. The tutorial material looks like a pretty tough read for a newbie like me, so I figured I could ask here in case someone can point me in the right direction.

Here's my question:

If I have something like this as sample data:

kit,SNP1,SNP2,SNP3,SNP4,SNP5,SNP6,SNP7,SNP8,SNP9,SNP10
kit1,1,1,Null,1,1,1,1,1,1,1
kit2,0,0,0,1,0,0,0,0,1,0
kit3,0,1,1,Null,0,0,1,1,1,0
kit4,1,1,1,1,0,1,1,1,1,0
kit5,Null,Null,1,Null,Null,0,Null,1,Null,Null

This isn't real data. I'm just giving an example for the purpose of this question.

Assume the 1s are passes (the SNP is present) and the 0s are fails (the SNP is absent). A Null means no data, so the true state could be either.
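To make that encoding concrete, here is a minimal sketch of how the sample table could be turned into a NEXUS character matrix, a plain-text format that phylogenetic tools commonly read. It uses only the Python standard library. The function name `csv_to_nexus` and the choice of `?` for missing data are assumptions made for illustration, not anything prescribed by DendroPy. The closing comment about loading the result into DendroPy is likewise an assumption based on DendroPy 4's documented `get(data=..., schema=...)` pattern and is not executed here.

```python
import csv
import io

# The sample table from this post (not real data).
SAMPLE = """\
kit,SNP1,SNP2,SNP3,SNP4,SNP5,SNP6,SNP7,SNP8,SNP9,SNP10
kit1,1,1,Null,1,1,1,1,1,1,1
kit2,0,0,0,1,0,0,0,0,1,0
kit3,0,1,1,Null,0,0,1,1,1,0
kit4,1,1,1,1,0,1,1,1,1,0
kit5,Null,Null,1,Null,Null,0,Null,1,Null,Null
"""


def csv_to_nexus(text, missing_label="Null"):
    """Convert a kit-by-SNP CSV table into a NEXUS DATA block (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text.strip())))
    header, records = rows[0], rows[1:]
    nchar = len(header) - 1  # first column holds the kit name

    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(records)} NCHAR={nchar};",
        '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=?;',
        "  MATRIX",
    ]
    for kit, *calls in records:
        if len(calls) != nchar:
            raise ValueError(f"{kit}: expected {nchar} calls, got {len(calls)}")
        states = []
        for call in calls:
            if call == missing_label:
                states.append("?")  # no data: state could be 0 or 1
            elif call in ("0", "1"):
                states.append(call)  # 1 = pass, 0 = fail
            else:
                raise ValueError(f"{kit}: unexpected call {call!r}")
        lines.append(f"    {kit} {''.join(states)}")
    lines += ["  ;", "END;"]
    return "\n".join(lines)


nexus_text = csv_to_nexus(SAMPLE)
print(nexus_text)

# Assumption, not run here: with DendroPy 4 installed, this text should be
# loadable with something like
#   dendropy.StandardCharacterMatrix.get(data=nexus_text, schema="nexus")
```

Each kit becomes one taxon and each SNP one binary character, so the five kits above yield a 5 x 10 matrix with `?` wherever the table says Null.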

How would I instantiate DendroPy and code this set of taxa and loci?

And what does DendroPy output, and how do I read the generated tree?

Zak