Familytreedna Haplotree

Piedad Coughlin

Aug 4, 2024, 7:41:25 PM

to trichlepode

This haplotree was based exclusively on a type of Y-DNA marker called single nucleotide polymorphisms, or SNPs (pronounced "snips"). The same is true of the modern haplotree. By comparing the SNPs that different populations share and the ones that set them apart, the consortium was able to establish a general timeline of when the populations split and migrated to other regions of the globe.

Each population had SNPs inherited from shared ancestors, and as generations went by, each one began to accumulate its own unique SNPs. These new SNPs formed new branches on the tree of humankind. Many of these SNPs were calculated to have originated in ancestors who lived thousands or tens of thousands of years ago. The first haplotree contained fewer than 100 SNPs grouped into 13 branches, a pretty impressive number at the time.

The rate of advancement in identifying new SNPs and branches on the haplotree was slow, limited by the cost and efficiency of testing methods. Over the next 10 years, the haplotree grew to include 3,600 SNPs and numerous branches. These advances helped refine our understanding of how closely related different populations were at a genetic level. This is valuable information for anthropologists, archaeologists, and anyone studying global migrations.

However, the nagging problem with SNP testing was the enormous span of time between branches. When a branch could be thousands or even tens of thousands of years old, how does it help you understand your own place in the global tree of humanity?

Next-generation sequencing (NGS) works by analyzing an entire region of a chromosome rather than specific, predefined markers. This means it will not always detect every single marker, but it has the advantage of discovering previously unknown ones.

The Big Y test looks at a SNP-rich region of the Y chromosome and detects as many SNPs as possible. It covers this area with up to 70 reads to help ensure the results are accurate, and any SNP with 10 or more positive reads is reported. Requiring multiple reads helps catch and correct potential misreads. When the Big Y was released in 2014, this radical new technology grew our knowledge of the Y chromosome by leaps and bounds. The resulting explosion in the number of known SNPs revealed branches that were hundreds of years old instead of thousands. Only a large database of sample results to compare against made it possible to distinguish these markers.
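The reporting rule described above can be sketched as a simple threshold filter. This is a minimal illustration, assuming each candidate SNP comes with a count of reads supporting the derived (positive) allele. The 10-read cutoff is taken from the text; the `SnpCall` layout, the `report_snps` name, and the example counts are hypothetical, and the real quality-control pipeline surely applies criteria not shown here.

```python
from dataclasses import dataclass

# Threshold stated in the text: a SNP is reported with 10 or more positive reads.
MIN_POSITIVE_READS = 10


@dataclass
class SnpCall:
    name: str            # hypothetical SNP label
    positive_reads: int  # reads carrying the derived (mutated) allele
    total_reads: int     # all reads at this position (not used by this simplified rule)


def report_snps(calls: list[SnpCall], min_positive: int = MIN_POSITIVE_READS) -> list[str]:
    """Return the names of SNPs with enough positive reads to be reported."""
    return [c.name for c in calls if c.positive_reads >= min_positive]


calls = [
    SnpCall("SNP_A", positive_reads=42, total_reads=45),
    SnpCall("SNP_B", positive_reads=9, total_reads=60),
    SnpCall("SNP_C", positive_reads=10, total_reads=12),
]
print(report_snps(calls))  # ['SNP_A', 'SNP_C']
```

A single stray read supporting a variant never clears the cutoff on its own, which is the point of requiring many reads per position.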

At the advent of the Big Y, Y-STR markers were still the gold standard for comparing male relatives. An STR match rarely shared a common ancestor more than 500 years in the past, which made STRs more valuable for family history than SNPs, whose branches were even older.

Unfortunately, it was often unclear where within that 500-year window to search for a common ancestor with a Y-STR match. For this reason, FamilyTreeDNA developed the Big Y-500, a test that combined Y-SNP and Y-STR testing.

Coverage of the original Big Y was expanded and altered to include all 111 STRs found in the Y-111 and up to an additional 450 STRs. These 450 STRs were chosen because they were detected in over 95% of control samples. As with SNPs, the exploratory nature of NGS meant that not all 450 STRs would necessarily be detected and meet our quality control standards in every sample. To provide a large group of STRs for genealogists to work with, FamilyTreeDNA committed to rerunning a sample as many times as needed to produce results for a minimum of 389 of these STRs. Combined with the 111 standard STRs, that gave a total of at least 500, hence the name Big Y-500. Beyond the potential these new STRs brought, the expanded coverage of the Y chromosome meant even more SNPs became available for discovery.
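The arithmetic behind the name can be checked in a few lines. This is a sketch: the constants come from the text, while the helper names `total_strs` and `needs_rerun` are hypothetical, and it assumes all 111 standard STRs are reported for every sample, as the text implies.

```python
# Figures stated in the text for the Big Y-500.
STANDARD_STRS = 111    # all STRs from the Y-111 panel
EXTRA_STR_PANEL = 450  # additional STRs targeted ("up to")
MIN_EXTRA_STRS = 389   # minimum extra STRs guaranteed, rerunning samples as needed


def total_strs(extra_called: int) -> int:
    """Total STRs reported, given how many extra STRs passed quality control."""
    if not 0 <= extra_called <= EXTRA_STR_PANEL:
        raise ValueError(f"extra_called must be between 0 and {EXTRA_STR_PANEL}")
    return STANDARD_STRS + extra_called


def needs_rerun(extra_called: int) -> bool:
    """True if the sample falls short of the guaranteed minimum of extra STRs."""
    return extra_called < MIN_EXTRA_STRS


print(total_strs(MIN_EXTRA_STRS))   # 500, the guaranteed floor
print(total_strs(EXTRA_STR_PANEL))  # 561, the most the panel allows
print(needs_rerun(380))             # True
```

So every Big Y-500 result lands somewhere between 500 and 561 STRs.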

Always at the forefront of innovation, FamilyTreeDNA continues to explore uncharted waters and grow the Y-DNA haplotree. In 2019, we released the Big Y-700 test, which took Big Y back to the drawing board to expand coverage for both Y-STRs and Y-SNPs. Its coverage area is larger still, and it is discovering new SNPs daily.

What does the Big Y-700 test mean for your family history? First, having so many SNPs on the tree means you can potentially find a branch that originated no more than a couple of generations ago. That gives you markers specific to your lineage and your lineage only. If you have a SNP that originated with your great-grandfather, for example, then anyone else who shares that SNP descends from him through the direct paternal line, so a man of your generation who shares it is a second cousin at the furthest.

This elevates genetic testing from just one of many tools to one of the premier tools for genealogists studying paternal lines. The Y-DNA haplotree continues to grow daily, and having such a huge database to draw on has helped revolutionize the age estimates we are able to provide for SNPs. These estimates rely on a process of elimination, among other factors, to produce a time to most recent common ancestor (TMRCA) between any two SNPs.

Just a few months ago, we released FamilyTreeDNA Discover, a free new platform that presents TMRCA estimates in a user-friendly visual format. Discover provides a reliable estimate of the TMRCA between two people based on genetic data (SNPs and STRs) alone, without traditional family trees. Depending on the branch, this estimate can be within 50 years, and half of all testers receive an estimate that falls within the past 500 years. Such a tool significantly reduces the legwork needed to find a common ancestor and make family connections.

Discover is currently in beta, and refinements are being made weekly, but it already shows tremendous potential and has received overwhelmingly positive feedback. This platform will only improve over time as more and more SNPs are added.

The long answer: as I understand it, the basic assumption in tree construction is that back mutations and parallel mutations of Y-chromosome SNPs are rare enough to be ignored. But this is not absolute, and when they do happen they must surely complicate things, like here:

But assuming this, the main thing to know about tree construction is that each branching is binary and corresponds to the descendants of a single man: one branch for those who carry some new mutation and one for those who don't. (The variant need not have arisen in that man's son; it could have appeared several generations down.) Each branch in the *developing* tree can actually correspond to multiple mutations, but that's because the tree is still being constructed. We'll return to this later.

If two SNPs are measured and named, for instance:

This terminology just refers to whether one mutation occurred in someone who had already gotten the other mutation, or whether they occurred independently in totally different men. This corresponds either to one branch being "below" the other and connected to it in the haplotree (if you put Y-chromosome Adam at the top of the tree), or to the two branches being in different parts of the tree, with neither one above the other. The upstream branch is the "higher" one in the tree, and so occurred earlier in time.
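The upstream/downstream distinction maps directly onto a tree stored as child-to-parent links: one SNP is upstream of another exactly when it lies on the path from that SNP up to the root. Here is a minimal sketch, assuming no back or parallel mutations as above. The labels reuse the post's placeholder names plus an invented `A111`, but the tree shape and the function names are hypothetical.

```python
# Hypothetical miniature haplotree, stored as child -> parent.
# "ROOT" stands for Y-chromosome Adam at the top of the tree.
PARENT: dict[str, str] = {
    "FGC54321": "ROOT",
    "Y-98765": "FGC54321",
    "BS56789": "Y-98765",
    "A111": "FGC54321",  # hypothetical SNP on a separate branch
}


def ancestors(snp: str, parent: dict[str, str]) -> list[str]:
    """Return every branch above `snp`, nearest first, ending at the root."""
    chain = []
    while snp in parent:
        snp = parent[snp]
        chain.append(snp)
    return chain


def relation(a: str, b: str, parent: dict[str, str]) -> str:
    """Classify SNP a relative to SNP b (no back or parallel mutations assumed)."""
    if a == b:
        return "same"
    if a in ancestors(b, parent):
        return "upstream"  # a is higher in the tree: it occurred earlier
    if b in ancestors(a, parent):
        return "downstream"  # a arose in someone who already carried b
    return "separate branches"  # a and b arose independently in different men


print(relation("FGC54321", "BS56789", PARENT))  # upstream
print(relation("BS56789", "FGC54321", PARENT))  # downstream
print(relation("A111", "BS56789", PARENT))      # separate branches
```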

Now suppose BY12345 is a brand-new SNP, so it needs to be added to the tree with new branching. And say FGC54321 is already an existing branch in the tree. Here is the key to answering your question:

When a new SNP is to be placed, it is usually because two men are known to have it. If this SNP is BY12345, the tree-maker would first identify where in the existing tree these men would be placed. If both men already had the same terminal SNP in the existing tree, that marker would no longer be terminal, and BY12345 would be added as a "twig" at the very bottom of the tree, below the old terminal marker. BY12345 would then become these men's terminal marker.
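The "twig" case above can be sketched as a tiny operation on a child-to-parent tree. This is an illustration under the post's assumptions, not FTDNA's actual procedure; the tree contents and the names `is_terminal` and `place_as_twig` are hypothetical.

```python
# Hypothetical existing tree (child -> parent); Y-98765 is currently a terminal branch.
tree: dict[str, str] = {"FGC54321": "ROOT", "Y-98765": "FGC54321"}


def is_terminal(snp: str, parent: dict[str, str]) -> bool:
    """A branch is terminal if no other branch hangs below it."""
    return snp not in parent.values()


def place_as_twig(new_snp: str, terminal_a: str, terminal_b: str,
                  parent: dict[str, str]) -> bool:
    """Add new_snp below the two men's shared terminal SNP, if they share one.

    Returns False when the terminals differ; that case is handled separately.
    """
    if terminal_a != terminal_b:
        return False
    parent[new_snp] = terminal_a  # new twig at the very bottom of the tree
    return True


print(is_terminal("Y-98765", tree))                                # True
print(place_as_twig("BY12345", "Y-98765", "Y-98765", tree))        # True
print(is_terminal("Y-98765", tree), is_terminal("BY12345", tree))  # False True
```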

Otherwise, if these men did not have the same terminal marker, then they were in different places in the tree. You would then find the most recent common branch that both belong to, say with marker Y-98765 (really probably a whole collection of markers, including this one), and BY12345 will be downstream of that marker, but not of any lower marker. There will be a branch just below Y-98765 corresponding to some variant, say BS56789. Then, with the assumptions made above, it must be the case that one of the two testers is positive for BS56789 and the other is negative. There are then two possibilities:
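Finding "the most recent common branch that both belong to" is a lowest-common-ancestor search: walk up from one man's terminal SNP until you hit a branch that also lies on the other man's path to the root. A minimal sketch, assuming the tree is stored as child-to-parent links; the sibling branch `C222`, the tree shape, and the function names are hypothetical.

```python
# Hypothetical tree (child -> parent). Y-98765 has two branches below it:
# BS56789 and an invented sibling, C222.
tree: dict[str, str] = {
    "FGC54321": "ROOT",
    "Y-98765": "FGC54321",
    "BS56789": "Y-98765",
    "C222": "Y-98765",
}


def lineage(snp: str, parent: dict[str, str]) -> list[str]:
    """Return `snp` followed by every branch above it, up to the root."""
    chain = [snp]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def most_recent_common_branch(terminal_a: str, terminal_b: str,
                              parent: dict[str, str]) -> str:
    """Lowest branch that lies on both men's lineages."""
    above_b = set(lineage(terminal_b, parent))
    for snp in lineage(terminal_a, parent):  # walk upward from man A
        if snp in above_b:
            return snp
    raise ValueError("no common branch; the tree is missing a root")


# Man A is positive for BS56789; man B is negative for it (terminal C222).
anchor = most_recent_common_branch("BS56789", "C222", tree)
print(anchor)  # Y-98765: BY12345 goes below here, but not below any lower marker
```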

In the latter case, there may still be some other SNPs that all men positive for BY12345 and BS56789 share. Remember that branches in the developing tree are often defined by multiple SNPs? These SNPs cannot yet be disentangled into their own subtree. If one of these SNPs turns out to be upstream of both BY12345 and BS56789, then a new branch will be created for that SNP just below Y-98765, and the BY12345 and BS56789 men will be placed together on a branch below that SNP.

At that point, though, it cannot yet be determined how the tree should look. It could be that BY12345 is basal to BS56789, and this would show up with further testing. Or it could be that BS56789 is basal to BY12345. So these SNPs will be placed together, maybe with a bunch of others, on the same branch of the tree. With further testing, they might later be refined into separate positions.
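How "further testing" can split two SNPs that share a branch can be sketched as follows, again assuming no back or parallel mutations. If any tester is positive for one SNP but negative for the other, the first must be basal to the second; if no tester separates them, they stay together. The per-tester dictionary layout, the `infer_order` name, and the sample calls are hypothetical. A SNP missing from a tester's dictionary is treated as a no-call, since NGS does not always detect every marker.

```python
def infer_order(snp_a: str, snp_b: str, testers: list[dict[str, bool]]) -> str:
    """Try to order two SNPs that currently share one branch of the tree.

    Each tester maps SNP name -> True (positive) or False (negative); a SNP
    missing from the dict is a no-call. Assumes no back or parallel mutations.
    """
    # Someone positive for a but negative for b: a must be the older (basal) SNP.
    a_without_b = any(t.get(snp_a) is True and t.get(snp_b) is False for t in testers)
    b_without_a = any(t.get(snp_b) is True and t.get(snp_a) is False for t in testers)
    if a_without_b and b_without_a:
        # For SNPs already seen together, this pattern breaks the no-recurrence assumption.
        return "conflict"
    if a_without_b:
        return f"{snp_a} is basal to {snp_b}"
    if b_without_a:
        return f"{snp_b} is basal to {snp_a}"
    return "unresolved"  # keep both SNPs on the same branch for now


testers = [
    {"BY12345": True, "BS56789": True},
    {"BY12345": True},  # no-call for BS56789: tells us nothing about the order
]
print(infer_order("BY12345", "BS56789", testers))  # unresolved

testers.append({"BY12345": True, "BS56789": False})  # a new test result
print(infer_order("BY12345", "BS56789", testers))  # BY12345 is basal to BS56789
```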

Oops: I realize that I initially misread your question as being about where a new branch is placed, not when. I'll leave that answer in anyway, since I spent the time to write it. The criterion for *when* a new branch will be added depends on which copy of the tree you mean: multiple copies of the tree are kept, and not all are at the same stage of development.

Most people these days test at FTDNA, so I'll just answer this question for the tree they keep, which they have made publicly accessible to everyone:

-family-tree-made-public-worlds-largest-ydna-haplotree

These days, when someone takes a Big Y-700 test, some previously unknown SNPs are usually discovered in the tester's genome. FTDNA will reserve a name for these variants, but they won't officially name them or place them on the tree.