GENEALOGY-DNA-L Archives



From: "Anatole Klyosov" <>
Subject: [DNA] Correct TMRCA analysis
Date: Fri, 11 Nov 2011 05:31:07 -0500
References: <mailman.2262.1320883958.10215.genealogy-dna@rootsweb.com>


From: "Kenneth Nordtvedt" ("Challenge...")
> I have added another 'Challenge'...
> Someone wanted to see how much or if more STRs might increase the accuracy
> of tree reconstruction from a sample of final haplotypes. So
> Challenge4408 has haplotypes of 150 STRs with total mutation rate of about
> .4 and with individual rates varying from about .01 down to .0001
> If one wanted to compare how tree reconstruction compared with 150 STRs
> versus 100 or just 50, or 10 like we see in some popular books...
> I originally thought people who have software or algorithms for tree
> reconstruction might want a test bed to see how their tools work in the
> context of bushy trees of the type which are often the object of
> reconstruction attempts these days. Nature, unfortunately, does not tell
> us the next morning after we tried to reconstruct part of it, what its Y
> tree structure really is.

My response:

Those "challenges" were solved long ago. It is a waste of time to consider
"bushy trees", since in reality they typically contain branches of different
sizes and different "ages". Therefore, an analysis of a "bushy tree" in its
entirety produces, as a rule, a phantom "common ancestor", and that is why
the margin of error must be huge and the whole exercise useless. The result
is fables about "different mutation rates for different haplogroups", "huge
confidence intervals in TMRCAs", etc.

Instead of walking in circles considering "bushy trees" all these years and
complaining about "huge confidence intervals", one had better take ACTUAL
genealogy data and ACTUAL haplotype datasets, and compare the actual dates
with those obtained from DNA genealogy. This will show what ACTUAL margins
of error look like. "Bushy trees" should first be subdivided into separate
branches, and each branch should be analyzed individually.

In fact, this has been done on dozens of datasets.

Here is an example. The Donald Clan dataset (red, green, and yellow groups)
contains 214 haplotypes (as of this week), 136 of them 67-marker
haplotypes. A haplotype tree shows that the dataset in fact splits into 14
branches (including mini-branches [mini-lineages] of four or five
haplotypes, which nevertheless occupy distinct positions on the tree). By
the way, in many cases those color groups were mixed up in the original
dataset. All 14 branches have been analyzed, and TMRCAs were obtained with
their confidence intervals. The main (principal) branch contains 39
haplotypes, with the base haplotype as follows:

13 25 15 11 11 14 12 12 10 14 11 31 -- 16 8 10 11 11 23 14 20 31 12 15 15
16 - 11 12 19 21 17 16 17 18 34 39 12 11 - 11 8 17 17 8 12 10 8 11 10 12 22
22 15 11 12 12 13 8 14 23 21 12 12 11 13 11 11 12 12

No wonder it is the "modal" haplotype on the original list, since the
branch is the largest one. It is the same haplotype we identified earlier
in our publication (201) with Andrew MacEacharn.

Those 39 haplotypes contain 27, 64, 120, and 159 mutations in the 12-, 25-,
37-, and 67-marker haplotypes, respectively. This gives

27/39/0.020 = 35 --> 36 gen, or 900±195 ybp, or
64/39/0.046 = 36 --> 37 gen, or 925±150 ybp, or
120/39/0.09 = 34 --> 35 gen, or 875±120 ybp, or
159/39/0.12 = 34 --> 35 gen, or 875±110 ybp.

Please notice how well all four lines agree with each other in their TMRCA.
The most reliable is, of course, the 67-marker series, and it gives 2011
minus 875 = 1136 AD. As you might remember, Somerled lived 1100-1164 AD.

What say you?

(The denominators are the mutation rate constants for the respective 12-,
25-, 37- and 67-marker haplotypes. --> indicates a correction for back
mutations according to the published (2009) Table.)
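
The arithmetic in the four lines above can be sketched in a few lines of
Python. This is only an illustration of the calculation as quoted, not the
author's own tool. Two things are assumptions: the 25 years per generation
(inferred from 36 gen = 900 ybp in the figures above), and the corrected
generation counts, which are copied from the post because the 2009
back-mutation table itself is not reproduced here. All function and
variable names are hypothetical.

```python
# Assumption: 25 years per generation, inferred from "36 gen, or 900 ybp".
YEARS_PER_GENERATION = 25

# Main branch: 39 haplotypes. Each row is
# (markers, mutations, mutation rate constant, corrected generations).
# The corrected values are taken from the post, not recomputed, because
# the back-mutation correction table is not available here.
MAIN_BRANCH = [
    (12, 27, 0.020, 36),
    (25, 64, 0.046, 37),
    (37, 120, 0.09, 35),
    (67, 159, 0.12, 35),
]


def uncorrected_generations(mutations, n_haplotypes, rate):
    """Mutations per haplotype divided by the mutation rate constant."""
    return mutations / n_haplotypes / rate


def tmrca_table(rows, n_haplotypes, years_per_gen=YEARS_PER_GENERATION):
    """Return (markers, rounded uncorrected gen, corrected gen, years) rows."""
    table = []
    for markers, mutations, rate, corrected in rows:
        raw = uncorrected_generations(mutations, n_haplotypes, rate)
        table.append((markers, round(raw), corrected, corrected * years_per_gen))
    return table


if __name__ == "__main__":
    for markers, raw, corrected, years in tmrca_table(MAIN_BRANCH, 39):
        print(f"{markers:>2} markers: {raw} --> {corrected} gen, {years} ybp")
```

The margins of error (±195, ±150, and so on) are deliberately left out,
since the post does not state how they were derived.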

Another large 19-haplotype branch, with a base haplotype:

13 25 15 11 11 14 12 12 10 14 11 31 -- 16 8 10 11 11 23 14 20 31 12 15 15
16 - 11 12 19 21 17 16 17 18 34 38 12 11 - 11 8 17 17 8 12 10 8 11 10 12 22
22 16 11 12 12 13 8 14 23 21 12 12 11 13 11 11 12 12

It differs quite distinctly from the Somerled direct DNA-lineage in two
markers. All nineteen haplotypes have only 60 mutations, which gives
60/19/0.12 = 26 --> 27 gen, = 675±110 ybp. Obviously, they are also derived
from Somerled, but through an intermediate common ancestor who had those two
mutations compared to Somerled. It might be John, Lord of the Isles.

Another large, 23-haplotype branch, with a base haplotype:

13 25 15 11 11 14 12 12 10 14 11 31 -- 16 8 10 11 11 23 14 20 31 12 15 15
16 - 11 12 19 21 17 16 17 19 34 39 12 11 - 11 8 17 17 8 12 10 8 11 10 12 22
22 15 11 12 12 13 8 14 23 21 12 12 11 13 11 11 12 12

This is a recent lineage, with 50 mutations in all 23 haplotypes. It gives
50/23/0.12 = 18 gen, or 450±80 years to a common ancestor.
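
The three base haplotypes quoted above can be compared marker by marker to
see where the branches differ from the main one. The sketch below is only a
convenience check on the strings as posted; the function names are
hypothetical, and the dashes are treated purely as visual separators
between marker panels.

```python
MAIN = """
13 25 15 11 11 14 12 12 10 14 11 31 -- 16 8 10 11 11 23 14 20 31 12 15 15
16 - 11 12 19 21 17 16 17 18 34 39 12 11 - 11 8 17 17 8 12 10 8 11 10 12 22
22 15 11 12 12 13 8 14 23 21 12 12 11 13 11 11 12 12
"""

BRANCH_19 = """
13 25 15 11 11 14 12 12 10 14 11 31 -- 16 8 10 11 11 23 14 20 31 12 15 15
16 - 11 12 19 21 17 16 17 18 34 38 12 11 - 11 8 17 17 8 12 10 8 11 10 12 22
22 16 11 12 12 13 8 14 23 21 12 12 11 13 11 11 12 12
"""

BRANCH_23 = """
13 25 15 11 11 14 12 12 10 14 11 31 -- 16 8 10 11 11 23 14 20 31 12 15 15
16 - 11 12 19 21 17 16 17 19 34 39 12 11 - 11 8 17 17 8 12 10 8 11 10 12 22
22 15 11 12 12 13 8 14 23 21 12 12 11 13 11 11 12 12
"""


def parse_haplotype(text):
    """Turn a posted haplotype string into a list of allele values."""
    # Tokens made only of dashes are panel separators and are skipped.
    return [int(tok) for tok in text.split() if tok.strip("-")]


def differing_markers(base, other):
    """Return (position, base value, other value) for each mismatch.

    Positions are 1-based counts within the string as posted, not
    official marker names.
    """
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(base, other))
        if a != b
    ]


if __name__ == "__main__":
    main = parse_haplotype(MAIN)
    for label, text in (("19-haplotype", BRANCH_19), ("23-haplotype", BRANCH_23)):
        diffs = differing_markers(main, parse_haplotype(text))
        print(label, "branch differs from the main branch at:", diffs)
```

On the strings as posted, the 19-haplotype branch differs from the main
base haplotype at two markers and the 23-haplotype branch at one.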

And so on. The green and yellow groups (actual branches) were analyzed as
well.

Anatole Klyosov

