From: "Anatole Klyosov" <>
Subject: Re: [DNA] Variance Assessment of R:U106 DYS425Null Cluster
Date: Fri, 5 Feb 2010 22:44:10 -0500
> From: "Ken Nordtvedt" <>
> Your 1853 count of mutations is very mysterious. How can you look back
> 130 generations and count mutations, only seeing the 284 final products?
> Some mutations back in the tree, indeed over half of them, will lead
> to multiple final haplotype manifestations.
With all due respect, your question clearly shows that you are VERY far
from my approach in handling kinetics of mutations (some would say dynamics
of mutations) in haplotypes. In that case margins of error are a secondary
question. If mutation counting is very mysterious to you, how can we go
A very basic principle of my approach (and, of course, I stand on the shoulders
of giants in that area) is that mutations in a series of haplotypes should
be counted from a "base" haplotype which in most cases is clearly
identified as the most frequent in the series. The number of mutations
is directly (but not linearly in a
general case) connected to a timespan to the common ancestor, which is
represented by the base haplotype. Of course there are a number of more
complicated cases, when the dataset (a series of haplotypes) is mixed in
terms of common ancestors, that is the dataset represents two or more
lineages (branches), or the lineage is so ancient that there is no visible
base haplotype in the series. All those cases are systematically considered
in my recent paper in J. Genet. Geneal. 5(2):186-216 (2009), and I cannot
repeat here all its 30 pages.
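
As a minimal sketch of the counting principle described above (not the full procedure from the paper), the base haplotype can be taken as the most frequent one in the series and the mutations summed from it. The toy 4-marker haplotypes and the function names below are hypothetical, and counting each one-repeat difference as one mutation is an assumption of this sketch.

```python
from collections import Counter

def base_haplotype(haplotypes):
    """Return the most frequent haplotype in the series (taken as the base)."""
    return Counter(haplotypes).most_common(1)[0][0]

def count_mutations(haplotypes, base):
    """Sum allele differences from the base over all haplotypes.

    Assumption: a difference of k repeats on a marker counts as k mutations.
    """
    return sum(abs(a - b) for h in haplotypes for a, b in zip(h, base))

# Hypothetical toy series of 4-marker haplotypes (not real data).
series = [
    (13, 24, 14, 11),
    (13, 24, 14, 11),
    (13, 24, 14, 11),
    (13, 25, 14, 11),
    (13, 24, 15, 11),
    (14, 24, 14, 10),
]

base = base_haplotype(series)
total = count_mutations(series, base)
print(base, total)
```

The sketch does not handle mixed datasets or series with no visible base haplotype, which are the more complicated cases mentioned above.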
I can only add that if mutation counting is mysterious to you, count just
those base haplotypes, identical to each other. In that 284-haplotype dataset
there are 12 of them, if we consider the 12 marker format of the haplotypes
(there are only a couple of them in the 25 marker format, which is not enough
for proper calculations). Now, if you know the second basic principle of my
approach, which says that base (ancestral) haplotypes disappear according
to first-order kinetics, then you can apply the simple formula
[ln(284/12)]/0.022 to obtain the number of generations to the common ancestor,
not corrected (yet) for back mutations (0.022 is the mutation rate constant
for the 12 marker haplotypes calibrated for 25 years per generation;
incidentally, it is practically equal to John Chandler's mutation rate for
the first 12 markers). [ln(284/12)]/0.022 = 144 generations, which, corrected
for back mutations (the correction table is given in the above reference),
gives 168 generations, that is 168x25 = 4200 years to the common ancestor.
Now, let me remind you that the mutation count method (also corrected for
back mutations) gave 4175+/-430 years to a common ancestor. What say you?
Does it become a bit less mysterious to you?
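
The arithmetic of the logarithmic method above can be checked with a short sketch. The function and variable names are illustrative only. The corrected value of 168 generations is copied from the text, since it comes from the correction table in the cited paper and is not recomputed here.

```python
import math

def generations_log_method(n_total, n_base, k):
    """First-order decay of base haplotypes: ln(N_total / N_base) / k."""
    return math.log(n_total / n_base) / k

n_total = 284              # haplotypes in the dataset
n_base = 12                # base haplotypes in the 12 marker format
k = 0.022                  # mutation rate constant per 12 marker haplotype per generation
years_per_generation = 25

uncorrected = generations_log_method(n_total, n_base, k)

# Assumption: taken from the text (correction table in the cited paper), not computed.
corrected = 168
years = corrected * years_per_generation

print(round(uncorrected), years)
```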
Now, you ask:
> Anyway; was your plus/minus 430 years supposed to be 1 sigma or 2 sigma?
This is the question you were supposed to ask in the first place, not
after you have expressed your critical opinion, without even knowing what
conditions were employed for my calculations of the margin of error.
It was two sigma, that is, a 95% confidence interval.
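
For illustration, the two-sigma margin can be turned into an explicit interval and compared with the logarithmic-method estimate of 4200 years. This is a sketch only. Halving the margin to get one sigma assumes an approximately normal error, which is an assumption of this sketch, and the names are illustrative.

```python
def two_sigma_interval(estimate, two_sigma):
    """Return the (low, high) bounds of estimate +/- two_sigma."""
    return estimate - two_sigma, estimate + two_sigma

mutation_count_years = 4175   # mutation count method, corrected for back mutations
two_sigma = 430               # stated as two sigma (95% confidence)
log_method_years = 4200       # base haplotype count (logarithmic method)

low, high = two_sigma_interval(mutation_count_years, two_sigma)
one_sigma = two_sigma / 2     # assumption: approximately normal error
inside = low <= log_method_years <= high

print(low, high, one_sigma, inside)
```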
From the above you can already see that both methods, the mutation count
method and the base haplotype count (logarithmic method), gave practically the
same results, around 4200 years to a common ancestor for R-U106. This is
not an accidental fit. This is what is normally observed for many series of
haplotypes from many subclades and haplogroups. When calculations are done
accurately, of course, and datasets are verified for a "single" or multiple
common ancestors.
The principle of calculation of margins of error is described in detail in
the same publication as given above.
>> In any event, 1 sigma for G estimate of clade goes like sqrt (G / M log
>> N) with M being total mutation rate and N being number of final haplotypes,
>> but this is even further modified by details of how the tree unfolded
>> from founder to final haplotypes.
>> Anyway; was your plus/minus 430 years supposed to be 1 sigma or 2 sigma?
>From various studies I have done the statistical 2 sigma confidence interval
for coalescence ages in the ball park of 4000 years is plus or minus 1000