
From: "Anatole Klyosov" <>
Subject: Re: [DNA] Y Tree SNPs can not be counted
Date: Sun, 14 Feb 2010 11:20:13 -0500
References: <>

> From: "Lancaster-Boon" <>
> Dear Anatole
> I think things are becoming clear and indeed I think you have been
> fighting
> too hard for what is essentially a strange use of words.
> are calling "genetic distance" (the APPARENT number of mutation
> steps between two haplotypes)
> your "mutation count" (which implies a real counting of mutations).

Dear Andrew,

As I see it, you keep trying hard to shift the essence of the discussion
elsewhere. You continue to do acrobatics with semantics, completely missing
the point of the discussion. The discussion was focused on two items:
(1) does counting mutations in contemporary haplotypes give us an adequate
tool to calculate TMRCA?, and (2) are the margins of error which I use
"too optimistic"?

As you see, there is no "strange use of words" here. These two are very
pointed questions. They represent the core of DNA genealogy.

These two questions are very clear to me, as are the answers. What I was
trying to do here was to explain them to people who do not have adequate
knowledge of the basic principles of the dynamics of mutations in haplotypes
and - more broadly - of the basics of chemical kinetics, as well as of
computing margins of error based on the number of haplotypes and mutations
in them in a

Of course, there are plenty of people who do not have adequate knowledge in
these two areas. It is O.K., and there is nothing wrong with it. What is
wrong is that people here do not admit it, and instead they pretend that
they have knowledge which (supposedly) allows them to argue here and to
accuse a professional on his own territory. This is sheer arrogance. This
is the main reason why professionals avoid such gatherings.

Back to mutation counting. I have explained more than once that by "mutation
counting" I mean counting of one-step mutations in contemporary haplotypes
compared to a base haplotype in the dataset. This is mutation counting, not
the senseless term "genetic distance". An average (counted) number of
mutations, say, 0.156 per marker, shows how many mutations per marker
happened between the most recent common ancestor and the averaged alleles of
his direct descendants who live today. This number is an apparent one, and
should be corrected for the probability of back mutations. This probability
is very low and negligible when the common ancestor lived within 650 years
before present.

Now, an important addition. I use terms here my way. My terms are clearly
defined. If you want to follow my explanations and learn something, accept
my terms. Or quietly translate my terms into your terms, which I do not
accept for various reasons. If you do not want to follow my terms, you are
free not to follow me and not to learn what I can teach you. Understood?

By "you" I do not mean you personally. Everyone else has his or her right
not to learn things.

Back to mutation counting. That average number of mutations in a haplotype
set, say, 0.156 per marker, corresponds in this case to 2125 years to a
common ancestor, but only in one case - if that common ancestor is INDEED a
common ancestor of every haplotype in the dataset. In this case everyone in
the dataset is a DIRECT descendant of the common ancestor. This is what I
call an "uncomplicated" dataset. This is what I call a "first-order"
dataset. Mutation rate constants describing this dataset (and others similar
in kind) are called first-order mutation rate constants. They are equal - to
the best of my knowledge - to 0.00183 mutations per marker per generation
for the first 12- and 25-marker panels (in the FTDNA format), 0.00243 for
the 37-marker haplotypes, and 0.00216 for the 67-marker haplotypes.
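
The arithmetic behind the linear (mutation counting) estimate can be sketched
as below. This is only an illustration: it assumes that the 0.156 figure is
paired with the 0.00183 constant (this message does not say which panel the
example refers to), it uses 25 years per generation, and it applies no
correction for back mutations. The function name is hypothetical.

```python
YEARS_PER_GENERATION = 25  # generation length assumed throughout this sketch


def linear_tmrca(avg_mutations_per_marker, rate_per_marker):
    """Return (generations, years) to the common ancestor.

    avg_mutations_per_marker: counted mutations divided by
        (number of haplotypes * number of markers).
    rate_per_marker: mutation rate constant per marker per generation.
    No back-mutation correction is applied here.
    """
    generations = avg_mutations_per_marker / rate_per_marker
    return generations, generations * YEARS_PER_GENERATION


generations, years = linear_tmrca(0.156, 0.00183)
print(f"{generations:.1f} generations, about {years:.0f} years")
```

Under these assumptions the result is roughly 85 generations, which is close
to the 2125 years quoted above.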

(Just an illustration - a few minutes ago I calculated TMRCAs for a
reader of this forum who sent me his haplotype series offline, using 25-,
37- and 67-marker haplotypes, and TMRCAs came out as:

54 generations (25 years each) uncorrected for back mutations, or 57
generations (corrected), that is 1425 years to a common ancestor.

56 generations uncorrected --> 60 generations corrected = 1500 years to a
common ancestor

55 generations uncorr. --> 58 generations corrected = 1450 years to a common
ancestor

These data, as well as dozens if not hundreds of haplotype series I have
worked on, show that these mutation rate constants are consistent with each
other.)
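
The step from uncorrected to corrected generations in the three results
above can be checked with a short sketch. The closed-form correction used
here is an assumption and is not quoted from this message, which only refers
to a published table: the observed average number of mutations per marker,
lam_obs, is converted to a corrected value lam = lam_obs / 2 * (1 +
exp(lam_obs)). The function name is hypothetical.

```python
import math

YEARS_PER_GENERATION = 25  # "25 years each", as stated in the illustration


def correct_for_back_mutations(generations_uncorrected, rate_per_marker):
    """Apply the ASSUMED correction lam = lam_obs / 2 * (1 + exp(lam_obs)).

    lam_obs is the observed average number of mutations per marker,
    recovered here as generations * rate. Returns corrected generations.
    """
    lam_obs = generations_uncorrected * rate_per_marker
    lam = lam_obs / 2 * (1 + math.exp(lam_obs))
    return lam / rate_per_marker


# (panel, uncorrected generations, rate constant per marker per generation)
cases = [
    ("25-marker", 54, 0.00183),
    ("37-marker", 56, 0.00243),
    ("67-marker", 55, 0.00216),
]
for panel, gens, rate in cases:
    corrected = round(correct_for_back_mutations(gens, rate))
    years = corrected * YEARS_PER_GENERATION
    print(f"{panel}: {gens} -> {corrected} generations = {years} years")
```

With this assumed formula the rounded results are 57, 60 and 58 generations,
that is 1425, 1500 and 1450 years, in line with the figures listed above.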

Back to mutation counting. In the case of first-order datasets we do not
care about the "family tree structure", "nodes" and all this superficial
stuff. There is no need to cry that "things are complicated" and "we should
know all the family history". Or something like this:

>"So again, even after having estimated the time depth of the tree in
generations, there is no way to evaluate the statistical confidence interval
of that age estimate without knowing the tree structure through its whole
history. The number of mutations occurring in each of the haplotype lines
back to the founder are not independent of each other; they share more or
less of their lines with each other depending on whether the haplotypes are
close or distant relatives, respectively. And the degree to which different
haplotypes share part of their ancestral lines with each other depends on
the tree structure through all the tree's history".

As well as this:

">For instance; if a tree has 1200 father/son transitions in it, and a
has mutation rate of 1/200, the expected or average number of times that STR
will mutate in the tree is 6, but it could be more or less with the
statistical distribution known; and similarly for every other STR. And
where in the tree those six or so mutations happen for the STR is also
random and unknown to us".

Forget it and disregard it for first-order datasets. Some people have a
burning desire to present things as more complicated than they are. Of
course some things are more complicated and some less, and before we dismiss
everything as "unknown to us", we should be able to figure out what we
have in our hands.

Therefore, the most important question is whether or not our dataset is a
first-order dataset. There are two principal ways to determine this. One is
to apply a dual linear-logarithmic method of TMRCA calculation. The second
is to compose a haplotype tree for the dataset.

If the dual method gives you the same TMRCA for both the linear (mutation
counting) and logarithmic (base haplotype counting) methods, you are all
set. It is a first-order dataset. Many subclades in the R1b1b2 group are
first-order haplotype sets.
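
A minimal sketch of the dual check follows. Several things in it are
assumptions rather than statements from this message: the logarithmic
estimate is taken as ln(N / n_base) / k_haplotype, where N is the number of
haplotypes, n_base is the number still identical to the base haplotype, and
k_haplotype is the per-marker rate times the number of markers; the 10%
tolerance for "the same TMRCA" is arbitrary; no back-mutation correction is
applied; and the counts in the usage lines are synthetic, not real data.

```python
import math


def linear_generations(total_mutations, n_haplotypes, n_markers, rate_per_marker):
    """Linear method: average counted mutations per marker divided by the rate."""
    avg_per_marker = total_mutations / (n_haplotypes * n_markers)
    return avg_per_marker / rate_per_marker


def logarithmic_generations(n_haplotypes, n_base, n_markers, rate_per_marker):
    """Logarithmic method (assumed form): ln(N / n_base) / per-haplotype rate."""
    rate_per_haplotype = rate_per_marker * n_markers
    return math.log(n_haplotypes / n_base) / rate_per_haplotype


def looks_first_order(linear, logarithmic, rel_tol=0.10):
    """True if the two estimates agree within rel_tol (hypothetical threshold)."""
    return abs(linear - logarithmic) <= rel_tol * max(linear, logarithmic)


# Synthetic counts: 100 haplotypes of 25 markers, 10 identical to the base.
log_est = logarithmic_generations(100, 10, 25, 0.00183)
agree = looks_first_order(linear_generations(230, 100, 25, 0.00183), log_est)
differ = looks_first_order(linear_generations(460, 100, 25, 0.00183), log_est)
print(agree, differ)
```

When the two estimates disagree, as in the second synthetic case, the
dataset would not be treated "as is" under this check.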

However, in many cases datasets do not follow the first-order pattern. For
those, I have developed a number of approaches to analyze such situations.
They are summarized as follows (including uncomplicated situations as well):

(1) to compose haplotype trees as a very important part of my methodology
(the description, explanations and references are given in my paper in
JoGG);
(2) to apply the logarithmic method to calculate TMRCA without counting
mutations;
(3) to use the "dual criterion" to verify whether a dataset can be used for
calculation "as is" (that is, it follows the first order and is
"uncomplicated"), or whether it cannot be used "as is" and should be
separated into branches;
(4) to employ a quantitative way to correct data for back mutations, for
which I published the respective table;
(5) to employ a quantitative method to correct data for asymmetry of
mutations (published in the same paper);
(6) to use my calibrated set of mutation rate constants (see above); a long
list of those values for haplotypes of different formats has been published
(the same paper in JoGG).

Regarding margins of error, their calculation is published in the same
paper. I understand that people do not read papers; however, that is their
problem, not mine. The margins of error which I calculate are neither
"optimistic" nor "pessimistic", they are normal.


Anatole Klyosov
