Archiver > GENEALOGY-DNA > 2010-02 > 1266292370

From: "Anatole Klyosov" <>
Subject: Re: [DNA] TMRCA assessments
Date: Mon, 15 Feb 2010 22:52:52 -0500
References: <>

>From: "Ken Nordtvedt" <>
>Measuring and counting attributes about haplotypes in hand is not counting
>the original mutations.

It seems that you are addressing quite a different issue.

When a person uses celestial mechanics to describe the movements of
planets and other celestial objects, he does not need to know when and how
those objects formed, or other issues irrelevant in that context. The
person knows that his equations and calculations work and give him an

The same is true of DNA genealogy. As soon as you know that your dataset
can be described by clearly defined equations, you are all set. That is why
I pay so much attention to examining and verifying that the system can be
described by a first-order equation. As soon as it can, and this is
verified by quantitative criteria, you are all set. You do not need to know
superficial details, which we cannot know anyway.
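
A minimal sketch of one such quantitative check, offered as an illustration
and not necessarily the criteria meant above. It assumes mutations accumulate
on each haplotype as a Poisson process starting from a known base (ancestral)
haplotype, and it ignores back mutations. Under that assumption the fraction
of haplotypes still identical to the base is close to exp(-m), where m is the
mean number of mutation steps per haplotype, so ln(N/n_base) and m should
roughly agree. The function name first_order_check and the toy data are
hypothetical.

```python
import math


def first_order_check(haplotypes, base):
    """Return two estimates of the mean mutations per haplotype.

    haplotypes: list of tuples of STR repeat counts
    base:       tuple of repeat counts for the assumed ancestral haplotype
    """
    n = len(haplotypes)
    n_base = sum(1 for h in haplotypes if h == base)
    if n_base == 0:
        raise ValueError("no base haplotypes left; the log estimate is undefined")

    # Estimate 1: from the surviving fraction of unmutated (base) haplotypes.
    log_estimate = math.log(n / n_base)

    # Estimate 2: from directly counted step differences from the base.
    total_steps = sum(abs(a - b) for h in haplotypes for a, b in zip(h, base))
    linear_estimate = total_steps / n

    return log_estimate, linear_estimate


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    base = (14, 12, 13)
    sample = [(14, 12, 13), (14, 12, 13), (15, 12, 13), (14, 11, 13)]
    log_est, lin_est = first_order_check(sample, base)
    print(f"log estimate: {log_est:.3f}, linear estimate: {lin_est:.3f}")
```

If the two numbers agree within sampling error, the dataset is at least
consistent with a single first-order process. A large gap hints at mixed
lineages or other structure in the data.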

You are describing the general situation, in which one has a dataset
with no idea of what it represents, whether it reflects genetic drift,
how many lineages it contains, and so on. In that case it is indeed
"shooting in the dark", and the margin of error will be huge.

When you DO know that the dataset behaves as a first-order system, and
it passes the necessary verifications, you treat it as a plain system, and
the margin of error is relatively small. Particularly when you have hundreds
of haplotypes and thousands (or tens of thousands) of alleles and mutations in

As simple as that.

Anatole Klyosov

----- Original Message -----
From: "Lancaster-Boon" <>

> Of course Ken's most basic point ....... is clear, and that
> is that you can not count real mutations, only "genetic distance". [[[
> The real number of mutations in a tree is just an appetizer to the whole
> meal. It was raised only because it was claimed that the mutations were
> being counted which at best is a very odd use of words. The real issue
> was statistical confidence intervals of age estimates which for intraclade
> cases can not be determined without assuming a full tree structure as well
> as having the mutational rules for the STRs.

[[[ One can count any attributes you want to define about FINAL haplotypes
in hand. Hopefully the attributes you invent are useful and can be related
to other interesting things. Variances between haplotypes turn out to be
useful; and in more restricted circumstances GDs between haplotypes are
useful. Measuring and counting attributes about haplotypes in hand is not
counting the original mutations. For every node in the tree DOWNSTREAM of
any mutation along a line of descent, the number of lines of descent
carrying the effect of the original mutation goes up by one. But even
knowing the number of underlying mutations would not tell you what the final
variance or average GD in your population would be. You also would need to
know WHERE each mutation happened. This is because early mutations in the
tree are carried in the end by a bigger fraction of the haplotypes than late
mutations in the tree are. That's why we have to live with expected values
for which averages are being taken over all possible placements of the
mutations in the tree which could occur given the mutational rules of the
STRs. And then we have to realize that the outcome is one example of a
distribution of possible outcomes. The shape (width) of that distribution
about the expected value for any parameter of interest is the statistical
confidence interval.

[[[ Fortunately for us, EXPECTED values for variance or average GD of final
haplotypes turn out to be independent of most all of the ugly details of the
tree structure. Actually, that's why these attributes are around and are
commonly used. To obtain the useful expected values, an average of the
parameter of interest is made over ALL possible number and placements of the
STR mutations into the tree, with each placement occurring with a calculable
probability given mutational rules.

[[[ But to obtain the distribution of variance or average GD about the
expected values (and hence produce the statistical confidence intervals),
the ugly details of tree structure DO NOT drop out of the examination of all
ways the mutations could have been thrown into the tree to produce the
variance or average GD. So unknown tree structure means unknown statistical
confidence intervals --- although sometimes one can produce "worst case" and
"best case" limits to their values.

[[[ Getting clear in mind this distinction between the expected values
(average value over distribution) of our key attributes of a haplotype
collection, and the shape of the distributions, especially its width,
answers just about all the issues at hand. Ken ]]]
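
The distinction drawn above between the expected value and the width of the
distribution can be illustrated with a small simulation. This is only a
sketch under stated assumptions: a single STR marker, a symmetric one-step
mutation model, a known ancestral allele (so "variance" here means the mean
squared deviation from the founder value), and an exaggerated mutation rate
chosen to keep the run short. The two tree shapes and all function names are
invented for illustration.

```python
import random
import statistics


def drift(generations, mu, rng):
    """Net allele change along one line of descent (symmetric one-step model)."""
    x = 0
    for _ in range(generations):
        if rng.random() < mu:
            x += rng.choice((-1, 1))
    return x


def star_tree(n, total_gens, mu, rng):
    """All n lines of descent separate at the founder and evolve independently."""
    return [drift(total_gens, mu, rng) for _ in range(n)]


def split_tree(n, total_gens, shared_gens, mu, rng):
    """Two trunks run for shared_gens generations, then each fans out into n // 2 lines."""
    haplos = []
    for _ in range(2):
        trunk = drift(shared_gens, mu, rng)
        for _ in range(n // 2):
            haplos.append(trunk + drift(total_gens - shared_gens, mu, rng))
    return haplos


def mean_sq_dev(haplos):
    """Mean squared deviation from the founder allele (coded as 0)."""
    return sum(x * x for x in haplos) / len(haplos)


def summarize(make_tree, replicates, seed):
    """Mean and standard deviation of the statistic over many simulated histories."""
    rng = random.Random(seed)
    values = [mean_sq_dev(make_tree(rng)) for _ in range(replicates)]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    MU, GENS, N = 0.02, 50, 20  # illustrative values, not real STR rates
    star = summarize(lambda r: star_tree(N, GENS, MU, r), 2000, 1)
    split = summarize(lambda r: split_tree(N, GENS, 40, MU, r), 2000, 1)
    print("star tree : mean %.3f, spread %.3f" % star)
    print("split tree: mean %.3f, spread %.3f" % split)
```

In this model both trees have the same expected value, MU * GENS, but the
tree whose lines share most of their history gives a much wider spread
around it, because an early mutation on a trunk is carried by half of the
final haplotypes.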
