From: "Ken Nordtvedt"
Subject: Re: [DNA] Ken's point: problems other than back and parallel mutations
Date: Mon, 15 Feb 2010 11:15:03 -0700
References: <B16E9CC810A54776AF63EEBEF4C0E8AC@PC>

----- Original Message -----
From: "Lancaster-Boon"

> Of course Ken's most basic point ....... is clear, and that
> is that you can not count real mutations, only "genetic distance". [[[
> The real number of mutations in a tree is just an appetizer to the whole
> meal. It was raised only because it was claimed that the mutations were
> being counted which at best is a very odd use of words. The real issue
> was statistical confidence intervals of age estimates which for intraclade
> cases can not be determined without assuming a full tree structure as well
> as having the mutational rules for the STRs.

[[[ You can count any attributes you want to define about the FINAL haplotypes
in hand. Hopefully the attributes you invent are useful and can be related
to other interesting things. Variance between haplotypes turns out to be
useful; and in more restricted circumstances GDs between haplotypes are
useful. Measuring and counting attributes about haplotypes in hand is not
counting the original mutations. For every node in the tree DOWNSTREAM of
any mutation along a line of descent, the number of lines of descent
carrying the effect of the original mutation goes up by one. But even
knowing the number of underlying mutations would not tell you what the final
variance or average GD in your population would be. You also would need to
know WHERE each mutation happened. This is because early mutations in the
tree are carried in the end by a bigger fraction of the haplotypes than late
mutations in the tree are. That's why we have to live with expected values,
which are averages taken over all possible placements of the mutations in
the tree that could occur given the mutational rules of the STRs. And then
we have to realize that the actual outcome is just one sample from a
distribution of possible outcomes. The shape (width) of that distribution
about the expected value for any parameter of interest sets the statistical
confidence interval.
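
A minimal sketch of the placement effect, under assumptions that are not in
the post: a perfect binary tree of depth 3 (8 final haplotypes), a single +1
step mutation, and "variance" taken as the mean squared deviation from the
founder value. The function names are hypothetical.

```python
def leaf_values(depth, mutation_level):
    """Final repeat offsets at the leaves of a perfect binary tree.

    Exactly one +1 mutation occurs on a single branch entering
    `mutation_level` (1 = just below the founder, `depth` = terminal branch).
    Every leaf downstream of that branch carries the mutation.
    """
    n_leaves = 2 ** depth
    carriers = 2 ** (depth - mutation_level)
    return [1] * carriers + [0] * (n_leaves - carriers)


def mean_squared_deviation(values, founder=0):
    """Mean squared deviation of the leaf values from the founder value."""
    return sum((v - founder) ** 2 for v in values) / len(values)


# Same number of underlying mutations (one), different placement.
early = mean_squared_deviation(leaf_values(depth=3, mutation_level=1))
late = mean_squared_deviation(leaf_values(depth=3, mutation_level=3))
print(early, late)  # 0.5 0.125
```

One mutation in both cases, yet the early one is carried by 4 of 8 haplotypes
and the late one by 1 of 8, so the count of mutations alone does not fix the
final statistic.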

[[[ Fortunately for us, EXPECTED values for variance or average GD of final
haplotypes turn out to be independent of almost all of the ugly details of the
tree structure. Actually, that's why these attributes are around and are
commonly used. To obtain the useful expected values, an average of the
parameter of interest is made over ALL possible numbers and placements of the
STR mutations into the tree, with each placement occurring with a calculable
probability given mutational rules.
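
A minimal sketch of why the expected value can ignore branching, assuming a
symmetric single-step model that is not taken from the post: in each
generation a marker steps +1 or -1 with total probability mu, and "variance"
is again the mean squared deviation from the founder value. The rate and the
generation count are illustrative only.

```python
def deviation_distribution(mu, generations):
    """Exact distribution of (final value - founder value) along ONE line
    of descent, built generation by generation."""
    dist = {0: 1.0}
    for _ in range(generations):
        new = {}
        for value, p in dist.items():
            for step, q in ((0, 1.0 - mu), (1, mu / 2), (-1, mu / 2)):
                new[value + step] = new.get(value + step, 0.0) + p * q
        dist = new
    return dist


def expected_squared_deviation(mu, generations):
    """Average of deviation**2 over all numbers and placements of mutations
    along the line, each weighted by its probability."""
    dist = deviation_distribution(mu, generations)
    return sum(p * v * v for v, p in dist.items())


mu, generations = 0.002, 100  # illustrative values, not real STR rates
print(expected_squared_deviation(mu, generations))  # very close to mu * generations
```

The calculation only follows a single line from founder to final haplotype.
Each final haplotype has such a line of the same length wherever the tree
branches, so the average of this quantity over all haplotypes has the same
expected value for any tree shape.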

[[[ But to obtain the distribution of variance or average GD about the
expected values (and hence produce the statistical confidence intervals),
the ugly details of tree structure DO NOT drop out of the examination of all
ways the mutations could have been thrown into the tree to produce the
variance or average GD. So unknown tree structure means unknown statistical
confidence intervals --- although sometimes one can produce "worst case" and
"best case" limits to their values.
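
A rough simulation sketch of this point, under the same kind of assumptions
as any toy model (none of them from the post): a symmetric single-step
mutation model, 16 final haplotypes, 50 generations, an illustrative rate mu,
and two hypothetical trees. In the "star" tree every line splits off at the
founder; in the "late split" tree all lines share one trunk for 45 of the 50
generations.

```python
import random
from statistics import mean, pstdev


def drift(value, generations, mu, rng):
    """Carry one marker value down a line of descent; each generation it
    steps +1 or -1 with total probability mu."""
    for _ in range(generations):
        if rng.random() < mu:
            value += rng.choice((-1, 1))
    return value


def simulate_statistic(shared, total, n_leaves, mu, rng):
    """One outcome: mean squared deviation from the founder value (0)."""
    trunk = drift(0, shared, mu, rng)  # history common to all leaves
    leaves = [drift(trunk, total - shared, mu, rng) for _ in range(n_leaves)]
    return sum(v * v for v in leaves) / n_leaves


def summarize(shared, trials=1000, total=50, n_leaves=16, mu=0.02, seed=1):
    """Mean and spread of the statistic over many simulated histories."""
    rng = random.Random(seed)
    outcomes = [simulate_statistic(shared, total, n_leaves, mu, rng)
                for _ in range(trials)]
    return mean(outcomes), pstdev(outcomes)


star_mean, star_sd = summarize(shared=0)    # star tree
late_mean, late_sd = summarize(shared=45)   # long shared trunk
print("star tree : mean %.2f  spread %.2f" % (star_mean, star_sd))
print("late split: mean %.2f  spread %.2f" % (late_mean, late_sd))
```

Both trees should give nearly the same mean (about mu * total), while the
late-split tree should show a much wider spread, because a mutation on the
shared trunk moves all 16 haplotypes at once.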

[[[ Getting this distinction clear in mind, between the expected values
(average value over the distribution) of our key attributes of a haplotype
collection and the shape of the distributions, especially their width,
answers just about all the issues at hand. Ken ]]]
