**GENEALOGY-DNA-L Archives**

From: "Ken Nordtvedt" <>

Subject: Re: [DNA] Ken's point: problems other than back and parallel mutations

Date: Mon, 15 Feb 2010 11:15:03 -0700

References: <B16E9CC810A54776AF63EEBEF4C0E8AC@PC>

----- Original Message -----

From: "Lancaster-Boon" <>

> Of course Ken's most basic point ....... is clear, and that

> is that you can not count real mutations, only "genetic distance". [[[

> The real number of mutations in a tree is just an appetizer to the whole

> meal. It was raised only because it was claimed that the mutations were

> being counted which at best is a very odd use of words. The real issue

> was statistical confidence intervals of age estimates which for intraclade

> cases can not be determined without assuming a full tree structure as well

> as having the mutational rules for the STRs.

[[[ One can count any attributes you want to define about FINAL haplotypes

in hand. Hopefully the attributes you invent are useful and can be related

to other interesting things. Variance between haplotypes turns out to be

useful; and in more restricted circumstances GDs between haplotypes are

useful. Measuring and counting attributes about haplotypes in hand is not

counting the original mutations. For every node in the tree DOWNSTREAM of

any mutation along a line of descent, the number of lines of descent

carrying the effect of the original mutation goes up by one. But even

knowing the number of underlying mutations would not tell you what the final

variance or average GD in your population would be. You also would need to

know WHERE each mutation happened. This is because early mutations in the

tree are carried in the end by a bigger fraction of the haplotypes than late

mutations in the tree are. That's why we have to live with expected values

for which averages are being taken over all possible placements of the

mutations in the tree which could occur given the mutational rules of the

STRs. And then we have to realize that the outcome is one example of a

distribution of possible outcomes. The shape (width) of that distribution

about the expected value for any parameter of interest is the statistical

confidence interval.
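A minimal sketch of the placement effect described above, assuming a hypothetical balanced binary tree and a single +1 step mutation. The names `tip_values` and `mean_sq_dev` are illustrative only and do not come from the post. The same single mutation gives a very different spread among the final haplotypes depending on where in the tree it lands.

```python
def tip_values(depth, mutation_level):
    """Repeat offsets (relative to the founder) at the tips of a balanced
    binary tree with 2**depth tips, when ONE +1 mutation occurs on a branch
    entering level `mutation_level` (1 = just below the founder,
    depth = a terminal branch). Illustrative assumption, not real data."""
    n_tips = 2 ** depth
    # Every tip downstream of the mutated branch carries the mutation.
    carriers = 2 ** (depth - mutation_level)
    return [1] * carriers + [0] * (n_tips - carriers)


def mean_sq_dev(values):
    """Mean squared deviation from the founder value (offset 0)."""
    return sum(v * v for v in values) / len(values)


early = tip_values(depth=4, mutation_level=1)  # carried by 8 of 16 tips
late = tip_values(depth=4, mutation_level=4)   # carried by 1 of 16 tips

print("early mutation:", mean_sq_dev(early))
print("late mutation: ", mean_sq_dev(late))
```

In both cases exactly one mutation occurred, so a count of mutations alone cannot predict the value measured on the final haplotypes.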

[[[ Fortunately for us, EXPECTED values for variance or average GD of final

haplotypes turn out to be independent of almost all of the ugly details of the

tree structure. Actually, that's why these attributes are around and are

commonly used. To obtain the useful expected values, an average of the

parameter of interest is made over ALL possible numbers and placements of the

STR mutations into the tree, with each placement occurring with a calculable

probability given mutational rules.
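One way to see why the tree drops out of the expected value is the following sketch. It rests on assumptions that are not stated in the post: a symmetric single-step mutation model with rate $\mu$ per generation, every final haplotype exactly $t$ generations from the founder, and variance $V$ taken about the founder's repeat value $x_0$. The symbols $n$ (number of haplotypes), $x_i$ (repeat value of haplotype $i$), $N_i$ (number of mutations on its line of descent) and $s_{ik}$ (the individual steps) are introduced here for illustration.

$$
x_i - x_0 = \sum_{k=1}^{N_i} s_{ik}, \qquad s_{ik} = \pm 1 \text{ with equal probability}
$$

$$
E\left[(x_i - x_0)^2\right] = E[N_i] = \mu t
$$

$$
E[V] = E\left[\frac{1}{n}\sum_{i=1}^{n} (x_i - x_0)^2\right] = \frac{1}{n}\sum_{i=1}^{n} \mu t = \mu t
$$

The second line holds because the steps are independent with mean zero, so the cross terms vanish. The third line uses only linearity of expectation, so it does not matter how much ancestry any two haplotypes share. The spread of $V$, by contrast, involves terms such as $E\left[(x_i - x_0)^2 (x_j - x_0)^2\right]$, which do depend on how long lines $i$ and $j$ ran together in the tree.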

[[[ But to obtain the distribution of variance or average GD about the

expected values (and hence produce the statistical confidence intervals),

the ugly details of tree structure DO NOT drop out of the examination of all

ways the mutations could have been thrown into the tree to produce the

variance or average GD. So unknown tree structure means unknown statistical

confidence intervals --- although sometimes one can produce "worst case" and

"best case" limits to their values.
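The contrast between expected value and width can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration: a symmetric single-step model, a mutation rate of 0.02 per generation, 20 haplotypes, 100 generations, variance taken about the founder value, and two extreme tree shapes (a star, where all lines split at the founder, and a tree where all lines share a trunk for 90 generations before splitting). The function names are hypothetical.

```python
import random
import statistics


def walk(generations, mu, rng):
    """Net repeat change along one line of descent: in each generation a
    +1 or -1 step occurs with probability mu (assumed single-step model)."""
    net = 0
    for _ in range(generations):
        if rng.random() < mu:
            net += rng.choice((-1, 1))
    return net


def variance_about_founder(n_tips, total_gen, shared_gen, mu, rng):
    """One simulated outcome: all tips share a trunk for `shared_gen`
    generations, then evolve independently for the remaining generations."""
    trunk = walk(shared_gen, mu, rng)
    tips = [trunk + walk(total_gen - shared_gen, mu, rng) for _ in range(n_tips)]
    return sum(v * v for v in tips) / n_tips


def summarize(shared_gen, runs=2000, seed=1):
    """Mean and standard deviation of the variance over many simulated trees."""
    rng = random.Random(seed)
    outcomes = [variance_about_founder(20, 100, shared_gen, 0.02, rng)
                for _ in range(runs)]
    return statistics.mean(outcomes), statistics.stdev(outcomes)


star_mean, star_sd = summarize(shared_gen=0)    # star tree
late_mean, late_sd = summarize(shared_gen=90)   # long shared trunk

print(f"star tree:  mean={star_mean:.2f}  sd={star_sd:.2f}")
print(f"late split: mean={late_mean:.2f}  sd={late_sd:.2f}")
```

Under these assumptions both trees should give a mean close to 0.02 × 100 = 2, while the tree with the long shared trunk shows a much wider scatter of outcomes. These two shapes play the role of rough "best case" and "worst case" bounds on the width.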

[[[ Getting clear in mind this distinction between the expected values

(average value over distribution) of our key attributes of a haplotype

collection, and the shape of the distributions, especially their widths,

answers just about all the issues at hand. Ken ]]]

**This thread:**

- [DNA] Ken's point: problems other than back and parallel mutations by "Lancaster-Boon" <>
- **Re: [DNA] Ken's point: problems other than back and parallel mutations by "Ken Nordtvedt" <>**
- Re: [DNA] Ken's point: problems other than back and parallel mutations by "Alister John Marsh" <>