GENEALOGY-DNA-L Archives



From: "Ken Nordtvedt" <>
Subject: [DNA] Interclade TMRCA influenced by Number of STRs
Date: Sun, 14 Feb 2010 16:55:28 -0700


It is known that (interclade) age estimates, including the TMRCA for any pair of haplotypes, are made using the total mutation rate of all the haplotype STRs:

<G> = [ Sum i of Var(i) ] / [ 2 Sum i of m(i) ] = [ Sum i of Var(i) ] / (2M)
where i is summed over STRs, M = Sum i of m(i), and <G> is the expected value estimator of age in generations.

But it turns out that many small m(i) adding up to a total M is better than fewer large m(i) adding up to the same total M, as far as the statistical confidence interval of the age estimate is concerned. For simplicity, let's suppose our total M is composed of N STRs of identical rate, each with m = M/N.

Then the 1 sigma value for G estimation is given by:

dG(1sigma) = Squareroot { G (1 + 4MG/N) / (2M) }

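As can be seen, for fixed M, the larger N is, the smaller the 1 sigma dG. This is due to the non-linear term in the variance of each individual STR's Var(i):
Variance of Var(i) = 2 m(i) G { 1 + 4 m(i) G }
Summing this over N STRs with m(i) = M/N (treating the STRs as independent) gives 2MG (1 + 4MG/N) for the variance of Sum i of Var(i); dividing by (2M)^2 and taking the square root gives the 1 sigma dG.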

Without that non-linearity, the 1 sigma would be just Squareroot { G / (2M) }, in which case how M was composed would not matter.
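
To get a feel for the size of the effect, here is a minimal Python sketch that only evaluates the equal-rate formula for dG. The function name sigma_g and the sample values G = 100 generations and M = 0.1 per generation are illustrative assumptions, not values taken from any data set.

```python
import math


def sigma_g(G, M, N):
    """1 sigma of the age estimate for N equal-rate STRs, each with m = M / N.

    Evaluates dG = sqrt( G * (1 + 4*M*G/N) / (2*M) ).
    """
    return math.sqrt(G * (1.0 + 4.0 * M * G / N) / (2.0 * M))


if __name__ == "__main__":
    G = 100   # assumed age in generations (illustrative)
    M = 0.1   # assumed total mutation rate per generation (illustrative)
    for N in (2, 10, 40, 100, 1000):
        print(f"N = {N:5d}   dG = {sigma_g(G, M, N):7.2f}")
    # Value the formula approaches as N grows without bound
    print(f"limit      dG = {math.sqrt(G / (2.0 * M)):7.2f}")
```

With these assumed values the 1 sigma comes out near 102 generations for N = 2, 50 for N = 10, and 26.5 for N = 100, against a floor of about 22.4 generations as N grows without bound.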


The result has a generalization valid for haplotype STRs of mixed mutation rates.
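
The generalization is not written out here. One plausible form, assuming the STRs mutate independently so that the per-STR variances 2 m(i) G { 1 + 4 m(i) G } simply add, is dG = Squareroot { Sum i of 2 m(i) G (1 + 4 m(i) G) } / (2M). A sketch in Python follows; sigma_g_mixed and the sample rates are hypothetical.

```python
import math


def sigma_g_mixed(G, rates):
    """1 sigma of the age estimate for STRs with individual rates m(i).

    Assumes independent STRs, each contributing a variance of
    2*m*G*(1 + 4*m*G) to Sum i of Var(i); the estimator divides that sum by 2M.
    """
    M = sum(rates)
    var_of_sum = sum(2.0 * m * G * (1.0 + 4.0 * m * G) for m in rates)
    return math.sqrt(var_of_sum) / (2.0 * M)


if __name__ == "__main__":
    G = 100  # assumed age in generations (illustrative)
    # Two haplotype definitions with the same N = 62 and nearly the same M = 0.1
    mixed = [0.02] * 2 + [0.001] * 60   # two fast STRs plus sixty slow ones
    equal = [0.1 / 62] * 62             # same total rate spread evenly
    print("mixed rates:", round(sigma_g_mixed(G, mixed), 2))
    print("equal rates:", round(sigma_g_mixed(G, equal), 2))
```

For equal rates this reduces to the N identical STR formula. For the same N and M, concentrating part of the rate in a few fast STRs should widen the 1 sigma, since Sum i of m(i)^2 is smallest when all m(i) are equal.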

It should be interesting to see how the actual shape of the Sum i of Var(i) distribution changes for fixed total M, as we go from few fast mutators to many slow mutators.
The simulation program has to be fired up to do this.
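
What follows is only a minimal sketch of such a simulation, not the program referred to above. It assumes symmetric single-step mutations, a Poisson number of up and down steps per STR over the 2G generations separating a pair of haplotypes, independent STRs, and Var(i) taken as the squared repeat difference between the two haplotypes. All names and sample values are hypothetical.

```python
import random
import statistics


def poisson(lam, rng):
    """Draw a Poisson(lam) count by counting unit-rate exponential arrivals before time lam."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_age_estimates(G, M, N, trials, seed=1):
    """Return `trials` simulated values of Sum i of Var(i) / (2M) for N equal-rate STRs."""
    rng = random.Random(seed)
    m = M / N
    estimates = []
    for _ in range(trials):
        total = 0
        for _ in range(N):
            # Up steps and down steps accumulated along both lineages (2G generations),
            # each Poisson with mean m*G under the symmetric single-step assumption.
            d = poisson(m * G, rng) - poisson(m * G, rng)
            total += d * d  # Var(i) for this STR: squared repeat difference
        estimates.append(total / (2.0 * M))
    return estimates


if __name__ == "__main__":
    G, M = 100, 0.1  # assumed age and total rate (illustrative)
    for N in (2, 10, 100):
        est = simulate_age_estimates(G, M, N, trials=2000)
        print(N,
              round(statistics.mean(est), 1),
              round(statistics.median(est), 1),
              round(statistics.stdev(est), 1))
```

Under these assumptions the mean should stay near G for every N, while the standard deviation should shrink toward the formula values (roughly 102, 50 and 26.5 generations for N = 2, 10 and 100). Comparing mean and median gives a first hint of how skewed the distribution is for few fast mutators.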
Ken


