
From: "Ken Nordtvedt"
Subject: [DNA] Interclade TMRCA influenced by Number of STRs
Date: Sun, 14 Feb 2010 16:55:28 -0700

It is known that the total mutation rate of all the haplotype STRs is what is used to make (interclade) age estimates, which include the TMRCA for any pair of haplotypes.

<G> = Sum i of Var(i) / (2 Sum i of m(i)) = Sum i of Var(i) / (2M)
i is summed over the STRs, and <G> is the expected-value estimator of the age in generations.
M = Sum i of m(i)
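
Here is a minimal sketch of the estimator above for a single pair of haplotypes. It assumes that Var(i) for a pair is just the squared difference in repeat counts at STR i (an assumption, not spelled out in the post). The function name, the haplotypes, and the mutation rates are made up for illustration.

```python
def estimate_generations(hap_a, hap_b, rates):
    """Return <G> = Sum_i Var(i) / (2 M) for one pair of haplotypes.

    Assumption: Var(i) is the squared repeat-count difference at STR i.
    rates[i] is m(i), the per-generation mutation rate of STR i.
    """
    if not (len(hap_a) == len(hap_b) == len(rates)):
        raise ValueError("haplotypes and rates must have the same length")
    total_var = sum((a - b) ** 2 for a, b in zip(hap_a, hap_b))
    total_rate = sum(rates)  # M = Sum_i m(i)
    return total_var / (2 * total_rate)


# Hypothetical 4-STR haplotypes and mutation rates, for illustration only.
hap_a = [13, 24, 14, 11]
hap_b = [13, 25, 14, 12]
rates = [0.002, 0.003, 0.002, 0.003]
print(estimate_generations(hap_a, hap_b, rates))
```

With these made-up numbers the two haplotypes differ by one repeat at two of the four STRs, M = 0.010, and the estimate comes out at about 100 generations.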

But it turns out that many small m(i) adding up to a total M are better than fewer large m(i) adding up to the same M, as far as the statistical confidence interval of the age estimate is concerned. For simplicity, let's suppose our total M is composed of N identical-rate STRs, each with m = M/N.

Then the 1 sigma value for G estimation is given by:

dG(1sigma) = Squareroot { G (1 + 4MG/N) / (2M) }

As can be seen, for fixed M, the larger N is, the smaller the 1 sigma dG. This is due to the non-linear dependence on m(i) G of the variance of each individual STR's Var(i):
Variance of Var(i) = 2 m(i) G { 1 + 4 m(i) G }

Without that non-linearity, the 1 sigma would be just Squareroot { G / (2M) }, in which case how M was composed would not matter.
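
To put numbers on this, the two 1 sigma expressions can be evaluated side by side. This is only a sketch: the function names and the values G = 100 generations and M = 0.1 are made up for illustration.

```python
import math


def sigma_g(G, M, N):
    """1 sigma of the age estimate for N identical-rate STRs, m = M/N."""
    return math.sqrt(G * (1 + 4 * M * G / N) / (2 * M))


def sigma_g_linear(G, M):
    """1 sigma if the non-linear term were absent: sqrt(G / (2M))."""
    return math.sqrt(G / (2 * M))


# Hypothetical values: true age G and total mutation rate M held fixed.
G, M = 100, 0.1
for N in (1, 10, 50, 1000):
    print(N, round(sigma_g(G, M, N), 1))
print("linear limit", round(sigma_g_linear(G, M), 1))
```

With these made-up values, going from N = 10 to N = 50 STRs at the same total M lowers the 1 sigma from about 50 to about 30 generations, and the limit for very large N is about 22.4.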

The result has a generalization valid for haplotype STRs of mixed mutation rates.
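
One way to see where the 1 sigma formula comes from, and what the mixed-rate generalization presumably looks like, is to add up the per-STR variances. This assumes the STRs mutate independently of each other, which is an assumption made here and not something stated above.

```latex
\begin{align*}
\mathrm{Variance}\Big(\sum_i \mathrm{Var}(i)\Big)
  &= \sum_i 2\, m(i)\, G\, \{1 + 4\, m(i)\, G\}
   = 2MG + 8G^2 \sum_i m(i)^2 \\
dG(1\sigma)^2
  &= \frac{1}{(2M)^2}\, \mathrm{Variance}\Big(\sum_i \mathrm{Var}(i)\Big)
   = \frac{G}{2M} + \frac{2G^2 \sum_i m(i)^2}{M^2}
\end{align*}
```

For N identical-rate STRs, Sum i of m(i)^2 = M^2/N, and the second line reduces to G (1 + 4MG/N) / (2M), the expression under the square root given earlier.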

It should be interesting to see how the actual shape of the distribution of Sum i of Var(i) changes for fixed total M as we go from a few fast mutators to many slow mutators.
The simulation program will have to be fired up to do this.
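
A rough sketch of what such a simulation could look like is below. It is not the simulation program mentioned above. It assumes a simple model: each of the two lineages picks up a Poisson number of mutations over G generations at each STR, and every mutation is a single step up or down with equal probability. All names and parameter values are made up for illustration.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw a Poisson(lam) count by Knuth's method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_g_estimates(G, M, N, trials, seed=0):
    """Simulate the age estimate Sum_i Var(i) / (2M) for N equal-rate STRs.

    Assumed model: the repeat-count difference between the two haplotypes
    at one STR is (up steps) - (down steps), each Poisson with mean m*G,
    where m = M/N. Var(i) is taken as the squared difference.
    """
    rng = random.Random(seed)
    lam = (M / N) * G
    estimates = []
    for _ in range(trials):
        total_var = 0
        for _ in range(N):
            diff = poisson(lam, rng) - poisson(lam, rng)
            total_var += diff * diff
        estimates.append(total_var / (2 * M))
    return estimates


# Hypothetical settings: same total M, split over few fast or many slow STRs.
G, M = 100, 0.1
for N in (1, 10, 50):
    est = simulate_g_estimates(G, M, N, trials=4000)
    predicted = math.sqrt(G * (1 + 4 * M * G / N) / (2 * M))
    print(N, round(statistics.mean(est), 1), round(statistics.median(est), 1),
          round(statistics.stdev(est), 1), round(predicted, 1))
```

Each output row lists N, the simulated mean, median, and standard deviation of the age estimate, and the 1 sigma predicted by the formula. The gap between mean and median gives a first hint of how skewed the distribution is for few fast mutators; a histogram of the returned estimates would show the full shape.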

