GENEALOGY-DNA-L Archives
From: "Ken Nordtvedt" <>
Subject: [DNA] Age Estimate Confidence Intervals
Date: Wed, 8 Jul 2009 10:25:27 -0600
If you estimate the age G (in generations) back to the node between two haplotypes or two populations, assume a simple mutation model, and optimally weight the STRs in your estimate, the fractional statistical precision (squared) of your estimate will be:
Var(G) / G^2 = 2 / Sum i [4m(i)G / {1+4m(i)G}]
This can be evaluated in the young clade limit (all m(i)G << 1)
and in the very old clade limit (all m(i)G >> 1, which we don't actually reach for modern man's history since Adam).
Young: Var(G) / G^2 = 1 / (2MG)
Old: Var(G) / G^2 = 2 / N
with M being the sum of the marker mutation rates m(i),
and N being the number of STRs.
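The two limits can be checked numerically against the full sum. The following is a minimal Python sketch; the function names and the uniform marker rates are illustrative assumptions, not values from any real STR panel.

```python
def fractional_variance(rates, G):
    """Full expression: Var(G)/G^2 = 2 / Sum_i [4 m(i) G / (1 + 4 m(i) G)]."""
    total = sum(4.0 * m * G / (1.0 + 4.0 * m * G) for m in rates)
    return 2.0 / total


def young_limit(rates, G):
    """Young clade limit: 1 / (2 M G), with M the sum of the marker rates."""
    return 1.0 / (2.0 * sum(rates) * G)


def old_limit(rates):
    """Very old clade limit: 2 / N, with N the number of STRs."""
    return 2.0 / len(rates)


# Hypothetical panel: 20 STRs, each mutating at 0.002 per generation.
rates = [0.002] * 20

for G in (1, 100, 100000):
    print(G, fractional_variance(rates, G), young_limit(rates, G), old_limit(rates))
```

For very small G the full sum approaches the young limit, and for very large G it approaches the old limit. In between, the full sum is larger than either limiting form.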
Note several things.
1) The fractional confidence interval is atrocious as G goes to zero (the genealogical time frame).
2) The fractional confidence interval gets huge if you chop M down by throwing away the fast STRs.
3) The fractional confidence interval for old G still wants as many STRs (N) as possible.
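To illustrate point 2, here is a small Python sketch comparing a full panel with the same panel after the fast STRs are thrown away. The mutation rates, the 0.005 cutoff for "fast", and the variable names are all hypothetical, chosen only to show the direction of the effect.

```python
def fractional_variance(rates, G):
    """Var(G)/G^2 = 2 / Sum_i [4 m(i) G / (1 + 4 m(i) G)]."""
    return 2.0 / sum(4.0 * m * G / (1.0 + 4.0 * m * G) for m in rates)


# Hypothetical panel: 10 slow STRs and 5 fast STRs (rates per generation).
full_panel = [0.001] * 10 + [0.008] * 5
# Same panel with the fast STRs discarded (the cutoff is an assumption).
slow_only = [m for m in full_panel if m < 0.005]

G = 100  # node age in generations

sd_full = fractional_variance(full_panel, G) ** 0.5
sd_slow = fractional_variance(slow_only, G) ** 0.5

print("fractional sigma, full panel:", sd_full)
print("fractional sigma, slow STRs only:", sd_slow)
```

With these made-up rates the fractional sigma grows when the fast markers are dropped, because the sum in the denominator loses its largest terms.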
Some rough examples:
Example: G = 250 (7500 years), M = 1/50
95 percent confidence interval = plus/minus 5000 years.
This example was suggested by the use of the YHRD markers.
Example: G = 150 (4500 years), M = 1/10
95 percent confidence interval = plus/minus 1200 years
Example: G = 2000 (60,000 years), N = 24
95 percent confidence interval = plus/minus 40,000 years
I used limiting forms of the equation to simplify the work. You can apply the actual sum for real m(i) and G if you wish.
The actual sum will do somewhat worse than my simplifications, so the above numbers are optimistic.
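The rough examples can be reproduced from the limiting forms with a few lines of Python. This sketch assumes 30 years per generation (implied by G = 250 corresponding to 7500 years) and takes the 95 percent half-width as 1.96 sigma, a Gaussian approximation that the original does not state; the helper name is hypothetical.

```python
import math

YEARS_PER_GENERATION = 30  # implied by G = 250 <-> 7500 years
Z95 = 1.96                 # assumption: Gaussian 95 percent half-width


def half_width_years(frac_var, G):
    """Convert Var(G)/G^2 into a plus/minus 95 percent interval in years."""
    return Z95 * math.sqrt(frac_var) * G * YEARS_PER_GENERATION


# Young clade limit, Var(G)/G^2 = 1 / (2 M G)
ex1 = half_width_years(1.0 / (2.0 * (1.0 / 50.0) * 250), 250)
ex2 = half_width_years(1.0 / (2.0 * (1.0 / 10.0) * 150), 150)
# Old clade limit, Var(G)/G^2 = 2 / N
ex3 = half_width_years(2.0 / 24, 2000)

for label, value in (("G=250, M=1/50", ex1),
                     ("G=150, M=1/10", ex2),
                     ("G=2000, N=24", ex3)):
    print(label, "-> plus/minus", round(value), "years")
```

Under these assumptions the half-widths come out near 4,600, 1,600 and 34,000 years, the same order as the rough figures quoted above.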
The above is based on the node age estimator:
G = Sum i [ Var(i) w(i) ] / { 2 Sum i [ m(i) w(i) ] }
with w(i) = 1 / [1+4m(i)G]
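Because the weights w(i) themselves depend on G, the estimator is implicit in G. The original does not say how it is solved; one simple option is fixed-point iteration, sketched below in Python. The function name, starting guess, tolerance, and marker rates are assumptions, and the test inputs are noise-free variances Var(i) = 2 m(i) G built so that the known age is recovered.

```python
def estimate_node_age(variances, rates, G0=100.0, tol=1e-9, max_iter=1000):
    """Iterate G = Sum_i[Var(i) w(i)] / (2 Sum_i[m(i) w(i)]), w(i) = 1/(1 + 4 m(i) G)."""
    G = G0
    for _ in range(max_iter):
        weights = [1.0 / (1.0 + 4.0 * m * G) for m in rates]
        numerator = sum(v * w for v, w in zip(variances, weights))
        denominator = 2.0 * sum(m * w for m, w in zip(rates, weights))
        G_new = numerator / denominator
        if abs(G_new - G) <= tol * max(1.0, abs(G_new)):
            return G_new
        G = G_new
    return G  # not converged within max_iter; returns the last iterate


# Hypothetical per-generation mutation rates for four STRs.
rates = [0.001, 0.002, 0.004, 0.008]
true_G = 150.0
# Noise-free inputs (assumption): Var(i) = 2 m(i) G, so the estimator is exact.
variances = [2.0 * m * true_G for m in rates]

estimated_G = estimate_node_age(variances, rates)
print(estimated_G)
```

With real, noisy per-marker variances the iteration would start from a rough guess for G and reweight until the estimate stops changing; convergence is not guaranteed by this sketch.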
I realize some are throwing away STRs for various reasons; I just wanted to remind everyone that there is a cost to doing so: larger statistical confidence intervals for the estimates.