**GENEALOGY-DNA-L Archives**

From: "Ken Nordtvedt" <>

Subject: Re: [DNA] Intraclade Age Sigma "Unknowable"

Date: Sun, 7 Feb 2010 09:39:53 -0700

References: <009601caa80d$da9ca2a0$5e82af48@Ken1> <4B6EEA3B.3090005@san.rr.com>

If you do the interclade age estimates for all the N(N-1)/2 pairs of

haplotypes of your collection, you will get a distribution of age

estimates. That distribution is the result of two factors: 1) the

statistical distribution of interclade age estimates about the true values;

and 2) the actual distribution of true TMRCAs between the pairs of

haplotypes of your sample collection. For instance, if you happen to

include Mr Jones and his great uncle in your sample collection, their actual

TMRCA will be just a couple of generations, so their estimated TMRCA will

follow the distribution appropriate to a G of 2, regardless of the whole

collection's TMRCA.
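
The point about pairs can be illustrated with a small sketch. It is not taken from the thread: it assumes a simple variance-based pairwise estimate (sum of squared repeat differences divided by 2M, i.e. two branch lines back to the common ancestor) rather than Walsh's method, and the haplotypes and mutation rates are made-up illustration values. The first two haplotypes play the role of Mr Jones and his great uncle.

```python
from itertools import combinations
from statistics import mean, pstdev

def pairwise_estimates(haplotypes, rates):
    """Return one age estimate per pair, N(N-1)/2 values in total.

    Assumption (not from the thread): pair estimate = sum of squared
    repeat differences / (2 * M), with M the sum of STR mutation rates.
    """
    m_total = sum(rates)
    return [
        sum((a - b) ** 2 for a, b in zip(h1, h2)) / (2 * m_total)
        for h1, h2 in combinations(haplotypes, 2)
    ]

# Hypothetical 3-marker haplotypes; the first two are close relatives.
haps = [[14, 12, 23], [14, 12, 23], [13, 12, 24], [15, 13, 23]]
rates = [0.002, 0.002, 0.004]  # hypothetical per-generation rates

estimates = pairwise_estimates(haps, rates)
print(len(estimates), min(estimates), max(estimates))
print(mean(estimates), pstdev(estimates))
```

The spread reported by `pstdev` here mixes both factors: the close-relative pair pulls the distribution down because its true TMRCA is small, not because of statistical scatter alone.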

----- Original Message -----

From: "Al Aburto" <>

To: <>

Sent: Sunday, February 07, 2010 9:28 AM

Subject: Re: [DNA] Intraclade Age Sigma "Unknowable"

> We sure do need a sigma though. We have the data ... attached to it

> there is indeed a sigma, but you are saying it is unknowable from

> current haplotypes.

>

> I have tried doing TMRCA estimates in pairs of (unique) haplotypes

> (using Walsh's infinite allele method) on a group of "n" haplotypes and

> then getting the mean and sigma from that for the group (cluster) of "n"

> haplotypes. Is this meaningful?

> Al

>

> On 2/7/2010 7:54 AM, Ken Nordtvedt wrote:

>> One way to estimate TMRCA of a clade is to find the sum of STR variances

>> of a sample of clade haplotypes of today, with variances measured to an

>> assumed founding haplotype. Then divide by sum of STR mutation rates:

>>

>> Gest = Sum i,m [r(i,m) - rf(m)]^2 / (N M) == Var / M

>>

>> r(i,m) is the repeat value of the mth STR of the ith haplotype. N is

>> number of haplotypes, M is sum of STR mutation rates. For young clades

>> the variances become essentially GD counts.

>> rf(m) is founder haplotype's repeat value for the mth STR.
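
As a minimal sketch of the estimate just defined, the following computes Gest = Var / M from a list of haplotypes, an assumed founder haplotype, and per-marker mutation rates. The function name, the three-marker haplotypes, and the rates are all hypothetical values chosen only to make the arithmetic easy to follow.

```python
def estimate_generations(haplotypes, founder, rates):
    """Variance-based age estimate: Gest = Var / M.

    Var is the squared repeat difference from the founder, summed over
    STRs and averaged over the N haplotypes; M is the sum of STR rates.
    """
    n = len(haplotypes)
    var = sum(
        (r - rf) ** 2
        for hap in haplotypes
        for r, rf in zip(hap, founder)
    ) / n
    m_total = sum(rates)
    return var / m_total

# Hypothetical data: 4 haplotypes, 3 STRs, two single-step mutations.
haps = [[14, 12, 23], [13, 12, 23], [14, 12, 24], [14, 12, 23]]
founder = [14, 12, 23]
rates = [0.002, 0.002, 0.004]

gest = estimate_generations(haps, founder, rates)
print(gest)  # Var = 2/4 = 0.5 and M = 0.008, so roughly 62.5 generations
```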

>>

>> But due to the stochastic (random) nature of STR mutations, the right hand

>> side of the above equation (sum of STR variances) will be a distribution

>> which sometimes falls above its average value and sometimes below. We

>> want to know the width of that distribution, so we can get a sense for

>> the statistical uncertainty of the age estimate which is based on what

>> happens "on average". The more STRs we use in our haplotypes the better

>> we can assume to be near average, but there is always this statistical

>> uncertainty. How big is it for the intraclade age estimate? Basically

>> we cannot tell without knowing the early demographics of the y tree

>> which starts with the haplotype sample population's MRCA and ends with

>> the N sample haplotypes G generations later.

>>

>> The analytic formula for the statistical confidence interval for

>> reasonably young clades is given by:

>>

>> Variance of Var = M { Sum c f(c)^2 }

>> And the 1 sigma confidence interval for Gest is then SquareRoot {Variance

>> of Var} / M

>>

>> Variance of Var is simply unknowable without knowing the tree

>> demographics --- the f(c), particularly their values in the tree's

>> earliest generations. The fractions f(c) are relatively large early in

>> the tree and get smaller and smaller as we approach the end of the tree,

>> being 1/N on each of the N branch segments which terminate with our N

>> sample haplotypes.

>>

>>

>> The label c stands for each male in the y tree which ends with our sample

>> population of N haplotypes. f(c) is the fraction of those N haplotypes

>> for which male "c" is an ancestor. The sum over c can be done as a sum

>> over branch lines in each generation of the tree and then a sum over

>> generations from 1 to G. The number of branch lines is 2 in the first

>> generation after the MRCA, and it increases by one every time a tree node

>> on one of the branch lines comes along, and that branch line number ends

>> up being N in the last generation before the present. We can simply call

>> that number of branch lines in each generation P(g), the tree population

>> in generation g.
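
To make the sum over c concrete, here is a small sketch of Variance of Var = M { Sum c f(c)^2 } for a toy tree. The tree encoding is an assumption made for illustration: each generation is a list giving, for every male on a branch line in that generation, how many of the N sample haplotypes descend from him, so f(c) is that count divided by N. The tree shape and the value of M are made up.

```python
import math

def sigma_gest(descendant_counts, m_total):
    """1 sigma of Gest for a fully specified tree.

    descendant_counts[k] lists, for each male in tree generation k + 1,
    the number of sample haplotypes he is an ancestor of.
    """
    n = sum(descendant_counts[0])
    sum_f_squared = 0.0
    for generation in descendant_counts:
        # The fractions f(c) must add up to one in every generation.
        assert sum(generation) == n
        sum_f_squared += sum((k / n) ** 2 for k in generation)
    variance_of_var = m_total * sum_f_squared
    return math.sqrt(variance_of_var) / m_total

# Hypothetical tree: N = 4 haplotypes, G = 3 generations,
# 2 branch lines in the first generation and 4 in the last.
tree = [[3, 1], [2, 1, 1], [1, 1, 1, 1]]
sigma = sigma_gest(tree, m_total=0.1)
print(sigma)
```

In this toy case the sum of f(c)^2 is 0.625 + 0.375 + 0.25 = 1.25, and the first generation, where one line carries three of the four haplotypes, contributes half of it.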

>>

>> Consider the sum of f(c)^2 in any particular generation of the tree.

>> While the sum of f(c) for any particular generation must be one, the sum

>> of squares can be as big as one if one particular branch line hogs almost

>> all the ancestry, but it can be no smaller than 1 / P(g). So we can

>> produce an expression for the minimum size that Variance of Var can be

>> under the most democratic tree ancestry scenario in which every ancestor

>> in the tree in every generation shares equally in the fraction of

>> ultimate descendants.

>>

>> Variance of Var >= M { Sum g from 1 to G of 1 / P(g) }

>>

>> Even this lower limit for the Variance of Var depends on the

>> "unknowable" --- the tree population each generation, and especially the

>> earliest tree generations when P(g) is small. In particular, the

>> Variance of Var is basically determined by how fast the early tree

>> population grows from its initial size of 2.
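
A rough numerical sketch of this dependence on early growth follows. The growth model is purely hypothetical: the tree population stays at 2 for some number of generations and then adds one branch line per generation until it reaches N. G, N, and M are likewise made-up values; only the lower-limit formula comes from the message.

```python
import math

def delayed_growth(g_total, n, delay):
    """Hypothetical P(g): 2 lines for `delay` generations, then +1 per
    generation, capped at n."""
    return [min(n, 2 + max(0, g - delay)) for g in range(g_total)]

def min_sigma_gest(populations, m_total):
    """Lower limit on the 1 sigma of Gest: sqrt(M * Sum_g 1/P(g)) / M."""
    min_variance_of_var = m_total * sum(1.0 / p for p in populations)
    return math.sqrt(min_variance_of_var) / m_total

M_TOTAL = 0.1  # hypothetical sum of STR mutation rates
early = delayed_growth(100, 50, delay=0)   # tree branches out at once
late = delayed_growth(100, 50, delay=40)   # tree stays at 2 lines for a while

sigma_early = min_sigma_gest(early, M_TOTAL)
sigma_late = min_sigma_gest(late, M_TOTAL)
print(sigma_early, sigma_late)
```

Both trees have the same G and end with the same N, yet the one that lingers at two branch lines has a lower limit more than twice as large under these assumed numbers.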

>>

>> Note that Variance of Var does not keep getting smaller toward zero as N

>> gets larger. If we used for our sample population the entire population

>> today for some clade, perhaps numbering millions, we still have a

>> Variance of Var lower limit dominated by the reciprocal of the tree

>> population in the early generations. P(g) begins at 2, regardless. In

>> fact, a good sample population of haplotypes which covers the early

>> generations of the tree as well as the full population does is all that

>> is required to do about as well statistically as one practically needs

>> to.
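
The claim that a larger N does not push the lower limit toward zero can be checked with a short sketch. It assumes, only for illustration, a tree whose population doubles each generation (2, 4, 8, ...) until it reaches N; G and M are made-up values.

```python
import math

def doubling_tree(g_total, n):
    """Hypothetical P(g) that doubles each generation, capped at n."""
    return [min(n, 2 ** (g + 1)) for g in range(g_total)]

def min_sigma_gest(populations, m_total):
    """Lower limit on the 1 sigma of Gest: sqrt(M * Sum_g 1/P(g)) / M."""
    return math.sqrt(m_total * sum(1.0 / p for p in populations)) / m_total

M_TOTAL = 0.1  # hypothetical sum of STR mutation rates
sigma_small = min_sigma_gest(doubling_tree(100, 64), M_TOTAL)
sigma_large = min_sigma_gest(doubling_tree(100, 10**6), M_TOTAL)
print(sigma_small, sigma_large)
```

Going from 64 to a million haplotypes lowers the limit, but it levels off near SquareRoot(1 / M) because 1/2 + 1/4 + 1/8 + ... from the earliest generations already sums to almost one. Under these assumptions only a larger M moves that floor.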

>>

>> Note that the only thing you can do to drive down the statistical

>> confidence interval more and more is to increase M, the sum of STR

>> mutation rates of your haplotypes.

>>

>> Ken

>>

>> PS: Interclade statistical confidence intervals on the other hand can be

>> conservatively quoted by an upper bound which depends on no demographic

>> knowledge. One can forget about the dispersion of the tree within each

>> clade and consider the simple "V tree" consisting of two branch lines

>> from the interclade node to the present. The statistical confidence

>> interval for the TMRCA of that simplified tree is then straightforward

>> to evaluate and is used in Generations4 for the 1 sigma values of the

>> interclade age estimates.

>>

>>

>

>

