From: "Ken Nordtvedt" <>
Subject: Re: [DNA] Intraclade Age Sigma "Unknowable"
Date: Sun, 7 Feb 2010 09:34:34 -0700
References: <009601caa80d$da9ca2a0$5e82af48@Ken1> <>

Doing TMRCA for a pair of haplotypes is doing an INTERCLADE age estimate.

Look at your tree. There is no demographic uncertainty; there is only the
tree depth uncertainty which you will estimate.

You don't need intraclade confidence intervals if you quit doing intraclade
age estimates. Reformulate what you want to learn in terms of a collection
of interclade age estimates.

----- Original Message -----
From: "Al Aburto" <>
To: <>
Sent: Sunday, February 07, 2010 9:28 AM
Subject: Re: [DNA] Intraclade Age Sigma "Unknowable"

> We sure do need a sigma though. We have the data ... attached to it
> there is indeed a sigma, but you are saying it is unknowable from
> current haplotypes.
> I have tried doing TMRCA estimates in pairs of (unique) haplotypes
> (using Walsh's infinite allele method) on a group of "n" haplotypes and
> then getting the mean and sigma from that for the group (cluster) of "n"
> haplotypes. Is this meaningful?
> Al
> On 2/7/2010 7:54 AM, Ken Nordtvedt wrote:
>> One way to estimate TMRCA of a clade is to find the sum of STR variances
>> of a sample of clade haplotypes of today, with variances measured to an
>> assumed founding haplotype. Then divide by sum of STR mutation rates:
>> Gest = Sum over i and m of [r(i,m) - rf(m)]^2 / (N M) == Var / M
>> r(i,m) is the repeat value of the mth STR of the ith haplotype. N is
>> the number of haplotypes, M is the sum of STR mutation rates. For young
>> clades the variances become essentially GD counts.
>> rf(m) is founder haplotype's repeat value for the mth STR.
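As a minimal sketch of the estimate just described, the following Python function computes Gest = Var / M from a list of haplotypes. The function name, the argument layout (one list of repeat values per haplotype) and the sample repeat values and mutation rates are all hypothetical and chosen only for illustration.

```python
def estimate_generations(haplotypes, founder, mutation_rates):
    """Return Gest = Var / M.

    haplotypes     -- list of N haplotypes, each a list of STR repeat values
    founder        -- assumed founding haplotype, same marker order
    mutation_rates -- per-marker mutation rates; their sum is M
    """
    n = len(haplotypes)
    total_rate = sum(mutation_rates)  # M, the sum of STR mutation rates

    # Sum over haplotypes i and markers m of [r(i,m) - rf(m)]^2.
    squared = sum(
        (r - rf) ** 2
        for hap in haplotypes
        for r, rf in zip(hap, founder)
    )

    var = squared / n  # Var, the summed STR variance about the founder
    return var / total_rate


# Hypothetical three-marker data, for illustration only.
founder = [13, 24, 14]
sample = [[13, 24, 14], [13, 25, 14], [14, 24, 14], [13, 24, 14]]
rates = [0.002, 0.003, 0.001]
print(estimate_generations(sample, founder, rates))
```

The result is only as good as the assumed founder haplotype and mutation rates, which are inputs here rather than something the function infers.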
>> But due to the stochastic (random) nature of STR mutations, the right hand
>> side of the above equation (sum of STR variances) will be a distribution
>> which sometimes falls above its average value and sometimes below. We
>> want to know the width of that distribution, so we can get a sense for
>> the statistical uncertainty of the age estimate which is based on what
>> happens "on average". The more STRs we use in our haplotypes, the closer
>> to average we can assume to be, but there is always this statistical
>> uncertainty. How big is it for the intraclade age estimate? Basically
>> we cannot tell without knowing the early demographics of the y tree
>> which starts with the haplotype sample population's MRCA and ends with
>> the N sample haplotypes G generations later.
>> The analytic formula for the statistical confidence interval for
>> reasonably young clades is given by:
>> Variance of Var = M { Sum c f(c)^2 }
>> And the 1 sigma confidence interval for Gest is then SquareRoot {Variance
>> of Var} / M
>> Variance of Var is simply unknowable without knowing the tree
>> demographics --- the f(c), particularly their values in the tree's
>> earliest generations. The fractions f(c) are relatively large early in
>> the tree and get smaller and smaller as we approach the end of the tree,
>> being 1/N on each of the N branch segments which terminate with our N
>> sample haplotypes.
>> The label c stands for each male in the y tree which ends with our sample
>> population of N haplotypes. f(c) is the fraction of those N haplotypes
>> for which male "c" is an ancestor. The sum over c can be done as a sum
>> over branch lines in each generation of the tree and then a sum over
>> generations from 1 to G. The number of branch lines is 2 in the first
>> generation after the MRCA, and it increases by one every time a tree node
>> on one of the branch lines comes along, and that branch line number ends
>> up being N in the last generation before the present. We can simply call
>> that number of branch lines in each generation P(g), the tree population
>> in generation g.
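To make the sum over tree males concrete, here is a small Python sketch that evaluates Sum c f(c)^2 and the resulting 1 sigma for Gest on a tree that is assumed to be known. The tree encoding is an assumption made for this example: each branch is a tuple (length_in_generations, children), a branch with no children ends in one sample haplotype, and every male along a branch shares the same f(c). The function names and the value of M are likewise hypothetical.

```python
import math


def count_leaves(branch):
    """Number of sample haplotypes descending from this branch."""
    _length, children = branch
    if not children:
        return 1
    return sum(count_leaves(child) for child in children)


def sum_f_squared(branches, n_total):
    """Sum of f(c)^2 over every male c on the given branches and below."""
    total = 0.0
    for branch in branches:
        length, children = branch
        f = count_leaves(branch) / n_total  # fraction of the N haplotypes
        total += length * f * f             # one term per generation on the branch
        total += sum_f_squared(children, n_total)
    return total


def sigma_generations(root_branches, total_rate):
    """1 sigma for Gest: SquareRoot(M * Sum c f(c)^2) / M."""
    n_total = sum(count_leaves(b) for b in root_branches)
    variance_of_var = total_rate * sum_f_squared(root_branches, n_total)
    return math.sqrt(variance_of_var) / total_rate


# Hypothetical 10-generation tree with N = 3 haplotypes: one line runs
# straight to the present, the other splits in two after 4 generations.
tree = [
    (10, []),
    (4, [(6, []), (6, [])]),
]
print(sum_f_squared(tree, 3))        # 10/9 + 16/9 + 12/9 = 38/9
print(sigma_generations(tree, 0.1))  # M = 0.1 is an illustrative value
```

In practice the branch lengths and branching order are exactly what is not known, so this is useful mainly for seeing how strongly different assumed trees change the answer.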
>> Consider the sum of f(c)^2 in any particular generation of the tree.
>> While the sum of f(c) for any particular generation must be one, the sum
>> of squares can be nearly as big as one if one particular branch line hogs
>> almost all the ancestry, but it can be no smaller than 1 / P(g). So we can
>> produce an expression for the minimum size that Variance of Var can be
>> under the most democratic tree ancestry scenario in which every ancestor
>> in the tree in every generation shares equally in the fraction of
>> ultimate descendants.
>> Variance of Var >= M { Sum g from 1 to G of 1 / P(g) }
>> Even this lower limit for the Variance of Var depends on the
>> "unknowable" --- the tree population each generation, and especially the
>> earliest tree generations when P(g) is small. In particular, the
>> Variance of Var is basically determined by how fast the early tree
>> population grows from its initial size of 2.
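The dependence of the lower limit on early growth can be illustrated with a short Python sketch. It takes an assumed list of tree populations P(1), ..., P(G) and returns the smallest possible 1 sigma for Gest, SquareRoot(M * Sum 1/P(g)) / M. The function name, the two population histories and the value of M are hypothetical and serve only to contrast fast and slow early growth.

```python
import math


def min_sigma_generations(population_by_generation, total_rate):
    """Smallest possible 1 sigma for Gest given P(g) for g = 1..G.

    Uses the most democratic case, where the sum of f(c)^2 in
    generation g equals 1 / P(g).
    """
    sum_reciprocals = sum(1.0 / p for p in population_by_generation)
    return math.sqrt(total_rate * sum_reciprocals) / total_rate


# Two hypothetical 100-generation histories, both ending with N = 1000.
fast = [min(2 ** g, 1000) for g in range(1, 101)]  # doubles each generation
slow = [2] * 50 + [1000] * 50                      # stays at 2 for 50 generations

M = 0.1  # illustrative sum of STR mutation rates
print(min_sigma_generations(fast, M))
print(min_sigma_generations(slow, M))
```

Both histories end with the same N, yet the slow-growing tree gives a much larger lower limit, because the generations in which P(g) stays near 2 dominate the sum of reciprocals.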
>> Note that Variance of Var does not keep getting smaller toward zero as N
>> gets larger. If we used for our sample population the entire population
>> today for some clade, perhaps numbering millions, we still have a
>> Variance of Var lower limit dominated by the reciprocal of the tree
>> population in the early generations. P(g) begins at 2, regardless. In
>> fact, a good sample of haplotypes which covers the early generations of
>> the tree as well as the full population does is all that is required to
>> do about as well statistically as one practically needs to.
>> Note that the only thing you can do to drive down the statistical
>> confidence interval more and more is to increase M, the sum of STR
>> mutation rates of your haplotypes.
>> Ken
>> PS: Interclade statistical confidence intervals on the other hand can be
>> conservatively quoted by an upper bound which depends on no demographic
>> knowledge. One can forget about the dispersion of the tree within each
>> clade and consider the simple "V tree" consisting of two branch lines
>> from the interclade node to the present. The statistical confidence
>> interval for the TMRCA of that simplified tree is then straightforwardly
>> evaluatable and is used in Generations4 for the 1 sigma values of the
>> interclade age estimates.