
From: "Sandy Paterson " <>
Subject: Re: [DNA] calibrating for common ancestor - QUESTION?
Date: Fri, 11 Jul 2008 16:23:09 +0100
References: <><><001201c8e204$80f38a50$6400a8c0@Ken1><000001c8e261$131d6de0$0201a8c0@owner8151f88a9><003a01c8e28f$1540fb40$6400a8c0@Ken1><000001c8e293$afc23770$0201a8c0@owner8151f88a9><000001c8e2bc$7a6d6da0$0201a8c0@owner8151f88a9><00d101c8e2bf$f4ab5840$6400a8c0@Ken1><000001c8e335$b0d8f700$0401a8c0@owner8151f88a9><009401c8e361$4aa30670$6400a8c0@Ken1>
In-Reply-To: <009401c8e361$4aa30670$6400a8c0@Ken1>

Hi Ken

Many thanks.

Two things become clear. I need a faster computer (already ordered)
and a website to post some interesting results.

Sandy Paterson

-----Original Message-----
[mailto:] On Behalf Of Ken Nordtvedt
Sent: 11 July 2008 15:21
Subject: Re: [DNA] calibrating for common ancestor - QUESTION?

----- Original Message -----
From: "Sandy Paterson " <>

> In examining this it would help if I could produce correct results for the
> variance method, but I'm battling to understand something in "Extended
> Haplotype...... Computer Simulation". Perhaps you could elaborate
> on something for me? Other listers may benefit too.
> Var = mG is easily verifiable, with Var calculated with reference to the
> known ex-ante mode.
> For the next bit, namely SVar = m[G - sum of f(c)^2], tests quickly show
> that G = (SVar/m) + something.
> But I can't work out how that something can be estimated in practice, and
> I can't quite get my head round your definition of f(c).
> Can you help?

The derivation for self variance is OUTLINED in a file at my website.
Admittedly, more detail would be needed for a little paper on the derivation.

So we know for self variance that SVar / M is an UNDERESTIMATOR of the MRCA
age at the expected value level. And the amount of underestimation can be
expressed in terms of features of the actual tree, with the important
features being early in the tree.

Take any father/son transition (particular generation on a particular branch
segment) in the tree. Look downstream from that point and add up all the
final haplotypes which descend from that point. That fraction of the total
haplotypes terminating the tree is the fraction fc for that point.
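
A minimal Python sketch of that definition, for readers who want to try it
on a tree of their own. The tree encoding (a node is a pair of "generations
on this branch" and a list of child nodes) and the function names are
assumptions made for illustration; they are not taken from the derivation
file mentioned above.

```python
# Hypothetical tree encoding (not from the original post):
# a node is (generations_on_branch, [child nodes]); a leaf has no children.

def count_leaves(node):
    """Number of final haplotypes descending from this node."""
    _, children = node
    if not children:
        return 1
    return sum(count_leaves(child) for child in children)


def sum_fc_squared(root):
    """Sum of fc^2 over every father/son transition in the tree."""
    total = count_leaves(root)

    def walk(node):
        generations, children = node
        fc = count_leaves(node) / total
        # Every transition along this branch shares the same fc,
        # so the branch contributes generations * fc^2.
        return generations * fc ** 2 + sum(walk(child) for child in children)

    return walk(root)


# Star tree: 4 lineages, each 100 generations long, splitting at the founder.
star = (0, [(100, []) for _ in range(4)])

# Early split into two sub-clades, each running 50 generations before
# dividing into two 50-generation lineages.
split = (0, [(50, [(50, []), (50, [])]), (50, [(50, []), (50, [])])])

print(sum_fc_squared(star))   # 25.0
print(sum_fc_squared(split))  # 37.5
```

For a star tree of N haplotypes and depth G the sum comes to G/N, so the
quoted formula SVar = m[G - sum of f(c)^2] reduces to SVar = mG(1 - 1/N).
Shared early branches make the sum, and hence the underestimation, larger.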

I try not to have to estimate those corrections in real life. That's why I
use more and more the INTERCLADE variance to estimate the age back to a MRCA
for two independent clades. That formula has none of that correction stuff.

G(AB) = Var(AB) / 2 M

Var(AB) = Sum over a in A, Sum over b in B of (ra-rb)^2 / { N(A) N(B) }

In words, Var(AB) is the average squared difference of marker repeats over
all pairs of haplotypes, one taken from clade A and one taken from clade B.
Draw a picture; every such pair has path length of precisely 2G between
them, so that is why the theoretical formula does not have that correction.
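
The unweighted interclade formula can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: haplotypes are lists
of repeat counts in the same marker order, the squared differences are
summed across markers, and M is taken to be the sum of the per-marker
mutation rates. The function names and the toy data are hypothetical.

```python
def interclade_variance(clade_a, clade_b):
    """Average squared repeat difference over all (a, b) pairs.

    Each haplotype is a list of repeat counts; squared differences are
    summed across markers (an assumption of this sketch).
    """
    total = 0
    for a in clade_a:
        for b in clade_b:
            total += sum((ra - rb) ** 2 for ra, rb in zip(a, b))
    return total / (len(clade_a) * len(clade_b))


def interclade_age(clade_a, clade_b, total_rate):
    """G(AB) = Var(AB) / (2 M), with M assumed to be the summed marker rate."""
    return interclade_variance(clade_a, clade_b) / (2 * total_rate)


# Toy data: two haplotypes per clade, two markers each (invented values).
clade_a = [[14, 12], [15, 12]]
clade_b = [[13, 12], [13, 13]]

print(interclade_variance(clade_a, clade_b))  # 3.0
```

Calling interclade_age(clade_a, clade_b, total_rate) then divides that
variance by twice the summed rate to give an age in generations. No fc
terms appear because every pair spans the same path length of 2G.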

The actual optimum formula has weighting factors which somewhat down-weight
the fast markers. This is due to a variance of the variance which grows
non-linearly with marker mutation rate. You can see the weights and
resulting modification to G(AB) formula in my website or in Cullen's

If I have to estimate those corrections Sum over c of fc^2 I consider a
simple tree leading to my sample population, and just perform the sum. If I
have some special knowledge or suspicion about the tree, such as early
division into two sub-clades, the sum estimation can be altered accordingly.
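
As one possible reading of "a simple tree", the sketch below assumes a star
tree: N haplotypes, each descending independently from the founder for G
generations. Under that assumption every transition has fc = 1/N, the sum
is G/N, and the quoted relation SVar = m[G - sum of f(c)^2] can be solved
for G directly. The star-tree choice, the function name and the numbers are
all assumptions for illustration, not a method stated in the post.

```python
def corrected_age_star(svar, rate, n_haplotypes):
    """Solve SVar = rate * (G - G/N) for G under a star-tree assumption."""
    if n_haplotypes < 2:
        raise ValueError("need at least two haplotypes")
    return svar / (rate * (1 - 1 / n_haplotypes))


# Invented numbers: self variance 0.15, rate 0.002 per generation, N = 4.
svar, rate, n = 0.15, 0.002, 4
naive = svar / rate                      # uncorrected SVar / m
corrected = corrected_age_star(svar, rate, n)
print(naive, corrected)
```

With these inputs the uncorrected estimate is about 75 generations and the
star-tree correction raises it to about 100. A tree with long shared early
branches would need a larger correction than this.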

But my preference is to formulate age studies in terms of interclade MRCA
estimations for the reason I gave above.

