GENEALOGY-DNA-L Archives



From: (John Chandler)
Subject: Re: [DNA] Cruciani and 2007 TMRCA estimates
Date: Mon, 26 Mar 2007 18:42:23 -0400 (EDT)
References: <KHEKIJEABJGJEKDPFEDMEEJADGAA.elizabethod@eircom.net>
In-Reply-To: <KHEKIJEABJGJEKDPFEDMEEJADGAA.elizabethod@eircom.net>


Elizabeth wrote:
> Am I correct that your 'statistical uncertainty' is equivalent to standard
> deviation?

Yes.

> Your own rates have an SD of 15-20% with a sample of 8430.
> Why/how is that different than using father-son pairs?

My analysis included all of the available father-son data in addition
to the collection of 37-marker haplotypes. Basically, the father-son
data establish the average level while the extended haplotypes carry the
calibration to markers that haven't been included in father-son
studies, as well as smoothing out the statistics for the markers
that have.

> logical to me that having 2 or 3 times as many pairs would be 2 or 3 times
> better in arriving at *real* rates. Why is that not so?

It's a consequence of probability theory. The variance of the sum of
independent random variables is equal to the sum of the variances of
the variables, while the average of the variables is the sum divided
by the number of variables. In this context, each pair is a random
variable with a value of "0" for a match on the marker of interest or
"1" for a mismatch. The estimated mutation rate is the average of
these variables. Dividing the sum by the number of pairs N divides
its variance by N squared, so the variance of the average is the
variance of a single pair divided by N, and the standard deviation of
the estimate is the square root of that. Assuming uniform probability
of mutation, then, the standard deviation is proportional to the
inverse of the square root of the number of pairs: having 2 or 3
times as many pairs improves the estimate by a factor of only about
1.4 or 1.7, not 2 or 3.
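
As a rough numerical illustration of the scaling described above, here
is a minimal Python sketch. It treats each father-son pair as an
independent 0/1 outcome with the same mutation probability. The rate
of 0.003 and the base count of 1000 pairs are made-up values chosen
only for the example; they are not figures from my analysis, and the
function name is likewise just for illustration.

```python
import math


def mutation_rate_sd(rate, n_pairs):
    """SD of a rate estimated as the average of n_pairs independent 0/1 outcomes.

    A single 0/1 outcome with probability `rate` has variance rate * (1 - rate);
    averaging n_pairs of them divides that variance by n_pairs.
    """
    return math.sqrt(rate * (1.0 - rate) / n_pairs)


ASSUMED_RATE = 0.003  # hypothetical per-marker mutation rate (illustration only)
BASE_PAIRS = 1000     # hypothetical number of father-son pairs (illustration only)

base_sd = mutation_rate_sd(ASSUMED_RATE, BASE_PAIRS)
for factor in (1, 2, 3, 4):
    sd = mutation_rate_sd(ASSUMED_RATE, factor * BASE_PAIRS)
    # The ratio to the base SD depends only on the factor: it is 1 / sqrt(factor).
    print(f"{factor}x pairs: SD = {sd:.6f}, ratio to base = {sd / base_sd:.3f}")
```

The ratio column does not depend on the assumed rate, only on how many
times the number of pairs has been multiplied, so four times as many
pairs are needed to cut the standard deviation in half.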

John Chandler

