GENEALOGY-DNA-L Archives

From: (John Chandler)
Subject: Re: [DNA] R-U152 and R-L21 on the European Continent
Date: Sat, 12 Dec 2009 16:02:33 -0500
References: <200912120748.nBC7mtLt011441@mail.rootsweb.com>

Tim wrote:
> Thanks for your comments today. My recollection of our lengthy
> discussion this past summer on this topic was that there is a reasonable
> probability that John's mutation rate estimates for the slowest mutating
> markers are inaccurate to some extent.

The issue isn't one of accuracy, but rather of likelihood. My results
are from a maximum-likelihood estimator, as opposed to a mean-value
estimator. What does this signify? The mean value is an average over
all possible universes that have different properties but give the
same experimental results we are analyzing. The maximum-likelihood
value is the most probable answer in any universe that gives these
experimental results, including this universe. A mean-value estimator
has the unfortunate property of changing in retrospect when two
independent experiments are combined, such as father-son studies and
relative-mutation-rate studies of public databases. To take a simple
example, suppose you perform an experiment of counting successful
outcomes of ten trials and get one success. The maximum-likelihood
estimate of the success rate is 1/10. The mean-value estimate is 1/6.
If someone else performs the same experiment independently and gets
the same result, his corresponding estimates would also be 1/10 and
1/6, respectively. If you then join up and make a combined estimate,
however, the two 1/10 MLEs give a joint estimate of 1/10, same as the
two separate experiments, but the MVEs do not -- the two 1/6 values
"average" together as 3/22, not 1/6.
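
The arithmetic in this example can be checked with a short sketch. It
assumes the mean-value estimate is the posterior mean under a uniform
prior, i.e. (k+1)/(n+2) for k successes in n trials. That prior is not
stated above, but it reproduces the 1/6 and 3/22 figures. The function
names are illustrative only.

```python
from fractions import Fraction

def mle(successes, trials):
    # Maximum-likelihood estimate of a binomial success rate: k/n.
    return Fraction(successes, trials)

def mean_value(successes, trials):
    # ASSUMPTION: posterior mean under a uniform prior, (k+1)/(n+2).
    return Fraction(successes + 1, trials + 2)

# Each experimenter alone: 1 success in 10 trials.
single_mle = mle(1, 10)
single_mve = mean_value(1, 10)

# Both experiments pooled: 2 successes in 20 trials.
joint_mle = mle(2, 20)
joint_mve = mean_value(2, 20)

print("single:", single_mle, single_mve)
print("joint: ", joint_mle, joint_mve)
```

Pooling leaves the MLE where it was, while the mean-value estimate
moves from 1/6 toward the MLE.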

For that matter, the difference between the two approaches diminishes
as the data set grows, being about 1 part in N, where N is the number
of independent trials. The difference can be noticeable if N is 10,
as in the example I presented, but it's negligible when N is large. I
don't recall the exact numbers from my relative-rate analysis, but the
very slowest marker had an N somewhere between 70 and 100.
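
To see how quickly the gap closes, here is a rough sketch under the
same assumption as before: that the mean-value estimate is (k+1)/(n+2),
the posterior mean under a uniform prior. The single success (k = 1)
at N = 70 and N = 100 is a hypothetical choice made only to show the
scaling. It is not a number from the relative-rate analysis.

```python
from fractions import Fraction

def gap(successes, trials):
    # Difference between the assumed mean-value estimate (k+1)/(n+2)
    # and the maximum-likelihood estimate k/n.
    mve = Fraction(successes + 1, trials + 2)
    mle = Fraction(successes, trials)
    return mve - mle

# Hypothetical: one success at each N, to show roughly 1/N scaling.
gaps = {n: gap(1, n) for n in (10, 70, 100)}

for n, g in gaps.items():
    print(f"N={n:3d}  gap={float(g):.4f}  1/N={1 / n:.4f}")
```

For a small success count the gap works out to (N - 2k) / (N (N + 2)),
which is close to 1/N once N is large.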

> "1. The TMRCA for Y Adam could be as much as 300,000-400,000 years if the
> calculations using the slowest markers are to be believed.

It is important to quote uncertainties along with estimates. In the
example I gave above, the 1/10 estimate was 0.1 +/- 0.1 for the
individual experiments, encompassing the 1/6 mean-value estimate
comfortably. The combined experiment gave the same estimate, 1/10,
but the uncertainty was reduced to +/- 0.07, still comfortably
encompassing the MVE 3/22. As it happens, if you throw away all the
fast markers, the TMRCA uncertainty goes to pot, and that is why a
result based only on the slowest markers is not very reliable in any
case -- because it is too uncertain.
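
The uncertainties quoted for the ten-trial example can be reproduced
with a small sketch, assuming they are ordinary binomial standard
errors, sqrt(p(1-p)/n), evaluated at the MLE, and that the mean-value
estimate is (k+1)/(n+2). Both are assumptions about how the figures
were obtained, chosen because they match the +/- 0.1 and +/- 0.07
given above.

```python
import math
from fractions import Fraction

def std_error(successes, trials):
    # ASSUMPTION: binomial standard error evaluated at the MLE p = k/n.
    p = successes / trials
    return math.sqrt(p * (1 - p) / trials)

def mean_value(successes, trials):
    # ASSUMPTION: posterior mean under a uniform prior, (k+1)/(n+2).
    return Fraction(successes + 1, trials + 2)

se_single = std_error(1, 10)   # one experiment: 1 success in 10
se_joint = std_error(2, 20)    # pooled: 2 successes in 20

# Is the mean-value estimate within one standard error of the MLE?
single_ok = abs(float(mean_value(1, 10)) - 0.1) < se_single
joint_ok = abs(float(mean_value(2, 20)) - 0.1) < se_joint

print(f"single: 0.1 +/- {se_single:.3f}, MVE inside: {single_ok}")
print(f"joint:  0.1 +/- {se_joint:.3f}, MVE inside: {joint_ok}")
```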

John Chandler