GENEALOGY-DNA-L Archives


From: (John Chandler)
Subject: Re: [DNA] Cruciani and 2007 TMRCA estimates
Date: Sat, 24 Mar 2007 17:40:21 -0400 (EDT)
References: <KHEKIJEABJGJEKDPFEDMGEIADGAA.elizabethod@eircom.net>
In-Reply-To: <KHEKIJEABJGJEKDPFEDMGEIADGAA.elizabethod@eircom.net>


Elizabeth wrote:
> On the contrary, I'm suggesting just the opposite. Such studies would have
> 'real' numbers to compare.

When you put quotation marks around a word, you are attributing that
word to other people and proclaiming that it's not the word you would
have chosen for the given context. You did that before, and you've
done it again. In other words, you are saying above that, in your
opinion, the numbers are *not* real, even though other people might
consider them so. If you were intending to convey just the opposite,
it would be clearer to leave out the quotation marks.

> would be interesting to put together the statistics from all the father-son
> pairs in all the published studies and see what rates would come out of
> them.

This has become standard practice, ever since Gusmao et al. (2005)
published a summary of published data along with their new results.
The only problem is that, with say 5000 pairs and a typical rate of
0.002, there are only about 10 observed mutations for any one marker.
Since the scatter in a count goes as the square root of the count, that
means the statistical uncertainty on the direct, single-marker rate is
about 0.0006, or about 30% of the rate itself. This statistical
uncertainty improves only as the inverse square root of the number of
pairs included, and so we're already reaching the stage of diminishing
returns.
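
For anyone who wants to check the arithmetic, here is a rough sketch in
Python. It assumes the mutation count for a marker is Poisson-distributed,
so the scatter in the count is the square root of the expected count. The
function name and the larger sample sizes are just illustrative choices,
not figures from any study.

```python
import math

def direct_rate_uncertainty(n_pairs, rate):
    """Return (expected mutations, absolute sigma, relative sigma).

    Assumes Poisson counting statistics: the standard deviation of the
    mutation count equals the square root of the expected count.
    """
    expected = n_pairs * rate               # expected number of mutations
    sigma = math.sqrt(expected) / n_pairs   # uncertainty on the rate itself
    return expected, sigma, sigma / rate

# 5000 pairs is the example from the text; the larger sizes are hypothetical.
for n in (5000, 20000, 80000):
    expected, sigma, rel = direct_rate_uncertainty(n, 0.002)
    print(f"{n:6d} pairs: {expected:4.0f} mutations, "
          f"sigma = {sigma:.5f} ({rel:.0%} of the rate)")
```

Under that assumption, quadrupling the number of pairs only halves the
uncertainty, which is the diminishing-returns problem in a nutshell.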

> That appears to me to mean that there may be different mutation rates in
> different haplogroups

Not in the sense that most people would expect you to mean by that
statement. The point is that Zhivotovsky et al. are *not* estimating
mutation rates at all. They are estimating scaled rates of diversity
increase. The actual mutation rate is certainly the driving force in
the process of increasing diversity, but there are so many other
things going on that "mutation rate" is a very bad name for what they
estimate, and "effective mutation rate" is, if anything, even worse.

> In your own paper, 'Estimating Per-Locus Mutation
> Rates', you indicated that about half your sample was R1b. Were you able to
> see any apparent differences in range of the loci you studied between R1b
> and the rest?

The problem is that the statistical uncertainty gets rapidly worse
as the amount of included data decreases. Certainly, there are
differences in a solution based solely on R1b, but their significance
has to be judged by the larger uncertainties of the more-limited
solution, not the uncertainties of the overall solution. I didn't
see any differences that appeared to be statistically significant.
(This is to a large extent built into the situation, since R1b is
such a large proportion of the whole dataset to begin with.)

John Chandler

