**GENEALOGY-DNA-L Archives**

From: James Heald <>

Subject: [DNA] Mutation rates (particularly for John Chandler)

Date: Fri, 16 Feb 2007 12:54:28 +0000

John (and list),

On Feb 16, 2007 at 02:38 Vincent Vizachero wrote:

> I don't think there is much dispute that

> the rates published by John Chandler in the Journal of Genetic

> Genealogy last fall are the best estimates of mutation rates for the

> most commonly used markers.

On the basis of a quick look at the paper, I've got some anxieties:

1. The assumption of no back-mutations, that the exponential

distribution is based on.

-- Given that your cut-off was 25/37 matches and closer, this starts to

look unsafe. At least for the fastest mutators, there has to be a real

chance of a mutation and then a back-mutation in the time for the other

loci to have accumulated 12 differences.

This could cause a systematic under-estimate of the mutation rates for

the fastest mutators. But it should be relatively straightforward to

calculate a reasonable correction.
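
To get a feel for how large this under-estimate could be, here is a minimal sketch. It assumes a symmetric single-step model (each mutation moves the repeat count up or down by one with equal probability) and Poisson-distributed mutation events; neither assumption is taken from the paper, and the function names are made up for illustration. Under those assumptions the chance that a locus still matches after an expected m mutations along the path is exp(-m) I_0(m), rather than the exp(-m) of a no-back-mutation model.

```python
import math


def bessel_i0(x, terms=40):
    """Modified Bessel function I_0(x), summed from its power series."""
    return sum((x / 2.0) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))


def p_mismatch_no_back(m):
    """P(locus differs) if every mutation leaves a lasting difference."""
    return 1.0 - math.exp(-m)


def p_mismatch_stepwise(m):
    """P(locus differs) when +1 and -1 steps are equally likely and can cancel."""
    return 1.0 - math.exp(-m) * bessel_i0(m)


def naive_rate_ratio(m):
    """Apparent / true expected mutation count when the no-back-mutation
    formula is inverted on the stepwise mismatch probability."""
    return -math.log(1.0 - p_mismatch_stepwise(m)) / m


if __name__ == "__main__":
    # m = (mutation rate) x (number of transmissions on the path); values are illustrative
    for m in (0.01, 0.1, 0.5, 1.0):
        print(m, round(naive_rate_ratio(m), 3))
```

With m = 1 expected mutation on the path, the ratio comes out at about 0.76, so the naive inversion would recover only about three quarters of the true value; with m = 0.01 it is above 0.99. The effect is therefore only a concern for the fastest markers, as suggested above.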

2. The error uncertainties (the most important issue).

-- I think these could be /way/ too low.

Least squares and chi-squared estimates are based on the assumption that

your observations are IID Gaussian, with N independently identically

distributed observations.

My fear is the paper hugely underestimates the effect of shared

histories, and shared mutation paths, in reducing the effective number

of independent observations.

The calculation is based on analysing the conditional probability P_AB =

P_AB(j,b-1|b) of a mismatch at locus j (and b-1 others) between two

haplotypes A and B, _given_ b mismatches in total.

*But*, once you know that a mismatch has occurred (or not) for locus j

on the path from A to B, this informs the question of whether a mismatch

may also have occurred for locus j on the path from A to C, because some

of that path may be shared with the path from A to B. It also informs

the chances of a mismatch on the path from C to D, if C is close to A,

because some of that path may be shared with the path from A to B.

Conclusion: given P_AB(j,b-1|b), you cannot assume P_CD(j,b-1|b) is

independent of P_AB, if haplotype C is anywhere near haplotype A. So

you cannot assume you have two independent parameter observations, given

these probabilities.

If the mutation only occurred once, and that event is included in both

the paths A->B and C->D, then you only have one observation of its rate,

not two.

According to the paper, the number of independent observations was taken

to be the smaller of "the number of pairs found in a given b-bin, and

the total number of haplotypes".

I am anxious that that may actually be a huge over-estimate of the real

effective number of independent observations, leading to a huge

under-estimate of the possible error uncertainty.
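
The shared-path argument can be made concrete with a toy tree. The sketch below assumes four haplotypes, with A and C in one clade and B and D in another, so that the paths A->B and C->D each have two private tip branches plus one shared internal branch. It also assumes that any mutation on a path produces a mismatch (no back-mutation), and it uses the standard formula for the effective sample size of n equally correlated observations. The tree shape, the function names and the numbers are all hypothetical and are not taken from the paper.

```python
import math


def pair_correlation(mu, t_tip, t_shared):
    """Correlation between the mismatch indicators of pairs A-B and C-D.

    Each path consists of two private tip branches (t_tip generations each)
    plus one branch shared by both paths (t_shared generations).
    Requires mu * (2 * t_tip + t_shared) > 0.
    """
    q_tip = math.exp(-mu * t_tip)          # P(no mutation on one tip branch)
    q_shared = math.exp(-mu * t_shared)    # P(no mutation on the shared branch)
    p_match = q_tip ** 2 * q_shared        # one pair shows no mismatch
    p_both_match = q_tip ** 4 * q_shared   # the shared branch is counted once
    cov = p_both_match - p_match ** 2
    var = p_match * (1.0 - p_match)
    return cov / var


def effective_observations(n, rho):
    """Effective number of independent observations among n observations
    that all share the same pairwise correlation rho."""
    return n / (1.0 + (n - 1) * rho)


if __name__ == "__main__":
    mu = 0.004  # hypothetical mutation rate per generation
    for t_tip, t_shared in ((50, 0), (25, 50), (5, 90)):
        rho = pair_correlation(mu, t_tip, t_shared)
        print(t_tip, t_shared, round(rho, 3),
              round(effective_observations(100, rho), 1))
```

With no shared branch the correlation is zero and every pair counts in full. As the shared branch comes to dominate the path, the correlation approaches 1 and any number of pairs collapses towards a single observation.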

To assess this, one thing to try might be to do separate estimates for

the haplotypes from R1b, R1a and I1 (or whatever happen to be useful

groups to partition the data in).

How does the "Sigma N-1" estimated error for each mutation rate, based

just on those three numbers, compare with the standard errors you were

estimating as a whole?
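
A minimal sketch of that comparison, for one marker, might look like this. The three subgroup rates and the quoted standard error are invented placeholders, not values from the paper, and the function name is hypothetical. The sketch also assumes the three groups carry roughly equal weight, so that the scatter between them divided by sqrt(3) is comparable to the standard error of the pooled estimate.

```python
import math
import statistics


def scatter_check(subgroup_rates, quoted_standard_error):
    """Compare the scatter between subgroup estimates with a quoted standard error.

    Returns (mean, sigma_n_minus_1, implied_se_of_mean, ratio), where ratio is
    implied_se_of_mean / quoted_standard_error.
    """
    mean = statistics.mean(subgroup_rates)
    sigma = statistics.stdev(subgroup_rates)  # sample st. dev., N-1 denominator
    implied_se = sigma / math.sqrt(len(subgroup_rates))
    return mean, sigma, implied_se, implied_se / quoted_standard_error


if __name__ == "__main__":
    # Placeholder estimates for one marker from R1b, R1a and I1 (invented numbers)
    rates = [0.0021, 0.0034, 0.0026]
    quoted_se = 0.0002  # invented stand-in for the pooled standard error
    mean, sigma, implied_se, ratio = scatter_check(rates, quoted_se)
    print(f"mean={mean:.4f} sigma={sigma:.4f} implied_se={implied_se:.4f} ratio={ratio:.1f}")
```

A ratio well above 1 across many markers would suggest the quoted errors are too small. With only three numbers per marker the scatter estimate is itself very noisy, so it is the overall pattern across markers that matters, not any single ratio.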

3. (Technical point)

The mutation rates mu are scale parameters which cannot go negative. So

probably you should be estimating log(mu), rather than mu itself.

An accurate final probability distribution for mu is more likely to be

log-normal than normal. So on a non-log scale, the distribution is

likely to appear slanted to the left, with a long tail to the right;

the 95% quantile is much further from the median than the 5%

quantile is the other way.

In such a situation, a least squares estimate for mu (or even a

straightforward average) will overweight high values, and underweight

low ones.

This can be taken care of by estimating log(mu) rather than mu.
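
As a small illustration of both points, the sketch below assumes mu is exactly log-normal. The median, the spread on the log scale and the function names are arbitrary choices for illustration, not values or methods from the paper. It computes the 5% and 95% quantiles on the ordinary scale and compares a plain average of two rates with an average taken on the log scale.

```python
import math
import statistics


def lognormal_quantiles(median, sigma_log, lo=0.05, hi=0.95):
    """Lower and upper quantiles of a log-normal with the given median and
    standard deviation of log(mu)."""
    z_lo = statistics.NormalDist().inv_cdf(lo)
    z_hi = statistics.NormalDist().inv_cdf(hi)
    return median * math.exp(sigma_log * z_lo), median * math.exp(sigma_log * z_hi)


def log_scale_average(rates):
    """Average the rates on the log scale, then transform back
    (this is the geometric mean)."""
    return math.exp(statistics.fmean([math.log(r) for r in rates]))


if __name__ == "__main__":
    median, sigma_log = 0.002, 0.5  # arbitrary illustrative values
    q05, q95 = lognormal_quantiles(median, sigma_log)
    print("distance below median:", median - q05)
    print("distance above median:", q95 - median)

    rates = [0.001, 0.004]  # a factor of two either side of 0.002
    print("plain average:    ", statistics.mean(rates))
    print("log-scale average:", log_scale_average(rates))
```

The upper quantile sits further from the median than the lower one, and the plain average of 0.001 and 0.004 is 0.0025, pulled towards the high value, whereas the log-scale average is 0.002.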

Interested to know what you think,

Best regards,

James.
