GENEALOGY-DNA-L Archives



From: (John Chandler)
Subject: Re: Fast markers (was: [DNA] research strategy for genealogists)
Date: Fri, 4 Aug 2006 01:31:28 -0400 (EDT)
References: <307.268ac480.3203bdb3@aol.com> <7150F7A2-B358-4B80-9A1E-6BDBBD541F5E@vizachero.com>
In-Reply-To: <7150F7A2-B358-4B80-9A1E-6BDBBD541F5E@vizachero.com> (message from Vincent Vizachero on Thu, 3 Aug 2006 21:08:15 -0500)


Vince wrote:
> Okay, I read the argument and I think I understand it. But I think
> it is incorrect.

It is correct for the infinite alleles model. I've done the calculation.

> I suspect the original example employed Bayes theorem, and thus
> conditional probabilities.

Only in the inversion that takes you from the probability of N
mutations given G generations to the probability of G generations
given N mutations.

> If you make some assumptions about the
> interdependency of the mutations,

No, we assume that all mutations are independent.

Here is a sketch of the algebra:

Probability of a match on marker i after G generations is M_i = (1-m_i)^G,
where m_i is the mutation rate for marker i. Therefore, the probability
of a mismatch is 1 - M_i = M_i (G m_i + [G^2+G] m_i^2 / 2 + ...)
where I have truncated an infinite series to the two leading terms
in the small parameter G m_i.
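
The truncation above can be checked numerically. What follows is a minimal
Python sketch, not part of the original calculation; the values of m and G
are illustrative assumptions chosen so that G m_i is small.

```python
# Illustrative values only (assumptions, not taken from the post).
m = 0.002   # per-generation mutation rate of one marker
G = 20      # number of generations

M = (1.0 - m) ** G                                  # probability of a match
exact = 1.0 - M                                     # exact mismatch probability
two_terms = M * (G * m + (G**2 + G) * m**2 / 2.0)   # series cut after two terms

print(f"exact     = {exact:.8f}")
print(f"two terms = {two_terms:.8f}")
print(f"relative error = {(exact - two_terms) / exact:.2e}")
```

The truncated value falls slightly below the exact one because every
dropped term of the series is positive.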

If we keep only the first term, then the probability of exactly one
mismatch (on marker i, with every other marker matching) is just G m_i
times a big exponential, the product of all the M_j, which is
symmetric in all markers. However, in the inversion, any constant
factor (such as m_i) drops out, and so the effective probability after
normalization is just G times the exponential. Therefore, the maximum
likelihood value of G is approximately the inverse of the sum of all marker
mutation rates -- independent of which marker has the mismatch.
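
As a rough illustration of the first-order argument, here is a small Python
sketch. The list of rates is hypothetical, the product of the M_j is
approximated by exp(-G B), and the search is restricted to whole numbers of
generations; all three are assumptions made for the example.

```python
import math

# Hypothetical per-generation rates for 25 markers (assumed for illustration).
rates = [0.001] * 10 + [0.002] * 10 + [0.004] * 5
B = sum(rates)  # sum of all marker rates, about 0.05

def likelihood_first_order(G, i):
    """First-order probability that marker i alone mismatches after G generations."""
    return G * rates[i] * math.exp(-G * B)

def most_likely_G(i, G_max=500):
    """Whole number of generations that maximizes the likelihood for marker i."""
    return max(range(1, G_max + 1), key=lambda G: likelihood_first_order(G, i))

# Collect the peak for every possible choice of mismatching marker.
peaks = {most_likely_G(i) for i in range(len(rates))}
print(peaks, 1.0 / B)
```

Because m_i only scales the curve, the set of peaks collapses to a single
value at about 1/B, whichever marker carries the mismatch.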

However, if you keep the second term above, you get a correction to
the maximum likelihood value of G. Instead of 1/B (where B = sum of rates),
you get (1 + A/B + AB/4) / B (where A is m_i / 2, as in the factor of the
second term above). Thus the maximum likelihood value of G actually
*increases* if you choose a faster marker instead of a slower one.

Note, however, that B is held fixed, and so we're *not* talking about
increasing the rates of the markers -- just choosing a faster one from
the list.
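
The direction of the second-order effect can be seen numerically. The Python
sketch below uses a hypothetical marker list with B held fixed, approximates
the product of the M_j by exp(-G B), and finds the peak by a simple grid
search. It is meant to show only that the faster marker gives the larger G,
not to reproduce the closed-form correction term by term.

```python
import math

# Hypothetical per-generation rates for 25 markers (assumed for illustration).
rates = [0.001] * 10 + [0.002] * 10 + [0.004] * 5
B = sum(rates)  # held fixed: we only choose which marker mismatches

def likelihood_second_order(G, m_i):
    """Mismatch on one marker of rate m_i, keeping two terms of the series."""
    series = G * m_i + (G**2 + G) * m_i**2 / 2.0
    return series * math.exp(-G * B)

def most_likely_G(m_i, step=0.01, G_max=100.0):
    """Grid search for the G that maximizes the second-order likelihood."""
    grid = (k * step for k in range(1, int(G_max / step) + 1))
    return max(grid, key=lambda G: likelihood_second_order(G, m_i))

g_slow = most_likely_G(min(rates))   # mismatch on the slowest marker
g_fast = most_likely_G(max(rates))   # mismatch on the fastest marker
print(1.0 / B, g_slow, g_fast)
```

With these assumed rates, both peaks sit a little above 1/B = 20 (roughly
20.2 and 20.8), and the faster marker gives the higher one.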

Ok, so it's counter-intuitive. Even the fact that the predicted
value of G is nearly independent of which marker you choose is
counter-intuitive. Nonetheless, if you think about it long and
hard, it will begin to seem "obvious".

John Chandler

