GENEALOGY-DNA-L Archives


From: (John Chandler)
Subject: Re: [DNA] Question for John Chandler
Date: Wed, 17 Feb 2010 16:07:46 -0500
References: <mailman.4572.1266045490.2099.genealogy-dna@rootsweb.com> <A597BD8569E24290A10B37465204E6B5@anatoldesktop> <525460709C6A493EBD3D6BECD514B319@john><REME20100214190716@alum.mit.edu><000601caaeef$640b0120$2c210360$@com><REME20100216194750@alum.mit.edu><D9854FFD8A924EB9A9295845122C981C@john>
In-Reply-To: <D9854FFD8A924EB9A9295845122C981C@john> (ajmarsh@arrrg.org)


John wrote:
> If there were chances of 3.5 mutations on CDYb between 9 haplotypes, (about
> 100 mutation opportunities on each marker in the group) that is about 30%
> chance of a back mutation occurring in a single individual on that marker.

Not quite. The probability is only 33% that any mutation at all will
occur for a given individual on that marker, and only 2.4% that a back
mutation will occur. That amounts to 23% chance of a back mutation
occurring somewhere in the set. And, of course, that's always assuming
that there really were 99 separate mutation opportunities, which is
very unlikely.
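
As a rough check on the 33% figure, here is a minimal sketch. It assumes that the 99 opportunities split evenly into 11 transmissions per lineage (99/9) and that each transmission carries the same chance, 3.5/99, of a mutation on the marker. The function name is made up for illustration, and the sketch does not attempt to reproduce the back-mutation figures, which depend on further modelling choices.

```python
def prob_any_mutation(expected_total, opportunities, per_lineage):
    """Chance of at least one mutation along one lineage.

    Assumes every transmission is independent and has the same
    per-transmission rate expected_total / opportunities.
    """
    rate = expected_total / opportunities
    # Complement of "no mutation in any of the per_lineage transmissions".
    return 1.0 - (1.0 - rate) ** per_lineage


# Assumed split: 99 opportunities over 9 haplotypes = 11 per lineage.
p_any = prob_any_mutation(3.5, 99, 11)
print(f"{p_any:.3f}")  # roughly one chance in three
```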

> Further, there is also a 30% chance of a back mutation one of the 9
> individuals on CDYa. That is only 2 markers out of 37, and we have in total
> about 60% chance of a back mutation in that set of 9 haplotypes in
> approximately a 330 year period.

This is in the non-linear regime where probabilities don't simply add.
You have to take the complement, then multiply, and then complement
again. Some of the time, there will be two or more back mutations
in the same set:

(1 - (1-0.23)x(1-0.23)) = 2x0.23 - 0.23^2

That last term is the non-linear contribution.
Net: about 41% instead of 46% and distinctly short of 60%.
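
The complement-multiply-complement rule can be sketched in a few lines. This assumes the two markers behave independently; the function name is only illustrative.

```python
def prob_at_least_one(probs):
    """Chance that at least one of several independent events occurs."""
    none = 1.0
    for p in probs:
        none *= 1.0 - p      # chance that this event does NOT occur
    return 1.0 - none        # complement again


# 23% chance of a back mutation in the set on each of CDYa and CDYb.
combined = prob_at_least_one([0.23, 0.23])
naive = 0.23 + 0.23          # simple addition overstates the result
print(round(combined, 4), round(naive, 2))
```

The gap between the two printed values is the 0.23^2 term, the case where both markers show a back mutation.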

> If all 37 markers were considered, the
> chances of a back mutation would increase. If this is the case, back
> mutations might be more of an issue in the genealogical time frame than
> Anatole suggested.

The point you're missing is that the possibility of back mutations is
just another contribution to the margin of error. As you observed,
the expected number of mutations for CDYb is about 3.5 for the set.
There is already a probability of about 54% that there will be 3 or
fewer mutations, despite the expectation, and 32% that there will be
2 or fewer. The possibility of a back mutation is, in fact, strongest
when there are *more* than 3 mutations along the way, and so the bare
probabilities we have been slinging are less alarming than they may
seem. The primary effect is to compress the tail of the distribution
of "excessive" mutation cases.
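
The 54% and 32% figures can be checked with a short sketch, assuming the number of mutations in the set follows a Poisson distribution with mean 3.5. The Poisson model is an assumption here, though it does reproduce the quoted percentages; the function name is illustrative.

```python
import math


def poisson_cdf(k, mean):
    """P(N <= k) for a Poisson-distributed count N with the given mean."""
    return sum(
        math.exp(-mean) * mean ** i / math.factorial(i)
        for i in range(k + 1)
    )


p_le3 = poisson_cdf(3, 3.5)  # 3 or fewer mutations
p_le2 = poisson_cdf(2, 3.5)  # 2 or fewer mutations
print(f"{p_le3:.2f} {p_le2:.2f}")
```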

> And this is not even considering the considerably
> increased chances of parallel mutations.

Keep in mind that there is a basic terminology issue here. When
Anatole talks about counting mutations, he means counting genetic
distances from the modal haplotype. A second, parallel mutation on a
given marker contributes to the genetic distance in exactly the same
way as the first instance on that marker. Of course, it may confuse
the estimated "tree" you construct for the testees, but it doesn't
hurt the TMRCA estimation.

John Chandler

