
From: "Alister John Marsh" <>
Subject: Re: [DNA] Ken's point: problems other than back and parallel mutations
Date: Tue, 16 Feb 2010 11:49:54 +1300
References: <B16E9CC810A54776AF63EEBEF4C0E8AC@PC>
In-Reply-To: <B16E9CC810A54776AF63EEBEF4C0E8AC@PC>


Just a brief comment on the significance of back and parallel mutations in
Anatole's TMRCA calculations.

As I understand it, back mutations could mess things up in Anatole's method
if they were at inconvenient places, but Anatole says they are rare in the
genealogical time frame. In most cases they are much rarer than parallel
mutations in the genealogical time frame, so he is probably right that they
are less significant in the majority of genealogical time frame cases.

Concerning parallel mutations, I think I can see why Anatole is saying it
does not matter what you call them, they are still mutations. They would
still count as mutations in Anatole's system, and in the count it does not
matter if they are back mutations or forward mutations.

But the thing that concerns me is that Anatole appears to use an "assumed
ancestral haplotype" as the basis for counting mutations, and I don't think
he always picks the right ancestral haplotype if there are parallel
mutations scattered in the field.

In the example I put to him a few days ago, there were I believe a few
parallel mutations which caused Anatole, in my view (he disagrees) to
misidentify the ancestral haplotype. Anatole appears to have superficially
examined the data, and because he did not think the matter of back mutations
through, misidentified the probable ancestral haplotype. He appears to have
simplistically assumed that the modal haplotype, distorted by the
over-representation of one branch of a 2-branch family tested, was the ancestral
haplotype. If he had recognized parallel mutations as impacting on his
system, he could have looked deeper, and found in my view what is more
likely to have been the ancestral haplotype. Or at the very least, he could
have recognized the ancestral haplotype was ambiguous, and the alternatives
might give results differing by a factor of 2. He could have qualified his
results, by stating that if the alternative ancestral haplotype applied,
which had perhaps 50% chance of applying, his calculation may be out by a
factor of 2.
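
To make the concern concrete, here is a minimal sketch with invented toy
data (not the family discussed above). It assumes four hypothetical markers,
a simple stepwise counting rule, and a 2-branch family in which one branch
supplies six of the eight men tested. The names modal_haplotype,
mutation_count and ASSUMED_TRUE_ANCESTOR are illustrative only, and the
counting rule is an assumption rather than Anatole's exact procedure.

```python
from collections import Counter

# Toy data only: four hypothetical STR markers, values are repeat counts.
ASSUMED_TRUE_ANCESTOR = (13, 24, 14, 11)

# Branch A (six men tested) all inherited one early mutation at the first marker.
branch_a = [(14, 24, 14, 11)] * 5 + [(14, 24, 15, 11)]
# Branch B (two men tested) still carries the ancestral value at the first marker.
branch_b = [(13, 24, 14, 11), (13, 25, 14, 11)]
sample = branch_a + branch_b


def modal_haplotype(haplotypes):
    """Most common repeat count at each marker, taken marker by marker."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*haplotypes))


def mutation_count(haplotypes, base):
    """Total stepwise differences from the base haplotype (assumed rule)."""
    return sum(abs(a - b) for hap in haplotypes for a, b in zip(hap, base))


modal = modal_haplotype(sample)
count_from_modal = mutation_count(sample, modal)
count_from_ancestor = mutation_count(sample, ASSUMED_TRUE_ANCESTOR)

print("modal haplotype:", modal)                            # (14, 24, 14, 11)
print("count from modal:", count_from_modal)                # 4
print("count from assumed ancestor:", count_from_ancestor)  # 8
```

In this toy case the modal haplotype is simply the over-sampled branch's
haplotype, and switching the base from the modal to the assumed ancestor
doubles the count from 4 to 8.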

To be fair to Anatole, I was aware of some partial genealogical, historical,
and geographical information concerning the example I put to him. I have
been studying this family for decades. This gave me a considerable
advantage over him. However, it showed I think that by simplistically
looking at the raw DNA data, without the contextual genealogical,
historical, and geographical data, and considering the potential effect of
back mutations, it was possible to be out by a factor of the order of 2.

Anatole's system in my view might work tolerably well if the ancestral
haplotype can be correctly identified. But if parallel mutations are
lightly dismissed as just as well being "vertical" mutations, then Anatole
is missing an opportunity to improve the outcomes of his system.

Sometimes the ancestral haplotype can be inferred by considering parallel
mutations. But probably more often, parallel mutations make things very
ambiguous, and then selecting the ancestral haplotype becomes a lottery and
picking the wrong haplotype might change the mutation count by a factor of

Anatole says if the logarithmic method agrees with the mutation count
method, all is well. I can't comment on that, the maths is well outside of
my comfort zone. But "I think" in the genealogical time frame (particularly
with less than a dozen random mutations), the play of feral mutations might
cause the logarithmic and mutation count method to sometimes randomly match
even if both are wrong. But I expect Anatole to remind me that what "I
think" has no place in mathematics. He is probably right, and I take no
offence if he should remind me of that.
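
For readers who want to see what comparing the two methods involves, here
is a rough sketch. It assumes the commonly described forms of the two
estimates, which are not spelled out in this thread: the linear estimate as
total mutations divided by the number of haplotypes (with no back-mutation
correction), and the logarithmic estimate as ln(N / N0), where N0 is the
number of haplotypes identical to the chosen base. Both give an average
number of mutations per haplotype, to be divided by a mutation rate. The
function names, the toy counts, and the rate are all hypothetical.

```python
import math


def linear_estimate(total_mutations, n_haplotypes):
    """Average mutations per haplotype, counted from the chosen base."""
    return total_mutations / n_haplotypes


def logarithmic_estimate(n_haplotypes, n_matching_base):
    """ln(N / N0): uses only how many haplotypes still match the base."""
    return math.log(n_haplotypes / n_matching_base)


def generations(avg_mutations, rate_per_haplotype):
    """Convert average mutations per haplotype to generations."""
    return avg_mutations / rate_per_haplotype


# Hypothetical set of 8 haplotypes, scored against two candidate bases.
# Base 1: 5 haplotypes match it exactly, 4 mutations counted in total.
# Base 2: 1 haplotype matches it exactly, 8 mutations counted in total.
PLACEHOLDER_RATE = 0.02  # illustrative value only, not a calibrated rate

for label, matching, mutations in [("base 1", 5, 4), ("base 2", 1, 8)]:
    lin = linear_estimate(mutations, 8)
    log = logarithmic_estimate(8, matching)
    print(label,
          "linear:", round(generations(lin, PLACEHOLDER_RATE), 1),
          "logarithmic:", round(generations(log, PLACEHOLDER_RATE), 1))
```

With these invented counts the two estimates come out close together for
base 1 and far apart for base 2. That shows only that agreement depends on
which base haplotype is chosen; it does not show which base is correct.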


-----Original Message-----
[mailto:] On Behalf Of Lancaster-Boon
Sent: Tuesday, February 16, 2010 6:02 AM
Subject: [DNA] Ken's point: problems other than back and parallel mutations

I wrote:
>> So the real discussion should be about HOW AND WHEN you can be sure
>> that the chance of things like back mutations and parallel mutations are
>> small, and how much their possible existence in any set of haplotypes is
>> affecting your confidence interval. I think this is Ken's point,

Ken wrote:
> That's not my main point.

Firstly thanks to Ken and John Chandler (in his post to John Marsh) for the
explanation I have snipped off. It really helped me at least. Hopefully
others also read both those posts.

However, having thought about it, my comment above MIGHT be closer than you
think to a description of where your points lead to, WHEN looking at
Anatole's method. But whether I am wrong or not I hope you do not mind if I
try wording it out for your consideration...

NOTE: It is highly likely I have misrepresented people below!
Sorry everyone as always in such cases, but please correct me where
necessary. That is the point. :)

Of course Ken's most basic point about Anatole's method is clear, and that
is that you can not count real mutations, only "genetic distance".

But (correct me if I am wrong) you are saying that while Anatole's method
might give an unbiased estimation of time back to common direct ancestor,
knowing the confidence interval will depend on knowing the real family tree,
and counting the real number of mutations. (Hence for confidence interval
the difference between genetic distance and real mutation count is
critical.) Correct so far?

Anatole on the other hand is clearly saying that this is all missing the
point, because he has a kind of step 1 in his procedure where he makes sure
that he has a group of haplotypes which are "first order" in their descent
from their common ancestor. And, (OPTION 1) once he is sure of this, he
believes that the maths says that you can ignore the possibility of things
like back mutations (and presumably also parallel mutations) because their
chance of having occurred is approximately zero in genealogical contexts.

I guess John Marsh, David Ewing and I are all saying this does not sound
right to us based on practical experience etc. I think I am right in saying
that we see the frequent apparent occurrence of back and parallel mutations
in genealogical projects as a sign of where the problem might lie in
Anatole's step 1, because it shows that, to begin with, his initial
assumption that he can tell the difference between complex and "first order"
haplotype sets raises a lot of questions about what you can know about the
family tree and number of mutations. Ken and John Chandler have reminded us
that it is bigger than this, but there is a sort of relationship between the
two concerns maybe.

Anatole seems to believe that he understands all these concerns anyway, and
also to think they are not relevant to his method:-

*Firstly he is possibly saying that he can objectively see when a tree is
"first order" or "complex" by seeing if his linear and logarithmic methods
agree or not. (I have asked him in another post if this is correct.)

*Secondly there is OPTION 2. I think he is saying that no matter what, the
logarithmic method always works, and the logarithmic method needs no
knowledge of anything about the family tree (and therefore the real number
of mutations) in order to give both an age estimate and a confidence

I hope the above run-through of my reading of others is close enough to reality
that it at least lets other people point to the errors.

Best Regards

