From: "Lancaster-Boon" <>
Subject: [DNA] Ken's point: problems other than back and parallel mutations
Date: Mon, 15 Feb 2010 18:02:29 +0100

I wrote:
>> So the real discussion should be about HOW AND WHEN you can be sure
>> that the chance of things like back mutations and parallel mutations are
>> small, and how much their possible existence in any set of haplotypes is
>> affecting your confidence interval. I think this is Ken's point,

Ken wrote:
> That's not my main point.

Firstly thanks to Ken and John Chandler (in his post to John Marsh) for the
explanation I have snipped off. It really helped me at least. Hopefully
others also read both those posts.

However, having thought about it, my comment above MIGHT be closer than you
think to a description of where your points lead WHEN looking at
Anatole's method. But whether I am wrong or not, I hope you do not mind if I
try to spell it out for your consideration...

NOTE: It is highly likely I have misrepresented people below!
Sorry everyone as always in such cases, but please correct me where
necessary. That is the point. :)

Of course Ken's most basic point about Anatole's method is clear, and that
is that you cannot count real mutations, only "genetic distance".
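
To illustrate the gap between real mutations and genetic distance, here is a
minimal Python sketch. It assumes a simple stepwise model in which each marker
gains or loses one repeat with a fixed probability per generation. The function
name, the marker count and the mutation rate are illustrative assumptions, not
values from this thread, and the distance is measured against the true
ancestral haplotype, which in practice nobody knows.

```python
import random


def simulate_lineage(n_markers, generations, rate, rng):
    """Follow one lineage down from its ancestor under a stepwise model.

    Each marker changes by +1 or -1 repeat with probability `rate` per
    generation (an assumed, simplified model). Returns a tuple
    (real_mutations, genetic_distance), where genetic_distance is the
    summed absolute repeat difference from the ancestral haplotype.
    """
    offsets = [0] * n_markers      # repeat difference from the ancestor
    real_mutations = 0             # every mutation event that really happened
    for _ in range(generations):
        for i in range(n_markers):
            if rng.random() < rate:
                offsets[i] += rng.choice((-1, 1))
                real_mutations += 1
    # A back mutation cancels an earlier step, so it is invisible here.
    genetic_distance = sum(abs(o) for o in offsets)
    return real_mutations, genetic_distance


if __name__ == "__main__":
    rng = random.Random(2010)      # fixed seed so the run is repeatable
    for generations in (20, 200, 2000):
        real, distance = simulate_lineage(37, generations, 0.002, rng)
        print(generations, "generations:", real, "real mutations,",
              distance, "steps of genetic distance")
```

In this model the observed distance can never exceed the real mutation count,
and the two drift apart as the time depth grows, because more markers are hit
more than once.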

But (correct me if I am wrong) you are saying that while Anatole's method
might give an unbiased estimate of the time back to the common direct
ancestor, knowing the confidence interval will depend on knowing the real
family tree, and counting the real number of mutations. (Hence for the
confidence interval the difference between genetic distance and real mutation
count is critical.) Correct so far?

Anatole on the other hand is clearly saying that this is all missing the
point, because he has a kind of step 1 in his procedure where he makes sure
that he has a group of haplotypes which are "first order" in their descent
from their common ancestor. And, (OPTION 1) once he is sure of this, he
believes that the maths says that you can ignore the possibility of things
like back mutations (and presumably also parallel mutations) because their
chance of having occurred is approximately zero in genealogical contexts.

I guess John Marsh, David Ewing and I are all saying this does not sound
right to us based on practical experience etc. I think I am right in saying
that we see the frequent apparent occurrence of back and parallel mutations
in genealogical projects as a sign of where the problem might lie in
Anatole's step 1. His initial assumption that he can tell the difference
between complex and "first order" haplotype sets raises a lot of questions
about what you can know about the family tree and the number of mutations.
Ken and John Chandler have reminded us that the problem is bigger than this,
but there is maybe a sort of relationship between the two concerns.

Anatole seems to believe that he understands all these concerns anyway, and
also to think they are not relevant to his method:-

*Firstly he is possibly saying that he can objectively see when a tree is
"first order" or "complex" by seeing if his linear and logarithmic methods
agree or not. (I have asked him in another post if this is correct.)

*Secondly there is OPTION 2. I think he is saying that no matter what, the
logarithmic method always works, and the logarithmic method needs no
knowledge of anything about the family tree (and therefore the real number
of mutations) in order to give both an age estimate and a confidence
interval.
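
For readers who want to see what "the linear and logarithmic methods agree"
could mean in practice, here is a hedged Python sketch. The formulas are NOT
quoted from this thread. They are an assumption based on a simple Poisson
reading: the linear estimate divides the average number of observed mutations
per haplotype by the mutation rate, and the logarithmic estimate uses only the
fraction of haplotypes still identical to the base haplotype. The function
names, the 10% tolerance and the example numbers are all hypothetical.

```python
import math


def linear_estimate(total_mutations, n_haplotypes, rate_per_haplotype):
    """Generations to the common ancestor from the average mutation count.

    Assumed reading: mean mutations per haplotype = rate * generations.
    """
    return total_mutations / n_haplotypes / rate_per_haplotype


def logarithmic_estimate(n_haplotypes, n_unmutated, rate_per_haplotype):
    """Generations to the common ancestor from the unmutated fraction.

    Assumed reading: the share of haplotypes still matching the base
    haplotype decays as exp(-rate * generations). No mutation counts
    are needed, only how many haplotypes are unchanged.
    """
    if n_unmutated <= 0:
        raise ValueError("needs at least one haplotype matching the base")
    return math.log(n_haplotypes / n_unmutated) / rate_per_haplotype


def estimates_agree(a, b, tolerance=0.10):
    """True if the two estimates differ by at most `tolerance` (relative)."""
    return abs(a - b) <= tolerance * max(a, b)


if __name__ == "__main__":
    # Made-up numbers: 100 haplotypes, 50 identical to the base haplotype,
    # 69 mutations counted in total, rate 0.05 per haplotype per generation.
    lin = linear_estimate(69, 100, 0.05)
    log = logarithmic_estimate(100, 50, 0.05)
    print("linear:", lin, "logarithmic:", log,
          "agree:", estimates_agree(lin, log))
```

Note that the linear estimate depends on the counted mutations, which is
exactly where the genetic distance versus real mutation question bites, while
the logarithmic estimate depends only on the count of unchanged haplotypes.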

I hope the above run-through of my reading of others is close enough to
reality that it at least lets other people point to the errors.

Best Regards

