From: "Alister John Marsh" <>
Subject: Re: [DNA] Variance Assessment wrt back and parallel mutations
Date: Tue, 16 Feb 2010 20:15:47 +1300
References: <><C73CADC1F38E4147A484F459509F2474@anatoldesktop>
In-Reply-To: <C73CADC1F38E4147A484F459509F2474@anatoldesktop>


I guess we have been thrashing this subject out for a while. I have learned
a lot of positive things from the discussion, so it has been worthwhile for
me.

I don't dismiss your system. Rather, I think that when used selectively, in
conjunction with other contextual information, it may be very useful in the
majority of cases. I quite like the basis of your system; my comments related
more to the details than to the concept.

To summarize my views, I think that your system would work well in cases
where an ancestral haplotype can be inferred with reasonable accuracy.
Clearly there will be other cases where circumstances make the ancestral
haplotype very ambiguous, and these may not suit your system.

Keep up the good work in this area.


-----Original Message-----
[mailto:] On Behalf Of Anatole Klyosov
Sent: Tuesday, February 16, 2010 5:45 PM
Cc: Anatole Klyosov
Subject: Re: [DNA] Variance Assessment wrt back and parallel mutations

>From: "Alister John Marsh" <>
>Just a brief comment on the significance of back and parallel mutations in
>Anatole's TMRCA calculations.
>But the thing that concerns me is that Anatole appears to use an "assumed
>ancestral haplotype" as the basis for counting mutations, and I don't think
>he always picks the right ancestral haplotype if there are parallel
>mutations scattered in the field.

>In the example I put to him a few days ago, there were, I believe, a few
>parallel mutations which caused Anatole, in my view (he disagrees), to
>misidentify the ancestral haplotype. Anatole appears to have superficially
>examined the data and, because he did not think the matter of back
>mutations through, misidentified the probable ancestral haplotype.

Dear John,

There were no "parallel mutations" in your dataset. There were a few
scattered "regular" mutations, and two haplotypes out of 11 clearly showed a
different pattern and belonged to a different lineage, though one not very
distant from the "main" lineage: some 400 years apart.

I repeat that you have no grounds to assign any mutation there to
"back mutations". Those are just normal one-step mutations. Apparently you
have persuaded yourself that there must be back mutations there, but that is
incorrect. The probability of even a one-step mutation in your system is
very small: there are only 13 mutations among 333 alleles in the main
branch, and this corresponds to 400 years to a common ancestor for the
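A minimal sketch of the linear mutation-count arithmetic behind a figure like "400 years". It rests on assumptions not stated in the message: that the main branch is nine 37-marker haplotypes (9 x 37 = 333 alleles), an average rate of 0.09 mutations per 37-marker haplotype per generation, 25-year generations, and no back-mutation correction. The function name is hypothetical.

```python
def linear_tmrca_years(total_mutations, n_haplotypes, rate_per_haplotype,
                       years_per_generation=25.0):
    """Linear (mutation-count) TMRCA estimate with no back-mutation correction.

    Average mutations per haplotype divided by the assumed per-haplotype,
    per-generation rate gives generations, which are then scaled to years.
    """
    mutations_per_haplotype = total_mutations / n_haplotypes
    generations = mutations_per_haplotype / rate_per_haplotype
    return generations * years_per_generation


# Assumed inputs: 13 mutations in 9 haplotypes x 37 markers = 333 alleles,
# with an ASSUMED rate of 0.09 mutations per 37-marker haplotype per generation.
years = linear_tmrca_years(13, 9, 0.09)
print(round(years))  # 401
```

Under those assumptions the result lands at roughly 400 years; a different rate constant or generation length would scale it proportionally.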

If you look at my correction table, calculated from the probability of back
mutations, you will see a zero correction for back mutations next to 400 ybp.
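Why a back-mutation correction is negligible at this mutation density can be illustrated with a simple model. This is not necessarily the model behind the correction table; it assumes a star-like genealogy in which each of the 333 alleles is an independent lineage from the base haplotype, Poisson-distributed mutation counts, and symmetric single-step mutations. A back mutation needs at least two hits on the same allele. The function names are hypothetical.

```python
import math


def expected_multi_hit_alleles(total_mutations, n_alleles):
    """Expected number of alleles carrying two or more mutations.

    Assumes each allele's mutation count is Poisson with mean
    total_mutations / n_alleles (star-like genealogy).
    """
    lam = total_mutations / n_alleles
    p_two_or_more = 1.0 - math.exp(-lam) * (1.0 + lam)
    return n_alleles * p_two_or_more


def expected_reversions_from_double_hits(total_mutations, n_alleles):
    """Expected alleles returned to the base value by exactly two hits.

    Under a symmetric single-step model, two steps cancel (+1 then -1,
    or -1 then +1) with probability 1/2. Three or more hits are ignored,
    since they are rarer still at this mutation density.
    """
    lam = total_mutations / n_alleles
    p_exactly_two = math.exp(-lam) * lam ** 2 / 2.0
    return n_alleles * p_exactly_two * 0.5


print(round(expected_multi_hit_alleles(13, 333), 2))            # 0.25
print(round(expected_reversions_from_double_hits(13, 333), 2))  # 0.12
```

Under these assumptions, fewer than one allele in the whole branch is expected to carry a double hit, which is consistent with (though not a derivation of) a zero correction at this depth.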

Now, to the ancestral haplotype. There are nine "35"s, one "34" and
one "36" in the allele row. Which one is the ancestral one?

In the next column there are eight "39"s, one "38" and two "37"s, but the
latter belong to a different branch. Which one is the ancestral one in the
main branch?
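Choosing a base allele per marker by majority vote (the modal value) and counting single-step distances from it can be sketched as follows. The allele tallies are the two columns described above; which haplotype carries which value is not given, so only the tallies are used. The function names are hypothetical, and treating every difference as |allele - base| single steps is an assumption of the sketch.

```python
from collections import Counter


def modal_allele(alleles):
    """Most common repeat count in one marker column (ties: first seen wins)."""
    return Counter(alleles).most_common(1)[0][0]


def step_mutations(alleles, base):
    """Single-step mutation count: sum of |allele - base| over the column."""
    return sum(abs(a - base) for a in alleles)


# First column, all eleven haplotypes: nine "35", one "34", one "36".
col_a = [35] * 9 + [34, 36]
# Next column, main branch only: eight "39" and one "38"
# (the two "37" alleles are assigned to the other branch).
col_b_main = [39] * 8 + [38]

for col in (col_a, col_b_main):
    base = modal_allele(col)
    print(base, step_mutations(col, base))
# 35 2
# 39 1
```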

Generally speaking, a mistake can be made in anything. A mistake can be made
in assigning a wrong haplotype as the ancestral one, particularly in small
datasets like yours. That is why I avoid the word "ancestral" and call it
"base". However, in your case the situation with the base haplotype is very

You repeatedly say that by picking a wrong base haplotype I can miss the
boat by a factor of 2. Big deal. Tell Lev Zhivotovsky about it; he
constantly misses the boat by a factor of 3. Everyone in the "academic"
world following the Zhivotovsky "procedure" missed it by a factor of 3.
Against that background, a factor of 2 does not look too bad. Of course it
can happen. However, I very seldom work with 11-haplotype datasets. I prefer
hundreds of them, and with those a mistake with the base haplotype is very
unlikely.

I suggest you reconsider your family dataset, unless you have solid proof
from family records. However, who can guarantee that the family records
reflect reality?

He appears to have
simplistically assumed that the modal haplotype, distorted by the
overrepresentation of one branch of a two-branch family among those tested,
was the ancestral haplotype. If he had recognized that parallel mutations
affect his system, he could have looked deeper and found what is, in my
view, more likely to have been the ancestral haplotype. Or at the very
least, he could have recognized that the ancestral haplotype was ambiguous,
and that the alternatives might give results differing by a factor of 2. He
could have qualified his results by stating that if the alternative
ancestral haplotype applied (which had perhaps a 50% chance of applying),
his calculation might be out by a factor of 2.
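How strongly the mutation count depends on the choice of base allele can be illustrated with the second column quoted in this thread: eight "39"s, one "38" and two "37"s. This is only per-marker arithmetic under a single-step counting assumption; it does not say which base is correct for this family, and the helper name is hypothetical.

```python
def step_mutations(alleles, base):
    """Single-step mutation count: sum of |allele - base| over the column."""
    return sum(abs(a - base) for a in alleles)


# All eleven values from the second column quoted in this thread.
column = [39] * 8 + [38] + [37] * 2

for candidate_base in (39, 38):
    print(candidate_base, step_mutations(column, candidate_base))
# 39 5
# 38 10
```

On this one column, shifting the base by a single repeat doubles the count. Across a full haplotype, the overall effect depends on how many markers are ambiguous in this way.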

To be fair to Anatole, I was aware of some partial genealogical, historical,
and geographical information concerning the example I put to him. I have
been studying this family for decades, which gave me a considerable
advantage over him. However, I think it showed that by simplistically
looking at the raw DNA data, without the contextual genealogical,
historical, and geographical data, and without considering the potential
effect of back mutations, it was possible to be out by something of the
order of a factor of 2.

In my view, Anatole's system might work tolerably well if the ancestral
haplotype can be correctly identified. But if parallel mutations are
lightly dismissed as just ordinary "vertical" mutations, then Anatole is
missing an opportunity to improve the outcomes of his system.

Sometimes the ancestral haplotype can be inferred by considering parallel
mutations. But probably more often, parallel mutations make things very
ambiguous; selecting the ancestral haplotype then becomes a lottery, and
picking the wrong haplotype might change the mutation count by a factor of

Anatole says that if the logarithmic method agrees with the mutation count
method, all is well. I can't comment on that; the maths is well outside my
comfort zone. But "I think" that in the genealogical time frame
(particularly with fewer than a dozen random mutations), the play of feral
mutations might cause the logarithmic and mutation count methods to match
by chance even if both are wrong. But I expect Anatole to remind me that
what "I think" has no place in mathematics. He is probably right, and I
take no offence if he should remind me of that.
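Whether the two methods can agree by chance while both are wrong could be explored with a small simulation. This is a sketch under stated assumptions, not Anatole's procedure: a star-like genealogy, Poisson mutation counts per haplotype, a linear estimate (mutations per haplotype divided by the rate) and a logarithmic estimate of the form ln(n / n_base) / k, where n_base is the number of haplotypes with no mutations from the base. The rate k, sample size, true age, tolerances and all function names are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def linear_estimate(total_mutations, n, k):
    """Generations from average mutations per haplotype (no correction)."""
    return total_mutations / (n * k)


def log_estimate(n, n_base, k):
    """Generations as ln(n / n_base) / k; None when no base haplotypes remain."""
    if n_base == 0:
        return None
    return math.log(n / n_base) / k


def agree_but_wrong_fraction(n=9, k=0.09, true_generations=16,
                             agree_tol=0.15, wrong_tol=0.30,
                             trials=10_000, seed=1):
    """Fraction of simulated samples in which the two estimates agree within
    agree_tol of each other, yet both miss the true age by more than wrong_tol.
    All parameter defaults are hypothetical.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Mutations accumulated on each haplotype since the common ancestor.
        counts = [poisson(rng, k * true_generations) for _ in range(n)]
        lin_est = linear_estimate(sum(counts), n, k)
        log_est = log_estimate(n, counts.count(0), k)
        if log_est is None:
            continue  # logarithmic method undefined for this sample
        agree = abs(lin_est - log_est) <= agree_tol * max(lin_est, log_est)
        both_wrong = all(abs(e - true_generations) > wrong_tol * true_generations
                         for e in (lin_est, log_est))
        if agree and both_wrong:
            hits += 1
    return hits / trials


print(agree_but_wrong_fraction())
```

Whatever fraction this prints depends entirely on the assumed parameters; it only shows how such a check could be set up, not how often chance agreement happens in real data.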
