Archiver > GENEALOGY-DNA > 2010-02 > 1266293575

From: David Ewing <>
Subject: Re: [DNA] Variance Assessment wrt back and parallel mutations
Date: Mon, 15 Feb 2010 21:12:55 -0700

Hooboy. I have about run out of steam, but here is what I hope will be a
final rejoinder.

>>From: David Ewing <>
>>I hesitate to post the images Anatole sent me
without his permission...

>No problem, David, it is yours.
[[[DNE: Anatole's trees of the Ewing data are now available at and
I used the linear version, but these are alternative conformations of the
same tree.]]]

>>It (the tree) shows nodes, branches and the locations of the
terminal haplotypes, but does not label the branches with mutations.

>There was no any need for me, ever... All haplotypes are numbered and are
easily located on the tree. Branches cannot be labeled since they are often
a matter of interpretation. If the machine would label branches for me, I
would have turned this option off.
[[[DNE: Of course branches are "a matter of interpretation." To decide what
is an earlier and what is a later branch, one must make an assumption about
which was the earlier mutation. That labels are not printed on the tree does
not mean that the tree drawing program did not know which marker
corresponded to which branch. Branches are DETERMINED by mutations and they
certainly can be labeled. If they are labeled in some other way, this would
change the structure of the tree.]]]

>>Anatole has repeatedly reassured us that there is no need to worry about
>>parallel mutations, but 7 of the 21 mutations in this fraction of the tree
>>he sent to me are parallel mutations, which is a full third of them.

>Why wouldn't you call them "perpendicular" mutations? What sense do you
attribute to those mutations, different from any other mutations? They might
belong to a certain lineage within a family, in that case they would form a
separate branch. Among those 509 of L21 haplotypes there are probably plenty
of those "parallel" mutations, however, they all are described by the
first-order dynamics, and present an uncomplicated case. What is so special
in those "parallel" mutations?
[[[DNE: This is the key point. We say a "parallel mutation" has occurred
when we have found two haplotypes that have the same value at a given marker
by coincidence rather than by descent. We do not call these "perpendicular
mutations" because no one would know what we are talking about. We could
call them "recurrent mutations," which is a synonym for "parallel
mutations," though the latter is in more common use among people who
participate on this list. I did not invent the term "parallel mutation." It
has a well-defined meaning. As you have often pointed out, in principle we
could call such mutations anything we like as long as we understand what we
are talking about. Indeed, we could all agree to begin using the term "cow"
to refer to what hitherto had been known as a "cat," but why would we do
that?

[[[What is "so special about parallel mutations" is that they introduce
confusion as to descent. If there were NO parallel mutations, then when we
found a given marker value in several haplotypes, we could be sure that all
individuals having it had a common ancestor who had it. As it is, when we
find a given marker value in several haplotypes, we have to wonder whether
they all inherited this from a common ancestor or if the mutation giving
rise to it arose independently, by coincidence, in some unrelated lineages.
The question I am asking, have been asking, is how often does this happen? I
have understood you to say, "not too often," though it appears now that I
may be mistaken, and you claim below that you implied rather that parallel
mutations are a "non-issue." My work has led me to believe that parallel
mutation happens plenty often. The trees you sent me showed about 1/3 of all
mutations occurred, by coincidence, in parallel with identical mutations in
other lines. This is definitely not a "non-issue" because it strikes at the
heart of being able to make claims about shared common ancestors based on
congruent marker values.]]]

>>... so we will be able to demonstrate that although Anatole does not
>>believe parallel mutations happen very often...

>It is NOT what I said. I said that back mutations do not contribute into a
TMRCA calculations during the first 26 generations, and contribute very
little, within the margin of error up to 2,000 ybp. Regarding "parallel"
mutations I only said (or implied) that I do not want to pay attention at
them because they are non-issue. If they cause separate branch formation,
fine, I just analyze that branch. I do not call them "parallel" since they
are just mutations, as anything else.
[[[DNE: You have advised us repeatedly not to create trees "by eye." But
that is exactly what your phylogeny program seems to have done. The four
"main" branches it identified in the Ewing data were defined by DYS 439 = 12
(the "oldest branch"), DYS 576 = 19 (the second branch you listed),
DYS 391 = 10 (the third branch you listed, which corresponds closely to
what I have called Ewing Group 2), and a residual group (your fourth branch,
consisting of haplotypes that did not have one of these--or at least mostly
did not).

[[[How were these markers chosen to define the main branches? It appears
that the three markers with the greatest number of matching off-modal values
were chosen, except that CDY markers were excluded. In the Ewing data we are
working with, there were:
27 haplotypes with D391=10
11 haplotypes with D439=12
9 with D576=19
11 with CDYa=36
10 with CDYb=37
8 with CDYa=35
6 with CDYb=39
5 with CDYa=38
On what basis were the CDY markers excluded as markers for "major" branches?
The mutation rates are an order of magnitude faster, but is that a criterion
for making a selection in shallow lineages such as most of ours? And what
makes the following merely "minor" branch markers:
7 with D439=14 and 7 with YIIb=22,
6 with D439=30, 6 with D576=17 and 6 with D442=12?
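
The tally above can be reproduced mechanically. What follows is only a minimal sketch in Python, not the method used by any phylogeny program discussed here. It assumes each haplotype is a dict of marker name to repeat count, takes the modal value of a marker to be its most common value in the data set, and uses a small made-up data set (TOY, with hypothetical haplotype IDs), not the Ewing data.

```python
from collections import Counter

def off_modal_counts(haplotypes):
    """Return {(marker, value): n} for each off-modal value carried by n haplotypes.

    Assumption: the modal value of a marker is simply its most common value
    in this data set (ties are broken arbitrarily by Counter).
    """
    counts = {}
    markers = sorted({m for h in haplotypes.values() for m in h})
    for marker in markers:
        tally = Counter(h[marker] for h in haplotypes.values() if marker in h)
        modal_value = tally.most_common(1)[0][0]
        for value, n in tally.items():
            if value != modal_value:
                counts[(marker, value)] = n
    return counts

# Hypothetical toy data, for illustration only.
TOY = {
    "h1": {"DYS391": 10, "DYS439": 13, "CDYa": 37},
    "h2": {"DYS391": 10, "DYS439": 12, "CDYa": 36},
    "h3": {"DYS391": 10, "DYS439": 13, "CDYa": 37},
    "h4": {"DYS391": 11, "DYS439": 13, "CDYa": 37},
    "h5": {"DYS391": 11, "DYS439": 13, "CDYa": 36},
    "h6": {"DYS391": 11, "DYS439": 13, "CDYa": 37},
    "h7": {"DYS391": 11, "DYS439": 13, "CDYa": 37},
}

counts = off_modal_counts(TOY)
# Rank candidate "branch markers" by how many haplotypes share the off-modal value.
ranked = sorted(counts.items(), key=lambda kv: -kv[1])
for (marker, value), n in ranked:
    print(f"{n} haplotypes with {marker}={value}")
```

A ranking like this says nothing by itself about where "major" ends and "minor" begins, or whether fast markers such as CDY should be left out; those remain choices the analyst has to state.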

[[[Further, if the first three matching off-modal marker values mentioned
above took precedence, why weren't all haplotypes with these values included
in the respective branches? I think the answer to this is that some of the
haplotypes had two or more of these values, so there was no clear basis for
assigning them to one or another branch--and the marker for whichever branch
was not chosen showed up as a parallel mutation in the branch that was
chosen.

[[[This is the problem: as soon as you define branches based on whichever
markers, you are going to find haplotypes that have two or more of the
defining markers, such that assigning any such haplotype to one branch
forces you to identify the other marker as having resulted from a parallel
mutation. And this happens A LOT--1/3 of the time in the tree Anatole's
program made from the Ewing data.
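
The forced choice described above can be illustrated with a short Python sketch. It is a deliberate simplification under stated assumptions, not how Anatole's program or any real tree-building software works: branches are defined by a priority-ordered list of (marker, value) pairs, each haplotype goes to the first branch whose defining value it carries, and any other defining value it carries is then booked as a parallel mutation. The function name, the data and the haplotype IDs are all hypothetical.

```python
def assign_branches(haplotypes, defining):
    """Assign each haplotype to the first matching branch in `defining`.

    Returns (branches, parallels): branches maps a defining (marker, value)
    pair, or "residual", to a list of haplotype IDs; parallels lists the
    (haplotype ID, (marker, value)) pairs that must be explained as
    parallel mutations because the haplotype was placed in another branch.
    """
    branches = {}
    parallels = []
    for hid, h in haplotypes.items():
        hits = [d for d in defining if h.get(d[0]) == d[1]]
        key = hits[0] if hits else "residual"
        branches.setdefault(key, []).append(hid)
        parallels.extend((hid, d) for d in hits[1:])
    return branches, parallels

# Hypothetical toy data: h2 carries BOTH branch-defining values.
TOY = {
    "h1": {"DYS391": 10, "DYS439": 13},
    "h2": {"DYS391": 10, "DYS439": 12},
    "h3": {"DYS391": 11, "DYS439": 12},
    "h4": {"DYS391": 11, "DYS439": 13},
}

for order in ([("DYS391", 10), ("DYS439", 12)],
              [("DYS439", 12), ("DYS391", 10)]):
    branches, parallels = assign_branches(TOY, order)
    print(order[0], "first:", branches, "forced parallel:", parallels)
```

Whichever marker is given priority, h2 ends up with one defining value that has to be explained as a parallel mutation; only the choice of which one changes.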

[[[Does this affect the TMRCA calculation? I do not know. Anatole has told
us that it is important to find the major branches and do separate TMRCA
calculations on the separate major branches. He has not told us what
constitutes a "major" branch. We can parse the tree in a lot of different
ways. Should we use the most slowly mutating markers? (Probably not--but
then why leave out CDY?) Should we use the markers where there are the
greatest number of off-modal matches, as Anatole's program seems to have
done? If so, how do we decide whether there should be 3, 4, 5, 6 or 7 "main
branches"? Should we define branches by choosing only markers where at least
10% of the haplotypes in the data set have an off-modal match at that
marker? If we are not going to just eyeball it, we must have an


>Anatole Klyosov

[[[DNE: Thank you for your ongoing patience and kindness. Warm Regards,
David Ewing]]]
