GENEALOGY-DNA-L Archives


From: David Ewing
Subject: [DNA] Variance Assessment wrt back and parallel mutations
Date: Wed, 10 Feb 2010 21:20:25 -0700


I have been following with some interest the recent dialog between Alister
Marsh, Anatole Klyosov and others. Some time ago Anatole was very helpful
and patient with me off-list in estimating TMRCA on our good-sized set of
Ewing data, and I am not going to ask about that again here. What I need
help understanding is why I cannot make our data square with the expectation
that back and parallel mutations should not be a significant factor in
relatively shallow lineages.

I have made our data easily accessible for anyone who wants to look closely
at it. I have gathered all 83 of our R:M222 37-marker haplotypes into one
spreadsheet, which is available at
http://dl.dropbox.com/u/431003/M222%2B%20Ewings.xls
There are three tabs, one each for Ewing Group 1, Group 2 and Group 3.
Details of how our Groups are constituted can be found at
http://dl.dropbox.com/u/431003/Results_Intro.pdf
but basically Groups 1 & 2 consist of 79 men within GD 5 of their own modal,
which differs from the R:M222 modal at seven markers, including CDYa/b.
Group 2 differs from Group 1 only in that all men in Group 2 have DYS 391 =
10 rather than DYS 391 = 11. Group 3 consists of four less closely related
men who do not have most of the distinctive "Ewing" markers. Subgroups
designated by lower case letters are assigned on the basis of conventional
genealogic connections--so: Y-DNA results get our men into a numbered Group
and conventional lineages get them into a lettered sub-group.
If anyone should want to see all of our data, including shorter and longer
panels, special markers, SNP testing and Ewing participants who are in other
haplogroups, this is also available at
http://dl.dropbox.com/u/431003/RawData.xls
but this message will consider only the 37-marker data on Ewings in R:M222.
(And if someone should want to see Relationship Diagrams showing
conventional genealogic lineages, please look at our website
http://www.ewingfamilyassociation.org/DNA_Project/index_Y-DNA.html
or contact me off-list and I will send you a link.)
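
The grouping rule described above (men within GD 5 of their own modal, with Group 2 split off by DYS 391 = 10) can be sketched roughly as follows. This is only an illustration: the haplotypes are made-up toy values, not Ewing data, the function names are hypothetical, and the distance is a plain sum of stepwise differences. The message does not say which counting convention was used for multi-copy markers such as CDYa/b, so that part is an assumption.

```python
from collections import Counter

# Toy haplotypes (hypothetical values, four markers only; not real Ewing data).
HAPLOTYPES = [
    {"DYS391": 11, "DYS439": 12, "CDYa": 37, "CDYb": 38},
    {"DYS391": 11, "DYS439": 12, "CDYa": 37, "CDYb": 38},
    {"DYS391": 10, "DYS439": 12, "CDYa": 37, "CDYb": 39},
    {"DYS391": 11, "DYS439": 13, "CDYa": 36, "CDYb": 38},
]

def modal_haplotype(haplotypes):
    """Most common repeat count at each marker (ties go to the value seen first)."""
    markers = haplotypes[0].keys()
    return {m: Counter(h[m] for h in haplotypes).most_common(1)[0][0]
            for m in markers}

def genetic_distance(a, b):
    """Assumed convention: sum of absolute repeat-count differences per marker."""
    return sum(abs(a[m] - b[m]) for m in a)

modal = modal_haplotype(HAPLOTYPES)
distances = [genetic_distance(h, modal) for h in HAPLOTYPES]

# Men within GD 5 of the modal; those with DYS391 = 10 would fall in "Group 2".
within_gd5 = [d <= 5 for d in distances]
group2_like = [h["DYS391"] == 10 for h in HAPLOTYPES]

print(modal)
print(distances)
```

With real data one would read the 37-marker rows from the spreadsheet instead of the hard-coded list.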

Now, 83 haplotypes is a pretty big bite to chew, but this is not really what
I want you to chew on. I have prepared a different kind of analysis and
chart that I do want you to chew on. Some of you saw this at the last FTDNA
conference in Houston. This is a hypothetical phylogeny diagram in which I
have forced myself to put mutations in chronological order. It can be found
at
http://dl.dropbox.com/u/431003/EwingM222%2BTree.pdf
or, if you want to manipulate it and move stuff around, at
http://dl.dropbox.com/u/431003/EwingM222%2BTree.xls
The Excel version does not exactly match the pdf version, because I have
continued moving things around, making different hypotheses and trying to
minimize parallel and back mutations. On both charts, mutations in black
type are unique, those in red type are back mutations (or sometimes
simultaneously back and parallel), and those in blue type are parallel
mutations.

This will not be a familiar sort of diagram to many of you, so please take
some time to understand what is happening. At the very top of the chart is a
box designated R1b, which represents the AMH. The R:M222 and Ewing Group 4a
modals both differ from the AMH at DYS456 = 16, so you can see this on a
line under the R1b box before a branch. The R:M222 modal differs from the
AMH at another 11 markers, and these mutations are shown on the vertical
line leading down to the box designated M222+, which represents the R:M222
modal. Below that, you can see CDYa=37 and CDYb=38 printed in red on a
vertical line before the next branch point. These are shown in red because
to make sense of the diagram, these must be back mutations. The AMH has
CDYa/b = 37/38, the R:M222 modal has CDYa/b = 38/39, but the Ewing modal and
most of the men in Ewing Group 3 have CDYa/b = 37/38 ("back" to the AMH
values). (I think it is not necessary to stop and ponder whether the
observed difference is due to a single two-step mutation from CDY 37 to 39,
but we should note that we could avoid showing these as back mutations by
putting the branch ABOVE the M222+ box and the two earlier CDYa/b mutations,
thus implicitly arguing that Ewing branched off before the R:M222 modal was
established. Still, I do not want to get bogged down in discussion of
CDY--the meat of my question is in the other markers.) All of the Ewing men
at and below the row of yellow boxes designated Ewing modal differ from the
AMH at all of the markers listed on the vertical lines above, except where a
back mutation takes one or another of them back.

Below the branch off to Group 3 are the five markers that we find most
useful in distinguishing Ewings from other R:M222 lineages. You should be
able to see easily why we think the men in Group 3 are not closely related
to the others--they do not have these five markers. You may notice that not
all of the men in the data tables have been included in the phylogeny chart.
This is only because it is a monumental pain in the butt to update, and I
have not gotten around to adding the newest data. Still, there are enough
here that you can see what is going on.

Ideally, one would construct a maximum parsimony tree to minimize the number
of mutations and eliminate as many putative parallel mutations as possible.
But I constructed this chart based partly on information in the conventional
lineages, so sometimes the maximum parsimony rule is violated. The important
things to notice are that (1) any haplotype can be placed anywhere on the
chart if you are willing to accept as many back and parallel mutations as
may be required to force the fit, and (2) placement of a haplotype on the
chart constitutes a hypothesis about descent and the order of mutations. We
should be very suspicious of any line that contains a large number of back
and/or parallel mutations, because these should be rare events.
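
To make points (1) and (2) concrete, here is a minimal sketch of what "placing a haplotype" implies: attaching it under any box is always possible, and each choice simply commits you to a different list of mutations. The node names and repeat counts are hypothetical toy values, and `mutations_needed` is an illustrative helper, not anything taken from the chart.

```python
# Hypothetical candidate attachment points (toy values, three markers only).
CANDIDATE_PARENTS = {
    "group1_modal": {"DYS391": 11, "DYS439": 12, "DYS576": 18},
    "group2_modal": {"DYS391": 10, "DYS439": 12, "DYS576": 18},
    "group3_member": {"DYS391": 11, "DYS439": 13, "DYS576": 17},
}

# Hypothetical haplotype we want to place on the chart.
NEW_HAPLOTYPE = {"DYS391": 10, "DYS439": 12, "DYS576": 19}

def mutations_needed(parent, child):
    """Markers that must change, as marker -> (parent value, child value)."""
    return {m: (parent[m], child[m]) for m in child if child[m] != parent[m]}

# Every placement "works"; they differ only in how many mutations they imply.
placements = {name: mutations_needed(hap, NEW_HAPLOTYPE)
              for name, hap in CANDIDATE_PARENTS.items()}
ranked = sorted(placements, key=lambda name: len(placements[name]))

for name in ranked:
    print(name, placements[name])
```

Ranking by raw count is only the first step; it does not yet say whether the implied mutations are unique, parallel, or back mutations elsewhere on the chart.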

I have spent more hours than I care to admit moving haplotypes around on
this chart trying to minimize the number of back and parallel mutations
required. They just won't go away. By one count, of 76 total mutations
below the Ewing modal haplotype, 15 were back mutations (8 of these at CDY),
39 were parallel mutations (10 of them at CDY), and only 22 were unique
mutations. Try moving a box somewhere else on the chart and see what
happens. I have found that the easiest way to check for back and parallel
mutations is to use the find function in Excel to look for where else a
mutation appears on the chart.
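
The same check can be sketched in code rather than with the Excel find function. The tree below is a hypothetical toy (four boxes under a modal), and the definitions are assumptions made for the sketch: a mutation is called "back" if it returns a marker to a value held by an ancestor above the parent, "parallel" if the same marker and new value also appear on another line of the tree, and "unique" otherwise. A mutation that is both is reported as "back", matching the red-over-blue convention on the chart.

```python
from collections import Counter

# Hypothetical toy tree: node -> (parent, haplotype). Not real Ewing data.
TREE = {
    "modal": (None,    {"DYS391": 11, "DYS439": 12, "CDYa": 37}),
    "A":     ("modal", {"DYS391": 10, "DYS439": 12, "CDYa": 37}),
    "A1":    ("A",     {"DYS391": 11, "DYS439": 12, "CDYa": 37}),
    "B":     ("modal", {"DYS391": 10, "DYS439": 12, "CDYa": 37}),
    "C":     ("modal", {"DYS391": 11, "DYS439": 13, "CDYa": 37}),
}

def ancestors(tree, node):
    """Yield the ancestors of node, from its parent up to the root."""
    parent = tree[node][0]
    while parent is not None:
        yield parent
        parent = tree[parent][0]

def edge_mutations(tree):
    """List (node, marker, old, new) for every parent-to-child difference."""
    found = []
    for node, (parent, hap) in tree.items():
        if parent is None:
            continue
        parent_hap = tree[parent][1]
        for marker, value in hap.items():
            if value != parent_hap[marker]:
                found.append((node, marker, parent_hap[marker], value))
    return found

def classify(tree):
    """Label each mutation as 'back', 'parallel' or 'unique' (see assumptions)."""
    muts = edge_mutations(tree)
    seen = Counter((marker, new) for _, marker, _, new in muts)
    labels = {}
    for node, marker, _, new in muts:
        parent = tree[node][0]
        is_back = any(tree[a][1][marker] == new for a in ancestors(tree, parent))
        is_parallel = seen[(marker, new)] > 1
        labels[(node, marker)] = ("back" if is_back
                                  else "parallel" if is_parallel else "unique")
    return labels

labels = classify(TREE)
tally = Counter(labels.values())
print(labels)
print(tally)
```

In this toy, moving box B under box A would turn its "parallel" mutation into no mutation at all, which is the same kind of move described for participant GR below.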

As an example of this, you can see two alternative parsings of part of Group
2 (shaded green) below participant JC2--one version attached to the rest of
the chart and a second version floating free below and to the right (only on
the pdf version). In some cases, it is easy to eliminate a parallel mutation
by moving a box. Consider participant GR, who appears in the rightmost
yellow box. You can see that we have called out a parallel mutation to DYS
391 = 10. To get rid of that, all we have to do is move him below the Group
2 modal--over between JC2 and WS to his right, for example (incidentally,
horizontal placement on the chart means nothing; the action is all
vertical). The problem is that this does violence to his conventional
genealogy. But maybe his conventional genealogy is wrong. Indeed, working
with this method raises this very question.

Here is a challenge: put MS (the lowest yellow box on the chart, more or
less nine columns from the left) somewhere on the chart so that no back or
parallel mutations are required.

I have satisfied myself that no amount of fooling around is going to reduce
the number of parallel and back mutations to a negligible number on this
chart, even if I completely forget about conventional genealogy and
construct a maximum parsimony tree. Indeed, I am confident that there is no
unique maximum parsimony tree. I first noticed this problem when working
with network diagrams, but they are simply too tangled to fuss with when you
have this many haplotypes.

Anatole likes point blank questions. Why can I not draw a chart like this
from our data that has a relatively small number of back and parallel
mutations?

Warm Regards,

David Ewing

