From: "Alister John Marsh"
Subject: Re: [DNA] TMRCA assessments
Date: Mon, 15 Feb 2010 08:41:19 +1300
Anatole,

Yes, I think we are making progress. My responses to your latest comments
are as follows...

BACK MUTATIONS: You said
>>>>>>>
(Anatole) John, it is nice to have guts, and maybe even nicer to have gut
feeling. However, let's strike them out in context of this discussion. Let's
stick to "a little basic maths". Frankly, I doubt that you have used it
here. I am not sure that you have "in your hands" a math equation showing a
contribution of back mutations compared to "forward" mutations on a time
scale. Because, if you would have considered it, you would have known that
10% "addition" due to back mutations occur far beyond the family study time
periods.
<<<<<<<

I will explain to you what I was meaning when I said I had used a "little
bit of basic maths".

I have not checked what the current estimates of mutation rates are on the
faster markers, and they may have changed a bit from when I originally
extracted data. However, some time ago, the following was quoted as the
mutation rates for the following markers...
DYS576= 0.015 (1 mutation every 66 transmissions)
DYS570= 0.014 (1 mutation every 71 transmissions)
CDYa= 0.017 (1 mutation every 59 transmissions)
CDYb= 0.017 (1 mutation every 59 transmissions).

In the example I gave you to examine a few days ago, one subset of the group
were 9 haplotypes, mostly having a comb shaped tree with a common ancestor
about 400 years back according to your estimate. I personally think the
common ancestor may have been earlier, but let's go with 400 years for now.
The average age of participants may have been near 70 years, so that is
about 330 years, or say 11 birth/mutation opportunities in each line on each
marker. That means that for the group of 9 haplotypes, there are 9x11=99
opportunities for each marker to mutate. Since this is just "basic maths",
and I have made assumptions about the number of mutation opportunities, let's
just call that a round number of 100 mutation opportunities per marker in
the tree. If there were 100 mutation opportunities per marker in a 330 year
period, well within the "genealogical time frame", then the markers would
have likely mutated the following number of times...
DYS576= 1.5 times
DYS570= 1.4 times
CDYa= 1.7 times
CDYb= 1.7 times.

If the time period was 660 years, 22 generations, so about 200 mutation
opportunities per marker, the markers would likely have mutated the
following number of times...
DYS576= 3.0 times
DYS570= 2.8 times
CDYa= 3.4 times
CDYb= 3.4 times.
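
The expected counts in the two lists above follow from multiplying each
quoted rate by the number of transmissions in the tree. A minimal Python
sketch of that arithmetic, assuming a 30-year generation (implied by 330
years giving 11 transmissions) and using hypothetical helper names:

```python
# Quoted per-transmission mutation rates for the four fast markers.
RATES = {"DYS576": 0.015, "DYS570": 0.014, "CDYa": 0.017, "CDYb": 0.017}

def opportunities(lines, years, years_per_generation=30):
    """Transmissions in a comb-shaped tree: one chain of births per line.

    The 30-year generation is an assumption implied by 330 years ~ 11 births.
    """
    return lines * round(years / years_per_generation)

def expected_mutations(rates, n_opportunities):
    """Expected mutation count per marker = rate x opportunities."""
    return {marker: rate * n_opportunities for marker, rate in rates.items()}

if __name__ == "__main__":
    print(opportunities(9, 330))   # 9 lines x 11 births, rounded to 100 in the text
    for n in (100, 200):           # 330-year and 660-year trees
        counts = expected_mutations(RATES, n)
        print(n, {m: round(c, 1) for m, c in counts.items()})
```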

By my rough calculations, this means there is about 10% chance that one
person of the 9 will have two mutations on one marker, so about 5% chance
that there will be a back mutation. This could be a 5% chance that the
mutation count was out by 2 mutations, or more, depending on where the
mutation occurred in the tree, and whether it had more than one sub branch
below it.
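
One way to check the rough 10% and 5% figures is a binomial calculation.
The sketch below is only an illustration under stated assumptions: the 9
lines are treated as independent (a pure comb-shaped tree), each line has 11
transmissions, and a second mutation on the same marker is assumed equally
likely to reverse the first or extend it. The function names are
hypothetical.

```python
def prob_two_or_more(rate, generations):
    """Binomial chance of at least 2 mutations on one marker in one line."""
    p_none = (1 - rate) ** generations
    p_one = generations * rate * (1 - rate) ** (generations - 1)
    return 1 - p_none - p_one

def prob_any_line(rate, generations, lines):
    """Chance that at least one of several independent lines has 2+ mutations."""
    return 1 - (1 - prob_two_or_more(rate, generations)) ** lines

if __name__ == "__main__":
    for marker, rate in (("DYS576", 0.015), ("CDYa", 0.017)):
        p_double = prob_any_line(rate, generations=11, lines=9)
        # Assumption: half of double mutations step back to the starting value.
        p_back = 0.5 * p_double
        print(f"{marker}: double {p_double:.3f}, back {p_back:.3f}")
```

Under these assumptions the per-marker chance of a double mutation somewhere
among the 9 lines comes out at roughly 10% to 12%, in line with the rough
estimate above. Shared branches in a real tree would change the numbers.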

In the 660 year tree, there would statistically also be about 5 parallel
mutations, just on those 4 markers.

If you considered all 37 markers instead of just 4, the chance of parallel
and back mutations would be increased.

The significance of the parallel and back mutations to the mutation count
would vary hugely depending on where they took place in the tree.

PICKING ANCESTRAL HAPLOTYPE: Picking an ancestral haplotype to use as the
basis for mutation counts seems to me rather arbitrary.

In the example I gave a few days ago, in the case of marker CDYb, it could
have had ancestral as 37, 38, or 39. I believe you may have assumed it was
39, as 39 was "modal" for the group, I assumed it was 38, based partly on
"R1b modal being 38", but it could just as easily have been 37. If it was
39, the mutation count for just CDYb could be viewed as 5 (or even 3 if you
count 2 step differences as a single mutation). If it was 38, the mutation
count could be 13. If it was 37, the mutation count could be 19.

My feeling is that unless you already know something about the tree
structure from other types of evidence, you can't be sure whether to
arbitrarily pick 37, 38, or 39 as the ancestral haplotype at CDYb.

LOCATION OF HAPLOTYPE D IN MY EXAMPLE TREE:

You said...
>>>>>>>
(Anatole) John, you are making the same mistake as before. Do not rely on
your "eye" and the "given that I think" stuff, when you try to sort out
haplotypes. You are going to fail. And you did fail in this particular case.
You have divided your dataset wrongly, I have commented on it earlier. This
person (named D in your dataset) is not "the most distantly related" at all.
It shares his mini-branch with "B" and "C" which are equally "distant" (in
fact, close).
<<<<<<<

I still can't agree with you in this particular instance. Haplotypes B and
C are documented lines descending from Robert Marsh born 1707. I believe
that several branches from the main tree 100 years before Robert don't have
the mutation he had DYS464c=16, which B and C share. I believe that the
mutation to DYS464c=16 in this line probably occurred between about 1600 and
1707. If it had occurred earlier, it should have appeared in other lines
tested from that village.

Haplotype D is for a line of Marshes at a location about 50 miles away from
where Robert Marsh b.1707 lived. There were Marshes in the village of
haplotype D as far back as 1300, and circumstantial historical evidence
would make it possible that those Marshes in 1300 could be related
to the Marsh at the village of Robert Marsh by about 1200. I might well be
wrong, but I believe that the historical context makes a relationship
between D and Robert Marsh more likely before the year 1300. I think it is
coincidental that D also has DYS464c=16, and I think it is one of those
parallel mutations which you don't believe are significant.

Putting that aside for the moment, D (Marsh), J (Teague) and K (Tyndall) all
have DYS449=29. Marsh D comes from an Essex line. Marshes A B C E F G H I
all come from Cambridgeshire, or nearby Suffolk, and may all stem from a
common ancestor in a particular Cambridgeshire village, and all have
DYS449=28. My supposition was that because all 3 surnames had at least one
person with DYS449= 29, it may have been ancestral in the common ancestor of
all 3 surnames. All of the Marshes from Cambridgeshire and Suffolk which
surround the village of Weston Colville have DYS449=28.

My assumption based on the above is that an early Marsh had a mutation from
29 to 28, and went to the Cambridgeshire area, and was the ancestor of A B C
E F G H I.

I think you are using a circular argument. You are saying that if you count
the 4 step difference between D compared to B and C at DYS570 as a single
mutation event of 4 steps, you can say that D is close to B and C, and if he
is close to B and C, it proves that the 4 step mutation is a single mutation.

The difference between your analysis and mine I think is that I have put the
raw haplotypes in a genealogical and historical context, even although the
context is not concrete. But because of my reference to context, I take the
view that DYS449=28 is a local mutation to the Cambridgeshire cluster, and
the Essex haplotype D is DYS449=29, which was likely ancestral, as it is
shared with other surnames related earlier. Because you appear unwilling to
recognize parallel mutations as likely, you don't seem open to the
possibility that D's line had a parallel mutation on DYS464c.

As it happens, I believe D has also had a parallel mutation to me on
DYF399b. I may be able to confirm that with further test results expected
any day. It would be very unlikely that D could be closely related to B and
C with DYS464c=16 from a common ancestor, and have DYF399b=24 in common with
me when I am not closely related to the DYS464c=16 cluster. At least one of
the mutations must be a parallel mutation in D, but in all probability
both are parallel mutations.

PROOF OF PARALLEL MUTATIONS/ BACK MUTATIONS: Proof of parallel or back
mutations depends on context of a cluster of haplotypes. Their existence
can be proved to credible scientific standards. I won't go into the matter
in detail now, but if you are prepared to open your mind to the possibility,
you may realize this yourself.

I am testing many of the Marshes on up to 130 markers. I am hoping to
eventually be able to prove that back mutations and parallel mutations have
taken place. I will give you an example at some time in the future if I can
find one with "concrete" scientific proof of existence. I would be
surprised if others on this list could not give you definite proof of back
mutations and parallel mutations.

John.

-----Original Message-----
From:
[mailto:] On Behalf Of Anatole Klyosov
Sent: Sunday, February 14, 2010 5:40 AM
To:
Cc: Anatole Klyosov
Subject: Re: [DNA] TMRCA assessments

>From: "Alister John Marsh" <>

Dear John,

I am glad that we are making progress in mutual understanding as well as in
your understanding of my approach. Whether or not I share your concerns (aka
understand their ground) remains to be seen from my responses below.

>(John) BACK MUTATIONS: In "most cases" in the genealogical time frame,
back mutations and parallel mutations may (gut feeling plus a little basic
maths) have less than 10% impact on TMRCA estimates. Less than 10% impact is
not significant, unless there are several other factors which might be
adding 10% errors.

(Anatole) John, it is nice to have guts, and maybe even nicer to have gut
feeling. However, let's strike them out in context of this discussion. Let's
stick to "a little basic maths". Frankly, I doubt that you have used it
here. I am not sure that you have "in your hands" a math equation showing a
contribution of back mutations compared to "forward" mutations on a time
scale. Because, if you would have considered it, you would have known that
10% "addition" due to back mutations occur far beyond the family study time
periods. Here is a little table for your information, it is simple and
handy. It shows a contribution of back mutations versus time:

below 575 years bp - less than 1-2%
625-950 ybp - 2% to 3%
1000-1200 ybp - 3% to 5%
to 1500 ybp - 6% to 7%
to 2000 ybp - 7.5% to 8.1%
to 3000 ybp - 9% to 12%
to 4000 ybp - 14% to 17%
to 5000 ybp - 17% to 20%
to 10,000 ybp - 21% to 39%
to 20,000 ybp - 40% to 75%

As you see, even until 3000 ybp it is within a typical margin of error (with
95% confidence).

>(John) Given how you have explained you "calibrated" your average
generation time/mutation rate, I don't believe your choice of 25 years for
generation time has caused any significant errors. Given this, back
mutations alone are not typically a serious problem in the "genealogical
time frame" if they mostly have less than 10% impact.

(Anatole) You got it right.

In fact, a choice of 25 years per generation causes NO error whatsoever,
since for 30 years per generation, or ANY other number the mutation rate
should be just adjusted. The final TMRCA will be exactly the same.
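
The point about generation length can be checked with a linear sketch: if
the rate is calibrated per 25-year generation and then rescaled
proportionally for another generation length, the TMRCA in years does not
move. The code below assumes the simple linear model (no back-mutation
correction), uses the 0.022 rate for 12-marker haplotypes mentioned later in
this message, and a made-up average of 0.5 mutations per haplotype. The
function names are hypothetical.

```python
def rescale_rate(rate, from_years, to_years):
    """Convert a per-generation rate to a different generation length."""
    return rate * to_years / from_years

def tmrca_years(avg_mutations, rate_per_generation, years_per_generation):
    """Linear TMRCA estimate: generations to the ancestor times years each."""
    return avg_mutations / rate_per_generation * years_per_generation

if __name__ == "__main__":
    rate_25 = 0.022          # per 12-marker haplotype per 25-year generation
    avg_mutations = 0.5      # hypothetical average mutations per haplotype
    for gen_years in (25, 30, 33):
        rate = rescale_rate(rate_25, 25, gen_years)
        print(gen_years, round(tmrca_years(avg_mutations, rate, gen_years), 6))
```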

>(John) GENERATION TIME/ MUTATION RATE CALIBRATION: When I said your
calibration bears no relationship to father/ son mutation study data,
perhaps I did not word that very well. What I meant was that you "did not
use" father son study data to arrive at your calibration, and that appears
to be correct.

(Anatole) Yes, it is correct. Because in fact I saw right away that I got
roughly the same numbers as the father-son data. Except the father-son
data are all over the place. The accuracy is not there, and for an obvious
reason - too few mutations there in a large number of father-son pairs.
Those data cannot be used for TMRCA calculations. However, they are very
valuable, since they provide a kind of "mental comfort". They showed that my
calibration was principally correct.

>(John) However, it appears that your calibration of mutation rates/
generation times is probably coming up with similar average results to
father son studies. So although your process is different and does not rely
on father/son studies, your mutation rate would be similar (if adjusted for
a 30 year generation time). I have not checked them in detail, but they do
look generally similar.

Generally, yes. However, if you make a table with all father-son data, and
there are not many of them, you will see that they cover quite a range,
from 0.0013 to 0.0040 (ballpark values). Some of them are clear outliers,
and the core is around 0.0020 mutations per marker per generation. You do
not need to adjust it for 30 years per generation (is it a kind of
religious number? Why not 29, or 33, or 35, or 27, or whatever?), since my
"calibrated" rates (with 25 years per generation, which is a fixed,
"mathematical" number) are in the same ballpark, but more accurate.

>(John) If you used 700 year pedigrees like the MacDonald one to calibrate
your system, on the one hand you might have some advantages in that if there
are any non random aspects to occurrence and survival of mutations, your
system would automatically make allowance for them. But on the other hand,
perhaps we don't know for sure if all of the haplotypes in the MacDonald
project "descend from" the clan founder.

If I had blindly restricted myself to MacDonald haplotypes and the
Lord John story (and the respective dates), you would be right. However, I
did not. I did multiple verifications with other systems and datasets. Just
a simple example, one of many. When I obtained - on the first (about)
60 MacDonald haplotypes - the average mutation rate constant of 0.022
mutations per 12-marker haplotype per generation, I took a look at John
Chandler's table. It came up as 0.02243 mutations per haplotype. The
difference is less than 2%. This is, of course, well within the margin of
error. This showed that my calibration at least did not conflict with John's
data on the first 12 markers.
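
The "less than 2%" comparison is simple to reproduce. A minimal sketch,
assuming the difference is measured relative to the larger of the two values
(the text does not say which baseline was used; the helper name is
hypothetical):

```python
def relative_difference(a, b):
    """Difference between two values as a fraction of the larger one."""
    return abs(a - b) / max(a, b)

if __name__ == "__main__":
    calibrated = 0.022    # MacDonald-based rate, per 12-marker haplotype
    chandler = 0.02243    # value quoted from John Chandler's table
    print(f"{relative_difference(calibrated, chandler):.2%}")
```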

However, lately MacDonalds added many more people to their table, who
brought many more mutations, and the TMRCA immediately went deeper in time.
From the initial 650 years bp (to John, presumably) it went to about 800
ybp. So now it cannot be used for calibration anymore per se. However, it is

>(John) It is interesting to have your different approach to compare to
other approaches.

It makes two of us. However, some other folks do not share your (and my)
attitude. They follow some kind of a negative pattern, which essentially is
(a) You shall not compare, (b) If you compare, you are wrong, and (c) If you
do not compare, you are wrong anyway.

>(John) In the example of mine which you looked at, you said you counted a 4
step difference on one haplotype at DYS607 as a single 4 step mutation.

(Anatole) Not exactly. I do not do things like that. I have calculated both
variants, with 4 and with 1 mutation, and showed that the difference is
within the margin of error. However, it was obvious that it was a single
4-step mutation, since the rest of the long haplotype was the same as that of
the other long haplotypes from the dataset.

>(John) However, given that I think this person may be the most distantly
related Marsh to all of the other Marshes, it is not impossible that his 4
step difference on the very fast mutating marker is the result of either 1,
2, 3, 4, 6, or more separate mutations.

(Anatole) John, you are making the same mistake as before. Do not rely on
your "eye" and the "given that I think" stuff, when you try to sort out
haplotypes. You are going to fail. And you did fail in this particular case.
You have divided your dataset wrongly, I have commented on it earlier. This
person (named D in your dataset) is not "the most distantly related" at all.
It shares his mini-branch with "B" and "C" which are equally "distant" (in
fact, close).

>(John) Hypothetically, this arbitrary decision to count it as one could
contribute to a margin of error. You arbitrarily decided to read a 4 step
difference as one mutation, when it could possibly be 4 individual steps, or
even 6. You are probably right, it may be one single 4 step mutation, but
if you are wrong, then you may have underestimated the mutation count by up
to 3 or 5 mutation events.

(Anatole) I repeat, I have considered both cases. Did you miss this part in

It would have been VERY unlikely for the haplotype to make it all the way
with four consecutive mutations, being the only member of the extended family
with "14" in that locus, while the others have only 18 and 19 in the same
locus. Where are 15, 16, 17 and their offspring? Furthermore, with 4 one-step
mutations in one locus there must be plenty of other mutations in the
haplotype. There are none of them which would make it different from the
other haplotypes. Finally, in that situation I cannot exclude an erroneous
typing in the locus. Did you ask the person to repeat his test?

>(John) In the case of back mutations, you say you do not count them because
you can't see if they occurred or not.

It is incorrect, and it is a misunderstanding. I do not count them in the
first 26 generations because their contribution is negligible, not because I
do not see them. I do not see them because they do not exist in terms of
their contribution. What I said is this: why do you insist on their
existence when you do not see them anyway? On what ground do you believe that
they "do exist"? A sheer belief? A kind of religion? Because someone told
you that?

A back mutation is a mutation you do not see anyway. When you see "13" how
do you know that it is a back mutation in the first place? A back mutation
would be 12-->13 and 13-->12 again. You can stare at all 12s in the dataset
all day long, but you cannot possibly know which 12 is the original one, and
which is a returned one. So I wonder when people here say that they "see"
back mutations, how do they manage to "see" them??

I "see" back mutations ONLY mathematically. And the math tells me - "forget
about them in the first 26 generations".

> (John) Regarding back mutations, if you study a cluster of related
haplotypes closely, you can sometimes prove that a back mutation or parallel
mutation has taken place.

(Anatole) Wow! Disclose your secret, please. How do you see them? How can
you "sometimes prove"??

>(John) So it is not quite as you said in a previous post that a back
mutation can't be counted because you can't see evidence it occurred.

>(John) If I could prove to you, as I may eventually be able to do, that 2
mutations on the example I gave you to analyze were back mutations, would
you still refuse to count them, even if I could prove they had taken place?

(Anatole) Prove, please. I would love to learn something VERY unusual
(whispering to a side: no chance with those "back mutations"...)

Regards,

Anatole Klyosov

