Archiver > GENEALOGY-DNA > 2010-02 > 1266079226

From: "Anatole Klyosov" <>
Subject: Re: [DNA] TMRCA assessments
Date: Sat, 13 Feb 2010 11:40:26 -0500
References: <>

>From: "Alister John Marsh" <>

Dear John,

I am glad that we are making progress in mutual understanding as well as in your understanding of my approach. Whether or not I share your concerns (that is, understand their grounds) remains to be seen from my responses below.

>(John) BACK MUTATIONS: In "most cases" in the genealogical time frame, back mutations and parallel mutations may (gut feeling plus a little basic maths)
have less than 10% impact on TMRCA estimates. Less than 10% impact is not significant, unless there are several other factors which might be adding
10% errors.

(Anatole) John, it is nice to have guts, and maybe even nicer to have gut feeling. However, let's strike them out in the context of this discussion. Let's stick to "a little basic maths". Frankly, I doubt that you have used it here. I am not sure that you have "in your hands" a math equation showing the contribution of back mutations compared to "forward" mutations on a time scale. Because, if you had considered it, you would have known that a 10% "addition" due to back mutations occurs far beyond the family study time periods. Here is a little table for your information, it is simple and handy. It shows the contribution of back mutations versus time:

below 575 years bp - less than 1-2%
625-950 ybp - 2% to 3%
1000-1200 ybp - 3% to 5%
to 1500 ybp - 6% to 7%
to 2000 ybp - 7.5% to 8.1%
to 3000 ybp - 9% to 12%
to 4000 ybp - 14% to 17%
to 5000 ybp - 17% to 20%
to 10,000 ybp - 21% to 39%
to 20,000 ybp - 40% to 75%

As you see, even until 3000 ybp it is within a typical margin of error (with 95% confidence).
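
For readers who want to see where numbers of this kind can come from, here is a minimal sketch. It is not the equation behind the table above (that equation is not given in this message). It assumes a textbook symmetric one-step model: each marker mutates +1 or -1 with equal probability, the number of mutations per marker is Poisson-distributed, and the rate is taken as 0.002 per marker per generation with 25 years per generation. Both numbers, and the function names, are illustrative assumptions. The "hidden" fraction is the share of actual mutation events that does not show up when one simply counts step differences from the ancestral allele.

```python
import math


def expected_net_steps(lam, n_max=80):
    """Expected |net displacement| of one marker after a Poisson(lam)
    number of independent +1/-1 steps (symmetric one-step model)."""
    total = 0.0
    for n in range(1, n_max + 1):
        # Probability that exactly n mutation events occurred.
        p_n = math.exp(-lam) * lam ** n / math.factorial(n)
        # Mean absolute position of a symmetric random walk after n steps:
        # k steps go down, n - k go up, so the net displacement is n - 2k.
        e_abs = sum(math.comb(n, k) * abs(n - 2 * k) for k in range(n + 1)) / 2 ** n
        total += p_n * e_abs
    return total


def hidden_fraction(years, rate_per_marker=0.002, years_per_generation=25):
    """Share of actual mutations that a plain step count misses.
    rate_per_marker and years_per_generation are assumed values."""
    lam = rate_per_marker * years / years_per_generation  # mean events per marker
    return 1.0 - expected_net_steps(lam) / lam


if __name__ == "__main__":
    for years in (500, 1000, 2000, 3000, 5000):
        print(years, "ybp:", round(100 * hidden_fraction(years), 1), "% hidden")
```

Under these assumptions the hidden share grows steadily with time and stays small within the genealogical time frame, which is the qualitative point of the table. The exact percentages depend on the assumed rate and model.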

>(John) Given how you have explained you "calibrated" your average generation time/mutation rate, I don't believe your choice of 25 years for generation time
has caused any significant errors. Given this, back mutations alone are not typically a serious problem in the "genealogical time frame" if they
mostly have less than 10% impact.

(Anatole) You got it right.

In fact, a choice of 25 years per generation causes NO error whatsoever, since for 30 years per generation, or ANY other number, the mutation rate just has to be adjusted accordingly. The final TMRCA will be exactly the same.
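
The arithmetic behind this statement can be sketched as follows. This is a minimal illustration, assuming the simple linear relation (generations = average mutations per haplotype divided by the mutation rate per generation) and assuming the rate is rescaled in proportion to the generation length. The function name and the sample value of 0.5 mutations per haplotype are hypothetical. The 0.022 figure is the 12-marker rate quoted later in this message.

```python
def tmrca_years(mutations_per_haplotype, rate_per_25yr_generation, years_per_generation):
    """TMRCA in years under a linear model, for any chosen generation length.

    The rate is given for a 25-year generation and rescaled to the
    requested generation length before use.
    """
    rate_per_year = rate_per_25yr_generation / 25.0
    rate_per_generation = rate_per_year * years_per_generation
    generations = mutations_per_haplotype / rate_per_generation
    return generations * years_per_generation


if __name__ == "__main__":
    # Same data, different generation lengths: the result in years is unchanged.
    for g in (25, 27, 30, 35):
        print(g, "years/generation ->", round(tmrca_years(0.5, 0.022, g), 1), "years")
```

The generation length cancels out: a longer generation means a proportionally higher rate per generation and proportionally fewer generations, so the product in years is the same.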

>(John) GENERATION TIME/ MUTATION RATE CALIBRATION: When I said your calibration bears no relationship to father/ son mutation study data, perhaps I did not word that very well. What I meant was that you "did not use" father son study data to arrive at your calibration, and that appears to be correct.

(Anatole) Yes, it is correct. Because in fact I saw right away that I got the same numbers as those from father-son data. Except the father-son data are all over the place. The accuracy is not there, and for an obvious reason - too few mutations in a large number of father-son pairs. Those data cannot be used for TMRCA calculations. However, they are very valuable, since they provide a kind of "mental comfort". They showed that my calibration was correct in principle.

>(John) However, it appears that your calibration of mutation rates/ generation times is probably coming up with similar average results to father son
studies. So although your process is different and does not rely on father/son studies, your mutation rate would be similar (if adjusted for a 30 year
generation time). I have not checked them in detail, but they do look generally similar.

Generally, yes. However, if you make a table with all father-son data, and there are not many of them, you will see that they cover quite a range, from 0.0013 to 0.0040 (ballpark values). Some of them are clear outliers, and the core is around 0.0020 mutations per haplotype per generation. You do not need to adjust it for 30 years per generation (is it a kind of religious number? Why not 29, or 33, or 35, or 27, or whatever?), since my "calibrated" rates (with 25 years per generation, which is a fixed, "mathematical" number) are in the same ballpark, but more accurate.

>(John) If you used 700 year pedigrees like the MacDonald one to calibrate your system, on the one hand you might have some advantages in that if there are
any non random aspects to occurrence and survival of mutations, your system would automatically make allowance for them. But on the other hand, perhaps
we don't know for sure if all of the haplotypes in the MacDonald project "descend from" the clan founder.

If I had blindly restricted myself to MacDonald haplotypes and the Lord John story (and the respective dates), you would be right. However, I did not. I did multiple verifications with other systems and datasets. Just a simple example, one of many. When I obtained - on the first (about) 60 MacDonald haplotypes - the average mutation rate constant of 0.022 mutations per 12-marker haplotype per generation, I took a look at John Chandler's table. It came up as 0.02243 mutations per haplotype. The difference is less than 2%. This is, of course, well within a margin of error. This showed that my calibration at least did not conflict with John's data on the first 12 markers.
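
The comparison above is easy to check. The short sketch below uses only the two rates quoted in this message. The variable names are hypothetical, and the per-marker figure is simply the 12-marker rate divided by 12, which assumes all 12 markers are averaged together.

```python
calibrated_rate = 0.022    # mutations per 12-marker haplotype per generation (MacDonald calibration)
chandler_rate = 0.02243    # mutations per 12-marker haplotype, as quoted from John Chandler's table

# Relative difference between the two rates, taken against the larger value.
relative_difference = abs(chandler_rate - calibrated_rate) / chandler_rate

# Average rate per marker implied by the calibrated 12-marker rate.
per_marker_rate = calibrated_rate / 12

if __name__ == "__main__":
    print("relative difference:", round(100 * relative_difference, 2), "%")
    print("average per-marker rate:", round(per_marker_rate, 5))
```

The implied per-marker average falls inside the 0.0013 to 0.0040 ballpark range mentioned earlier for father-son data.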

However, lately the MacDonalds added many more people to their table, who brought many more mutations, and the TMRCA immediately went deeper in time. From the initial 650 years bp (to Lord John, presumably) it went to about 800 ybp. So now it cannot be used for calibration anymore per se. However, it is already calibrated and verified.

>(John) It is interesting to have your different approach to compare to other approaches.

That makes two of us. However, some other folks do not share your (and my) attitude. They follow some kind of a negative pattern, which essentially is (a) You shall not compare, (b) If you compare, you are wrong, and (c) If you do not compare, you are wrong anyway.

>(John) In the example of mine which you looked at, you said you counted a 4 step difference on one haplotype at DYS607 as a single 4 step mutation.

(Anatole) Not exactly. I do not do things like that. I have calculated both variants, with 4 and with 1 mutation, and showed that the difference is within the margin of error. However, it was obvious that it was a single 4-step mutation, since the rest of the long haplotype was the same as that of the other long haplotypes in the dataset.

>(John) However, given that I think this person may be the most distantly related Marsh to all of the other Marshes, it is not impossible that his 4 step difference on
the very fast mutating marker is the result of either 1, 2, 3, 4, 6, or more separate mutations.

(Anatole) John, you are making the same mistake as before. Do not rely on your "eye" and the "given that I think" stuff, when you try to sort out haplotypes. You are going to fail. And you did fail in this particular case. You have divided your dataset wrongly, I have commented on it earlier. This person (named D in your dataset) is not "the most distantly related" at all. He shares his mini-branch with "B" and "C", which are equally "distant" (in fact, close).

>(John) Hypothetically, this arbitrary decision to count it as one could contribute to a margin of error. You arbitrarily decided to read a 4 step difference as one mutation, when it could possibly be 4 individual steps, or even 6. You are probably right, it may be one single 4 step mutation, but if you are wrong, then you may have underestimated the mutation count by up to 3 or 5 mutation events.

(Anatole) I repeat, I have considered both cases. Did you miss this part in my comment on your data?

It would have been VERY unlikely for the haplotype to make it all the way with four consecutive mutations, being the only member of the extended family with "14" in that locus, while the others have only 18 and 19 in the same locus. Where are 15, 16, 17 and their offspring? Furthermore, with 4 one-step mutations in one locus there must be plenty of other mutations in the haplotype. There are none of them which would make it different from the other haplotypes. Finally, in that situation I cannot exclude an erroneous typing in the locus. Did you ask the person to repeat his test?

>(John) In the case of back mutations, you say you do not count them because you can't see if they occurred or not.

It is incorrect, and it is a misunderstanding. I do not count them in the first 26 generations because their contribution is negligible, not because I do not see them. I do not see them because they do not exist in terms of their contribution. What I said is this - why do you insist on their existence when you do not see them anyway? On what grounds do you believe that they "do exist"? A sheer belief? A kind of religion? Because someone told you that?

A back mutation is a mutation you do not see anyway. When you see "13" how do you know that it is a back mutation in the first place? A back mutation would be 12-->13 and 13-->12 again. You can stare at all the 12s in the dataset all day long, but you cannot possibly know which 12 is the original one, and which is a returned one. So I wonder when people here say that they "see" back mutations, how do they manage to "see" them??

I "see" back mutations ONLY mathematically. And the math tells me - "forget about them in the first 26 generations".

> (John) Regarding back mutations, if you study a cluster of related haplotypes
closely, you can sometimes prove that a back mutation or parallel mutation
has taken place.

(Anatole) Wow! Disclose your secret, please. How do you see them? How can you "sometimes prove"??

>(John) So it is not quite as you said in a previous post that a back mutation can't be counted because you can't see evidence it occurred.

(Anatole) Please read again above.

>(John) If I could prove to you, as I may eventually be able to do, that 2 mutations on the example I gave you to analyze were back mutations, would you still
refuse to count them, even if I could prove they had taken place?

(Anatole) Prove, please. I would love to learn something VERY unusual (whispering to a side: no chance with those "back mutations"...)


Anatole Klyosov
