
From: "Anatole Klyosov" <>
Subject: Re: [DNA] TMRCA assessments
Date: Wed, 17 Feb 2010 06:56:20 -0500

More on the subject.

> From: "Alister John Marsh" <>
> Anatole appears to have "calibrated" his system by counting total
> mutations in a large family grouping, but doing this he has not been
> aware of the fact that some markers have mutation rates close to 100
> times faster than others in the marker set.


Funny again. What makes you think that "he has not been aware"? It seems
that all my explanations did not reach the target.

A calibration requires two things: (1) knowing (or assuming) the time span to
the event being considered for the calibration, and (2) the total number of
mutations that occurred in the haplotype dataset compared to the base
haplotype. Oh, yes, and a proof that both the linear and the logarithmic
methods give the same value (for that you do not need a mutation rate
constant), that is, M/G = ln(G/n). Here M is the number of mutations in the
dataset, G is the number of haplotypes in the dataset, and n is the number of
base haplotypes in the dataset.

Where do you see here that I was "aware" or "not aware"?

O.K., continue. As a first try I took the Donald Clan haplotypes (I have
verified the data obtained by many other "calibrations" later). There were 60
25-marker haplotypes in the Donald Clan series when I did that calibration.
Together they contained 69 mutations and as many as 18 base haplotypes. As
you see, those base haplotypes were easily identifiable.

Therefore, we have 69/60 = 1.15 mutations per haplotype on average. We have
ln(60/18) = 1.20. The difference between them is only 4%, which is well
within the margin of error. I have reported here more than once that the
mutation rate constant for the 25-marker haplotypes is 0.00183 mutations per
marker, that is 0.00183x25 = 0.046 mutations per haplotype. Hence, we obtain:
1.20/0.046 = 26 generations to a common ancestor for the Clan Donald from
the logarithmic method, and 1.15/0.046 = 25 generations from the linear
method. There is no need for corrections for back mutations in this case,
since it would be within the margin of error. Calculations of the margin of
error give 26+/-4 generations and 25+/-4 generations in the two cases. With
25 years per generation it gives 650 and 625 years respectively, that is
1360 and 1385 AD for the common ancestor of the Donald Clan. Ask Doug
if those figures make sense. For the record, John Lord of the Isles died in
1386, according to the Donalds.
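
The arithmetic above can be sketched in a few lines of Python. This is only
an illustration, not a published implementation: the function names are made
up for the example, and the reference year 2010 (the date of this post) is an
assumption used to turn "years ago" into a calendar year.

```python
import math

def generations_linear(mutations, haplotypes, rate_per_haplotype):
    """Linear method: average mutations per haplotype (M/G) over the rate constant."""
    return (mutations / haplotypes) / rate_per_haplotype

def generations_logarithmic(haplotypes, base_haplotypes, rate_per_haplotype):
    """Logarithmic method: ln(G/n) over the rate constant; no mutation count needed."""
    return math.log(haplotypes / base_haplotypes) / rate_per_haplotype

# Figures quoted in the post for the 25-marker Donald Clan series.
M, G, n = 69, 60, 18          # mutations, haplotypes, base haplotypes
RATE_25 = 0.046               # mutations per 25-marker haplotype per generation
YEARS_PER_GENERATION = 25
REFERENCE_YEAR = 2010         # assumption: the year the post was written

for name, gens in (
    ("linear", generations_linear(M, G, RATE_25)),
    ("logarithmic", generations_logarithmic(G, n, RATE_25)),
):
    whole = round(gens)       # the post works with whole generations
    year = REFERENCE_YEAR - whole * YEARS_PER_GENERATION
    print(f"{name}: {whole} generations, about {year} AD")
```

Both estimates come out at about 25 and 26 generations, which is the
agreement between the two methods that the calibration relies on.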

I hope, now you understand better what "calibration" means, and why your
(true) statement "the fact that some markers have mutation rates close to
100 times faster than others in the marker set" is irrelevant here. All
those markers are included here, fast and slow.

To make a long story short, let's move to 67-marker haplotypes which include
all those fast (and slow) markers you are talking about. There were only 26
haplotypes of that length at the time of my calibration. Together they
contained 98 mutations, and, of course, there were no base haplotypes in
them. However, we already know that there was only one common ancestor for
them. We have 98/26/0.145 = 26 generations to the common ancestor. The same
as above. 0.145 mutations per 67-marker haplotype is the average mutation
rate constant. By dividing it by 67, we get 0.145/67 = 0.00216 mutations per
marker in the 67-marker haplotypes. For the 37-marker series the mutation
rate constant is 0.09 per haplotype, and 0.09/37 = 0.00243 per marker.
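
As a quick check of the numbers in this paragraph, here is a minimal Python
sketch. The helper names are hypothetical and exist only for this example;
the inputs are the figures quoted above.

```python
def generations_from_counts(mutations, haplotypes, rate_per_haplotype):
    """Linear estimate: mutations per haplotype divided by the rate constant."""
    return mutations / haplotypes / rate_per_haplotype

def per_marker_rate(rate_per_haplotype, markers):
    """Average per-marker rate implied by a per-haplotype rate constant."""
    return rate_per_haplotype / markers

# 67-marker Donald Clan series: 98 mutations in 26 haplotypes.
gens_67 = generations_from_counts(98, 26, 0.145)
print(f"67-marker series: {gens_67:.1f} generations")

# Per-marker averages for the 67- and 37-marker panels.
print(f"67-marker panel: {per_marker_rate(0.145, 67):.5f} per marker")
print(f"37-marker panel: {per_marker_rate(0.09, 37):.5f} per marker")
```

The per-marker figures are averages over the whole panel, fast and slow
markers together, which is why the calibration does not need the individual
marker rates.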

That is what the calibration is.

To show how it works, even in small datasets, here is my recent calculation
(in fact, a couple of days ago) for a participant of this Forum. He sent me
a dataset, and asked for TMRCA estimates. It came out as:

for the 25-marker panel (mutation rate constant = 0.046 per haplotype, that
is 0.00183 per marker per generation) -- 54 generations (25 years each)
uncorrected for back mutations, or 57 generations (corrected), that is 1425
years to a common ancestor.

for the 37 marker panel (mutation rate constant = 0.09 per haplotype, that
is 0.00243 per marker per generation) -- 56 generations uncorrected --> 60
generations corrected = 1500 years to a common ancestor

for the 67 marker panel (mutation rate constant = 0.145 per haplotype, that
is 0.00216 per marker per generation) -- 55 generations uncorr. --> 58
generations corrected = 1450 years to a common ancestor.
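
The conversion from corrected generations to years for the three panels can
be sketched as follows. This is an illustration only: the dictionary layout
and function name are invented for the example, and the back-mutation
correction itself is not reproduced because its formula is not given here.
The generation counts are taken as quoted.

```python
YEARS_PER_GENERATION = 25

# Generation counts quoted for the participant's dataset, keyed by panel size.
panels = {
    25: {"uncorrected": 54, "corrected": 57},
    37: {"uncorrected": 56, "corrected": 60},
    67: {"uncorrected": 55, "corrected": 58},
}

def years_to_common_ancestor(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a generation count into years before present."""
    return generations * years_per_generation

# Years to the common ancestor implied by each panel's corrected count.
estimates = {
    markers: years_to_common_ancestor(p["corrected"])
    for markers, p in panels.items()
}

for markers, years in estimates.items():
    print(f"{markers}-marker panel: {years} years to a common ancestor")

# Spread between the lowest and highest panel estimate.
spread = max(estimates.values()) - min(estimates.values())
print(f"spread across panels: {spread} years")
```

The three panels agree to within 75 years, or three generations.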

Best regards,

Anatole Klyosov


> As I see, the misunderstanding continues. I was talking not about "back
> mutation was improbable in the genealogical time frame", but about a
> contribution of those "back mutation" events to the total pool of
> mutations in the genealogical time frame. Therefore, I was talking about a
> negligible effect of back mutations in TMRCA calculations in the first 650
> years, and about a very small contribution of back mutations in the first
> 2000 years. By "a very small contribution" I meant (and defined) that this
> contribution is within the margin of error in TMRCA calculations.
> You continue talking like "it might happen". Of course it might happen.
> If it happens once on a background of 100 other mutations, with its 1%
> contribution, this contribution would not affect the TMRCA.
> The same confusion was with "parallel mutations". You (and others)
> apparently meant that those mutations were useful for identification of
> close relatives. Who argues with that? However, I have asked how many of
> those "parallel mutations" might have happened in those 509 of 67-marker
> L21 haplotypes we discussed earlier, and how they might "distort" the
> TMRCA calculations, and I have not seen an answer. The likely answer is
> "they would not distort, since they would have been counted as any other
> mutations, unless they form a distinct branch. In that case the branch
> would be analyzed separately".
> This probably is a core of many of our mutual misunderstandings. I say
> "they do not affect the TMRCA". You (and others) say "but they are useful
> for family studies". That is fine and correct. However, there is no
> conflict whatsoever between the two statements.


>>If there were chances of 3.5 mutations on CDYb between 9 haplotypes,
> 100 mutation opportunities on each marker in the group) that is about 30%
> chance of a back mutation occurring in a single individual on that marker.
> Further, there is also a 30% chance of a back mutation one of the 9
> individuals on CDYa. That is only 2 markers out of 37, and we have in
> total about 60% chance of a back mutation in that set of 9 haplotypes in
> approximately a 330 year period. If all 37 markers were considered, the
> chances of a back mutation would increase. If this is the case, back
> mutations might be more of an issue in the genealogical time frame than
> Anatole suggested. And this is not even considering the considerably
> increased chances of parallel mutations.
> I think the issue which Anatole has not taken into consideration, is that
> most of the mutation activity takes place on a small group of the very
> fast mutating markers in the 37 marker set, and because of this the
> chances of back mutations are greater than if all markers had the same
> mutation rates.
> Anatole appears to have "calibrated" his system by counting total
> mutations in a large family grouping, but doing this he has not been
> aware of the fact that some markers have mutation rates close to 100
> times faster than others in the marker set. It has been this imbalance of
> mutation rates which I was trying to bring to Anatole's attention, and
> why I asked him in my first posting on the issue...
> "BACK MUTATIONS: When determining the "mathematical fact" that back
> mutations are practically undetectable in the first 26 generations, did
> you base the maths on an assumption that all markers have the average
> mutation rate, or on the fact that in a mixed set of fast/ slow markers,
> most of the mutations are happening on a very small subset of very fast
> mutating markers?"
> I have been concerned about this imbalance for a long time. I wondered if
> it would impact at all on the "variance of variance" calculations to
> determine TMRCA, but I was unable to understand the maths enough to check
> myself. I had presumed that Ken and others had allowed for individual
> mutation rates of individual markers in the variance calculation, rather
> than just doing the calculations based on all markers having average
> mutation rates. Is that assumption correct? Or does having a mixed set
> of fast and slow markers not affect the variance calculation?
> In Tim's series of TMRCA calculations using variance, he consistently
> seemed to get different results for faster markers than slower markers.
> Could this be evidence that in mixed sets of fast/ slow markers, there is
> an effect from the mixture which is not being mathematically allowed for?
> John.
