GENEALOGY-DNA-L Archives

From: "Anatole Klyosov" <>
Subject: Re: [DNA] Variance Assessment wrt back and parallel mutations
Date: Thu, 11 Feb 2010 21:46:23 -0500
References: <mailman.4304.1265934440.2099.genealogy-dna@rootsweb.com>

From: "Alister John Marsh" <>

>When determining the "mathematical fact" that back mutations
>are practically undetectable in the first 26 generations, did you base the
>maths on an assumption that all markers have the average mutation rate, or
>on the fact that in a mixed set of fast/slow markers, most of the mutations
>are happening on a very small subset of very fast mutating markers?

John,

It does not matter. I am talking about the contribution of back mutations
during a time period (the first 26 generations) when even "forward"
mutations are rather rare. I am talking about the fraction of back mutations,
which is the same for slow and "fast" markers alike. A
fraction is a fraction. Fast markers produce a lot of "forward" mutations,
and still only a small fraction of them will be back mutations.

The contribution of back mutations to the total pool of mutations for each
locus is an exponential function, determined only by the average
number of mutations per marker. When this number is small, the contribution
is small. When this number is large (say, after 5,000 years), the
contribution of back mutations becomes progressively larger. All these
contributions are easily calculated using the exponential formula.
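
The exponential formula itself is not written out in this message. As a rough
illustration only, the sketch below assumes that mutations at a single marker
arrive as a Poisson process with mean kt (an assumption of this sketch, not
necessarily the formula meant above). A back mutation needs at least two
mutations at the same marker, and under that assumption the probability of
this depends only on kt. The rate 0.00183 and the 25-year generation are the
values quoted later in this message; the function name is hypothetical.

```python
import math


def prob_two_or_more(mean_mutations):
    """P(at least 2 mutations at one marker), assuming a Poisson model."""
    return 1.0 - math.exp(-mean_mutations) * (1.0 + mean_mutations)


K = 0.00183  # mutations per marker per generation, 25-year generations

# 26 generations versus 200 generations (5,000 years at 25 years/generation)
for generations in (26, 200):
    kt = K * generations
    p = prob_two_or_more(kt)
    print(f"{generations:>3} generations: kt = {kt:.4f}, P(>=2 mutations) = {p:.4f}")
```

The probability for 26 generations comes out far smaller than for 200
generations, which is the qualitative point made above; the exact size of the
correction depends on the mutation model one adopts.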

>In some marker sets, perhaps 80% of mutations are occurring on 20% of
>markers which have the very fast mutation rates.

No problem. When they "have the very fast mutation rates", they have a large
number of "forward" mutations, and the fraction of back mutations will
again be negligible. This is in terms of relative values, not absolute ones.

>AVERAGE GENERATION TIME: I have not commented before on this, but I believe
>based on various studies 30 years per generation might be a better average
>than 25 which you allow. This would push your estimates perhaps 20% further
>back in time.

Here we go again. Please understand a simple formula: N/n = kt, in which N
is the total number of mutations and n is the total number of markers in the
same dataset. This ratio is given; you cannot change it in your dataset. If you
have, say, 1000 markers and 100 mutations in them, this ratio is 0.1.
Period.

Now, kt is the product of the mutation rate constant (k) and the number of
generations (t). You do not determine them separately when you use a
historical event for your calibration. Their product is 0.1 in this example.

Now, suppose I do a calibration, and this 0.1 corresponds to 1000 years. If
I set a generation equal to 25 years (I SET it, do you understand?), 1000
years is 40 generations by definition, so
0.1 = k x 40, and k = 0.0025 mutations per marker per generation. It becomes
a calibrated value for 25 years per generation. It becomes a fixed
"mathematical value".

You suggest taking 30 years per generation. O.K., no problem. In this case
0.1 = k x 33.33, and k = 0.0030 mutations per marker per generation.
However, 33.33 generations at 30 years per generation is still 1000
years. Nothing has changed.

Suppose someone suggested using 50 years per generation, and from now on
using it as a "fixed mathematical value". Fine. In that case 0.1 = k x 20,
and k = 0.005 mutations per marker per generation. In that case 1000 years
will be 20 generations. It is still the same 1000 years.

Yet someone else decides to use 100 years per generation. Fine. The 1000 years
used for calibration becomes only 10 generations. 0.1 = k x 10, and k = 0.01
mutations per marker per generation.
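
The four calibrations above can be reproduced with a short sketch. It uses
only the numbers from the example (100 mutations, 1000 markers, a 1000-year
calibration span) and the plain linear relation N/n = kt; the function name
is hypothetical.

```python
def calibrate_rate(total_mutations, total_markers, calibration_years,
                   years_per_generation):
    """Return (t, k) from N/n = k*t for a chosen generation length."""
    ratio = total_mutations / total_markers               # N/n, fixed by the data
    generations = calibration_years / years_per_generation  # t
    return generations, ratio / generations               # k = (N/n) / t


for gen_len in (25, 30, 50, 100):
    t, k = calibrate_rate(100, 1000, 1000, gen_len)
    print(f"{gen_len:>3} yr/gen: t = {t:6.2f} generations, "
          f"k = {k:.4f}, span = {t * gen_len:.0f} years")
```

Whatever generation length is chosen, k changes but the span in years
(t times the generation length) stays at 1000.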

In other words, your k (mutation rate constant) depends on the generation
length you picked. In my system k = 0.00183 mutations per marker per
generation for 25 years per generation (for 12- and 25-marker panels). You
like 30 years per generation better? No problem. You just have to adjust
that 0.00183 and make it 0.00220.

However, the TMRCA (in years) will be exactly the same.

Therefore, the "generation" in my calculations is just a fixed mathematical
value. It looks like a common generation, but it is not. You can use
whatever number of years per generation you like, let it be 10, 30, or 500
years, but you have to adjust the mutation rate constant accordingly.
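
As a minimal sketch of that adjustment, the code below rescales the
0.00183 rate from 25-year to 30-year generations and checks that the time in
years does not move. The observed ratio of 0.2 mutations per marker is a
hypothetical input chosen only for illustration, both function names are
hypothetical, and the simple linear relation N/n = kt is used without any
correction for back mutations.

```python
def rescale_rate(k, old_years_per_gen, new_years_per_gen):
    """Adjust a per-generation rate constant to a new generation length."""
    return k * new_years_per_gen / old_years_per_gen


def tmrca_years(mutations_per_marker, k, years_per_gen):
    """Time in years from N/n = k*t, with t converted from generations."""
    return mutations_per_marker / k * years_per_gen


K_25 = 0.00183                      # per generation, 25-year generations
k_30 = rescale_rate(K_25, 25, 30)   # same rate expressed per 30-year generation

observed = 0.2                      # hypothetical N/n for some dataset
print(f"k at 30 yr/gen: {k_30:.5f}")
print(f"years with 25 yr/gen: {tmrca_years(observed, K_25, 25):.1f}")
print(f"years with 30 yr/gen: {tmrca_years(observed, k_30, 30):.1f}")
```

The two printed times agree because the generation length cancels out once k
is rescaled along with it.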

Therefore, your comment - "I believe based on various studies 30 years per
generation might be a better average than 25 which you allow. This would
push your estimates perhaps 20% further back in time" - is incorrect. The
time will stay the same. However, you have to make a double change: change
the duration of a generation and change the mutation rate accordingly. And
you will obtain exactly the same number of years to a common ancestor. Do
you really need all that fuss, only to get the same value in the end?

Regards,

Anatole Klyosov