GENEALOGY-DNA-L Archives

From: "Alister John Marsh" <>
Subject: Re: [DNA] Variance Assessment wrt back and parallel mutations..
Date: Fri, 12 Feb 2010 23:37:30 +1300
References: <mailman.4304.1265934440.2099.genealogy-dna@rootsweb.com><346B4D4C02AB404A8447FADABA616660@anatoldesktop>
In-Reply-To: <346B4D4C02AB404A8447FADABA616660@anatoldesktop>


Anatole,

Unfortunately I don't have time to reply in detail to your posting below.

I am not sure we are understanding each other fully yet. I am not saying
you are wrong, just that I don't fully understand your system, and to some
extent you don't seem to be understanding what I have been trying to say.

You seem to be defining/ approaching some things in a different way than I
have been used to, which may be appropriate to your system, but it may have
been a cause of some of my confusion.

Brief summaries of my positions are...

BACK MUTATIONS: My personal experience of finding evidence of back mutations
and parallel mutations moderately frequently in family groups on very fast
mutating markers gives me a "gut feeling" that back and parallel mutations
may have some effect within the genealogical time frame. Clearly "gut
feelings" are not recognized as an approved scientific method, but I am
aware of them nevertheless.

GENERATION TIME: I am not sure what you mean when you refer to
"calibration". Apparently your calibration bears no relationship to data
from father/ son mutation rate studies.

AVERAGE TIME FROM BIRTH OF TEST SUBJECTS IN DATABASE: You did not comment
on this. I think this is a valid point I made. Soon we may be finding DNA
recorded in databases like Y-Search which originates from 4,000 year old
archaeological samples. We might have to start paying attention to the date
of birth of the test subjects before harvesting haplotypes from databases
and attempting to estimate ages of clades etc. Even in the present
situation, the age of the test subjects in a database may vary from 0 to 90
in extremes (I have been aware of some 100 year olds being DNA tested), and
as this is mostly a measurable statistic, we can improve estimates of TMRCA
if we pay attention to it. I have a feeling that many estimates by different
people have ignored the age of the test subjects and assumed them all to be
0 years old.
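
A minimal sketch of the adjustment described above, assuming the TMRCA
estimate counts years back from the births of the tested men rather than
from the date the database was harvested. The function names, the harvest
year, and the birth years are hypothetical and only illustrate the idea.

```python
from statistics import mean

def naive_mrca_year(tmrca_years, harvest_year):
    """Treat every test subject as 0 years old at the time of harvesting."""
    return harvest_year - tmrca_years

def adjusted_mrca_year(tmrca_years, birth_years):
    """Anchor the estimate at the mean birth year of the test subjects.

    Assumption: tmrca_years is measured back from the testers' births.
    """
    return mean(birth_years) - tmrca_years

# Hypothetical testers born in 1920, 1950 and 1980; haplotypes harvested in 2010.
births = [1920, 1950, 1980]
print(naive_mrca_year(1000, 2010))       # all testers assumed newborn in 2010
print(adjusted_mrca_year(1000, births))  # anchored at the mean birth year
```

With these made-up numbers the two anchor dates differ by 60 years, which is
the size of shift that ignoring the testers' ages would introduce.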

John.


-----Original Message-----
From:
[mailto:] On Behalf Of Anatole Klyosov
Sent: Friday, February 12, 2010 3:46 PM
To:
Cc: Anatole Klyosov
Subject: Re: [DNA] Variance Assessment wrt back and parallel mutations

From: "Alister John Marsh" <>

>When determining the "mathematical fact" that back mutations
>are practically undetectable in the first 26 generations, did you base the
>maths on an assumption that all markers have the average mutation rate, or
>on the fact that in a mixed set of fast/ slow markers, most of the mutations
>are happening on a very small subset of very fast mutating markers?


John,

It does not matter. I am talking about the contribution of back mutations
during a time period (within the first 26 generations) when even "forward"
mutations are rather rare. I am talking about the fraction of back mutations,
which is the same for slow and "fast" markers, no difference. A
fraction is a fraction. Fast markers produce a lot of "forward" mutations,
and still only a small fraction of them will be back mutations.

The contribution of back mutations to the total pool of mutations for each
locus is an exponential function, which is determined only by the average
number of mutations per marker. When this number is small, the contribution
is small. When this number is large (say, after 5,000 years), the
contribution of back mutations becomes progressively larger. All these
contributions are easily calculated using the exponential formula.
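
As a rough illustration of how the masked share grows with the average
number of mutations per marker, here is a sketch under a toy model. The
model is an assumption chosen for illustration and is not necessarily the
exponential formula referred to above: mutation events per marker are
Poisson-distributed with mean m, and each event moves the allele one repeat
up or down with equal probability. "Hidden" events are those that do not
show up in the net difference from the ancestral allele.

```python
import math

def expected_abs_walk(k):
    """E|S_k| for a symmetric +1/-1 random walk of k steps."""
    total = sum(math.comb(k, j) * abs(2 * j - k) for j in range(k + 1))
    return total / 2 ** k

def hidden_fraction(m, k_max=60):
    """Expected share of mutation events masked by reversals (toy model).

    m is the average number of mutation events per marker. The Poisson sum
    is truncated at k_max, which is ample for small m (up to a few units).
    """
    hidden = 0.0
    for k in range(2, k_max + 1):  # 0 or 1 events cannot hide anything
        pmf = math.exp(-m) * m ** k / math.factorial(k)
        hidden += pmf * (k - expected_abs_walk(k))
    return hidden / m

# 26 generations at 0.00183 mutations per marker per generation, then larger m.
for m in (26 * 0.00183, 0.5, 2.0):
    print(f"m = {m:.3f}: hidden share = {hidden_fraction(m):.3f}")
```

In this toy model the hidden share is only a few percent when m is a few
hundredths and rises steadily as m grows, in line with the qualitative
statement above.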

>In some marker sets, perhaps 80% of mutations are occurring on 20% of
>markers which have the very fast mutation rates.

No problem. When they "have the very fast mutation rates", they have a high
number of "forward" mutations, and the fraction of back mutations will
again be negligible. This is in terms of relative values, not absolute ones.

>AVERAGE GENERATION TIME: I have not commented before on this, but I believe
>based on various studies 30 years per generation might be a better average
>than 25 which you allow. This would push your estimates perhaps 20% further
>back in time.

Here we go again. Please understand a simple formula: N/n = kt, in which N
is the total number of mutations and n is the total number of markers in the
same dataset. This ratio is given; you cannot change it in your dataset. If
you have, say, 1000 markers and 100 mutations in them, this ratio is 0.1.
Period.

Now, kt is the product of the mutation rate constant (k) and the number of
generations (t). You do not determine them separately when you use a
historical event for your calibration. Their product is 0.1 in this example.

Now, suppose I do a calibration, and this 0.1 corresponds to 1000 years. If
I set a generation equal to 25 years (I SET it, do you understand?), 1000
years is 40 generations by default, so
0.1 = k x 40, and k = 0.0025 mutations per marker per generation. It becomes
a calibrated value for 25 years per generation. It becomes a fixed
"mathematical value".

You suggest taking 30 years per generation. O.K., no problem. In this case
0.1 = k x 33.33, and k = 0.0030 mutations per marker per generation.
However, the 33.33 generations at 30 years per generation is still 1000
years. Nothing has changed.

Suppose someone suggests using 50 years per generation, and from now on
using it as a "fixed mathematical value". Fine. In that case 0.1 = k x 20,
and k = 0.005 mutations per marker per generation. In that case 1000 years
will be 20 generations. It is still the same 1000 years.

Yet someone else decides to use 100 years per generation. Fine. Those 1000
years used for calibration become only 10 generations. 0.1 = k x 10, and
k = 0.01 mutations per marker per generation.
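
The calibration arithmetic above can be reproduced with a short sketch. The
function name is hypothetical; the inputs are the example figures from the
text (100 mutations in 1000 markers, calibrated against 1000 years).

```python
def calibrate_rate(mutations, markers, calibration_years, years_per_generation):
    """Return k (mutations per marker per generation) from N/n = k*t."""
    ratio = mutations / markers                             # N/n, fixed by the dataset
    generations = calibration_years / years_per_generation  # t
    return ratio / generations

for gen_len in (25, 30, 50, 100):
    t = 1000 / gen_len
    k = calibrate_rate(100, 1000, 1000, gen_len)
    # k and t change with the chosen generation length; their product does not.
    print(f"{gen_len:>3} yr/gen: t = {t:6.2f}, k = {k:.4f}, k*t = {k * t:.1f}")
```

Whatever generation length is chosen, the product k*t stays at 0.1 and the
calibration interval stays at 1000 years.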

In other words, your k (mutation rate constant) depends on the generation
length you picked. In my system k = 0.00183 mutations per marker per
generation for 25 years per generation (for 12- and 25-marker panels). You
like 30 years per generation better? No problem. You just have to adjust
that 0.00183 and make it 0.00220.

However, the TMRCA (in years) will be exactly the same.
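
A small sketch of this rescaling, assuming the simple linear relation
N/n = kt with no correction for back mutations. The function names and the
observed ratio of 0.1 are hypothetical; only the 0.00183 rate and the 25 and
30 year generation lengths come from the text.

```python
def rescale_rate(k, old_gen_years, new_gen_years):
    """Convert a per-generation rate to a different generation length."""
    return k * new_gen_years / old_gen_years

def tmrca_years(mutations_per_marker, k, gen_years):
    """Linear estimate: t = (N/n) / k generations, converted to years."""
    return mutations_per_marker / k * gen_years

k25 = 0.00183
k30 = rescale_rate(k25, 25, 30)   # 0.002196, i.e. 0.00220 after rounding
observed = 0.1                    # hypothetical N/n for some dataset

print(f"k at 30 yr/gen: {k30:.5f}")
print(f"TMRCA at 25 yr/gen: {tmrca_years(observed, k25, 25):.1f} years")
print(f"TMRCA at 30 yr/gen: {tmrca_years(observed, k30, 30):.1f} years")
```

Both calls return the same number of years, because the generation length
cancels once the rate has been rescaled to match it.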

Therefore, the "generation" in my calculations is just a fixed mathematical
value. It looks like a common generation, but it is not. You can use
whatever number of years per generation you like, let it be 10, 30, or 500
years, but you have to adjust the mutation rate constant accordingly.

Therefore, your comment - "I believe based on various studies 30 years per
generation might be a better average than 25 which you allow. This would
push your estimates perhaps 20% further back in time" - is incorrect. The
time will stay the same. However, you have to make a double change: change
the duration of a generation and change the mutation rate accordingly. And
you will obtain exactly the same number of years to a common ancestor. Do
you really need all that fuss, to get eventually the same value?

Regards,

Anatole Klyosov


