GENEALOGY-DNA-L Archives

From: "Alister John Marsh" <>
Subject: Re: [DNA] Variance Assessment wrt back and parallel mutations
Date: Fri, 12 Feb 2010 11:04:37 +1300
References: <1197787204.2554981265915596096.JavaMail.root@sz0002a.westchester.pa.mail.comcast.net>

Anatole,

You said
>>>>>>>
I have already told here that in the first 26 generations "back mutations"
are practically undetectable. This is a "mathematical fact". There is no
room in the first 26 generations for a mutation not only to occur, but to
return as well. A probability for such a double event is less
than minimal. Between 26 and 46 generations (650-1150 ybp) back mutations
"add" 1-2 generation to a TMRCA , which is within a margin of error anyway.
Even until 2,000 ybp a correction is within a margin of error. So, forget
about back mutations for family studies.
<<<<<<<

BACK MUTATIONS: When determining the "mathematical fact" that back mutations
are practically undetectable in the first 26 generations, did you base the
maths on an assumption that all markers have the average mutation rate, or
on the fact that in a mixed set of fast/slow markers, most of the mutations
happen on a very small subset of very fast mutating markers?

In some marker sets, perhaps 80% of mutations occur on the 20% of markers
which have very fast mutation rates. Because most of the mutations occur on
a few markers, there are greater chances of back mutations and parallel
mutations than if all markers in the set had exactly the same moderate or
slow mutation rate.

Some of the fastest markers mutate on average about once every 50
transmissions. If there were 26 generations back to the common ancestor,
there would be 52 mutation opportunities on each marker between two
individuals. It would therefore not be improbable that some of these
fastest markers would show no mutations, while others might have two. If
there are two mutations between two individuals on a particular marker,
there is a 50% chance that the second mutation will be either a back
mutation or a parallel mutation.
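
As a rough check of the figures above, here is a minimal Python sketch. It
assumes a simple binomial model (each of the 52 transmissions mutates
independently at a rate of 1 in 50) and treats the 50% figure as given; the
function and variable names are only for illustration. Under those
assumptions, a single fast marker has roughly a 0.35 chance of showing no
mutation and roughly a 0.28 chance of carrying two or more.

```python
from math import comb

def prob_k_mutations(k, transmissions, rate):
    """Binomial probability of exactly k mutations on one marker."""
    return comb(transmissions, k) * rate**k * (1 - rate) ** (transmissions - k)

generations_to_mrca = 26
transmissions = 2 * generations_to_mrca  # two lines of descent: 52 opportunities
rate = 1 / 50                            # assumed rate for one very fast marker

p_none = prob_k_mutations(0, transmissions, rate)
p_one = prob_k_mutations(1, transmissions, rate)
p_two_or_more = 1 - p_none - p_one

# Assumption taken from the text: when a marker has two mutations, there is
# a 50% chance the second is a back or parallel mutation.
p_back_or_parallel = 0.5 * p_two_or_more

print(f"P(no mutation)       = {p_none:.3f}")
print(f"P(two or more)       = {p_two_or_more:.3f}")
print(f"P(back or parallel)  = {p_back_or_parallel:.3f}  (approximate, per fast marker)")
```

The last figure is only approximate, since it lumps markers with three or
more mutations in with those that have exactly two.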

AVERAGE GENERATION TIME: I have not commented on this before, but I believe,
based on various studies, that 30 years per generation might be a better
average than the 25 which you allow. This would push your estimates perhaps
20% further back in time.

If back mutations should start impacting more than 10% on the estimates,
then that combined with a 20% adjustment for generation time starts to get
to be a noticeable figure.

AGE OF PARTICIPANTS: In my surname project, I have not calculated the
average age of participants who are DNA tested, but for some families, the
age is "getting up there" a bit. My latest participant to get results is
over 90. Many whose results are used in calculations were born over 90
years ago, and have died since the project started. Some of the haplotypes
on databases which we use were tested perhaps 10 years ago, and may have
been old when tested.

In families, I normally recommend that the eldest male be tested, even if
there are younger males living, so there is a bias towards test results
being for older persons. In other situations, it is a middle aged fanatical
female genealogist (bless her) driving the project, who gets her elderly
father to supply a DNA sample. That is another bias towards the DNA test
subject being relatively old.

If you look at a data set, such as the one I gave you a few days ago, more
than 60% of the persons tested were born more than 70 years ago, several of
them around 90 years ago. Only a couple were in their 40s. It would be
interesting to work out the average age, but I guess it would be between
60 and 70 years.

When a calculation comes up with say 10 generations back to the common
ancestor, that might be best described as 10 x 30 years back to the birth of
the common ancestor, counted from the average birth year of the test
subjects, which might itself be as much as 70 years ago. So for 10
generations, that might be 300 years plus 70, or 370 years before the
present. This allowance for the typically advanced age of the test subjects
may therefore add more than 20% to the time to MRCA.
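
The same arithmetic as a minimal Python sketch. The function name and its
default values (30 years per generation, test subjects born on average 70
years ago) are assumptions chosen to match the example figures above, not
measured values.

```python
def years_before_present(generations, years_per_generation=30,
                         mean_years_since_birth=70):
    """Years from the present back to the birth of the common ancestor."""
    return generations * years_per_generation + mean_years_since_birth

generations = 10
unadjusted = generations * 30                     # 300 years
adjusted = years_before_present(generations)      # 300 + 70 = 370 years
extra_fraction = (adjusted - unadjusted) / unadjusted

print(f"Unadjusted: {unadjusted} years")
print(f"Adjusted:   {adjusted} years before present")
print(f"Increase:   {extra_fraction:.0%}")
```

The relative size of the allowance shrinks as the number of generations
grows, because the 70 years is added once rather than per generation.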

Hypothetically, if we are adding 20% by using 30 year generations instead of
25, and perhaps 10% for back/parallel mutations, and up to 20% for the
adjustment for birth year of test subjects, that is getting up towards a 50%
adjustment. That starts to look significant, especially if the margin of
error is estimated to be less than 50%. The statistical margin of error
does not allow for a longer generation time, or back mutations, or the
older age of participants. Perhaps the estimate, when expressed in years, is
50% out before the uncertainties covered by the margin of error even need to
be considered.
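
A short Python sketch of how these hypothetical adjustments add up. The
three percentages are the illustrative figures from the paragraph above,
not measured values. The simple sum gives the 50% mentioned; treating the
adjustments as compounding is an extra assumption made here for comparison,
and it gives a somewhat larger figure.

```python
# Hypothetical adjustments, expressed as fractions of the original estimate.
adjustments = {
    "generation time (30 instead of 25 years)": 30 / 25 - 1,  # 20%
    "back/parallel mutations": 0.10,                          # perhaps 10%
    "birth year of test subjects": 0.20,                      # up to 20%
}

# Simple sum, as in the text.
additive_total = sum(adjustments.values())

# Assumption for comparison: each adjustment multiplies the previous result.
compounded_factor = 1.0
for fraction in adjustments.values():
    compounded_factor *= 1 + fraction
compounded_total = compounded_factor - 1

print(f"Added together: {additive_total:.0%}")
print(f"Compounded:     {compounded_total:.0%}")
```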

For longer time periods, it is possible that some non-random elements in
the occurrence and survival of mutations may also start to become
significant. There is evidence that some markers may not mutate randomly,
but may mutate or survive around a sweet spot.

John.