GENEALOGY-DNA-L Archives



From: "Alister John Marsh" <>
Subject: Re: [DNA] Variance Assessment wrt back and parallel mutations
Date: Fri, 12 Feb 2010 13:27:04 +1300
References: <1197787204.2554981265915596096.JavaMail.root@sz0002a.westchester.pa.mail.comcast.net><725621759BD14562850C4D49F01981A4@john><3b2a446a1002111506o658ea989sa6433722915f15e0@mail.gmail.com>
In-Reply-To: <3b2a446a1002111506o658ea989sa6433722915f15e0@mail.gmail.com>


Sasson,

You said...
>>>>>>>
1) Regarding the issue of "back/parallel mutations", I think Anatole's
methods do not simply ignore the issue, but account for it in some
simplified way, so that for 800 years or so there is no problem, and for
longer periods there is a "correction formula". So it seems to me, you
cannot simply "add 10%".
<<<<<<<

I was essentially asking Anatole a question: did he calculate that back
mutations were insignificant on the assumption that all markers had equal
mutation rates, or did he allow for the fact that most mutations occur on a
small subset of very fast mutating markers? I await Anatole's response to
that.

If he has allowed for the fact that most mutations occur on a subset of
markers that mutate about once every 50 transmissions, then Anatole is
probably right that back mutations are seldom significant in shorter time
frames.

I was considering things in generalities. If Anatole has not allowed for
most mutations being on the subset of markers which mutate every 50
transmissions, then a 10% adjustment might roughly apply to some
genealogical time frames, for some marker sets.


You said
>>>>>>>
2) You also mention the issue of AVERAGE GENERATION TIME.

Mutation rates are calibrated with a certain assumption about generation
length, say 30 years.
So the "mutation rates" are not literally per generation, but rather per 30
years. So at least for this part of your objections there is a SIMPLE
straightforward answer. You cannot add 20% because of this reason, because it
was "thought through" in advance.
<<<<<<<

I think Anatole is saying that if the common ancestor was probably 10
generations ago, based on the number of mutation events found, then it would
equate to 250 years if a generation time was on average 25 years.

The TMRCA calculations are essentially based on mutation rates and mutation
opportunities, which are generally birth events. But for simplicity we often
like to express TMRCA in years rather than generations.

If my assumptions about this are correct, then studies which show generation
times (in Y-DNA lines) to be typically 30 years or sometimes more, would
indicate Anatole has possibly been about 20% short by allowing 25 years as
he does.
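
As a rough check on that arithmetic, here is a minimal Python sketch. The
function name is made up for illustration, and the 10 generation figure is
just the example used above; the 25 and 30 year generation lengths are the
ones under discussion.

```python
def tmrca_years(generations, years_per_generation):
    """Convert a TMRCA expressed in generations into years."""
    return generations * years_per_generation

generations = 10                          # example figure from the text above
at_25 = tmrca_years(generations, 25)      # years at 25 per generation
at_30 = tmrca_years(generations, 30)      # years at 30 per generation
increase = (at_30 - at_25) / at_25        # relative increase over the 25-year figure

print(at_25, at_30, f"{increase:.0%}")
```

The 20% here is measured against the 25 year estimate; the number of
generations cancels out, so the same ratio applies at any depth.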


You said...
>>>>>>>
3) As for the age of participants, let's say it is indeed 60 instead of
(apparently) presupposed 30. For 26 generations it is less than 4% (you use
up all 26 generations to get the 10% for back mutation).
So even if you were right about the "back mutations", you would still only
get up to 14% for 800 years (and even less for younger estimates) - not even
close to 50%.


*When a cluster is "symmetric and homogeneous", the robust calculation
methods Anatole exports from Chemistry work well*. (It is "dynamics", not
the college-level statistics). *If some "symmetric and homogeneous"
sub-clusters are there, they must be excluded and considered separately*.

In addition, to further reduce the probability that some relevant bad
sub-cluster was spaced-out, the "logarithmic formula" is employed.
<<<<<<<


If the average test subject age is 60 years, it places the TMRCA 60 years
earlier than if test subjects had an average age of 0 years. Some DNA
studies are done on newborn babies, and some on 90 year olds. When
estimating TMRCA this needs to be allowed for when evaluating the data set.
A 60 year difference in average age of participants in a dataset would
change the result by 60 years. For 26 generations, 26x30=780 years, and
adding 60 years is 7.7% of that estimate.

But for many genealogical time frames, in cases of 10 generations to MRCA, a
60 year difference added to 300 years is 20%. In the case of my example
which Anatole analyzed, he gave an estimate of 100 years to MRCA of two
persons. One of those persons was born about 90 years ago, and is no longer
living. The age of the other I am unsure of, but would guess middle aged+.
If the estimated TMRCA in this case was 100 years from the present, it would
almost make these two the same person, or brothers. But if the test subjects
were newborn babies, then 100 years would be 3 or 4 generations back to the
common ancestor. For younger TMRCA estimates, the age of test subjects
becomes more significant, and is a higher percentage of the estimate.
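
To show how the share changes with depth, here is a small Python sketch. The
function name is hypothetical, and it assumes the estimate in years is simply
generations times generation length, counted to the birth of the test
subjects.

```python
def age_share(generations, years_per_generation, mean_age):
    """Mean participant age as a fraction of the birth-to-birth TMRCA in years."""
    return mean_age / (generations * years_per_generation)

deep = age_share(26, 30, 60)      # 26 generations: 60 / 780
shallow = age_share(10, 30, 60)   # 10 generations: 60 / 300

print(f"{deep:.1%}", f"{shallow:.1%}")
```

The same 60 year offset is a small correction at 26 generations and a large
one at 10 generations.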

If it is determined that the discovered mutations would statistically
correspond to 10 mutation opportunity events, you have the birth of the
founder, then 30 years to the first mutation opportunity, 30+30 years from
founder to the second mutation opportunity, and so on:
30+30+30+30+30+30+30+30+30+30=300 years to the 10th mutation
opportunity, which is the birth of the test subject. If the test subject is
100 years old, then you would add 100 to get the time from the MRCA to the
present. That is a 33% increase in the estimate when you add on the test
subject's age.
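
The timeline in that example can be laid out explicitly. This is only a
sketch of the example's own numbers (10 opportunities, 30 years each, a 100
year old test subject); the variable names are mine, not part of any
published method.

```python
from itertools import accumulate

YEARS_PER_GENERATION = 30   # generation length used in the example
OPPORTUNITIES = 10          # mutation opportunities (births) after the founder
SUBJECT_AGE = 100           # age of the test subject, in years

# Years from the founder's birth to each successive birth in the line.
birth_years = list(accumulate([YEARS_PER_GENERATION] * OPPORTUNITIES))

to_subject_birth = birth_years[-1]             # founder's birth to subject's birth
to_present = to_subject_birth + SUBJECT_AGE    # founder's birth to the present
increase = SUBJECT_AGE / to_subject_birth      # age as a fraction of the estimate

print(birth_years)
print(to_present, f"{increase:.0%}")
```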


John.




