**GENEALOGY-DNA-L Archives**

From:
Subject: [DNA] MRCA calculator
Date: Fri, 14 Dec 2001 13:40:59 EST

In a message dated 12/10/01 9:33:27 AM Pacific Standard Time,

writes:

>

> How, for dummies, does degree of matching results convert to time of MRCA ?

>

> I presume statistical - year range with a certain degree of confidence. How

> is this year range affected by number of test sites. Could results from

>

Yes, it's VERY statistical <g>. I have written a small program which will

calculate the number of generations to the Most Recent Common Ancestor

(MRCA). It gives the median (that is, there is a 50-50 chance that two people

will find their common ancestor within that number of generations) and the

95% confidence interval (that is, 95% will find their common ancestor within

that range). The 95% CI covers a very wide range because rare events are

inherently unpredictable.

http://members.aol.com/dnacousins/MRCA.exe

Warning -- the following paragraphs with technical background may make your

eyes glaze over. If so, just read it once over lightly but do persevere to

the conclusions.

The calculator is based on a method outlined by Bruce Walsh. (In his credits,

he mentions that he was prodded to write the paper by Bennett Greenspan of

Family Tree DNA.) The full text of his paper is available online at

http://www.genetics.org/cgi/reprint/158/2/897.pdf.

Many of the equations and technical details are beyond me, but I think it's

worthwhile for everyone to look at the summary and the graphs and tables in

the paper. Table 1 lists the number of generations to the MRCA for two people

with varying number of mismatches out of 5/10/20/50/100 markers, assuming a

mutation rate of .002 per locus.

Walsh poses some objections to the MLE (Maximum Likelihood Estimate or Most

Likely Estimate) method used by Family Tree DNA and Oxford Ancestors. The MLE

gives you the mode, that is, the single most likely value, but it doesn't

convey the wide range of possible values. Also, when you apply MLE to find

the common ancestor (vs predicting the percentage of descendants with

mutations), two samples that match exactly are most likely to share an

ancestor zero generations back. That is to say, you match yourself!
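
The zero-generation result can be seen in a small sketch. It is not the code used by any testing company. It assumes the simplest model: a marker still matches after t generations on each line with probability exp(-2 * mu * t), and markers are independent. The function names are made up for illustration.

```python
import math

def match_probability(t, mu):
    # Assumed model: a marker survives unchanged through 2*t transmissions
    # (t generations down each of the two lines from the common ancestor).
    return math.exp(-2.0 * mu * t)

def likelihood(t, n_markers, n_mismatch, mu):
    # Binomial chance of seeing n_mismatch mismatches out of n_markers
    # if the common ancestor lived t generations ago.
    q = match_probability(t, mu)
    return (math.comb(n_markers, n_mismatch)
            * q ** (n_markers - n_mismatch)
            * (1.0 - q) ** n_mismatch)

def mle_generations(n_markers, n_mismatch, mu):
    # The likelihood peaks where q equals the observed fraction of matches.
    if n_mismatch == 0:
        return 0.0          # exact match: the peak is at zero generations
    if n_mismatch == n_markers:
        return math.inf     # nothing matches: no finite peak
    return -math.log(1.0 - n_mismatch / n_markers) / (2.0 * mu)

if __name__ == "__main__":
    for mismatches in (0, 1, 2):
        print(mismatches, "mismatches out of 20:",
              round(mle_generations(20, mismatches, 0.002), 1), "generations")
```

With no mismatches the likelihood only falls as t grows, so the single most likely value is t = 0, which is the "you match yourself" result.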

Instead, Walsh uses a branch of statistics called Bayesian analysis, which

takes into account what you already know or can assume; in this case, prior

knowledge about populations. Walsh's chief assumption is that the population

base consisted of at least 250 people!

My husband (also known as "MathMan" in my household) wrote out the solutions

for definite integrals for 0, 1 and 2 mismatches (from equation 12 in the

paper). The solutions are complex polynomial equations which my program

solves by a trial and error method. Walsh mentions that he used a symbolic

algebra program called Mathematica.
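
For readers who want to experiment, here is a rough sketch of the trial-and-error idea. It is not the code in MRCA.exe and it is not checked against equation 12. It assumes a flat prior (a very large population), treats time as continuous, and uses a per-marker match probability of exp(-2 * mu * t). Under those assumptions the chance that the MRCA lies more than t generations back works out to a polynomial in q = exp(-2 * mu * t), and bisection hunts for the t that gives a chosen cumulative probability. The function names are made up, and using the 2.5% and 97.5% points for the 95% range is also an assumption.

```python
import math

def prob_mrca_beyond(t, n_markers, n_mismatch, mu):
    # Assumed flat-prior posterior: P(MRCA is more than t generations back)
    # is a binomial tail sum, i.e. a polynomial in q.
    q = math.exp(-2.0 * mu * t)
    return sum(math.comb(n_markers, i) * q ** (n_markers - i) * (1.0 - q) ** i
               for i in range(n_mismatch + 1))

def generations_at(cum_prob, n_markers, n_mismatch, mu):
    # Find t with P(MRCA within t generations) = cum_prob by bisection.
    if not 0.0 < cum_prob < 1.0:
        raise ValueError("cum_prob must be between 0 and 1")
    if not 0 <= n_mismatch < n_markers:
        raise ValueError("need at least one matching marker")
    target = 1.0 - cum_prob
    lo, hi = 0.0, 1.0
    # Widen the bracket until the answer lies between lo and hi.
    while prob_mrca_beyond(hi, n_markers, n_mismatch, mu) > target:
        hi *= 2.0
    # Halve the bracket repeatedly ("trial and error").
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if prob_mrca_beyond(mid, n_markers, n_mismatch, mu) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

if __name__ == "__main__":
    for k in (0, 1, 2):
        low = generations_at(0.025, 20, k, 0.002)
        median = generations_at(0.5, 20, k, 0.002)
        high = generations_at(0.975, 20, k, 0.002)
        print(f"{k} mismatches of 20: median {median:.1f}, "
              f"95% range {low:.1f} to {high:.1f} generations")
```

Because the prior here is flat, the numbers will not necessarily agree with Table 1 in the paper.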

Using my calculator, you can enter any mutation rate and number of markers,

so it is a supplement to Table 1.

One thing to note is that changing the mutation rate really affects the

outcome. The value of .002 is based on a paper by Heyer, written in 1997. The

sample used in that paper was 42 men descended from 12 founding fathers, with

a total of 213 generations, so you can see we genealogists could augment that

number considerably!

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=9158156&dopt=Abstract

Since that time, other articles have found somewhat higher mutation rates,

closer to .003. In the general way of things, that wouldn't seem like a

significant difference, but it greatly affects the final results of the

calculator.
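
A quick way to see the effect of the mutation rate is the exact-match case. This sketch assumes a flat prior and a per-marker match probability of exp(-2 * mu * t); under those assumptions the median has the closed form ln(2) / (2 * mu * n). The function name is made up for illustration.

```python
import math

def median_generations_exact_match(n_markers, mu):
    # Assumed flat-prior model: for an exact match the posterior is
    # exponential with rate 2 * mu * n_markers, so the median is ln(2) / rate.
    return math.log(2.0) / (2.0 * mu * n_markers)

if __name__ == "__main__":
    for mu in (0.002, 0.003):
        for n in (10, 20):
            print(f"rate {mu}, {n}/{n} match: median "
                  f"{median_generations_exact_match(n, mu):.1f} generations")
```

In this simplified model the estimate is inversely proportional to the mutation rate, so moving from .002 to .003 shortens every figure by a third.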

Also, it should be noted that the mutation rate is averaged over several

markers, and the calculator doesn't take into account the fact that different

markers might have different mutation rates. Walsh's paper gives equations to

use if mutation rates are known for each marker, but we're not at that level

of refinement yet. He also has an extensive discussion on how to handle the

possibility of parallel mutations, back mutations, and multiple mutations at

one marker. For now, I think it's best to count a two-step change (e.g. from

14 to 16 repeats) as two separate mutations.
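
As a small illustration of that counting rule, the sketch below adds up the difference in repeat counts marker by marker, so a change from 14 to 16 counts as two. The repeat values are made up for the example, and the function name is hypothetical. It does not attempt Walsh's corrections for parallel or back mutations.

```python
def count_mutations(haplotype_a, haplotype_b):
    # Each haplotype maps a marker name to its repeat count.
    if haplotype_a.keys() != haplotype_b.keys():
        raise ValueError("both people must be tested on the same markers")
    # Every single-repeat step counts as one mutation.
    return sum(abs(haplotype_a[m] - haplotype_b[m]) for m in haplotype_a)

if __name__ == "__main__":
    # Made-up repeat counts, for illustration only.
    person_1 = {"DYS19": 14, "DYS390": 24, "DYS391": 11}
    person_2 = {"DYS19": 16, "DYS390": 24, "DYS391": 10}
    print(count_mutations(person_1, person_2), "mutations")
```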

----

Conclusions:

1) More markers are better. And yes, it's possible to use more than one

company to expand the number of markers.

2) The range of possibilities for finding the MRCA is still very broad, even

with many markers. Don't focus too much on the single value of the MLE or

median.

3) Estimates of mutation rates are based on small sample sizes and subject to

change with more data.

4) Don't be unduly discouraged by #2. All of these calculations assume that

the two people in question are randomly selected. Surname projects (or two

people who have ancestors who lived in the same time frame and locality) are

"biased" samples which stack the odds in favor of finding the MRCA sooner.

But we don't have methods for quantifying that yet.

Ann Turner

GENEALOGY-DNA List Administrator

http://lists.rootsweb.com/index/other/Miscellaneous/GENEALOGY-DNA.html
