SANDERS-DNA-L Archives



From: "Justin M. Sanders" <>
Subject: Analyzing DNA results
Date: Sun, 16 Jul 2006 20:51:54 -0500


Below I have a little article I've written on how to analyze DNA
matches in light of other genealogical evidence. There are some tables
that may not line up if you use a variable width font. The tables
should line up properly if you view the article using a fixed spacing
font like Courier.

***Begin***

Interpreting DNA Matches
Justin M. Sanders
July 2006

We have a number of matches in the Sanders Y-DNA project so far, and I
have gotten questions from others regarding the interpretation of the
results, and I've been thinking about my own matches as well. So here's
a discussion of how to use your results.

The analysis will fall into two situations: one where your connection is
undocumented and unsuspected, and the second situation where you have
some evidence of a connection and some suspicion of who your common
ancestor might be. If you fall into the second situation, it would be
good to read through the first situation analysis before going to your
situation.

*Analyzing an Unsuspected Connection*

I will write this assuming that you have an FTDNA account. On your FTDNA
results page, when you click on the "Y-DNA Matches" tab, you are shown
lists of those who match you at the various test levels you may have
done: 12, 25, 37, or 67. To the right of the e-mail address of the
matching person is a little icon that looks like a Y turned on its side.
This icon calls up the FTDNATiP(tm) utility. This utility computes the
probability that you and the person next to whom you clicked will have a
common ancestor within some number of generations.

Here is a sample of the output of the FTDNATiP results of a 12 marker
comparison between Gary Sanders and me (a 12/12 match):

Within 4| Within 8| Within 12| Within 16| Within 20| Within 24
33.57%  | 55.88%  | 70.69%   | 80.53%   | 87.07%   | 91.41%

This is the default table that gives the results in steps of 4
generations. But if I want finer steps, I can use the "Display" menu
to display other groupings, including "every generation". If I do that
with Gary and me, I get:

Within 1| Within 2| Within 3| Within 4| Within 5| Within 6
9.72%   | 18.5%   | 26.42%  | 33.57%  | 40.03%  | 45.86%

However, Gary and I know for sure that our common ancestor is not within
1 or 2 generations (we are neither brothers nor first cousins). Indeed,
Gary knows that he is descended from a great grandfather Isaac Sanders
(3 generations back), and I know that I am descended from a ggg
grandfather Benjamin Sanders, Jr. (5 generations back), and there is no
connection between the lines up to Isaac or Benjamin Jr. Thus, counting
from Gary (whose line to Isaac is the shorter one), we should use his 3
generations and say that the common ancestor is at least 4 generations
back. With the table above, based on an exact 12/12 match, all we do is
shift the table to begin at 4 rather than 1.

Within 4| Within 5| Within 6| Within 7| Within 8| Within 9
9.72%   | 18.5%   | 26.42%  | 33.57%  | 40.03%  | 45.86%

You would do the same for any exact match on FTDNATiP-- just shift the
generation numbers to begin at the earliest possible generation for the
common ancestor.
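
For anyone who prefers to script this step, here is a minimal Python
sketch of the shift. It is only an illustration and not anything FTDNA
provides: the function name shift_generations is made up for this
example, and the row of percentages is simply typed in by hand from the
"every generation" display for the 12/12 match.

```python
def shift_generations(tip_row, earliest_generation):
    """Relabel an exact-match TiP row so it begins at earliest_generation.

    tip_row holds the cumulative percentages shown for "Within 1",
    "Within 2", and so on. The percentages are unchanged; only the
    generation labels move.
    """
    return {earliest_generation + i: pct for i, pct in enumerate(tip_row)}


# 12/12 row copied by hand from the "every generation" display.
row_12 = [9.72, 18.5, 26.42, 33.57, 40.03, 45.86]

# The common ancestor is known to be at least 4 generations back.
shifted = shift_generations(row_12, earliest_generation=4)

for generation, pct in shifted.items():
    print(f"Within {generation}: {pct}%")
```

The same relabeling applies to any exact match; only the starting
generation changes with what the paper trail rules out.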

Now, Gary and I also match 25/25, so if I click on the icon beside his
name at the 25 marker match, I get the FTDNATiP calculations based on a
25/25 match, and again I need to shift the generation numbers to account
for the fact that we know the common ancestor is at least 4 generations
back from Gary. Here's the table for 25 markers:

Within 4| Within 5| Within 6| Within 7| Within 8| Within 9
21.06%  | 37.69%  | 50.81%  | 61.17%  | 69.35%  | 75.81%

So with the additional markers, we have a much better idea of when the
common ancestor lived. There is a 50-50 chance that the common ancestor
is within 6 generations of Gary; that is, that Isaac and Benjamin Jr.
were brothers, 1st cousins, or 2nd cousins.

Finally, Gary and I match 36/37, so if I click the icon beside his name
there, I get the FTDNATiP calculations based on a 36/37. Because the
match is not perfect, the FTDNATiP page is a bit different. There is now
a box where I can enter the fact that I know that the common ancestor is
not within 3 generations. Here's the table:

Within 4| Within 5| Within 6| Within 7| Within 8| Within 9
22.56%  | 41.82%  | 57.24%  | 69.09%  | 77.95%  | 84.43%

Notice that there is not a very dramatic increase in the probabilities
when going from 25 to 37 markers. This is partially due to the
diminishing returns in the tests, but it's also because we picked up a
mismatch on one marker. Nevertheless, the probabilities are a tad higher
across the board after the 36/37 match.

Now, if Gary and I had no additional genealogical information (if the
test were all we had, plus knowing that we didn't share a common
ancestor within 3 generations of Gary), we'd be done with analyzing the
results. The results of the test show us that it would be good to look
for a father, grandfather, or great-grandfather that Isaac and Benjamin
Jr. might share, since there is a better than even chance (57.24%) that
Isaac and Benjamin Jr. are 2nd cousins or closer, and a better than 1 in
5 chance (22.56%) that they are brothers, based on DNA alone.

However, in Gary's and my case, we do have some additional genealogical
information.

*Analyzing a Suspected Connection*

*Example 1: A Close Match*

Before we did any testing, Gary and I suspected that perhaps Benjamin,
Jr. and Isaac were 1st cousins: we thought Benjamin Jr. was the son of
Benjamin Sr., that Isaac was the son of Francis, and that Ben Sr. and
Francis were brothers. It was possible, but less likely, that Ben Jr.
and Isaac were 2nd cousins or 1st cousins once removed. Thus, if we had
assigned probabilities to which generation contained the common
ancestor, we might have done it like this:

Common Ancestor| Common Ancestor| Common Ancestor| Common Ancestor
in Generation 4| in Generation 5| in Generation 6| in Generation 7
      20%      |       50%      |       20%      |       10%

These probabilities are subjectively based on our analysis of
traditional genealogical material-- census, probate, family tradition,
etc. The DNA data is just another sort of evidence, but it is special
since it can be handled mathematically as I'll show below.

The question we want to ask is "How does our estimation of the
probability that the common ancestor is in a given generation change
based on the fact that we now have DNA probabilities?" This problem is
dealt with using a relationship called Bayes' Theorem. We call the
estimates of probability before the test the "prior probabilities" and
our adjusted probabilities that include the test data are the "posterior
probabilities". Bayes' Theorem gives a rule for computing the posterior
probabilities from the prior probabilities:

Posterior Probability for Common Ancestor in Generation N = (Prior
Probability for Common Ancestor in Generation N) times (Probability of
DNA results if Common Ancestor is in Generation N) divided by (Sum of
these products over all the generations being considered).

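To apply this formula, we need the Probability of the DNA Results if the
Common Ancestor is in Generation N (call this PDNA). This can be
obtained by taking the difference in the FTDNATiP table between the
entry for generation N and the entry for the generation before (N-1).
For the first generation in the table there is no earlier entry, so its
PDNA is just the table entry itself.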
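
The differencing step can be sketched in Python as follows. This is a
rough illustration under stated assumptions: the function name
per_generation is hypothetical, the cumulative row is typed in by hand
from the 36/37 table above, and results are rounded to two decimals only
to hide floating-point noise. It follows this article's convention of
treating each difference as the PDNA for that generation.

```python
def per_generation(cumulative, first_generation):
    """Convert cumulative "Within N" percentages into per-generation values.

    Each value is the table entry for generation N minus the entry for
    generation N-1. The first entry has nothing before it, so it is kept
    as is (the probability of anything earlier was entered as zero).
    """
    pdna = {}
    previous = 0.0
    for offset, value in enumerate(cumulative):
        pdna[first_generation + offset] = round(value - previous, 2)
        previous = value
    return pdna


# 36/37 row, with the common ancestor entered as not within 3 generations.
tip_37 = [22.56, 41.82, 57.24, 69.09, 77.95, 84.43]

pdna_37 = per_generation(tip_37, first_generation=4)

for generation, pct in pdna_37.items():
    print(f"PDNA {generation}: {pct}%")
```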

So for the 37 marker test above, we would have:

PDNA 4| PDNA 5| PDNA 6| PDNA 7| PDNA 8| PDNA 9
22.56%| 19.26%| 15.42%| 11.85%| 8.86% | 6.48%

Each entry in the above table is the probability of the DNA results
assuming that the common ancestor is in the given generation (4 through
9). You'll note that the highest probability is in Generation 4, but
the neighboring generations have fairly similar probabilities too. We
can now use these probabilities with the prior probabilities given above
to compute posterior probabilities:

             | Gen 4 | Gen 5 | Gen 6 | Gen 7
Prior        | 20    | 50    | 20    | 10
PDNA         | 22.56 | 19.26 | 15.42 | 11.85
Product      |       |       |       |
(Prior times |       |       |       |
PDNA)        | 451.2 | 963   | 308.4 | 118.5  (Sum=1841.1)
Posterior    |       |       |       |
(Product     |       |       |       |
divided      |       |       |       |
by sum of    |       |       |       |
products)    | 24.51 | 52.31 | 16.75 | 6.44
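
The arithmetic in this table can be checked with a short Python sketch.
It is a minimal illustration of the Bayes' Theorem rule as stated in
this article, not an FTDNA tool; the function name posterior and the
dictionary layout are assumptions made for the example. The priors and
PDNA values are the ones given in the text.

```python
def posterior(prior, pdna):
    """Apply the rule: posterior = prior * PDNA / (sum of all products).

    prior and pdna map generation number to a percentage. Only the
    generations listed in prior are considered; any other generation is
    treated as having a prior of zero.
    """
    products = {g: prior[g] * pdna[g] for g in prior}
    total = sum(products.values())
    return {g: 100.0 * products[g] / total for g in prior}


# Subjective priors from the traditional genealogical evidence (percent).
prior = {4: 20, 5: 50, 6: 20, 7: 10}

# PDNA values for the 36/37 match (percent).
pdna = {4: 22.56, 5: 19.26, 6: 15.42, 7: 11.85}

post = posterior(prior, pdna)

for generation, pct in post.items():
    print(f"Gen {generation}: {pct:.2f}%")
```

Because the result is divided by the sum of the products, the posteriors
always add up to 100%, and the PDNA values for Generations 8 and 9 play
no part since we gave those generations no prior probability.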

The result of the test, then, is to make us think that a common ancestor
in Generation 4 is rather more likely than we had earlier estimated,
that a common ancestor in Generation 5 is slightly more likely, and that
Generations 6 and 7 are somewhat less likely. We really should
concentrate, therefore, on considering Ben Jr and Isaac to be brothers
or 1st cousins.

As events have played out, Gary did a re-analysis of the traditional
genealogical evidence, and the best guess now is that Ben Jr. and Isaac
really are brothers, sons of Benjamin Sr.

*Example 2: A More Distant Match*

The example above is perhaps the best one so far-- a close DNA match and
a reasonably good genealogical trail. As another example, here is a case
where there is a decent genealogical trail, but not such a close DNA
match. Here's the genealogical situation: We think that I am a
descendant of Isaac Sanders, and we have another participant, Tom, who
is descendant of William Aaron Sanders. The genealogical evidence points
to Isaac and William Aaron being brothers, sons of John and Catherine
Nimrod Sanders.

Isaac is 7 generations from me, and William Aaron is 6 generations from
Tom, so we know our common ancestor cannot be within 6 generations. We
expect John is the common ancestor in generation 7, but perhaps Isaac
and William Aaron were 1st cousins, or 2nd cousins, or (very unlikely)
3rd cousins. Our prior probabilities would be something like this:

Common Ancestor| Common Ancestor| Common Ancestor| Common Ancestor
in Generation 7| in Generation 8| in Generation 9| in Generation 10
      70%      |       20%      |        8%      |        2%

When we did our 37 marker test, we matched 33/37. The four differences
are on two "slow" and two "fast" markers. Here is the FTDNATiP table,
with the common ancestor entered as not being within 6 generations:

Within 7| Within 8| Within 9| Within 10| Within 11| Within 12| Within 13
5%      | 11.26%  | 18.48%  | 26.31%   | 34.4%    | 42.41%   | 50.09%

As can be seen, with 4 differences, the DNA really prefers more distant
generations-- so that we don't get to a 50% chance until Generation 13
(this is the probability that the common ancestor is somewhere between
Generations 7 and 13). Here is the table of PDNA's from this result:

PDNA 7| PDNA 8| PDNA 9| PDNA 10| PDNA 11| PDNA 12
5%    | 6.26% | 7.22% | 7.83%  | 8.09%  | 8.01%

Note that the maximum probability for the DNA is in Generation 11, while
the traditional genealogical data places a much higher probability on
Generation 7. Note also that unlike the case of Gary and me, where the
DNA probabilities fell off dramatically from the maximum, here the DNA
probabilities are all rather similar.

Applying Bayes' Theorem as before:

             | Gen 7 | Gen 8 | Gen 9 | Gen 10 |
Prior        | 70    | 20    | 8     | 2      |
PDNA         | 5.00  | 6.26  | 7.22  | 7.83   |
Products     |       |       |       |        |
(Prior times |       |       |       |        |
PDNA)        | 350   | 125.2 | 57.76 | 15.66  | (Sum=548.62)
Posterior    |       |       |       |        |
(Product     |       |       |       |        |
divided by   |       |       |       |        |
sum of       |       |       |       |        |
products)    | 63.80 | 22.82 | 10.53 | 2.85   |

Here the probability for a common ancestor in Generation 7 is still
substantial, but is reduced a bit, and the probabilities for the other
Generations are increased slightly. But it should be noted that the
posterior probabilities are really not significantly different from the
priors, because the DNA probabilities were all about the same.
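
To see how small the change is, the whole calculation for this example
can be run end to end from the FTDNATiP row. The Python below is a
sketch under assumptions: the function name update_from_tip is invented
for this illustration, the row is typed in by hand from the 33/37 table,
and the differences are treated as PDNA exactly as described earlier in
this article.

```python
def update_from_tip(cumulative, first_generation, prior):
    """Difference a cumulative TiP row, then apply Bayes' Theorem.

    cumulative holds "Within N" percentages starting at first_generation.
    prior maps generation number to a percentage; generations missing
    from prior are ignored (treated as having zero prior probability).
    """
    per_gen = {}
    previous = 0.0
    for offset, value in enumerate(cumulative):
        per_gen[first_generation + offset] = value - previous
        previous = value
    products = {g: prior[g] * per_gen[g] for g in prior}
    total = sum(products.values())
    return {g: 100.0 * products[g] / total for g in prior}


# 33/37 row, common ancestor entered as not within 6 generations.
tip_33_of_37 = [5.0, 11.26, 18.48, 26.31, 34.4, 42.41, 50.09]

# Priors from the traditional genealogical evidence (percent).
prior = {7: 70, 8: 20, 9: 8, 10: 2}

post = update_from_tip(tip_33_of_37, 7, prior)

for g in prior:
    change = post[g] - prior[g]
    print(f"Gen {g}: prior {prior[g]}%, posterior {post[g]:.2f}%, "
          f"change {change:+.2f}")
```

The "change" column makes the point directly: the shifts here are much
smaller, relative to the priors, than a sharply peaked set of DNA
probabilities would produce.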

Our conclusion is that the DNA test has not significantly changed our
original estimation of the probabilities-- we still think it most likely
that Isaac and William Aaron were brothers and less likely that they
were 1st or 2nd cousins and much less likely that they were 3rd cousins.
It is also important to recognize that the test, while not a perfect
match, still indicates a fair degree of relatedness.

***End***

