GENEALOGY-DNA-L Archives
From:
Subject: Re: [DNA] Re: Mutation Calculation
Date: Thu, 2 Oct 2003 15:47:10 -0400 (EDT)
References: <184.210ba823.2cadaa02@aol.com>
In-Reply-To: <184.210ba823.2cadaa02@aol.com> (Dolmenx@aol.com)
Len wrote:
> I have an interesting situation in that my own genealogy is well-documented
> by the traditional means for 55 generations in the Irish family of O'CAHAN, a
> sub-branch of the UiNEILL dynastic line. The earliest ancestor, arguably
> supported by authoritative scholarship, is King Conn "of the Hundred Battles" (2nd
> C. AD).
55 generations! This is getting into the area where statistics can
actually be meaningful. Some cautions are in order, though. Read on...
> At least 20% of these matches seem to correspond with sub-septs of Clan
> DONALD in Ulster, particularly in Co. Antrim. This is normally considered a
> "Scottish" clan (It is, of course!), but the significant thing is that Clan
> DONALD, like O'CAHAN, also claims male-line descent from King Conn (2nd C.)!
First caution: don't treat "earliest known ancestor" as equivalent to
"most recent common ancestor". It matters a great deal exactly when
the DONALD and O'CAHAN lines split.
> My question involves the accuracy of the 25-Marker Y-test where the time
> frame of "genealogical interest" is 1800 years rather than the usual 300-400
> years. Specifically, how many mutations can I "expect" if indeed I assume there
> is a descent from a MRCA 1800 years ago?
>
> I have used Ann Turner's on-line Mutation Calculator, entering the following
> data. I need to know if I am missing or misunderstanding something -:
> ...
> 5.5 = EXPECTED # of mutations.
Second caution: this expected "number of mutations" is actually the
expected norm-square of the genetic random walk, i.e., the sum of the
squares of marker-by-marker differences between two samples. The
distinction doesn't matter a whole lot until you get beyond a few
centuries, but it matters more and more the further back in time you
go. Every time you increase the number of mutations, you increase the
likelihood that two or more of them will land on the same marker.
On average, half of the coincident pairs will cancel out, leaving no
net difference at that marker. To make this explicit,
consider a 20/25 near-match. Suppose two of the differences are
two-step and three are one-step (for a total of five differences).
FTDNA would report this as a distance of 7 (2+2+1+1+1), but the
effective distance is actually 11 (4+4+1+1+1).
NOTE: I hope you have not fallen into the trap of reporting
near-matches by subtracting FTDNA's reported distance from 25.
Never do that. In the example above it would turn a 20/25 match
into an apparent 18/25.
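
The three ways of summarizing the 20/25 example above can be checked with a short Python sketch. The marker values are invented purely for illustration (only the differences matter), and the function names are hypothetical, not part of any FTDNA tool.

```python
def matching_markers(a, b):
    """Count the markers on which the two samples agree exactly."""
    return sum(1 for x, y in zip(a, b) if x == y)


def step_distance(a, b):
    """Sum of absolute marker-by-marker differences (the 2+2+1+1+1 style)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def squared_distance(a, b):
    """Sum of squared marker-by-marker differences (the 4+4+1+1+1 style)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical 25-marker samples: two two-step and three one-step differences.
sample_a = [14] * 25
sample_b = list(sample_a)
sample_b[0] += 2
sample_b[1] -= 2
sample_b[2] += 1
sample_b[3] -= 1
sample_b[4] += 1

print(matching_markers(sample_a, sample_b))   # 20, i.e. a 20/25 near-match
print(step_distance(sample_a, sample_b))      # 7
print(squared_distance(sample_a, sample_b))   # 11
```

Under a symmetric single-step model (an assumption here), two mutations landing on the same marker leave a squared difference of either 4 or 0 with equal probability, which averages to 2, the number of mutations. That is why the squared sum, rather than the step sum, is the quantity to compare with an "expected number of mutations".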
> Probability of Observing mutations: .15582 for 4
> mutations; .17140 for 5 mutations; and .15712 for 6 mutations.
Case in point. With 6 mutations, the chances are only about 1/2 that
they will all fall on different markers.
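
As a rough check, the quoted probabilities are consistent with a Poisson distribution of mean 5.5, and the "about 1/2" figure follows if each mutation is equally likely to land on any of the 25 markers. Both are assumptions of this sketch (real markers mutate at different rates, and the calculator's internals are not shown in the thread); the function names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson law with the given mean."""
    return exp(-mean) * mean ** k / factorial(k)


def prob_all_distinct(k, markers=25):
    """Chance that k mutations all hit different markers, assuming each
    mutation is equally likely to fall on any of the markers."""
    p = 1.0
    for i in range(k):
        p *= (markers - i) / markers
    return p


for k in (4, 5, 6):
    print(k, round(poisson_pmf(k, 5.5), 5))

print(round(prob_all_distinct(6), 3))  # 0.522
```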
> Though I have a small sample thus far, it seems that 5.5 expected mutations
> in a 55 generation lineage may correspond quite well to what I am seeing.
Third caution: even though the statistics are starting to behave at
this stage, you are still facing broad uncertainties and selection
effects. With 106 12/12 matches, you can get a statistical feel for
how many near matches are DONALDS, but with only one 23/25 and one
20/25 match, you are still in the "sample-of-one" regime.
This leads to the fourth and final caution: there are widely varying
opinions on the rate of non-paternity in documented lines. Suppose
the rate is 0.01 per generation -- then the chance of actually being
biologically related to someone who "shares" an MRCA 55 generations
back is only about 1/3 (in the sense of having the same Y chromosome),
since all 110 father-to-son links in the two lines must be intact.
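
The 1/3 figure can be reproduced with a one-line calculation, sketched here in Python. It assumes the 0.01 rate applies independently to every father-to-son link and that both lines run the full 55 generations back to the common ancestor; the function name is hypothetical.

```python
def prob_same_y(rate, generations, lines=2):
    """Chance that every father-to-son link in all the lines is biological,
    given a per-generation non-paternity rate."""
    return (1.0 - rate) ** (generations * lines)


print(round(prob_same_y(0.01, 55), 3))  # 0.331, i.e. about 1/3
```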
John Chandler