GENEALOGY-DNA-L Archives
From: (John Chandler)
Subject: Calculating the observed mutation rate [was: Re: [DNA] Male Line Specific Y-STR Average Mutation Rates...]
Date: Thu, 13 Jan 2005 14:31:29 -0500 (EST)
References: <41E5775C.7060302@kerchner.com> <REME20050112155719@alum.mit.edu> <41E59CA9.4020303@kerchner.com> <REME20050112183553@alum.mit.edu> <41E5BBC9.8020603@kerchner.com> <REME20050112203830@alum.mit.edu> <41E5DB49.7020206@kerchner.com> <REME20050112224525@alum.mit.edu> <41E60555.1080203@kerchner.com>
In-Reply-To: <41E60555.1080203@kerchner.com> (message from Charles on Thu, 13 Jan 2005 00:21:25 -0500)
Charles wrote:
> Sorry for any confusion this may have caused and I look forward to your
> edited/corrected new message. When you send the new one, I would
> appreciate it if you deleted the old messages from the archives which
> have the data wrong for my project.
Well, I was going to let it ride, but Charles' plea to get rid of the
wrong data won me over. In this third edition of the memo, I have
removed the term "triangulation" and inserted instead the explicit
reasoning that demonstrates how four discrepant test subjects represent
two mutations (not four and not one). Therefore, everyone may want to
reread the (NEW AND IMPROVED) discussion of panel 2. I also added a
brief definition of Henry numbers after the first mention.
CALCULATING THE OBSERVED MUTATION RATE WITHIN A PROJECT
The first step is to count up the mutation opportunities. For
simplicity, let's assume that all testees have the same set of
markers tested. That way, we need only count the number of
transmission events and then multiply by the number of markers.
The counting can be done either from a graphic picture of the
family tree or from a list of the Henry numbers of all the test
subjects. Graphically, we would simply count all the nodes on
the tree below the level of the MRCA (assuming the tree includes
only the testees and their lineages!). Here's an example of
working with Henry numbers.
[For those unfamiliar with Henry numbers, a hint: each person's Henry
number is composed of his/her lineal parent's number plus one more
"digit" to denote the child's place among his/her siblings.]
The technique is to start by inspecting the whole list and chopping
off everything at the "head" of each number that is shared among the
whole set. In the example below, each number begins "11" and so the
first two digits of each number must be ignored. The next step is to
look at each Henry number in turn and count the number of generations
on the "tail" of the number that are not shared with any Henry number
already examined. This is easier if the entries are sorted by Henry
number, but it's not essential, as this example shows.
Example from the Kerchner DNA project:
kit #   Henry no.   events
00577   11221782    6   [8 generations minus the 2 that are ignored]
21349   11221783    1   [same as above, except the last generation]
00784   112216A1    3   [11221 is shared; 6A1 is new]
04085   1116584     5
00816   11528111    6
08335   1152841     2
05726   11185312    5   [careful! the "111" already appeared on kit 04085]
02953   11184161    4
02998   11184115    2
Total events: 34. Total mutation opportunities: 34 x 12 = 408.
(This is just the first panel so far.)
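The Henry-number counting described above can be sketched in a few lines of Python. This is only an illustration, not part of the original memo: the function name count_transmission_events is made up here, and the code assumes that every character of a Henry number stands for exactly one generation (as in this example, where "A" marks a tenth child). Rather than walking the list and comparing each number with those already examined, it collects every distinct "tail" prefix below the MRCA, which should give the same total regardless of the order of the list.

```python
def count_transmission_events(henry_numbers):
    """Count distinct father-to-son transmissions below the MRCA.

    Assumes one character per generation (hypothetical helper, not from the memo).
    """
    # The MRCA's number is the longest "head" shared by the whole set.
    mrca_len = 0
    for chars in zip(*henry_numbers):
        if len(set(chars)) > 1:
            break
        mrca_len += 1

    # Every distinct prefix longer than the MRCA's number is one person
    # below the MRCA, and therefore one transmission event.
    descendants = set()
    for number in henry_numbers:
        for end in range(mrca_len + 1, len(number) + 1):
            descendants.add(number[:end])
    return len(descendants)


kerchner = ["11221782", "11221783", "112216A1", "1116584", "11528111",
            "1152841", "11185312", "11184161", "11184115"]
events = count_transmission_events(kerchner)
opportunities = events * 12  # 12 markers in the first panel
print(events, opportunities)
```

Using a set of prefixes means a shared ancestor is counted once no matter how many testees descend from him, which is the same precaution as the "careful!" note in the table.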
The second step is to count up the mutations. In the first panel, it
is especially easy because there is only one. See below.
00577 13 24 14 11 11 16 12 12 12 13 13 29
21349 13 24 14 11 11 16 12 12 12 13 13 29
00784 13 24 14 11 11 16 12 12 12 13 13 29
04085 13 24 14 11 11 16 12 12 12 13 13 29
00816 13 24 14 11 11 16 12 12 12 13 13 29
08335 13 24 14 11 11 16 12 12 12 13 13 29
05726 13 24 14 11 11 16 12 12 12 13 13 29
02953 13 .25. 14 11 11 16 12 12 12 13 13 29
02998 13 .25. 14 11 11 16 12 12 12 13 13 29
(The atypical results are marked with periods.)
Although there are two discrepant numbers, they are the same number and
are shared by two closely related men. It is highly likely that one
mutation in a common ancestor produced both of these 25's.
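For larger projects, the discrepant values can be flagged mechanically before deciding how many mutations they represent. The sketch below is an assumption about how "atypical" might be defined: it treats the most common value in each column as typical and reports everything else. The function name atypical_values is hypothetical, and the code only finds discrepancies; deciding which of them share a single mutation still requires the family tree.

```python
from collections import Counter


def atypical_values(haplotypes):
    """Return (kit, marker_index, value) for each value that differs from
    the most common value in its column (assumed definition of 'atypical')."""
    kits = list(haplotypes)
    columns = zip(*(haplotypes[kit] for kit in kits))
    flagged = []
    for index, column in enumerate(columns):
        modal_value, _ = Counter(column).most_common(1)[0]
        for kit, value in zip(kits, column):
            if value != modal_value:
                flagged.append((kit, index, value))
    return flagged


# First-panel results from the table above.
typical = [13, 24, 14, 11, 11, 16, 12, 12, 12, 13, 13, 29]
panel1 = {kit: list(typical) for kit in
          ["00577", "21349", "00784", "04085", "00816", "08335", "05726"]}
for kit in ["02953", "02998"]:
    panel1[kit] = [13, 25] + typical[2:]

print(atypical_values(panel1))
```

Marker indices here are zero-based, so the two 25's are reported in column 1.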
Third step: compute the rate as mutations/opportunities = 1/408 = 0.0025
Fourth step: just as importantly, compute the statistical uncertainty
in the above rate. It's easy -- just take the square root of the number
of mutations and divide that by the number of opportunities. I.e.,
sqrt(1)/408 = 0.0025
(Note: this formula is only an approximation, but it is adequate for
analyzing low-frequency events.)
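As a minimal sketch of steps three and four, assuming nothing beyond the square-root approximation just described (the function name rate_and_uncertainty is made up for illustration):

```python
import math


def rate_and_uncertainty(mutations, opportunities):
    """Observed rate and its approximate uncertainty, sqrt(mutations)/opportunities."""
    rate = mutations / opportunities
    uncertainty = math.sqrt(mutations) / opportunities
    return rate, uncertainty


# First panel: 1 mutation in 34 events x 12 markers.
rate, sigma = rate_and_uncertainty(1, 34 * 12)
print(rate, sigma)
```

With a single mutation the rate and its uncertainty come out equal, both 1/408, which is why the first panel alone says so little.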
Time for a quick rule of thumb. In order to state that one number is
SIGNIFICANTLY different from another, you must verify that it is
different by at least TWICE the above-calculated uncertainty. (That's
the 95% confidence interval we keep talking about.)
To put it another way: this measured rate is not significantly
different from 0 and not significantly different from 0.007.
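The rule of thumb is easy to express as a check. The sketch below is illustrative only: the function name significantly_different is hypothetical, and the comparison value 0.01 is an arbitrary number chosen here to show a case that does pass the test.

```python
def significantly_different(measured, uncertainty, reference):
    """Rule of thumb: the gap must be at least twice the uncertainty."""
    return abs(measured - reference) >= 2 * uncertainty


# First panel: rate and uncertainty are both 1/408.
rate = 1 / 408
sigma = 1 ** 0.5 / 408

print(significantly_different(rate, sigma, 0.0))    # False
print(significantly_different(rate, sigma, 0.007))  # False
print(significantly_different(rate, sigma, 0.01))   # True (illustrative value)
```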
PANEL 2
But now, to show that complication can set in at any time, let's
look at the next panel. The nine haplotypes (in the same order)
are:
00577 17 8 10 11 11 26 15 19 30 15 15 16 16
21349 17 8 10 11 11 26 15 19 30 15 15 16 16
00784 17 8 10 11 11 26 15 19 30 15 15 16 16
04085 17 8 10 11 11 26 15 19 30 15 15 16 16
00816 17 8 10 11 11 26 15 19 .31. 15 15 16 16
08335 17 8 10 11 11 26 15 19 .31. 15 15 16 16
05726 17 8 10 11 11 26 15 19 30 15 15 16 16
02953 17 8 10 11 11 26 15 19 .31. 15 15 16 16
02998 17 8 10 11 11 26 15 19 .31. 15 15 16 16
(The atypical results are again marked with periods.)
Here there are four discrepant values, but, again, that doesn't
necessarily mean four mutations -- shared ancestry often means shared
mutations. The four differences are the 31's all in the same column.
Could they all result from a single mutation? Well, not in this case.
Look at the lineage for kit 00816 (11528111). That does share a lot
with kit 08335 (1152841), and so these two almost certainly do share a
single mutation which occurred in one of three common ancestors: 115,
1152, or 11528. Any later, and the mutation would not be shared by
both testees; any earlier, and it would be shared by all nine.
Similarly, look at 02953 (11184161) and 02998 (11184115). They also
share many ancestors, and the mutation in their case could have
occurred in either of two: 11184 or 111841. Any later, and the
mutation would not be shared; any earlier, and it would also be shared
by 05726 (11185312). In short, we're looking at two mutations here in
this column, one shared by kits 00816 and 08335 and another shared by
02953 and 02998.
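The "any later ... any earlier" reasoning can be sketched as a small function. This is a hedged illustration, not the method used in the original memo: candidate_ancestors is a hypothetical name, the code assumes one character per generation in each Henry number, and it ignores the possibility of back-mutations. It uses os.path.commonprefix, which compares plain strings character by character.

```python
import os


def candidate_ancestors(carriers, non_carriers):
    """Henry numbers of ancestors in whom a mutation shared by exactly
    `carriers` could have arisen (empty list if no single ancestor fits)."""
    # Any later than this ancestor, and not all carriers would share it.
    shared = os.path.commonprefix(carriers)
    # Any ancestor also shared with a non-carrier is too early.
    too_early = max(len(os.path.commonprefix([shared, other]))
                    for other in non_carriers)
    return [shared[:end] for end in range(too_early + 1, len(shared) + 1)]


# Kits showing the typical 30 in the discrepant column of panel 2.
thirty = ["11221782", "11221783", "112216A1", "1116584", "11185312"]

print(candidate_ancestors(["11528111", "1152841"], thirty))
print(candidate_ancestors(["11184161", "11184115"], thirty))
# All four 31's treated as one mutation: no single ancestor qualifies.
print(candidate_ancestors(["11528111", "1152841", "11184161", "11184115"], thirty))
```

The empty result for the last call is the code's version of the argument that four discrepant test subjects cannot come from one mutation here.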
Back to the calculations...
Adding the two panels together, we have 34 x 25 = 850 mutation
opportunities and 1+2=3 mutations in all. That's a rate of 3/850 =
0.0035. The square root of 3 is 1.7, and so the uncertainty is
1.7/850 = 0.0020, about the same as for the first panel alone.
Now, let's consider only the second panel by itself: 34 x 13 = 442.
Rate is 2/442 = 0.0045, and uncertainty is sqrt(2)/442 = 0.0032.
So, are the results from the two panels SIGNIFICANTLY different? No.
They are different by almost a factor of 2, but they are also very
uncertain -- too uncertain to stand apart. More to the point, even
though the measured rate for the second panel is "large", it is still
not significantly different even from the old standby 0.002.
PANEL 3
Finally, let's look at the third panel. The haplotypes are as follows:
00577 11 11 19 22 16 15 .18. 17 36 37 12 12
21349 11 11 19 22 16 15 .18. 17 36 37 12 12
00784 11 11 19 22 16 15 .18. 17 36 37 12 12
04085 11 11 19 22 16 15 17 17 36 37 12 12
00816 11 11 19 22 16 15 17 17 36 37 12 12
08335 11 11 19 22 16 15 17 17 36 37 12 12
05726 11 .10. 19 22 16 15 17 17 36 37 12 12
02953 11 11 19 22 16 15 17 17 36 .38. 12 12
02998 11 11 19 22 16 15 17 17 .35. 37 12 12
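Once again, the shared discrepancies represent a single shared mutation,
because all three 18's appear in closely related men (all in the 11221
line). The 10, the 38, and the 35 each appear in only one man, and in
different columns, so each of them is a separate mutation. Here, then,
we see four mutations.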
The rate for the combination of all three, then, is (1+2+4)/(37x34) =
0.0056. The uncertainty is sqrt(7)/(37x34) = 0.0021. Interestingly
enough, this is again close to the uncertainty for the first panel
alone. Again, we see that this calculated rate is not significantly
different from the nominal 0.002, or from the calculated rates of the
other panels in any combination. It requires a much larger set of
data or a far larger departure from the norm to see a significant
difference in rate.
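To pull the three panels together, here is a short sketch that tabulates opportunities, rate, and uncertainty for each panel and for the combination, and compares each against the nominal 0.002 using the twice-the-uncertainty rule. The marker and mutation counts are taken from the discussion above; the names summarize, panels, and rows are made up for this illustration.

```python
import math

EVENTS = 34      # transmission events counted from the Henry numbers
NOMINAL = 0.002  # the nominal rate used for comparison

# panel name -> (markers tested, mutations counted)
panels = {"panel 1": (12, 1), "panel 2": (13, 2), "panel 3": (12, 4)}


def summarize(markers, mutations, events=EVENTS):
    """Return (opportunities, rate, uncertainty) for one set of markers."""
    opportunities = events * markers
    rate = mutations / opportunities
    uncertainty = math.sqrt(mutations) / opportunities
    return opportunities, rate, uncertainty


rows = {name: summarize(*counts) for name, counts in panels.items()}
rows["all three"] = summarize(sum(m for m, _ in panels.values()),
                              sum(k for _, k in panels.values()))

for name, (opps, rate, sigma) in rows.items():
    significant = abs(rate - NOMINAL) >= 2 * sigma
    print(f"{name:10s} {opps:5d} {rate:.4f} {sigma:.4f} {significant}")
```

None of the rows should come out as significantly different from the nominal rate, in line with the conclusion above.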
John Chandler