GENEALOGY-DNA-L Archives

From: (John Chandler)
Subject: Re: [DNA] Male Line Specific Y-STR Average Mutation Rates
Date: Thu, 13 Jan 2005 20:42:55 -0500 (EST)
References: <41E5775C.7060302@kerchner.com> <REME20050112155719@alum.mit.edu> <41E59CA9.4020303@kerchner.com> <REME20050112183553@alum.mit.edu> <41E5BBC9.8020603@kerchner.com> <REME20050112203830@alum.mit.edu> <41E5DB49.7020206@kerchner.com> <REME20050112224525@alum.mit.edu> <41E60555.1080203@kerchner.com> <REME20050113143016@alum.mit.edu> <41E6E91D.5040804@kerchner.com>
In-Reply-To: <41E6E91D.5040804@kerchner.com> (message from Charles on Thu, 13 Jan 2005 16:33:17 -0500)

Charles wrote:
> But clearly some
> surname projects are observing average mutation rates 3 or almost 4
> times more than others.

This gets to the nub of things. There is a big difference in
principle between the observed rate and the actual rate. You are
implicitly assuming that the process of "observation" is perfect. It
is not. Measuring the mutation rate is rather like measuring the
weight of a man who is jumping up and down on the scales. Most of the
time, he is up in the air, and the scales read "0", but from time to
time (and the interval varies because he doesn't always jump the same
height) his feet hit the scales and register far more than his actual
weight. In the same way, the mutation rate is zero most of the time
(from one generation to the next), but it becomes huge if you catch
the one generation where a mutation actually happens. The analogy
isn't perfect, but it conveys the difficulty of getting a smooth
average out of a series of sudden events. The observed rate is not
the true rate, but only an estimate, and usually a rough estimate at
that.
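
A minimal simulation sketch of that effect, assuming every
marker-generation mutates independently with one fixed true rate. The
project size and the number of projects are made-up illustration
values, not data from any real project.

```python
import random

def simulate_observed_rates(true_rate, n_transmissions, n_projects, seed=0):
    """Return the observed mutation rate for each simulated project.

    Each project observes n_transmissions marker-generations, and each
    one mutates independently with probability true_rate (an assumption).
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_projects):
        # Count the rare generations in which a mutation actually happens.
        mutations = sum(1 for _ in range(n_transmissions)
                        if rng.random() < true_rate)
        rates.append(mutations / n_transmissions)
    return rates

# Hypothetical values: 200 projects, 2000 marker-generations each.
rates = simulate_observed_rates(0.0023, 2000, 200)
print("lowest observed rate: ", min(rates))
print("highest observed rate:", max(rates))
print("mean of observed rates:", sum(rates) / len(rates))
```

Every simulated project shares the same true rate, yet the individual
observed rates scatter widely around it, simply because mutations are
rare, discrete events.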

> commonly used .002 historical average Y-STR mutation rate. You keep
> wanting to hold onto that historical average as the gospel

No, actually the rate I use is 0.0023 +/- 0.0003, which is the result
I obtained by pooling many surname projects with a combination of 12-
and 25-marker testing. You may remember seeing the reports I made to
this list at the time (early 2003). Although my study was too small
to measure useful rates for individual markers, I did subdivide the
data in two different ways: first panel vs second panel and slow vs
fast (according to FTDNA). The first test was inconclusive, but the
second showed that the "fast" markers really did have a higher average
rate than the "slow" ones. You can read all about it in the archives.

Unfortunately, the computer where I had the data for that study
crashed later in 2003, and the backups turned out not to be usable, so
the reports I made to the list are all that's left. However, Doug has
since done a similar study with similar methods and similar results
(a slightly higher, but not significantly different, rate).

Meanwhile, the biggest mutation rate study that has actually been
published is the Norwegian father-son study by Dupuy et al.,
which covered 10 STR markers (all part of FTDNA's first panel).
Guess what their result is: 0.0020 +/- 0.0003.
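
As a rough check of how well those two figures agree, here is a small
sketch that expresses their difference in units of the combined
uncertainty. It assumes the quoted errors are independent 1-sigma
values; the function name is just for illustration.

```python
import math

def sigma_difference(rate_a, err_a, rate_b, err_b):
    """Difference between two estimates in units of its combined 1-sigma error."""
    # Independent errors add in quadrature (an assumption).
    combined = math.sqrt(err_a ** 2 + err_b ** 2)
    return abs(rate_a - rate_b) / combined

# Pooled surname-project rate vs. the father-son study rate.
z = sigma_difference(0.0023, 0.0003, 0.0020, 0.0003)
print(f"difference = {z:.2f} sigma")
```

A difference of well under one sigma means the two results are
entirely consistent with each other.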

> It makes me think of an analogy using shoe sizes.

Sorry, but that's a very bad analogy. It is possible, and even
necessary, to measure the shoe size very precisely. The basic unit
is one person's foot (or you can measure both feet to make sure they
are indeed the same). You exhibit that one foot and measure it. End
of story.

The average mutation rate is very different. There is no one thing to
measure. Here's a slightly better analogy: gas mileage. That's a
rate, too, and you can measure it by writing down the odometer reading
at two successive fill-ups and noting the amount of gasoline pumped.
However, different attendants will top off the tank differently, and
so the calculated miles-per-gallon will vary, even if you always do
the same mix of stop-and-go and highway driving. (Non-US readers may
substitute km-per-liter.)

> A Y-STR haplotype average mutation rate which is .0015 in one male line
> is very different than a Y-STR haplotype average mutation rate of .0056
> in another male line.

I know you think that's a "clinching" argument because nobody would
say that 0.0015 = 0.0056. However, it's not that simple. You have
violated rule #1 by failing to specify the uncertainties on those
two numbers. Without the uncertainties, it is impossible to say
whether they are significantly different or not. Given the typical
uncertainties associated with DNA projects these days, they are not.
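
To illustrate, here is a sketch of how an uncertainty could be attached
to each rate. The mutation and transmission counts are HYPOTHETICAL,
chosen only so that they reproduce the two quoted rates; they are not
taken from any real project. The error uses the simple Poisson
approximation sqrt(count), which is crude for counts this small.

```python
import math

def rate_with_error(mutations, transmissions):
    """Observed rate and a rough 1-sigma error from a mutation count.

    Uses the Poisson approximation sigma = sqrt(count) / transmissions.
    """
    rate = mutations / transmissions
    error = math.sqrt(mutations) / transmissions
    return rate, error

# HYPOTHETICAL counts, picked to match the quoted rates.
rate_a, err_a = rate_with_error(3, 2000)   # 0.0015
rate_b, err_b = rate_with_error(7, 1250)   # 0.0056

# Separation in units of the combined error (errors added in quadrature).
separation = abs(rate_a - rate_b) / math.sqrt(err_a ** 2 + err_b ** 2)
print(f"{rate_a:.4f} +/- {err_a:.4f}")
print(f"{rate_b:.4f} +/- {err_b:.4f}")
print(f"separation = {separation:.1f} sigma")
```

With counts like these, the two rates are less than two sigma apart,
so nothing can be concluded from the fact that the bare numbers look
so different. Real projects would need to plug in their own counts.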

> But I am going to pursue this none the less.

I never said you shouldn't. In fact, I encourage you to do so.
The only problems I foresee are the possible selection bias if
too few projects participate and the possible inaccuracies if
people include lineages without good paper trails. At some
point, it may be necessary to go after individual project
admins and urge them personally to get on board.

> I invite you John to share your data and make an entry into the Log.

Yes, that's the ticket. But don't just invite me. Make general
invitations for a few months, and then start pestering the admins
who haven't contributed.

John Chandler