From: "Ken Nordtvedt" <>
Subject: Re: [DNA] The British Isles DNA Project
Date: Sat, 27 Feb 2010 09:32:14 -0700
References: <><0715C0D0E6814FE79DF824415E8D271B@anatoldesktop>

----- Original Message -----
From: "Anatole Klyosov" <>

>> From: "Ken Nordtvedt" <>
>> You continue to put unrealistically small statistical confidence
>> intervals
>> on your age estimates.
>> Could you kindly state the formula which produces them (the confidence
>> intervals)?

From: "Anatole Klyosov" <>
> Regretfully, you continue to talk generalities.

From: "Ken Nordtvedt" <>
[[ SigmaG = SquareRoot [ M times Sum over c { f(c)^2 } ]
M is the total mutation rate of the haplotype markers used, c runs over each
father/son transitional site on the tree leading to the sample haplotypes, and
f(c) is the fraction of the sample haplotype population descended from site c.
This analytic formula, valid for any tree, has been mentioned on more than one
occasion and has been available on the web for a couple of years.
So what's your formula? ]]
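
A minimal Python sketch of the formula above, for readers who want to try it
on a tree of their own. The function names and the star-shaped example tree
are hypothetical and not from the original post. The sketch reads SigmaG as
the standard deviation of the sample-average mutation count; dividing by M
would express it in generations. That reading is an assumption.

```python
import math

def sigma_g(M, fractions):
    """SigmaG = sqrt(M * sum over c of f(c)^2).

    M         -- total mutation rate per generation over the markers used
    fractions -- one f(c) per father/son transition c on the tree: the
                 fraction of sample haplotypes descending through c
    """
    return math.sqrt(M * sum(f * f for f in fractions))

def star_tree_fractions(n_haplotypes, generations):
    """Hypothetical star tree: every sample line runs independently back to
    the founder, so each transition carries exactly 1/N of the sample."""
    return [1.0 / n_haplotypes] * (n_haplotypes * generations)

M = 1.0 / 14      # illustrative rate for 25 markers
N, G = 14, 71     # illustrative sample size and depth in generations

s = sigma_g(M, star_tree_fractions(N, G))
closed_form = math.sqrt(M * G / N)   # what the sum collapses to for a star tree
print(s, closed_form)
```

For a star tree the sum collapses to G/N, which is the only case in which
the familiar division by the square root of N appears.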

From: "Anatole Klyosov" <>
> It would have been SO easy
> for you to take those R1a1 haplotypes, since the list was provided,
> calculate YOUR own numbers and present here.

From: "Ken Nordtvedt" <>
[[ SigmaG can NOT be evaluated without knowing the structure of the tree.
This has been said before more than once. I don't know the structure of the
tree, so there is no sense in throwing out meaningless alternative SigmaG
numbers. But it is easy to show that the number quoted below, for example, is
too low. ]]

From: "Anatole Klyosov" <>
> And, please, let ME decide what is "realistically" and what is
> "unrealistically" small or large.

From: "Ken Nordtvedt" <>
[[ The purpose of my original message was to warn others, not you, that your
statistical confidence intervals are consistently too small to an
unrealistic degree. Your case below will now be considered as an example. ]]

>> . A common ancestor of that branch lived 2125+/-370 years
>>> before present (if to calculate from the first 25 markers)

[[ You claimed 14 haplotypes for this case. I will use M = 1/14 for the
first 25 markers, and 71 generations for 2125 years. A branch line of
descent of that length will have ON AVERAGE 5 mutations. The sigma
(standard deviation) of those 5 will be 2.24 mutations, or 45 percent. Some
ERRONEOUSLY assume then that, because they have a tree with 14 branch lines
of descent, the 45 percent can be divided by the square root of 14 = 3.7,
bringing the sigma of the average haplotype branch line age estimate down to
12.2 percent. As wrong as this procedure would be, it still yields a 2SigmaG
of 24.4 percent, which would be 2125 x .242 = 514 years. So the formula used
here is

2SigmaG / G = 2 / SqrRoot( <n> N ), <n> being the average 5 mutations and N
being the number of haplotypes.
I hasten to say again: this formula is wrong, albeit seen used here and

But the actual 2SigmaG is surely much bigger than even that 514 years, for
the basic reason that the branch lines of descent to the 14 haplotypes are NOT
independent of each other. They share some portion of their branch lines
with each other. That's why one can't make a realistic SigmaG estimate
without knowing the tree structure. The division by the square root of 14 is
only valid if the tree founder had had 14 sons, and each of those 14 sons
had had a descending line leading to one of the 14 sample haplotypes.
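
The point about shared branch lines can be sketched numerically in Python.
The numbers M = 1/14, 71 generations, 14 haplotypes and 2125 years are taken
from the example above; the "trunk" tree, in which all 14 lines share a
single line of descent for the first half of the depth, is purely
hypothetical and is not the tree of the data under discussion. SigmaG is
read here as the sigma of the sample-average mutation count, which is an
assumption.

```python
import math

M = 1.0 / 14     # mutation rate per generation, first 25 markers (from the post)
G = 71           # generations for 2125 years (from the post)
N = 14           # number of haplotypes (from the post)
YEARS = 2125

def rel_two_sigma(M, G, fractions):
    """2*SigmaG / G, with SigmaG = sqrt(M * sum f(c)^2) in mutation units
    and the mean mutation count per line equal to M * G."""
    return 2.0 * math.sqrt(M * sum(f * f for f in fractions)) / (M * G)

def trunk_tree(N, G, trunk):
    """Hypothetical tree: a shared trunk of `trunk` generations (f = 1),
    followed by N independent lines of G - trunk generations (f = 1/N)."""
    return [1.0] * trunk + [1.0 / N] * (N * (G - trunk))

naive = 2.0 / math.sqrt(M * G * N)                     # 2 / sqrt(<n> N)
star = rel_two_sigma(M, G, trunk_tree(N, G, 0))        # founder with 14 sons
half = rel_two_sigma(M, G, trunk_tree(N, G, G // 2))   # half the depth shared

for label, r in (("naive", naive), ("star", star), ("half-shared", half)):
    print(label, round(100 * r, 1), "percent,", round(YEARS * r), "years")
```

With no shared trunk the tree formula reproduces the naive one, landing near
the 514 years above (the small gap is rounding of the 5 mutations and of 3.7).
With half the depth shared, the relative 2SigmaG comes out well over twice as
large.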

The true formula given above reduces to the form

2SigmaG / G = 2 / SqrRoot (<n>) times a factor which diminishes with N much
more slowly than 1/SqrRoot(N), more like 1/SqrRoot(logN).

The very few papers which actually try to produce a properly founded SigmaG
use some program like BATWING, which adds a demographic model to the mutation
model, then runs a huge number of simulations of different tree structures
as well as random mutational placements in each tree, and then looks at the
resulting distribution of age estimates. There are problems with that
approach, but at least its authors recognized they needed to know the tree
structure. ]]
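
The simulation idea can be illustrated with a much smaller Python sketch.
This is NOT BATWING and has no demographic model: the tree is fixed and
hypothetical (a shared trunk followed by 14 independent lines), and only
the placement of mutations is random. All function names are made up for
the illustration. Mutations on each segment are drawn as Poisson counts
with mean M per generation, which is an assumption about the mutation
model.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Poisson variate by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_age_estimates(M, G, N, trunk, trials, seed=0):
    """Age estimates (in generations) from repeated random mutation
    placement on one fixed tree: shared trunk, then N independent lines."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        shared = poisson(M * trunk, rng)               # seen by every haplotype
        private = [poisson(M * (G - trunk), rng) for _ in range(N)]
        mean_mutations = shared + sum(private) / N     # average count per haplotype
        estimates.append(mean_mutations / M)           # convert to generations
    return estimates

M, G, N, trunk = 1.0 / 14, 71, 14, 35
est = simulate_age_estimates(M, G, N, trunk, trials=20000)
simulated_sigma = statistics.pstdev(est)
# Analytic value for this tree: sqrt(M * sum f(c)^2) / M, in generations.
analytic_sigma = math.sqrt(M * trunk + M * (G - trunk) / N) / M
print(statistics.mean(est), simulated_sigma, analytic_sigma)
```

On a fixed tree like this, the spread of the simulated estimates should
agree with the analytic formula; what programs of the BATWING kind add is
averaging over many possible trees.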
