GENEALOGY-DNA-L Archives

From: "Ken Nordtvedt" <>
Subject: Re: [DNA] TMRCAs for groups of haplotypes ?
Date: Mon, 23 Feb 2009 11:11:06 -0700
References: <49A2D0B4.5070808@ucl.ac.uk>

Your model is much closer to the truth. We really don't know how many
generations take place between those early occurrences of nodes in the tree
for the sample population. There is intrinsic noise uncertainty in any
attempt to infer them, no matter how large the sample population (which is
constrained by the total population). Those intervals, which contribute to
the total generations G, are shared in common by a very large fraction of
the sample population's haplotype pairs, so there is no way to drive down
the statistical uncertainties in their contribution to total G. It requires
INDEPENDENT measurements of something to invoke the 1 / square root of N
diminishment of noise errors.

Population sample size is really more important for assuring you have a
representative sample of the total population (neutralizing various biases).
Total mutation rate M of all your STRs is the important factor. This clock
ticks no faster on average than once every 1/M generations. You really can't
read millisecond accuracy on a clock which ticks every minute.
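
As a rough illustration of the clock analogy, the sketch below computes the
average number of generations between "ticks" (mutations anywhere in the
haplotype) as 1/M. The per-marker rate of 0.002 mutations per generation and
the marker counts are assumed values chosen only for illustration; they are
not figures from this thread, and real STR rates vary from marker to marker.

```python
def generations_per_tick(n_markers, rate_per_marker):
    """Average generations between mutations for the whole haplotype (1/M)."""
    # M: total mutation rate per generation, summed over all markers
    total_rate = n_markers * rate_per_marker
    return 1.0 / total_rate


# Assumed average per-marker mutation rate (illustrative only).
ASSUMED_RATE = 0.002

for n in (12, 25, 37, 67):
    ticks = generations_per_tick(n, ASSUMED_RATE)
    print(f"{n:2d} markers: about {ticks:.1f} generations per tick")
```

Under these assumed numbers, even a long haplotype ticks only once every
several generations, which is the resolution limit the analogy points at.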

For a "normal" tree which grows at a constant exponential rate from founder
to present, the fractional standard deviation in estimating G is

dG/G = sqrt( log 2 / [ M G log N ] ), with N being the sample size.

You see how slowly this diminishes with N.

To recover the 1 / square root N effect you would have to assume the
unrealistic picture that all N haplotypes of your sample population
descended from different sons of the founder, so that their N lines to the
present were INDEPENDENT.
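
A minimal sketch of the comparison, to make the difference in scaling
concrete. The first function is the formula quoted above. The second is an
assumed idealization of the independent-lines picture, treating the mutation
count on each of N independent lines as Poisson with mean M*G, which gives
dG/G = 1 / sqrt(M G N). The values of M and G are assumptions picked only for
illustration.

```python
import math


def frac_sd_tree(M, G, N):
    """dG/G = sqrt(log 2 / (M * G * log N)) for the constant-growth tree."""
    return math.sqrt(math.log(2) / (M * G * math.log(N)))


def frac_sd_independent(M, G, N):
    """Assumed idealization: N independent lines with Poisson mutation counts."""
    return 1.0 / math.sqrt(M * G * N)


M = 0.05  # assumed total mutation rate per generation (illustrative)
G = 100   # assumed number of generations to the founder (illustrative)

print("      N    tree dG/G    independent dG/G")
for N in (10, 100, 1000, 10000):
    tree = frac_sd_tree(M, G, N)
    indep = frac_sd_independent(M, G, N)
    print(f"{N:7d}    {tree:9.3f}    {indep:16.3f}")
```

Because the tree formula depends on N only through log N, squaring the
sample size (say from 100 to 10000) shrinks dG/G by just a factor of
square root of 2, whereas in the independent-lines picture it would shrink
by a factor of 10.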

----- Original Message -----
From: "James Heald" <>
To: <>
Sent: Monday, February 23, 2009 9:37 AM
Subject: [DNA] TMRCAs for groups of haplotypes ?

>I wonder if people can help me out?
>
> I seem to have got myself into a difference of views in another place
> with Anatole Klyosov (not entirely unknown to this list),
> and I was wondering if I could sanity-check a point that has come up
> with people here, particularly the opinion of the people like Ken and
> John Chandler.
>
> Anatole has made some comments there that seem quite odd to me, such as
>
> * A change from 13-18 to 14-17 in DYS 385 a/b should be counted as only
> a one step mutation (giving rise to an 11/12 match), not a two step
> mutation (giving a 10/12 match); and
>
> * "J1 CMH are not Cohanim ... J2 are the reals Cohanim today according
> to fundamental and standard basis."
>
>
> But the real proposition I want to bring to the list is this. It seems
> to me that, if you have only tested a given number of markers, that puts
> an unavoidable ceiling on the accuracy with which you can estimate a
> TMRCA for a group, no matter how great the number of haplotypes you
> sample from that group.
>
> And the reason is this: even if you knew (from an infallible oracle) how
> far back you had to go to get back to only two lines left standing,
> there would still be a remaining uncertainty in the coalescence time for
> the last two lines -- viz the confidence interval in the TMRCA for a
> 12/12 match or a 25/25 match or a 37/37 match (as appropriate) that one
> can look up on Bruce Walsh's page at
> http://nitro.biosci.arizona.edu/ftdna/TMRCA.html
> So, just on the basis of this, there must be a 95% interval of at least 650
> years in any 37 marker calculation, 975 years in any 25 marker
> calculation, and 2225 years in a calculation based only on 12 markers.
>
> (In fact, I suspect reasonable uncertainties must be substantially
> larger, since this is the unavoidable contribution to the uncertainty
> from only the very last step of the coalescence. But even having a
> minimum ballpark figure is at least a start).
>
>
> Similarly, if one has a big group of haplotypes centred on a signature
> 12-23-15-10-14-17-11-15-12-13-11-29
> (and maybe some other markers known as well), and a separate group of
> three 12-marker haplotypes all with signature
> 12-23-15-10-13-18-11-15-12-13-11-29
> it seems to me that there must be an uncertainty in the TMRCA of those
> groups given by a 95% confidence interval of at least 4925 years wide,
> that being the number that comes out of Bruce Walsh's figures for the
> uncertainty in the TMRCA of a 10/12 match, corresponding as a minimum to
> the uncertainty in the TMRCA between the common ancestor of the three
> 13-18 haplotypes, and the central 14-17 haplotype.
>
>
> Anatole on the other hand insists that this is "ignorance"; shows I have
> "no understanding of kinetics in general, and kinetics of mutations in
> haplotypes in particular"; I simply "do not understand the concepts of
> DNA", and that of course the number of haplotypes matters.
>
> He gives an uncertainty in the overall TMRCA as only "+/- 200 years"
>
>
> What do people think?
>
> My view is that even if he has 100 haplotypes in his 14-17 group (which
> he claims), he simply doesn't have the data to claim a TMRCA to the
> 13-18 group with any 95% CI better than the 4925 years of Bruce Walsh's
> calculation; and if his calculations aren't reflecting that, then his
> calculations, at least as regards the uncertainty, are simply broken.
>
> (Perhaps in much the same way that Thomas et al in the original CMH
> paper used an ASD method to calculate a 95% CI of 1150 years despite
> only using 5 markers,
> http://www.ucl.ac.uk/tcga/tcgapdf/Thomas-98-Nat-Cohen.pdf
> ignoring in their calculations that the lineages would certainly
> coalesce, so could in no way be considered independent).
>
>
> The *right* way to do a calculation like this is presumably to do a
> full-on Bayesian sampling simulation with something like BATWING.
>
>
> But as people often do make claims about group TMRCAs, or that
> particular individuals "must" have a recent coalescent time with
> particular clades, even on similarly very short-length haplotypes, I
> thought it would be useful to raise here.
>
> Cheers,
>
> James.
>
>