
From: "Tim Janzen" <>
Subject: Re: [DNA] : low variance MRCA dates for P310 clades in Italy and SE Europe
Date: Sun, 28 Feb 2010 17:33:31 -0800
In-Reply-To: <>

Dear Vince,
What you are saying below is exactly what I am driving at. You
don't necessarily get more accurate intraclade coalescence age estimates if
you simply randomly add samples to your dataset. Having fewer, but more
carefully selected haplotypes that represent the oldest branches in a
subclade can improve the accuracy of the estimates. This is the reason that
I routinely select only one randomly chosen haplotype per surname if there
are clearly closely related haplotypes with the same surname in the dataset
for a subclade for which I am doing an intraclade coalescence age estimate.
In any case, we often have problems figuring out how many branches there are
in a large subclade because we don't know the overall tree structure. This
problem won't be completely resolved until we have widespread complete Y
chromosome sequencing available so that we can determine what the tree
structure is using all of the SNPs on the Y chromosome for the various
Another issue is how many haplotypes are optimal for intraclade
coalescence age estimates. From having done quite a few intraclade
coalescence age estimates previously I have noted a tendency for intraclade
coalescence age estimates to start declining once you reach 50 or so
haplotypes. This is something that could be reviewed experimentally using a
haplotype generator and running quite a few intraclade coalescence age
estimates using datasets of varying sizes.
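
The experiment described above could be set up roughly as follows. This is
only a minimal sketch, not an established tool: the function names, the 25
markers, the 120-generation true age, and the mutation rate of 0.002 per
marker per generation are all illustrative assumptions. It also assumes a
star phylogeny (every lineage descends independently from the founder) and a
symmetric single-step mutation model, which real subclades do not satisfy.

```python
import random
import statistics


def simulate_haplotype(n_markers, generations, mu, rng):
    """Evolve one lineage from an all-zero founder haplotype.

    Assumes a symmetric single-step mutation model: in each generation
    every marker gains or loses one repeat with probability mu.
    """
    haplotype = [0] * n_markers
    for marker in range(n_markers):
        for _ in range(generations):
            if rng.random() < mu:
                haplotype[marker] += rng.choice((-1, 1))
    return haplotype


def variance_age(haplotypes, mu):
    """Variance-based age estimate in generations.

    Averages the per-marker sample variance of repeat counts and
    divides by the assumed mutation rate.
    """
    n_markers = len(haplotypes[0])
    per_marker = [
        float(statistics.variance([h[m] for h in haplotypes]))
        for m in range(n_markers)
    ]
    return statistics.mean(per_marker) / mu


def run_experiment(sizes, n_markers=25, generations=120, mu=0.002,
                   trials=20, seed=1):
    """Return the mean age estimate (in generations) for each sample size."""
    rng = random.Random(seed)
    estimates = {n: [] for n in sizes}
    for _ in range(trials):
        # Star phylogeny: every lineage descends independently from the founder.
        pool = [simulate_haplotype(n_markers, generations, mu, rng)
                for _ in range(max(sizes))]
        for n in sizes:
            # Smaller datasets are subsets of the same simulated pool.
            estimates[n].append(variance_age(pool[:n], mu))
    return {n: statistics.mean(values) for n, values in estimates.items()}


if __name__ == "__main__":
    for size, age in run_experiment([5, 10, 25, 50, 100]).items():
        print(f"{size:4d} haplotypes: {age:6.1f} generations (true age: 120)")
```

Under these simplified assumptions the sample variance is an unbiased
estimator, so the averages should not drift systematically with dataset size.
To test for a decline around 50 haplotypes, the generator would need to be
replaced with one that models branching structure and uneven sampling of
closely related lineages.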

-----Original Message-----
[mailto:] On Behalf Of Vincent Vizachero
Sent: Sunday, February 28, 2010 1:52 PM
Subject: Re: [DNA] : low variance MRCA dates for

You are actually implying that you do know something about this in
your example, which makes it easy to point out that - in your example
- you'd get a more accurate TMRCA estimate by REDUCING your sample
down to just two intentionally selected haplotypes (e.g. one "early
branch" and one "recent branch") than you would by increasing your
sample to 1020 randomly chosen haplotypes.

