GENEALOGY-DNA-L Archives



From: Vincent Vizachero
Subject: Re: [DNA] : low variance MRCA dates for P310 clades in Italy and SE Europe
Date: Sun, 28 Feb 2010 16:51:30 -0500
References: <201002282128.o1SLSgu0007885@mail.rootsweb.com>
In-Reply-To: <201002282128.o1SLSgu0007885@mail.rootsweb.com>


Yes, but if you increase your sample size from 102 to 1020 you have
the exact same problem, statistically speaking: 20 "early" haplotypes
and 1000 "late" haplotypes.

You'd get the same estimate from the 102 as you get from the 1020,
despite a 10x sample size in the latter case. Just collecting more
samples won't fix anything. You'd have to change your methodology, but
that would require knowing something about the tree structure.

You are actually implying that you do know something about this in
your example, which makes it easy to point out that - in your example
- you'd get a more accurate TMRCA estimate by REDUCING your sample
down to just two intentionally selected haplotypes (e.g. one "early
branch" and one "recent branch") than you would by increasing your
sample to 1020 randomly chosen haplotypes.
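
To put rough numbers on this, here is a minimal sketch in Python. It
assumes the pooled estimate behaves like the average coalescence time
over all pairs of haplotypes, which is a simplification and not
necessarily the exact calculation anyone on the list is using. The
function name expected_pairwise_age and its parameters are made up for
illustration, and it assumes any pair that includes an "early"
haplotype coalesces at 3000 years while any pair within the recent
branch coalesces at 700 years.

```python
from math import comb


def expected_pairwise_age(n_early, n_recent, t_root=3000.0, t_recent=700.0):
    """Average pairwise coalescence time (years) over all haplotype pairs.

    Toy assumption: two haplotypes from the recent branch coalesce at
    t_recent; every other pair (early/early or early/recent) coalesces
    at t_root.
    """
    n = n_early + n_recent
    total_pairs = comb(n, 2)
    recent_pairs = comb(n_recent, 2)
    other_pairs = total_pairs - recent_pairs
    return (recent_pairs * t_recent + other_pairs * t_root) / total_pairs


# Same 2:100 proportions at two sample sizes, then one hand-picked pair.
for n_early, n_recent in [(2, 100), (20, 1000), (1, 1)]:
    age = expected_pairwise_age(n_early, n_recent)
    print(f"{n_early:>3} early + {n_recent:>4} recent: about {age:.0f} years")
```

Under these assumptions the 102 and 1020 samples both come out near
790 years, while the single early/recent pair gives 3000 years. The
result depends on the proportions in the sample, not on its size.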

VV


On Feb 28, 2010, at 4:28 PM, Tim Janzen wrote:

> Let's say we have only two haplotypes from an early branch of this
> subclade that is say 3000 years old, but we have 100 haplotypes from
> a relatively recent branch with a true TMRCA of say 700 years. In
> such a case the 100 haplotypes from the relatively recent branch
> will "swamp" the data from the earlier branch and will skew the
> intraclade coalescence age to be significantly younger than it would
> be if only two haplotypes from the relatively recent branch were
> included in the calculations.

