GENEALOGY-DNA-L Archives



From: "Ken Nordtvedt" <>
Subject: Re: [DNA] Intraclade Age Sigma "Unknowable"
Date: Sun, 7 Feb 2010 11:01:28 -0700
References: <009601caa80d$da9ca2a0$5e82af48@Ken1><4B6EEA3B.3090005@san.rr.com><000601caa813$70017870$5e82af48@Ken1><4B6EF353.9040503@san.rr.com>


----- Original Message -----
From: "Al Aburto" <>
> Aha! :-)
>
> From your postings and emails I have quit on the intraclade estimation
> a while back. I have been using the TMRCA pairs estimation instead as I
> could not quite come to grips with what the interclade estimation was
> doing for me. It gives an age between two groups, but not of one ...

As long as we are talking about what could be done hypothetically under
amazing circumstances, such as haplotypes with a huge number of STRs, let's
consider what we would see in the distribution of pairwise age estimates
from an intraclade population of N haplotypes. So we assume we have driven
the statistical uncertainty in each pairwise TMRCA to zero.

If f is the fraction of the N haplotypes descended from one son of the MRCA,
and (1-f) the fraction then necessarily descended from the other son
(assuming just two sons with non-extinct lines), we should find that a
fraction 2f(1-f) of the pairwise TMRCAs (for large N) equals the MRCA age.
There would be no high TMRCA tail to the distribution. Rather, there would
be a sharp cutoff in the distribution, representing those pairs whose MRCA
is the clade MRCA (assuming your sample population was a good enough sample
that f was neither zero nor one). Below the cutoff you would get spikes in
the distribution at younger pairwise TMRCAs, indicating nodes in the tree
leading to these N haplotypes.
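
As a rough illustration of that idealized distribution, here is a minimal
Python sketch. The five-haplotype tree, its node names, and its ages in
generations are all made up for the example; only the counting logic matters.
It tallies the exact pairwise TMRCAs and compares the share of pairs sitting
at the cutoff with 2f(1-f), which for finite N picks up a factor N/(N-1).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy tree (not real data): each haplotype is listed with its
# chain of ancestral nodes (name, age in generations), clade MRCA first.
LINEAGES = {
    "h1": [("root", 100), ("A", 60), ("A1", 20)],
    "h2": [("root", 100), ("A", 60), ("A1", 20)],
    "h3": [("root", 100), ("A", 60)],
    "h4": [("root", 100), ("B", 40)],
    "h5": [("root", 100), ("B", 40)],
}

def pair_tmrca(path_a, path_b):
    """Age of the deepest node shared by two root-to-tip paths."""
    age = None
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        age = node_a[1]
    return age

def tmrca_distribution(lineages):
    """Count how many unordered pairs fall at each exact TMRCA."""
    pairs = combinations(sorted(lineages), 2)
    return Counter(pair_tmrca(lineages[a], lineages[b]) for a, b in pairs)

dist = tmrca_distribution(LINEAGES)
n = len(LINEAGES)
total_pairs = n * (n - 1) // 2

# Sample fraction descended from son "A" of the clade MRCA.
f_star = sum(1 for path in LINEAGES.values() if path[1][0] == "A") / n

cutoff_age = max(dist)                       # oldest spike = clade MRCA age
cutoff_share = dist[cutoff_age] / total_pairs
large_n_share = 2 * f_star * (1 - f_star)    # the 2f(1-f) approximation
exact_share = large_n_share * n / (n - 1)    # finite-N correction

print(sorted(dist.items()))
print(cutoff_age, cutoff_share, large_n_share, exact_share)
```

With so few haplotypes the finite-N correction is noticeable; it fades as N
grows, and the spikes below the cutoff mark the internal nodes A, B and A1.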

So in this idealized limit you could learn both the TMRCA for the whole clade
and the fraction f* for that first fundamental split, but only as it applies
to your sample population of haplotypes. That f* would not necessarily be
the f for the entire world population of this clade. Some kind of quality
control in the sampling of that entire population would be necessary to
guarantee that your measured f* was close to the true f or, worse yet, that
you had not missed the clade TMRCA entirely by having f* = 0 in your sample
population.
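
To put a rough number on that risk, here is a small Python sketch. It rests
on an assumption that real projects seldom satisfy: the n sampled haplotypes
are independent random draws from a large population in which a true
fraction f descends from one son. Under that assumption the chance that
every draw comes from the same son's line is f^n + (1-f)^n. The function
name is just illustrative.

```python
def prob_miss_root_split(f, n):
    """Chance that n independent random draws all descend from the same son.

    Assumes idealized random sampling from a large population; recruited
    project samples are usually far from this.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return f ** n + (1.0 - f) ** n

# A rare branch (f = 0.02 here, an arbitrary value) is easy to miss in small samples.
for n in (10, 50, 200):
    print(n, prob_miss_root_split(0.02, n))
```

The smaller branch dominates the result, so a lopsided first split needs a
much larger sample before the clade MRCA reliably shows up among the pairs.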

Ken



