GENEALOGY-DNA-L Archives
From: "Ken Nordtvedt" <>
Subject: Re: [DNA] variance quiz game
Date: Wed, 9 Feb 2011 15:18:23 -0700
References: <002701cbc7a9$8c29ef30$c2482dae@Ken1> <000001cbc842$3e876b10$bb964130$@com><000c01cbc86c$2641c9e0$c2482dae@Ken1> <000301cbc875$0ed2dca0$2c7895e0$@com><00c901cbc891$ae5ec290$c2482dae@Ken1><000001cbc895$b479baf0$1d6d30d0$@com>
----- Original Message -----
From: "Sandy Paterson" <>
> Chuckle.
>
> Replace 'sufficiently' with 'excessively' and we're in agreement.
[[[ The windy version was meant for something other than being windy, but I
judged it ended up being "sufficiently" windy --- it met the threshold ---
to justify a brief version.
Anyway, this may be of interest for what I believe you are working on.
Weighting different haplotype pairs in a sum of their distance measures to
form a time estimate for a clade/haplogroup population of haplotypes seems
to be possible.
One of the three varieties of variance I mentioned in my previous message
manifestly involves haplotype pairs of different time depths. When we
consider the N(N-1)/2 pairs of haplotypes from a population of N haplotypes,
clearly some in reality have a young TMRCA, some have middle-of-the-road
TMRCAs and some have large TMRCAs equal to the actual TMRCA of the whole
tree for those N haplotypes. This justifies weighting. Here's my formal
take on how to do that. We're talking about the coalescence age.
Gcoal = { Sum p = 1 to N(N-1)/2 of [ Var(p) w(p) ] }
        / { 2M Sum p = 1 to N(N-1)/2 of [ w(p) ] }
with label "p" meaning a specific pair of haplotypes
Expectation value <Var(p)> = 2M TMRCA(p)
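
A minimal sketch of the weighted sum above, in Python. It assumes (this is not stated in the message) that Var(p) is the sum over markers of squared repeat-count differences between the two haplotypes of pair p, and that M is the summed per-generation mutation rate of the markers used. The haplotypes, the mutation rate, and the names pair_variance and gcoal are made up for illustration.

```python
from itertools import combinations


def pair_variance(a, b):
    """Assumed Var(p): sum over markers of squared repeat-count differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gcoal(haplotypes, M, weight=None):
    """Weighted coalescence age estimate, in generations.

    M is assumed to be the summed per-generation mutation rate of the markers.
    weight(i, j) returns w(p) for the pair of haplotypes i and j;
    equal weights are used when it is omitted.
    """
    if weight is None:
        weight = lambda i, j: 1.0
    numerator = 0.0
    denominator = 0.0
    # combinations() visits each of the N(N-1)/2 unordered pairs once.
    for i, j in combinations(range(len(haplotypes)), 2):
        w = weight(i, j)
        numerator += w * pair_variance(haplotypes[i], haplotypes[j])
        denominator += w
    return numerator / (2.0 * M * denominator)


# Made-up data: three haplotypes, four markers each.
haplotypes = [
    [13, 24, 14, 11],
    [13, 25, 14, 11],
    [14, 24, 15, 11],
]
M = 4 * 0.0025  # hypothetical: four markers at 0.0025 mutations per generation

print(gcoal(haplotypes, M))
```

With equal weights this reduces to the plain average of the pair variances divided by 2M. Passing a different weight function changes how much each pair contributes.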
Consider the expectation value of the correlation matrix C(p,p') of
statistical fluctuations of the Var(p) about their expected values
<[Var(p)-<Var(p)>][Var(p')-<Var(p')>]> = C(p, p')
C(p, p') is a matrix of dimension N(N-1)/2 by N(N-1)/2
The best weights for minimizing the sigma of the Gcoal estimate then obey the
matrix eigenvalue equation for the smallest eigenvalue k of C:
Sum p' of [ C(p, p') w(p') ] = k w(p)
If the Correlation Matrix were known, the best weights could be determined.
Unfortunately I see no workable way right now to get an estimate for this
Correlation Matrix. Only the diagonal entries could be estimated.
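
To illustrate why the off-diagonal entries matter, here is a rough Python sketch. Since C(p, p') as defined above is the covariance of the Var(p) fluctuations, the variance of Gcoal for fixed weights is Sum over p, p' of [ w(p) C(p, p') w(p') ] divided by { 2M Sum p of [ w(p) ] } squared. The matrix entries, the value of M, and the function name gcoal_sigma are invented for illustration and are not estimates from real data.

```python
import math


def gcoal_sigma(w, C, M):
    """Standard deviation of Gcoal for weights w, given the matrix C(p, p')."""
    n = len(w)
    # Quadratic form: Sum over p, p' of w(p) C(p, p') w(p').
    quad = sum(w[p] * C[p][q] * w[q] for p in range(n) for q in range(n))
    return math.sqrt(quad) / (2.0 * M * sum(w))


# Hypothetical matrix for the 3 pairs of an N = 3 population (made-up values).
C = [
    [4.0, 2.0, 2.0],
    [2.0, 4.0, 2.0],
    [2.0, 2.0, 4.0],
]
# The same matrix with every off-diagonal entry set to zero.
C_diag = [[C[p][q] if p == q else 0.0 for q in range(3)] for p in range(3)]

w = [1.0, 1.0, 1.0]  # equal weights
M = 0.01             # hypothetical summed mutation rate

sigma_full = gcoal_sigma(w, C, M)
sigma_diag = gcoal_sigma(w, C_diag, M)
print(sigma_full, sigma_diag)
```

In this made-up case the diagonal-only matrix gives a smaller sigma than the full matrix, so dropping positive off-diagonal terms would understate the uncertainty of the estimate.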
A similar analysis of the other two types of variance could be done, but the
same problem emerges --- how are the off-diagonal elements of the
Correlation Matrix estimated? One certainly cannot set them to zero. ]]].