GENEALOGY-DNA-L Archives
From:
Subject: Re: [DNA] STR Sigmas
Date: Tue, 27 Jul 2010 16:52:38 +0000 (UTC)
In-Reply-To: <359553404.608907.1280249353228.JavaMail.root@sz0002a.westchester.pa.mail.comcast.net>
>From: "Ken Nordtvedt"
>If you have N = 32, then Sigma(G)/G = 1/4
>The 95 percent 2-sigma plus or minus is then plus/minus 50 percent of the estimate --- and this is the asymptotic "best case". Things are worse for more modest G.
Dear Ken,
I honestly do not know why you persistently try to convince people here that margins of error in DNA genealogy are huge and preclude meaningful data analysis, and why, to "prove" it, you give artificially unfavorable examples rather than real data.
Why not show actual data, such as the following example published in J. Genet. Geneal. 5, pp. 217-256: a dataset of 1527 25-marker haplotypes, which collectively contain 8785 mutations from the base ("ancestral") haplotype?
I understand that not everyone works with such extended series of haplotypes; however, in this case you would show a low range of margins of error in a real example. To show how the margin of error grows in less extended series of haplotypes, take 153 haplotypes from the same series, which contain 879 mutations. Finally, take 15 haplotypes from the same series, which contain 87 mutations in all.
This would be much more educational and productive than scaring people away from data analysis.
If you have any difficulty with the above data analysis, let me know. I will gladly help.
Regards,
Anatole Klyosov
Subject: [DNA] STR Sigmas
Date: Tue, 27 Jul 2010 07:27:19 -0600
Statistical sigmas for STR variance growth with time eventually grow quadratically with G. For the interclade node age estimate, an individual STR has:
Variance of Variance = 2mG (1 + 4mG)
where m is the STR mutation rate and G is the number of generations back to the interclade node.
The square of the interclade variance expectation value, itself, goes as:
<Variance>^2 = (2mG)^2
Variance of Variance / <Variance>^2 = 2 + 1/(2mG)
For G = 2500 generations and m = 10^-4, this ratio is 4.
So for large G we have the "best" that can be done in fractional terms:
Sigma(G) / G = square root of 2
For very large G, all STR G estimates are combined with equal weight to produce the collective G estimate, and the best you can do is:
Sigma(G) / G = square root of (2/N)
with N being the number of STRs in the haplotypes.
If you have N = 32, then Sigma(G)/G = 1/4
The 95 percent 2-sigma plus or minus is then plus/minus 50 percent of the estimate --- and this is the asymptotic "best case". Things are worse for more modest G.
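The arithmetic above can be checked with a short Python sketch. The function names are hypothetical, chosen only for illustration. The finite-G function additionally assumes that all N STRs share the same mutation rate m and are weighted equally, which the post states only for the very-large-G limit, so treat that part as an assumption rather than as the method used here.

```python
import math


def variance_ratio(m, G):
    """Var(variance) / <variance>^2 for a single STR.

    From 2mG(1 + 4mG) / (2mG)^2 = 2 + 1/(2mG).
    """
    return 2.0 + 1.0 / (2.0 * m * G)


def asymptotic_relative_sigma(n_strs):
    """Large-G limit of Sigma(G)/G for n_strs equally weighted STRs."""
    return math.sqrt(2.0 / n_strs)


def relative_sigma(m, G, n_strs):
    """Sigma(G)/G at finite G.

    ASSUMPTION (not in the post): every STR has the same rate m and
    the same weight, so the single-STR ratio is simply divided by n_strs.
    """
    return math.sqrt(variance_ratio(m, G) / n_strs)


if __name__ == "__main__":
    # Single-STR ratio for G = 2500 generations, m = 10^-4
    print(variance_ratio(1e-4, 2500))
    # Asymptotic best case for N = 32 STRs, and its 2-sigma band
    best = asymptotic_relative_sigma(32)
    print(best, 2 * best)
    # Finite-G value under the equal-rate assumption
    print(relative_sigma(1e-4, 2500, 32))
```

Under these assumptions the first value is approximately 4, the asymptotic case for N = 32 is 1/4 (a 2-sigma band of plus/minus 50 percent), and the finite-G value is larger than the asymptotic one, consistent with "things are worse for more modest G".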
Unfortunately, there are some statistically unrealistic small sigmas quoted here and there.
A very steep statistical price is paid for not using the maximum possible number of STRs in your haplotypes.