Re: [DNA] underestimating variance
Date: Thu, 10 Apr 2008 11:29:27 -0600
You simply add the variances from the markers and then divide that sum by
the sum of the marker mutation rates. While I think modestly decent
mutation rates are known for dozens of individual markers, you really only
need a good number for the sum of the mutation rates. That is arguably much
better, because the mutation data from the different markers can be pooled,
which improves the statistics for estimating the sum of the rates.
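A minimal sketch of this estimator, plus a quick simulation of how its spread shrinks as markers are added. It assumes a symmetric single-step mutation model, a star genealogy (every line descending independently from one founder) and per-generation mutation rates. The function names, the 30-person sample, the 60-generation age and the rates are hypothetical placeholders, not measured values. Real genealogies are not star-shaped, so the spread in real data will be larger than this shows.

```python
import math
import random
import statistics


def variance_age(haplotypes, rates):
    """Generations to the MRCA: sum of per-marker sample variances divided by
    the sum of the per-generation mutation rates."""
    total_var = sum(
        statistics.variance([h[i] for h in haplotypes]) for i in range(len(rates))
    )
    return total_var / sum(rates)


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_star(n_people, rates, generations, rng):
    """Hypothetical star genealogy: each line descends independently from one
    founder; each mutation moves the repeat count one step up or down."""
    founder = [15] * len(rates)  # arbitrary founder repeat counts
    people = []
    for _ in range(n_people):
        haplotype = []
        for allele, mu in zip(founder, rates):
            n_mutations = poisson(mu * generations, rng)
            steps = sum(rng.choice((-1, 1)) for _ in range(n_mutations))
            haplotype.append(allele + steps)
        people.append(haplotype)
    return people


def spread_of_estimates(n_markers, generations=60, reps=200, seed=1):
    """Mean and standard deviation of the estimate over repeated simulations."""
    rng = random.Random(seed)
    # Hypothetical rates: a mix of 0.001 to 0.004 per generation.
    rates = [0.001 * (1 + i % 4) for i in range(n_markers)]
    estimates = [
        variance_age(simulate_star(30, rates, generations, rng), rates)
        for _ in range(reps)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    for m in (12, 37, 67):
        mean, sd = spread_of_estimates(m)
        print(f"{m:2d} markers: mean {mean:5.1f} generations, sd {sd:4.1f}")
```

Only the sum of the rates enters `variance_age`, so the individual rates can differ without changing the method. Under these assumptions the standard deviation of the estimate should fall roughly as one over the square root of the number of markers.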

Ken

> Hi Ken
>
> I buried it in my comments below, but up front I should say I don't see a
> problem.
>
> As I see it, the biggest problem at the moment is that we don't know
> enough.
>
> In an attempt to overcome this problem, I've restricted myself to those
> markers that we do at least have reasonably large sample sizes for.
> I've worked solely on R1b1c7.
>
> What I do is to minimise the sum of the squares of the differences between
> expected and actual "non-modals". This has the advantage that you end up
> with a measure akin to R-Squared.
>
> Amusingly, the highest R-Squared was produced using Gusmao mutation rates
> as reported in John Chandler's paper, coupled with an age estimate for
> R1b1c7 of 1620 years. This was closely followed by John Chandler's fitted
> mutation rates; there was little in it. I tried five different sets of
> rates in all.
>
> However, my analysis is to a large extent nullified by your
> variance-of-variance analysis. What applies to simulations must apply
> equally to real life. It is clear to me that I need to use more markers.
>
> However, using my method but assuming the same "average" mutation rate
> for all markers will clearly produce nonsense answers.
>
> Hence my comment.
>
> I guess I'll have to wait a few years until we have more information.
>
>
> Sandy Paterson
>
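Sandy's least-squares approach can be sketched as follows. This is a minimal illustration, not his actual calculation: it assumes a star genealogy, takes the expected non-modal fraction at a marker to be 1 - exp(-rate × generations) (back-mutations ignored), and finds the best age by a simple grid search over whole generations. The names `expected_nonmodal` and `fit_age` are hypothetical, and the rates in the usage example are made-up placeholders, not the Gusmao or Chandler rates.

```python
import math


def expected_nonmodal(rate, generations):
    """Assumed model: chance that a line carries at least one mutation at a
    marker after the given number of generations (back-mutations ignored)."""
    return 1.0 - math.exp(-rate * generations)


def fit_age(observed, rates, max_generations=2000):
    """Grid-search the age (whole generations) minimising the sum of squared
    differences between expected and observed non-modal fractions.
    Returns (best_generations, r_squared)."""
    mean_obs = sum(observed) / len(observed)
    sst = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    best_t, best_sse = None, math.inf
    for t in range(1, max_generations + 1):
        sse = sum(
            (o - expected_nonmodal(r, t)) ** 2 for o, r in zip(observed, rates)
        )
        if sse < best_sse:
            best_t, best_sse = t, sse
    # R-Squared-like measure: 1 - SSE/SST (can be zero or negative).
    r_squared = 1.0 - best_sse / sst if sst > 0 else float("nan")
    return best_t, r_squared


if __name__ == "__main__":
    rates = [0.0005, 0.001, 0.002, 0.004, 0.008]  # hypothetical rates
    # Synthetic, noise-free "observed" fractions generated at 55 generations.
    observed = [expected_nonmodal(r, 55) for r in rates]
    print(fit_age(observed, rates))  # (55, 1.0)
    average = sum(rates) / len(rates)
    print(fit_age(observed, [average] * len(rates)))
```

When one average rate is used for every marker, every marker gets the same expected value, so the fit cannot explain any marker-to-marker differences and the R-Squared-like measure drops to about zero or below. That is one way to see why that shortcut gives nonsense with this method, whereas the variance method only needs the sum of the rates.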
