From: "Sandy Paterson" <>
Subject: Re: [DNA] variance quiz game
Date: Thu, 24 Feb 2011 09:41:05 -0000

Something quite interesting.

When I realised that some people do indeed use pair-wise variance to
estimate pair-wise TMRCAs, I decided to add pair-wise variance to the
existing 3 input variables:

GD
TM
OMM

Where

GD = genetic distance
TM = total number of matches
OMM = number of off-modal matches

The idea is to see how much of the variance in pair-wise TMRCAs can be
explained by each input variable on its own (in other words, get the
individual R-squareds).
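A minimal sketch of the individual R-squared calculation, assuming each input variable is regressed on its own against pair-wise TMRCA by ordinary least squares with an intercept (the post doesn't say which regression was used). With one predictor, that R-squared equals the squared Pearson correlation. The function names and the dict layout are hypothetical.

```python
from math import fsum


def r_squared(x, y):
    """R-squared of a simple least-squares regression of y on x.

    With a single predictor and an intercept this equals the squared
    Pearson correlation between x and y.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx = fsum(x) / n
    my = fsum(y) / n
    sxy = fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = fsum((a - mx) ** 2 for a in x)
    syy = fsum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable explains nothing
    return sxy * sxy / (sxx * syy)


def individual_r_squareds(inputs, tmrca):
    """Map each input variable name (e.g. "GD", "TM") to its own R-squared
    against the pair-wise TMRCAs; inputs is {name: list of values}."""
    return {name: r_squared(values, tmrca) for name, values in inputs.items()}
```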

I ran 30 batches through, with results summarised as follows (number of
batches in which each variable ranked 1st, 2nd, 3rd and 4th):

         1    2    3    4

GD      13   12    5    0
TM       8   15    7    0
OMM      9    2    9   10
Var      0    1    9   20

So GD provided the best explanation in 13 batches, the 2nd best explanation
in 12 batches, 3rd best in 5 batches and the worst in 0 batches. In
contrast, Var provided the best explanation in 0 batches, 2nd best in 1
batch, 3rd best in 9 batches and the worst in 20 batches. (By best
explanation I mean the highest R-squared.)
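A rank table like the one above can be produced by sorting the variables by R-squared within each batch and counting ranks. This is a sketch that assumes the per-batch R-squareds have already been computed; `rank_tally` and its input layout are hypothetical. Because each variable gets exactly one rank per batch, every row and every column of the resulting table sums to the number of batches (30 here), which makes a quick consistency check.

```python
from collections import Counter


def rank_tally(batch_r2s):
    """Count how often each variable ranks 1st, 2nd, ... by R-squared.

    batch_r2s: list of dicts, one per batch, mapping variable name -> R-squared.
    Returns {name: [count at rank 1, count at rank 2, ...]}.
    Ties keep the order in which names appear (Python's sort is stable,
    even with reverse=True).
    """
    if not batch_r2s:
        return {}
    names = list(batch_r2s[0])
    counts = {name: Counter() for name in names}
    for r2 in batch_r2s:
        # Highest R-squared = best explanation = rank 1
        ordered = sorted(names, key=lambda name: r2[name], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            counts[name][rank] += 1
    return {name: [counts[name][k] for k in range(1, len(names) + 1)]
            for name in names}
```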

The decision to analyse the 30 batches separately rather than combining the
data was intuitive - I can't really explain why I did that.

But when I combined them into one large batch, the results were quite
different:

        R-squared

GD        .565
TM        .549
OMM       .289
Var       .419

Now this is interesting. What seems to happen is that, depending on how each
haplogroup develops over time, OMM can be very important in some haplogroups
but far less important in other haplogroups. This is a superficial
observation - I need to look at it in more detail.

But I wonder whether one shouldn't examine each marker for symmetry?
Asymmetry may suggest parallel mutations rather than ancestral mutations?
So if a haplogroup experienced, say, two parallel mutations relatively early
in its development, OMM loses much of its explanatory value?
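One hedged way to put a number on "symmetry" is to compute, for each marker, the sample skewness of its repeat counts across a haplogroup. Using skewness as the measure is an assumption, not something proposed in the thread, and a skewed marker only flags a lopsided distribution; it does not by itself show that parallel mutations are the cause. The function names are hypothetical.

```python
from math import fsum


def marker_skewness(repeats):
    """Sample skewness (g1) of one marker's repeat counts.

    Near 0: roughly symmetric about the mean. A clearly non-zero value
    flags a lopsided distribution worth a closer look. (Skewness as the
    symmetry measure is an assumption made for this sketch.)
    """
    n = len(repeats)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = fsum(repeats) / n
    m2 = fsum((r - mean) ** 2 for r in repeats) / n
    m3 = fsum((r - mean) ** 3 for r in repeats) / n
    if m2 == 0:
        return 0.0  # monomorphic marker: no asymmetry to measure
    return m3 / m2 ** 1.5


def skewness_by_marker(haplotypes):
    """haplotypes: list of equal-length lists of repeat counts, one per person.
    Returns one skewness value per marker (column)."""
    return [marker_skewness(list(column)) for column in zip(*haplotypes)]
```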

Thoughts?

Sandy

-----Original Message-----
From:
[mailto:] On Behalf Of Ken Nordtvedt
Sent: 20 February 2011 14:43
To:
Subject: Re: [DNA] variance quiz game

I meant that figuratively, but I bet there is a book somewhere with the
variance = MG formula mentioned in some context or usage.

Jobling's book has the formula, but I don't remember the full verbiage and
context he puts with it.

----- Original Message -----
From: "Sandy Paterson" <>
To: <>
Sent: Sunday, February 20, 2011 2:43 AM
Subject: Re: [DNA] variance quiz game

>
> When you say 'text book formula', do you mean that literally? In other
> words, is there a text book in existence that describes a method of
> estimating pair-wise TMRCAs using variance?
>
> Sandy
>
> -----Original Message-----
> From:
> [mailto:] On Behalf Of Ken Nordtvedt
> Sent: 09 February 2011 15:15
> To:
> Subject: Re: [DNA] variance quiz game
>
> ----- Original Message -----
> From: "Sandy Paterson" <>
>
>> I've forgotten the general form. Is it (for 67 markers)
>>
>> G = [ Var(1) + Var(2) + ....Var(67) ] / [ 67 x 2 x m ]
>>
>> where m is the mean of the mutation rates?
>>
>> Sandy
>
>
> That's the text book formula, although "67 times mean m" should probably be
> stated as "M = Sum of marker rates". Although the two quantities are
> obviously identical, I have always thought it a "crime" what the former did
> to almost all newcomers, sending them down the wrong conceptual path.
>
> When the TMRCAs get to be sufficiently old that some of the m(i)G values
> are not much less than 1, however, the estimator for N markers should read:
>
> G = [ Var(1) w(1) + ....... Var(N) w(N) ]
> divided by [ m(1) w(1) + ....... m(N) w(N) ]
>
> with weight factors w(i) = 1 / [ 1 + 4 m(i) G ]
>
> This gives tighter sigmas to the G distribution.
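A minimal sketch of the two estimators quoted in this thread, coded exactly as written. Var(i) is the per-marker variance for the pair and m(i) the per-marker mutation rate per generation; the function names are hypothetical. As written, Ken's weighted formula with all w(i) = 1 reduces to Sum Var / M, while Sandy's gives Sum Var / (2M). The factor of 2 presumably reflects different conventions for Var(i) (squared pair difference versus sample variance), so it is worth checking which one you are using. Because G appears inside the weights w(i), the weighted form is solved here by fixed-point iteration starting from the unweighted value; that solution method is an assumption, not something stated in the thread.

```python
from math import fsum


def tmrca_simple(variances, rates):
    """G = [Var(1) + ... + Var(N)] / [N x 2 x mean m].

    N x mean m is the same number as M = sum of marker rates, so it is
    computed that way here.
    """
    if len(variances) != len(rates) or not rates:
        raise ValueError("need one rate per marker")
    M = fsum(rates)
    return fsum(variances) / (2 * M)


def tmrca_weighted(variances, rates, tol=1e-9, max_iter=200):
    """G = sum(Var(i) w(i)) / sum(m(i) w(i)) with w(i) = 1 / (1 + 4 m(i) G).

    G appears in the weights, so iterate to a fixed point, starting from
    the unweighted value (all w(i) = 1).
    """
    if len(variances) != len(rates) or not rates:
        raise ValueError("need one rate per marker")
    g = fsum(variances) / fsum(rates)
    for _ in range(max_iter):
        w = [1.0 / (1.0 + 4.0 * m * g) for m in rates]
        g_new = (fsum(v * wi for v, wi in zip(variances, w))
                 / fsum(m * wi for m, wi in zip(rates, w)))
        if abs(g_new - g) <= tol * max(1.0, abs(g)):
            return g_new
        g = g_new
    raise RuntimeError("fixed-point iteration did not converge")
```

Since each weight shrinks as m(i)G grows, the fast markers count for less in older TMRCAs, which is where the quoted remark about tighter sigmas applies.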
