**GENEALOGY-DNA-L Archives**

From: James Heald <>
Subject: Re: [DNA] Central Limit Theorem in Action
Date: Tue, 04 Mar 2008 19:26:53 +0000
References: <013801c87d66$d40ce450$6400a8c0@Ken1> <REME20080303195544@alum.mit.edu>
In-Reply-To: <REME20080303195544@alum.mit.edu>

John Chandler wrote:

> Ken wrote:
>
>> For two such markers each with that distribution, the probability
>> distribution for the sum s = v(1)+v(2) is the convolution integral
>
> True, but this isn't the problem that James pointed out the other day.
> Although the ASD for a combination of two markers is simply the
> average of the two ASDs computed for the markers individually, the
> combined estimate of TMRCA is not the average of the two
> individual-marker estimates. The issue can be seen most clearly by
> looking at the probability distribution for the TMRCA of two testees
> who match exactly on all markers tested. In this case, the most
> likely TMRCA is actually zero, regardless of how many markers are
> included in the test, and the shape of the distribution is
> approximately an exponential whose mean (expectation) value is the
> reciprocal of the sum of the individual mutation rates. Clearly, the
> mean value does move closer and closer to zero as more markers are
> added, but the distribution never acquires a flat top, or in any other
> way becomes more like a gaussian.
>
> John Chandler
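
A minimal sketch of the exact-match case John describes, assuming a flat prior on the TMRCA t and a likelihood exp(-t * sum(mu_i)) for seeing no mutations at all (independent Poisson mutation counts on each marker). The per-marker rate of 0.002 and the marker counts are hypothetical. The grid integration shows the most probable t staying at zero while the mean falls as 1/sum(mu_i).

```python
import math

def exact_match_posterior(mus, t_max=2000.0, steps=100_000):
    """Grid version of P(t | no mismatches) with a flat prior on [0, t_max].

    Assumes the chance of no mutations on any marker is exp(-t * sum(mus)).
    """
    rate = sum(mus)
    dt = t_max / steps
    grid = [i * dt for i in range(steps + 1)]
    dens = [math.exp(-rate * t) for t in grid]
    norm = sum(dens) * dt  # Riemann-sum normalisation
    return grid, [d / norm for d in dens], dt

def mode_and_mean(mus):
    grid, dens, dt = exact_match_posterior(mus)
    mode = grid[max(range(len(dens)), key=dens.__getitem__)]
    mean = sum(t * p for t, p in zip(grid, dens)) * dt
    return mode, mean

results = {}
for n_markers in (12, 25, 67):
    mus = [0.002] * n_markers  # hypothetical identical per-marker rates (per generation)
    results[n_markers] = mode_and_mean(mus)
    mode, mean = results[n_markers]
    print(f"{n_markers:2d} markers: mode = {mode:.1f}, mean = {mean:.1f}, "
          f"1/sum(mu) = {1 / sum(mus):.1f}")
```

Adding markers pulls the mean toward zero, but the density is always largest at t = 0, which is John's point about the distribution never acquiring a flat top.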

Let me see if I can clarify:

I suspect Ken is quite right that, with enough markers, $P(T \mid t)$
rapidly becomes approximately Gaussian because of the Central Limit
Theorem, with the mean of $T$ equal to the mean number of mutation
steps, $\mu t$.

But that is not the end of the story. We also need to consider the
variance of $T$. I suspect that this is dominated by the Poisson noise in
the number of steps, which, because a Poisson distribution has a variance
equal to its mean, will for one marker also have a variance equal to the
mean number of steps, $\mu t$.
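
To make the Poisson point concrete, here is a small simulation under an assumed symmetric single-step mutation model: the number of steps on one marker is drawn from a Poisson distribution with mean `mu_t` (a hypothetical value), and each step moves the allele by +1 or -1 with equal probability. The variance of the step count should come out close to its mean, and the mean squared deviation close to `mu_t` as well.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Draw a Poisson(lam) variate by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_marker(mu_t, rng):
    """Return (number of steps, net deviation) for one marker, symmetric +/-1 steps."""
    n_steps = poisson(mu_t, rng)
    deviation = sum(rng.choice((-1, 1)) for _ in range(n_steps))
    return n_steps, deviation

rng = random.Random(2008)
mu_t = 4.0  # hypothetical expected number of mutation steps on this marker
samples = [simulate_marker(mu_t, rng) for _ in range(100_000)]

steps = [s for s, _ in samples]
mean_steps = statistics.fmean(steps)
var_steps = statistics.fmean([(s - mean_steps) ** 2 for s in steps])
mean_sq_dev = statistics.fmean([d * d for _, d in samples])
print(f"mean steps = {mean_steps:.3f}, variance of steps = {var_steps:.3f}, "
      f"mean squared deviation = {mean_sq_dev:.3f}")
```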

We can now apply Bayes' theorem to find

$$P(t \mid T) \propto P(t \mid I)\, P(T \mid t).$$

If, for simplicity, we take a flat (uniform) prior for $P(t \mid I)$,
then

$$P(t \mid T) \propto P(T \mid t).$$
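
As a sketch of the flat-prior step, the posterior can be evaluated on a grid of t values. This assumes the Gaussian approximation for P(T | t) with mean and variance both equal to mu*t; the values of `mu`, `T`, the grid, and the decaying prior used for contrast are all hypothetical. With a constant prior the normalised posterior is exactly the normalised likelihood; an informative prior shifts it.

```python
import math

def likelihood(T, t, mu):
    """Gaussian approximation to P(T | t): mean and variance both mu * t."""
    var = mu * t
    return math.exp(-(T - mu * t) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior(T, mu, grid, prior):
    """Normalised P(t | T) on a grid, proportional to prior(t) * P(T | t)."""
    unnorm = [prior(t) * likelihood(T, t, mu) for t in grid]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def grid_mean(grid, probs):
    return sum(t * p for t, p in zip(grid, probs))

mu, T = 0.02, 10.0                        # hypothetical summed rate and observed statistic
grid = [0.5 * i for i in range(1, 8001)]  # t from 0.5 to 4000 generations

flat = posterior(T, mu, grid, prior=lambda t: 1.0)
lik = [likelihood(T, t, mu) for t in grid]
z = sum(lik)
max_gap = max(abs(p - l / z) for p, l in zip(flat, lik))

# Hypothetical informative prior that favours recent ancestors, for contrast.
decaying = posterior(T, mu, grid, prior=lambda t: math.exp(-t / 1000.0))
print(f"flat prior: mean t = {grid_mean(grid, flat):.1f}, "
      f"max gap to normalised likelihood = {max_gap:.1e}")
print(f"decaying prior: mean t = {grid_mean(grid, decaying):.1f}")
```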

The two sides of the equation have the same algebraic form (up to a
normalising constant).

But while $P(T \mid t)$ is a Gaussian distribution in $T$ (it depends on
$T$ only through the squared term in the numerator of the exponent),
$P(t \mid T)$, the probability for $t$ given $T$, is *not* a Gaussian
distribution in $t$, because it has a form something like

$$P(t \mid T) \propto \frac{1}{\sqrt{t}} \exp\left[-\frac{(T - \mu t)^2}{2 \mu t}\right],$$

in which the $t$ dependence is *not* confined to the numerator of the
exponent.

As a result, $P(t \mid T)$ is much more skewed than $P(T \mid t)$, even
when *lots* of markers are being tested.
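
A numerical check of the skewness under the same assumptions (flat prior, Gaussian P(T | t) with variance mu*t, hypothetical `mu` and `T`). P(T | t) is symmetric in T by construction; P(t | T), integrated on a grid, has its mode below T/mu, its mean above it, and a clearly positive skewness.

```python
import math

def log_posterior(t, T, mu):
    """log P(t | T) up to a constant: flat prior, Gaussian P(T | t) with variance mu*t."""
    lam = mu * t
    return -0.5 * math.log(lam) - (T - lam) ** 2 / (2 * lam)

def summarize_posterior(T, mu, t_max, n=100_000):
    """Mode, mean and skewness of P(t | T) by midpoint integration on (0, t_max]."""
    dt = t_max / n
    grid = [(i + 0.5) * dt for i in range(n)]
    logs = [log_posterior(t, T, mu) for t in grid]
    top = max(logs)
    w = [math.exp(v - top) for v in logs]  # rescale by the peak before exponentiating
    z = sum(w)
    mean = sum(t * wi for t, wi in zip(grid, w)) / z
    var = sum((t - mean) ** 2 * wi for t, wi in zip(grid, w)) / z
    m3 = sum((t - mean) ** 3 * wi for t, wi in zip(grid, w)) / z
    mode = grid[logs.index(top)]
    return mode, mean, m3 / var ** 1.5

mu, T = 0.02, 10.0  # hypothetical summed rate and observed statistic
mode, mean, skew = summarize_posterior(T, mu, t_max=4000.0)
print(f"T/mu = {T / mu:.1f}, mode = {mode:.1f}, mean = {mean:.1f}, skewness = {skew:.3f}")
```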

Eventually, as John has shown, $P(t \mid T)$ becomes less skewed. But in
effect this happens only when $P(T \mid t)$ becomes *so* sharply peaked
around $T = \mu t$ that no significantly different values of $t$ can
contribute.

That takes many, *many* more markers than are needed just to make
$P(T \mid t)$ Gaussian.
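
How slowly does the skewness go away? A rough sketch, assuming every marker has the same hypothetical rate and a true TMRCA such that the expected statistic is 1 per marker, so the observed T is about equal to the number of markers. Skewness does not depend on the scale of t, so the posterior is evaluated in the scaled variable lam = mu*t. As a yardstick it also prints 1/sqrt(T), the skewness of a Poisson count with mean T, which indicates how quickly P(T | t) itself becomes symmetric under the Poisson assumption above.

```python
import math

def posterior_skewness(T, grid_points=50_000):
    """Skewness of P(lam | T), proportional to lam**-0.5 * exp(-(T - lam)**2 / (2*lam)).

    Assumes a flat prior on t and a Gaussian P(T | t) with mean = variance = lam = mu*t.
    """
    spread = math.sqrt(T + 1.0)
    lo = max(1e-9, T - 12 * spread)  # the density is negligible outside [lo, hi]
    hi = T + 20 * spread + 20
    d = (hi - lo) / grid_points
    grid = [lo + (i + 0.5) * d for i in range(grid_points)]
    logs = [-0.5 * math.log(x) - (T - x) ** 2 / (2 * x) for x in grid]
    top = max(logs)
    w = [math.exp(v - top) for v in logs]
    z = sum(w)
    mean = sum(x * wi for x, wi in zip(grid, w)) / z
    var = sum((x - mean) ** 2 * wi for x, wi in zip(grid, w)) / z
    m3 = sum((x - mean) ** 3 * wi for x, wi in zip(grid, w)) / z
    return m3 / var ** 1.5

markers = [12, 37, 67, 111, 1000, 10000]
skews = []
for n_markers in markers:
    T = float(n_markers)  # hypothetical: expected statistic of 1 per marker
    s = posterior_skewness(T)
    skews.append(s)
    print(f"{n_markers:6d} markers: skewness of P(t | T) = {s:.3f}, "
          f"Poisson yardstick 1/sqrt(T) = {1 / math.sqrt(T):.3f}")
```

Under these assumptions the posterior skewness stays at roughly three times the Poisson yardstick, and since both fall off like one over the square root of the number of markers, reaching a given level of symmetry in P(t | T) takes on the order of nine times as many markers.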

----

Incidentally, there is one other very important consequence if the
variance associated with the squared-deviation statistic for each marker
is proportional to $\mu t$.

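It means that, in accordance with the usual rules for averaging
normally distributed sample measurements with different variances, we
should prefer the statistic

$$T = \frac{1}{n} \sum_{i=1}^{n} \frac{X_i^2}{\mu_i}$$

rather than

$$T = \frac{\sum_{i=1}^{n} X_i^2}{\sum_{i=1}^{n} \mu_i},$$

where $X_i$ is the deviation observed on marker $i$, $\mu_i$ is that
marker's mutation rate, and $n$ is the number of markers.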

The first statistic should be much less noisy than the second.

The second will tend to be dominated by the noise in the samples that
have the largest variance, whereas the first appropriately equalises the
contribution from each datum.
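
The two statistics can be compared numerically with a small Monte Carlo sketch. It assumes the same hypothetical model as before (Poisson number of steps with mean mu_i * t, symmetric +/-1 steps), plus a hypothetical set of per-marker rates and true t. Both statistics are unbiased estimates of t under this model, and the sketch reports their spread. The outcome can depend on the assumed mutation model, rates and t, so it is a way to test the claim for a given setting rather than a confirmation of it.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) draw (small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def deviation(mu, t, rng):
    """Net displacement of a symmetric +/-1 walk with Poisson(mu*t) steps."""
    return sum(rng.choice((-1, 1)) for _ in range(poisson(mu * t, rng)))

def both_statistics(mus, t, rng):
    xs = [deviation(mu, t, rng) for mu in mus]
    per_marker = sum(x * x / mu for x, mu in zip(xs, mus)) / len(mus)  # 1/n Sum X_i^2 / mu_i
    pooled = sum(x * x for x in xs) / sum(mus)                         # Sum X_i^2 / Sum mu_i
    return per_marker, pooled

rng = random.Random(42)
mus = [0.001, 0.002, 0.004, 0.008] * 5  # hypothetical rates for 20 markers
t_true = 100.0                          # hypothetical TMRCA in generations
trials = [both_statistics(mus, t_true, rng) for _ in range(10_000)]
per_marker = [a for a, _ in trials]
pooled = [b for _, b in trials]
for name, vals in (("1/n Sum X^2/mu", per_marker), ("Sum X^2 / Sum mu", pooled)):
    print(f"{name:>18}: mean = {statistics.fmean(vals):7.2f}, "
          f"sd = {statistics.pstdev(vals):6.2f}")
```

Changing `t_true` or the spread of `mus` is the quickest way to see how sensitive the comparison is to the regime.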

-- James.

**This thread:**

- [DNA] Central Limit Theorem in Action by "Ken Nordtvedt" <>
- Re: [DNA] Central Limit Theorem in Action by (John Chandler)
- **Re: [DNA] Central Limit Theorem in Action by James Heald <>**
- Re: [DNA] Central Limit Theorem in Action by "Ken Nordtvedt" <>
- Re: [DNA] Central Limit Theorem in Action by "Sasson Margaliot" <>
- Re: [DNA] Central Limit Theorem in Action by "Ken Nordtvedt" <>
- Re: [DNA] Central Limit Theorem in Action by "Sasson Margaliot" <>
- Re: [DNA] Central Limit Theorem in Action by "Ken Nordtvedt" <>
- Re: [DNA] Central Limit Theorem in Action by James Heald <>
- Re: [DNA] Central Limit Theorem in Action by "Ken Nordtvedt" <>
- Re: [DNA] Central Limit Theorem in Action by "Ken Nordtvedt" <>
- Re: [DNA] Central Limit Theorem in Action by James Heald <>
- Re: [DNA] Central Limit Theorem in Action by "Sasson Margaliot" <>
- Re: [DNA] Central Limit Theorem in Action by "Ken Nordtvedt" <>
- Re: [DNA] Central Limit Theorem in Action by (John Chandler)