GENEALOGY-DNA-L Archives

From: Thomas Krahn <>
Subject: Re: [DNA] Markers - How many and which ones?
Date: Sat, 12 Jan 2008 15:13:10 -0600
References: <2dff56a0801121109k2ae8145ai6ca2b79ec374e52f@mail.gmail.com>
In-Reply-To: <2dff56a0801121109k2ae8145ai6ca2b79ec374e52f@mail.gmail.com>


Colin Ferguson wrote:
> Hi Thomas,
> Thank you. I have no doubt you are correct. I'll elaborate on what I
> meant by "some". Consider the limit of N being incredibly huge, like a
> googol (the digit 1 followed by one hundred zeros). Call the average
> mutation rate deduced from such a set MuBIG. The academic question then
> is how many markers, expressed as N, does one have to include in a panel
> such that the average mutation rate, MuN, can be expected to be within
> x% of MuBIG? I'd be happy with x = 20!
>
Colin, I think we are pretty much already at that point with 30 markers
if we exclude all those markers that may be influenced by recombination
and similar effects.
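
As a rough illustration of the question (not a calculation from this thread), the panel size can be estimated from the spread of the per-marker rates. The sketch assumes markers are drawn independently at random and that a normal approximation holds for the panel average; the function name markers_needed and the coefficient-of-variation values are hypothetical.

```python
import math

def markers_needed(cv, x, z=1.96):
    """Estimate how many markers N keep the panel average within a
    relative error x of the true average.

    cv : coefficient of variation of the per-marker mutation rates
         (standard deviation / mean); an ASSUMED input, not a measured value
    x  : tolerated relative error, e.g. 0.20 for 20%
    z  : normal quantile for the desired confidence (1.96 ~ 95%)

    Uses the standard-error relation  z * cv / sqrt(N) <= x.
    """
    if cv < 0 or x <= 0:
        raise ValueError("cv must be >= 0 and x must be > 0")
    return math.ceil((z * cv / x) ** 2)

if __name__ == "__main__":
    # Hypothetical spreads: the more the per-marker rates differ,
    # the more markers are needed for the same 20% tolerance.
    for cv in (0.5, 1.0, 2.0):
        print(f"cv = {cv}: N >= {markers_needed(cv, 0.20)}")
```

The required N grows with the square of the spread, so the answer depends heavily on which markers are allowed into the pool.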

Note that palindromic markers and markers near the pseudoautosomal
region (like DXYS156) affect the average mutation rate through mechanisms
other than the regular frame shift mutation rate. So their individual
mutation frequency should be split into two separate components: the
frame shift mutation component and the recombination component. The
latter needs to be considered under different boundary conditions, such
as several markers sharing the same palindromic arm.

The Y chromosome is also limited in length. I don't know the exact
number of STRs on the Y chromosome, but with a length of ~60 million
bases the number of analyzable STR locations will clearly be below
10,000. The faster mutating (analyzable) STR markers have probably
all been found already, so I'd expect the average mutation frequency to
decrease as more and more Y-STR loci are collected.

Most repeats on the Y chromosome are short and invariant. So the binned
(mutation frequency) versus (number of markers) distribution will look
similar to this:
     0% <= f_mut < 0.0001%  ->  8000 markers
0.0001% <= f_mut < 0.001%   ->   800 markers
 0.001% <= f_mut < 0.01%    ->    80 markers
  0.01% <= f_mut < 0.1%     ->     8 markers
   0.1% <= f_mut            ->     3 markers

Note that these are not real numbers, just an example of how I'd expect
the distribution to look.
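
To make the effect concrete, here is a small sketch that averages over the example bins, starting with the fastest markers and adding slower bins one at a time. It uses the illustrative counts from the list above (not real data), takes each bin's midpoint as its representative rate, and assumes an upper bound of 1% for the open-ended top bin; both choices are assumptions made only for this illustration.

```python
# Example bins from the text: (lower bound %, upper bound %, marker count).
# These are illustrative numbers, not measured mutation frequencies.
BINS = [
    (0.0,    0.0001, 8000),
    (0.0001, 0.001,   800),
    (0.001,  0.01,     80),
    (0.01,   0.1,       8),
    (0.1,    1.0,       3),  # upper bound of 1% is an ASSUMPTION
]

def average_rate(bins):
    """Marker-weighted average rate (%), using each bin's midpoint."""
    total_markers = sum(count for _, _, count in bins)
    total_rate = sum((lo + hi) / 2.0 * count for lo, hi, count in bins)
    return total_rate / total_markers

def running_averages(bins):
    """Average rate as slower bins are added, fastest bin first.

    Returns a list of (number of markers so far, average rate in %).
    """
    ordered = sorted(bins, key=lambda b: b[0], reverse=True)
    results = []
    for i in range(1, len(ordered) + 1):
        subset = ordered[:i]
        n_markers = sum(count for _, _, count in subset)
        results.append((n_markers, average_rate(subset)))
    return results

if __name__ == "__main__":
    for n_markers, avg in running_averages(BINS):
        print(f"{n_markers:5d} markers -> average f_mut ~ {avg:.6f}%")
```

Under these assumptions the average drops by roughly a factor of three to ten with each slower bin that is included, so the "average" mostly reflects how many slow markers were counted.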

So it doesn't really make sense to calculate an average over all
Y-STR markers.

The mathematicians on the list may give you a more sophisticated
explanation.

Thomas




