GENEALOGY-DNA-L Archives


From: "Alister John Marsh" <>
Subject: RE: [DNA] When is 25/26 not enough?..
Date: Sun, 20 Aug 2006 21:35:13 +1200
In-Reply-To: <REME20060819225402@alum.mit.edu>


John,

Thanks for your comments. My replies are inline below...

-----Original Message-----
From: John Chandler [mailto:]
Sent: Sunday, August 20, 2006 2:56 PM
To:
Subject: Re: [DNA] When is 25/26 not enough?

John wrote:
> I have two rare marker values for R1b on slow markers in the 38-67 set FTDNA
> & EA 18 marker set. (DYS487=12, DYS641=11) The marker scores occur in R1bs
> at a rate of about 1/200, and 1/100. My surname occurs in about 1/2000 of
> the population typically represented in DNA testing.

The surname frequency is probably a solid statistic, being based on
census figures, but the allele frequencies are not. With only 600
tests in your sample pool, you have large uncertainties in the allele
frequencies. My preliminary analysis of DYS487 indicates that it may
actually be a "fast" marker (but, again, the statistics are far too
skimpy to place any reliance on this preliminary indication).

=======
MY REPLY: I agree that the estimates of allele frequencies are highly
speculative. In the R1b project, of 165 tested on DYS487, 0 have a value of
12, 154 have 13, 8 have 14, and 3 have 15. At least in some of those cases, the
off-modal occurrences are in related persons. If, on average, each of the 165
has 100 unique ancestors in the lines back to the common ancestor, the 165
lines combined would have contained around 16,500 separate opportunities for
mutations on DYS487. Probably 15,000 of those would have been an
opportunity for a DYS487=13 to mutate to something else. If we ignore back
mutations, which are unlikely to be significant, the indication is that out
of roughly 15,000 opportunities for a 13 to mutate, not one
has mutated to 12. You are right, there are large uncertainties in a small
sample pool, but I think the early indications are that DYS487 might have a
slow mutation rate. If the mutation rate were average, it might have
produced 20 or so mutations from 13 to 12 in 15,000 mutation opportunities.
Even one mutation to 12 high up the tree might have resulted in 50% of R1bs in
this small pool having DYS487=12, but none do.
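
The arithmetic in the paragraph above can be sketched in a few lines of Python. This is only an illustration: the 100 ancestors per line, the 15,000 opportunities, and the 20 expected mutations are the rough guesses quoted above, and the variable and function names are made up for the example. It also assumes each opportunity is an independent trial, which is a simplification.

```python
# Rough check of the DYS487 "mutation opportunity" argument.
# All inputs are the ballpark guesses from the message, not measured values.

LINES_TESTED = 165          # R1b project members tested on DYS487
ANCESTORS_PER_LINE = 100    # assumed average number of unique ancestors per line
OPPORTUNITIES_FROM_13 = 15_000   # rounded share of opportunities starting at 13
EXPECTED_13_TO_12 = 20      # guess for an "average" marker over those opportunities


def total_opportunities(lines: int, ancestors: int) -> int:
    """Total father-to-son transmissions across all lines."""
    return lines * ancestors


def prob_zero_events(rate: float, trials: int) -> float:
    """Chance of seeing no mutation at all in `trials` independent trials."""
    return (1.0 - rate) ** trials


# Per-opportunity rate implied by "20 or so mutations in 15,000 opportunities".
implied_rate = EXPECTED_13_TO_12 / OPPORTUNITIES_FROM_13

print(total_opportunities(LINES_TESTED, ANCESTORS_PER_LINE))
print(implied_rate)
print(prob_zero_events(implied_rate, OPPORTUNITIES_FROM_13))
```

Under these assumptions, seeing no 13-to-12 mutation at all would be extremely unlikely (a probability on the order of one in a billion) if the marker really mutated at the assumed average rate. The result is only as good as the guessed inputs.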

There are two issues here, one is mutation rate, and one is allele
frequency. I think in this case allele frequency is relevant, and 0
occurrences in a small pool of 165 indicates a degree of rarity. In another
database of 450 haplotypes, 12 occurred only twice, in two related persons,
representing only 1 mutation event to 12 in the combined genetic history of
the pool. This also indicates a degree of rarity. As it happens, neither I nor
the two others of my surname appear in either of these databases.
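
As a rough gauge of how much "0 out of 165" can say about rarity, here is a small Python sketch of the standard one-sided upper confidence bound for a proportion when no occurrences are observed. The function name and the 95% level are choices made for this example, and the bound assumes the 165 are independent random draws, which a project pool with related members is not.

```python
# Upper confidence bound on an allele frequency when it is seen 0 times in n tests.
# Assumes independent random sampling; a surname/haplogroup project is not that.


def zero_count_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Largest frequency p still consistent with 0 hits in n draws.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


upper = zero_count_upper_bound(165)
print(upper)        # upper bound on the frequency of DYS487=12
print(1.0 / upper)  # the same bound expressed as "1 in N"
```

With 165 tests the bound comes out a little under 2%, so zero occurrences in this pool is consistent with rarity but cannot by itself separate a frequency of 1/200 from one several times larger.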
=======

Meanwhile, I can't refrain from pointing out that even the 1/2000
figure you're using for the surname is way off in the small pool.
With 3 of your surname out of 600, it's clearly a frequency of 1/200
instead. This simple example illustrates the thin ice you're skating
on when you try to draw conclusions from inadequate sample sizes.

=======
MY REPLY: I understand your point, and I thought of this before making my
original posting. What I was saying was that essentially in the European
population at large, ("the population typically represented in DNA
testing"), there might be 1/40,000,000 who match me on the 2 markers, and
also match surname, if the variables were random. I think in that context it
is valid. If I had preselected a pool of 20,000 men of only my surname to
test (assuming my allele frequency figures are roughly correct), I should
have found only one who matched. I didn't test 20,000 of my surname to find
one match, I tested only 3 of my surname, and 2 matched on both rare
markers, (and all 3 matched on one). Don't you think my hit rate was a
little bit above the odds?

If we use your figure, and say that a pool contains 1/200 of my surname, and
if we double my two allele frequency estimates to 1/100 and 1/50 (halving
their rarity), the chances of a person in this pool of 600 matching the two
rare scores, and surname, are about 1/1,000,000.
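
The two estimates in this reply can be reproduced with exact fractions. The Python sketch below is illustrative only: it treats surname and the two marker values as independent (the "if the variables were random" case), uses the guessed frequencies from this thread, and the function names are invented for the example. The last figure additionally assumes the three men tested are three independent random draws.

```python
# Joint probabilities under the "variables are random" assumption.
from fractions import Fraction
from math import comb


def joint_frequency(*freqs: Fraction) -> Fraction:
    """Multiply independent frequencies together."""
    result = Fraction(1)
    for f in freqs:
        result *= f
    return result


def prob_at_least(k: int, n: int, p: Fraction) -> Fraction:
    """Binomial chance of at least k hits in n independent draws."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Population-wide estimate: surname 1/2000, DYS487=12 at 1/200, DYS641=11 at 1/100.
population_wide = joint_frequency(Fraction(1, 2000), Fraction(1, 200), Fraction(1, 100))

# Small-pool estimate: surname 1/200, markers at 1/100 and 1/50.
small_pool = joint_frequency(Fraction(1, 200), Fraction(1, 100), Fraction(1, 50))

# Both markers together, ignoring surname.
both_markers = joint_frequency(Fraction(1, 200), Fraction(1, 100))

# Chance that at least 2 of 3 unrelated men would carry both marker values.
two_of_three = prob_at_least(2, 3, both_markers)

print(population_wide, small_pool, both_markers, float(two_of_three))
```

The first three values reproduce the 1/40,000,000, 1/1,000,000 and 1/20,000 figures used above. The last is a few parts in a billion, which is the sense in which the hit rate looks "above the odds", subject to the uncertainty in the guessed frequencies.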
=======

> the chances of me finding another person who matches me on both my marker
> scores for these markers, and who also matches my surname, if the variables
> were random, are about 1/40,000,000.

Bottom line: there are many orders of magnitude of "slop" in that
calculated probability.

=======
MY REPLY: In using the 1/40,000,000 figure, I was relating that to "the
population typically represented in DNA testing", i.e., say, Europe as a whole,
and I was not relating it to a small artificially selected sub pool. If you
artificially select the R1b project pool of 165 tested on these 2 markers,
or the other database with about 450 R1bs tested on these markers, the odds
of finding someone of my surname matching the markers are zero, because
there are none of my surname in either of those pools.

The bottom line is that I think it was "very unlikely" for me to have found
2 matches in my surname on both markers, by testing only 3 of my surname, if
the match on markers and surname was not because of relationship since
surnames. I did not find any matches on both markers together in any other
surname. The question I have been trying to address is whether this match
should be considered to possibly indicate relationship
since surnames. Also, how should this match be weighted, when compared to
21 mutation steps on 37 markers? I am trying to make a reasoned assessment.
In the absence of verified mutation rates, and allele occurrence rates, I
have tried to speculate on what might reasonably apply. If you were
speculating, what would you suggest as an order of magnitude for the allele
frequency rates for DYS487 and DYS641?

John.
=======

John Chandler

