GENEALOGY-DNA-L Archives


From: "Anatole Klyosov" <>
Subject: Re: [DNA] Variance Assessment of R:U106 DYS425Null Cluster
Date: Sun, 7 Feb 2010 21:56:39 -0500
References: <mailman.3333.1265568738.2099.genealogy-dna@rootsweb.com>


>From: "Lancaster-Boon" <>

>Let me make my point a different way and see if it still sounds crazy. I
>very rarely see datasets which have only one likely family tree. Most of
>them have many. Your approach seems to rely on the assumption that even with
>big sets of data the family tree structure can be stated with zero doubt?


My response:

Dear Andrew,

It seems that we are indeed talking past each other. I say this not in a
negative sense, but matter-of-factly. For example, I do not know exactly
what you mean by "one likely family tree". Is it a lineage? Is it a
branch on a haplotype tree? Is it a haplotype tree with a number of
branches, each of them having its own most recent common ancestor?

When I consider a set of 509 67-marker haplotypes of R-L21, is it a "family
tree"? Or do you mean that it consists of many family trees? What do you
mean by "family tree structure" in this particular case of 509 R-L21
haplotypes? What does "the family tree structure can be stated with zero
doubt" mean in this particular case?

It seems to be a strange situation. I have given here a number of VERY
specific examples. I have explained here (1) how to identify a base haplotype
in a dataset, (2) how to examine the dataset (using the logarithmic and linear
approaches combined) and verify that all haplotypes in the set descended,
statistically, from one most recent common ancestor, (3) how to build a
haplotype tree, (4) how to dissect a tree into branches, (5) how to calculate
a time span to the common ancestor, (6) how to correct the results for
back mutations, etc.

Where in all this is that "family tree structure", and where is the "zero
doubt"?

My approach does not rely on any "assumption" about the family tree. It
relies on a direct examination of the pattern of mutations in the
haplotypes. And this examination shows which methodology I should employ for
analysis of the dataset.

Is it still unclear?

O.K., one more example, and you tell me about the "family tree structure"
and whether it was "zero doubt" or not. I do not think that "zero doubt" is
relevant to statistical matters. We are talking about probabilities.

Since I have mentioned R-L21 above, I will continue with it. There are 509
haplotypes in the dataset, all of them 67-marker ones. Typically I use
67-marker haplotypes to compose a haplotype tree, and to see whether it is
"smooth", meaning it does not show any distinct branches, or whether it is a
complex tree. For those L21, it was a nice, symmetrical, "smooth" haplotype
tree, containing, I repeat, 509 haplotypes. Again typically, I count
mutations in the first 25 markers only, since it is the most reliable panel
for calculations, and because 37- and 67-marker haplotypes generally produce
the same results in terms of TMRCA. Besides, it is much more time consuming
to count mutations in, say, 509 of the 67-marker haplotypes. Even the
25-marker ones provide 12,725 alleles.

All 509 haplotypes contained 2924 mutations in the first 25 markers. This
gives 2924/509/0.046 = 125 generations w/out a correction for back
mutations, or 143 generations with the correction (the Table is published),
that is 3575 years to a common ancestor.
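
The linear arithmetic above can be checked in a few lines. This is only a
sketch: the function name is made up for illustration, the rate of 0.046
mutations per 25-marker haplotype per generation and the corrected figure of
143 generations are taken from the text (the published back-mutation table is
not reproduced here), and 25 years per generation is inferred from
143 generations = 3575 years.

```python
def linear_generations(total_mutations, n_haplotypes, rate_per_haplotype):
    """Uncorrected generations to the common ancestor (linear method):
    average mutations per haplotype divided by the mutation rate."""
    return total_mutations / n_haplotypes / rate_per_haplotype


# Figures from the post: 2924 mutations in 509 haplotypes, first 25 markers.
uncorrected = linear_generations(2924, 509, 0.046)
print(round(uncorrected))  # 125

# The back-mutation correction comes from a published table; the corrected
# value is simply copied from the post, not computed here.
corrected_generations = 143

# Assumption inferred from the post's numbers: 25 years per generation.
YEARS_PER_GENERATION = 25
years = corrected_generations * YEARS_PER_GENERATION
print(years)  # 3575
```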

The tree shows that the dataset is a "monoancestral" one; however, to be on
the safe side, I typically count base haplotypes in the dataset. There were
only 2 of them among those 509 25-marker haplotypes. That is not enough for
a reliable calculation; the statistics are not there. However, just for fun:
[ln(509/2)]/0.046 = 120 generations w/out correction, or 136 generations
w/correction, that is 3400 years to a common ancestor. Not bad.

By the way, on the margin of error. For 2924 mutations in 509 of 25-marker
haplotypes, the average number of mutations per marker equals 0.230. One
sigma corresponds to 1.8%, two sigma (95% confidence) to 3.6%, that is
0.230+/-0.008 mutations per marker. The TMRCA equals 3575+/-380 years,
with 95% confidence.
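
A rough sketch of this margin-of-error arithmetic follows. Two things in it
are assumptions, not statements from this post: that one sigma is taken as
1/sqrt(number of mutations), which reproduces the 1.8%, and that the +/-380
years folds in a separate 10% uncertainty in the mutation rate, combined in
quadrature. The second assumption is only a guess that happens to reproduce
the figure.

```python
import math

total_mutations = 2924
n_haplotypes = 509
n_markers = 25
tmrca_years = 3575  # corrected TMRCA from the post

# Average number of mutations per marker (about 0.230).
per_marker = total_mutations / (n_haplotypes * n_markers)

# ASSUMPTION: one sigma is taken as 1/sqrt(number of mutations).
one_sigma = 1 / math.sqrt(total_mutations)   # about 1.8%
# Unrounded this is nearer 3.7%; the post doubles the rounded 1.8%.
two_sigma = 2 * one_sigma
margin_per_marker = per_marker * two_sigma   # about 0.008

# ASSUMPTION (not stated in the post): a 10% relative uncertainty in the
# mutation rate, combined in quadrature with the two-sigma counting error.
RATE_UNCERTAINTY = 0.10
total_relative = math.sqrt(two_sigma ** 2 + RATE_UNCERTAINTY ** 2)
margin_years = tmrca_years * total_relative

print(f"{per_marker:.3f} +/- {margin_per_marker:.3f} mutations per marker")
print(f"{tmrca_years} +/- {margin_years:.0f} years")
```

Under these assumptions the margin comes out close to the 380 years quoted
above; the counting error alone would give only about 130 years.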

The same L21 haplotype series contains 770 of 12-marker haplotypes, and they
include 49 base haplotypes, that is, ones identical to each other. This gives
[ln(770/49)]/0.022 = 125 generations w/out correction, or 143 generations
with the correction, that is 3575 years to a common ancestor. It is exactly
equal to the TMRCA for the 25-marker series with 509 haplotypes.
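
The logarithmic estimates, and their agreement with the linear one, can be
sketched as follows. The function names are hypothetical; the rates (0.022
per 12-marker haplotype and 0.046 per 25-marker haplotype per generation)
and all counts are the ones quoted in this post. Only uncorrected values
are computed, since the back-mutation table is not reproduced here.

```python
import math


def log_generations(n_haplotypes, n_base, rate_per_haplotype):
    """Uncorrected generations from the share of unmutated (base)
    haplotypes: ln(N / n_base) / k."""
    return math.log(n_haplotypes / n_base) / rate_per_haplotype


def linear_generations(total_mutations, n_haplotypes, rate_per_haplotype):
    """Uncorrected generations from the average mutation count."""
    return total_mutations / n_haplotypes / rate_per_haplotype


log_12 = log_generations(770, 49, 0.022)       # 12-marker series
log_25 = log_generations(509, 2, 0.046)        # 25-marker, only 2 base
lin_25 = linear_generations(2924, 509, 0.046)  # 25-marker, linear

print(round(log_12))  # 125
print(round(log_25))  # 120
print(round(lin_25))  # 125

# Close agreement between the two independent methods is what suggests
# a single common ancestor for the dataset.
relative_gap = abs(log_12 - lin_25) / lin_25
print(f"log (12 markers) vs linear (25 markers): {relative_gap:.1%} apart")
```

The 25-marker logarithmic value rests on only 2 base haplotypes, so its
somewhat lower figure is not surprising.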

Now, can you tell me what is "mysterious" in this approach and in counting
mutations in the present-day haplotypes? Please tell me about my
"assumptions" and "the family tree structure". Oh, yes, about "zero doubt"
too. Maybe about the "shooting in the dark" as well.

Do you want similar calculations for R-M222? L20? L2? U152? P312? U106?
L51? L23? M269? Maybe for some hundred other datasets?

Please notice that I am not talking about math here. I am talking about a
philosophy, if you wish, of DNA genealogy. About a series of simple and
well-defined rules which one should follow. There is nothing really
complicated here.

Best regards,

Anatole Klyosov


