GENEALOGY-DNA-L Archives



From: "Anatole Klyosov" <>
Subject: Re: [DNA] Y Tree STR Mutations can not be counted
Date: Sat, 13 Feb 2010 13:10:53 -0500
References: <mailman.4653.1266079356.2099.genealogy-dna@rootsweb.com>


>From: "Ken Nordtvedt" <>
>... a reply from a lister who you now have repeatedly labeled dishonest.

Dear Ken,

You said it, not me. I did not use the word "dishonest". This is what I
said:

>I have tried to help you out and gave a specific example. I have asked you
> for an honest and direct answer, and even thanked you in advance for it.
> Unfortunately, I do not see either one in your reply.

In my book, there is a BIG difference between calling someone "dishonest"
and stating that I did not obtain a direct and honest response.

However, I am withdrawing that comment because from your answer I see that
it is just a plain lack of knowledge on your side, not a "bad" intention. It
is apparently hard for you to admit that lack of knowledge, so you prefer to
be very elusive with your "answers".

Yes, we do not know many things in detail. We simplify them. For example, in
physical chemistry we know that thermodynamic and kinetic calculations and
equations are largely correct only for infinitely dilute solutions. That does
not preclude us from making calculations for real solutions, with a full
understanding that it is an approximation. In many cases it works, and works
reasonably well. When we see an obvious deviation from the theory, we say
that apparently in that particular range the equation deviates from reality.
It is an accepted practice, and it works well.

To my surprise, some folks in DNA genealogy want to be holier than the Pope.
They are kind of ashamed of their lack of knowledge, do not want to admit
it, and give elusive answers.

Here is an example:

> [[ ... Leaving aside how you or others count changes,
> those who count average GDs from the base haplotype would probably divide
> the 1788 by 500. After all, those 14 haplotypes have a branch history from
> the founder equally long as the 486, so their outcomes are just as much
> evidence of the mutational potency and elapsed time of mutational
> opportunities as the 486 haplotypes which ended up with change. That's
> what was meant by "something akin to GDs". But I use variances, not GDs,
> so I don't know if your variation on GDs is improvement or regression.
> <GD> eventually has a non-linear dependence on elapsed generations, so
> your variation may or may not be incorporating some of that
> non-linearity?? I suspect your method is akin enough not to matter much in
> most cases if the TMRCA is not so large. ]]

Then,

> [[ As I mentioned in my earlier detailed message, age estimates themselves
> for GDs or variances from assumed founding haplotype (your base haplotype)
> do not depend on the demographic details of the tree which emerges from the
> founder --- other than the total elapsed generations. Each generation of
> the tree makes an identical contribution to total variance of the final
> haplotypes, although the early generations make their contribution in few
> big hunks, while the later generations make their contributions in many
> small hunks. So the age estimate can be made without knowing the
> demographic history. ]]

Then,

> [[ But the statistical confidence interval of the age estimate does depend
> on the demographic history of the tree. This is known by both analytics and
> simulations. Given the same founding (base) haplotype and the same final
> 500 haplotypes, the statistical confidence interval could be quite different
> under these two scenarios: either the tree population grew at slow rate in
> its early generations and then a more rapid rate in the late generations, or
> vice versa. The early, less populated, generations of the tree contribute
> more heavily to the total statistical confidence interval than do the later,
> more populated generations. The more the tree lingers in its low population
> state, the larger the final statistical confidence interval. Since you
> don't know that demographic history, you can't evaluate the statistical
> confidence interval. You ask me to do so: I can't as well because I don't
> know the demographic history, either. All I can do or you can do is assume
> a demographic history and then evaluate a statistical confidence interval
> for the assumed demographic history. ]]

(Anatole): O.K., instead of so many words without a clear answer, even an
approximate one, I will give you my short and pointed answer. First, I
repeat the description of the dataset:

"By counting mutations" I mean the following. If we have, say, 500
67-marker haplotypes, and 14 haplotypes among them are identical to each
other (base haplotypes), and the other 486 mutated haplotypes have
(collectively) 1788 mutations from the base, then those 1788 mutations are
the ones that we count.

1788/500/0.145 = 25 generations; 25 generations x 25 years per generation is
625 years to a common ancestor.

[ln(500/14)]/0.145 = 25 generations; 25 generations x 25 years per generation
is 625 years to a common ancestor.
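
The two estimates above can be reproduced with a short sketch. This is only an illustration of the arithmetic in this message, not a general dating tool. The function and variable names are invented for the example, and 0.145 is read here as a mutation rate per 67-marker haplotype per generation, which is an assumption since the message gives only the bare number.

```python
import math

# Figures taken from the message above.
TOTAL_HAPLOTYPES = 500
BASE_HAPLOTYPES = 14        # haplotypes identical to the base
TOTAL_MUTATIONS = 1788      # counted from the base across the other 486
YEARS_PER_GENERATION = 25

# Assumption: 0.145 is interpreted as the mutation rate per 67-marker
# haplotype per generation; the message only gives the number.
RATE = 0.145


def generations_by_counting(mutations, haplotypes, rate):
    """Average mutations per haplotype divided by the rate."""
    return mutations / haplotypes / rate


def generations_by_logarithm(haplotypes, base_haplotypes, rate):
    """ln(total / unmutated) divided by the rate (first-order decay of the base)."""
    return math.log(haplotypes / base_haplotypes) / rate


linear = generations_by_counting(TOTAL_MUTATIONS, TOTAL_HAPLOTYPES, RATE)
logarithmic = generations_by_logarithm(TOTAL_HAPLOTYPES, BASE_HAPLOTYPES, RATE)

# Both values are rounded to whole generations before converting to years,
# as in the message.
years_linear = round(linear) * YEARS_PER_GENERATION
years_logarithmic = round(logarithmic) * YEARS_PER_GENERATION

print(f"counting:  {linear:.2f} generations, about {years_linear} years")
print(f"logarithm: {logarithmic:.2f} generations, about {years_logarithmic} years")
```

Both functions give a little under 25 generations before rounding, which is why the two lines in the message agree.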

Since both approaches gave the same figure, the dataset describes a
first-order rate system, it has only one most recent common ancestor, and
the count and calculations are correct. This is what I call an
"uncomplicated dataset".

Regarding calculations of the margin of error, which you could not do, I
will help you out. For the 95% confidence interval it is 625+/-70 years, for
the 69% confidence it is 625+/-35 years to a common ancestor. Such a small
margin of error is explained by good statistics: 500 67-marker haplotypes
contain 33,500 alleles, with 1788 mutations among them.

End of the story.

Anatole Klyosov

