GENEALOGY-DNA-L Archives



From: "Anatole Klyosov" <>
Subject: Re: [DNA] "counting mutations" versus "GD from the modal"
Date: Wed, 17 Feb 2010 13:44:01 -0500
References: <mailman.5327.1266413440.2099.genealogy-dna@rootsweb.com>


>From: Jonathan Day <>
>...as I read it, I think the problem is over the use of "first order".

No. The problem is that people do not want to consider concrete examples.
They do not want to take a pen and a sheet of paper and, just once, divide
and multiply to get an idea. They have a kind of mental block. They
prefer to talk in generalities endlessly. That is where the problem is.

I have already explained, more than once, that a first-order reaction in
science (physical chemistry, chemical kinetics, radioactive decay, etc.) is
one that follows the simple scheme A -> P. That is all. In this particular
case A is a haplotype and P is a mutated haplotype. Or A is a locus and P is
a mutated locus.

The rate constants of such first-order processes can be summed. That is why,
when you see Chandler's table with mutation rates for the first 12 loci of
0.00076, 0.00311, 0.00151, ... 0.00242, you can sum them and get 0.02243 as
the mutation rate for the 12-marker haplotype. I obtained a calibrated value
of 0.022 mutations per haplotype per generation, which differs by less than
2%. If you divide that 0.022 by 12, you get 0.00183 as the average mutation
rate per marker per generation. This is the mutation rate constant for the
first 12 markers in FTDNA format. It turned out that the same 0.00183 is the
average mutation rate constant for the first 25 markers as well. All "fast"
and "slow" markers are included.
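The arithmetic above can be checked in a few lines. This is a minimal sketch: only four of Chandler's twelve per-locus rates are quoted in this post, so it starts from the quoted total (0.02243) rather than from the full table; the constant names are illustrative, not from any published code.

```python
# Summing is valid because each locus mutates independently by a first-order
# process (A -> P), so per-locus rate constants add up to the haplotype rate.
CHANDLER_12_MARKER_SUM = 0.02243     # sum of the 12 Chandler rates, as quoted
CALIBRATED_12_MARKER_RATE = 0.022    # calibrated rate, mutations/haplotype/generation
N_MARKERS = 12

# How far the summed table value is from the calibrated value
relative_difference = (
    abs(CHANDLER_12_MARKER_SUM - CALIBRATED_12_MARKER_RATE) / CALIBRATED_12_MARKER_RATE
)
# Average rate constant per marker per generation
average_per_marker = CALIBRATED_12_MARKER_RATE / N_MARKERS

print(f"relative difference: {relative_difference:.2%}")
print(f"average rate per marker: {average_per_marker:.5f}")
```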

That is what first-order kinetics means. You use it, but you do not know
that it is called "first-order kinetics". The same situation was described
by Molière, whose character did not know that he had been speaking prose.
It was a true discovery for him.

Another definition of a first-order reaction is that it obeys the rule
c = C*e^(-kt), or, equivalently, t = ln(C/c)/k. That is how mutations occur
in haplotypes. If C is the total number of haplotypes in the system, k is
the mutation rate constant, t is the number of generations, and c is the
number of unmutated (base) haplotypes, then you have all the information
needed to calculate the TMRCA (without even counting mutations). However, it
works well only under two conditions: (a) you KNOW that the dataset is a
first-order one (you cannot use the formula when you do not know whether the
dataset is first-order), and (b) you have enough haplotypes to see several
base haplotypes in the dataset. Preferably no fewer than 10, or at least 5.
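The logarithmic ("base haplotype") method can be sketched as a single function. The name `tmrca_log` and the warning threshold of 5 base haplotypes are illustrative choices taken from the rule of thumb above, not an established tool; the function also assumes, without checking, that the dataset is first-order.

```python
import math
import warnings


def tmrca_log(total_haplotypes, base_haplotypes, k):
    """TMRCA in generations from the share of unmutated (base) haplotypes.

    Rearranges c = C * exp(-k * t) into t = ln(C / c) / k.
    Only meaningful if the dataset is first-order (one lineage, one ancestor).
    """
    if base_haplotypes <= 0:
        raise ValueError("need at least one base haplotype to take the logarithm")
    if base_haplotypes > total_haplotypes:
        raise ValueError("base haplotypes cannot exceed the total")
    if base_haplotypes < 5:
        # Rule of thumb from the post: at least 5, preferably 10 or more.
        warnings.warn("fewer than 5 base haplotypes; estimate is unreliable")
    return math.log(total_haplotypes / base_haplotypes) / k


# Hypothetical dataset: 50 twelve-marker haplotypes, 20 of them base,
# with the 12-marker rate constant of 0.022 quoted above.
t = tmrca_log(50, 20, k=0.022)
print(f"{t:.1f} generations")
```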

To make sure that your dataset obeys first-order kinetics, you calculate the
TMRCA by counting mutations, using the simple formula M/C/k = t, where M is
the total number of mutations in your dataset, counted from the same base
haplotype you used in the logarithmic formula. If the two values of t (the
number of generations) match each other within the margin of error, you are
all set. More than that, you have obtained your TMRCA by two different
methods: in one you counted mutations, in the other you counted base
haplotypes.
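The consistency check described above can be sketched as follows. The post does not specify a margin of error, so the 10% relative tolerance in `estimates_agree` is a hypothetical placeholder; the function names are likewise illustrative. The demonstration uses a synthetic, idealized first-order dataset, not real data.

```python
import math


def tmrca_linear(total_mutations, total_haplotypes, k):
    """TMRCA in generations by counting mutations: t = M / C / k."""
    return total_mutations / total_haplotypes / k


def tmrca_log(total_haplotypes, base_haplotypes, k):
    """TMRCA in generations from base haplotypes: t = ln(C / c) / k."""
    return math.log(total_haplotypes / base_haplotypes) / k


def estimates_agree(t_a, t_b, rel_tol=0.1):
    """True if the two estimates match within rel_tol.

    rel_tol is a hypothetical tolerance; the post leaves the margin unspecified.
    """
    return math.isclose(t_a, t_b, rel_tol=rel_tol)


# Synthetic first-order dataset: 100 haplotypes, k = 0.022, true age 50 generations.
# Expected base haplotypes: 100 * exp(-1.1), about 33.
# Expected mutations under the linear model: 100 * 0.022 * 50 = 110.
t_lin = tmrca_linear(110, 100, k=0.022)
t_log = tmrca_log(100, 33, k=0.022)
print(f"linear: {t_lin:.1f}, log: {t_log:.1f}, agree: {estimates_agree(t_lin, t_log)}")
```

If the two numbers disagree well beyond the tolerance, the dataset is probably not first-order and neither number should be reported as a TMRCA.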

The problem that ruins otherwise good work in academia is that people do not
do this check. They grab haplotypes and their mutations, count all mutations
indiscriminately, and voila. In reality, however, the system is not a
first-order one and does not follow the A->P rule. Because of genetic drift
and the contribution of a number of populations and different lineages to the
dataset, they in fact have A->P, B->D, C->F, etc. in the same dataset. In
other words, it is NOT a first-order dataset, and you cannot take a
logarithm of that cocktail. As a result, you get two quite different TMRCAs
when you use mutations (the "linear" method) and logarithms.

Here is an example. This serious mistake was made by Hammer et al. when they
considered the "Cohen Modal Haplotype". The same mistake was made by
Behar et al. The same mistake was made by Hammer, Zhivotovsky et al. in their
recent paper in Human Genetics, which I have commented on, but they did not
get it.

Let's consider the old Behar data (2003). They published a list of
194 6-marker haplotypes, 91 of which were the CMH (Cohen Modal Haplotype).
Hammer, Behar et al. claimed that the CMH has a common ancestor who lived
some 3200 years before present. Had they had the slightest idea of
first-order kinetics and of how to treat haplotypes and mutations, they
would have realized that 91 base haplotypes out of 194 total, almost half
of them, cannot possibly have a TMRCA of 3000+ years. Let's see:
ln(194/91)/0.0088 = 86 generations (I do not have a calculator at hand right
now, so forgive me if this is off by a generation or so), or 2150 years
before present at 25 years per generation. How about mutations? There were
263 mutations in those 194 haplotypes. 263/194/0.0088 = 154 generations, or
3850 years to a common ancestor. A complete mismatch: 86 vs. 154
generations. End of story. One cannot calculate a single TMRCA for that
dataset. Now it is clear why they got that phantom ancestor of 3200 years
bp: it lies somewhere between 2150 and 3850 years.
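Both calculations for the Behar (2003) numbers can be reproduced directly. The inputs (194 haplotypes, 91 CMH, 263 mutations, k = 0.0088) are those quoted above; the 25 years per generation is inferred from the post's own conversions (86 to 2150 and 154 to 3850 years).

```python
import math

K_6_MARKER = 0.0088          # rate constant used above for 6-marker haplotypes
YEARS_PER_GENERATION = 25    # implied by 86 -> 2150 and 154 -> 3850 years

total_haplotypes = 194
base_haplotypes = 91         # haplotypes identical to the CMH
total_mutations = 263        # counted from the CMH

# Logarithmic method: t = ln(C / c) / k
t_log = math.log(total_haplotypes / base_haplotypes) / K_6_MARKER
# Linear method: t = M / C / k
t_linear = total_mutations / total_haplotypes / K_6_MARKER

print(f"log method:    {t_log:.0f} generations, ~{round(t_log) * YEARS_PER_GENERATION} years")
print(f"linear method: {t_linear:.0f} generations, ~{round(t_linear) * YEARS_PER_GENERATION} years")
```

The computed values round to the same 86 and 154 generations, so the mismatch is not an arithmetic slip.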

Why did this happen? It happened because they did not consider a haplotype
tree. They did not analyze the haplotypes. They did not consider branches. In
fact, that "CMH" population contains two very distinct sub-populations. One
has a common ancestor of only 1,000 ybp, the other of 4,200 ybp.
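Why mixing branches makes the two methods disagree can be illustrated with expected values for a pooled dataset. This sketch is hypothetical: the branch ages follow the 1,000 and 4,200 ybp mentioned above (at 25 years per generation), but the branch sizes of 100 each are invented, each branch is assumed to be a clean first-order system, and back mutations are ignored. It does not reproduce Behar's actual numbers; it only shows the direction of the bias.

```python
import math

K = 0.0088                   # 6-marker rate constant from the example above
YEARS_PER_GENERATION = 25

# Hypothetical branches: ages from the text, sizes invented for illustration.
branches = [
    {"size": 100, "generations": 1000 / YEARS_PER_GENERATION},  # 40 generations
    {"size": 100, "generations": 4200 / YEARS_PER_GENERATION},  # 168 generations
]

total = sum(b["size"] for b in branches)
# Expected unmutated haplotypes per branch: c = C * exp(-k t), pooled
expected_base = sum(b["size"] * math.exp(-K * b["generations"]) for b in branches)
# Expected mutations per branch under the linear model: M = C * k * t, pooled
expected_mutations = sum(b["size"] * K * b["generations"] for b in branches)

# Apply both single-lineage formulas to the pooled "cocktail"
t_log = math.log(total / expected_base) / K
t_linear = expected_mutations / total / K

print(f"log method:    {t_log:.1f} generations")
print(f"linear method: {t_linear:.1f} generations")
```

The linear method returns the plain average age of the branches (104 generations here), while the logarithm of the pooled base-haplotype fraction is pulled toward the younger branch (about 87 generations). By Jensen's inequality this gap appears whenever the pooled branches differ in age, which is the signature of a non-first-order dataset.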

That is what first-order reactions are about.

Regards,

Anatole Klyosov

***************************************************************


>I hate to get into the middle of arguments, but as I read it, I think the
problem is over the use of "first order". The need for a qualifier implies
that there is something with a different qualifier. Now, I'll agree with you
completely that there isn't anything that is obviously "second order"* in
haplotypes, with the understanding that the term wasn't invented or
exclusively used by genetic scientists and that other orders exist in all
kinds of sciences. You can't just ignore a qualifier simply because some
very specific context would (at that specific level of understanding) allow
you to infer it. It has meaning, so it has to be used.


