Dear Sasson,
I think you bring up some good points below. One of the issues with
intraclade TMRCA estimates using the variance method is that a dominant
closely related group of haplotypes can "swamp" the data from one or more
distantly related haplotypes. If you have a group of, say, 100 haplotypes, and
95 of them are from descendants of a common ancestor who lived 1400 years
ago while the other 5 are from a branch that shares a common ancestor who
lived 4000 years ago, then the data from the 95 haplotypes will skew the
intraclade TMRCA estimate to be much lower than the true TMRCA if all 100
haplotypes are included in the same dataset. The right thing to do in this
situation would be to choose 5 or fewer random haplotypes from the group
that descends from the common ancestor who lived 1400 years ago and pair
those haplotypes with the 5 from the branch that shares a common ancestor
who lived 4000 years ago for an intraclade TMRCA estimate.
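To put rough numbers on the swamping effect, here is a minimal sketch in Python. It rests on several assumptions that are mine and not part of any particular program: a symmetric single-step mutation model, an estimator equal to the unbiased per-marker sample variance divided by the mutation rate, star-like descent within each group, and the 1400-year branch joining the rest at the 4000-year root. Under those assumptions the expected variance-based estimate is simply the average coalescence time over all pairs of haplotypes. The function name is hypothetical.

```python
from math import comb


def expected_variance_tmrca(n_young, n_old, t_young, t_old):
    """Expected variance-method TMRCA for a two-branch dataset.

    Assumes (illustration only): pairs within the young branch coalesce
    at t_young, and every other pair coalesces at the root, t_old.
    The result is in the same units as t_young and t_old.
    """
    pairs_total = comb(n_young + n_old, 2)
    pairs_young = comb(n_young, 2)           # pairs inside the young branch
    pairs_old = pairs_total - pairs_young    # all remaining pairs reach the root
    return (pairs_young * t_young + pairs_old * t_old) / pairs_total


# All 100 haplotypes: the 95 close relatives dominate the pairs.
full = expected_variance_tmrca(95, 5, 1400, 4000)      # about 1655 years

# 5 random haplotypes from the young branch plus the 5 older ones.
balanced = expected_variance_tmrca(5, 5, 1400, 4000)   # about 3422 years

# A single haplotype from the young branch plus the 5 older ones.
minimal = expected_variance_tmrca(1, 5, 1400, 4000)    # exactly 4000 years

print(round(full), round(balanced), round(minimal))
```

Under these assumptions the full dataset comes out well under half the true TMRCA, and the estimate moves toward 4000 years as fewer haplotypes are taken from the dominant branch, which is why "5 or fewer" is the safer choice.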
One way to spot clusters of closely related haplotypes is to check
their genetic distances. This can help give information about clusters of
haplotypes that might be skewing the TMRCA estimate lower. Analysis of the
underlying tree structure, such as Anatole is doing, is another way to
get at this same information.
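As a rough sketch of the genetic distance check, the Python below sums the absolute differences in repeat counts between two haplotypes and then groups haplotypes that are linked by chains of close pairs (single linkage). The function names, the threshold and the toy haplotypes are all made up for illustration; real data would also need care with multi-copy markers and with how each marker's differences are counted.

```python
from itertools import combinations


def genetic_distance(h1, h2):
    """Sum of absolute repeat-count differences across markers."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def close_clusters(haplotypes, max_distance):
    """Group haplotype indices by single linkage.

    Two haplotypes land in the same cluster when a chain of pairs,
    each within max_distance, connects them. Largest cluster first.
    """
    parent = list(range(len(haplotypes)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(haplotypes)), 2):
        if genetic_distance(haplotypes[i], haplotypes[j]) <= max_distance:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(haplotypes)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)


# Toy 5-marker haplotypes (invented values, not real J1 data).
toy = [
    (14, 23, 11, 13, 30),
    (14, 23, 11, 13, 31),
    (14, 24, 11, 13, 30),
    (14, 23, 11, 13, 30),
    (15, 21, 13, 12, 28),
    (13, 25, 10, 14, 32),
]

print(close_clusters(toy, max_distance=2))
```

A large first cluster relative to the rest of the dataset is the warning sign that a group of close relatives may be pulling the variance-based estimate down.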
This whole issue will become less and less of a problem as we get
more complete Y chromosome sequences that help us better determine the Y SNP
structure in haplogroup J1 and other haplogroups. Then we can group these
haplotypes in smaller and smaller clusters.
I would be interested in taking a look at the J1 Cohen data if it is
available. There is quite a bit of J1 Cohen data on the haplogroup J
project in the subgroup J1e Cohanim, P58+, L147+, L222-, YCAII=22-22.
Is that the data that you used for your analysis?
Tim Janzen

Let me give a specific example: the collection of J1 Cohens. It was fed to
Generations2, returning 2300 years. Then it was examined according to
Anatole's method, and it was discovered that the set has a prominent
subset corresponding to a branch which is 1400 years old. The MRCA of
the rest lived more like 4000 years ago. All numbers are approximate, but
the situation is perfectly clear.

You see, all the extra precision of the Generations2 algorithm didn't help at
all.

This is the famous Cohen problem, which has been at the center of attention
of hobbyists and academics alike for the last 15 years, but it was ONLY
possible to get the right results by subjecting the data to Anatole's
method.

