GENEALOGY-DNA-L Archives
Subject: Re: [DNA] Clades versus clusters
Date: Wed, 22 Feb 2012 13:14:33 +0000 (UTC)
>From: Obed W Odom < >
>In the recent controversy, notably between Diana and Ken, about whether
to use the term "clade" or "cluster", I think both are probably right,
but Diana, using SNP's to define clades, is on a firmer footing than
Ken, using clusters of STRs to define clades. If a clade is defined as a
common ancestor and all of his descendants, then I think a shared SNP
(in context) is a more certain indicator of a common ancestor than is
membership in an STR cluster (...).
The problem, as I see it, is much deeper than what to call these "populations": clade, cluster, or whatever. As you rightly noticed, it depends on how one defines them and how one actually composes a haplotype dataset in practical terms. A wrong dataset contains haplotypes belonging to different DNA lineages and yields a phantom "most recent common ancestor". In other words, it boils down to defining the common ancestor of those "formations" or "populations", be it a "haplogroup", a "subclade", or a "branch".
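The effect of a mixed dataset can be illustrated with a toy example. The sketch below is a minimal illustration under stated assumptions, not anyone's published method: the haplotypes are invented, the "distance" between two haplotypes is simply the total number of repeat differences (a single-step mutation model), and the locus-by-locus modal haplotype stands in for the base haplotype. Pooling two unrelated lineages gives a modal haplotype that matches neither lineage's base and an average distance several times larger than within either lineage, which a naive dating would turn into an older, nonexistent common ancestor.

```python
from collections import Counter
from statistics import mean

# Invented haplotypes: repeat counts at six hypothetical STR markers.
LINEAGE_A = [
    (13, 25, 16, 11, 11, 14),
    (13, 25, 16, 11, 11, 14),
    (13, 24, 16, 11, 11, 14),
    (13, 25, 16, 10, 11, 14),
    (13, 25, 17, 11, 11, 14),
]
LINEAGE_B = [
    (14, 23, 15, 10, 12, 13),
    (14, 23, 15, 10, 12, 13),
    (14, 24, 15, 10, 12, 13),
    (14, 23, 15, 10, 13, 13),
]


def modal_haplotype(haplotypes):
    """Most frequent allele at each marker (ties go to the allele seen first)."""
    return tuple(Counter(alleles).most_common(1)[0][0] for alleles in zip(*haplotypes))


def step_distance(h1, h2):
    """Total repeat difference, i.e. the number of single-step mutations assumed."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def mean_steps_from_mode(haplotypes):
    """Average distance of each haplotype from the dataset's modal haplotype."""
    base = modal_haplotype(haplotypes)
    return mean(step_distance(h, base) for h in haplotypes)


print(mean_steps_from_mode(LINEAGE_A))                      # 0.6
print(mean_steps_from_mode(LINEAGE_B))                      # 0.5
print(modal_haplotype(LINEAGE_A + LINEAGE_B))               # (13, 25, 16, 10, 11, 14)
print(round(mean_steps_from_mode(LINEAGE_A + LINEAGE_B), 2))  # 3.33
```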
For example, the term "clade" is not in my vocabulary, since it is too loosely defined and "clades" are typically composed wrongly from haplotypes (see below). Here are my explanations:
"Haplogroup", by definition, has a common ancestor, who was the first to carry a certain SNP. That SNP is strictly defined and can be determined, and the respective population can be clearly identified. NOTE: I am not talking here about cases that are controversial due to lab errors, unstable SNPs, etc.
"Subclade", by definition, also has a common ancestor, within the haplogroup. The rest is the same as above.
The next subdivision in my vocabulary is the "branch" (as part of the haplotype tree). It is a population identified (usually) by a computer program on the basis of a distinct "structural similarity" of its haplotypes. A branch has its core, or base, haplotype, which is the likely ancestral haplotype. In other words, a branch is a "cloud" of haplotypes around the base haplotype, all presumably descending from the same common ancestor.
A subclade very often has several branches, each with its own "local" common ancestor. The "overall common ancestor" of the subclade is (or rather, was) the common ancestor of all the "local" common ancestors of its branches.
Sometimes a branch on a haplotype tree corresponds neatly to a single subclade. More often it does not. For example, haplogroup R1a contains more than 30 subclades (each defined by its own SNP), and most of the subclades contain several branches. Some branches are identified both by an SNP and by a distinct branch on the haplotype tree; others do not have their own SNP yet. For example, the M458 subclade includes two main branches: the "West Slavic branch", defined by SNP L260, which forms a single distinct branch on the haplotype tree (that is, the branch and the subclade comprise the same haplotypes), and the "Central European branch" (which in turn consists of two distinct sub-branches), which does not have its own SNP as yet.
I have to repeat that branches are separated by a computer program. Separating them manually is a typical mistake, because a branch is a "cloud" in which many alleles have mutated. The "clouds" (branches) often come close to each other on the tree, and manual separation is peppered with errors and misassignments.
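The post does not say which program is used to separate branches. As one example of an objective, reproducible alternative to splitting by eye, the sketch below runs a simple k-modes-style iteration: assign every haplotype to its nearest base haplotype, recompute each base as the modal haplotype of its group, and repeat until the bases stop changing. This is a swapped-in technique chosen for brevity, not the author's software; the number of branches, the starting seeds, the single-step distance, and the haplotypes themselves are all assumptions made for illustration.

```python
from collections import Counter

# Invented, unlabelled sample drawn from two lineages (six hypothetical markers).
SAMPLE = [
    (13, 25, 16, 11, 11, 14),
    (14, 23, 15, 10, 12, 13),
    (13, 24, 16, 11, 11, 14),
    (14, 24, 15, 10, 12, 13),
    (13, 25, 16, 10, 11, 14),
    (14, 23, 15, 10, 13, 13),
    (13, 25, 17, 11, 11, 14),
    (13, 25, 16, 11, 11, 14),
    (14, 23, 15, 10, 12, 13),
]


def step_distance(h1, h2):
    """Total repeat difference under an assumed single-step mutation model."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def modal_haplotype(haplotypes):
    """Most frequent allele at each marker (ties go to the allele seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*haplotypes))


def assign(haplotypes, bases):
    """Put each haplotype in the group of its nearest base haplotype."""
    groups = [[] for _ in bases]
    for h in haplotypes:
        nearest = min(range(len(bases)), key=lambda i: step_distance(h, bases[i]))
        groups[nearest].append(h)
    return groups


def split_into_branches(haplotypes, seeds, max_iter=20):
    """Iterate assignment and base re-estimation until the bases are stable."""
    bases = list(seeds)
    groups = assign(haplotypes, bases)
    for _ in range(max_iter):
        # An empty group keeps its previous base.
        new_bases = [modal_haplotype(g) if g else b for g, b in zip(groups, bases)]
        if new_bases == bases:
            break
        bases = new_bases
        groups = assign(haplotypes, bases)
    return bases, groups


bases, groups = split_into_branches(SAMPLE, seeds=[SAMPLE[2], SAMPLE[5]])
for base, group in zip(bases, groups):
    print(base, len(group))
# (13, 25, 16, 11, 11, 14) 5
# (14, 23, 15, 10, 12, 13) 4
```

Starting from two off-centre seeds, the iteration recovers both base haplotypes and keeps the one- and two-step variants inside their clouds. Real data would need a more careful choice of the number of branches and of the distance measure.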
People often send me what they believe are "clades" of haplotypes. These are ALWAYS narrower than they should be. Manual "separation" always weeds out the more mutated haplotypes and keeps those that are close to the base haplotype. Therefore, the respective TMRCAs are always "younger" than they should be.
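The narrowing effect on the TMRCA can be shown numerically. The sketch uses a deliberately naive estimator, the mean number of single-step mutations from the modal haplotype divided by a mutation rate, with none of the corrections (for example for back mutations) that real dating methods apply. The rate `RATE_PER_HAPLOTYPE` and the haplotypes are hypothetical placeholders. Keeping only haplotypes within one step of the base, as a by-eye selection tends to do, more than halves the estimate.

```python
from collections import Counter
from statistics import mean

# Hypothetical mutation rate for the whole toy panel, in mutations per
# haplotype per generation. A placeholder, not a calibrated value.
RATE_PER_HAPLOTYPE = 0.02

# Invented branch: a "cloud" around one base haplotype, including two
# more-mutated members (six hypothetical markers).
BRANCH = [
    (13, 25, 16, 11, 11, 14),
    (13, 25, 16, 11, 11, 14),
    (13, 24, 16, 11, 11, 14),
    (13, 25, 16, 10, 11, 14),
    (13, 25, 17, 11, 12, 14),
    (12, 26, 16, 11, 11, 15),
]


def step_distance(h1, h2):
    """Total repeat difference under an assumed single-step mutation model."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def modal_haplotype(haplotypes):
    """Most frequent allele at each marker, used here as the base haplotype."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*haplotypes))


def naive_tmrca_generations(haplotypes, rate=RATE_PER_HAPLOTYPE):
    """Mean steps from the base haplotype divided by the rate; no corrections."""
    base = modal_haplotype(haplotypes)
    return mean(step_distance(h, base) for h in haplotypes) / rate


# Simulated "hand-picked" subset: drop everything more than one step from the base.
base = modal_haplotype(BRANCH)
hand_picked = [h for h in BRANCH if step_distance(h, base) <= 1]

print(round(naive_tmrca_generations(BRANCH), 1))       # 58.3
print(round(naive_tmrca_generations(hand_picked), 1))  # 25.0
```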
Unfortunately, the practice of building such "clades" continues. In my vocabulary, "clades" are formations that are "cherry-picked", separated by hand. That is why "clades" are typically described by alleles considered characteristic of the "clade", such as DYS391=10 and DYS439=11. This is usually the wrong approach, because the respective branches also contain haplotypes with DYS391=11, DYS439=10, etc. It is a "cloud". Personally, I reject such "clades" when they are based on strict allele values. They should always be examined and verified by objective methods, not by a simplified (and commonly wrong) "naked eye" approach.
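Why a fixed-allele definition clips the cloud can be seen in a small example. The haplotypes below are invented: the DYS391 and DYS439 values echo the rule quoted in the text, while the DYS19 and DYS393 values and the base haplotype `BASE` are hypothetical, and the distance again assumes single-step mutations. The rule "DYS391=10 and DYS439=11" discards half of this toy branch, although every discarded haplotype is only one mutation away from the base.

```python
# Hypothetical base haplotype of a toy branch.
BASE = {"DYS391": 10, "DYS439": 11, "DYS19": 14, "DYS393": 13}

# Invented branch members: the base plus single-step variants of it.
BRANCH = [
    {"DYS391": 10, "DYS439": 11, "DYS19": 14, "DYS393": 13},
    {"DYS391": 10, "DYS439": 11, "DYS19": 15, "DYS393": 13},
    {"DYS391": 11, "DYS439": 11, "DYS19": 14, "DYS393": 13},
    {"DYS391": 10, "DYS439": 10, "DYS19": 14, "DYS393": 13},
    {"DYS391": 10, "DYS439": 12, "DYS19": 14, "DYS393": 13},
    {"DYS391": 10, "DYS439": 11, "DYS19": 14, "DYS393": 12},
]


def step_distance(h1, h2):
    """Total repeat difference over the markers of the first haplotype."""
    return sum(abs(h1[m] - h2[m]) for m in h1)


def strict_allele_filter(haplotypes, required):
    """Keep only haplotypes carrying exactly the required allele values."""
    return [h for h in haplotypes if all(h[m] == v for m, v in required.items())]


kept = strict_allele_filter(BRANCH, {"DYS391": 10, "DYS439": 11})
dropped = [h for h in BRANCH if h not in kept]

print(len(kept), len(dropped))                    # 3 3
print([step_distance(h, BASE) for h in dropped])  # [1, 1, 1]
```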
The same goes for "clusters". If they are separated by a computer, they have the same meaning as "branches". If they are separated by hand, they have the same meaning as "clades" and are typically inaccurate.
From Obed W Odom, cont. (from the above):
(...) However, to some degree of accuracy both
define clades, and the clade defined by the cluster of STRs would be a
nested clade (or subclade) within the clade defined by an SNP. For
example, in the case of haplogroup I1, I1-AS2 (if it is not just the
remnant maintaining the clade founder's STR pattern) would be a nested
clade within the clade defined by SNPs Z138 and Z139. Then I, as an
I1-AS-generic who is also Z138+ and Z139+, would be a member of the
clade defined by these SNPs but not of the nested clade defined by the
A diagram drawn by personnel at UC Berkeley, which may add clarity to
the meaning of clade, can be seen at the following link: