GENEALOGY-DNA-L Archives


From: "Ken Nordtvedt"
Subject: [DNA] The Story of I1b1 (P37.2+)
Date: Tue, 5 Feb 2008 11:09:09 -0700


Over 20,000 Years of I1b1 (P37.2+) Haplogroup

Although I'm personally double I1a, the history of y-haplogroup I1b1 is more interesting, so I'll first give the P37.2+ story as best can be told with present data. The warpedfounderstree file and spreadsheet of founder haplotypes file at http://knordtvedt.home.bresnan.net will be helpful in following this story. I1b1 (as called by the ISOGG) is a major haplogroup of y-haplogroup "I" and defined by the SNP P37.2+. It's still called I1b by FTDNA, but that will soon change. It has four major clades --- Dinaric, Western, Isles, Sardinian --- representing four greatly separated clusters of extended haplotypes, each found with a clear difference of geographical distribution in Europe. The clades have no present defining SNPs except for Sardinian, and this story really depends in no significant way on presence or absence of useful SNPs. But I have no doubt at all that each clade could have its own defining SNP, or multiples of them, if focused searches were made for them. Although the Isles clade of I1b1 by itself is already divided into four sub-clades, I combine them as a single clade for purposes of this story; and the distances (generational times) between the clades within I1b1-Isles are short compared to the other inter-clade generational distances as you will see if you view the warpedfounderstree at my website.

Using 52 of the slowest mutating markers available in our public databases, I established the modal haplotypes for each of the four clades of I1b1. These are our best determinations of the original haplotypes of the respective founders (MRCAs) of the presently assembled clade populations. The sum of the real, measured mutation rates for the 52 markers used in these haplotypes is about 1/16 per generation. That means that on average each unit of genetic distance (GD) between haplotypes represents 16 generations of branch length between the founders represented by the pair of compared founding haplotypes. From my matrix of GDs, adjusted for back mutations, between all clade founders in y haplogroup I, we therefore have estimates for all the branch lengths between the four clade founders. These six lengths are:

Western <-> Dinaric = 1152 generations
Western <-> Isles = 880 generations
Western <-> Sardinian = 816 generations
Dinaric <-> Isles = 528 generations
Dinaric <-> Sardinian = 848 generations
Isles <-> Sardinian = 784 generations
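
The conversion behind these figures can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the author's actual calculation. The GD values below are not given in the post; they are back-computed here by dividing each listed branch length by 16. The names `SUMMED_MUTATION_RATE`, `GENERATIONS_PER_GD` and `gd_to_generations` are made up for the example.

```python
# Sum of the per-generation mutation rates over the 52 markers, as stated in the post.
SUMMED_MUTATION_RATE = 1 / 16

# One unit of genetic distance therefore corresponds to about 16 generations.
GENERATIONS_PER_GD = 1 / SUMMED_MUTATION_RATE


def gd_to_generations(gd):
    """Convert a back-mutation-adjusted genetic distance to a branch length in generations."""
    return gd * GENERATIONS_PER_GD


# ASSUMPTION: these adjusted GDs are back-computed from the branch lengths
# listed in the post (generations / 16); the post does not state them.
adjusted_gd = {
    ("Western", "Dinaric"): 72,
    ("Western", "Isles"): 55,
    ("Western", "Sardinian"): 51,
    ("Dinaric", "Isles"): 33,
    ("Dinaric", "Sardinian"): 53,
    ("Isles", "Sardinian"): 49,
}

branch_lengths = {pair: gd_to_generations(gd) for pair, gd in adjusted_gd.items()}

for (a, b), generations in branch_lengths.items():
    print(f"{a} <-> {b} = {generations:.0f} generations")
```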

These four individual males each lived a specific number of generations ago. The task is to construct a tree of descent which begins with a P37.2+ common ancestor to all I1b1 clades and whose total tree branch distances between each pair of clade MRCAs are as true as possible to the six generational distances given above which come from our observations. Such a tree pins down the relative times of existence for those four founders. This has been done and is shown as the red portion of the entire y haplogroup I tree given in file warpedfounderstree (in two formats) at my website. The arrowheads represent the MRCAs (founders) for the respective clade populations seen today.
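
One simple way to see which grouping of the four founders the six distances favor is the standard four-point check from distance-based phylogenetics. The sketch below is not the method used to build the warpedfounderstree, which the post does not spell out; it only finds the unrooted pairing and says nothing about where the root lies or when each founder lived. The helper names `d` and `four_point_sums` are hypothetical.

```python
# The six branch lengths (in generations) listed in the post.
distance = {
    frozenset(("Western", "Dinaric")): 1152,
    frozenset(("Western", "Isles")): 880,
    frozenset(("Western", "Sardinian")): 816,
    frozenset(("Dinaric", "Isles")): 528,
    frozenset(("Dinaric", "Sardinian")): 848,
    frozenset(("Isles", "Sardinian")): 784,
}


def d(a, b):
    """Look up the distance between two clade founders, in either order."""
    return distance[frozenset((a, b))]


def four_point_sums(a, b, c, e):
    """Return (sum, pair, pair) for the three ways to split four founders into two pairs, smallest sum first."""
    pairings = [
        ((a, b), (c, e)),
        ((a, c), (b, e)),
        ((a, e), (b, c)),
    ]
    sums = [(d(*p) + d(*q), p, q) for p, q in pairings]
    return sorted(sums, key=lambda s: s[0])


sums = four_point_sums("Western", "Dinaric", "Isles", "Sardinian")
for total, p, q in sums:
    print(total, p, q)
```

With the distances above, the smallest sum (1344) pairs Western with Sardinian and Dinaric with Isles, which agrees with Dinaric and Isles being the closest pair of founders. For perfectly tree-like distances the two larger sums would be equal; here they are 1728 and 1936, so the data are only approximately tree-like and a best-fit tree is needed.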

Note that the MRCA of all four I1b1 clades existed over 17,000 years earlier than the I1b1-Dinaric clade MRCA. This latter founder existed perhaps only 4000 years ago. So the overall P37.2+ MRCA existed prior to the European climate's fall into its last glacial maximum (LGM). And the branch line which eventually leads to the Sardinian clade MRCA leaves the rest of the P37.2+ tree quite close in time to the initial breakup of I1b1 as well. The branch line which leads to the I1b1-Isles founder leaves the branch line leading to I1b1-Dinaric founder at an appreciably later time, however. This is an inferred consequence of the smaller GD between the Dinaric and Isles founders that we observe.

Brief Resume of Each Clade of I1b1 (P37.2+)

Dinaric. This is the original clade of P37.2+ discussed in the literature. It is mainly found in Eastern Europe with frequency peak in Bosnia and Croatia, near the Dinaric Alps. Its frequency falls rapidly as one moves into northeast Italy or into Germanic lands. The I1b1-Dinaric haplotype population looks remarkably young; the full implications of this youth for the ancient migratory history of Eastern Europe are yet to be fully understood in my view. Dinaric I1b1 is the most populous clade of P37.2+ in Europe.

Western I1b1 is located more to the northwest in Germany, but appreciable amounts of it are found in the British Isles as well.

Isles I1b1 is almost exclusively found in the British Isles and especially Ireland. Its haplotype population shows much diversity, which together with its absence on the mainland suggests the clade arrived or was founded in the Isles very early in the post-glacial repopulation of that region.

Sardinian I1b1a accounts for about a third of Sardinian ydna, but it is also found at decent frequencies in regions of Italy and Iberia. It is also scattered up the Atlantic seaboard of Europe and into the British Isles. This pattern suggests it moved north in the same demographic movements that brought Atlantic R1b1c to northwest Europe. SNP M26+ defines this subhaplogroup of I1b1, but its highly distinctive YCAIIa,b motif makes an SNP test unnecessary for its identification.

It is interesting that only the tiniest trace of any of these four clades of I1b1 is found in Scandinavia. This seems a useful clue to me in eventually sorting out the movements of y-haplogroup I during its participation in repopulating Europe post-LGM.

What happened over the 550-generation branch line that goes from the MRCA for all of I1b1 to the I1b1-Western founder (MRCA)? Certainly 550 generations of father-to-son transitions did not go by with just a single son being born each generation. But over that long span of generations, all second and third son descendant lines went extinct. This cannot be known to be exactly true; rather, it is what is seen to a fractional accuracy of one part in several thousand --- the number of y haplogroup I haplotypes which have been examined by population studies up until now. All P37.2+ haplotypes found so far fall into one of the four clusters discussed here. That is not to say that some outlier haplotypes won't be found in the future in larger databases, showing additional small clades of P37.2+, but any such additional clades will be demographically marginal.

But over these 550 generations the line's haplotype, handed down father to son each generation, slowly changed through the random mutations which occurred on its 52 STR markers. So the end result --- the founding (modal) haplotype of the I1b1-Western MRCA --- is quite different from the original P37.2+ MRCA haplotype of 20,000 years ago. Additionally, that haplotype moving through those 550 generations accumulated unique SNP mutations which today would be found in I1b1-Western haplotypes but not in any of the other I1b1 clade haplotypes. If we assume the suggested rate of y chromosome SNP mutations of about one for each father-son transition, there should be about 500 SNPs, each of which could define the I1b1-Western clade. A dedicated search for an I1b1-Western clade SNP which covered at least 1/5 of one percent of the y chromosome should stand a good chance of finding one of these SNPs. Where along this branch line of 550 generations would that SNP have occurred? We will never be able to say; it has about an equal chance to be anywhere along that branch. We can infer the times when contemporary population MRCAs lived in the past, and we can infer the times when branch points in the phylogenetic tree took place, but SNPs can only be placed as having occurred somewhere on a branch line. It is unfortunate that so much academic literature puts estimated dates on occurrences of SNP mutations, because that then requires some translating or restating to arrive at what we actually know from the data. Fundamentally, we date the MRCAs of clade populations.
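
The "good chance" of finding a clade-defining SNP can be put into rough numbers. The sketch below assumes, beyond what the post states, that the roughly 500 branch SNPs are scattered uniformly and independently along the y chromosome, so that the number falling inside a scanned region is approximately Poisson distributed. The function name `chance_of_at_least_one` is hypothetical.

```python
import math


def chance_of_at_least_one(n_clade_snps, fraction_scanned):
    """Probability that a scan of part of the chromosome contains at least one clade SNP.

    ASSUMPTION: SNPs are spread uniformly and independently, so the count in
    the scanned region is approximately Poisson with the mean computed below.
    """
    expected_hits = n_clade_snps * fraction_scanned
    return 1.0 - math.exp(-expected_hits)


# Figures from the post: about 500 SNPs on the branch, and a search covering
# 1/5 of one percent (0.002) of the y chromosome.
p = chance_of_at_least_one(500, 0.002)
print(f"Expected SNPs in scanned region: {500 * 0.002:.1f}")
print(f"Chance of finding at least one: {p:.2f}")
```

Under these assumptions the scan is expected to contain about one SNP, and the chance of finding at least one is a little under two in three.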

You will notice a dashed line in the warpedfounderstree file called "present" lying somewhat to the right from the clade MRCA arrowheads. Location of the "present" relative to the tree of MRCAs was done by determining the variances of several of the y haplogroup I clade haplotype populations. These variances can be converted into estimates of the times between the clade founders and the present. I generally find such times to be younger than much of the literature does because I use the actual measured marker mutation rates rather than fictitious "effective" rates as seen in many papers.
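
As a rough illustration of how a variance becomes a time, the sketch below uses the textbook single-step mutation model, under which the expected squared deviation of a marker from its founder value grows as (mutation rate) x (generations). This is not necessarily the exact procedure used for the warpedfounderstree. The haplotypes, founder values and mutation rates are invented toy numbers, and `estimate_generations` is a hypothetical helper.

```python
def estimate_generations(haplotypes, founder, rates):
    """Estimate generations since the founder from STR variance.

    ASSUMPTION: simple single-step mutation model, in which the mean squared
    deviation from the founder value at each marker is about rate * generations.
    Summing over markers gives: generations = total variance / summed rate.
    """
    n = len(haplotypes)
    total_variance = 0.0
    for i, founder_value in enumerate(founder):
        total_variance += sum((h[i] - founder_value) ** 2 for h in haplotypes) / n
    return total_variance / sum(rates)


# HYPOTHETICAL toy data: three markers, four sampled haplotypes.
founder = [13, 24, 15]
haplotypes = [
    [13, 24, 15],
    [14, 24, 15],
    [13, 23, 15],
    [13, 24, 15],
]
# HYPOTHETICAL per-generation mutation rates for the three markers.
rates = [0.002, 0.002, 0.001]

t = estimate_generations(haplotypes, founder, rates)
print(f"Estimated age: about {t:.0f} generations")
```

The sketch also shows why the choice of rates matters: the estimate is inversely proportional to the summed mutation rate, so a smaller "effective" rate gives a proportionally older date from the same variance.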


