GENEALOGY-DNA-L Archives



From: "William Hurst" <>
Subject: Comparison of Dr. Behar's Ashkenazi mtDNA paper K chart with my K survey from MitoSearch
Date: Fri, 13 Jan 2006 20:22:04 -0500


Hi all,

Dr. Doron Behar's promised paper on Ashkenazi Jewish mtDNA has now been released. Since it is concerned mostly with haplogroup K (along with N1b), I thought I should compare the paper with my recent survey of haplogroup K entries on MitoSearch.

The K chart in Behar is based on 121 full-sequence (coding and control regions) samples; my survey was based on 146 MitoSearch samples tested for control regions HVR1 and HVR2. My population was slightly larger, but his was about 16 times as deep (HVR1/2 cover 1,050 of the 16,569 bases in the full mtDNA). His is concentrated in the Mediterranean area, while MitoSearch is USA-centric. MitoSearch has many more examples with an unknown country of origin than Behar's populations do. Apparently none of the samples Behar used from other studies were from the British Isles.
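For anyone who wants to check the "16 times as deep" figure, here is a quick sketch of the arithmetic. The two base counts come straight from the paragraph above; the variable names are just mine for illustration.

```python
# Base counts quoted in the post.
HVR_BASES = 1050           # HVR1 + HVR2 positions reported on MitoSearch
FULL_MTDNA_BASES = 16569   # full mitochondrial genome

# How many times more bases a full sequence covers than HVR1/2 alone.
depth_ratio = FULL_MTDNA_BASES / HVR_BASES

# Share of the genome that HVR1/2 testing actually looks at.
coverage_pct = 100 * HVR_BASES / FULL_MTDNA_BASES

print(f"Full sequence covers {depth_ratio:.1f} times as many bases")
print(f"HVR1/2 cover {coverage_pct:.1f}% of the mtDNA")
```

So "16 times" is a rounded figure, and an HVR1/2 test reads only a small fraction of the molecule.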

Behar says that K is defined by coding region mutation G9055A. I assume that FTDNA uses this marker to assign K haplogroups. I found one haplotype in K which has perfect HVR1/2 matches in haplogroup U5.

Behar has labeled 44 subhaplogroups. (In Figure 2 he says there are 91 branches.) I labeled only 33 subclades from 25 more examples. The reason for the difference is that he has defined subs when there was only one example. I only used singletons if there was a lower subclade.

In creating his chart, Behar says "The control-region positions were twofold down-weighted, with respect to coding-region positions." In my survey, the control region positions received 100% of the weight, since that's all that's on MitoSearch.

Behar excluded certain hypervariable nucleotides from his program to create the chart, including the insertions at 309, 315, and 524 in HVR2. I treated those just like any other mutation. As a result, I used 309 and 524 six times each in defining subclades. 315 was only used as a back mutation in K2d3 as mentioned below. Maybe I should go back and see what would happen if I followed his example. He also excluded 16182 and 16183, but I didn't find any K's with those mutations.
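If I did go back and follow his example, the first step would be to strip those positions from each haplotype before sorting into subclades. Below is a minimal sketch of that filter, not Behar's actual program. It assumes mutations are written as text labels with the position first and insertions marked with a dot (for example "309.1C"); that notation, the function names, and the sample haplotype are all assumptions made up for illustration.

```python
import re

# Positions the post says Behar excluded: insertions at 309, 315 and 524,
# plus any change at 16182 and 16183.
EXCLUDED_INSERTIONS = {309, 315, 524}
EXCLUDED_ANY = {16182, 16183}


def position(mutation: str) -> int:
    """Return the leading base position of a label such as '16224C' or '309.1C'."""
    match = re.match(r"\d+", mutation)
    if match is None:
        raise ValueError(f"no position found in {mutation!r}")
    return int(match.group())


def is_excluded(mutation: str) -> bool:
    """True if the mutation falls under one of the exclusions listed above."""
    pos = position(mutation)
    if pos in EXCLUDED_ANY:
        return True
    # Assumption: a dot in the label marks an insertion (e.g. '309.1C').
    return pos in EXCLUDED_INSERTIONS and "." in mutation


def drop_excluded(haplotype):
    """Return the haplotype with the excluded hypervariable mutations removed."""
    return [m for m in haplotype if not is_excluded(m)]


# Made-up haplotype, for illustration only (not a real MitoSearch entry).
sample = ["16224C", "16311C", "73G", "263G", "309.1C", "315.1C", "524.1A"]
filtered = drop_excluded(sample)
print(filtered)
```

Any subclade of mine that was defined only by a 309 or 524 insertion would simply disappear after this step, which is what I would want to look at.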

Behar uses 16093 about four times in his definitions, with about seven more appearances of it on his chart. I only used it to define K2a, following the much older practice from when only HVR1 mutations were used. It also appeared in two other subclades as a personal mutation.

Both of us used 146 in various places. His main use for it is to define K2, but it is also used in the definitions of K1c and K1b2. I used it to define K1, following the lead of John S. Walden who was following earlier studies. But I also used it to define K2b2. Maybe if this mutation were excluded as being hypervariable, the whole system would collapse.

In Behar about 19% of the samples were Ashkenazi; MitoSearch's were about 15%. Unlike Dr. Behar, I was not looking for Ashkenazi mtDNA in my survey. I did identify one subclade, K2d, with four lower subclades, as Ashkenazi based on Behar's 2004 paper and from postings on the DNA list by Ellen Coffman.

Behar's Ashkenazi subhaplogroups are K1a1b1a, K1a9 and K2a2a.

His K1a1b1a is defined by two coding region mutations, but includes the 16223, 16234, and 114 mutations. That is comparable to my K2d subclade, which I labeled as Ashkenazi.

His K1a9 is defined by 16524, as is my K2a2a. I found only two of these, one each from Hungary and Ukraine.

His K2a2a is defined by three coding region mutations, but includes 512C. That compares to my K1a2, which is concentrated in Eastern European origins. (The fact that both of us have an Ashkenazi subclade K2a2a is a coincidence; the two are not the same.)

Dr. Behar apparently didn't run across examples of what I called the oddest subclade, my K2d3, which is characterized by five additional HVR2 mutations and five back mutations. Several of them look Ashkenazi to me, although they barely look like K's.

Behar's main subhaplogroup K1 is defined by two coding region mutations, while K2 is defined by 146. This doesn't compare at all with my K1/K2 division, since I didn't have access to coding region results.

Behar divides K1 into K1a, defined by 497 (comparable to my K2), K1b which is immediately split, and K1c defined by 146, 152, and 498- (deletion). His K1b does not compare to any of my subclades. His K1c is comparable to my K1 (146), K1a (152), K1a1 (498-) line. (John S. Walden's K chart was my source for this line.) Behar's use of those three mutations to define one subhaplogroup, while I used them to define three subclades in a row, probably reflects the greater number of examples of that line (mainly British) on MitoSearch.
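The difference between the two approaches to that line can be sketched in a few lines. This is only an illustration under my own assumptions: mutations are written as bare position labels ("146", "152", "498-"), and the function names are made up. It is not how either chart was actually produced.

```python
# My K1 line as described above: each level adds one marker to its parent.
HURST_K1_LINE = [("K1", "146"), ("K1a", "152"), ("K1a1", "498-")]

# Behar's K1c needs all three markers at once (control-region part only;
# coding region markers are left out of this sketch).
BEHAR_K1C_MARKERS = {"146", "152", "498-"}


def hurst_subclade(mutations, root="K"):
    """Walk down the line and stop at the first marker that is missing."""
    found = set(mutations)
    label = root
    for name, marker in HURST_K1_LINE:
        if marker not in found:
            break
        label = name
    return label


def matches_behar_k1c(mutations):
    """True only when every K1c control-region marker is present."""
    return BEHAR_K1C_MARKERS <= set(mutations)


for haplotype in (["146"], ["146", "152"], ["146", "152", "498-"]):
    print(haplotype, hurst_subclade(haplotype), matches_behar_k1c(haplotype))
```

A haplotype with only 146 and 152 gets a label (K1a) in my scheme but no K1c match in this sketch, which is why three steps made sense with the larger number of examples on MitoSearch.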

His K1c2 adds 16320 and compares to my K1a1a, which contains my personal haplotype. This haplotype has been described by me and others as Scottish or British, but the three examples used by Behar (Table 3) are one from a Moroccan Jew and two from an unknown population from the Herrnstadt 2002 paper. My MitoSearch survey has 11 K1a1a's, mostly of British origin. Behar (or Herrnstadt) found one with 16320 with a reversion or back mutation at 498-. I found two of those, but one was probably a typo. In my study I had wondered if you could have a back mutation of a deletion - an undeletion? If it's good enough for Dr. Behar, it's good enough for me!

In summary, Dr. Behar's K chart is based on a greater geographic distribution than I had expected after seeing his presentation in Washington. However, I would still like to see more full-sequence haplotypes from the British Isles and other countries in Western Europe before the subclade designations are set in stone. But as for now, his chart is far better than what has been available before. Whether it will be used as the basis for a K subclade test by FTDNA or some other company is the next question. One slight problem with that is simply that some of us, such as me, can easily find our haplotype on the chart just by looking at the control region mutations. But those who have, say, the 16093 mutation, which appears in 11 places on his chart, may have to have coding region mutations tested to find their place. Oh wait, I also have 16093! I have only scratched the surface in trying to find MitoSearch haplotypes on Behar's chart. Also, although the chart is based on 121 sequences, there are a total of 789 K full-sequences which should be available eventually.

As for my now-outdated three-week-old K subclade chart, I think it was very useful for me and hopefully for others. My method of dividing up the haplotypes just on the HVR1/2 results was able to find the three Ashkenazi K founders. True, I only labeled one "Ashkenazi," but the others were easily identified by the countries of origin and names mentioned on MitoSearch. I may have identified a fourth founder or at least a large group of probable Ashkenazi haplotypes in that "oddest" subclade K2d3. Also, my survey has many British-origin haplotypes which are barely represented, if at all, on Behar's chart.

At the end of my survey I said I had no illusions that the subclade chart would be adopted as official or semi-official. Also that I would not be ordering a K1a1a pin. I'm not quite ready to order a K1c2 pin either.

Bill Hurst

