
From: "Tom Gull" <>
Subject: Re: [DNA] : variance of S116*
Date: Sun, 21 Feb 2010 10:16:32 -0500

One caveat I would offer on these categorizations is one I've noted to the
project leaders and have thought was a type of bias since day 1 for this

Remember that one STR-based subset of U106 was originally labelled the
"Frisian cluster" and at the time was believed to represent the bulk of
U106. This label and concept stuck hard in many people's minds and remains
there to this day. In its intended sense, it does identify a valid U106 STR
cluster, and the samples (not many) included some from coastal Denmark,
northwest Germany, and the northern Netherlands provinces, including
Friesland. So the original narrow definition was OK but has created a bias
towards a broader definition. Despite the low actual counts in the
northern Netherlands and coastal Denmark and Germany, the quoted hot spot is
there. This is partly achieved, in my view, by taking points from what
is geographically self-described as SW Germany and central Germany and
moving those into the northern Germany bucket.

If you look at the U106 project map, you'll see a smattering of datapoints
in the areas I named above. Then as you move south into Belgium and along
the Rhine to SW Germany, the count goes up significantly. Belgium has more
points by itself than does the entire "greater Frisia" area I noted above,
as does SW Germany. Belgium and SW Germany have about the same count.

Somehow this still translates in the project summary to an emphasis on old
Frisia, with peaks elsewhere essentially recategorized northwards in the
groupings you mention below. I personally believe many of the datapoints
are incorrectly bucketed in geographic terms, but it's not my project and my
earlier suggestions to that effect didn't trigger any changes.

For your purposes, though, where you're trying to be very precise and
mathematical to produce regional estimates, I'd strongly suggest going back
to the maps themselves to see where the datapoints lie and whether you think
it's appropriate to redraw the boundaries from the project for your
analysis. I think the current bucketing has a bias that moves datapoints
from SW Germany into a northern Germany bucket, for example, significantly
changing the whole picture due to their high frequency. It may be that the
larger N-S countries need to be split into multiple buckets. I suspect most
people expect differences between some DNA patterns in northern France vs
southern France. So why not the same with northern Germany, central
Germany, and SW Germany? And if you create those buckets, they should line up
with commonly accepted designators. Frankfurt is not "northern Germany".
Numerous places on the web including Wikipedia say this or something
similar: "The city is located on both sides of the River Main in the
south-west part of Germany". I mention this one in particular because
Frankfurt is a major hot spot for U106 - not surprising since per Wikipedia
there were 2.26 million people in its urban area in 2001.
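
As a rough sketch of the re-bucketing idea above, here is one way to split a
large N-S country into bands by latitude and recount the datapoints. The
cut-off latitudes, the band names, the function names and the three sample
points are all assumptions made up for illustration; they are not the
project's definitions or data. Only the approximate city latitudes are real.

```python
from collections import Counter

# Hypothetical latitude cut-offs (degrees north) for splitting Germany
# into bands. These thresholds are illustrative assumptions only.
GERMANY_BANDS = [
    (52.5, "Northern Germany"),
    (50.5, "Central Germany"),
]
DEFAULT_BAND = "Southern/SW Germany"


def germany_bucket(lat):
    """Return the north-south band for a latitude inside Germany."""
    for min_lat, name in GERMANY_BANDS:
        if lat >= min_lat:
            return name
    return DEFAULT_BAND


def count_by_bucket(points):
    """Tally (label, latitude) datapoints per band."""
    return Counter(germany_bucket(lat) for _label, lat in points)


# Made-up sample points for illustration (approximate city latitudes).
sample_points = [
    ("Hamburg", 53.55),
    ("Frankfurt", 50.11),
    ("Stuttgart", 48.78),
]
counts = count_by_bucket(sample_points)
```

With cut-offs like these, a Frankfurt datapoint is counted in the southern/SW
band rather than the northern one. Moving the cut-offs and rerunning the count
shows how sensitive the regional totals are to where the lines are drawn.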

Anyway, you might want to look into this a bit more.

/ Tom

From: "Tim Janzen" <>
Sent: Sunday, February 21, 2010 2:11 AM
To: <>
Cc: "'Alan R'" <>
Subject: Re: [DNA] : variance of S116*

> Dear Alan,
> I agree with your comments below. I had been thinking about looking
> at R-U106 in regards to the area of probable origin as well, but haven't
> had time to analyze the data until today. I downloaded the latest data
> from the R-U106 project at
> I then used the 305 67-marker haplotypes that had been SNP tested as they
> are currently categorized on the project's web site as follows:
> 1. Central Europe
> 2. NE Europe
> 3. NW Europe
> 4. Scandinavia
> 5. SW Europe
> 6. SE Europe
> 7. UK and Ireland
