
From: "Anatole Klyosov" <>
Subject: Re: [DNA] : variance of S116*
Date: Sun, 21 Feb 2010 09:08:39 -0500
References: <>

>From: "Tim Janzen" <>
>I had been thinking about looking at R-U106 in regards to the area of
>probable origin (...)
>1. Central Europe
>2. NE Europe
>3. NW Europe
>4. Scandinavia
>5. SW Europe
>6. SE Europe
>7. UK and Ireland
>21 67-marker samples from Central Europe: 50 markers: 2795
>24 67-marker samples from NE Europe: 50 markers: 3113
>91 67-marker samples from NW Europe: 50 markers: 3315
>23 67-marker samples from Scandinavia: 50 markers: 2850
>10 67-marker samples from SE Europe: 50 markers: 3047
>7 67-marker samples from SW Europe: 50 markers: 2195
>174 67-marker samples from UK and Ireland: 50 markers: 3144
>I think that the 50 marker results are the ones that you should pay the
>most attention... have more statistical significance.
>In summary, we are also seeing a north/south division in the R-U106* data
>with relatively little variance in haplotypes from SW Europe and from
>We are seeing the highest variance for R-U106* on continental Europe in NE
>Europe and NW Europe with somewhat less variance in Central Europe.
>This would suggest that if R-U106 originated somewhere in northern Europe
>that it reached Ireland and Scotland relatively quickly after R-U106 first
>Overall, the data for R-U106*, R-P312/S116*, and R-U152* would suggest that
>they came west through Europe via a route north of the Alps rather than via
>a route south of the Alps.

Dear Tim,

Leaving aside my estimates, which put the "age" of U106 at 4175+/-430
years (calculated from 7100 alleles), let's focus on your relative "ages"
across Europe. Frankly, I do not see a real difference between the regions,
and the highest value, in NW Europe, is hard to explain when everything
around it is lower. However, the real problem is the statistics. You do not
give margins of error, and with those included all your figures would
largely overlap.

One thing: you do not need to calculate "ages in years"; it would be enough
to present just the number of mutations in the chosen haplotypes (or whatever
primary unit you used for the calculations). By calculating in years, you
introduce an additional uncertainty from the precision of the mutation rate
constants. However, even if you reduce the values to just the number of
mutations (or whatever) in the haplotypes, they are still subject to
statistical uncertainties. I can figure that in your series of 10-20
haplotypes (the first, larger group of regions) and of 100-200 haplotypes
(the second group, of just two regions) the number of mutations per series
was 100-200 and 1000-2000, respectively. This alone would result in margins
of error of plus-minus 20%-14% and 6%-4.5% in the mutation counts,
respectively. In other words, even forgetting about mutation rates and
years, the mutation counts themselves would bring you uncertainties of
3000+/-600 to 3000+/-420 years for the first group of series (those with
10-20 haplotypes in them). This will virtually nullify your conclusions
regarding Central Europe, NE Europe, Scandinavia, SE Europe, and SW Europe.
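
The percentages above can be reproduced with a short sketch. It assumes
(this is not stated in the post) that the mutation count N in a series
behaves as a Poisson count, so that the relative margin of error at about
two standard deviations is 2/sqrt(N). The function name and the nominal
3000-year figure used for scaling are illustrative only.

```python
import math

def relative_margin(mutations, sigmas=2.0):
    """Relative margin of error for a count of `mutations`.

    Assumption: Poisson counting statistics, so one standard deviation
    is sqrt(N) and the relative margin is sigmas / sqrt(N).
    """
    return sigmas / math.sqrt(mutations)

NOMINAL_YEARS = 3000  # round figure used in the post for scaling

for n in (100, 200, 1000, 2000):
    rel = relative_margin(n)
    print(f"{n:5d} mutations: +/-{rel:.1%}, "
          f"about +/-{NOMINAL_YEARS * rel:.0f} years on {NOMINAL_YEARS}")
```

Under that assumption, 100 and 200 mutations give about 20% and 14%, and
1000 and 2000 mutations give about 6.3% and 4.5%, in line with the
figures quoted above.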

For the larger haplotype series from NW Europe and from UK and Ireland (91
and 174 haplotypes, respectively), this would leave you with a margin of
error in mutation counting (or whatever your measure was) of about
3000+/-200 years. This again would effectively cover the difference between
your 3315 and 3144 there, as well as the differences from all other regions
in Europe except SW Europe (2195), which with its 7 haplotypes and +/-600
years would be rather uncertain as well (particularly because your next
trial gave 2588 years there).
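
As a rough check of the overlap argument, the sketch below attaches
margins to the quoted 50-marker values and lists the pairs of regions whose
intervals do not overlap. The assignment of margins is an assumption made
for illustration: +/-200 years for the two large series, +/-600 for SW
Europe, and +/-420 (the tighter of the two figures given above) for the
other small series.

```python
from itertools import combinations

# (value in years from the quoted 50-marker data, assumed margin in years)
regions = {
    "Central Europe": (2795, 420),
    "NE Europe": (3113, 420),
    "NW Europe": (3315, 200),
    "Scandinavia": (2850, 420),
    "SE Europe": (3047, 420),
    "SW Europe": (2195, 600),
    "UK and Ireland": (3144, 200),
}

def overlaps(a, b):
    """True if the intervals value +/- margin of a and b intersect."""
    (va, ma), (vb, mb) = a, b
    return abs(va - vb) <= ma + mb

# Collect the pairs of regions whose intervals are fully separated.
separated = [
    (r1, r2)
    for r1, r2 in combinations(regions, 2)
    if not overlaps(regions[r1], regions[r2])
]
print(separated)
```

With these assumed margins, the only separated pairs are SW Europe
against NW Europe and SW Europe against UK and Ireland; every other pair of
regions overlaps.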

In a nutshell, the data are too uncertain to draw any conclusions about the
directions of the movements. The statistics are not there.


Anatole Klyosov
