GENEALOGY-DNA-L Archives
From: Gordon Hamilton
Subject: Re: [DNA] I1a and Distribution of DYS Values
Date: Fri, 23 Jul 2004 06:06:25 -0400
Recent discussions about the frequency of various repeat values at the
different DYS loci for the I1a haplogroup prompt me to try posting the
following to the list again. I have attempted to post this twice in the
past couple of weeks; in neither case did it go through, nor does it
appear in the archives.
In an earlier post
(http://archiver.rootsweb.com/th/read/GENEALOGY-DNA/2004-07/1089227170) I
presented data obtained from the Sorenson (SMGF) database on the
distribution of repeat values for the various DYS loci in a cohort of
haplogroup I1a individuals that were identified by the GU I1a haplotype. It
is obvious from an examination of the data that several of the loci used in
the GU I1a haplotype have, themselves, a fairly high dispersion of repeat
values. Consequently, in using the GU I1a haplotype for this analysis one
misses a considerable fraction of I1a individuals in the SMGF database.
The previous analysis identified 6 loci where one specific repeat value
represented more than 95% of the total for that locus in I1a individuals.
To include a higher percentage of I1a individuals in an analysis similar
to the earlier one, a new minimal I1a haplotype was defined using
these 6 loci. This minimal haplotype (referred to as the GH I1a haplotype)
is DYS # (repeat value): 392 (11); 426 (11); 438 (10); 454 (11); 455
(8); GGAAT1B07 (11).
When the SMGF database (http://smgf.org:8081/pubgen/site28.jsp) was searched
using this GH I1a haplotype, a total of 906 exact 6/6 matches were found in
a database containing 8,735 genotypes (as of 12 July 2004). The
distributions of repeat values at each of the other SMGF loci were then
determined by using this GH I1a haplotype, plugging in various values for
the repeats at each locus, and recording the number of exact 7/7 matches.
As indicated by the results given below, the distributions do not change
much (on a percentage basis) from those obtained with the GU I1a haplotype.
This gives one confidence that virtually all of the records being examined
in this analysis are in fact records of I1a individuals. Since, using the
GH I1a haplotype, one gains access to 906 such records versus only 475 with
the GU I1a haplotype, it is clear that a much higher percentage of I1a
individuals in the SMGF database is being sampled with the GH I1a haplotype.
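The tallying procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SMGF site was queried by hand, one 7/7 search per candidate value, whereas the sketch filters a small in-memory list in a single pass. The record layout, the locus key names, and the sample records are all made up for the example.

```python
from collections import Counter

# GH I1a haplotype as defined in the post; the key spellings are hypothetical.
GH_I1A = {"DYS392": 11, "DYS426": 11, "DYS438": 10,
          "DYS454": 11, "DYS455": 8, "GGAAT1B07": 11}

def matches(record, haplotype):
    """True if the record has the haplotype's repeat value at every locus."""
    return all(record.get(locus) == value for locus, value in haplotype.items())

def tally_locus(records, haplotype, locus):
    """Count repeat values at `locus` among records that match `haplotype`."""
    return Counter(r[locus] for r in records
                   if locus in r and matches(r, haplotype))

# Made-up records for illustration only (not SMGF data).
records = [
    {**GH_I1A, "DYS390": 22},
    {**GH_I1A, "DYS390": 23},
    {**GH_I1A, "DYS390": 22},
    {**GH_I1A, "DYS438": 11, "DYS390": 24},  # differs at DYS 438, so excluded
]

counts = tally_locus(records, GH_I1A, "DYS390")
for value in sorted(counts):
    print(value, counts[value])
```

Each count corresponds to what the post calls the number of exact 7/7 matches for that repeat value.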
In each of the following lists, the number of repeats at each DYS locus is
given, followed in parentheses by the number of exact 7/7 matches. In a few
cases the counts do not add up to exactly 906, presumably because of some
outliers.
DYS 385a,b: 11,13 (2); 11,14 (3); 12,12 (1); 12,13 (3); 12,14 (27); 12,15
(1); 13,13 (58); 13,14 (370); 13,15 (64); 13,16 (21); 13,17 (2); 14,14
(219); 14,15 (102); 14,16 (4); 14,17 (1); 15,15 (21); 15,16 (4); 15,17 (1).
DYS 388: 12 (4); 13 (14); 14 (812); 15 (37); 16 (39).
DYS 389i: 11 (9); 12 (817); 13 (74); 14 (6).
DYS 389i,ii: 11,26 (2); 11,27 (5); 11,28 (0); 11,29 (2); 12,26 (1); 12,27
(17); 12,28 (673); 12,29 (114); 12,30 (11); 12,31 (1); 13,28 (0); 13,29
(56); 13,30 (16); 13,31 (2); 14,30 (5); 14,31 (1).
DYS 389ii - 389i (calculated from above): 14 (1); 15 (19); 16 (739); 17
(131); 18 (15); 19 (1).
DYS 390: 21 (8); 22 (556); 23 (311); 24 (30); 25 (1).
DYS 391: 8 (0); 9 (13); 10 (807); 11 (83); 12 (2).
DYS 393: 11 (1); 12 (19); 13 (797); 14 (76); 15 (13).
DYS 19/394 (defined as FTDNA defines it): 13 (11); 14 (727); 15 (147); 16
(15); 17 (6).
DYS 426: 9 (0); 10 (2); 11 (469); 12 (3); 13 (0).
DYS 437: 15 (45); 16 (838); 17 (23).
DYS 439: 9 (1); 10 (29); 11 (681); 12 (161); 13 (30); 14 (4).
DYS 447: 20 (1); 21 (16); 22 (227); 23 (551); 24 (103); 25 (5); 26 (0).
DYS 448 (defined as FTDNA defines it): 18 (4); 19 (56); 20 (785); 21 (58);
22 (3).
DYS 449: 24 (1); 25 (8); 26 (47); 27 (68); 28 (383); 29 (275); 30 (89); 31
(30); 32 (3); 33 (2).
DYS 458: 13 (5); 14 (108); 15 (575); 16 (178); 17 (34); 18 (6).
DYS 459a,b: 7,8 (1); 7,9 (29); 7,10 (1); 8,8 (22); 8,9 (807); 8,10 (15);
8,11 (0); 9,9 (29); 9,10 (1).
DYS 460: 9 (32); 10 (679); 11 (188); 12 (7).
DYS 461: 9 (3); 10 (126); 11 (702); 12 (73); 13 (2).
DYS 462: 11 (7); 12 (646); 13 (245); 14 (8).
YCAIIa,b: 15,21 (1); 17,21 (2); 18,21 (10); 19,19 (12); 19,20 (15); 19,21
(818); 19,22 (17); 19,23 (2); 20,20 (1); 20,21 (8); 21,21 (19).
Y-GATA-A10: 9 (1); 10 (1); 11 (7); 12 (118); 13 (663); 14 (91); 15 (25).
Y-GATA-C4: 20 (15); 21 (349); 22 (381); 23 (118); 24 (32); 25 (11).
Y-GATA-H4 (defined as SMGF defines it): 9 (1); 10 (96); 11 (750); 12 (55);
13 (4).
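The "DYS 389ii - 389i" line above is derived rather than searched, and the derivation is easy to reproduce. A minimal sketch, using the pair counts copied from the DYS 389i,ii entry (the variable names are just for illustration):

```python
from collections import Counter

# 7/7 match counts for (389i, 389ii) pairs, copied from the list above.
pairs_389 = {
    (11, 26): 2, (11, 27): 5, (11, 28): 0, (11, 29): 2,
    (12, 26): 1, (12, 27): 17, (12, 28): 673, (12, 29): 114,
    (12, 30): 11, (12, 31): 1, (13, 28): 0, (13, 29): 56,
    (13, 30): 16, (13, 31): 2, (14, 30): 5, (14, 31): 1,
}

# Pool the pair counts by the difference 389ii - 389i.
diff_389 = Counter()
for (i, ii), n in pairs_389.items():
    diff_389[ii - i] += n

total = sum(pairs_389.values())

for d in sorted(diff_389):
    print(d, diff_389[d])
print("total", total)
```

The pooled counts agree with the calculated line, and the total comes to 906, the number of 6/6 matches.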
One can also get a measure of how much the repeat values at the DYS
loci in the GH I1a haplotype vary by keeping the repeat value for 5 of the
loci constant at their GH I1a value, changing the value of the sixth, and
counting the exact 6/6 matches. This gives the following:
DYS 392: 9 (0); 10 (4); 11 (906); 12 (23); 13 (5); 14 (0).
DYS 426: 9 (0); 10 (17); 11 (906); 12 (4).
DYS 438: 8 (3); 9 (13); 10 (906); 11 (11); 12 (0).
DYS 454: 9 (2); 10 (8); 11 (906); 12 (15); 13 (0).
GGAAT1B07: 9 (2); 10 (24); 11 (906); 12 (25); 13 (0).
DYS 455: 8 (906); the SMGF database would not return any matches or
mismatches when other values were entered for this locus with the rest of
the GH I1a haplotype.
As indicated above, these distributions are similar to those previously
reported using the GU I1a haplotype. However, these new data obtained using
the GH I1a haplotype encompass almost twice as many I1a records and
probably greater than 80% (as estimated from the small amount of dispersion
in the repeat numbers for the loci used in the GH I1a haplotype) of all the
I1a records in SMGF.
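One way to arrive at a coverage figure of this kind is sketched below. It is an assumption, not necessarily the calculation used for the estimate above: take the share of the modal value at each GH I1a locus (from the 6/6 counts listed earlier), treat the loci as independent, and multiply the shares. DYS 455 is left out because the database returned no counts for other values at that locus.

```python
import math

# 6/6 match counts copied from the post (zero-count values omitted).
gh_counts = {
    "DYS 392":   {10: 4, 11: 906, 12: 23, 13: 5},
    "DYS 426":   {10: 17, 11: 906, 12: 4},
    "DYS 438":   {8: 3, 9: 13, 10: 906, 11: 11},
    "DYS 454":   {9: 2, 10: 8, 11: 906, 12: 15},
    "GGAAT1B07": {9: 2, 10: 24, 11: 906, 12: 25},
}

# Share of the most common repeat value at each locus.
modal_share = {locus: max(c.values()) / sum(c.values())
               for locus, c in gh_counts.items()}

# Rough coverage estimate, assuming the loci vary independently.
coverage = math.prod(modal_share.values())

for locus, share in modal_share.items():
    print(f"{locus}: {share:.1%}")
print(f"estimated coverage: {coverage:.1%}")
```

Under these assumptions the product comes out at about 0.84, which is in line with the "greater than 80%" figure.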
As others have pointed out and as was confirmed in the previous analysis
using the GU I1a haplotype, 8 repeats at the DYS 455 locus is
characteristic of the I1a haplogroup. Repeat values of 19,21 at YCAIIa,b
have frequently been mentioned as characteristic of the I1a haplogroup but,
as the foregoing results indicate, only about 90% of I1a individuals have those
repeat values. Repeat values of 14 or 15 at DYS 388 and 10 at DYS 391 have
also been mentioned as characteristic of I1a, but again these represent
only about 90% of the total at those loci. The foregoing analysis suggests
that better loci to focus on as characteristic of the I1a haplogroup would
be those used to define the GH I1a haplotype. With the exception of
GGAAT1B07 (where the repeat value of 11 is present only about 95% of the
time), the other five loci of the GH I1a haplotype show a single repeat
value about 97% of the time or more. These are: DYS 392 = 11, DYS 426 = 11,
DYS 438 = 10, DYS 454 = 11, and DYS 455 = 8. A haplotype with these five
values at their respective loci would then almost certainly be I1a.
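As a rough illustration of how such a five-locus check might be applied to a test result, here is a minimal sketch. It takes the fifth locus to be DYS 455 with 8 repeats, as in the GH I1a definition earlier in this post; the function name, the key spellings, and the sample results are hypothetical.

```python
# Five-locus signature from the post; key spellings are an assumption.
I1A_SIGNATURE = {"DYS392": 11, "DYS426": 11, "DYS438": 10,
                 "DYS454": 11, "DYS455": 8}

def looks_like_i1a(haplotype):
    """True only if every signature locus is present with the expected value."""
    return all(haplotype.get(locus) == value
               for locus, value in I1A_SIGNATURE.items())

# Made-up results for illustration only.
sample_a = {"DYS392": 11, "DYS426": 11, "DYS438": 10,
            "DYS454": 11, "DYS455": 8, "DYS390": 22}
sample_b = {"DYS392": 13, "DYS426": 12, "DYS438": 12,
            "DYS454": 11, "DYS455": 11, "DYS390": 24}

print(looks_like_i1a(sample_a))
print(looks_like_i1a(sample_b))
```

A result missing any of the five loci is treated as a non-match, since the check cannot be completed. A non-match does not rule out I1a either, given the few percent of variation seen at each locus.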
Gordon Hamilton