GENEALOGY-DNA-L Archives
From: "Alister John Marsh" <>
Subject: Re: [DNA] Comparison of Chinese, Yoruba,Watson and Venter genome y-snps
Date: Sun, 5 Apr 2009 19:09:57 +1200
References: <200904050603.n3563KVx008666@mail.rootsweb.com>
In-Reply-To: <200904050603.n3563KVx008666@mail.rootsweb.com>
Tim,
Thanks for taking the time to give this breakdown. Very interesting.
Do you think we have got enough information yet from these SNP distributions
to hazard a refined assessment of the age of any R1b subclade nodes? At
least in a proportional sense since Y-DNA Adam.
Of the SNPs which you seem to think were reliably sequenced, approximately
one SNP per 4.5 generations seems to be some sort of measuring stick. How
long ago would that place the R haplogroup Venter and HUGO tree node?
It is puzzling that Watson, across different regions, consistently has far
fewer SNPs than Venter relative to HUGO, when Watson and Venter are
apparently in the same R1b subclade downstream of HUGO. Perhaps for most
regions it is safer to compare Chinese and Yoruba to Venter, rather than to
Watson?
John.
-----Original Message-----
From:
[mailto:] On Behalf Of Tim Janzen
Sent: Sunday, April 05, 2009 6:04 PM
To:
Subject: Re: [DNA] Comparison of Chinese, Yoruba,Watson and Venter genome
y-snps
Dear James and others,
I decided to compare the Watson, Venter, Chinese, and Yoruba Y
chromosome SNPs to the HUGO Reference Sequence by the number of mutations
in each section of the Y chromosome. I also categorized the 438 Y SNPs
that the Chinese and the Yoruba sequences share by their position on the Y
chromosome. This information is found in the last column under the heading
"Ch. & Yo. shared". Below is a summary of this information:
Number of Y SNPs per sequence, by base pair position (in millions)

Position     Watson   Venter   Chinese   Yoruba   Ch. & Yo. shared
2-3               6      101        15       24                  7
3-4             171     1787        59       82                 13
4-5              34     2152        28       40                  4
5-6              99     1578        37       38                  3
6-7              37      968        41       39                  8
7-8               4       21        65       83                 29
8-9               6       19        71      105                 35
9-10              0       14        19       16                  4
10-11           277      209       151      229                 31
11-12           258      159       214      108                 17
12-13            38      213        87      128                 28
13-14            15       75        65       81                 25
14-15             6       21        63       81                 20
15-16             4       23        55       99                 30
16-17             8       35        79       90                 34
17-18            10       26        75       96                 35
18-19             0       56        22        2                  1
19-20            13       53        61       62                 22
20-21            14      604       155      114                 32
21-22             8       32        64       90                 25
22-23             3       95        45       78                 18
23-24             0      127         1        0                  0
24-25             0       87         0        0                  0
25-26             0       11         0        1                  0
26-27             9       37         7        9                  1
27-27.3          42       31        25       39                 12
28-57             0 for all (apparently not sequenced since it is likely
                  heterochromatin)
57-58           684       97        29       10                  4
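As a cross-check on the table above, here is a minimal Python sketch that
totals each column. The row values are transcribed from the table; the names
ROWS, COLUMNS, and column_totals are illustrative only and are not part of
the original analysis. The 28-57 million row is left out since it is zero
for every sequence. The shared column should add up to the 438 SNPs
mentioned above.

```python
# Rows transcribed from the table: (position in millions of base pairs,
# Watson, Venter, Chinese, Yoruba, Chinese-and-Yoruba shared).
# The 28-57 million row is omitted because every count there is zero.
ROWS = [
    ("2-3", 6, 101, 15, 24, 7),
    ("3-4", 171, 1787, 59, 82, 13),
    ("4-5", 34, 2152, 28, 40, 4),
    ("5-6", 99, 1578, 37, 38, 3),
    ("6-7", 37, 968, 41, 39, 8),
    ("7-8", 4, 21, 65, 83, 29),
    ("8-9", 6, 19, 71, 105, 35),
    ("9-10", 0, 14, 19, 16, 4),
    ("10-11", 277, 209, 151, 229, 31),
    ("11-12", 258, 159, 214, 108, 17),
    ("12-13", 38, 213, 87, 128, 28),
    ("13-14", 15, 75, 65, 81, 25),
    ("14-15", 6, 21, 63, 81, 20),
    ("15-16", 4, 23, 55, 99, 30),
    ("16-17", 8, 35, 79, 90, 34),
    ("17-18", 10, 26, 75, 96, 35),
    ("18-19", 0, 56, 22, 2, 1),
    ("19-20", 13, 53, 61, 62, 22),
    ("20-21", 14, 604, 155, 114, 32),
    ("21-22", 8, 32, 64, 90, 25),
    ("22-23", 3, 95, 45, 78, 18),
    ("23-24", 0, 127, 1, 0, 0),
    ("24-25", 0, 87, 0, 0, 0),
    ("25-26", 0, 11, 0, 1, 0),
    ("26-27", 9, 37, 7, 9, 1),
    ("27-27.3", 42, 31, 25, 39, 12),
    ("57-58", 684, 97, 29, 10, 4),
]
COLUMNS = ("Watson", "Venter", "Chinese", "Yoruba", "Shared")


def column_totals(rows):
    """Sum each count column across all position rows."""
    totals = dict.fromkeys(COLUMNS, 0)
    for _position, *counts in rows:
        for name, count in zip(COLUMNS, counts):
            totals[name] += count
    return totals


totals = column_totals(ROWS)
for name in COLUMNS:
    print(f"{name:8s} {totals[name]:6d}")
```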
There are a number of points that can be drawn from the data above:
1. Most of the errors in the Y SNP data for Watson are likely to be in the
3-4 million, 10-12 million, and 57-58 million positions on the Y chromosome.
2. Most of the errors in the Y SNP data for Venter are likely to be in the
3-7 million positions on the Y chromosome. This region is predominantly an
X-transposed region. This would suggest that the Y SNPs in the Venter
sequence in the 3-7 million position region may have been incorrectly
compared to an X chromosome sequence. At the very least the Venter Y
chromosome sequence was seemingly done in a sloppy fashion, which appears to
have resulted in about 10 times as many reported Y SNPs as there likely are
in reality.
3. If we assume that the Chinese and Yoruba male sequences are generally
much more accurate than the Watson and Venter sequences, then it seems
reasonable to assume that the regions where they show a lot of SNPs will
also show the most SNPs when other people's DNA is sequenced. Y
SNPs seem to be distributed relatively uniformly between positions 7 million
and 23 million in general with some relative dead zones between positions
9-10 million, 18-19 million, and 23-27 million. The dead zone between
positions 18-19 million can be readily explained by the fact that this
section of the Y is where the P5 palindrome is located. The dead zone
between positions 23-27 million can be readily explained by the fact that
this section of the Y is where the P1 palindrome is located. Most of the
438 SNPs shared by the Yoruba male and the Chinese male that don't yet have
rs numbers assigned to them are between positions 10.5 million and 12.4
million.
4. For many of the most stable sections of the Y chromosome we see that the
number of Y SNPs in the Yoruba male is about 25% more than the number of Y
SNPs in the Chinese male. This is reasonable since the R-E node (probably
M168) is probably about 15,000 years further back in time than the R-0 node
(NOP, rs2033003). This is assuming that the M168 SNP occurred ca 55,000
years ago and that the rs2033003 SNP occurred ca 40,000 years ago. Some
exceptions to the general trend that the Yoruba male has about 25% more Y
SNPs than the Chinese male in each million base pair segment are the 11-12
million and 20-21 million sections. Possibly there are some sequencing
errors in the Chinese male for these segments. In any case, the fact that
there are about 25% more Y SNPs in the Yoruba male for most of the 1 million
base pair segments suggests that the quality of the HUGO Reference Sequence,
the Chinese male sequence, and the Yoruba male sequence in these segments is
relatively high.
5. If we exclude the SNPs that are between positions 12933864 and 13714104
that are probably primarily from a male from haplogroup G, this leaves us
with 411 SNPs that should have occurred between the R-L20 (S144) male who
was used for the majority of the HUGO Reference Sequence back to the first
M168 male and then down the Y SNP tree to the haplogroup E-0 node, which may
be the same spot on the Y SNP tree as the first M168 male. If we assume that
the M168 SNP occurred about 55,000 years ago, this would mean that the 411
SNPs would have occurred over a period of about 55,000 years. This would be
about one SNP every 134 years or about one SNP every 4.5 generations,
assuming 30 years per generation. It is also quite possible that a number of
these 411 SNPs are sequencing errors in the HUGO Reference Sequence. The
Wikipedia article at http://en.wikipedia.org/wiki/Shotgun_sequencing says
that most of the HUGO sequence was sequenced at 12X or greater coverage, so
hopefully there are relatively few sequencing errors in the HUGO Y
chromosome sequence.
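The arithmetic in points 4 and 5 can be reproduced with a short Python
sketch. It uses only the figures quoted above (411 SNPs, 55,000 years, 30
years per generation, and a few Chinese and Yoruba counts from the table).
The function and variable names are illustrative assumptions, not part of
the original analysis.

```python
def snp_interval(years_elapsed, snp_count, years_per_generation=30):
    """Return (years per SNP, generations per SNP) for a given time span."""
    years_per_snp = years_elapsed / snp_count
    return years_per_snp, years_per_snp / years_per_generation


def yoruba_excess_percent(chinese_count, yoruba_count):
    """Percentage by which the Yoruba SNP count exceeds the Chinese count."""
    return (yoruba_count / chinese_count - 1) * 100


# Point 5: 411 SNPs over an assumed 55,000 years since the M168 SNP.
years_per_snp, generations_per_snp = snp_interval(55_000, 411)
print(f"about {years_per_snp:.0f} years per SNP")
print(f"about {generations_per_snp:.1f} generations per SNP")

# Point 4: (Chinese, Yoruba) counts for a few segments, copied from the table.
# The 11-12 million segment is one of the exceptions noted in the text.
SEGMENTS = {
    "7-8": (65, 83),
    "14-15": (63, 81),
    "17-18": (75, 96),
    "11-12": (214, 108),
}
for position, (chinese, yoruba) in SEGMENTS.items():
    excess = yoruba_excess_percent(chinese, yoruba)
    print(f"{position} million: Yoruba vs Chinese {excess:+.0f}%")
```

Changing the assumed date of M168 or the generation length in the call to
snp_interval shows how sensitive the per-generation figure is to those
assumptions.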
Sincerely,
Tim Janzen