From: "Tim Janzen" <>
Subject: Re: [DNA] Comparison of Chinese, Yoruba,Watson and Venter genome y-snps
Date: Sat, 4 Apr 2009 10:41:01 -0700
Thanks so much for your work in pulling all of this data together,
in particular for extracting the Y SNPs from the Yoruba male and the Chinese
male. We had some extensive discussions about the Watson and Venter SNPs
and the HUGO reference sequence on this list in Dec 2007 and Jan 2008. I
don't think that any of Venter's sequence is found in the HUGO Reference
Sequence. I tried to summarize some of the key points of that discussion in
59. As I mentioned in that message, the HUGO Reference Sequence is
predominantly from one R-M269 male, with the exception of most of the AZFa
region (positions 12933864-13681909), which comes at least partly from a male
in haplogroup G. You came to that conclusion as well. Also see these
67, among others from the Dec 2007 and Jan 2008 time period.
Here are some other statistics to ponder:
Number of SNPs in the Watson sequence: 1747 per Ron Scott's extraction at
Number of SNPs in the Venter sequence: 6679 heterozygous SNPs and 933
homozygous SNPs per Gareth Henson's file at
Number of SNPs in the Yoruba sequence: 1745
Number of SNPs in the Chinese sequence: 1533
I am not surprised that you found that Watson's sequence is missing
a lot of SNPs that it should have had. It also appears that there are a lot
of SNPs in the Venter sequence that are likely simply sequencing errors.
Both the Watson and Venter sequences seem to have a lot of errors. See this
paragraph from this message I posted in Jan 2008 at
89: "This whole situation also suggests that a lot of the reputed SNPs in
Venter's and Watson's Y chromosome sequences are actually sequencing errors
and not true SNPs or we would see far more than 132 SNPs present between the
contemporary HUGO male and the Watson/Venter/HUGO MRCA or between the
Watson/Venter MRCA and the Watson/Venter/HUGO MRCA. Watson's SNP file on
Ron Scott's web site shows 1746 SNPs and Venter's SNP file shows 8634 SNPs.
Many of Venter's SNPs in particular should be considered suspect."
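One rough way to probe this is to split the reported SNP positions of two closely related males into shared and private sets; an implausibly large private set would hint at sequencing errors rather than true SNPs. The following is only a minimal sketch in Python. It assumes each SNP file has already been reduced to a set of integer positions, and the function name and the toy positions are hypothetical, not taken from the actual Watson or Venter files.

```python
def partition_snps(a, b):
    """Split two collections of SNP positions into shared and private sets."""
    a, b = set(a), set(b)
    return {
        "shared": a & b,   # reported in both samples
        "only_a": a - b,   # private to the first sample
        "only_b": b - a,   # private to the second sample
    }

# Toy positions, purely illustrative (NOT real Watson or Venter data).
watson = {100, 200, 300, 400}
venter = {200, 300, 500, 600, 700}

parts = partition_snps(watson, venter)
for name, positions in parts.items():
    print(name, len(positions), sorted(positions))
```

The private sets would still need checking against known haplogroup-defining SNPs before any of them could be called errors.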
At some point in the near future the genetic genealogy community
will need to decide on a single person's Y chromosome sequence to have as
the reference standard. The HUGO sequence as it currently stands is
unacceptable because it comes from at least two people. The section that
comes from a G2a3b1a (S131) male makes it inappropriate for use in the long
run, since the SNPs between positions 12933864 and 13714104 will always
cause confusion for people.
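Until that is sorted out, anyone working with a list of Y SNP positions could at least flag those that fall in the affected stretch. Here is a small Python sketch of that check, using the boundaries 12933864 and 13714104 quoted above (slightly different boundaries are quoted elsewhere in this thread). The function names and the example positions are hypothetical.

```python
# Boundaries quoted in this message for the stretch attributed to the
# haplogroup G male; the thread also quotes slightly different values.
G_SEGMENT_START = 12933864
G_SEGMENT_END = 13714104

def in_g_segment(position, start=G_SEGMENT_START, end=G_SEGMENT_END):
    """Return True if the position lies inside the stretch (inclusive)."""
    return start <= position <= end

def split_by_segment(positions):
    """Return (inside, outside) sorted lists of positions."""
    inside = sorted(p for p in positions if in_g_segment(p))
    outside = sorted(p for p in positions if not in_g_segment(p))
    return inside, outside

# Hypothetical positions chosen only to exercise the boundaries.
example_positions = [2700000, 12933864, 13000000, 13714104, 13714105]
inside, outside = split_by_segment(example_positions)
print("inside:", inside)
print("outside:", outside)
```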
Ideally the DNA sample from the R-L20 (S144) male who was used for
the majority of the HUGO reference sequence would be used again at some
point in the future and that sample's Y chromosome would be re-sequenced,
particularly for the section between positions 12933864 and 13714104. If
that person's sample is unavailable, then essentially any R-L20 (S144)
male's sample could be used for sequencing, but preferably one closely
related to the R-L20 (S144) male who was used for the majority of the HUGO
reference sequence.
I would be interested in knowing how you created the list of Y SNPs
for the Chinese male. The Y chromosome sequence for this person is at
http://yh.genomics.org.cn/. Could you outline the steps you took to create
the list of Y SNPs for this person as found in your spreadsheet?
[mailto:] On Behalf Of James Heald
Sent: Friday, April 03, 2009 9:45 AM
Subject: [DNA] Comparison of Chinese, Yoruba, Watson and Venter genome
y-snps
I have uploaded a couple of spreadsheet files,
comparing the y-SNPs reported by the Chinese, Yoruba, Watson and Venter
genome projects to each other, and to the list of SNPs with more-or-less
ascertained positions from ISOGG's haplogroup tree, Adriano
Squecco's 23andMe spreadsheet, and the HapMap data.
* The reference sequence ("Hugo") is a composite from several
individuals, and its genotype varies from base to base. I would very
much like to know if anybody can shed any further light on this,
particularly if any particular stretches are known to all be from the
same individual.
From the data, it appears that Hugo is mostly R1b.
* At least some of Hugo is from R1b1b2a1a2d3a (S144)
* There is also some from G2a3b1a (S131), especially between circa
12942936 and 13714104
* It is possible that there may be other R1b individuals whose data also
went into the composite - eg perhaps Venter himself? Does anyone know
more about this?
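The kind of inference described in the points above can be sketched as grouping the known SNP positions along the reference into contiguous runs that all point to the same haplogroup. This is only an illustrative Python sketch, not the method actually used for the spreadsheets. It assumes each known SNP has already been labelled with the haplogroup whose derived allele the reference shows at that position. The function name and the example calls are hypothetical, apart from the two boundary positions 12942936 and 13714104 mentioned above.

```python
from itertools import groupby

def haplogroup_runs(calls):
    """Collapse (position, label) calls into contiguous runs of one label.

    Returns a list of (label, first_position, last_position, n_calls).
    """
    ordered = sorted(calls)  # sort by position along the chromosome
    runs = []
    for label, group in groupby(ordered, key=lambda call: call[1]):
        members = list(group)
        runs.append((label, members[0][0], members[-1][0], len(members)))
    return runs

# Hypothetical calls: (SNP position, haplogroup indicated by the reference).
calls = [
    (2000000, "R1b"),
    (9000000, "R1b"),
    (12942936, "G"),
    (13500000, "G"),
    (13714104, "G"),
    (15000000, "R1b"),
]

runs = haplogroup_runs(calls)
for label, first, last, count in runs:
    print(label, first, last, count)
```

Runs supported by only one or two calls would be weak evidence, so the count is kept alongside each run.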