GENEALOGY-DNA-L Archives



From: "Tim Janzen"
Subject: Re: [DNA] Chances for Finding Clade-separating SNP
Date: Sat, 5 Jan 2008 23:11:06 -0800
In-Reply-To: <BAY103-DAV33FF4746492F82A6B7E2EC84E0@phx.gbl>


Dear Dick,
Yes, it is true that with the shotgun sequencing technique many
regions of the genome are oversampled. This is covered in slides 25 to 27
of the lecture at http://www.ibi.vu.nl/teaching/a4g/materials/lect9.pdf that
Vince mentioned on Jan. 1. The Wikipedia article at
http://en.wikipedia.org/wiki/Shotgun_sequencing that Vince referred to says
that in the Human Genome Project most of the genome was sequenced at 12x or
greater coverage (that is, 12 or more reads per base). Watson's DNA was
sequenced to an average of 6 reads per base, per
http://jimwatsonsequence.cshl.edu/about.html. The
following is a quote from that web site: "The average coverage is 6 reads
per base, but some regions are covered at greater depth and others at lower.
The scale goes from a coverage of zero to a coverage of over 10." This
would help explain why some of the SNPs were read as many as 30 times. If
the rightmost column in Ron Scott's Excel file of Watson's Y SNPs
on his home page at
http://freepages.genealogy.rootsweb.com/~ncscotts/Y-DNA/Watson%20DNA/watson-454-snp-v01-chrY.xls
is the number of reads, then Watson's DNA was sequenced
with an average of only about 3.66 reads in the regions where those 1746 Y
SNPs were reported: the sum of that column is 6389, and 6389 divided
by 1746 is about 3.66. Of Watson's Y SNPs, 429 were
reported based on only one read. Thus a sequencing error for a portion of
those SNPs is possible. According to the article on the Human Genome Project
Y chromosome sequence by Skaletsky et al. in Nature at
http://www.ncbi.nlm.nih.gov/pubmed/12815422, the authors reported on p. 826
that they had a sequencing error rate of 1 per 100,000 bases. If you read each base
more than once you can locate those sequencing errors. However, if you read
each base only once then you won't be able to tell which SNPs are real and
which are due to sequencing errors.
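
The arithmetic above can be checked with a short Python sketch. It uses only the three figures quoted in this message and assumes, as the message does, that the rightmost column of the spreadsheet really is the read count; the function and variable names are made up for illustration.

```python
# Figures quoted in the message; the meaning of the spreadsheet column
# (number of reads per SNP) is an assumption, not something confirmed.
total_reads = 6389       # sum of the rightmost column
snp_count = 1746         # Y SNPs reported for Watson
single_read_snps = 429   # SNPs reported from exactly one read


def mean_reads_per_snp(total, count):
    """Average number of reads covering each reported SNP."""
    return total / count


def single_read_fraction(single, count):
    """Share of reported SNPs that rest on a single read."""
    return single / count


mean_depth = mean_reads_per_snp(total_reads, snp_count)
single_fraction = single_read_fraction(single_read_snps, snp_count)

print(f"mean reads per SNP: {mean_depth:.2f}")
print(f"single-read SNPs:   {single_fraction:.1%}")
```

By this count roughly a quarter of the reported SNPs rest on a single read, which is the group where a sequencing error cannot be told apart from a real SNP.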
It wouldn't surprise me if there are more sequencing errors with
Venter's Y SNPs than with Watson's SNPs, since Venter has 8634 SNPs and
Watson has "only" 1746. One would think that sequencing both strands of
Venter's Y chromosome would reduce the overall error rate,
since any discrepancies between the sequences obtained for the two strands
could be reviewed and the errors corrected.
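
A rough Python sketch of why a second strand should help, under assumptions that are mine and not taken from any of the sources above: errors on the two strands are independent, every wrong base is equally likely, and the per-read error rate of 0.01 is a purely hypothetical value chosen for illustration.

```python
def same_error_on_both_strands(per_read_error, wrong_bases=3):
    """Chance that two independent reads agree on the same wrong call.

    The first read is wrong with probability per_read_error. The second
    read must also be wrong, and must land on the same one of the
    wrong_bases possible incorrect calls.
    """
    return per_read_error * per_read_error / wrong_bases


per_read_error = 0.01  # hypothetical per-read error rate, illustration only
confirmed_error = same_error_on_both_strands(per_read_error)

print(f"single read wrong:            {per_read_error:.2e}")
print(f"both strands wrong, same way: {confirmed_error:.2e}")
```

Under these assumptions an error that survives comparison of the two strands is far rarer than an error in a single read; any other disagreement shows up as a discrepancy that can be reviewed.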
Sincerely,
Tim

-----Original Message-----
From:
[mailto:] On Behalf Of RICHARD KENYON
Sent: Saturday, January 05, 2008 10:00 PM
To:
Subject: Re: [DNA] Chances for Finding Clade-separating SNP

Isn't it standard procedure to oversample when using any shotgun
sequencing technique? Also it is possible (as I understand it) that some of
the apparent SNPs are instead sequencing errors. Furthermore, all three
genome sequences have been subject to revisions from time to time.
I don't know the real significance of a statement made by Craig Venter,
viz., that both strands of his DNA had been sequenced, while James Watson
had had only one strand sequenced.


