GENEALOGY-DNA-L Archives


From: "Ron" <>
Subject: Re: [DNA] FTDNA admits to errors in many mtDNA sequences
Date: Thu, 4 Feb 2010 10:22:19 -0600


List,

On Thu, 04 Feb 2010 03:40:58 -0600, Thomas, the chief lab specialist at the Houston lab, wrote this:

> It is important that you need to distinguish "erroneous" from reporting
> in non- standard format.
> The sequences didn't have errors (since FTDNA also supplied the correct
> FASTA sequence files), but they were just reported in a way that doesn't
> comply the current nomenclature.

As Ian Logan, Mannis van Oven (the mtDNA specialist at PhyloTree), and any other candid mtDNA researcher knows, the problem at FTDNA appears to be one of not recognizing the magnitude of the erroneous reporting of the micro-STR (CACACACACA ...) that begins at np 514 (CRS), which was corrected in late October 2009. As Thomas and Eileen at FTDNA should know, it is not a nomenclature issue that we customers are discussing and with which a few of us are disappointed. We were never told that the FASTA sequences were corrupt and needed to be changed. The problem, in a nutshell, was disguised, and continues to be, as we see in Thomas' statement above. It is the duplicitous and contradictory language that we see above, the "skirting of the real issue." There was a software problem that went unrecognized for quite a long time, which reported 524.1A 524.2C as 524.1C 524.2A, and that was fixed, as pointed out. What is really important is to see the error that this problem creates in a FASTA file that is generated from a mutation list that is derived from that software output.
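For anyone who wants to see how the swapped insertion order carries through into a sequence, here is a minimal Python sketch. It is not FTDNA's software. It assumes a short reference fragment in which the CA repeat occupies np 514-523 and is followed by CCGCT, and the function name `apply_insertions` is my own, made up for illustration.

```python
# Assumed reference fragment: CA repeat at np 514-523, then CCGCT.
REF_START = 514
REF_FRAGMENT = "CACACACACACCGCT"


def apply_insertions(fragment, start, insertions):
    """Apply insertions written like '524.1A' to a reference fragment.

    '524.1A' means: the first base inserted after np 524 is A.
    """
    by_pos = {}
    for item in insertions:
        pos_part, rest = item.split(".")
        order, base = int(rest[:-1]), rest[-1]
        by_pos.setdefault(int(pos_part), []).append((order, base))

    out = []
    for offset, ref_base in enumerate(fragment):
        out.append(ref_base)
        # Append any inserted bases that follow this reference position.
        for _, base in sorted(by_pos.get(start + offset, [])):
            out.append(base)
    return "".join(out)


as_it_should_be = apply_insertions(REF_FRAGMENT, REF_START, ["524.1A", "524.2C"])
as_reported = apply_insertions(REF_FRAGMENT, REF_START, ["524.1C", "524.2A"])

print(as_it_should_be)  # CACACACACACACCGCT  (repeat extended, still contiguous)
print(as_reported)      # CACACACACACCACGCT  (CA lands between the two C's)
```

Under those assumptions, the only difference in the input is the order of the two inserted bases, yet the second result no longer contains one unbroken repeat.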

I'll use a cut-and-paste of my explanation to Bill Hurst yesterday of what I understand to be the issue: "All of these FASTA sequences [the 38 GenBank submissions that Ian has reported above] are corrupted. In other words, they contain a CA or CACA insertion in a position where it doesn't belong, i.e., BETWEEN the two C's that follow the CACACACACA repeat [micro-STR] that begins at np 514 in the CRS. All of these 38 sequences (15 belong to hg K, 11 to hg U) are affected, and many more that are in our projects (those with 524.1C 524.2A ...). If you take the first K (K1a10) FASTA file (EF485042.1), open it with WordPad, and do an EDIT/FIND for "CACACACACA", you will notice, not a CACA insertion that would be an extension of the STR that begins at np 514 (as it should be), but, rather, this: CACACACACACCACACGCT. Can you see the CACA insertion sandwiched between two C's? If so, then you will recognize that there is no contiguous repeat of the STR of seven CAs. This is an error! It should read: CACACACACACACACCGCT. With either nomenclature, AC or CA, there are seven repeats, and either is accepted, as Mannis pointed out [in an email to several of us]. With the incorrect stretch, there are only five repeats for either nomenclature." FTDNA recognizes that the repeat should be contiguous, not split up as we can all see in the FASTA file. So, it is, therefore, not simply a matter of "reporting in non-standard format."
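The repeat counts above can be checked without reading the strings by eye. The following is a minimal Python sketch that counts the longest contiguous run of a motif in each of the two stretches quoted above; the helper name `longest_run` is hypothetical, and the two strings are copied from this message rather than pulled from GenBank.

```python
import re


def longest_run(seq, motif):
    """Return the largest number of back-to-back copies of motif in seq."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq)
    return max((len(run) // len(motif) for run in runs), default=0)


in_fasta_file = "CACACACACACCACACGCT"   # stretch found in EF485042.1
should_read = "CACACACACACACACCGCT"     # stretch with a contiguous repeat

for label, seq in (("in FASTA file", in_fasta_file), ("should read", should_read)):
    print(label, "CA:", longest_run(seq, "CA"), "AC:", longest_run(seq, "AC"))
# in FASTA file CA: 5 AC: 5
# should read CA: 7 AC: 7
```

Counting the motif as CA or as AC gives the same answer for each stretch, which is why the choice of nomenclature does not explain the difference.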

My question to Bill was: "What would you think about your DYS393 having an incorrect placement of a nucleotide in the middle of that ySTR, thus giving a reading of 6 repeats instead of 14? Would you ignore it? No, you'd holler at whoever stuck the incorrect nucleotide in there, demanding a change to place it where it belongs." Well, this is essentially the same, at least for those of us who work with FASTA files on a daily basis, because now we have to manually change every FASTA file that contains the erroneous sequence. As many of you know, I keep an archived list of every FTDNA submission to GenBank here: http://freepages.genealogy.rootsweb.ancestry.com/~ncscotts/GG/mt_DNA.htm. All 38 corrupted sequences are in there for one to see, with the FASTA sequences. I use GEN-SNiP to generate a mutation list from the FASTA sequence, so these lists, along with the FASTA files, will need to be edited (not to mention any other FASTA sequence and mutation list with 524.1C 524.2A ....). In addition, any phylogenetic tree generated with those FASTA sequences (I use mtPhyl for this purpose) has to be discarded.

This micro-STR at np 514 is becoming more important, along with other indels in the mtDNA sequence, as indicators of lineage differences, if not, in some cases, as subclade definers. Some researchers ignore them, some find them relevant, and others will eventually see their enormous phylogenetic significance.

Best regards,
Ron Scott
