From: William Hurst <>
Subject: Re: [DNA] mtDNA descriptions- set in stone?
Date: Wed, 23 Jun 2010 14:43:44 -0400
Hi TK and all,
I hope you get multiple answers to your questions, but I'll give them a shot. Your basic understanding of mtDNA seems fine. You don't have to worry about uracil; that's in RNA not DNA. As for the price of an mtDNA full-sequence test, that has come down considerably in the last few years. FTDNA not only reports your differences from the CRS, but it also lists all 16,569 letters in a FASTA file under your Downloads tab on your personal page. Good luck with that. Its main use is when submitting your results to the GenBank database. In mtDNA, 16519C, for example, tells you just about everything you need to know. In comparison, Y-DNA STRs are reported as simple numbers such as "13" which may represent 52 DNA letters in some hard-to-determine location.
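To make the "differences from the CRS" idea concrete, here is a minimal Python sketch that compares a sample against a reference and reports substitutions in the position-plus-letter form used above (as in 16519C). The function name and the ten-letter toy sequences are made up for illustration; the sketch assumes both sequences have the same length, so it does not handle insertions or deletions.

```python
def substitutions(reference: str, sample: str) -> list[str]:
    """List substitutions as '<position><sample base>', counting positions from 1.

    Assumes the two sequences are already the same length (no indels).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences differ in length; indels are not handled here")
    pairs = zip(reference.upper(), sample.upper())
    return [f"{pos}{s}" for pos, (r, s) in enumerate(pairs, start=1) if r != s]


# Toy data only: these are NOT real rCRS or sample sequences.
toy_reference = "GATCACAGGT"
toy_sample = "GATCGCAGGC"
print(substitutions(toy_reference, toy_sample))  # ['5G', '10C']
```

A real comparison would first need an alignment step, because a single insertion or deletion shifts every later position.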
Besides the three types of mutations you list, there are also categories such as heteroplasmies, homoplasic or recurrent mutations, transitions or transversions, and synonymous or non-synonymous mutations. Most people don't have to worry about these.
Now to your numbered questions:
1. The CRS (Cambridge Reference Sequence) was first published by Anderson in 1981. See http://www.cs.uni-duesseldorf.de/AG/BI/lehre/Dokumente/algorithmen/Anderson-etal-1981-Nature.pdf
Andrews published a revised CRS (rCRS) in 1999, with a few corrections. See http://www.nature.com/ng/journal/v23/n2/pubmed/ng1099_147.html and http://www.ncbi.nlm.nih.gov/nuccore/251831106?report=genbank
The CRS is "right" because it is accepted, just as a mile is an accepted unit of measure. A "better" measure might be to list differences from "mitochondrial Eve," but her DNA sample wasn't in the refrigerator in Cambridge in 1981 - and still isn't. There is a Yoruba Reference Sequence, but use of that would only confuse people. Any change in the reference now would make everybody's life more difficult.
2. I don't think you have to worry about some lab suddenly discovering a new nucleotide between 432 and 433 that would require renumbering the CRS. FTDNA alone has done over 10,000 FGS tests; I think they would have noticed. That differs from Y-DNA where new SNPs are being found all the time, but those SNPs are not numbered sequentially.
3. Multiple insertions are common, especially after positions 309 and 573. After position 523 there are often one to four pairs of CA insertions. If you see "522-, 523-" that's a deletion of a pair at the same place, where the CRS has five CA pairs. These are similar to Y-DNA STRs. Think of the locus as 523 with the CRS as allele 5, one pair of deletions as allele 4, one pair of insertions as allele 6, etc. There is also a "nine-base-pair deletion" in the coding region and a "six-base-pair deletion" in HVR2 (see my K Project under K1c2). In the same area where the nine-base-pair deletion is found, we have one K person with an extra set of the nine bases called the "nine-base-pair triplication." (By the way, there is a quiz on this later.)
I don't know of a case where there are two or more random insertions after the same position, but knowing mtDNA, I wouldn't bet against it. But if the insertions are in non-coding (no genes) sections of the mtDNA, that wouldn't affect the viability.
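For the database side, one possible way to store such differences is sketched below in Python. It is only an illustration under stated assumptions: the `Difference` record, `parse_difference` and `ca_allele` are hypothetical names, insertions are assumed to be written in dotted form such as 309.1C and 309.2C (first and second base inserted after position 309), and CA insertions are assumed to be recorded after position 523. Deletions follow the "522-" form quoted above.

```python
import re
from dataclasses import dataclass

# Assumed token forms: "16519C" (substitution), "522-" (deletion),
# "309.1C" (first base inserted after position 309).
TOKEN = re.compile(r"^(\d+)(?:\.(\d+))?([ACGT-])$")


@dataclass(frozen=True)
class Difference:
    position: int  # position in the reference numbering
    ordinal: int   # 0 = the position itself; 1, 2, ... = inserted bases after it
    base: str      # 'A', 'C', 'G', 'T', or '-' for a deletion


def parse_difference(token: str) -> Difference:
    """Turn one reported difference into a (position, ordinal, base) record."""
    m = TOKEN.match(token.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised difference: {token!r}")
    position, ordinal, base = int(m.group(1)), int(m.group(2) or 0), m.group(3)
    if ordinal and base == "-":
        raise ValueError(f"an inserted base cannot be a deletion: {token!r}")
    return Difference(position, ordinal, base)


def ca_allele(differences, crs_pairs: int = 5) -> int:
    """Count CA pairs at the 523 locus, treating the CRS as allele 5."""
    inserted = sum(1 for d in differences if d.position == 523 and d.ordinal > 0)
    deleted = {d.position for d in differences
               if d.position in (522, 523) and d.base == "-"}
    return crs_pairs + inserted // 2 - (1 if deleted == {522, 523} else 0)


sample = [parse_difference(t) for t in ["16519C", "522-", "523-"]]
print(ca_allele(sample))  # 4
```

Because each record is keyed by position plus ordinal, any number of inserted bases after the same position can be stored without renumbering the reference.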
Hope this helps.
> I've been struggling with "what could be done" to help people use
> their DNA profiles to trace their family trees. I'm trying to devise
> a schema for a database.... but to do so, I need to understand the
> things I want to put into it.
> I'm more of a computer expert than a DNA expert... in spite of a
> human genetics course along the way to my B.Sc. at Cornell. But that
> was "a while" ago... Aside from what I've forgotten, there have been
> new discoveries. (And yes, dear niece, if you are reading this:
> Watson and Crick HAD published their paper BEFORE my university
> I understand sundry restraints and opportunities from the computing
> side. What I seek help with is EXACTLY (computers don't cope well
> with "little details") what we know when we look at mtDNA results we
> get from, say, FamilyTreeDNA. And I need help with the terminology
> being used.
> As I understand mtDNA, it is a single strand of about 16,000
> nucleotides... for my purposes "letters", and those letters can
> (happily!) only be a, c, g or t. (In DNA. Yes, I know about uracil,
> but it doesn't come into what we need, does it?)
> And we can (for a price!) get a lab to determine for us the full
> sequence of our personal mtDNA. And the report comes back to us in a
> shorthand which tells us not, directly, all 16,000 letters, but tells
> us where we DIFFER from the CRS.
> Those differences can be substitutions, deletions and insertions.
> If you see errors in any of the above, I'd be delighted if you were
> to explain them for me and the other readers of this thread. No
> offense taken!
> I hope, however, nothing above is wide of the EXACT truth of how it
> all works, and that none of the following questions will be rendered
> moot by errors in my premises. I've numbered my questions to help
> anyone answering just a selection of them...
> 1) Is the CRS "set in stone"? (As far as any such thing ever will
> be!!) Does anyone know how long the current CRS has been "the" CRS?
> If I understand it properly, it is only a point of reference.... no
> one is saying that it is "the right" sequence... it is, isn't it,
> just a "pattern" on which we can base statements like "your mtDNA
> sequence is as the CRS, except you have "a" at position 256"? If
> that's right, it seems unnecessary and unhelpful to make changes from
> time to time? Or would such a simple scenario be too simple?
> 2) Is the system of reporting WHERE an individual's mtDNA differs
> from the CRS pretty stable? Have instances of "re-numbering" the
> sequence, say to allow for deciding that an extra "t" ought to go in
> between the old position 432 and 433, been few, and the most recent
> one a long time ago, with new ones unlikely?
> 3) Have instances of multiple insertions between two "standard" CRS
> nucleotides been discovered? I.e. If the first four letters in the
> sequence are catg, I have no doubt that there could be someone out
> there with ctatg... i.e. someone with a "t" inserted between the
> "standard" c at posn 1 and the "standard" a at posn 2.
> Further in that vein: Have instances been seen, say, of ctcatg... a
> "t" AND a "c" inserted between the "c" at 1 and the "a" at "2"? And
> if not, there's no fundamental reason, is there(?), which would make
> such an odd and unlikely "double insertion" impossible? (I realize
> that a strand with a double insertion at one point has an even lower
> chance of being VIABLE DNA... but it could(?) happen, couldn't it?)
> (For the computer database schema design to be satisfactory, it has
> to be able to cope with anything that COULD happen... not just most
> of the things that will probably happen.)
> Thanks for any guidance you can offer.
> A digression: To illustrate the perils of a programmer's life: I once
> was responsible for maintaining a school's database of its pupils. I
> had built into it the necessary provisions for the fact that a boy
> who came to us as Billy Brown might leave as Billy Jones. What I
> hadn't seen the need for was a way to record more than one date of
> birth for a given child.
> One boy came from a nation our government was hostile towards. He
> flew out of his home country on a first passport.. first dob.. to an
> intermediary country, changed planes, passports and dates of birth,
> and then completed his journey to us! Two dates of birth. Simple when
> the records were 3x5 cards in a box. Not so simple when the records
> went into a computer!
> http://sheepdogsoftware.co.uk TK Boyd's site with
> freeware and shareware for kids, parents, schools... and others.