GENEALOGY-DNA-L Archives

From: David Ewing
Subject: Explaining SNP vs STR
Date: Sun, 02 Jul 2006 17:11:29 -0600


I am group administrator of the Ewing Surname Y-DNA project. I am
working on the seventh in a series of articles on the project that are
being published in the Journal of Clan Ewing. The first six articles are
posted on-line at
http://clanewing.org/DNA_Project/Y-DNA.html#Project_Articles.

The article I am working on will be published in the forthcoming August
issue of the Journal. In this article, I am trying to explain to people
who don't know much about genetic genealogy the significance of the
67-marker STR panel and Deep clade SNP testing. Part of the incentive to
do this is that John McEwan is writing another piece for the same issue
of the Journal on M222+, the R1bSTR19Irish STR haplotype, and the Ui
Neill hypothesis. I could use a little help from the list to tell me
(1) whether what I have written is intelligible and likely to be
understandable to a beginner and (2) whether I have made important mistakes.

What follows is an excerpt from the current draft of my article. This is
much longer than posts on this list tend to be--feel free not to read
it, but please don't yell at me for being so long-winded. Odd notations
like "[1] <#_ftn1>" appear in the text because of what happens to
footnotes when they are converted to plain text. The footnotes appear at
the end of the text in the regular way, though.

David Neal Ewing
Albuquerque

*Ewing Surname Y-DNA Project Article 7*

This is the seventh in a series of articles about the Ewing surname
Y-DNA project. The previous six articles have appeared in the last six
issues of the /Journal of Clan Ewing/. They are also available on-line
at http://www.clanewing.org/Y-DNA.html.

*New Tests Available*

In the last article I reported that Family Tree DNA had begun offering a
59-marker Y-DNA panel and “deep clade SNP testing.” Now, they have
announced that instead of 59 markers, they are in fact offering a
67-marker panel! And I was about to go bug-eyed trying to keep 37
markers straight. Let me explain what the deal is with these new tests.

*Deep clade SNP testing*

/Warning:/ if you are allergic to technical talk or alphabet soup, you
might want to skip this discussion, but you will need to understand some
of this if you want to understand John McEwan’s interesting article in
this issue of the Journal, and I promise to try to keep it light.

“Clade” means close enough to “branch” that I think I’ll let it go at
that. “Deep” just means “the smallest branches we know how to identify
so far.” SNP means “single nucleotide polymorphism.” In the previous
articles all of our discussion has been about STR (short tandem repeat)
testing, which is a completely different thing than SNP testing. STRs
are much more rapidly mutating than SNPs, and can be useful in
genealogy. Mutations at a SNP locus happen so rarely that once they do,
for all practical purposes they create a permanent record. They are more
useful in anthropological or population studies. There may be several
hundred STRs we could use for our testing, but so far, fewer than a
hundred are used, and the largest panel commercially available from one
lab is FtDNA’s 67-marker panel.[1] <#_ftn1> There are potentially
millions of SNPs that could be used for testing, but only a tiny
fraction of these have been identified (again, maybe a hundred or so),
and of those, the only ones that are used are those that have been found
to identify some population of interest.

There are four “letters” in the genetic code, which are more properly
called “nucleotides.” A SNP mutation occurs when a mistake is made in
copying DNA and one nucleotide is stuck in where a different one should
have gone. Since the purpose of the nucleotides is to spell out
directions for making our bodies, mistakes of this kind can be fatal,
and DNA copying is astonishingly accurate. Now, in the sort of genetic
testing we are doing (whether SNP or STR), we are looking at regions of
the DNA that are often called “junk DNA,” because they do not code for
the proteins that make up our bodies and have no known biological
purpose. Even though these regions don’t make any difference to our
survival, they are still copied just as faithfully as the important
areas. It’s just that mistakes in the “junk” don’t kill anybody, so they
can accumulate and be passed on indefinitely. Fatal mistakes in the
important areas don’t get passed on, so far fewer mutations accumulate
in these areas. A commonly used estimate of the SNP mutation rate is
0.00000002 per generation.[2] <#_ftn2> This means that at any specific
nucleotide, we can expect a copying mistake to be made once every
50,000,000 generations.

What!? There haven’t been anywhere near 50,000,000 generations since
human beings first stood up on their hind legs. How could there be /any/
of these mutations? Well, keep in mind that there are /lots/ of
nucleotides—like maybe 60,000,000 in the non-coding region of the
Y-chromosome. So even though there is only one mistake in 50 million
nucleotides copied, we shouldn’t be surprised to find a mistake if we
copy 60 million, as we do in each generation. If you are following me,
you should be saying, “Now, wait a minute. If there is one SNP mutation
every generation, this should be terrific for doing genealogy.” You
would be right, except for the fact that when we check for SNPs, we need
to know where in the chromosome we are looking—checking all 60 million
places with current technology would cost about a bazillion dollars.[3]
<#_ftn3>
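
For readers who like to check the arithmetic, here is a minimal Python sketch of the two figures above. It uses only the rate and the nucleotide count quoted in the text. The last line adds one assumption that is not in the text: that mutations arrive independently, so the number per generation is roughly Poisson distributed.

```python
import math

SNP_RATE = 0.00000002      # mutations per nucleotide per generation (estimate quoted above)
NUCLEOTIDES = 60_000_000   # rough size of the non-coding Y region (figure quoted above)

# How long we wait, on average, for one particular nucleotide to mutate.
generations_per_mutation = 1 / SNP_RATE

# How many mutations we expect somewhere in the region in a single generation.
expected_per_generation = SNP_RATE * NUCLEOTIDES

# Assumption: independent mutations, so the count per generation is roughly Poisson.
p_at_least_one = 1 - math.exp(-expected_per_generation)

print(f"{generations_per_mutation:,.0f} generations per mutation at one nucleotide")
print(f"{expected_per_generation:.1f} expected mutations per generation")
print(f"{p_at_least_one:.0%} chance of at least one new mutation per generation")
```

So "one SNP mutation every generation" is an average: under this simple model, some father-to-son copies would pick up none and others more than one.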

When SNP testing is done, we go looking for some specific mutations that
we already know about, which we think will tell us where the person
being tested falls in the big anthropological family tree. The Y-DNA
haplogroups you may have read about are defined on the basis of SNPs.
All the Ewing men tested so far[4] <#_ftn4> are in haplogroup R1b1, a
branch of the big R1b haplogroup that includes about 80% of all western
Europeans. Now, Family Tree DNA is testing some additional SNPs that
allow us to subdivide R1b1 into “sub-clades.” I got this test done on
myself. Here are the results: M173+ M207+ M222+ M269+ M343+ P25+ M126-
M153- M160- M18- M37- M65- M73- P66- SRY2627-.[5] <#_ftn5> These results
place me in the R1b1c7 subclade. Have a look at John McEwan’s article in
this issue of the Journal to see what the implications of this are.

*The 67-marker panel*

The “markers” we talk about in Y-DNA testing for genealogy are called
either “short tandem repeats (STRs)” or “microsatellites.” These two
terms are synonymous. They refer to sections of the non-coding region of
the Y-chromosome where a series of between two and five nucleotides is
repeated several times, generally on the order of 10 to 30 times. When
the DNA is being copied in preparation for making sperm, sometimes an
error is made and an extra repeat or two is stuck in or left out. Such
errors are also called “mutations.” So the difference between SNPs and
STRs is that when there is a SNP mutation, just one “letter” is changed.
In a STR mutation, a group of three or four “letters” is added or left
out when copying a series in which a short sequence of letters is
repeated a number of times. Any mutation will then be faithfully passed
on to all male offspring of the man who has it until such time as there
is another mutation at the same marker.[6] <#_ftn6> An issue currently
under active discussion is how often STR mutations occur. The most often
quoted estimate is that, on average, any one STR marker can be expected
to undergo a mutation once every 500 generations, for an average
mutation rate of 0.002 or 0.2%. This is 100,000 times faster than the SNP
mutation rate. Recently, a lot of folks are finding that the average
mutation rate of the markers we have been using is a fair amount faster
than that, maybe even two or three times faster.[7] <#_ftn7>

Let’s talk about what this means. For the sake of discussion, let’s
assume that the average STR mutation rate is 0.4% in our family. At any
one marker locus, we would expect to see a mutation in 250 generations.
But if we test 37 markers, we would expect a mutation in 250/37 = 6.76
generations. Now, 6th cousins have the same 5th great grandfather, who
is seven generations removed from each of them following different
lines. This means that 6th cousins are separated by 14 generations, and
that on average, we should expect to see their 37-marker profiles
differing at two markers. If we increase the number of markers tested to
67, we have to redo the math: 250/67 = 3.73. So if a 67-marker panel is
tested, on average we should expect to see a difference at two markers
in the 67-marker profiles of 3rd cousins, who are separated by 8
generations.
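
The same sums can be written as a short Python sketch. The 0.4% rate is the assumed figure from this paragraph, and the function names are made up for illustration. The sketch simply counts expected mutations; it ignores back mutations and the chance of two mutations landing on the same marker.

```python
def generations_per_mutation(rate, markers):
    """Average number of generations before one mutation appears somewhere in the panel."""
    return 1 / (rate * markers)


def expected_differences(rate, markers, generations_apart):
    """Average number of mutations accumulated over the generations separating two men."""
    return rate * markers * generations_apart


RATE = 0.004  # assumed average STR mutation rate per marker per generation

print(round(generations_per_mutation(RATE, 37), 2))   # 6.76
print(round(expected_differences(RATE, 37, 14), 2))   # 2.07, 6th cousins on 37 markers
print(round(generations_per_mutation(RATE, 67), 2))   # 3.73
print(round(expected_differences(RATE, 67, 8), 2))    # 2.14, 3rd cousins on 67 markers
```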

So far, only two Ewing men have been tested on all 67 markers. They are
Chancellor George Ewing and me. To our surprise, we still match on all
67 markers. So what does this mean? George believes himself to be the
6th great grandson of John Ewing of Carnashannagh. I believe myself to
be the 6th great grandson of James Ewing of Inch. We have no
documentation of a relationship between these two men, but they were
close neighbors in Donegal and were almost certainly related. The
difference in their ages suggests that they probably weren’t brothers,
but let’s suppose they were. This would mean that George and I have a
6th great grandfather in common, so are separated by 16 generations.
Based on “averages,” we would expect to find differences at four markers
in the 67-marker panel. The key word here is /average/. FtDNA calculates
that there is a 50% likelihood that two men who have a perfect 67-marker
match have a common male ancestor within the last three generations (so,
six generations of separation—second cousins or closer), and 90%
likelihood that their common male ancestor was within the last five
generations. If we are right about our conventional genealogies, George
and I have a common male ancestor no more recently than eight
generations ago. It is unlikely (something less than 5% probability)
that we should have an exact 67-marker match, but it is not impossible.
By itself, the perfect match between George and me is not enough to
raise serious questions about our conventional genealogies.
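
To get a rough feel for how unlikely such a match is, here is a minimal sketch under a deliberately simple model. It assumes that every marker mutates independently at the same 0.4% rate used for illustration earlier. This is not FtDNA's calculation, which is not described here, and the answer is quite sensitive to the rate assumed.

```python
def prob_exact_match(rate, markers, generations_apart):
    """Chance that no marker mutates in any of the transmissions separating two men."""
    return (1 - rate) ** (markers * generations_apart)


# Assumed inputs: 0.4% per marker per generation, 67 markers, 16 generations of separation.
p = prob_exact_match(0.004, 67, 16)
print(f"{p:.1%}")
```

Under these assumptions the chance comes out at a little over 1%, which is in keeping with the statement that an exact match is unlikely but not impossible.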

There is another factor in this case that really has me scratching my
head, though. This is that George and both of the other descendants of
John Ewing m. Alice Caswell in our project have the mutation DYS 576 =
19, and so do I. The only other man in what I have been calling “the
large group of related Ewings” who has this mutation is RA, whose
immigrant ancestor is not known. The fact that George and his cousins
share a mutation from the ancestral haplotype of the whole group with
me, when added to the perfect 67-marker match between George and me, has
me beginning to wonder if my conventional genealogy is correct. Maybe I
am a closer relative of George than I thought. The project has some
results pending that will bear on this question. One is that GR, another
descendant of James of Inch, has joined the project. If he and I have
significantly different haplotypes, this will really make me think I’ve
hung my hat on the wrong branch of the tree. The other is that we have
results pending on JE, a descendant of one of the brothers of John Ewing
m. Alice Caswell. His result will tell us whether the DYS 576 = 19
mutation occurred in John of Carnashannagh’s son William, or in his
grandson, John m. Alice Caswell. If it happened in William, then I will
have to consider the descendants of several of William’s sons as my
potential ancestors.

A project participant can upgrade from 37 markers to 67 markers for
another $99, but at the present stage of development of the project, we
won’t be able to tell you any more with a 67-marker panel than we can
with a 37-marker panel. We do not recommend the 67-marker panel for most
purposes, though it may be useful for fine tuning some branches of the
family once we have the basic structure worked out.


------------------------------------------------------------------------

[1] <#_ftnref1> “FtDNA” is Family Tree DNA, the lab our project uses.

[2] <#_ftnref2> For a more detailed discussion of these matters, have a
look at Charles Kerchner’s website at
http://www.kerchner.com/dnamutationrates.htm

[3] <#_ftnref3> It cost me $79 to have 15 places checked—do the math.
Actually, I am exaggerating. The mitochondrial DNA test of HVR 1 & 2
involves checking 1143 potential SNP loci, costs only $189 at FtDNA, and
can be found elsewhere for about $100. FtDNA also offers to check the
entire 16,569 nucleotide length of the mitochondrial chromosome for
$895, so maybe we could check all 60,000,000 places for $3.2 million or
so, though no commercial lab offers this service, as far as I know. Prices
are coming down all the time as testing technology improves. Maybe SNP
testing for genealogy will eventually become feasible.
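
For anyone who does want to do the math, here is a minimal sketch that scales each quoted price to 60 million places. It assumes the cost per place checked stays constant, which is a crude assumption, and it uses only the prices and counts quoted in this footnote.

```python
LOCI = 60_000_000  # places to check in the non-coding Y region (figure from the article)

# (price in dollars, number of places checked), as quoted in this footnote
quotes = {
    "deep clade SNP test": (79, 15),
    "mtDNA HVR 1 & 2": (189, 1143),
    "full mtDNA sequence": (895, 16_569),
}

# Straight-line extrapolation: cost per place times the number of places.
estimates = {name: price / n * LOCI for name, (price, n) in quotes.items()}

for name, total in estimates.items():
    print(f"{name}: about ${total:,.0f}")
```

At the deep clade price the total runs to hundreds of millions of dollars; at the full mitochondrial sequence price it is a little over three million.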

[4] <#_ftnref4> Five of the men in our project have paid extra to have
their haplogroup confirmed by SNP testing. The others have not had any
SNP testing, but their STR haplotypes are characteristic of R1b1, so
there is little doubt that they are also R1b1. I am the only
Ewing man who has had the “Deep clade SNP test,” which tests more SNP
loci, but all of the Ewing men in the large related sub-group of Ewings
have STR haplotypes so similar to mine that they are certain also to be
in the R1b1c7 subclade of R1b1. It is possible that this is not so for
the Ewing men “unrelated” to me—specifically Js, JM and DS. Js is
probably R1b, and the other two have been tested positively for R1b1,
but they could be in a different subclade than I. On the other hand,
TD’s haplotype is the most similar to the Ui Neill haplotype of all the
Ewings, and even though his is different enough from mine to say that we
are probably not related in genealogic time, he is almost certainly also
in the R1b1c7 subclade.

[5] <#_ftnref5> Hooboy. The “+” after a mutation name means that I have
that mutation; the “-” after a mutation name means I do not have that
mutation. Because SNP mutations are unique and persistent, each of those
we use has a conclusive meaning. All men in the large R haplogroup have
M207+. In fact, having M207+ is the definition of being in the R
haplogroup. M173+ places one in the R1 subgroup, M343+ in the R1b
sub-subgroup, and P25+ in the R1b1 sub-sub-subgroup. So everyone in R1b1
has all three of these mutations. So far, four different subtypes of
R1b1 have been identified. Three of them are distinguished by three
different additional mutations and the fourth by the fact it doesn’t
have any of the three. M269+ defines the R1b1c sub-sub-sub-subgroup. So
far, eleven different subtypes of R1b1c have been identified (one of
which itself has three subtypes). Ten of these are distinguished by ten
different additional mutations and the eleventh by the fact it has none
of these. M222+ defines the R1b1c7 sub-sub-sub-sub-subgroup (and I think
maybe you are beginning to understand why I defined “deep clade” as I
did above). So my deep clade SNP testing results include M207+, M173+,
M343+, P25+, M269+ and M222+, and this puts me in the R1b1c7 subclade.
The remainder of the SNPs tested did not have a mutation. Two of these
distinguish R1b1a and R1b1b, and the others distinguish R1b1c1 through
R1b1c6 and R1b1c8. The mutations distinguishing R1b1c9, its two
subtypes, and R1b1c10 were not tested, not that this matters too much.
The fact is that if a man tests positive for M222, you can be pretty doggone
sure that he is in subclade R1b1c7 without any additional results. The
reason that all of these SNPs are tested is a matter of economy and
efficiency. If we know a man is in the R haplogroup and we want to pin
down his deep clade, we can set our machine up to look at 19 SNPs and be
sure of getting an answer. We could save a little time and money by just
testing for one SNP, but only if we were lucky and chose the right one
to test for on the first try. If we checked only for M222 and didn’t
find it, we still wouldn’t know what his deep clade was, and we would
have to test more SNPs. It would be prohibitively time consuming,
confusing and expensive to check 60 million SNPs all at once, but
checking 19 all at once is easier than checking for fewer several times.
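
The way a string of results like mine is read can be sketched in a few lines of Python. The table below contains only the six defining mutations named in this footnote, so it follows just the one branch from R out to R1b1c7. The function names are made up for illustration, and this is not how any lab actually reports results.

```python
# Defining SNPs, ordered from the trunk outward, as described in this footnote.
CLADE_PATH = [
    ("M207", "R"),
    ("M173", "R1"),
    ("M343", "R1b"),
    ("P25", "R1b1"),
    ("M269", "R1b1c"),
    ("M222", "R1b1c7"),
]


def parse_results(text):
    """Turn 'M173+ M126-' into {'M173': True, 'M126': False}."""
    return {item[:-1]: item.endswith("+") for item in text.split()}


def deepest_clade(results):
    """Walk outward along the branch and stop at the first SNP that is not positive."""
    clade = None
    for snp, name in CLADE_PATH:
        if not results.get(snp, False):
            break
        clade = name
    return clade


mine = parse_results(
    "M173+ M207+ M222+ M269+ M343+ P25+ M126- M153- M160- "
    "M18- M37- M65- M73- P66- SRY2627-"
)
print(deepest_clade(mine))  # R1b1c7
```

A man who was positive for everything down to M269 but negative for M222 would come out as R1b1c in this sketch, and the other SNPs in the panel would then be needed to place him any further.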

[6] <#_ftnref6> It is vanishingly unlikely for a given nucleotide to
mutate twice, because the rate is 1 mutation in 50 million generations,
which takes something like a billion years. On the other hand, it is
quite common to find an STR locus where there have been several
mutations, and sometimes even “back mutations,” which is what we call it
when a marker mutates and sometime later mutates back to the same number
of repeats it had originally. The relative rapidity of STR mutations
also explains why sometimes even unrelated men will have the same
mutation by coincidence, a phenomenon called "convergent mutation."

[7] <#_ftnref7> Our data in the known descendants of John Ewing of
Carnashannagh reveals a difference at two markers separating 6th
cousins (on average), suggesting that the average mutation rate in this
family is about 0.4%.

