GENEALOGY-DNA-L Archives



From: "Anatole Klyosov" <>
Subject: Re: [DNA] Age of Zhong et al. (2010) R1b-related lineages
Date: Sun, 21 Nov 2010 10:13:52 -0500
References: <mailman.259.1290326443.10826.genealogy-dna@rootsweb.com><3E1211B24AFA4C68968BF184DB608983@anatoldesktop>


>From: "Alister John Marsh" < >
>I can accept that if it is assumed mutation occurrence and survival of
>mutations once they occur are uniformly random over time, you can say that
>certain mathematically derived confidence intervals apply if a fully random
>mutation model applies.
>However, "if" all mutations (strs or snps) say over 1000 years old were not
>only selectively eliminated from the gene pool by random mechanisms, but
>perhaps 50% were eliminated or modified by evolutionary pressures, and
>perhaps 1% were greatly multiplied by selective evolutionary pressures, then
>it seems to me that assessing age of haplogroups by methods assuming a 100%
>random model applies would be flawed.




My response:



Dear John,



I truly appreciate your thoughtful and concerned letter, so different in its tone from those by "dismissers". It also gives me an opportunity to share with you and others some fundamental things about science.



There are two schools of thinkers. One has the motto "I know only that I know nothing". The other pushes ahead with a full understanding that its knowledge is limited, asking "If not me, then who?". Its members surely realize that they paint only a temporary picture of the world and its features; however, they know that the picture itself, albeit imperfect, provides a ground for future modifications, improvements, and better understanding. It is better to move ahead incrementally, stumbling, making mistakes, and correcting them, rather than to sit on a fence repeating that the world is too complicated.



Nothing is complete in science, and nothing ever will be. We know only what we know today, but we move forward by making assumptions, only to realize tomorrow that those assumptions were imperfect or plainly incorrect. Yet tomorrow we will know how to improve them only because we made those imperfect (or wrong) assumptions yesterday, which gave us (and others) a chance to examine and verify them. That is how science develops.



There are many areas in science where the theoretical ground is limited, but it does not preclude decisive (sic!) scientists from going ahead using questionable assumptions. And they often win. Thermodynamics of water solutions is developed for infinitely diluted systems only (this is a simplified description); however, this whole field of experimental science actively studies all kinds of solutions, including rather concentrated ones. And often it works. Yes, sometimes there are deviations from the theory, and scientists just record that there are deviations due to such and such high concentrations. Their peers do not blame them; everybody understands. More than that, many interesting phenomena, theoretically and practically useful, were found by applying "incorrect" approaches and conditions.



Now, back to mutations. First, instead of repeatedly saying - "stop doing it, because I think that your data are incorrect and I think that the confidence intervals should be wider" - why not compare the data obtained with real, actual systems? This itself will tell us whether those confidence intervals were "wide" or "not wide", and whether the data obtained were correct or incorrect. If they were correct - fine, the approach is proven to be good. If they are shifted - good, now we can figure out what was wrong. It is a win-win situation. Why such an urge to dismiss, based only on perception, not on actual data?



Now, let's move to data. All my considerations and calculations of dozens and dozens, if not hundreds, of haplotype datasets showed me that mutations in the haplotypes we employ are largely (or only) random, provided those datasets are handled properly. To handle them properly is to separate datasets into branches of haplotypes (DNA-lineages), consider recLOH mutations, correct data for back mutations, consider (sometimes) symmetry/asymmetry of mutations, etc. Yes, new tricks will be discovered that will improve the field further. This is science. Papers have been published showing that mutations in haplotypes can be considered as a biological clock ticking randomly for at least the last 2 million years. And, nevertheless, many folks are sitting on the fence repeating that we know too little, that mutations are probably not random, that there are tooooo many deviations, and - "stop doing it, you must be wrong". Sounds familiar, doesn't it?



Data, one of many, many examples. For the last several years I have been calculating the TMRCA for the R1a haplotypes on the post-Soviet territory. They include the Russian, Ukrainian, Belarusian, Lithuanian, Latvian, Estonian, etc. haplotypes, including very few Central Asian (Kyrgyz, Kazakh) haplotypes and very few haplotypes from the Caucasus, all R1a1. I started at the end of 2007 with as few as 26 haplotypes in the 25-marker format, and ended (so far) this November with 148 haplotypes in the 67-marker format. The datasets also included 255 haplotypes of ethnic Russians in the 17-marker format, published by Roewer et al. (2008). In other words, the series included different populations, different numbers of haplotypes, and the haplotypes were in the 17, 25, 37 and 67 marker formats. Data were published in 2008-2010, and some of them were published in J. Genet. Geneal. (2009).



Specifically, the data were published in June 2008, November 2008, January 2009, February 2009, March 2009, June 2009, and November 2010. They contained, respectively, 26, 44, 58, 255 (17 marker), 98, 110, and 148 haplotypes. The total numbers of mutations (from the base haplotype, which was the same in all these series, only in 17, 25, 37 and 67 marker formats) were 178, 326, 423, 1320, 711, and 804 for the first six datasets, and 1037 (25 markers), 2023 (37 markers) and 2748 (67 markers) for the 148-haplotype dataset. The respective TMRCA were: 4400, 4825, 4725, 4475, 4700, 4750, 4500, 4475, 4575 years.



The average TMRCA is 4600+/-150 years. This is plus-minus 3.3%. Nevertheless, in each of the described cases I put the margins of error at around 500-550 years. These confidence intervals were calculated following certain rules for random statistics. As you see, the random statistics are actually confirmed in the above examples.
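
The average and scatter quoted above can be checked directly from the nine TMRCA values listed earlier. The short Python sketch below is only an illustration of that arithmetic; it assumes the "+/-150" is a sample standard deviation (n - 1 in the denominator), which the post does not state explicitly, and the variable names are made up for the example.

```python
import statistics

# TMRCA estimates (years) as listed in the post, in the same order.
tmrca_years = [4400, 4825, 4725, 4475, 4700, 4750, 4500, 4475, 4575]

mean_tmrca = statistics.mean(tmrca_years)

# Assumption: the quoted scatter is the sample standard deviation.
sd_tmrca = statistics.stdev(tmrca_years)

# Scatter relative to the mean, in percent.
relative_percent = 100 * sd_tmrca / mean_tmrca

print(f"mean = {mean_tmrca:.0f} years")
print(f"sample standard deviation = {sd_tmrca:.0f} years")
print(f"relative scatter = {relative_percent:.1f}%")
```

Under that assumption the result rounds to about 4600 years, 150 years and 3.3%, in line with the figures given in the post.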



And now I am coming back to the point you raised. You BELIEVE that the mutations should not be random, or not quite random, or it is complicated, or whatever. You cite examples with two haplotypes or a very limited number of haplotypes. Of course, in those cases the margin of error will be huge. When you compare only two haplotypes, it is typically between 50% and 100%. However, I work with dozens and often hundreds of extended haplotypes in a series, and this represents a different situation.
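
One way to see why the number of mutations matters so much is a simple counting argument. The Python sketch below assumes that the mutation count behaves like a Poisson variable, so that its relative statistical uncertainty is roughly 1/sqrt(N). This is an illustrative assumption, not the procedure used for the margins of error quoted in this post, and it ignores any uncertainty in the mutation rate itself. The function name is hypothetical.

```python
import math

def poisson_relative_error(n_mutations: int) -> float:
    """Relative uncertainty 1/sqrt(N) of a count of N events,
    assuming (for illustration only) Poisson statistics."""
    if n_mutations <= 0:
        raise ValueError("n_mutations must be positive")
    return 1.0 / math.sqrt(n_mutations)

# A pair of haplotypes may differ by only a few mutations, while the
# datasets in this post contain from 178 to 2748 mutations.
for n in (1, 4, 178, 2748):
    print(f"{n:5d} mutations -> {100 * poisson_relative_error(n):.1f}%")
```

With one to four mutations the counting uncertainty alone is between 100% and 50%, while with hundreds or thousands of mutations it drops to a few percent.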



Last but not least. You write: "Would I be wise if I was a gambler to rely on those odds, and bet my house on the chances?". This rhetorical question is misplaced. It is not applicable in science, for the reasons I have explained above. I would not bet my house on the numbers given above, and I should not. Because I know that tomorrow these numbers will be modified, to the benefit of science in general and DNA genealogy in particular. And I would be happy to see it, because I - and others - will learn more. It is not "betting", it is continuously gaining. This is the principal pleasure of being a scientist, not a gambler.



Regards,



Anatole Klyosov





