
From: "Ken Nordtvedt" <>
Subject: [yDNAhgI] MRCA Ages by SNP Counts
Date: Fri, 4 Apr 2008 15:23:59 -0600

A real "coming of age" part of the new YCC paper was its introduction of SNP counts in Y Tree branches to estimate relative ages for the various nodes (MRCAs) in the Tree. I have taken the data presented in the paper and set up the calculus for the most likely age estimate for the MRCA of haplogroups I and R, to see how it compares to the paper's approach. This is presented in PowerPoint slide SNPage at my website http://knordtvedt.home.bresnan.net. For this simple case I get very close agreement with the paper's result.

The foundation for such age estimates is having confidence that all the SNPs have been found in some fixed portion of the y chromosome, and that they can be assigned to the various branch segments of a region of the Y Tree. The paper states that 56 SNPs were found (by someone) on the E branch line from the MRCA for E, I, and R to the present; that 20 SNPs were found from that E,I,R MRCA to the I,R MRCA; that 40 SNPs were found on the I branch line from the I,R MRCA to present; and that 48 SNPs were found on the R branch line from the I,R MRCA to present. Note that the total SNPs to the three present-day haplotypes disagree (56 for E, 20 + 40 = 60 for I, and 20 + 48 = 68 for R); that would be due to the statistical fluctuations of the probabilistic SNP occurrences if the procedure is otherwise sound.

I next write down the probability of this set of SNP counts happening. M is the total mutation rate of SNP sites in the fixed portion of the y chromosome. G is the total number of generations from the E,I,R MRCA to present. FG is the number of generations from the E,I,R MRCA to the I,R MRCA. We want to find F. The most likely Tree for producing the observed SNP counts is found by setting the partial derivatives of the probability expression with respect to both variables, F and GM, equal to zero. The two resulting equations can then be solved for F and GM (F = 0.304 and GM = 61). If the total time back to the E,I,R MRCA is assumed to be 70,000 years as in the paper, then the time back to the I,R MRCA (presumably in haplogroup F) is (1 - F) x 70,000, or about 48,700 years ago. Finding that GM = 61 means that in the absence of statistical fluctuations, each of the three branch lines should have had 61 SNPs on it, instead of the observed 56, 60, and 68 SNPs.
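The calculation described above can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the post does not spell out the probability expression, so the sketch assumes each branch segment's SNP count is an independent Poisson draw with mean equal to the mutation rate times the segment length (GM for the E line, F x GM for the shared segment, and (1 - F) x GM for each of the I and R lines). The function names are hypothetical.

```python
import math

# SNP counts quoted from the paper in the post above.
N_E = 56       # E line: E,I,R MRCA to present
N_SHARED = 20  # E,I,R MRCA down to the I,R MRCA
N_I = 40       # I line: I,R MRCA to present
N_R = 48       # R line: I,R MRCA to present


def log_likelihood(f, gm):
    """Log-likelihood of the four counts (constant factorial terms dropped).

    ASSUMPTION: independent Poisson counts on each branch segment.
    """
    counts = [N_E, N_SHARED, N_I, N_R]
    means = [gm, f * gm, (1.0 - f) * gm, (1.0 - f) * gm]
    return sum(n * math.log(mu) - mu for n, mu in zip(counts, means))


def solve_mle():
    """Set both partial derivatives to zero and solve for F and GM.

    d/dGM = 0 gives GM = total / (3 - F).
    d/dF  = 0 gives N_SHARED/F - (N_I + N_R)/(1 - F) + GM = 0.
    Eliminating GM leaves a quadratic a*F^2 + b*F + c = 0.
    """
    total = N_E + N_SHARED + N_I + N_R
    lower = N_I + N_R
    a = N_SHARED + lower - total
    b = -4 * N_SHARED - 3 * lower + total
    c = 3 * N_SHARED
    disc = math.sqrt(b * b - 4 * a * c)
    roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    # Only a root strictly between 0 and 1 is a valid fraction of the tree depth.
    f = next(r for r in roots if 0.0 < r < 1.0)
    gm = total / (3.0 - f)
    return f, gm


def ir_mrca_age(f, total_years=70_000):
    """Years before present of the I,R MRCA, given the assumed total depth."""
    return (1.0 - f) * total_years


if __name__ == "__main__":
    f, gm = solve_mle()
    print(f"F = {f:.3f}, GM = {gm:.1f}, I,R MRCA age = {ir_mrca_age(f):.0f} years")
```

Under that Poisson assumption the solution lands near the values quoted above (F = 0.304, GM = 61, about 48,700 years), which suggests the assumed likelihood is at least consistent with the one used in the post.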

As greater portions of the y chromosome are mined for all SNPs pertinent to a region of our Y Tree, we should see this method of establishing dates for the Tree nodes (MRCAs) take over the field.

Note: we are not determining ages for SNP occurrences; the ages are for MRCAs (tree nodes).

