From: "Whit Athey"
Subject: New Haplogroup Predictor Version
Date: Sun, 26 Jun 2005 21:46:56 -0400

Dear List,

The Haplogroup Predictor program was introduced last August and the response
since then has been very gratifying. I really appreciate the comments and
feedback that I have received from so many of you. You have helped me to
correct a number of glitches in the program, and a number of you have made
suggestions for future improvements (a few of which are discussed below).

I would like to announce some small changes to the original program: Version
1.17 is now operational at the same web site. This new version just updates
the database with more robust allele frequency data, but otherwise it is the
same.

I am also adding a link (on the main instructions page) to the "Beta"
edition of a completely new version (Version 2.0) of the program. There are
obvious problems with the program as it stands at the moment. The main
problems are (1) it is larger and slower to load, and (2) it is painfully
slow. I am working to improve the situation, but while I am doing so, I
would appreciate any feedback on the new features. One major difference
from the previous program is that you must click the "Update" button at
upper left before any calculations will be done on values that you enter
(see additional information about this on the instructions page). Following
are the main differences from the previous version:

1. The original version of the program was limited to the 37 markers tested
by FTDNA. The new version can accept any of 63 markers, covering all
markers currently offered by the commercial labs (all that I am aware of),
plus a number of others developed at NIST. There is an entry box for
DYS425, but at present it doesn't do anything because I don't have enough
data to support it. All of the others are active, though I expect that some
problems may come up in using some of the new ones until all the bugs are
worked out (e.g., at present there is a very limited amount of data
supporting the NIST markers for Haplogroup N).

2. Perhaps the main new feature, certainly the one that was most difficult
to implement, is a second round of analysis following the initial
determination of the 10 haplogroup scores. In this second round, only the
two highest-scoring haplogroups (from the first round) are used in the
analysis, and only those markers are used that have significantly different
allele frequency distributions between the two haplogroups. This second
round is particularly useful when the traditional analysis has returned
approximately equal scores for two haplogroups, making it difficult to
choose between them.
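
For readers who would like to see the shape of the two-round idea, here is
a minimal sketch in Python. It is not the program's actual code. The scoring
rule (a geometric mean of allele frequencies), the floor value, the distance
measure used to decide which markers differ "significantly", the 0.3
threshold, and the toy frequency table with only three haplogroups are all
assumptions made for illustration; the names fit_score, second_round, and
so on are hypothetical.

```python
from math import exp, log

# Hypothetical toy data, not the program's database:
# haplogroup -> marker -> {allele value: relative frequency}.
FREQS = {
    "R1b": {"DYS19": {14: 0.8, 15: 0.2},
            "DYS390": {24: 0.7, 23: 0.3},
            "DYS391": {11: 0.6, 10: 0.4}},
    "I":   {"DYS19": {14: 0.3, 15: 0.7},
            "DYS390": {24: 0.6, 23: 0.4},
            "DYS391": {11: 0.2, 10: 0.8}},
    "G":   {"DYS19": {15: 0.9, 14: 0.1},
            "DYS390": {22: 0.8, 23: 0.2},
            "DYS391": {10: 0.9, 11: 0.1}},
}

FLOOR = 0.01  # assumed floor so an unseen allele does not zero the score


def fit_score(haplotype, group_freqs, markers=None):
    """Geometric mean of the observed alleles' frequencies (assumed rule)."""
    if markers is None:
        markers = list(haplotype)
    usable = [m for m in markers if m in haplotype and m in group_freqs]
    if not usable:
        return 0.0
    logs = [log(max(group_freqs[m].get(haplotype[m], 0.0), FLOOR))
            for m in usable]
    return exp(sum(logs) / len(logs))


def distribution_distance(p, q):
    """Total variation distance between two allele distributions (assumed)."""
    alleles = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in alleles)


def first_round(haplotype, freqs):
    """Score the haplotype against every haplogroup using all its markers."""
    return {hg: fit_score(haplotype, table) for hg, table in freqs.items()}


def second_round(haplotype, freqs, threshold=0.3):
    """Rescore the two best haplogroups on the markers that separate them."""
    scores = first_round(haplotype, freqs)
    first, second = sorted(scores, key=scores.get, reverse=True)[:2]
    discriminating = [
        m for m in haplotype
        if m in freqs[first] and m in freqs[second]
        and distribution_distance(freqs[first][m], freqs[second][m]) >= threshold
    ]
    rescored = {hg: fit_score(haplotype, freqs[hg], discriminating)
                for hg in (first, second)}
    return rescored, discriminating


my_haplotype = {"DYS19": 14, "DYS390": 24, "DYS391": 11}
rescored, used = second_round(my_haplotype, FREQS)
print(used)
print(rescored)
```

With this toy table the first round ranks R1b and I highest, and the second
round keeps only DYS19 and DYS391, because the two haplogroups have nearly
the same DYS390 distribution and that marker cannot help choose between
them.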

Nomenclature is always a problem. In general, I use the conventions of
FTDNA for their 37 markers, and I use DNAH conventions on those additional
markers offered by DNAH (and not offered by FTDNA). For about a dozen
markers that have only been reported in NIST studies, I used the values as
reported by NIST, but I don't believe that any standard conventions exist so
far. Near the end of the instructions page, there is a link to a page on
nomenclature issues.

As in the original version, the program only scores the "fit" of a haplotype
to a group of haplotypes that have previously been reported for a particular
haplogroup. It occasionally happens that a haplotype will fit one
haplogroup fairly well, but will be in a different haplogroup. That is the
case with my own haplotype, for which the program gives the highest score to
Haplogroup G, whereas I am negative for SNP M201, which defines that
haplogroup.

If you have a set of results from FTDNA, the original version of the program
will probably be best for you. Version 2 is for difficult cases or cases
where more than the FTDNA 37 markers are available.

Again, I would appreciate any feedback, especially concerning any problems
that you encounter.


