GENEALOGY-DNA-L Archives



From:
Subject: Re: From Tony Frudakis RE: [DNA] Cecelia's Mother's DNAPrint 2.0 Results
Date: Tue, 2 Sep 2003 20:36:47 -0400 (EDT)
References: <6c.310aead3.2c86043a@aol.com>
In-Reply-To: <6c.310aead3.2c86043a@aol.com> (DNACousins@aol.com)


Tony wrote:

> Comparing a 30 marker test result with a 71 marker test for another person
> is just as reliable (or unreliable, depending on how you look at it) as
> comparing 71 marker test results for two separate people - both the 30 and
> 71 marker tests involve a few percentage points of error on average

Again, we have the odd use of the word "error" with different meanings
to different people. Customers expect it to mean the total
uncertainty of the reported values, but Tony apparently uses it to
refer to just the portion of uncertainty due to possible laboratory
measurement error (which happens to be a small part of the total
uncertainty). Suffice it to say that the statistical uncertainty
ranges from about 8% (for African) to 15% (for Amerindian and Asian).
In short, the total uncertainty is never just a "few percentage
points".
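This is a minimal sketch for a two-population model showing why the statistical part of the uncertainty can be much larger than the laboratory part. It assumes biallelic, independent markers in Hardy-Weinberg proportions and known source-population allele frequencies. It approximates the standard error of the ancestry fraction q as the inverse square root of the observed Fisher information. The marker data are hypothetical, and this is not DNAprint's published method. The genotypes are treated as perfectly measured, so none of the spread shown comes from lab error.

```python
import math

def standard_error(q, markers):
    """Approximate standard error of an ancestry fraction q (share from
    population 1), using the observed Fisher information of a simple
    two-population admixture model (an assumption, not DNAprint's model).

    markers: (p1, p2, g) tuples, where p1 and p2 are the frequencies of
    allele A in the two source populations and g is the number of copies
    of A the person carries (0, 1 or 2).
    """
    info = 0.0
    for p1, p2, g in markers:
        f = q * p1 + (1 - q) * p2   # person's expected allele-A frequency
        d = p1 - p2                  # how sharply this marker separates the populations
        info += d * d * (g / f ** 2 + (2 - g) / (1 - f) ** 2)
    return 1 / math.sqrt(info)

# Hypothetical data: every marker is heterozygous, so the estimate is q = 0.5.
for n in (10, 30, 100):
    markers = [(0.9, 0.1, 1)] * n
    print(n, round(standard_error(0.5, markers), 3))
```

These hypothetical markers are very sharply differentiated (frequency difference 0.8). Even so, the standard error is about 0.14 with 10 markers and about 0.044 with 100. Each marker's contribution scales with the square of its frequency difference d, so less differentiated markers give larger standard errors.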

> When using 3-way versus 4-way algorithms, most of our customers show the
> same exact result, and virtually all of the differences we have seen have
> been in the percentage of African. Since there is no African for your
> profile, I would venture to guess that your results would be exactly the
> same with the 4-way algorithm.

This remark appears to confirm that the reported results are indeed
always calculated by setting the least likely ethnicity to zero. That
is OK, except for "certain highly admixed individuals".
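One possible reading of "setting the least likely ethnicity to zero" is to fit each of the four three-population models and report the one with the highest likelihood. The sketch below compares that approach with a full four-population fit. It assumes the standard admixture likelihood (independent biallelic markers, Hardy-Weinberg proportions), a coarse 5% grid search, and hypothetical allele frequencies and genotypes. The population labels follow the quoted message. None of this is claimed to be DNAprint's actual code.

```python
import itertools
import math

POPS = ("EUR", "EAS", "NAM", "AFR")  # labels only; order matches the freqs columns

def log_likelihood(q, freqs, genotypes):
    """Admixture log-likelihood (biallelic markers, Hardy-Weinberg, independence).

    q[k]: share of ancestry from population k (sums to 1)
    freqs[m][k]: frequency of allele A at marker m in population k
    genotypes[m]: copies of allele A carried at marker m (0, 1 or 2)
    """
    total = 0.0
    for row, g in zip(freqs, genotypes):
        f = sum(qk * pk for qk, pk in zip(q, row))
        total += g * math.log(f) + (2 - g) * math.log(1 - f)
    return total

def simplex_grid(k, n):
    """Yield every split of n units into k parts (stars and bars), as proportions."""
    for bars in itertools.combinations(range(n + k - 1), k - 1):
        parts, prev = [], -1
        for b in bars + (n + k - 1,):
            parts.append(b - prev - 1)
            prev = b
        yield [p / n for p in parts]

def fit(freqs, genotypes, active, n=20):
    """Grid-search MLE; populations not listed in `active` are held at zero."""
    best_q, best_ll = None, -math.inf
    for shares in simplex_grid(len(active), n):
        q = [0.0] * len(POPS)
        for k, share in zip(active, shares):
            q[k] = share
        ll = log_likelihood(q, freqs, genotypes)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q, best_ll

def fit_three_way(freqs, genotypes, n=20):
    """Fit all four three-population models and keep the most likely one."""
    best = None
    for dropped in range(len(POPS)):
        active = [k for k in range(len(POPS)) if k != dropped]
        q, ll = fit(freqs, genotypes, active, n)
        if best is None or ll > best[1]:
            best = (q, ll, POPS[dropped])
    return best

# Hypothetical allele frequencies (EUR, EAS, NAM, AFR) and genotypes.
freqs = [
    (0.8, 0.2, 0.3, 0.1),
    (0.1, 0.7, 0.6, 0.2),
    (0.5, 0.4, 0.1, 0.9),
    (0.3, 0.9, 0.8, 0.4),
    (0.6, 0.2, 0.5, 0.95),
    (0.2, 0.3, 0.7, 0.6),
]
genotypes = [2, 1, 0, 1, 1, 0]

q4, ll4 = fit(freqs, genotypes, active=[0, 1, 2, 3])
q3, ll3, dropped = fit_three_way(freqs, genotypes)
print("4-way:", dict(zip(POPS, q4)), round(ll4, 3))
print("3-way:", dict(zip(POPS, q3)), round(ll3, 3), "dropped", dropped)
```

Every three-population grid point is also a four-population grid point, so ll3 can never exceed ll4. If the best four-population fit already gives some population a zero share, the two fits reach the same likelihood. That is consistent with the remark above that most customers see the same result either way.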

> The change in algorithm was made because in certain highly admixed
> individuals (such as 33% Euro, 33% EAS, 10% NAM, 23% AFR), the African was
> being under reported as a result of the fact that for the African-Euro,
> African-NAM and AFR-EAS pairs, a large number of completely private alleles
> exist and the genetic distance is the greatest, so the cumulative delta
> value of markers we use is weighted more strongly for these distinctions
> which causes the algorithm we were using problems in highly admixed
> individuals with African ancestry.

Note that the above is all one sentence. I find the implications very
disturbing, though I must admit that the meaning is far from clear.
The DNAprint report gives what it calls the "maximum likelihood"
percentages. If the likelihoods are calculated correctly, there is no
question of weighting some markers more than others. Obviously, some
markers are more informative than others, but that's not a "problem"
if the likelihoods are indeed calculated correctly. I have always
assumed that the DNAprint results were "correct" (subject to the
limitations of small-number statistics), but now I have to wonder.
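Here is a concrete illustration of the point about weighting. In a maximum-likelihood fit, each marker enters only through its own log-likelihood term. A marker whose frequencies differ sharply between populations therefore already pulls harder on the estimate, with no extra weighting needed. The minimal two-population sketch below assumes independent biallelic markers in Hardy-Weinberg proportions and hypothetical allele frequencies. The `weights` argument is a hypothetical addition. It is there only to show that any extra weighting moves the answer away from the maximum-likelihood estimate.

```python
import math

def log_likelihood(q, markers, weights=None):
    """Two-population log-likelihood of ancestry fraction q (share from pop 1).

    markers: (p1, p2, g) tuples, where p1 and p2 are the frequencies of
    allele A in the two source populations and g is the number of copies
    of A carried (0, 1 or 2). weights: hypothetical positive per-marker
    multipliers; None gives the true, unweighted likelihood.
    """
    if weights is None:
        weights = [1.0] * len(markers)
    total = 0.0
    for (p1, p2, g), w in zip(markers, weights):
        f = q * p1 + (1 - q) * p2  # expected allele-A frequency for this person
        total += w * (g * math.log(f) + (2 - g) * math.log(1 - f))
    return total

def best_q(markers, weights=None, iters=200):
    """Maximize over q in [0, 1] by ternary search (objective is concave)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1, markers, weights) < log_likelihood(m2, markers, weights):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

same_freqs = [(0.9, 0.1, 2), (0.9, 0.1, 0)]   # hypothetical markers
print(best_q(same_freqs))                       # maximum-likelihood estimate
print(best_q(same_freqs, weights=[2.0, 1.0]))   # ad hoc reweighting shifts it

uneven = [(0.9, 0.1, 2), (0.6, 0.4, 0)]        # sharp marker vs. weak marker
print(best_q(uneven))
```

For the first pair of markers, the maximum-likelihood estimate is 0.5. Doubling the weight of the first marker moves it to about 0.708. For the second pair, the sharply differentiated marker pushes the estimate to the boundary at 1.0, even though both markers count equally.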

> As for how we are testing, right now we are relying on mathematical
> simulations rather than large international collection efforts, because the
> former is much tighter and easier to control variables for (which is crucial
> for a sound study).

This is perhaps the most disturbing news of all. There shouldn't be
any variables in the algorithm except the allele frequencies for the
target populations. These frequencies can't be "controlled" -- they
must be measured.

John Chandler

