GENEALOGY-DNA-L Archives
From:
Subject: Re: [DNA] Clarification about DNAPrint
Date: Fri, 2 Jan 2004 20:37:06 -0500 (EST)
References: <20031231053627.15040.qmail@web41212.mail.yahoo.com> <3FF2E9D4.000001.01856@computer>
In-Reply-To: <3FF2E9D4.000001.01856@computer> (debbyp@metrocast.net)
Debby wrote:
> I understand that there are systematic errors, and that
> the test's results are affected by those errors.
> I understand why y'all think the odds for the true value
> of the MLE is in the scientific level of confidence.
> What you're not understanding is, after reading BOTH
> the emails AND the User Manual, I disagree with y'all
> and think that it's just TOO pessimistic a value!
> I really do believe that the true value lies somewhere
> in between the MLE and the 2 fold less likely contour.
> And I think that'll bear out in time.
Let's put this in perspective. I won't go over all the gory details
again, but the fact is that an honest coin comes up heads only 1/2
of the time -- just so, the true ethnicity falls within that inner
contour only 1/2 the time. This is a theorem of elementary statistics
and is exactly true for a normal probability distribution; it is only
approximately true for real life, but it appears to be a good
approximation for the DNAPrint case (that is to say, the spacing of
the plotted contours looks very much like the contours for a normal
distribution). You can say that the glass is 1/2 full if you're
being optimistic, or 1/2 empty if you're being pessimistic, but don't
fall into the trap of thinking that the glass is NEARLY full or
NEARLY empty. It's just not so.
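
To make the 1/2 figure concrete, here is a small sketch of the arithmetic. It assumes the estimate scatters as an ideal two-dimensional normal distribution (three percentages that sum to 100 leave two free dimensions), which is only an approximation to the real plots. The function names are my own for illustration and have nothing to do with any DNAPrint software.

```python
import math
import random

def contour_coverage(fold):
    """Chance that a 2-D normal draw falls inside the contour where the
    density has dropped to 1/fold of its peak (exact for a normal)."""
    return 1.0 - 1.0 / fold

def simulated_coverage(fold, trials=100_000, seed=1):
    """Monte Carlo check of the same number with a unit 2-D normal."""
    rng = random.Random(seed)
    # Density is proportional to exp(-r^2 / 2), so it equals peak/fold
    # at squared radius r^2 = 2 * ln(fold).
    threshold = 2.0 * math.log(fold)
    inside = 0
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)
        if x * x + y * y <= threshold:
            inside += 1
    return inside / trials

for fold in (2, 5, 10):
    print(fold, contour_coverage(fold), round(simulated_coverage(fold), 3))
```

Under that assumption the 2, 5 and 10 fold contours enclose 1/2, 4/5 and 9/10 of the probability respectively, and the simulated fractions should land close to those values.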
As for systematic errors, we don't know how serious they are for
the DNAPrint test(s) -- but, then, neither do the DNAPrint folks,
and they have made a point of assuming that there are NO systematic
errors at all. They have even boasted how efficient they are
because they are doing test validations by computer simulation
without collecting more data.
> And I'm interested in knowing why you think this 2.5 test isn't going to
> give us the results before it is even out and results shown? Isn't it
> difficult to say that results from this test won't be accurate before they
> even run it to have results?
You and David have already pursued this in terms of the sample size
-- adequate vs. inadequate -- but I didn't see anybody mention the fact
that the 2.5 test was explicitly announced as using the same sample
as 2.0. It includes more markers and is therefore more precise than
2.0 (less random scatter), but any systematic error inherited from the
sample stays the same, so its accuracy will not be any different.
> And if
> the sample is inadequate, then why even bother using it? They too are
> scientists, are they not?
Well, to be more precise, they are entrepreneurs running a company that
has already spent a lot of money developing this test. There are limits
to how much they can spend. If the sample is inadequate, that's because
an adequate sample would cost more than they can afford. Then again,
there was a press release a week or two ago saying they had just made a
deal with some venture capitalists -- maybe they can now afford some
more data collection.
> Mine is:
> European 87%
> Sub-Saharan African 12%
> Native American 1%
> My next question.... is there ANY doubt in anyone's
> opinion (including you David F.) that if Sub-Saharan African
> shows up in this test that it too could be wrong like the
> NA if less than 30% or is there enough African sampling
> to compare with so you think the SSA results are as presented?
Your African result is big enough that the inner contour might
actually miss the corner -- but with 7 failed markers, your contours
may be rather wider than usual. Just bear in mind that the inner ring
is only the 50% confidence interval. If you want to be 90% sure of
enclosing the true answer, you must go to the outer ring. So, yes,
there is a real possibility that the result as presented is "entirely"
wrong, even though it's the most likely estimate.
> GEESH!! First, you want it to pick up the minority
> groups better, then, when it seems to do just that,
> you think there's something wrong!
What it really comes down to is the feeling that they shouldn't change
the product without changing the label. In fact, I think it's too
soon to conclude that they have changed the product. This could all
be the result of a change in the ethnicity of the "average" customer.
> As I'm new to this, I have no idea as to what algorithm has been or not been
> used before.... but it seems to me that for a test to be reliable and
> considered better, the same algorithm format for that test and any upgrade
> should be the same, shouldn't it?
The algorithm has been vaguely described by the inventors. It's
straightforward in principle, but it's very computer-intensive. They
just calculate the likelihood of obtaining exactly your genotype for
each possible combination of ethnicities. The combination that gives
the highest likelihood is the MLE, and it's easy enough to plot the
confidence contours. Unfortunately, a crucial part of the process is
secret: the allele frequencies in the reference samples. A change in
those would not actually change the "algorithm," but would certainly
change all the results.
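
For anyone who wants to see the principle in action, here is a toy sketch of a grid-search MLE. Everything specific in it is an assumption: the allele frequencies are invented (the real ones are secret), there are only four markers, and the binomial genotype model is a common textbook choice rather than anything the inventors have confirmed.

```python
import math

# Hypothetical frequencies of one allele at each marker in three
# reference groups. These numbers are made up for illustration.
FREQS = [
    (0.90, 0.10, 0.50),
    (0.20, 0.80, 0.30),
    (0.60, 0.05, 0.95),
    (0.15, 0.70, 0.85),
]

def log_likelihood(mix, genotype):
    """Log-likelihood of a genotype (copies of the allele, 0 to 2, at each
    marker) given admixture proportions `mix` that sum to 1."""
    total = 0.0
    for freqs, copies in zip(FREQS, genotype):
        # Expected allele frequency for this blend of groups.
        p = sum(m * f for m, f in zip(mix, freqs))
        total += (math.log(math.comb(2, copies))
                  + copies * math.log(p)
                  + (2 - copies) * math.log(1.0 - p))
    return total

def simplex_grid(steps):
    """All three-way proportions on a regular grid over the triangle."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            yield (i / steps, j / steps, (steps - i - j) / steps)

def grid_mle(genotype, steps=20):
    """Return (proportions, log-likelihood) of the best grid point."""
    return max(((mix, log_likelihood(mix, genotype))
                for mix in simplex_grid(steps)), key=lambda t: t[1])

def within_fold(genotype, fold, steps=20):
    """Grid points whose likelihood is within a factor `fold` of the maximum."""
    _, best = grid_mle(genotype, steps)
    cutoff = best - math.log(fold)
    return [mix for mix in simplex_grid(steps)
            if log_likelihood(mix, genotype) >= cutoff]

genotype = (2, 0, 1, 0)
print(grid_mle(genotype))
print(len(within_fold(genotype, 2)), "grid points inside the 2-fold contour")
```

Editing any entry in FREQS moves both the best point and the contours, even though the code, i.e. the "algorithm," is untouched.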
> But THERE is the rub!! You take the 2, 5 or 10 times "less likely" (or any
> point thereon or between) and say, it's MORE likely that my "true" MLE is
> anywhere but where it is. What is truer said is the MLE is the BEST estimate
> of what the proportions are with a small chance that they are something else
We've been through lengthy discussions of these very same points, as
you can see in the G-D-L archives. Suffice it to say that you have to
avoid mixing up probability density with absolute probability. Even
the best estimate -- the one with the highest probability density --
has a very low absolute probability unless you include a sufficient
margin of error.
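
A one-dimensional sketch shows the difference. It assumes a normal error distribution with a standard deviation of 5 percentage points, a figure picked purely for illustration and not taken from any actual test report.

```python
import math

def prob_within(margin, sigma):
    """Probability that the true value lies within +/- margin of the best
    estimate, assuming a normal error distribution with std. dev. sigma."""
    return math.erf(margin / (sigma * math.sqrt(2.0)))

sigma = 5.0  # hypothetical standard deviation, in percentage points
for margin in (0.1, 1.0, 0.6745 * sigma, 2.0 * sigma):
    print(f"+/- {margin:5.2f} points: {prob_within(margin, sigma):.3f}")
```

The density is highest right at the best estimate, yet the probability of being within a tenth of a point of it is under 2% here. The margin has to be widened to about 0.67 standard deviations before the probability reaches 1/2.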
> And here again, I say, if there's smoke, there most likely is fire and if I
> had that 10% NA, I'd be asking many questions. Just like I am my 12%S-SA.
> You saw my triangle...
Actually, I haven't seen yours, but I have seen a goodly number of others.
Yours should be easier to interpret than many because it has non-zero
values for three ethnicities, which means that the MLE dot has not been
arbitrarily forced onto the edge, as it often is.
John Chandler