TMG-L Archives

From: "John Davis" <>
Subject: Re: [TMG] Statistical Report - STD DEV in a nutshell
Date: Tue, 2 Aug 2005 17:42:27 -0700
References: <351F89D30B94074CA4EF76696E56411301A250DC@susday234.corp.ncr.com>

For anyone who may be interested,

The statistic that will be least understood in the statistical report is
standard deviation. It is more or less meaningful and useful depending
on the context in which it is found. Standard deviation in relation to
the number of tags of a certain type is probably not as transparently
meaningful as in the context of the "age" statistics, age at first
marriage, age at death, etc.

Standard deviation can be thought of as the "standard deviation from the
mean" (mean = the grand average), that is, the typical distance of the
data from the mean. If you just remember (or write down) three
numbers, 68.26%, 95.44% and 99.74%, then you can put standard deviation
to work for you to glean some meaningful information from the
statistics. (Just round them to 68, 95 and 99.7 for all practical
purposes.)

For data that follow the familiar bell-shaped (normal) curve, 68% of the
data being examined will be found within plus or minus one standard
deviation of the mean, 95% will be within plus/minus two standard
deviations of the mean, and a full 99.7% will be found within plus/minus
three standard deviations of the mean. The accuracy of these statements
generally improves as the size of the population increases. In other
words, the larger the database, the more closely the above numbers will
tend to hold.
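For the curious, here is a minimal sketch of where the three percentages come from, assuming the data follow a normal (bell-shaped) curve. The function name coverage is just an illustrative choice; math.erf is the error function from Python's standard library.

```python
import math

def coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} STD DEV: {coverage(k) * 100:.2f}%")
```

The computed values come out at about 68.27%, 95.45% and 99.73%, a hair off the figures quoted above in the last digit; the rounded 68, 95 and 99.7 are unaffected.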

If the data are skewed (heaped up on the low end, a few numbers trailing
to the high end = positively skewed), truncated (cut off in one
direction, such as no one being able to die before they were even
conceived, so no data exists for deaths before age zero), too evenly
spread out or bunched in the "middle" (kurtosis), bunched up around two
(or more) "centers" (bimodal or multimodal), or otherwise just not
"standard," the numbers will not be as meaningful, but will still hold
some meaning.
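One way to see how far a particular set of data strays from the 68/95/99.7 figures is simply to count. The sketch below does that for a made-up, heavily skewed list of numbers (not taken from any real database); fraction_within is a hypothetical helper name, and it assumes the population form of standard deviation (statistics.pstdev).

```python
import statistics

def fraction_within(values, k):
    """Share of the values lying within k standard deviations of their mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (an assumption)
    inside = [v for v in values if abs(v - mean) <= k * sd]
    return len(inside) / len(values)

# Made-up, positively skewed data: heaped up at the low end, a few high values.
skewed = [1] * 90 + [100] * 10
print(fraction_within(skewed, 1))
```

For this skewed list, 90% of the values fall within one standard deviation rather than 68%, which is the kind of departure the paragraph above describes.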

Example:

Age at death
POP = 126 (very small population for calculating
meaningful STD DEV)
AVG (mean) = 65
STD DEV = 24.2
68.26% of this population died within 24.2 years, either way, of age 65
95.44% of this population died within 48.4 years, either way, of age 65
99.74% of this population died within 72.6 years, either way, of age 65

As you can see, the most meaningful data from this small population is
the 68% or so that died within 24.2 years of age 65 (between age 40.8
and age 89.2). This is something you can get your teeth into.

95.44% died between age 16.6 and 113.4 years of age. Since the oldest
age at death in my database (MAX) is 99 (rounded out), then we know that
we're talking about "between 16.6 and MAX."

99.74% died between age minus 7.6 and age 137.6. Since MIN is zero and
MAX is 99, this range covers the whole database, so we know that it
accounts for ALL of the remaining deaths, or the remaining 4.56%.
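The arithmetic in this example can be sketched in a few lines, using the AVG of 65, STD DEV of 24.2, MIN of 0 and MAX of 99 from my report. The function name band and its floor/ceiling parameters are just illustrative choices.

```python
def band(mean, sd, k, floor=None, ceiling=None):
    """Range covering mean plus/minus k standard deviations.

    If floor/ceiling are given (e.g. the report's MIN and MAX), the
    range is clipped so it never extends past the actual data.
    """
    low = mean - k * sd
    high = mean + k * sd
    if floor is not None:
        low = max(low, floor)
    if ceiling is not None:
        high = min(high, ceiling)
    return round(low, 1), round(high, 1)

for k in (1, 2, 3):
    print(k, band(65, 24.2, k), band(65, 24.2, k, floor=0, ceiling=99))
```

Clipping to MIN and MAX is what turns the "nonsensical" minus 7.6 and 137.6 into the 0 to 99 range actually present in the data.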

Like I say, the numbers become more meaningful as the size of the
population under consideration increases. When it becomes large enough,
the "nonsensical" results begin to diminish and disappear. Then the
99.74% *should* be contained within the actual data. Everything outside
the 99.74% (3 standard deviations) would be considered "outliers" and
could be treated, with a fair degree of statistical confidence, as
exceptions to the rule, or mistakes.
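As a rough sketch of the outlier idea, the snippet below flags anything more than three standard deviations from the mean. The ages are made up for illustration, except that the -1707 echoes the MIN in the quoted report below; flag_outliers is a hypothetical helper, and the use of the sample standard deviation (statistics.stdev) is an assumption, since I don't know which form TMG uses.

```python
import statistics

def flag_outliers(values, k=3):
    """Return the values lying more than k standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (an assumption)
    return [v for v in values if abs(v - mean) > k * sd]

# Made-up ages at first marriage, plus one obviously bad entry.
ages = [22, 25, 19, 31, 24, 27, 23, 21, 26, 28,
        20, 24, 29, 22, 25, 23, 30, 21, 26, -1707]
print(flag_outliers(ages))
```

Keep in mind that a wildly wrong value inflates the standard deviation itself, so in a very small list a bad entry may not clear the three standard deviation bar at all.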

A lot of this mumbo-jumbo will only be of interest to the geekiest among
us, or the most curious, but I thought I'd post it just in case it might
be helpful to at least a few. I assume that the TMG folks include the
STD DEV statistic for the same reason.

(And, for all the purists, I DO tend to mix singular/plural when using
the word, "data," whichever suits my whim :)

John Davis
A retired statistical process control/total quality management guy,
dabbling in genealogy, in Elgin, Oregon

----- Original Message -----
From: "Sholder, Kevin L" <>
To: <>
Sent: Monday, August 01, 2005 10:54 AM
Subject: [TMG] Statistical Report

> All,
>
> I've not used this before and created one today and have some
> questions about what it means. The following columns are pretty
> obvious as what they mean:
>
> POP - population
> AVG - average
>
> These are not so obvious as to what they mean or how they are used:
>
> STD DEV - Standard Deviation??
> MIN - Minimum (minimum what?)
> MAX - Maximum (maximum what?)
> ID MIN - ????
> ID MAX - ????
>
> Here is an example line from this report:
>
> Age at first marriage
> POP - 14,222
> AVG - 24.3
> STD DEV - 32.4
> MIN - (-1707)
> MAX - 1733
> ID MIN - 22080
> ID MAX - 46047
>
> Can someone help explain what all these mean please?
>
> Thank you for your time,
> Kevin L. Sholder