MCINTYRE-L Archives
From:
Subject: [MCINTYRE] What is a Soundex?
Date: Sun, 29 Oct 2000 08:40:40 EST
Hi, folks . . . Forwarding this from TheShipsList. Appears to have been
written by someone who works at NARA in answer to a Lister's simple
question: What is the Soundex? It's an excellent explanation and contains
just about everything you wanted to know about Soundex Systems and were
afraid to ask.
Kathy W-F
----- Original Message -----
From: <>
To: <>
Sent: Sunday, October 15, 2000 5:42 AM
Subject: Re: [TSL] Soundex
An excellent question <name snipped>. And one we should all ask ourselves
occasionally, just to stay in touch with reality.
First, a brief but direct answer to your question: The Russell Soundex
System is an algorithm that assigns numerical values to certain sounds
in a word, no matter what letter produces that sound. "Soundexing" a name
results in a Soundex code consisting of a letter and 3 numbers: the
actual first letter of the last name followed by the first three "sound
numbers" that come after that letter (i.e., R###), with zeros added if the
name runs out of sounds. So Smyth and Smith, Muller and Miller, Collins and
Cullens all come out with the same code. This is a useful system for
indexing/arranging individual records because even two brothers might spell
their name differently, or the census taker might have spelled everything
phonetically rather than asking for a correct spelling, etc. I'm sure you
see the value already.
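For anyone who wants to see those coding steps spelled out, here is a minimal Python sketch. It assumes the rules NARA publishes for its census indexes (vowels, Y, H and W get no number; a run of letters sharing a number is coded once, even when H or W sits between them; pad with zeros to three digits) and it ignores special handling of prefixes such as Van or De. The function name soundex is just an illustrative choice, not part of any official tool.

```python
# Digit assigned to each consonant group; vowels, Y, H and W get no digit.
CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(surname: str) -> str:
    """Return a Soundex code: first letter plus three digits (e.g. 'S530')."""
    # Keep only plain A-Z letters (apostrophes, spaces, accented letters are dropped).
    letters = [c for c in surname.upper() if "A" <= c <= "Z"]
    if not letters:
        raise ValueError("surname must contain at least one letter A-Z")
    first = letters[0]
    digits = []
    # The first letter's own digit still counts, so the F in "Pfister" is absorbed into the P.
    prev = CODES.get(first, "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W are skipped and do not break a run of equal digits
        code = CODES.get(c, "")  # "" for vowels and Y
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated digit after it is coded again
    # Pad with zeros, then cut to exactly one letter and three digits.
    return (first + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for a, b in [("Smith", "Smyth"), ("Muller", "Miller"), ("Collins", "Cullens")]:
        print(a, soundex(a), "|", b, soundex(b))
    # Smith S530 | Smyth S530
    # Muller M460 | Miller M460
    # Collins C452 | Cullens C452
```

Tymczak (T522) and Pfister (P236) are handy checks of the fine print: the first codes K again because a vowel sits between it and Z, and the second drops F because it shares a digit with the leading P.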
Mr. Russell saw the value and even got patents on his system, in 1918 and
1922. He sold it to businesses, he sold it to governments (he may have sold
the rights to Remington Rand Corporation). Today it is in the public domain,
and many computer programs include Soundex functions, usually offered as a
kind of "sounds-like" or fuzzy search.
Anyway, I'm afraid the Soundex system has become another victim of
inadequate historical perspective. Searches on the internet turn up a
variety of explanations of the Soundex system, most of which have invented a
strange history for that system. And in my opinion that strange history is
the product of an optical illusion one sees when looking backward in time.
Go read all the Soundex descriptions for yourself. They all go something
like this: It was the 1930's and the government needed access to or
information from the Census or from other records (in some versions they are
under pressure to produce "reports," in others they need to access age data
in old records in order to process Social Security applications). Whatever
the reason, either the WPA or the National Archives invented the Soundex
system to solve the problem and to save the world for democracy. No one
mentions how private industry was already using the Soundex system before
the National Archives "invented" it, nor how the WPA or National Archives
applied their Soundexing to records not in their possession.
Of course there is a kernel of truth here. Various U.S. Gov't agencies did
purchase the Russell Soundex System in the 1930's to be used in several
large-scale indexing projects performed by the WPA. Those projects came
about because 1) the New Deal sponsored a variety of "make work" programs
during the Great Depression, and the "white collar" projects often involved
records management issues, and 2) the new Social Security program needed age
data from other government records to process applications. So it all came
together--indexing certain government records at one agency to support the
operations of another, new agency.
At the National Archives, we generally run into the Soundex system when
consulting indexes to Census records and Immigration records. And here is
where one encounters great danger if one assumes that the National Archives
invented the Soundex and Soundexed all these records. The danger lies in
thinking that one agency, following one set of rules, Soundexed
everything. If that were so, then one would not find any variation among
all the Soundexed records. Since there are cases where people do find the
same name under different Soundex codes--depending on the record set--we
know the assumption is false.
Researchers usually attribute this difference in coding to "error." Surely
some of it is error, but some of it is easily explained by the fact that the
Census records Soundex project followed different coding rules than the
Immigration records Soundex project. And just because the WPA created all
these indices does not mean the WPA was in charge of everything--rather, the
WPA acted as contractor to different agencies, and followed the rules
dictated by their employer. The rules issued to researchers at NARA are the
Census Soundex rules, and applying them to the Immigration indexes produces
the wrong code for about 3% to 8% of names. This is especially true of
Eastern European names--a problem so troublesome that a separate algorithm
for coding Jewish names was developed by Jewish genealogists.
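To illustrate how two rule sets can split the same name, the sketch below codes a surname twice: once with the census rule that H and W do not separate same-numbered letters, and once with a variant, assumed here purely for illustration, in which H and W break a run the way vowels do. The post does not say which rules the Immigration projects actually followed, so treat the hypothetical hw_separates flag and candidate_codes helper as stand-ins for "some other rule set," not as the WPA's instructions.

```python
# Consonant groups shared by both rule sets compared here.
CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(surname: str, hw_separates: bool = False) -> str:
    """Code a surname; hw_separates=True is a hypothetical alternate rule set."""
    letters = [c for c in surname.upper() if "A" <= c <= "Z"]
    if not letters:
        raise ValueError("surname must contain at least one letter A-Z")
    first, digits = letters[0], []
    prev = CODES.get(first, "")
    for c in letters[1:]:
        if c in "HW" and not hw_separates:
            continue  # census rule: H/W are invisible, so S-H-C counts as one run
        code = CODES.get(c, "")  # vowels, Y (and H/W under the variant) map to ""
        if code and code != prev:
            digits.append(code)
        prev = code
    return (first + "".join(digits) + "000")[:4]


def candidate_codes(surname: str) -> set[str]:
    """Every code the two rule sets produce, i.e. each index section worth checking."""
    return {soundex(surname), soundex(surname, hw_separates=True)}


if __name__ == "__main__":
    for name in ("Ashcraft", "Burroughs", "Smith"):
        print(name, sorted(candidate_codes(name)))
```

Ashcraft comes out A261 under the census rule and A226 under the variant, and Burroughs splits into B620 and B622, while Smith stays S530 either way. That is the kind of quiet divergence that sends a researcher to the wrong card.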
So, <name snipped>, this is more information than you ever wanted. Suffice
it to say that when you encounter a Soundex index, you will first have to
follow instructions on coding the surname in question before you can
actually use the index. If you go to the Soundex index cards and do not
find even one example of the surname you seek, go looking for some alternate
Soundex coding rules. The Soundex index before you may not have been created
using the Census rules distributed by NARA.