From: steven perkins <>
Subject: [DNA] Article: Ensemble learning algorithms for classification of mtDNA into haplogroups.
Date: Sun, 7 Mar 2010 23:42:59 -0500
Brief Bioinform. 2010 Mar 4. [Epub ahead of print]
Ensemble learning algorithms for classification of mtDNA into haplogroups.
Wong C, Li Y, Lee C, Huang CH.
Classification of mitochondrial DNA (mtDNA) into their respective
haplogroups allows the addressing of various anthropologic and
forensic issues. Unique to mtDNA is its abundance and non-recombining
uni-parental mode of inheritance; consequently, mutations are the only
changes observed in the genetic material. These individual mutations
are classified into their cladistic haplogroups allowing the tracing
of different genetic branch points in human (and other organisms)
evolution. Due to the large number of samples, it becomes necessary to
automate the classification process. Using 5-fold cross-validation, we
investigated two classification techniques on the consented database
of 21 141 samples published by the Genographic project. The support
vector machines (SVM) algorithm achieved a macro-accuracy of 88.06%
and micro-accuracy of 96.59%, while the random forest (RF) algorithm
achieved a macro-accuracy of 87.35% and micro-accuracy of 96.19%. In
addition to being faster and more memory-economic in making
predictions, SVM and RF are better than or comparable to the
nearest-neighbor method employed by the Genographic project in terms
of prediction accuracy.
PMID: 20203074 [PubMed - as supplied by publisher]
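
The abstract reports both a macro-accuracy and a micro-accuracy without defining them. The sketch below assumes the usual meanings: micro-accuracy is the fraction of all samples classified correctly, and macro-accuracy is the per-haplogroup accuracy averaged over haplogroups, so that rare haplogroups count as much as common ones. The function name and the toy labels are invented for illustration; they are not taken from the paper or from the Genographic data.

```python
from collections import defaultdict


def micro_macro_accuracy(true_labels, predicted_labels):
    """Return (micro, macro) accuracy for two equal-length label lists.

    Assumed definitions (not stated in the abstract):
      micro = correct predictions / all samples
      macro = mean over haplogroups of (correct in group / samples in group)
    """
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")

    total = defaultdict(int)    # samples per true haplogroup
    correct = defaultdict(int)  # correctly predicted samples per true haplogroup
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1

    micro = sum(correct.values()) / len(true_labels)
    macro = sum(correct[h] / total[h] for h in total) / len(total)
    return micro, macro


# Toy example: a common haplogroup "H" and a rare haplogroup "U".
truth = ["H"] * 8 + ["U"] * 2
preds = ["H"] * 8 + ["U", "H"]  # one of the two rare samples is misclassified
micro, macro = micro_macro_accuracy(truth, preds)
print(f"micro-accuracy = {micro:.2f}, macro-accuracy = {macro:.2f}")
```

Under these assumed definitions, a single error in a rare haplogroup lowers the macro figure far more than the micro figure. That is one plausible reading of why the reported macro-accuracies (about 87-88%) sit below the micro-accuracies (about 96%).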
Steven C. Perkins
Online Journal of Genetics and Genealogy
Steven C. Perkins' Genealogy Page
Steven C. Perkins' Genealogy Blog