Publication

Data mining methods in the prediction of dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

dc.contributor.author: Maroco, João
dc.contributor.author: Silva, Dina Lúcia Gomes da
dc.contributor.author: Rodrigues, Ana
dc.contributor.author: Guerreiro, Manuela
dc.contributor.author: Santana, Isabel
dc.contributor.author: Mendonça, Alexandre de
dc.date.accessioned: 2012-07-18T18:10:08Z
dc.date.available: 2012-07-18T18:10:08Z
dc.date.issued: 2011
dc.description.abstract: Background: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but at present it has limited value in predicting progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared with three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press’ Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of the classification parameters obtained from a 5-fold cross-validation were compared using Friedman’s nonparametric test. Results: Press’ Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (median (Me) = 0.76) and area under the ROC curve (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73), with high area under the ROC curve (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC curve (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64).
The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most of them sensitivity was around or even below a median value of 0.5. Conclusions: When sensitivity, specificity and overall classification accuracy are taken into account, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in the prediction of dementia from several neuropsychological tests. These methods may be used to improve the accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing.
dc.identifier.citation: BMC Research Notes, 4:299
dc.identifier.issn: 1756-0500
dc.identifier.uri: http://hdl.handle.net/10400.12/1557
dc.language.iso: eng
dc.peerreviewed: yes
dc.publisher: BioMed Central
dc.title: Data mining methods in the prediction of dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests
dc.type: journal article
dspace.entity.type: Publication
oaire.citation.conferencePlace: London
oaire.citation.title: BMC Research Notes
oaire.citation.volume: 4
rcaap.rights: openAccess
rcaap.type: article
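The abstract scores each classifier by sensitivity, specificity, overall accuracy and Press’ Q. The following is a minimal Python sketch of those quantities computed from a 2x2 confusion matrix; it is not the authors' code. It assumes that the positive class is progression to dementia and uses the usual textbook form of Press’ Q, Q = (N - nK)^2 / (N(K - 1)), with N cases, n correctly classified cases and K groups, compared against a chi-square distribution with 1 degree of freedom. The class name `Confusion`, the helper `press_q` and the counts in the usage lines are hypothetical and are not data from the study.

```python
from dataclasses import dataclass

# Critical value of the chi-square distribution with 1 df at alpha = 0.05.
CHI2_1DF_CRIT_05 = 3.841


@dataclass
class Confusion:
    """2x2 confusion matrix. Assumption: 'positive' means progressed to dementia."""
    tp: int  # progressors predicted as progressors
    fn: int  # progressors predicted as non-progressors
    tn: int  # non-progressors predicted as non-progressors
    fp: int  # non-progressors predicted as progressors

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp

    def sensitivity(self) -> float:
        # Proportion of true progressors that the classifier detects.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of true non-progressors that are correctly ruled out.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Proportion of all cases classified correctly.
        return (self.tp + self.tn) / self.total


def press_q(n_total: int, n_correct: int, k_groups: int) -> float:
    """Press' Q = (N - n*K)^2 / (N*(K - 1)); approx. chi-square(1) under chance."""
    return (n_total - n_correct * k_groups) ** 2 / (n_total * (k_groups - 1))


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the study).
    cm = Confusion(tp=7, fn=3, tn=30, fp=10)
    q = press_q(cm.total, cm.tp + cm.tn, k_groups=2)
    print(cm.sensitivity(), cm.specificity(), cm.accuracy(), q)
    print("better than chance:", q > CHI2_1DF_CRIT_05)
```

With these made-up counts, sensitivity is 0.7, specificity 0.75, accuracy 0.74 and Q = 11.52, which exceeds 3.841, so this hypothetical classifier would beat chance at the 5% level. Because accuracy pools both classes, an unbalanced sample lets a classifier reach high accuracy and specificity while sensitivity stays low, the pattern the abstract reports for Support Vector Machines.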
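The abstract compares the per-fold distributions of these metrics with Friedman’s nonparametric test. The sketch below assumes one plausible layout that the record does not spell out: each of the 5 cross-validation folds is a block and each classifier is a treatment. Classifiers are ranked within each fold (tied scores share their mean rank) and the Friedman statistic chi2_F = 12 / (n k (k + 1)) * sum_j R_j^2 - 3 n (k + 1) is computed, where n is the number of folds, k the number of classifiers and R_j the rank sum of classifier j; it is referred to a chi-square distribution with k - 1 degrees of freedom. The function names and the fold scores are hypothetical, and no tie correction is applied. SciPy's `scipy.stats.friedmanchisquare` implements the same test with a tie correction and also returns a p-value.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(scores: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square for scores[fold][classifier], without tie correction."""
    n = len(scores)      # blocks: cross-validation folds
    k = len(scores[0])   # treatments: classifiers
    rank_sums = [0.0] * k
    for row in scores:
        for t, r in enumerate(average_ranks(row)):
            rank_sums[t] += r
    ss = sum(r * r for r in rank_sums)
    return 12.0 * ss / (n * k * (k + 1)) - 3.0 * n * (k + 1)


if __name__ == "__main__":
    # Hypothetical accuracies of three classifiers over 5 folds (not study data).
    folds = [
        [0.78, 0.70, 0.62],
        [0.74, 0.69, 0.60],
        [0.80, 0.72, 0.65],
        [0.75, 0.71, 0.63],
        [0.77, 0.68, 0.61],
    ]
    stat = friedman_statistic(folds)
    # Critical value of chi-square with k - 1 = 2 df at alpha = 0.05 is 5.991.
    print(stat, stat > 5.991)
```

In this toy data the first classifier wins every fold, giving rank sums 15, 10 and 5 and chi2_F = 10.0, above 5.991. Ranking in descending instead of ascending order gives the same statistic. With only five folds the chi-square approximation is coarse.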

Files

Original bundle
Name: BMCRN 2011 4 299.pdf
Size: 1.02 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission