Publication

Comparative study on the performance of different classification algorithms, combined with pre- and post-processing techniques to handle imbalanced data, in the diagnosis of adult patients with familial hypercholesterolemia

dc.contributor.author: Albuquerque, João
dc.contributor.author: Medeiros, Ana Margarida
dc.contributor.author: Alves, Ana Catarina
dc.contributor.author: Bourbon, Mafalda
dc.contributor.author: Antunes, Marília
dc.date.accessioned: 2022-12-05T15:04:41Z
dc.date.available: 2022-12-05T15:04:41Z
dc.date.issued: 2022-06-24
dc.description.abstract: Familial hypercholesterolemia (FH) is an inherited disorder of cholesterol metabolism. Current criteria for FH diagnosis, such as the Simon Broome (SB) criteria, lead to high false-positive rates. The aim of this work was to explore alternative classification procedures for FH diagnosis, based on different biological and biochemical indicators. For this purpose, logistic regression (LR), naive Bayes classifier (NB), random forest (RF) and extreme gradient boosting (XGB) algorithms were combined with either the Synthetic Minority Oversampling Technique (SMOTE) or threshold adjustment by maximizing the Youden index (YI), and compared. Data were tested through a 10 × 10 repeated k-fold cross-validation design. The LR model showed better overall performance than the other algorithms, as assessed by the areas under the receiver operating characteristic (AUROC) and precision-recall (AUPRC) curves and by several operating characteristics (OC), regardless of the strategy used to cope with class imbalance. With either data processing technique, all classification algorithms achieved significantly higher accuracy (Acc), G-mean and F1-score values than the SB criteria (p < 0.01), revealing a more balanced predictive ability for both classes and greater effectiveness in classifying FH patients. Adjusting the cut-off values through pre- or post-processing methods yielded a considerable gain in sensitivity (Sens) (p < 0.01). Although the pre- and post-processing strategies performed similarly, SMOTE does not cause the model's parameters to lose interpretability. These results suggest that an LR model combined with SMOTE can be an optimal approach for use as a widespread screening tool. [pt_PT]
dc.description.sponsorship: The current work was supported by the Norte2020 programme (operation NORTE-08-5369-FSE-000018), awarded to JA, and by Fundação para a Ciência e Tecnologia (FCT), under projects UID/MAT/00006/2019 and PTDC/SAU-SER/29180/2017.
dc.description.version: info:eu-repo/semantics/publishedVersion [pt_PT]
dc.identifier.citation: PLoS One. 2022 Jun 24;17(6):e0269713. doi: 10.1371/journal.pone.0269713. eCollection 2022 [pt_PT]
dc.identifier.doi: 10.1371/journal.pone.0269713 [pt_PT]
dc.identifier.issn: 1932-6203
dc.identifier.uri: http://hdl.handle.net/10400.18/8384
dc.language.iso: eng [pt_PT]
dc.publisher: Public Library of Science [pt_PT]
dc.relation.publisherversion: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0269713 [pt_PT]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [pt_PT]
dc.subject: Familial Hypercholesterolemia [pt_PT]
dc.subject: Diagnosis [pt_PT]
dc.subject: FH Diagnosis [pt_PT]
dc.subject: FH study [pt_PT]
dc.subject: Portuguese FH study [pt_PT]
dc.subject: Doenças Cardio e Cérebro-vasculares (Cardiovascular and Cerebrovascular Diseases) [pt_PT]
dc.subject: Portugal [pt_PT]
dc.title: Comparative study on the performance of different classification algorithms, combined with pre- and post-processing techniques to handle imbalanced data, in the diagnosis of adult patients with familial hypercholesterolemia [pt_PT]
dc.type: journal article
dspace.entity.type: Publication
oaire.citation.issue: 6 [pt_PT]
oaire.citation.startPage: e0269713 [pt_PT]
oaire.citation.title: PLoS ONE [pt_PT]
oaire.citation.volume: 17 [pt_PT]
rcaap.embargofct: Access in accordance with the journal's editorial policy. [pt_PT]
rcaap.rights: openAccess [pt_PT]
rcaap.type: article [pt_PT]
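The abstract names three concrete ingredients: SMOTE oversampling (pre-processing), cut-off adjustment by maximizing the Youden index J = Sens + Spec - 1 (post-processing), and imbalance-aware operating characteristics such as G-mean and F1. The sketch below illustrates them in dependency-free Python. It is not the authors' code: the function names (confusion, operating_characteristics, youden_threshold, smote) and the toy labels, scores and feature vectors are hypothetical, and the SMOTE shown is only the basic interpolation scheme. A real analysis would more likely rely on a library implementation (for example SMOTE from imbalanced-learn, or scikit-learn's roc_curve to enumerate candidate thresholds), and, within a repeated k-fold design, would apply oversampling to the training folds only so that synthetic points do not leak into the evaluation fold.

```python
import math
import random


def confusion(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = FH, 0 = non-FH)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def operating_characteristics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, G-mean and F1 from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    acc = (tp + tn) / len(y_true)
    g_mean = math.sqrt(sens * spec)  # geometric mean of the two per-class recalls
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"sens": sens, "spec": spec, "acc": acc, "g_mean": g_mean, "f1": f1}


def youden_threshold(y_true, scores):
    """Post-processing: pick the cut-off that maximizes J = Sens + Spec - 1.

    A case is predicted positive when its score is >= the cut-off; every
    distinct score is tried as a candidate cut-off.
    """
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        oc = operating_characteristics(y_true, [1 if s >= t else 0 for s in scores])
        j = oc["sens"] + oc["spec"] - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def smote(minority, n_new, k=5, seed=0):
    """Pre-processing: basic SMOTE on minority-class feature vectors.

    Each synthetic point lies on the segment between a randomly chosen minority
    point and one of its k nearest minority neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic


# Hypothetical toy data: 1 = FH, 0 = non-FH; scores as if from a fitted classifier.
y = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.40, 0.80]

default = operating_characteristics(y, [1 if s >= 0.5 else 0 for s in scores])
cut, j = youden_threshold(y, scores)
adjusted = operating_characteristics(y, [1 if s >= cut else 0 for s in scores])
print(cut, default["sens"], adjusted["sens"])  # 0.4 0.5 1.0

# Hypothetical minority-class feature vectors (two features each).
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.5)]
print(smote(minority, n_new=4, k=2))
```

On these toy numbers the default 0.5 cut-off misses one of the two positive cases, while the Youden cut-off of 0.4 recovers it at the cost of the same single false positive. This only illustrates the mechanism behind the sensitivity gain the abstract reports for cut-off adjustment; it does not reproduce the study's data or results.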

Files

Original bundle
Name: Comparative study on the performance of different classification algorithms combined with pre- and post-processing techniques to handle imbala.pdf
Size: 1.31 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission