Research Project
Dyslipidaemia stratification : new screening tools for a cost effective approach
Publications
Single versus Multiple Imputation Methods Applied to Classify Dyslipidemic Patients Concerning Statin Usage: a Comparative Performance Study
Publication . Albuquerque, João; Alves, Ana C.; Medeiros, Ana M.; Bourbon, Mafalda; Antunes, Marília
Introduction: One of the greatest challenges when working with clinical datasets is to decide how to deal with missing
values. Removing observations with any missing values prior to data analysis, a process defined as listwise
deletion, is the default procedure in most statistical software packages, but may lead to a great loss
of valuable information [1]. The use of robust imputation methods may provide accurate estimates for
missing values, allowing these observations to be included in the analysis. The imputation strategy to adopt
depends on the amount and type of missing information, and also on the relation between variables, combining
statistical expertise with clinical understanding of the data. The main purpose of this work was to compare
the performance of two different imputation methods for handling missing data in dyslipidemic patients
regarding statin usage.
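The contrast between listwise deletion and imputation described above can be illustrated with a minimal sketch. It is not the authors' method: the toy records, the variable choice (LDL-c and age) and the helper names `listwise_deletion` and `mean_imputation` are assumptions made for illustration, and column-mean replacement stands in for single imputation in its simplest form.

```python
from statistics import mean

# Hypothetical toy records: (LDL-c in mg/dL, age in years); None marks a missing value.
records = [
    (190.0, 45),
    (None, 52),
    (160.0, None),
    (210.0, 38),
]

def listwise_deletion(rows):
    """Keep only the rows that have no missing values."""
    return [row for row in rows if all(v is not None for v in row)]

def mean_imputation(rows):
    """Single imputation: replace each missing value with its column mean."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        tuple(col_means[j] if v is None else v for j, v in enumerate(row))
        for row in rows
    ]

complete = listwise_deletion(records)  # half of the observations are discarded
imputed = mean_imputation(records)     # all observations are retained
```

Listwise deletion keeps two of the four toy records, whereas imputation keeps all four. A multiple imputation approach would typically generate several completed datasets with random draws and pool the results, which this sketch does not attempt.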
Generation and validation of a classification model to diagnose familial hypercholesterolaemia in adults
Publication . Albuquerque, João; Medeiros, Ana Margarida; Alves, Ana Catarina; Jannes, Cinthia Elim; Mancina, Rosellina M.; Pavanello, Chiara; Chora, Joana Rita; Mombelli, Giuliana; Calabresi, Laura; Pereira, Alexandre da Costa; Krieger, José Eduardo; Romeo, Stefano; Bourbon, Mafalda; Antunes, Marília
Background and aims: The early diagnosis of familial hypercholesterolaemia (FH) is associated with a significant
reduction in cardiovascular disease (CVD) risk. While the recent use of statistical and machine learning algorithms
has shown promising results in comparison with traditional clinical criteria, when applied to screening of
potential FH cases in large cohorts, most studies in this field are developed using a single cohort of patients,
which may hamper the application of such algorithms to other populations. In the current study, a logistic
regression (LR) based algorithm was developed combining observations from three different national FH cohorts,
from Portugal, Brazil and Sweden. Independent samples from these cohorts were then used to test the model, as
well as an external dataset from Italy.
Methods: The areas under the receiver operating characteristic (AUROC) and precision-recall (AUPRC) curves
were used to assess the discriminatory ability among the different samples. Comparisons between the LR model
and Dutch Lipid Clinic Network (DLCN) clinical criteria were performed by means of McNemar tests, and by the
calculation of several operating characteristics.
Results: AUROC and AUPRC values were generally higher for all testing sets when compared to the training set.
Compared with DLCN criteria, a significantly higher number of correctly classified observations were identified
for the Brazilian (p < 0.01), Swedish (p < 0.01), and Italian testing sets (p < 0.01). Higher accuracy (Acc),
G-mean and F1 score values were also observed for all testing sets.
Conclusions: Compared to DLCN criteria, the LR model revealed improved ability to correctly classify observations,
and was able to retain a similar number of FH cases, with fewer false positives retained. Generalization of the
LR model was very good across all testing samples, suggesting it can be an effective screening tool if applied to
different populations.
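The operating characteristics and the paired comparison mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, and the exact (binomial) form of the McNemar test is assumed because the abstract does not say which variant was used.

```python
from math import comb, sqrt

def operating_characteristics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, G-mean and F1 from confusion-matrix counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    return {
        "acc": (tp + tn) / (tp + fp + tn + fn),
        "sens": sens,
        "spec": spec,
        "g_mean": sqrt(sens * spec),          # geometric mean of Sens and Spec
        "f1": 2 * ppv * sens / (ppv + sens),  # harmonic mean of PPV and Sens
    }

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: cases classified correctly by method A only
    c: cases classified correctly by method B only
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts, for illustration only.
metrics = operating_characteristics(tp=40, fp=10, tn=40, fn=10)
p_value = mcnemar_exact(b=0, c=10)
```

Only the discordant pairs (cases where exactly one of the two classifiers is correct) enter the McNemar test, which is why it suits a comparison of two methods applied to the same patients.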
Performance comparison of different classification algorithms applied to the diagnosis of familial hypercholesterolemia in paediatric subjects
Publication . Albuquerque, João; Medeiros, Ana Margarida; Alves, Ana Catarina; Bourbon, Mafalda; Antunes, Marília
Familial Hypercholesterolemia (FH) is an inherited disorder of lipid metabolism, characterized by increased low-density lipoprotein cholesterol (LDLc) levels. The main purpose of the current work was to explore alternative classification methods to traditional clinical criteria for FH diagnosis, based on several biochemical and biological indicators. Logistic regression (LR), decision tree (DT), random forest (RF) and naive Bayes (NB) algorithms were developed for this purpose, and thresholds were optimized by maximization of the Youden index (YI). All models presented similar accuracy (Acc), specificity (Spec) and positive predictive values (PPV). Sensitivity (Sens) and G-mean values were significantly higher in the LR and RF models, compared to the DT. When compared to Simon Broome (SB) biochemical criteria for FH diagnosis, all models presented significantly higher Acc, Spec and G-mean values (p < 0.01), and lower negative predictive value (NPV, p < 0.05). Moreover, LR and RF models presented comparable Sens values. Adjustment of the cut-off point by maximizing YI significantly increased Sens values, with no significant loss in Acc. The obtained results suggest such classification algorithms can be a viable alternative for use as a widespread screening method. An online application has been developed to assess the performance of the LR model in a wider population.
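The threshold optimization mentioned in this abstract relies on the Youden index, YI = Sens + Spec - 1. A minimal sketch of choosing a cut-off by maximizing YI is given below; the function name `youden_threshold`, the toy scores and the tie-breaking rule (lowest threshold wins) are assumptions for illustration and do not reproduce the authors' implementation.

```python
def youden_threshold(scores, labels):
    """Return (threshold, YI) maximizing the Youden index, YI = Sens + Spec - 1.

    A case is predicted positive when its score is >= threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical predicted probabilities and true labels (1 = FH, 0 = non-FH).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
threshold, yi = youden_threshold(scores, labels)
```

Because YI weights sensitivity and specificity equally, moving the cut-off away from the default 0.5 can raise sensitivity in imbalanced samples, which is consistent with the effect reported in the abstract.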
Funders
Funding agency
Fundação para a Ciência e a Tecnologia
Funding programme
3599-PPCDT
Funding Award Number
PTDC/SAU-SER/29180/2017
