Research Project
Centre of Statistics and its Applications
Publications
Single versus Multiple Imputation Methods Applied to Classify Dyslipidemic Patients Concerning Statin Usage: a Comparative Performance Study
Publication . Albuquerque, João; Alves, Ana C.; Medeiros, Ana M.; Bourbon, Mafalda; Antunes, Marília
Introduction: One of the greatest challenges when working with clinical datasets is deciding how to deal with missing
values. Removing observations with any missing values prior to data analysis, a process known as listwise
deletion, is the default procedure in most statistical software packages, but may lead to a great loss
of valuable information [1]. The use of robust imputation methods may provide accurate estimates for
missing values, allowing these observations to be included in the analysis. The imputation strategy to adopt
depends on the amount and type of missing information, and also on the relation between variables, combining
statistical expertise with clinical understanding of the data. The main purpose of this work was to compare
the performance of two different methods of imputation to overcome missingness in data on dyslipidemic patients
regarding statin usage.
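
The contrast between listwise deletion, single imputation and multiple imputation can be sketched on toy data. This is a minimal illustration only: the abstract does not name the imputation methods that were compared, so the records, the column-mean fill and the random-draw multiple imputation below are all assumptions chosen for brevity, not the authors' procedure.

```python
import random
from statistics import mean

# Hypothetical toy records: (LDL-c in mg/dL, age); None marks a missing value.
records = [(190, 45), (None, 52), (160, 38), (210, None), (175, 60), (None, 41)]

def listwise_deletion(rows):
    """Keep only the rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r)]

def mean_impute(rows):
    """Single imputation: replace each missing value with its column mean."""
    cols = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in cols]
    return [tuple(m if v is None else v for v, m in zip(r, col_means)) for r in rows]

def multiple_impute_mean(rows, col, m=5, seed=0):
    """Crude multiple imputation: fill the gaps in one column with random draws
    from its observed values, m times, then pool the m column means by averaging."""
    rng = random.Random(seed)
    observed = [r[col] for r in rows if r[col] is not None]
    estimates = []
    for _ in range(m):
        filled = [r[col] if r[col] is not None else rng.choice(observed) for r in rows]
        estimates.append(mean(filled))
    return mean(estimates)

complete = listwise_deletion(records)   # half of the six records are lost
imputed = mean_impute(records)          # all six records are retained
pooled_ldl = multiple_impute_mean(records, col=0)
```

Listwise deletion discards every record with a gap, while both imputation variants keep all six; the multiple-imputation variant additionally reflects that the filled-in values are uncertain by averaging over several plausible completions.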
Generation and validation of a classification model to diagnose familial hypercholesterolaemia in adults
Publication . Albuquerque, João; Medeiros, Ana Margarida; Alves, Ana Catarina; Jannes, Cinthia Elim; Mancina, Rosellina M.; Pavanello, Chiara; Chora, Joana Rita; Mombelli, Giuliana; Calabresi, Laura; Pereira, Alexandre da Costa; Krieger, José Eduardo; Romeo, Stefano; Bourbon, Mafalda; Antunes, Marília
Background and aims: The early diagnosis of familial hypercholesterolaemia is associated with a significant
reduction in cardiovascular disease (CVD) risk. While the recent use of statistical and machine learning algorithms
has shown promising results in comparison with traditional clinical criteria, when applied to screening of
potential FH cases in large cohorts, most studies in this field are developed using a single cohort of patients,
which may hamper the application of such algorithms to other populations. In the current study, a logistic
regression (LR) based algorithm was developed combining observations from three different national FH cohorts,
from Portugal, Brazil and Sweden. Independent samples from these cohorts were then used to test the model, as
well as an external dataset from Italy.
Methods: The areas under the receiver operating characteristic (AUROC) and precision-recall (AUPRC) curves
were used to assess the discriminatory ability among the different samples. Comparisons between the LR model
and Dutch Lipid Clinic Network (DLCN) clinical criteria were performed by means of McNemar tests, and by the
calculation of several operating characteristics.
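
As a rough illustration of the two curve-based metrics named in the Methods, the sketch below computes AUROC through its rank interpretation and approximates AUPRC by average precision. The labels and scores are invented, and the abstract does not state how the authors estimated either area, so both estimators are assumptions.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive case outscores a random
    negative case (ties count one half); equivalent to the Mann-Whitney U statistic."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve:
    the mean of the precision values at the rank of each positive case.
    Assumes there are no tied scores."""
    ranked = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    tp, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank
    return total / n_pos

# Hypothetical predictions: 1 = FH case, 0 = non-FH case.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

roc_area = auroc(y_true, y_score)
pr_area = average_precision(y_true, y_score)
```

AUPRC is often reported alongside AUROC when positive cases are rare, because precision is sensitive to the number of false positives in a way the ROC curve is not.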
Results: AUROC and AUPRC values were generally higher for all testing sets when compared to the training set.
Compared with DLCN criteria, a significantly higher number of correctly classified observations were identified
for the Brazilian (p < 0.01), Swedish (p < 0.01), and Italian testing sets (p < 0.01). Higher accuracy (Acc), G-mean
and F1 score values were also observed for all testing sets.
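
A McNemar test compares two classifiers on the same subjects using only the discordant pairs, that is, the cases one method classified correctly and the other did not. The sketch below uses the exact binomial form; the abstract does not say whether the exact or the chi-squared version was applied, and the counts are hypothetical.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar test on the discordant pairs.
    b: cases only method A classified correctly.
    c: cases only method B classified correctly.
    Under the null hypothesis, b follows Binomial(b + c, 0.5)."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts: 15 cases correct only under the LR model,
# 3 cases correct only under the clinical criteria.
p_value = mcnemar_exact(15, 3)
```

Cases that both methods classify correctly, or both misclassify, carry no information about which method is better and do not enter the test.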
Conclusions: Compared to DLCN criteria, the LR model revealed improved ability to correctly classify observations,
and was able to retain a similar number of FH cases, with less false positive retention. Generalization of the
LR model was very good across all testing samples, suggesting it can be an effective screening tool if applied to
different populations.
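
The operating characteristics mentioned above (accuracy, G-mean, F1 score) all follow from the four cells of a confusion matrix. The sketch below uses the standard definitions; the counts are invented for illustration and are not taken from the study.

```python
from math import sqrt

def operating_characteristics(tp, fp, tn, fn):
    """Standard operating characteristics from confusion-matrix counts."""
    sens = tp / (tp + fn)   # sensitivity (recall)
    spec = tn / (tn + fp)   # specificity
    ppv = tp / (tp + fp)    # positive predictive value (precision)
    npv = tn / (tn + fn)    # negative predictive value
    return {
        "Acc": (tp + tn) / (tp + fp + tn + fn),
        "Sens": sens,
        "Spec": spec,
        "PPV": ppv,
        "NPV": npv,
        "G-mean": sqrt(sens * spec),            # geometric mean of Sens and Spec
        "F1": 2 * ppv * sens / (ppv + sens),    # harmonic mean of PPV and Sens
    }

# Hypothetical counts for a screening rule applied to 100 subjects.
oc = operating_characteristics(tp=40, fp=10, tn=45, fn=5)
```

Fewer false positives at a similar number of retained true cases raises Spec, PPV and F1 while leaving Sens roughly unchanged, which matches the pattern described in the Conclusions.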
Performance comparison of different classification algorithms applied to the diagnosis of familial hypercholesterolemia in paediatric subjects
Publication . Albuquerque, João; Medeiros, Ana Margarida; Alves, Ana Catarina; Bourbon, Mafalda; Antunes, Marília
Familial Hypercholesterolemia (FH) is an inherited disorder of lipid metabolism, characterized by increased low density lipoprotein cholesterol (LDLc) levels. The main purpose of the current work was to explore alternative classification methods to traditional clinical criteria for FH diagnosis, based on several biochemical and biological indicators. Logistic regression (LR), decision tree (DT), random forest (RF) and naive Bayes (NB) algorithms were developed for this purpose, and thresholds were optimized by maximization of Youden index (YI). All models presented similar accuracy (Acc), specificity (Spec) and positive predictive values (PPV). Sensitivity (Sens) and G-mean values were significantly higher in LR and RF models, compared to the DT. When compared to Simon Broome (SB) biochemical criteria for FH diagnosis, all models presented significantly higher Acc, Spec and G-mean values (p < 0.01), and lower negative predictive value (NPV, p < 0.05). Moreover, LR and RF models presented comparable Sens values. Adjustment of the cut-off point by maximizing YI significantly increased Sens values, with no significant loss in Acc. The obtained results suggest such classification algorithms can be a viable alternative to be used as a widespread screening method. An online application has been developed to assess the performance of the LR model in a wider population.
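
Threshold optimization by the Youden index picks the cut-off that maximizes J = Sens + Spec - 1. The following is a minimal sketch of that search over the observed scores; the labels and predicted probabilities are hypothetical, and the function name is an illustrative choice rather than anything taken from the publication.

```python
def youden_threshold(labels, scores):
    """Return (threshold, J) maximizing J = Sens + Spec - 1, where a case is
    classified as positive when its score is >= threshold.
    Ties in J keep the lowest threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical model outputs: 1 = FH case, 0 = non-FH case.
y_true = [0, 0, 0, 1, 1, 0, 1, 1]
y_prob = [0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.65, 0.90]

threshold, j_stat = youden_threshold(y_true, y_prob)
```

In this toy example the default 0.5 cut-off misses the positive case scored 0.40, whereas the Youden-optimal cut-off captures it, illustrating how adjusting the threshold can raise sensitivity.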
Development and Validation of Screening Methods Applied to Familial Hypercholesterolemia Diagnosis
Publication . Albuquerque, João; Antunes, Marília; Bourbon, Mafalda; Soares, Raquel
Familial hypercholesterolemia (FH) is an inherited disorder of lipid metabolism, characterized
by increased low density lipoprotein cholesterol (LDLc) levels. If untreated, the severe dyslipidemia
from birth leads to the early development of atherosclerosis, representing a major risk factor for
cardiovascular disease (CVD). The early diagnosis of FH is associated with a significant reduction
in CVD risk, supporting the introduction of risk mitigation strategies, such as cascade screening of
first-degree relatives, and adequate lipid lowering therapy (LLT) as precociously as possible. The
importance of genetic testing is emphasized by evidence that individuals with a confirmed pathogenic
variant possess a significant increase in the risk of CVD when compared to subjects with FH-like
phenotype for whom a causative variant is not detected. Nevertheless, molecular testing is still
not available as a first-line diagnosis tool, and previous selection and stratification of subjects to
undergo this procedure should be made. Currently used clinical criteria, typically based on LDLc
levels, family history of hypercholesterolemia and/ or premature CVD and presence of physical signs
like tendon xanthomas, present the limitation of retaining a high number of false positive cases. This
may constitute a heavy burden in terms of healthcare costs, and limits the access to the genetic study
of a larger universe of true FH cases.
The main purpose of this work was to develop alternative classification methods for FH diagnosis,
based on different biochemical and clinical indicators, with improved ability to screen for FH cases
in comparison to traditional clinical criteria. The metrics used for comparison range from the areas
under the receiver operating characteristics (AUROC) and precision-recall (AUPRC) curves, to
several operating characteristics (OC), to agreement tests, among others.
Funders
Funding agency
Fundação para a Ciência e a Tecnologia
Funding programme
6817 - DCRRNI ID
Funding Award Number
UID/MAT/00006/2019
