Research Project

Centre of Statistics and its Applications

Authors

Publications

Single versus Multiple Imputation Methods Applied to Classify Dyslipidemic Patients Concerning Statin Usage: a Comparative Performance Study
Publication . Albuquerque, João; Alves, Ana C.; Medeiros, Ana M.; Bourbon, Mafalda; Antunes, Marília
Introduction: One of the greatest challenges when working with clinical datasets is to decide how to deal with missing values. Removing observations with any missing values prior to data analysis, a process defined as listwise deletion, is the standard default procedure in most statistical software packages, but may lead to great loss of valuable information [1]. The use of robust imputation methods may provide accurate estimates for missing values, allowing these observations to be included in the analysis. The imputation strategy to adopt depends on the amount and type of missing information, and also on the relation between variables, allying statistical expertise with clinical understanding of the data. The main purpose of this work was to compare the performance of two different methods of imputation to overcome missingness on dyslipidemic patients regarding statin usage.
Generation and validation of a classification model to diagnose familial hypercholesterolaemia in adults
Publication . Albuquerque, João; Medeiros, Ana Margarida; Alves, Ana Catarina; Jannes, Cinthia Elim; Mancina, Rosellina M.; Pavanello, Chiara; Chora, Joana Rita; Mombelli, Giuliana; Calabresi, Laura; Pereira, Alexandre da Costa; Krieger, José Eduardo; Romeo, Stefano; Bourbon, Mafalda; Antunes, Marília
Background and aims: The early diagnosis of familial hypercholesterolaemia is associated with a significant reduction in cardiovascular disease (CVD) risk. While the recent use of statistical and machine learning algorithms has shown promising results in comparison with traditional clinical criteria, when applied to screening of potential FH cases in large cohorts, most studies in this field are developed using a single cohort of patients, which may hamper the application of such algorithms to other populations. In the current study, a logistic regression (LR) based algorithm was developed combining observations from three different national FH cohorts, from Portugal, Brazil and Sweden. Independent samples from these cohorts were then used to test the model, as well as an external dataset from Italy. Methods: The area under the receiver operating characteristics (AUROC) and precision-recall (AUPRC) curves was used to assess the discriminatory ability among the different samples. Comparisons between the LR model and Dutch Lipid Clinic Network (DLCN) clinical criteria were performed by means of McNemar tests, and by the calculation of several operating characteristics. Results: AUROC and AUPRC values were generally higher for all testing sets when compared to the training set. Compared with DLCN criteria, a significantly higher number of correctly classified observations were identified for the Brazilian (p < 0.01), Swedish (p < 0.01), and Italian testing sets (p < 0.01). Higher accuracy (Acc), G mean and F1 score values were also observed for all testing sets. Conclusions: Compared to DLCN criteria, the LR model revealed improved ability to correctly classify observations, and was able to retain a similar number of FH cases, with less false positive retention. Generalization of the LR model was very good across all testing samples, suggesting it can be an effective screening tool if applied to different populations.
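The comparison by McNemar tests mentioned in this abstract can be illustrated with a minimal sketch. It assumes the exact (binomial) form of the test, since the abstract does not state which variant was used, and the function name and toy labels are hypothetical, not taken from the study.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test for two classifiers scored on the same cases.

    Returns (b, c, p_value), where b counts cases only classifier A gets right
    and c counts cases only classifier B gets right (the discordant pairs).
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        # No discordant pairs: the two classifiers never disagree in correctness.
        return b, c, 1.0
    k = min(b, c)
    # Under H0 each discordant pair favours A or B with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical toy data: 1 = FH, 0 = not FH.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
model = [1, 1, 1, 0, 0, 0, 1, 0]      # stands in for the LR model
criteria = [0, 0, 0, 1, 1, 0, 1, 0]   # stands in for the clinical criteria
b, c, p = mcnemar_exact(y_true, model, criteria)
print(b, c, p)
```

Only the discordant pairs enter the test, which is why it suits paired comparisons of two classification rules applied to the same observations.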
Performance comparison of different classification algorithms applied to the diagnosis of familial hypercholesterolemia in paediatric subjects
Publication . Albuquerque, João; Medeiros, Ana Margarida; Alves, Ana Catarina; Bourbon, Mafalda; Antunes, Marília
Familial Hypercholesterolemia (FH) is an inherited disorder of lipid metabolism, characterized by increased low density lipoprotein cholesterol (LDLc) levels. The main purpose of the current work was to explore alternative classification methods to traditional clinical criteria for FH diagnosis, based on several biochemical and biological indicators. Logistic regression (LR), decision tree (DT), random forest (RF) and naive Bayes (NB) algorithms were developed for this purpose, and thresholds were optimized by maximization of Youden index (YI). All models presented similar accuracy (Acc), specificity (Spec) and positive predictive values (PPV). Sensitivity (Sens) and G-mean values were significantly higher in LR and RF models, compared to the DT. When compared to Simon Broome (SB) biochemical criteria for FH diagnosis, all models presented significantly higher Acc, Spec and G-mean values (p < 0.01), and lower negative predictive value (NPV, p < 0.05). Moreover, LR and RF models presented comparable Sens values. Adjustment of the cut-off point by maximizing YI significantly increased Sens values, with no significant loss in Acc. The obtained results suggest such classification algorithms can be a viable alternative to be used as a widespread screening method. An online application has been developed to assess the performance of the LR model in a wider population.
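The threshold optimization by maximization of the Youden index described in this abstract can be sketched as follows. This is an illustrative example only: the function name, the scores and the labels are hypothetical, and the search over observed score values is an assumption, not the authors' procedure.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is classified positive when its score is >= threshold. Candidate
    thresholds are the distinct observed scores; ties keep the lowest one.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical predicted probabilities from a classifier (1 = FH, 0 = not FH).
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
print(youden_threshold(y_true, scores))
```

Moving the cut-off away from the conventional 0.5 in this way trades specificity for sensitivity, which matches the abstract's observation that the adjustment increased Sens values.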
Development and Validation of Screening Methods Applied to Familial Hypercholesterolemia Diagnosis
Publication . Albuquerque, João; Antunes, Marília; Bourbon, Mafalda; Soares, Raquel
Familial hypercholesterolemia (FH) is an inherited disorder of lipid metabolism, characterized by increased low density lipoprotein cholesterol (LDLc) levels. If untreated, the severe dyslipidemia from birth leads to the early development of atherosclerosis, representing a major risk factor for cardiovascular disease (CVD). The early diagnosis of FH is associated with a significant reduction in CVD risk, supporting the introduction of risk mitigation strategies, such as cascade screening of first degree relatives, and adequate lipid lowering therapy (LLT) as precociously as possible. The importance of genetic testing is emphasized by evidence that individuals with a confirmed pathogenic variant possess a significant increase in the risk of CVD when compared to subjects with FH-like phenotype for whom a causative variant is not detected. Nevertheless, molecular testing is still not available as a first line diagnosis tool, and previous selection and stratification of subjects to undergo this procedure should be made. Currently used clinical criteria, typically based on LDLc levels, family history of hypercholesterolemia and/or premature CVD and presence of physical signs like tendon xanthomas, present the limitation of retaining a high number of false positive cases. This may constitute a heavy burden in terms of healthcare costs, and limits the access to the genetic study of a larger universe of true FH cases. The main purpose of this work was to develop alternative classification methods for FH diagnosis, based on different biochemical and clinical indicators, with improved ability to screen for FH cases in comparison to traditional clinical criteria. The metrics used for comparison range from the areas under the receiver operating characteristics (AUROC) and precision-recall (AUPRC) curves, to several operating characteristics (OC), to agreement tests, among others.

Organizational Units

Description

Keywords

Contributors

Funders

Funding agency

Fundação para a Ciência e a Tecnologia

Funding programme

6817 - DCRRNI ID

Funding Award Number

UID/MAT/00006/2019

ID