Departamento de Promoção da Saúde e Prevenção de Doenças Não Transmissíveis
- Prediction of Genes Associated With Autism Spectrum Disorder Using Sequence and Graph Embedding Methods
Publication. Inácio, João Pedro da Silva; Martiniano, Hugo; Vicente, Astrid
Neurodevelopmental disorders impose a significant social and economic burden on individuals with these conditions and their families. Given that all neurodevelopmental disorders have a genetic component, identifying the risk genes for these disorders enhances our understanding of their etiology and can aid in the development of future screening methods and targeted therapies. Autism Spectrum Disorder (ASD) is a prototypical complex neurodevelopmental disorder characterized by high heritability and a heterogeneous genetic architecture and phenotypic presentation. This thesis presents a Machine Learning (ML) approach that improves upon state-of-the-art methods for ASD risk gene prediction. The approach combines publicly available ASD-associated genes with graph and sequence gene embeddings, which serve as input to supervised ML classifiers. Using 5-fold nested stratified cross-validation, the pipeline achieved an AUC of 0.90, an F1 of 0.82, and an MCC of 0.77. Additionally, the top decile of the ranked list of predicted risk genes generated by the model was significantly enriched for ASD phenotypes but not for other brain-specific disorders. The proposed pipeline improved on state-of-the-art approaches in predicting genes targeted by LOF mutations in the MSSNG and SCC studies. A functional network characterization of the top decile identified four distinct communities significantly enriched for biological pathways associated with ASD. Of the top 50 genes predicted by the pipeline, 37 were already present in ASD risk gene databases, while 13 were not yet linked to ASD.
The 13 genes were significantly enriched in the cerebral cortex and in telencephalon cell migration processes, which are critical for brain development and linked to neurodevelopmental disorders. This thesis provides an accurate comparison of embedding methods for risk gene discovery and improves existing ASD risk gene predictions, taking a step closer to a better understanding of this complex genetic disorder.
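
The abstract reports three evaluation metrics (AUC, F1, and MCC). As a reading aid, the following minimal sketch shows how these metrics can be computed from binary labels and classifier scores using only the Python standard library. It is not the thesis's code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when undefined."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = known risk gene, 0 = non-risk gene.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("AUC:", roc_auc(y_true, scores))
print("F1: ", f1_score(y_true, y_pred))
print("MCC:", mcc(y_true, y_pred))
```

AUC depends only on how the scores rank the genes, whereas F1 and MCC depend on the chosen threshold. MCC also takes true negatives into account, which makes it informative when risk genes are a small minority of all genes.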
