DPSPDNT - Teses de doutoramento
Browsing DPSPDNT - Teses de doutoramento by advisor "Moreira Couto, Francisco José"
- A systems medicine approach to study Autism Spectrum Disorder based on genomic and clinical data
Publication. Asif, Muhammad; Moura Vicente, Astrid; Moreira Couto, Francisco José
Autism Spectrum Disorder (ASD) is characterized by highly heterogeneous clinical phenotypes and complex genetic architecture, rendering ASD difficult to diagnose, particularly in very young children. While many genetic factors are implicated in ASD, the architecture of genotype/phenotype correlations is still very unclear. This work aimed at delineating ASD etiology by analyzing patients’ genetic and clinical data, and functional annotation data, using integrative systems biology approaches. Specifically, the objectives of this thesis were to identify the biological mechanisms underlying ASD that are disrupted by rare variants in patients, and then to find their associations with the ASD phenotype, as defined by analysis of patients’ clinical outcomes. The significance of the parental phenotype for ASD etiology models was also studied in this work. In the second chapter, to correctly infer biological meaning from a large number of putative disease-causing genetic variants, a systematic functional annotation pipeline, called Functional annotation of Variants (FunVar), was proposed. The developed pipeline was applied to Copy Number Variants (CNVs) from ASD patients. Results showed that rare CNVs spanning brain genes disrupted a wide range of biological processes (N = 98), including nervous system development and protein polyubiquitination. To minimize the misinterpretation of results, 33 highly similar biological process terms were grouped. For this purpose, a semantic similarity measure was employed to assess functional similarity between terms.
Most of the identified biological processes dysregulated by rare CNVs disrupting brain genes had previously been implicated in ASD, thus indicating the usefulness of the FunVar pipeline in interpreting the biological role of genetic variants in disease development. To predict the clinical outcome from biological processes defined by rare CNVs in ASD subjects, a novel machine learning-based integrative systems biology approach was developed. Agglomerative Hierarchical Clustering was used to identify ASD phenotypic subgroups from the clinical reports of a large population sample of ASD patients. Analysis of multidimensional clinical data identified two distinct phenotypic clusters that differed in overall adaptive behaviour profiles, verbal status, severity and cognitive abilities, defining a milder and a more severe phenotype. Functional enrichment analysis of rare CNVs targeting brain genes in the same patients, using the FunVar pipeline, identified 15 statistically significant biological processes, generally consistent with reported literature for ASD. Random Forest feature importance analysis showed that all these biological processes contributed positively to the classification of ASD phenotype, as defined by the identified clusters. The top two biological processes (regulation of cellular component organization and cell projection organization), which contributed most to discriminating between the milder and the more severe ASD phenotypes, were previously implicated in ASD. To predict phenotypic subgroups of patients from biological processes disrupted by rare CNVs in brain genes, a Naive Bayes machine learning classifier was trained and tested on the clustered patient and disrupted biological processes datasets.
For a subset of individuals that had higher Gene Ontology (GO) term information content, the Naive Bayes classifier was able to make predictions of the severe clinical outcome from biological processes defined by genetic alterations, with good precision but low sensitivity. This study showed that genotype-phenotype correlations can be established in ASD and that ASD phenotype predictions can be made from biological processes putatively disrupted by brain-gene CNVs. However, improved GO annotations and larger datasets will be needed for generalized predictions that can be translated into the clinic. In chapter 4, to predict novel disease genes, a supervised machine learning-based approach was developed. The proposed approach first computes GO-based functional similarities among genes, using semantic similarity measures, for any given set of disease-associated and non-associated genes. Multiple machine learning classifiers can be built on the calculated functional similarities of genes to find hidden associations between disease-causing and non-disease-causing genes. The traced hidden associations are then used to predict new disease genes. The developed approach was applied to known ASD genes, obtained from the SFARI ASD gene database, to predict new ASD genes. Machine learning classifiers trained and tested on the calculated functional similarities of ASD genes outperformed the existing state-of-the-art method. A classifier built on functional similarities of high-confidence ASD and non-ASD genes showed improved performance over the reported classifier. Moreover, we provided an easy-to-use workflow of the methodology, which was made available to the research community to efficiently identify new disease genes.
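As an illustration of the first step of the chapter 4 approach (GO-based functional similarity between genes), the following is a minimal sketch only. The abstract does not state which semantic similarity measure was used, so the Resnik measure combined with a best-match average is an assumption here, and the ontology terms, gene names and annotations are hypothetical toy data, not real GO identifiers or SFARI genes.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms (not real GO IDs).
PARENTS = {
    "root": [],
    "development": ["root"],
    "neuro_dev": ["development"],
    "metabolism": ["root"],
    "ubiquitination": ["metabolism"],
}

# Hypothetical gene annotations (assumed for illustration only).
ANNOTATIONS = {
    "geneA": {"neuro_dev"},
    "geneB": {"neuro_dev", "ubiquitination"},
    "geneC": {"ubiquitination"},
    "geneD": {"development"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the toy ontology."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) is the fraction of genes annotated
    to t directly or through one of its descendants."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(term1, term2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term1) & ancestors(term2)
    return max(IC[t] for t in common)


def gene_similarity(gene1, gene2):
    """Best-match average of term similarities between two genes."""
    a, b = ANNOTATIONS[gene1], ANNOTATIONS[gene2]
    best_a = [max(resnik(x, y) for y in b) for x in a]
    best_b = [max(resnik(y, x) for x in a) for y in b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))
```

In a workflow of the kind the abstract describes, similarity values like these, computed between a candidate gene and sets of known disease and non-disease genes, could serve as the input features for a supervised classifier.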
Finally, in chapter 5, to elucidate the significance of the parental Broad Autism Phenotype (BAP) for ASD etiology models, parental phenotypic profiles, assessed using the Social Responsiveness Scale (SRS) and the Broad Autism Phenotype Questionnaire (BAPQ), and CNV inheritance information from ASD children were investigated. Analysis showed that parents of ASD children from this dataset present BAP traits at lower rates than previously reported, and that mothers and fathers have distinct profiles. There was no correlation between SRS scores from ASD children and their parents. Spousal pairs were weakly correlated on SRS scores, indicating the phenomenon of assortative mating. Lastly, no evidence was found for the transmission of parental BAP traits to their children through CNVs disrupting putative ASD genes. However, putative ASD genes in databases are mainly evidenced by studies focusing on rare variants, including CNVs, which have lower heritability rates. As common variants account for the largest proportion of ASD liability, future studies are needed to assess their role in the transmission of the parental BAP. This work developed integrative systems medicine methods, based on machine learning and semantic similarity analysis, to improve the identification of biological processes from genetic variants in a genetically complex, clinically heterogeneous disorder. Overall, findings from applying these methods indicate that complex genotype-phenotype correlations can be established in ASD. Furthermore, clinical subgroups, defined by clustering patients based on multidimensional clinical profiles, can be predicted from the biological characterization of genetic variants. The correct identification of disrupted biological processes, associated with phenotypically distinct subgroups of patients, will be important for early detection and prognosis, which have implications for early intervention and for the discovery of potential therapeutic targets for ASD.
