Publication

Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology

dc.contributor.author: Asif, M.
dc.contributor.author: Martiniano, H.F.M.C.M.
dc.contributor.author: Vicente, A.M.
dc.contributor.author: Couto, F.M.
dc.date.accessioned: 2019-02-22T17:36:45Z
dc.date.available: 2019-02-22T17:36:45Z
dc.date.issued: 2018-12-10
dc.description.abstract: Identifying disease genes from a vast amount of genetic data is one of the most challenging tasks in the post-genomic era. Complex diseases also present highly heterogeneous genotypes, which makes biological marker identification difficult. Machine learning methods are widely used to identify these markers, but their performance is highly dependent upon the size and quality of available data. In this study, we demonstrated that machine learning classifiers trained on gene functional similarities, using Gene Ontology (GO), can improve the identification of genes involved in complex diseases. For this purpose, we developed a supervised machine learning methodology to predict complex disease genes. The proposed pipeline was assessed using Autism Spectrum Disorder (ASD) candidate genes. A quantitative measure of gene functional similarities was obtained by employing different semantic similarity measures. To infer the hidden functional similarities between ASD genes, various types of machine learning classifiers were built on quantitative semantic similarity matrices of ASD and non-ASD genes. The classifiers trained and tested on ASD and non-ASD gene functional similarities outperformed previously reported ASD classifiers. For example, a Random Forest (RF) classifier achieved an AUC of 0.80 for predicting new ASD genes, which was higher than that of the previously reported classifier (0.73). Additionally, this classifier was able to predict 73 novel ASD candidate genes that were enriched for core ASD phenotypes, such as autism and obsessive-compulsive behavior. The predicted genes were also enriched for ASD co-occurring conditions, including Attention Deficit Hyperactivity Disorder (ADHD). We also developed a KNIME workflow with the proposed methodology, which allows users to configure and execute it without requiring machine learning and programming skills.
Machine learning is an effective and reliable technique for deciphering ASD mechanisms by identifying novel disease genes, and this study further demonstrated that its performance can be improved by incorporating a quantitative measure of gene functional similarities. Source code and the workflow of the proposed methodology are available at https://github.com/Muh-Asif/ASD-genes-prediction. [pt_PT]
dc.description.sponsorship: This work was supported by the Portuguese Fundação para a Ciência e Tecnologia (SFRH/BD/52485/2014 to MA and DeST: Deep Semantic Tagger PTDC/CCI-BIO/28685/2017). [pt_PT]
dc.description.version: info:eu-repo/semantics/publishedVersion [pt_PT]
dc.identifier.citation: PLoS One. 2018 Dec 10;13(12):e0208626. doi: 10.1371/journal.pone.0208626. eCollection 2018 [pt_PT]
dc.identifier.doi: 10.1371/journal.pone.0208626 [pt_PT]
dc.identifier.issn: 1932-6203
dc.identifier.uri: http://hdl.handle.net/10400.18/5930
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.publisher: Public Library of Science [pt_PT]
dc.relation.publisherversion: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0208626 [pt_PT]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [pt_PT]
dc.subject: Autism Spectrum Disorder [pt_PT]
dc.subject: ASD genes [pt_PT]
dc.subject: Autism [pt_PT]
dc.subject: Child Developmental Disorders and Mental Health
dc.title: Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology [pt_PT]
dc.type: journal article
dspace.entity.type: Publication
oaire.citation.issue: 12 [pt_PT]
oaire.citation.startPage: e0208626 [pt_PT]
oaire.citation.title: PLoS ONE [pt_PT]
oaire.citation.volume: 13 [pt_PT]
rcaap.rights: openAccess [pt_PT]
rcaap.type: article [pt_PT]
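
The abstract describes building classifiers on gene functional similarity matrices and evaluating them by AUC. The following is only a minimal illustrative sketch of that general idea, not the authors' pipeline (their code is at the GitHub URL given in the abstract). Everything in it is an assumption made for illustration: the gene names and annotation terms are hypothetical toy data, Jaccard overlap of annotation sets stands in for the GO semantic similarity measures, and a simple "mean similarity to known disease genes" score stands in for the Random Forest classifier.

```python
def jaccard(a, b):
    """Jaccard overlap of two annotation sets (a stand-in for a GO semantic similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(annotations, genes):
    """Pairwise functional similarity between all genes, as a list of rows."""
    return [[jaccard(annotations[g], annotations[h]) for h in genes] for g in genes]


def auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: gene -> set of annotation terms, and disease labels (1 = disease gene).
annotations = {
    "geneA": {"t1", "t2", "t3"},
    "geneB": {"t1", "t2"},
    "geneC": {"t2", "t3", "t4"},
    "geneD": {"t7", "t8"},
    "geneE": {"t8", "t9"},
    "geneF": {"t1", "t9"},
}
genes = list(annotations)
labels = [1, 1, 1, 0, 0, 0]

sim = similarity_matrix(annotations, genes)

# Score each gene by its mean similarity to the known disease genes,
# leaving the gene itself out so a positive is never compared with itself.
scores = []
for i in range(len(genes)):
    others = [sim[i][j] for j in range(len(genes)) if labels[j] == 1 and j != i]
    scores.append(sum(others) / len(others))

toy_auc = auc(labels, scores)
```

On real data, each row of such a similarity matrix would typically serve as the feature vector for a gene, and the scoring step would be replaced by a trained classifier evaluated with cross-validation.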

Files

Original bundle
Name: Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology.pdf
Size: 1.08 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission