Publication

Bioinformatics toolbox for comparative clustering evaluation of Whole-Genome Sequencing (WGS) pipelines for bacteria routine surveillance

dc.contributor.advisor: Mixão, Verónica de Pinho
dc.contributor.advisor: Couto, Francisco José Moreira
dc.contributor.author: Pereira, Joana Vanessa Gomes
dc.date.accessioned: 2026-03-04T13:21:15Z
dc.date.available: 2026-03-04T13:21:15Z
dc.date.issued: 2025-12-19
dc.description: Master's dissertation in Bioinformatics and Computational Biology, presented to the Faculdade de Ciências, Universidade de Lisboa, 2025. http://hdl.handle.net/10400.5/117273
dc.description.abstract: Whole-Genome Sequencing (WGS) provides higher resolution than traditional typing for distinguishing closely related isolates. As a result, disease surveillance increasingly adopts WGS, with international agencies recommending its use in reference laboratories. However, the heterogeneity of workflows and unequal resources raise concerns about inter-laboratory result comparability and, consequently, data sharing and communication. To address these issues, this thesis project developed EvalTree, a Python-based command-line tool to compare clustering results from two typing solutions, including traditional and genome-scale approaches, assessing their congruence at all possible resolution levels. EvalTree accepts two input folders or clustering files, processes them, and produces multiple outputs, including a user-friendly HTML report. When a folder generated by ReporTree, a tool to identify genetic clusters at all possible distance thresholds, is provided as input, EvalTree enables not only inter-pipeline clustering comparison but also the detection of stable clustering regions, cluster characterization using metadata, and assessment of outbreak signal overlap. EvalTree was validated and benchmarked using a large (2946 isolates) and diverse dataset of Salmonella enterica, showing that it accurately reproduces a recently published large-scale evaluation of inter-pipeline congruence at the European level. Its running time was mainly affected by dataset diversity rather than size. To further demonstrate its applicability, EvalTree supported the implementation of the S. enterica genomic surveillance pipeline at the Portuguese National Institute of Health (INSA), by comparing its performance with that of the European Food Safety Authority (EFSA), revealing high cluster congruence and similar resolution power.
In summary, EvalTree is a novel bioinformatics tool (available through conda installation) that offers a practical, flexible solution to evaluate cluster congruence between the pipelines of different laboratories, supporting inter-laboratory communication in a One Health framework. It also promotes the long-term sustainability of any pipeline by enabling informed decision-making throughout its life-cycle (e.g., evaluating software updates). [eng]
dc.identifier.tid: 204173167
dc.identifier.uri: http://hdl.handle.net/10400.18/11121
dc.language.iso: eng
dc.peerreviewed: n/a
dc.rights.uri: N/A
dc.subject: EvalTree
dc.subject: Clustering Congruence
dc.subject: Genomic Surveillance
dc.subject: Whole-Genome Sequencing
dc.subject: Outbreaks
dc.subject: Vigilância Genómica
dc.subject: Sequenciação de Genoma Inteiro
dc.subject: Congruência de Pipelines
dc.subject: Surtos
dc.title: Bioinformatics toolbox for comparative clustering evaluation of Whole-Genome Sequencing (WGS) pipelines for bacteria routine surveillance [eng]
dc.type: master thesis
dspace.entity.type: Publication
oaire.version: http://purl.org/coar/version/c_b1a7d7d4d402bcce
thesis.degree.name: Mestrado em Bioinformática e Biologia Computacional

Files

Main
Name: TM_Joana_Pereira.pdf
Size: 3.5 MB
Format: Adobe Portable Document Format
License
Name: license.txt
Size: 4.03 KB
Format: Item-specific license agreed upon to submission