DDI - Master's dissertations
Browsing DDI - Master's dissertations by advisor "Couto, Francisco José Moreira"
- Bioinformatics toolbox for comparative clustering evaluation of Whole-Genome Sequencing (WGS) pipelines for bacteria routine surveillance. Pereira, Joana Vanessa Gomes; Mixão, Verónica de Pinho; Couto, Francisco José Moreira.
  Whole-Genome Sequencing (WGS) provides higher resolution than traditional typing methods for distinguishing closely related isolates. As a result, disease surveillance increasingly relies on WGS, and international agencies recommend its use in reference laboratories. However, heterogeneous workflows and unequal resources raise concerns about the comparability of results between laboratories and, consequently, about data sharing and communication. To address these issues, this thesis project developed EvalTree, a Python-based command-line tool that compares the clustering results of two typing solutions, including traditional and genome-scale approaches, and assesses their congruence at all possible resolution levels. EvalTree accepts two input folders or clustering files, processes them, and produces multiple outputs, including a user-friendly HTML report. When a folder generated by ReporTree, a tool that identifies genetic clusters at all possible distance thresholds, is provided as input, EvalTree supports not only inter-pipeline clustering comparison but also the detection of stable clustering regions, cluster characterization using metadata, and assessment of outbreak signal overlap. EvalTree was validated and benchmarked on a large (2,946 isolates) and diverse Salmonella enterica dataset, accurately reproducing a recently published large-scale evaluation of inter-pipeline congruence at the European level. Its running time depended mainly on dataset diversity rather than size. To further demonstrate its applicability, EvalTree supported the implementation of the S. enterica genomic surveillance pipeline at the Portuguese National Institute of Health (INSA) by comparing its performance with that of the European Food Safety Authority (EFSA) pipeline, revealing high cluster congruence and similar resolving power. In summary, EvalTree is a novel bioinformatics tool, installable via conda, that offers a practical and flexible way to evaluate cluster congruence between the pipelines of different laboratories, supporting inter-laboratory communication within a One Health framework. It also promotes the long-term sustainability of any pipeline by enabling informed decisions throughout its life cycle (e.g., when evaluating software updates).
- Implementation of a data analysis pipeline for the genetic characterization of non-seasonal influenza A WGS samples in the context of laboratory surveillance of viral outbreaks. Pereira, João Luís Gomes; Sobral, Daniel Vieira Noro e Silva; Couto, Francisco José Moreira.
  Background: Influenza A viruses (IAV) are rapidly evolving pathogens with high zoonotic and pandemic potential. They undergo antigenic drift, and their segmented genome allows reassortment; both are key drivers of adaptation and cross-species transmission. The ongoing H5Nx panzootic underscores the need for timely genomic surveillance to detect adaptive mutations, reassortment events, and antiviral resistance. Existing frameworks, such as the INSaFLU-TELEVIR platform, work well for seasonal strains but face challenges with non-seasonal IAV because of reference selection, representation bias, and database redundancy.
  Objectives: This project aimed to develop an automated pipeline for the genetic characterization of non-seasonal IAV whole-genome sequencing (WGS) samples. The specific goals were: (1) accurate identification of genomic segments, subtypes/genotypes, host origins, and closely related reference sequences per segment; (2) characterization of biologically relevant mutations, including those associated with host adaptation and antiviral resistance; and (3) integration of the results into user-friendly and machine-readable outputs.
  Methods: The pipeline, named AFluID (Automatic Influenza Identification pipeline), was implemented in Python and combines clustering (cd-hit), similarity search (BLAST), clade assignment (Nextclade), and mutation screening (FluMut). It was validated on curated datasets from NCBI and GISAID, external quality assessment (EQA) panels, and Portuguese outbreak samples obtained through an INSA-INIAV-IP cooperation.
  Results: AFluID rapidly identified IAV segments and subtypes across datasets. Its multi-feature design also streamlined the identification of closely related references (and the detection of reassortment events), clade classification, host/geographic inference, and the identification of mutations potentially linked to adaptation, virulence, and resistance. A proof-of-concept analysis of outbreak samples confirmed its applicability in real surveillance scenarios.
  Conclusions: AFluID addresses major limitations of current pipelines by offering an automated, scalable, and reproducible framework tailored to non-seasonal IAV. Although reassortment detection requires further refinement, the pipeline strengthens laboratory surveillance capacity and is a step toward integration with global frameworks such as INSaFLU.
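The EvalTree entry describes scoring the congruence of two pipelines' clusterings at every resolution level, but the abstract does not name the metric it uses. The following is a minimal sketch, assuming the adjusted Rand index (ARI) as the congruence score and a hypothetical input layout in which each pipeline maps a distance threshold to an `{isolate: cluster}` dictionary; `adjusted_rand_index` and `congruence_matrix` are illustrative names, not EvalTree's API.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two flat clusterings of the same isolates.

    labels_a[i] and labels_b[i] are the cluster labels of isolate i in each
    pipeline. Returns 1.0 for identical partitions and about 0.0 for
    chance-level agreement (negative values are possible).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same isolates")
    # Pairs of isolates grouped together in both, in A only counts, in B only counts.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(len(labels_a), 2)
    expected = together_a * together_b / total_pairs if total_pairs else 0.0
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        # Only happens when both partitions are all singletons or both one cluster.
        return 1.0
    return (together_both - expected) / (max_index - expected)


def congruence_matrix(partitions_a, partitions_b):
    """Score every threshold of pipeline A against every threshold of pipeline B.

    Hypothetical input layout: {threshold: {isolate_id: cluster_label}}.
    Only isolates present in both partitions are compared.
    """
    scores = {}
    for thr_a, part_a in partitions_a.items():
        for thr_b, part_b in partitions_b.items():
            shared = sorted(set(part_a) & set(part_b))
            scores[(thr_a, thr_b)] = adjusted_rand_index(
                [part_a[i] for i in shared], [part_b[i] for i in shared]
            )
    return scores


# Toy partitions for four isolates at two thresholds per pipeline (not real data).
pipeline_a = {0: {"s1": "A", "s2": "B", "s3": "C", "s4": "D"},
              5: {"s1": "A", "s2": "A", "s3": "C", "s4": "C"}}
pipeline_b = {0: {"s1": "x", "s2": "y", "s3": "z", "s4": "w"},
              7: {"s1": "x", "s2": "x", "s3": "z", "s4": "z"}}
matrix = congruence_matrix(pipeline_a, pipeline_b)
for (thr_a, thr_b), score in sorted(matrix.items()):
    print(f"A@{thr_a} vs B@{thr_b}: ARI = {score:.2f}")
```

Comparing every threshold of one pipeline with every threshold of the other yields a grid of scores; in this toy case, threshold 5 of pipeline A and threshold 7 of pipeline B produce identical partitions (ARI = 1.00), which is the kind of correspondence a resolution-by-resolution comparison is meant to surface.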
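The AFluID entry lists BLAST similarity search as the step that finds closely related reference sequences per segment, without describing how hits are ranked. As a minimal sketch, assuming BLAST+ default tabular output (`-outfmt 6`) and selection of the highest-bitscore hit per query segment, the parsing step could look like the code below; `best_hits`, the `min_pident` cutoff, and the segment and reference IDs in the toy input are hypothetical.

```python
import csv
import io

# Column order of BLAST+ tabular output with -outfmt 6 (default fields).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, min_pident=0.0):
    """Keep the highest-bitscore hit per query sequence (assumed ranking rule).

    `handle` is any iterable of tab-separated lines (e.g. an open file);
    hits below `min_pident` percent identity are ignored.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and '#' comment lines (as emitted by -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(OUTFMT6_FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        if hit["pident"] < min_pident:
            continue
        current = best.get(hit["qseqid"])
        # Rank by bitscore first, then by percent identity as a tie-breaker.
        if current is None or (hit["bitscore"], hit["pident"]) > (
            current["bitscore"], current["pident"]
        ):
            best[hit["qseqid"]] = hit
    return best


# Toy rows with hypothetical segment and reference IDs (not real results).
blast_tsv = (
    "seg4_HA\tref_A\t98.5\t1700\t25\t0\t1\t1700\t1\t1700\t0.0\t3050\n"
    "seg4_HA\tref_B\t99.2\t1700\t13\t0\t1\t1700\t1\t1700\t0.0\t3100\n"
    "seg6_NA\tref_C\t97.0\t1400\t42\t0\t1\t1400\t1\t1400\t0.0\t2400\n"
)
hits = best_hits(io.StringIO(blast_tsv))
for query, hit in sorted(hits.items()):
    print(query, hit["sseqid"], hit["pident"])
```

Keeping one best hit per segment, rather than one per sample, is what allows different segments of the same sample to point to different references, which is the signal a reassortment check would build on.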
