Thesis Defense - Taxonomy and functional inference of prokaryotes: development of MACADAM, a database of metabolic pathways associated with a taxonomy

Malo Le-Boulch
INRA GenPhySE NED
Date: 
Friday, 20 December, 2019 - 13:00 to 17:00
Room: 
Salle de Conférence IFR, INRA Castanet-Tolosan
Summary: 
Prokaryotes are ubiquitous organisms living in communities, and their ubiquity is correlated with their extreme metabolic diversity. To contribute to a better understanding of the functional role of prokaryotes, we developed MACADAM: a database of metabolic pathways associated with a prokaryote-centric taxonomy. The aim is to give the scientific community open access to functional information that has been selected for its genomic and annotation quality, and that is interoperable and simply structured, so that MACADAM can be updated as the data gathered from sources such as MetaCyc, MicroCyc and RefSeq change. MACADAM meets these criteria.
MACADAM includes PGDBs (Pathway/Genome DataBases) built from RefSeq genomes that meet the complete-genome quality criteria, using the Pathway Tools software made available by MetaCyc, a metabolic pathway database. To enrich the database and increase the quality of its functional information, a collection of expert-curated PGDBs named MicroCyc was added. MicroCyc PGDBs are favoured over those built from RefSeq genomes. Functional information sourced from the literature and contained in the FAPROTAX and IJSEM phenotypic databases was also added. MACADAM contains 13,509 PGDBs (13,195 bacterial PGDBs and 314 archaeal PGDBs) and 1,260 unique metabolic pathways.
Built with interoperable technologies (Python 3, SQLite), distributed in a downloadable format and with open-source code, MACADAM can be integrated into tools that need to pair functional and taxonomic information. To improve its visibility in the microbiology community, MACADAM is also available online (http://macadam.toulouse.inra.fr). By using the taxonomy of the NCBI Taxonomy database, MACADAM makes it possible to link any taxon, from phylum to species, to its functional information. Each metabolic pathway is associated with two completeness scores: a Pathway Score (PS) and a Pathway Frequency Score (PFS).
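To make the idea of linking a taxon to its functional information concrete, here is a minimal sketch using Python 3 and SQLite, the technologies named above. Everything in it is an assumption made for illustration: the table and column names, the demo rows, and the two aggregates (the share of PGDBs under a taxon that contain a pathway, and a mean per-PGDB score) are hypothetical and are not MACADAM's actual schema or its definitions of PS and PFS.

```python
import sqlite3

# Hypothetical schema for illustration only; MACADAM's real tables may differ.
SCHEMA = """
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, name TEXT, rank TEXT, parent_id INTEGER);
CREATE TABLE pgdb (pgdb_id INTEGER PRIMARY KEY, species_id INTEGER REFERENCES taxon(taxon_id));
CREATE TABLE pgdb_pathway (pgdb_id INTEGER, pathway TEXT, score REAL);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?)", [
        (1, "Demo phylum", "phylum", None),
        (2, "Demo genus", "genus", 1),
        (3, "Demo species A", "species", 2),
        (4, "Demo species B", "species", 2),
    ])
    conn.executemany("INSERT INTO pgdb VALUES (?, ?)", [(10, 3), (11, 3), (12, 4)])
    conn.executemany("INSERT INTO pgdb_pathway VALUES (?, ?, ?)", [
        (10, "glycolysis", 1.0),
        (11, "glycolysis", 0.8),
        (12, "glycolysis", 0.9),
        (10, "methanogenesis", 0.5),
    ])
    return conn


# Walk down the taxonomy from the named taxon, collect the PGDBs attached to
# any descendant, then aggregate pathways over those PGDBs.
QUERY = """
WITH RECURSIVE subtree(taxon_id) AS (
    SELECT taxon_id FROM taxon WHERE name = :name
    UNION ALL
    SELECT t.taxon_id FROM taxon t JOIN subtree s ON t.parent_id = s.taxon_id
),
members AS (
    SELECT pgdb_id FROM pgdb WHERE species_id IN (SELECT taxon_id FROM subtree)
)
SELECT pp.pathway,
       COUNT(DISTINCT pp.pgdb_id) * 1.0 / (SELECT COUNT(*) FROM members) AS frequency,
       AVG(pp.score) AS mean_score
FROM pgdb_pathway pp
WHERE pp.pgdb_id IN (SELECT pgdb_id FROM members)
GROUP BY pp.pathway
ORDER BY pp.pathway
"""


def pathways_for_taxon(conn, name):
    """Return (pathway, frequency, mean_score) rows for a taxon of any rank."""
    return conn.execute(QUERY, {"name": name}).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in pathways_for_taxon(conn, "Demo genus"):
        print(row)
```

The same function works for a phylum, a genus or a species, because the recursive query simply gathers every PGDB below the chosen node of the taxonomy.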
With each update, MACADAM integrates the new versions of RefSeq, NCBI Taxonomy and MicroCyc, so that corrections made to the taxonomy are promptly carried over and information on recently submitted genomes is added. Two examples of how to use MACADAM, together with a comparison with an inference approach based on metagenomic reads, allowed us to discuss the strengths and weaknesses (i) of MACADAM and (ii) of inference by prior taxonomic identification.
The identification of individuals within the prokaryotic community benefits greatly from advances in sequencing technology and the refinement of bioinformatics analysis pipelines. The analysis of reads from metagenomic sequencing leads to the reconstruction of putative genomes and metagenomic species. In this context, we examined the problem of correcting the taxonomic assignments of metagenomic species, using a phylogenetic tree reconstruction approach on the one hand and an overall genome relatedness index (ANI, average nucleotide identity) on the other. This work allowed us to clarify the positioning of nine groups of metagenomic species and highlighted errors in reference genome affiliation in Megasphaera and Blautia obeum. It also allowed us to confirm the reclassification of Ruminococcus gauvreauii into the genus Blautia.
To limit errors and prevent their propagation, it is important to ensure the quality of the information contained in the databases. The scientific community should therefore have better knowledge of the rules of nomenclature and the methods of systematics, and further efforts should be made to advocate the merits of correcting database data. Finally, although metagenomics provides a better understanding of the microbial communities around us, an effort to cultivate organisms that are said to be uncultivable would increase the knowledge and diversity of prokaryotic organisms in databases.
These efforts will have a direct impact on the quality of functional information and the coverage of MACADAM's prokaryotic diversity.