Books and chapters
Besides a plethora of formal ontologies, the requirement for simple data annotation has led to an increased use of so-called controlled vocabularies (CVs) in multiple omics communities. We analyze two of those CVs from an ontological viewpoint, highlight typical modelling errors and propose more adequate solutions. Discovered errors are discussed in the light of the OOPS ontology pitfall framework and the OBO Foundry naming conventions. As a result, the CVs could be improved and the OOPS catalogue could be amended and expanded with new, previously missing error categories. In an outlook, we discuss potential reasons for the error prevalence and analyze which criticism is justified for CV semantics and which 'errors' are more valid for formal ontologies than for CVs. We conclude that there is room for improvement in the analyzed CVs, although many design principles valid for description logics ontologies are not relevant for semantically flat CVs and, in turn, there is a need for CV best practices that are not appropriate for description logics ontologies. The scope difference between CVs and formal semantics, however, should affect policy providers, which should narrow down the scope of their policies, i.e. by stating for each policy the expressivity regime for which it is valid.
Books and chapters
Previous chapters have introduced protocols and examples for high‐throughput metabolomics experiments. Metabolite identification is an important step in these experiments, bridging the metabolomics experiment, metabolite profiling, and the biological interpretation of the results. The elemental composition of the individual metabolites is the most basic information that can be calculated already from the mass spectrometry (MS) profiling data. For a more thorough identification, the “interesting” peaks are subjected to MS2, or even higher‐order MSn measurements. Such spectra carry rich structural hints, revealing building blocks of the unknown compound, or allowing comparison with databases of reference spectra. This chapter describes a general strategy to identify metabolites, and proceeds through the steps of the identification for two example compounds, first calculating elemental compositions, performing in silico identification without reference spectra, and finally spectral library lookup.
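The first step named above, calculating elemental compositions from an accurate mass, can be illustrated with a minimal brute-force sketch. It is not the chapter's own tool: the function names, the restriction to C, H, N and O, the element limits and the simple hydrogen-count check are all assumptions made for illustration.

```python
from itertools import product

# Monoisotopic masses (u) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503223, "N": 14.00307400443, "O": 15.99491461957}


def formula_mass(counts):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[element] * n for element, n in counts.items())


def candidate_formulas(neutral_mass, ppm=5.0, max_counts=None):
    """Enumerate CHNO formulas whose mass lies within `ppm` of `neutral_mass`.

    Hypothetical helper: the element set, the limits and the crude
    hydrogen bound (H <= 2C + N + 2) are illustrative assumptions.
    """
    limits = max_counts or {"C": 20, "N": 5, "O": 10}
    tolerance = neutral_mass * ppm * 1e-6
    hits = []
    ranges = (range(limits[el] + 1) for el in ("C", "N", "O"))
    for c, n, o in product(*ranges):
        heavy = c * MONO["C"] + n * MONO["N"] + o * MONO["O"]
        # Only the nearest integer hydrogen count can fall inside a ppm window.
        h = round((neutral_mass - heavy) / MONO["H"])
        if h < 0 or h > 2 * c + n + 2:
            continue
        counts = {"C": c, "H": h, "N": n, "O": o}
        error = formula_mass(counts) - neutral_mass
        if abs(error) <= tolerance:
            hits.append((counts, error / neutral_mass * 1e6))
    # Best mass match first; each hit is (formula, error in ppm).
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    # Neutral monoisotopic mass of glucose (C6H12O6) as an example query.
    for counts, error_ppm in candidate_formulas(180.06339):
        print(counts, f"{error_ppm:+.2f} ppm")
```

In practice, a mass match alone rarely yields a unique formula, which is why isotope patterns and MS2 data are used to narrow the candidate list.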
Publications
Metabolomics has advanced significantly in the past 10 years with important developments related to hardware, software and methodologies and an increasing complexity of applications. In discovery-based investigations, applying untargeted analytical methods, thousands of metabolites can be detected with no or limited prior knowledge of the metabolite composition of samples. In these cases, metabolite identification is required following data acquisition and processing. Currently, the process of metabolite identification in untargeted metabolomic studies is a significant bottleneck in deriving biological knowledge from metabolomic studies. In this review we highlight the different traditional and emerging tools and strategies applied to identify subsets of metabolites detected in untargeted metabolomic studies applying various mass spectrometry platforms. We indicate the workflows which are routinely applied and highlight the current limitations which need to be overcome to provide efficient, accurate and robust identification of metabolites in untargeted metabolomic studies. These workflows apply to the identification of metabolites, for which the structure can be assigned based on entries in databases, and for those which are not yet stored in databases and which require a de novo structure elucidation.
Publications
Liquid chromatography–mass spectrometry (LC–MS) is a commonly used analytical platform for non-targeted metabolite profiling experiments. Although data acquisition, processing and statistical analyses are almost routine in such experiments, further annotation and subsequent identification of chemical compounds are not. For identification, tandem mass spectra provide valuable information towards the structure of chemical compounds. These are typically acquired online, in data-dependent mode, or offline, using handcrafted acquisition methods and manually extracted from raw data. Here, we present several methods to fast-track and improve both the acquisition and processing of LC–MS/MS data. Our nearly online (nearline) data-dependent tandem MS strategy creates a minimal set of LC–MS/MS acquisition methods for relevant features revealed by a preceding non-targeted profiling experiment. Using different filtering criteria, such as intensity or ion type, the acquisition of irrelevant spectra is minimized. Afterwards, LC–MS/MS raw data are processed with feature detection and grouping algorithms. The extracted tandem mass spectra can be used for both library search and de-novo identification methods. The algorithms are implemented in the R package MetShot and support the export to Bruker, Agilent or Waters QTOF instruments and the vendor-independent TraML standard. We evaluate the performance of our workflow on a Bruker micrOTOF-Q by comparison of automatically acquired and extracted tandem mass spectra obtained from a mixture of natural product standards against manually extracted reference spectra. Using Arabidopsis thaliana wild-type and biosynthetic gene knockout plants, we characterize the metabolic products of a biosynthetic pathway and demonstrate the integration of our approach into a typical non-targeted metabolite profiling workflow.
Publications
MetaboLights (http://www.ebi.ac.uk/metabolights) is the first general-purpose, open-access repository for metabolomics studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Metabolomic profiling is an important tool for research into biological functioning and into the systemic perturbations caused by diseases, diet and the environment. The effectiveness of such methods depends on the availability of public open data across a broad range of experimental methods and conditions. The MetaboLights repository, powered by the open source ISA framework, is cross-species and cross-technique. It will cover metabolite structures and their reference spectra as well as their biological roles, locations, concentrations and raw data from metabolic experiments. Studies automatically receive a stable unique accession number that can be used as a publication reference (e.g. MTBLS1). At present, the repository includes 15 submitted studies, encompassing 93 protocols for 714 assays and spanning 8 different species, including human, Caenorhabditis elegans, Mus musculus and Arabidopsis thaliana. Eight hundred twenty-seven of the metabolites identified in these studies have been mapped to ChEBI. These studies cover a variety of techniques, including NMR spectroscopy and mass spectrometry.
Publications
Mass spectrometry (MS) is an important analytical technique for the detection and identification of small compounds. The main bottleneck in the interpretation of metabolite profiling or screening experiments is the identification of unknown compounds from tandem mass spectra. Spectral libraries for tandem MS, such as MassBank or NIST, contain reference spectra for many compounds, but their limited chemical coverage reduces the chance for a correct and reliable identification of unknown spectra outside the database domain. On the other hand, compound databases like PubChem or ChemSpider have a much larger coverage of the chemical space, but they cannot be queried with spectral information directly. Recently, computational mass spectrometry methods and in silico fragmentation prediction allow users to search such databases of chemical structures. We present a new strategy called MetFusion to combine identification results from several resources, in particular, from the in silico fragmenter MetFrag with the spectral library MassBank to improve compound identification. We evaluate the performance on a set of 1062 spectra and achieve an improved ranking of the correct compound from rank 28 using MetFrag alone, to rank 7 with MetFusion, even if the correct compound and similar compounds are absent from the spectral library. On the basis of the evaluation, we extrapolate the performance of MetFusion to the KEGG compound database.
Publications
Mass spectrometry (MS) has become the analytical method of choice in plant metabolomics. Nevertheless, metabolite annotation remains a major challenge and implies the integration of structural searches in compound libraries with biological knowledge inferred from metabolite regulation studies. Here we propose a novel integrative approach to process and exploit the rich structural information contained in in-source fragmentation patterns of high-resolution LC–MS profiles. In this approach, a correlation matrix is first calculated from individual mass features extracted by xcms processing. Mass feature co-regulation patterns corresponding to metabolite in-source fragmentation are then detected and assembled into compound spectra using the R package CAMERA and processed for in silico fragment-based structure elucidation using MetFrag. We validate the performance of this approach for the rapid annotation of the twelve largest compound spectra, including four O-acyl sugars and six 17-hydroxygeranyllinalool diterpene glycosides in metabolic profiles of insect-attacked Nicotiana attenuata leaves. Additionally, we demonstrate the power of refining MetFrag metabolite annotations based on co-regulation patterns between known and unknown compounds in the correlation matrix and propose structural annotations for two previously uncharacterized O-acyl sugars. In summary, this novel approach facilitates compound annotation from in-source fragmentation patterns using correlation between intensities of mass features of one or several metabolites. Additionally, this analysis provides further support that insect herbivory activates major metabolic reconfigurations in N. attenuata leaves.
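The idea of grouping mass features by correlated intensities across samples can be sketched as follows. This is only an illustration and not the CAMERA algorithm, which also uses retention time and peak shape. The feature names, intensity values and the 0.9 threshold are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def group_features(intensities, threshold=0.9):
    """Group features whose pairwise correlation reaches `threshold`.

    Groups are the connected components of the thresholded correlation
    graph, found with a small union-find structure.
    """
    names = list(intensities)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(intensities[a], intensities[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Hypothetical features measured in four samples (values are made up).
    features = {
        "mz_449": [1.0, 2.0, 3.0, 4.0],
        "mz_287": [2.0, 4.0, 6.0, 8.1],  # co-varies with mz_449
        "mz_163": [5.0, 1.0, 4.0, 2.0],  # unrelated pattern
    }
    print(group_features(features))
```

Each resulting group is a candidate compound spectrum that could then be passed to a structure elucidation tool.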
Publications
In this paper, we describe data processing and metabolite identification approaches which lead to a rapid and semi-automated interpretation of metabolomics experiments. Data from metabolite fingerprinting using LC-ESI-Q-TOF/MS were processed with several open-source software packages, including XCMS and CAMERA, to detect features and group features into compound spectra. Next, we describe the automatic scheduling of tandem mass spectrometry (MS) acquisitions to acquire a large number of MS/MS spectra, and the subsequent processing and computer-assisted annotation towards identification using the R packages MetShot, Rdisop, and the MetFusion application. We also implement a simple retention time prediction model using predicted lipophilicity logD, which predicts retention times within 42 s (6 min gradient) for most compounds in our setup. We putatively identified 44 common metabolites, including several amino acids and phospholipids, at Metabolomics Standards Initiative (MSI) levels two and three and confirmed the majority of them by comparison with authentic standards at MSI level one. To aid both data integration within and data sharing between laboratories, we integrated data from two labs and mapped retention times between the chromatographic systems. Despite the different MS instrumentation and different chromatographic gradient programs, the mapped retention times agree within 26 s (20 min gradient) for 90 % of the mapped features.
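A retention time model based on predicted logD might look like the sketch below. The abstract does not state the form of the model, so the straight-line fit is an assumption, and the training values are invented rather than taken from the paper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_rt(logd, slope, intercept):
    """Predicted retention time (s) for a compound with the given logD."""
    return slope * logd + intercept


if __name__ == "__main__":
    # Invented calibration data: predicted logD and observed retention time (s).
    logd_values = [-2.0, 0.0, 2.0, 4.0]
    rt_seconds = [60.0, 120.0, 180.0, 240.0]
    slope, intercept = fit_line(logd_values, rt_seconds)
    print(predict_rt(1.0, slope, intercept))
```

A candidate structure whose predicted retention time lies far from the observed one could then be ranked lower, using a tolerance such as the 42 s quoted above.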
Publications
The Critical Assessment of Small Molecule Identification, or CASMI, contest was founded in 2012 to provide scientists with a common open dataset to evaluate their identification methods. In this article, the challenges and solutions for the inaugural CASMI 2012 are presented. The contest was split into four categories corresponding with tasks to determine molecular formula and molecular structure, each from two measurement types, liquid chromatography-high resolution mass spectrometry (LC-HRMS), where preference was given to high mass accuracy data, and gas chromatography-electron impact-mass spectrometry (GC-MS), i.e., unit accuracy data. These challenges were obtained from plant material, environmental samples and reference standards. It was surprisingly difficult to obtain data suitable for a contest, especially for GC-MS data where existing databases are very large. The level of difficulty of the challenges is thus quite varied. In this article, the challenges and the answers are discussed, and recommendations for challenge selection in subsequent CASMI contests are given.
Publications
The Critical Assessment of Small Molecule Identification (CASMI) Contest was founded in 2012 to provide scientists with a common open dataset to evaluate their identification methods. In this review, we summarize the submissions, evaluate procedures and discuss the results. We received five submissions (three external, two internal) for LC–MS Category 1 (best molecular formula) and six submissions (three external, three internal) for LC–MS Category 2 (best molecular structure). No external submissions were received for the GC–MS Categories 3 and 4. The team of Dunn et al. from Birmingham had the most first-place answers for Category 1, while Category 2 was won by H. Oberacher. Despite the low number of participants, the external and internal submissions cover a broad range of identification strategies, including expert knowledge, database searching, automated methods and structure generation. The results of Category 1 show that complementing automated strategies with (manual) expert knowledge was the most successful approach, while no automated method could compete with the power of spectral searching for Category 2, provided the challenge was present in a spectral library. Every participant topped at least one challenge, showing that different approaches are still necessary for interpretation diversity.