Gene discovery has become highly efficient through the combination of deep sequencing and the exploitation of natural variation. In Arabidopsis alone, hundreds of genetic loci have been identified as influencing a wide variety of processes. We aim to go from gene of interest to characterized protein product using approaches that “take a picture” of the comprehensive metabolome of the plant.

The IPB is currently operating a wide range of NMR and mass spectrometry instruments for metabolomics across all four departments, which are integrated into our Metabolomics Platform.

The experimental work is complemented by extensive cheminformatics and bioinformatics research to process and interpret the large amounts of data generated. The IPB operates the first European MassBank server and hosts several online tools for metabolite identification.

The contact for all inquiries concerning the Metabolomics Platform is Dr. Steffen Neumann.

Publications by Tag: Metabolomics




Deutsch, E.W., Chambers, M., Neumann, S., Levander, F., Binz, P.-A., Shofstahl, J., Campbell, D.S., Mendoza, L., Ovelleiro, D., Helsens, K., Martens, L., Aebersold, R., Moritz, R.L. & Brusniak, M.-Y. TraML: a standard format for exchange of selected reaction monitoring transition lists Mol Cell Proteomics 11(4), (2012) DOI: 10.1074/mcp.R111.015040

Targeted proteomics via selected reaction monitoring is a powerful mass spectrometric technique affording higher dynamic range, increased specificity and lower limits of detection than other shotgun mass spectrometry methods when applied to proteome analyses. However, it involves selective measurement of predetermined analytes, which requires more preparation in the form of selecting appropriate signatures for the proteins and peptides that are to be targeted. There is a growing number of software programs and resources for selecting optimal transitions and the instrument settings used for the detection and quantification of the targeted peptides, but the exchange of this information is hindered by a lack of a standard format. We have developed a new standardized format, called TraML, for encoding transition lists and associated metadata. In addition to introducing the TraML format, we demonstrate several implementations across the community, and provide semantic validators, extensive documentation, and multiple example instances to demonstrate correctly written documents. Widespread use of TraML will facilitate the exchange of transitions, reduce time spent handling incompatible list formats, increase the reusability of previously optimized transitions, and thus accelerate the widespread adoption of targeted proteomics via selected reaction monitoring.


Neumann, S., Thum, A. & Böttcher, C. Nearline acquisition and processing of liquid chromatography-tandem mass spectrometry data Metabolomics (2012) DOI: 10.1007/s11306-012-0401-0

Liquid chromatography–mass spectrometry (LC–MS) is a commonly used analytical platform for non-targeted metabolite profiling experiments. Although data acquisition, processing and statistical analyses are almost routine in such experiments, further annotation and subsequent identification of chemical compounds are not. For identification, tandem mass spectra provide valuable information towards the structure of chemical compounds. These are typically acquired online, in data-dependent mode, or offline, using handcrafted acquisition methods and manually extracted from raw data. Here, we present several methods to fast-track and improve both the acquisition and processing of LC–MS/MS data. Our nearly online (nearline) data-dependent tandem MS strategy creates a minimal set of LC–MS/MS acquisition methods for relevant features revealed by a preceding non-targeted profiling experiment. Using different filtering criteria, such as intensity or ion type, the acquisition of irrelevant spectra is minimized. Afterwards, LC–MS/MS raw data are processed with feature detection and grouping algorithms. The extracted tandem mass spectra can be used for both library search and de-novo identification methods. The algorithms are implemented in the R package MetShot and support the export to Bruker, Agilent or Waters QTOF instruments and the vendor-independent TraML standard. We evaluate the performance of our workflow on a Bruker micrOTOF-Q by comparison of automatically acquired and extracted tandem mass spectra obtained from a mixture of natural product standards against manually extracted reference spectra. Using Arabidopsis thaliana wild-type and biosynthetic gene knockout plants, we characterize the metabolic products of a biosynthetic pathway and demonstrate the integration of our approach into a typical non-targeted metabolite profiling workflow.


Schymanski, E.L., Gallampois, C.M.J., Krauss, M., Meringer, M., Neumann, S., Schulze, T., Wolf, S. & Brack, W. Consensus Structure Elucidation Combining GC/EI-MS, Structure Generation, and Calculated Properties Anal. Chem. 84 (7), 3287–3295, (2012) DOI: 10.1021/ac203471y


This article explores consensus structure elucidation on the basis of GC/EI-MS, structure generation, and calculated properties for unknown compounds. Candidate structures were generated using the molecular formula and substructure information obtained from GC/EI-MS spectra. Calculated properties were then used to score candidates according to a consensus approach, rather than filtering or exclusion. Two mass spectral match calculations (MOLGEN-MS and MetFrag), retention behavior (Lee retention index/boiling point correlation, NIST Kováts retention index), octanol–water partitioning behavior (log Kow), and finally steric energy calculations were used to select candidates. A simple consensus scoring function was developed and tested on two unknown spectra detected in a mutagenic subfraction of a water sample from the Elbe River using GC/EI-MS. The top candidates proposed using the consensus scoring technique were purchased and confirmed analytically using GC/EI-MS and LC/MS/MS. Although the compounds identified were not responsible for the sample mutagenicity, the structure-generation-based identification for GC/EI-MS using calculated properties and consensus scoring was demonstrated to be applicable to real-world unknowns and suggests that the development of a similar strategy for multidimensional high-resolution MS could improve the outcomes of environmental and metabolomics studies.
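The idea of scoring candidates by consensus instead of excluding them can be illustrated with a minimal sketch. It is not the scoring function from the article: the criterion names, the assumption that every criterion is already normalized to the range 0 to 1, and the weighted mean are all illustrative choices.

```python
def consensus_rank(candidates, weights=None):
    """Rank candidates by a weighted mean of per-criterion scores.

    candidates: dict mapping candidate name -> dict of criterion -> score,
                with every score assumed to be normalized to [0, 1].
    weights:    optional dict of criterion -> weight (default 1.0 each).
    Both the normalization and the weighted mean are assumptions made for
    this sketch, not the published scoring function.
    """
    weights = weights or {}
    ranked = []
    for name, scores in candidates.items():
        if not scores:
            # A candidate without any scores cannot be ranked meaningfully.
            ranked.append((name, 0.0))
            continue
        total_weight = sum(weights.get(c, 1.0) for c in scores)
        weighted_sum = sum(weights.get(c, 1.0) * s for c, s in scores.items())
        ranked.append((name, weighted_sum / total_weight))
    # Best score first; ties are broken alphabetically for reproducibility.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # Hypothetical candidates and scores, for illustration only.
    example = {
        "cand_X": {"ms_match": 0.5, "retention": 1.0, "log_kow": 1.0, "steric": 0.5},
        "cand_Y": {"ms_match": 1.0, "retention": 0.0, "log_kow": 0.5, "steric": 0.5},
    }
    for name, score in consensus_rank(example):
        print(name, score)
```

In this toy input, cand_Y has the best spectral match but would be lost under a hard retention filter; with consensus scoring it stays in the list and is simply ranked lower.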


Kuhl, C., Tautenhahn, R., Böttcher, C., Larson, R. & Neumann, S. CAMERA: An integrated strategy for compound spectra extraction and annotation of LC/MS data sets Anal. Chem. 84 (1), 283-289, (2012) DOI: 10.1021/ac202450g

Liquid chromatography coupled to mass spectrometry is routinely used for metabolomics experiments. In contrast to the fairly routine and automated data acquisition steps, subsequent compound annotation and identification require extensive manual analysis and thus form a major bottleneck in data interpretation. Here we present CAMERA, a Bioconductor package integrating algorithms to extract compound spectra, annotate isotope and adduct peaks, and propose the accurate compound mass even in highly complex data. To evaluate the algorithms, we compared the annotation of CAMERA against a manually defined annotation for a mixture of known compounds spiked into a complex matrix at different concentrations. CAMERA successfully extracted accurate masses for 89.7% and 90.3% of the annotatable compounds in positive and negative ion modes, respectively. Furthermore, we present a novel annotation approach that combines spectral information of data acquired in opposite ion modes to further improve the annotation rate. We demonstrate the utility of CAMERA in two different, easily adoptable plant metabolomics experiments, where the application of CAMERA drastically reduced the amount of manual analysis.
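To make the annotation step more concrete, here is a minimal Python sketch of isotope and adduct annotation within one compound spectrum. It is not CAMERA's code or API (CAMERA is an R/Bioconductor package). The function names, the tolerance, and the short positive-mode adduct table are assumptions for illustration.

```python
from itertools import combinations

# m/z shifts of common singly charged positive-mode adducts relative to the
# neutral monoisotopic mass M (a small illustrative subset).
ADDUCTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}
C13_DELTA = 1.003355  # mass difference between 13C and 12C


def annotate_isotopes(mzs, tol=0.005):
    """Return (monoisotopic, isotope) m/z pairs spaced by one 13C."""
    return [
        (a, b)
        for a, b in combinations(sorted(mzs), 2)
        if abs((b - a) - C13_DELTA) <= tol
    ]


def propose_neutral_mass(mzs, tol=0.005, min_support=2):
    """Propose the neutral mass explained by the most adduct peaks.

    Every (peak, adduct) pair implies a hypothetical neutral mass; the
    hypothesis supported by the largest number of distinct adducts wins.
    Returns (mass, {adduct_name: mz}) or None if no hypothesis reaches
    min_support.
    """
    best = None
    for mz in mzs:
        for shift in ADDUCTS.values():
            mass = mz - shift
            support = {}
            for name, other_shift in ADDUCTS.items():
                for other in mzs:
                    if abs(other - other_shift - mass) <= tol:
                        support[name] = other
            if len(support) >= min_support and (
                best is None or len(support) > len(best[1])
            ):
                best = (mass, support)
    return best
```

A real implementation would first group peaks by retention time and peak shape correlation before applying such rules; this sketch assumes that grouping has already been done.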


Hildebrandt, C., Wolf, S. & Neumann, S. Database supported candidate search for Metabolite identification Journal of Integrative Bioinformatics 8 (2), 157, (2011) DOI: 10.2390/biecoll-jib-2011-157

Mass spectrometry is an important analytical technology for the identification of metabolites and small compounds by their exact mass. But dozens or hundreds of different compounds may have a similar mass or even the same molecular formula. Further elucidation requires tandem mass spectrometry, which provides the masses of compound fragments, but in silico fragmentation programs require substantial computational resources if applied to large numbers of candidate structures. We present and evaluate an approach to obtain candidates from a relational database which contains 28 million compounds from PubChem. A training phase associates tandem-MS peaks with corresponding fragment structures. For the candidate search, the peaks in a query spectrum are translated to fragment structures, and the candidates are retrieved and sorted by the number of matching fragment structures. In the cross-validation, the evaluation of the relative ranking positions (RRP) using different sizes of training sets confirms that a larger coverage of training data improves the average RRP from 0.65 to 0.72. Our approach allows downstream algorithms to process candidates in order of importance.
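The candidate search described in this abstract can be sketched with in-memory dictionaries standing in for the relational database. The m/z binning, the identifiers, and the function names are assumptions for illustration; the actual database schema and training procedure are not reproduced here.

```python
from collections import Counter


def bin_key(mz, decimals=2):
    """Map an m/z value to an integer bin so that lookups are exact."""
    return round(mz * 10 ** decimals)


def rank_candidates(query_peaks, peak_to_fragments, fragment_to_compounds):
    """Rank compounds by the number of fragment structures they share
    with the query spectrum.

    peak_to_fragments:     bin key -> set of fragment structure ids
                           (what a training phase would produce)
    fragment_to_compounds: fragment id -> set of compound ids
    Returns a list of (compound_id, n_matching_fragments), best first.
    """
    # Step 1: translate query peaks to fragment structures.
    fragments = set()
    for mz in query_peaks:
        fragments |= peak_to_fragments.get(bin_key(mz), set())

    # Step 2: count, per compound, how many of those fragments it contains.
    counts = Counter()
    for fragment in fragments:
        for compound in fragment_to_compounds.get(fragment, ()):
            counts[compound] += 1

    # Step 3: sort by match count (descending), then by id for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical training result and compound index.
    peak_to_fragments = {
        bin_key(121.03): {"F1"},
        bin_key(93.03): {"F2"},
        bin_key(65.04): {"F3"},
    }
    fragment_to_compounds = {"F1": {"C1", "C2"}, "F2": {"C1"}, "F3": {"C3"}}
    print(rank_candidates([121.0284, 93.0335], peak_to_fragments, fragment_to_compounds))
```

Because the output is already ordered, a downstream and more expensive method such as in silico fragmentation can process the most promising candidates first.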


Wolf, S., Schmidt, S., Müller-Hannemann, M. & Neumann, S. In silico fragmentation for computer assisted identification of metabolite mass spectra BMC Bioinformatics 11, 148, (2010) DOI: 10.1186/1471-2105-11-148


Mass spectrometry has become the analytical method of choice in metabolomics research. The identification of unknown compounds is the main bottleneck. In addition to the precursor mass, tandem MS spectra carry informative fragment peaks, but spectral libraries of measured reference compounds are far from covering the complete chemical space. Compound libraries such as PubChem or KEGG describe a larger number of compounds, whose in silico fragmentation can be compared with the spectra of unknown metabolites.


We created the MetFrag suite to obtain a candidate list from compound libraries based on the precursor mass, subsequently ranked by the agreement between measured and in silico fragments. In the evaluation MetFrag was able to rank most of the correct compounds within the top 3 candidates returned by an exact mass query in KEGG. Compared to a previously published study, MetFrag obtained better results than the commercial MassFrontier software. Especially for large compound libraries, the candidates with a good score show a high structural similarity or differ only in stereochemistry; a subsequent clustering based on chemical distances reduces this redundancy. The in silico fragmentation requires less than a second to process a molecule, and MetFrag performs a search in KEGG or PubChem on average within 30 or 300 seconds, respectively, on an average desktop PC.


We presented a method that is able to identify small molecules from tandem MS measurements, even without spectral reference data or a large set of fragmentation rules. With today's massive general purpose compound libraries we obtain dozens of very similar candidates, which still allows a confident estimate of the correct compound class. Our tool MetFrag improves the identification of unknown substances from tandem MS spectra and delivers better results than comparable commercial software. MetFrag is available through a web application, web services and as a Java library. The web frontend allows the end user to analyse single spectra and browse the results, whereas the web service and console application are intended for batch searches and evaluation.
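The two-stage workflow in this abstract (exact mass query, then ranking by fragment agreement) can be sketched as follows. This is a strong simplification and not MetFrag's actual scoring: the fragment masses are assumed to be precomputed per candidate, the score is only the fraction of measured peaks explained, and all names and tolerances are hypothetical.

```python
def within_ppm(observed, reference, ppm):
    """True if observed lies within +/- ppm of reference."""
    return abs(observed - reference) <= reference * ppm / 1e6


def rank_by_fragments(neutral_mass, peaks, library,
                      ppm_precursor=10.0, ppm_fragment=20.0):
    """Select candidates by exact mass, then rank by explained peaks.

    library: iterable of dicts with keys "id", "mass" and "fragments",
             where "fragments" holds precomputed in silico fragment m/z
             values (an assumption; fragment generation is not shown).
    Returns a list of (candidate_id, fraction_of_peaks_explained).
    """
    if not peaks:
        return []
    ranked = []
    for candidate in library:
        # Stage 1: exact mass query against the compound library.
        if not within_ppm(candidate["mass"], neutral_mass, ppm_precursor):
            continue
        # Stage 2: count measured peaks matched by any in silico fragment.
        explained = sum(
            1
            for mz in peaks
            if any(within_ppm(f, mz, ppm_fragment) for f in candidate["fragments"])
        )
        ranked.append((candidate["id"], explained / len(peaks)))
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # Hypothetical library entries, for illustration only.
    library = [
        {"id": "cand_A", "mass": 286.0477, "fragments": [153.018, 121.028]},
        {"id": "cand_B", "mass": 286.0477, "fragments": [153.018, 135.044]},
        {"id": "cand_C", "mass": 300.0000, "fragments": [153.018, 121.028]},
    ]
    print(rank_by_fragments(286.0477, [153.0182, 121.0284], library))
```

Isomers such as cand_A and cand_B pass the mass filter together, which is why the fragment stage is needed and why very similar candidates can still end up with close scores.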


Horai, H., Arita, M., Kanaya, S., Nihei, Y., Ikeda, T., Suwa, K., Ojima, Y., Tanaka, K., Tanaka, S., Aoshima, K., Oda, Y., Kakazu, Y., Kusano, M., Tohge, T., Matsuda, F., Sawada, Y., Hirai, M.Y., Nakanishi, H., Ikeda, K., Akimoto, N., Maoka, T., Takahashi, H., Ara, T., Sakurai, N., Suzuki, H., Shibata, D., Neumann, S., Iida, T., Tanaka, K., Funatsu, K., Matsuura, F., Soga, T., Taguchi, R., Saito, K. & Nishioka, T. MassBank: a public repository for sharing mass spectral data for life sciences Journal of Mass Spectrometry 45(7), 703-714, (2010) DOI: 10.1002/jms.1777

MassBank is the first public repository of mass spectra of small chemical compounds (<3000 Da) for life sciences. The database contains 605 electron-ionization mass spectrometry (EI-MS), 137 fast atom bombardment MS and 9276 electrospray ionization (ESI)-MS(n) data of 2337 authentic compounds of metabolites, 11,545 EI-MS and 834 other-MS data of 10,286 volatile natural and synthetic compounds, and 3045 ESI-MS(2) data of 679 synthetic drugs contributed by 16 research groups (January 2010). ESI-MS(2) data were analyzed under nonstandardized, independent experimental conditions. MassBank is a distributed database. Each research group provides data from its own MassBank data servers distributed on the Internet. MassBank users can access either all of the MassBank data or a subset of the data by specifying one or more experimental conditions. In a spectral search to retrieve mass spectra similar to a query mass spectrum, the similarity score is calculated by a weighted cosine correlation in which weighting exponents on peak intensity and the mass-to-charge ratio are optimized to the ESI-MS(2) data. MassBank also provides a merged spectrum for each compound prepared by merging the analyzed ESI-MS(2) data on an identical compound under different collision-induced dissociation conditions. Data merging has significantly improved the precision of the identification of a chemical compound by 21-23% at a similarity score of 0.6. Thus, MassBank is useful for the identification of chemical compounds and the publication of experimental data.
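A weighted cosine correlation of the kind mentioned above can be sketched as follows. The exponent values, the m/z tolerance, and the greedy peak matching are assumptions chosen for illustration; they are not taken from the MassBank implementation, whose optimized exponents are not stated in this abstract.

```python
import math


def weighted_cosine(query, reference, mz_tol=0.3, int_exp=0.5, mz_exp=2.0):
    """Weighted cosine similarity between two centroided spectra.

    query, reference: lists of (mz, intensity) tuples.
    Each peak gets the weight intensity**int_exp * mz**mz_exp; the default
    exponents are illustrative assumptions. Peaks are paired one-to-one,
    each query peak taking the closest unused reference peak within mz_tol.
    """
    def weight(mz, intensity):
        return (intensity ** int_exp) * (mz ** mz_exp)

    q = [(mz, weight(mz, inten)) for mz, inten in query]
    r = [(mz, weight(mz, inten)) for mz, inten in reference]

    used = set()
    dot = 0.0
    for mz_q, w_q in q:
        best = None  # (distance, index, weight) of the closest free peak
        for idx, (mz_r, w_r) in enumerate(r):
            if idx in used:
                continue
            distance = abs(mz_q - mz_r)
            if distance <= mz_tol and (best is None or distance < best[0]):
                best = (distance, idx, w_r)
        if best is not None:
            used.add(best[1])
            dot += w_q * best[2]

    norm_q = math.sqrt(sum(w * w for _, w in q))
    norm_r = math.sqrt(sum(w * w for _, w in r))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0
    return dot / (norm_q * norm_r)
```

Raising m/z to a positive power gives more influence to high-mass fragments, which tend to be more specific, while an intensity exponent below 1 dampens the dominance of the base peak.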


Neumann, S. & Böcker, S. Computational mass spectrometry for metabolomics: Identification of metabolites and small molecules Analytical and Bioanalytical Chemistry 398, 2779-2788, (2010) DOI: 10.1007/s00216-010-4142-5

The identification of compounds from mass spectrometry (MS) data is still seen as a major bottleneck in the interpretation of MS data. This is particularly the case for the identification of small compounds such as metabolites, where until recently little progress has been made. Here we review the available approaches to annotation and identification of chemical compounds based on electrospray ionization (ESI-MS) data. The methods are not limited to metabolomics applications, but are applicable to any small compounds amenable to MS analysis. Starting with the definition of identification, we focus on the analysis of tandem mass and MS(n) spectra, which can provide a wealth of structural information. Searching in libraries of reference spectra provides the most reliable source of identification, especially if measured on comparable instruments. We review several choices for the distance functions. The identification without reference spectra is even more challenging, because it requires approaches to interpret tandem mass spectra with regard to the molecular structure. Both commercial and free tools are capable of mining general-purpose compound libraries, and identifying candidate compounds. The holy grail of computational mass spectrometry is the de novo deduction of structure hypotheses for compounds, where method development has only started thus far. In a case study, we apply several of the available methods to the three compounds, kaempferol, reserpine, and verapamil, and investigate whether this results in reliable identifications.


Böttcher, C., von Roepenack-Lahaye, E., Schmidt, J., Schmotz, C., Neumann, S., Scheel, D. & Clemens, S. Metabolome analysis of biosynthetic mutants reveals diversity of metabolomic changes and allows identification of a large number of new compounds in Arabidopsis thaliana Plant Physiol 147, 2107-2120, (2008) DOI: 10.1104/pp.108.117754

Metabolomics is facing a major challenge: the lack of knowledge about metabolites present in a given biological system. Thus, large-scale discovery of metabolites is considered an essential step toward a better understanding of plant metabolism. We show here that the application of a metabolomics approach generating structural information for the analysis of Arabidopsis (Arabidopsis thaliana) mutants allows the efficient cataloging of metabolites. Fifty-six percent of the features that showed significant differences in abundance between seeds of wild-type, transparent testa4, and transparent testa5 plants could be annotated. Seventy-five compounds were structurally characterized, 21 of which could be identified. About 40 compounds had not been known from Arabidopsis before. Also, the high-resolution analysis revealed an unanticipated expansion of metabolic conversions upstream of biosynthetic blocks. Deficiency in chalcone synthase results in the increased seed-specific biosynthesis of a range of phenolic choline esters. Similarly, a lack of chalcone isomerase activity leads to the accumulation of various naringenin chalcone derivatives. Furthermore, our data provide insight into the connection between p-coumaroyl-coenzyme A-dependent pathways. Lack of flavonoid biosynthesis results in elevated synthesis not only of p-coumarate-derived choline esters but also of sinapate-derived metabolites. However, sinapoylcholine is not the only accumulating end product. Instead, we observed specific and sophisticated changes in the complex pattern of sinapate derivatives.


Böttcher, C., Centeno, D., Freitag, J., Höfgen, R., Köhl, K., Kopka, J., Kroymann, J., Matros, A., Mock, H.P., Neumann, S., Pfalz, M., von Roepenack-Lahaye, E., Schauer, N., Trenkamp, S., Zurbriggen, M. & Fernie, A.R. Teaching (and learning from) metabolomics: the 2006 PlantMetaNet ETNA Metabolomics Research School Physiol Plant 132, 136-49, (2008) DOI: 10.1111/j.1399-3054.2007.00990.x

Under the auspices of the European Training and Networking Activity programme of the European Union, a 'Metabolic Profiling and Data Analysis' Plant Genomics and Bioinformatics Summer School was hosted in Potsdam, Germany between 20 and 29 September 2006. Sixteen early career researchers were invited from the European Union partner nations and the so-called developing nations (Appendix). Lectures from invited leading European researchers provided an overview of the state of the art of these fields and seeded discussion regarding major challenges for their future advancement. Hands-on experience was provided by an example experiment: defining the metabolic response of Arabidopsis to treatment with a commercial herbicide of defined mode of action. This experiment was performed throughout the duration of the course in order to teach the concepts underlying extraction and machine handling as well as to provide a rich data set with which the required computation and statistical skills could be illustrated. Here we review the state of the field by describing both the key lectures given and the practical aspects taught at the summer school. In addition, we disclose results that were obtained using the four distinct technical platforms at the different participating institutes. While the effects of the chosen herbicide are well documented, this study looks at a broader range of metabolites than previous investigations. This allowed us not only to characterise effects of the herbicide beyond those previously observed but also to detect molecules other than the herbicide that were obviously present in the commercial formulation. These data and the workshop in general are all discussed in the context of the teaching of metabolomics.

This page was last modified on 10.03.2014.