Publications Program Center MetaCom
This page was last modified on 03 Sep 2024.
Preprints
High-quality data preprocessing is essential for untargeted metabolomics experiments, where increasing dataset scale and complexity demand adaptable, robust, and reproducible software solutions. Modern preprocessing tools must evolve to integrate seamlessly with downstream analysis platforms, ensuring efficient and streamlined workflows. Since its introduction in 2005, the xcms R package has become one of the most widely used tools for LC-MS data preprocessing. Developed through an open-source, community-driven approach, xcms has maintained long-term stability while continuously expanding its capabilities and accessibility. We present recent advancements that position xcms as a central component of a modular and interoperable software ecosystem for metabolomics data analysis. Key improvements include enhanced scalability, enabling the processing of large-scale experiments with thousands of samples on standard computing hardware. These developments empower users to build comprehensive, customizable, and reproducible workflows tailored to diverse experimental designs and analytical needs. An expanding collection of tutorials, documentation, and teaching materials further supports both new and experienced users in leveraging the broader R and Bioconductor ecosystems. These resources facilitate the integration of statistical modeling, visualization tools, and domain-specific packages, extending the reach and impact of xcms workflows. Together, these enhancements solidify xcms as a cornerstone of modern metabolomics research.
Publications
Liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics experiments have become increasingly popular because of the wide range of metabolites that can be analyzed and the possibility to measure novel compounds. LC-MS instrumentation and analysis conditions can differ substantially among laboratories and experiments, thus resulting in non-standardized datasets demanding customized annotation workflows. We present an ecosystem of R packages, centered around the MetaboCoreUtils, MetaboAnnotation and CompoundDb packages, that together provide a modular infrastructure for the annotation of untargeted metabolomics data. Initial annotation can be performed based on MS1 properties such as m/z and retention times, followed by an MS2-based annotation in which experimental fragment spectra are compared against a reference library. Such reference databases can be created and managed with the CompoundDb package. The ecosystem supports data from a variety of formats, including, but not limited to, MSP, MGF, mzML, mzXML, netCDF as well as MassBank text files and SQL databases. Through its highly customizable functionality, the presented infrastructure makes it possible to build reproducible annotation workflows tailored for and adapted to most untargeted LC-MS-based datasets. All core functionality, which supports base R data types, is exported, also facilitating its re-use in other R packages. Finally, all packages are thoroughly unit-tested and documented and are available on GitHub and through Bioconductor.
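The MS1 annotation step mentioned in the abstract above (matching measured m/z values against reference values) can be illustrated with a minimal sketch. This is not the API of MetaboAnnotation or any of the R packages named above; it is a plain Python illustration, and the `Compound` type, the `match_ms1` helper, the 10 ppm tolerance and the two reference ions are all assumptions chosen for the example.

```python
from typing import List, NamedTuple


class Compound(NamedTuple):
    """Hypothetical reference entry: an ion label and its theoretical m/z."""
    name: str
    mz: float


def ppm_error(measured_mz: float, reference_mz: float) -> float:
    """Signed mass error of a measurement in parts per million."""
    return (measured_mz - reference_mz) / reference_mz * 1e6


def match_ms1(measured_mz: float, reference: List[Compound],
              tolerance_ppm: float = 10.0) -> List[Compound]:
    """Return all reference ions whose m/z lies within tolerance_ppm."""
    return [c for c in reference
            if abs(ppm_error(measured_mz, c.mz)) <= tolerance_ppm]


# Illustrative reference list (assumed, not taken from the paper).
reference = [
    Compound("caffeine [M+H]+", 195.0877),
    Compound("glucose [M+Na]+", 203.0526),
]

# A measured feature at m/z 195.0880 is about 1.5 ppm away from caffeine.
hits = match_ms1(195.0880, reference, tolerance_ppm=10.0)
print([c.name for c in hits])
```

A real workflow would typically also filter candidates by retention time and then confirm them against MS2 fragment spectra, as the abstract describes.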
Publications
Metabolomics aims to measure and characterise the complex composition of metabolites in a biological system. Metabolomics studies involve sophisticated analytical techniques such as mass spectrometry and nuclear magnetic resonance spectroscopy, and generate large amounts of high-dimensional and complex experimental data. Open source processing and analysis tools are of major interest in light of innovative, open and reproducible science. The scientific community has developed a wide range of open source software, providing freely available advanced processing and analysis approaches. The programming and statistics environment R has emerged as one of the most popular environments to process and analyse metabolomics datasets. A major benefit of such an environment is the possibility of connecting different tools into more complex workflows. Combining reusable data processing R scripts with the experimental data thus allows for open, reproducible research. This review provides an extensive overview of existing packages in R for different steps in a typical computational metabolomics workflow, including data processing, biostatistics, metabolite annotation and identification, and biochemical network and pathway analysis. Multifunctional workflows, possible user interfaces and integration into workflow management systems are also reviewed. In total, this review summarises more than two hundred metabolomics-specific packages primarily available on CRAN, Bioconductor and GitHub.
Publications
Demands in research investigating small molecules by applying untargeted approaches have been a key motivator for the development of repositories for mass spectrometry spectra and automated tools to aid compound identification. Comparatively little attention has been afforded to using retention times (RTs) to distinguish compounds, and for liquid chromatography there are currently no coordinated efforts to share and exploit RT information. We therefore present PredRet, the first tool that makes community sharing of RT information possible across laboratories and chromatographic systems (CSs). At http://predret.org, a database of RTs from different CSs is available, and users can upload their own experimental RTs and download predicted RTs for compounds which they have not experimentally determined in their own experiments. For each possible pair of CSs in the database, the RTs are used to construct a projection model between the RTs in the two CSs. The number of compounds for which RTs can be predicted and the accuracy of the predictions depend on the compound coverage overlap between the CSs used for construction of projection models. At the moment, it is possible to predict up to 400 RTs with a median error between 0.01 and 0.28 min depending on the CS and the median width of the prediction interval ranging from 0.08 to 1.86 min. By comparing experimental and predicted RTs, the user can thus prioritize which isomers to target for further characterization and potentially exclude some structures completely. As the database grows, the number and accuracy of predictions will increase.
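The idea of a projection model between two chromatographic systems can be sketched as follows. The abstract above does not state the form of the model, so this example swaps in plain piecewise-linear interpolation through compounds measured in both systems; it is not PredRet's actual method. The function name `build_projection` and all retention times are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Optional, Sequence


def build_projection(rt_a: Sequence[float],
                     rt_b: Sequence[float]) -> Callable[[float], Optional[float]]:
    """Build a function that maps RTs (min) from system A to system B.

    rt_a and rt_b hold the RTs of the same compounds in both systems.
    Assumes at least two shared compounds with distinct RTs in system A.
    """
    pairs = sorted(zip(rt_a, rt_b))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    def project(rt: float) -> Optional[float]:
        # Do not extrapolate beyond the range covered by shared compounds.
        if not xs[0] <= rt <= xs[-1]:
            return None
        # Index of the right-hand anchor point of the enclosing segment.
        i = min(bisect_right(xs, rt), len(xs) - 1)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (rt - x0) / (x1 - x0)

    return project


# Hypothetical RTs of three compounds measured in both systems.
project = build_projection([1.0, 2.0, 4.0], [1.5, 3.0, 5.0])
print(project(3.0))  # lies between the second and third anchor
print(project(0.5))  # outside the shared range, so no prediction
```

Returning no value outside the shared range mirrors the point made in the abstract: what can be predicted depends on the compound overlap between the two systems.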
Publications
In this paper, we describe data processing and metabolite identification approaches which lead to a rapid and semi-automated interpretation of metabolomics experiments. Data from metabolite fingerprinting using LC-ESI-Q-TOF/MS were processed with several open-source software packages, including XCMS and CAMERA, to detect features and group features into compound spectra. Next, we describe the automatic scheduling of tandem mass spectrometry (MS) acquisitions to acquire a large number of MS/MS spectra, and the subsequent processing and computer-assisted annotation towards identification using the R packages MetShot, Rdisop, and the MetFusion application. We also implement a simple retention time prediction model using predicted lipophilicity logD, which predicts retention times within 42 s (6 min gradient) for most compounds in our setup. We putatively identified 44 common metabolites including several amino acids and phospholipids at Metabolomics Standards Initiative (MSI) levels two and three and confirmed the majority of them by comparison with authentic standards at MSI level one. To aid both data integration within and data sharing between laboratories, we integrated data from two labs and mapped retention times between the chromatographic systems. Despite the different MS instrumentation and different chromatographic gradient programs, the mapped retention times agree within 26 s (20 min gradient) for 90% of the mapped features.
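A retention time model based on logD, as mentioned in the abstract above, could look like the following sketch. The abstract only calls the model "simple"; the straight-line form fitted by ordinary least squares is an assumption made here, and the helper `fit_line` and all training values are invented for illustration.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training data: predicted logD and measured RT in seconds.
logd = [-2.0, 0.0, 1.0, 3.0]
rt_s = [60.0, 140.0, 180.0, 260.0]

slope, intercept = fit_line(logd, rt_s)

# Predict the RT of a compound with an assumed logD of 2.0.
predicted_rt = slope * 2.0 + intercept
print(round(predicted_rt, 1))
```

In practice the residuals of such a fit give the prediction window, which is how a figure like "within 42 s" for most compounds would be assessed.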

