Taste is predictable: with FlavorMiner. FlavorMiner is the name of the tool that IPB chemists and partners from Colombia recently developed. The program, which is based on machine learning (AI), can use the…
Since February 2021, Wolfgang Brandt, former head of the Computational Chemistry research group at the IPB, has been offering his citizen science project on mushroom identification. As part of it, he has given public lectures at regular intervals on the diversity…
Research data is an essential part of research and of almost every publication in chemistry. The data itself can be valuable for reuse if it is sustainably deposited, annotated and archived. It is therefore important to publish data following the FAIR principles, making it findable, accessible, interoperable and reusable, not only for humans but also in machine-readable form. This also improves the transparency and reproducibility of research findings and fosters analytical work with scientific data to generate new insights that are only accessible with large and diverse datasets. Interoperable research data requires complete and informative metadata and the use of open data formats. Generic data formats like AnIML and JCAMP-DX have been used for many applications. Special formats for some analytical methods are already accepted, like mzML for mass spectrometry or nmrML and NMReDATA for NMR spectroscopy data. Other methods still lack common data standards. Only a joint effort of chemists, instrument and software vendors, publishers and infrastructure maintainers can make sure that analytical data will be of value in the future. In this review, we describe existing data formats in analytical chemistry and introduce guidelines for the development and use of standardized and open data formats.
Publication
Rainer, J.; Vicini, A.; Salzer, L.; Stanstrup, J.; Badia, J. M.; Neumann, S.; Stravs, M. A.; Verri Hernandes, V.; Gatto, L.; Gibb, S.; Witting, M.; A modular and expandable ecosystem for metabolomics data annotation in R. Metabolites 12, 173 (2022). DOI: 10.3390/metabo12020173
Liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics experiments have become increasingly popular because of the wide range of metabolites that can be analyzed and the possibility to measure novel compounds. LC-MS instrumentation and analysis conditions can differ substantially among laboratories and experiments, resulting in non-standardized datasets that demand customized annotation workflows. We present an ecosystem of R packages, centered around the MetaboCoreUtils, MetaboAnnotation and CompoundDb packages, that together provide a modular infrastructure for the annotation of untargeted metabolomics data. Initial annotation can be performed based on MS1 properties such as m/z and retention times, followed by an MS2-based annotation in which experimental fragment spectra are compared against a reference library. Such reference databases can be created and managed with the CompoundDb package. The ecosystem supports data from a variety of formats, including, but not limited to, MSP, MGF, mzML, mzXML and netCDF, as well as MassBank text files and SQL databases. Through its highly customizable functionality, the presented infrastructure makes it possible to build reproducible annotation workflows tailored to most untargeted LC-MS-based datasets. All core functionality, which supports base R data types, is exported, which also facilitates its re-use in other R packages. Finally, all packages are thoroughly unit-tested and documented and are available on GitHub and through Bioconductor.
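The MS1 step described above (matching measured features to reference compounds by m/z and retention time) can be sketched in a few lines. The following Python is a minimal illustration under stated assumptions, not the API of MetaboAnnotation or CompoundDb, which are R packages: the names `Feature`, `Reference` and `annotate_ms1`, the tolerances (10 ppm, 10 s) and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """A reference compound entry (hypothetical structure)."""
    name: str
    mz: float  # expected m/z of the ion
    rt: float  # expected retention time in seconds


@dataclass(frozen=True)
class Feature:
    """A measured LC-MS feature (hypothetical structure)."""
    feature_id: str
    mz: float  # measured m/z
    rt: float  # measured retention time in seconds


def ppm_error(measured: float, expected: float) -> float:
    """Relative m/z deviation in parts per million."""
    return (measured - expected) / expected * 1e6


def annotate_ms1(features, references, ppm=10.0, rt_tol=10.0):
    """Return, per feature, the names of all references that fall within
    both the m/z tolerance (ppm) and the retention time tolerance (s)."""
    hits = {}
    for feature in features:
        hits[feature.feature_id] = [
            ref.name
            for ref in references
            if abs(ppm_error(feature.mz, ref.mz)) <= ppm
            and abs(feature.rt - ref.rt) <= rt_tol
        ]
    return hits


# Illustrative values only, not taken from any real dataset.
references = [
    Reference("compound_A", 195.0877, 120.0),
    Reference("compound_B", 181.0720, 95.0),
]
features = [
    Feature("FT001", 195.0880, 118.5),  # close in m/z and retention time
    Feature("FT002", 181.0800, 95.0),   # retention time fits, m/z too far off
    Feature("FT003", 195.0877, 300.0),  # m/z fits, retention time too far off
]

annotations = annotate_ms1(features, references)
```

In this sketch only `FT001` receives a candidate annotation. An MS2-based step would then compare the fragment spectrum of each candidate against a reference library, which is not shown here.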
Publication
Herres-Pawlis, S.; Bach, F.; Bruno, I. J.; Chalk, S. J.; Jung, N.; Liermann, J. C.; McEwen, L. R.; Neumann, S.; Steinbeck, C.; Razum, M.; Koepler, O.; Minimum information standards in chemistry: A call for better research data management practices. Angew. Chem. Int. Ed. 61, e202203038 (2022). DOI: 10.1002/anie.202203038
Research data management (RDM) is needed to assist experimental advances and data collection in the chemical sciences. Many funders require RDM because experiments are often paid for by taxpayers and the resulting data should be deposited sustainably for posterity. However, paper notebooks are still common in laboratories, and research data is often stored in proprietary and/or dead-end file formats without experimental context. Data must mature beyond a mere supplement to a research paper. Electronic lab notebooks (ELNs) and laboratory information management systems (LIMS) allow researchers to manage data better, and they simplify research and publication. Thus, an agreement is needed on minimum information standards for data handling to support structured approaches to data reporting. As digitalization becomes part of curricular teaching, future generations of digital-native chemists will embrace RDM and ELNs as an organic part of their research.
Publication
Jurburg, S. D.; Buscot, F.; Chatzinotas, A.; Chaudhari, N. M.; Clark, A. T.; Garbowski, M.; Grenié, M.; Hom, E. F. Y.; Karakoç, C.; Marr, S.; Neumann, S.; Tarkka, M.; van Dam, N. M.; Weinhold, A.; Heintz-Buschart, A.; The community ecology perspective of omics data. Microbiome 10, 225 (2022). DOI: 10.1186/s40168-022-01423-8
The measurement of uncharacterized pools of biological molecules through techniques such as metabarcoding, metagenomics, metatranscriptomics, metabolomics, and metaproteomics produces large, multivariate datasets. Analyses of these datasets have successfully been borrowed from community ecology to characterize the molecular diversity of samples (α-diversity) and to assess how these profiles change in response to experimental treatments or across gradients (β-diversity). However, sample preparation and data collection methods generate biases and noise which confound molecular diversity estimates and require special attention. Here, we examine how technical biases and noise that are introduced into multivariate molecular data affect the estimation of the components of diversity (i.e., total number of different molecular species, or entities; total number of molecules; and the abundance distribution of molecular entities). We then explore under which conditions these biases affect the measurement of α- and β-diversity and highlight how novel methods commonly used in community ecology can be adopted to improve the interpretation and integration of multivariate molecular data.
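To make the components of diversity named above concrete, the following Python sketch computes two common within-sample measures (richness and the Shannon index) and one between-sample measure (Bray-Curtis dissimilarity) from abundance vectors. The choice of these three indices is an assumption made for illustration, since the abstract does not name specific indices, and the function names and example counts are hypothetical.

```python
import math


def richness(counts):
    """Number of distinct molecular entities detected (count > 0)."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over entities with count > 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples measured over the same
    entities: sum |a_i - b_i| / (sum a_i + sum b_i)."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same entities")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Illustrative abundance vectors for two samples over the same four entities.
sample_1 = [10, 10, 10, 10]
sample_2 = [37, 1, 1, 1]

alpha_1 = shannon(sample_1)
alpha_2 = shannon(sample_2)
beta_12 = bray_curtis(sample_1, sample_2)
```

Both example samples have the same richness (4) and the same total number of molecules (40), yet `sample_1` has the higher Shannon index because its abundances are even. This shows why the abundance distribution is treated as a separate component of diversity from the number of entities and the total count.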