Publications: Leibniz Institute of Plant Biochemistry

Displaying results 1 to 10 of 13.

Publications

Herrera-Rocha, F.; Fernández-Niño, M.; Duitama, J.; Cala, M. P.; Chica, M. J.; Wessjohann, L. A.; Davari, M. D.; Barrios, A. F. G.; FlavorMiner: a machine learning platform for extracting molecular flavor profiles from structural data J. Cheminform. 16 140 (2024) DOI: 10.1186/s13321-024-00935-9

Flavor is the main factor driving consumer acceptance of food products. However, tracking the biochemistry of flavor is a formidable challenge due to the complexity of food composition. Current methodologies for linking individual molecules to flavor in foods and beverages are expensive and time-consuming. Predictive models based on machine learning (ML) are emerging as an alternative to speed up this process. Nonetheless, the optimal approach to predict flavor features of molecules remains elusive. In this work we present FlavorMiner, an ML-based multilabel flavor predictor. FlavorMiner seamlessly integrates different combinations of algorithms and mathematical representations, augmented with class balance strategies to address the inherent class imbalance of the input dataset. Notably, Random Forest and K-Nearest Neighbors combined with Extended Connectivity Fingerprint and RDKit molecular descriptors consistently outperform other combinations in most cases. Resampling strategies surpass weight balance methods in mitigating bias associated with class imbalance. FlavorMiner exhibits remarkable accuracy, with an average ROC AUC score of 0.88. This algorithm was used to analyze cocoa metabolomics data, unveiling its profound potential to help extract valuable insights from intricate food metabolomics data. FlavorMiner can be used for flavor mining in any food product, drawing from a diverse training dataset that spans over 934 distinct food products.

Scientific Contribution: FlavorMiner is an advanced machine learning (ML)-based tool designed to predict molecular flavor features with high accuracy and efficiency, addressing the complexity of food metabolomics. By leveraging robust algorithmic combinations paired with mathematical representations, FlavorMiner achieves high predictive performance. Applied to cocoa metabolomics, FlavorMiner demonstrated its capacity to extract meaningful insights, showcasing its versatility for flavor analysis across diverse food products. This study underscores the transformative potential of ML in accelerating flavor biochemistry research, offering a scalable solution for the food and beverage industry.
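The abstract above reports that resampling handled class imbalance better than weight balancing. As an illustration only, the following is a minimal sketch of one common resampling strategy, random oversampling, written in plain Python. It is not taken from FlavorMiner: the function name `random_oversample`, the toy feature vectors, and the single "sweet vs. not sweet" label are assumptions, and the abstract does not state which resampling method the authors used.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        # Rows of the original data that belong to this class.
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


# Toy data: placeholder feature vectors and one binary flavor label
# (1 = "sweet", 0 = "not sweet") with a 4:1 imbalance.
X = [[0, 1], [1, 1], [1, 0], [0, 0], [1, 1]]
y = [0, 0, 0, 0, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In a multilabel setting such as the one described, a step like this would typically be applied per label before training each classifier; that arrangement is likewise an assumption here.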

Zulfiqar, M.; Gadelha, L.; Steinbeck, C.; Sorokina, M.; Peters, K.; MAW: the reproducible Metabolome Annotation Workflow for untargeted tandem mass spectrometry J. Cheminform. 15 32 (2023) DOI: 10.1186/s13321-023-00695-y

Mapping the chemical space of compounds to chemical structures remains a challenge in metabolomics. Despite the advancements in untargeted liquid chromatography-mass spectrometry (LC–MS) to achieve a high-throughput profile of metabolites from complex biological resources, only a small fraction of these metabolites can be annotated with confidence. Many novel computational methods and tools have been developed to enable chemical structure annotation of known and unknown compounds, such as in silico generated spectra and molecular networking. Here, we present an automated and reproducible Metabolome Annotation Workflow (MAW) for untargeted metabolomics data to further facilitate and automate the complex annotation by combining tandem mass spectrometry (MS2) input data pre-processing, spectral and compound database matching with computational classification, and in silico annotation. MAW takes the LC-MS2 spectra as input and generates a list of putative candidates from spectral and compound databases. The databases are integrated via the R package Spectra and the metabolite annotation tool SIRIUS as part of the R segment of the workflow (MAW-R). The final candidate selection is performed using the cheminformatics tool RDKit in the Python segment (MAW-Py). Furthermore, each feature is assigned a chemical structure and can be imported to a chemical structure similarity network. MAW follows the FAIR (Findable, Accessible, Interoperable, Reusable) principles and has been made available as the Docker images maw-r and maw-py. The source code and documentation are available on GitHub (https://github.com/zmahnoor14/MAW). The performance of MAW is evaluated on two case studies. MAW can improve candidate ranking by integrating spectral databases with annotation tools like SIRIUS, which contributes to an efficient candidate selection procedure. The results from MAW are also reproducible and traceable, compliant with the FAIR guidelines. Taken together, MAW could greatly facilitate automated metabolite characterization in diverse fields such as clinical metabolomics and natural product discovery.
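Spectral database matching, one of the steps named in the abstract above, is commonly done by scoring a query MS2 spectrum against library spectra. The sketch below shows a simple greedy cosine similarity in plain Python. It is an assumption-laden illustration, not MAW's implementation: the abstract does not say which similarity measure MAW uses, and the function name `cosine_score`, the m/z tolerance, and the toy peak lists are all hypothetical.

```python
import math


def cosine_score(spec_a, spec_b, tol=0.01):
    """Greedy cosine similarity between two peak lists.

    Each spectrum is a list of (mz, intensity) tuples. Peaks are paired
    when their m/z values differ by at most `tol`; each peak is used once.
    """
    # All candidate peak pairs within tolerance, strongest products first.
    pairs = sorted(
        (
            (ia * ib, i, j)
            for i, (ma, ia) in enumerate(spec_a)
            for j, (mb, ib) in enumerate(spec_b)
            if abs(ma - mb) <= tol
        ),
        reverse=True,
    )
    used_a, used_b, dot = set(), set(), 0.0
    for product, i, j in pairs:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            dot += product
    norm_a = math.sqrt(sum(inten * inten for _, inten in spec_a))
    norm_b = math.sqrt(sum(inten * inten for _, inten in spec_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy query spectrum and a two-entry library (made-up peaks).
query = [(85.03, 40.0), (127.04, 100.0), (145.05, 60.0)]
library = {
    "candidate_A": [(85.03, 35.0), (127.04, 100.0), (145.05, 55.0)],
    "candidate_B": [(91.05, 100.0), (119.08, 20.0)],
}

# Rank library entries by similarity to the query, best first.
ranked = sorted(
    library, key=lambda name: cosine_score(query, library[name]), reverse=True
)
print(ranked)
```

The greedy pairing keeps the example short; production tools usually solve the peak assignment more carefully and may weight intensities or m/z values.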

Schymanski, E. L.; Kondić, T.; Neumann, S.; Thiessen, P. A.; Zhang, J.; Bolton, E. E.; Empowering large chemical knowledge bases for exposomics: PubChemLite meets MetFrag J. Cheminform. 13 19 (2021) DOI: 10.1186/s13321-021-00489-0

Compound (or chemical) databases are an invaluable resource for many scientific disciplines. Exposomics researchers need to find and identify relevant chemicals that cover the entirety of potential (chemical and other) exposures over entire lifetimes. This daunting task, with over 100 million chemicals in the largest chemical databases, coupled with broadly acknowledged knowledge gaps in these resources, leaves researchers faced with too much—yet not enough—information at the same time to perform comprehensive exposomics research. Furthermore, the improvements in analytical technologies and computational mass spectrometry workflows coupled with the rapid growth in databases and increasing demand for high throughput “big data” services from the research community present significant challenges for both data hosts and workflow developers. This article explores how to reduce candidate search spaces in non-target small molecule identification workflows, while increasing content usability in the context of environmental and exposomics analyses, so as to profit from the increasing size and information content of large compound databases, while increasing efficiency at the same time. In this article, these methods are explored using PubChem, the NORMAN Network Suspect List Exchange and the in silico fragmentation approach MetFrag. A subset of the PubChem database relevant for exposomics, PubChemLite, is presented as a database resource that can be (and has been) integrated into current workflows for high resolution mass spectrometry. Benchmarking datasets from earlier publications are used to show how experimental knowledge and existing datasets can be used to detect and fill gaps in compound databases to progressively improve large resources such as PubChem, and topic-specific subsets such as PubChemLite. 
PubChemLite is a living collection, updating as annotation content in PubChem is updated, and exported to allow direct integration into existing workflows such as MetFrag. The source code and files necessary to recreate or adjust this are jointly hosted between the research parties (see data availability statement). This effort shows that enhancing the FAIRness (Findability, Accessibility, Interoperability and Reusability) of open resources can mutually enhance several resources for whole community benefit. The authors explicitly welcome additional community input on ideas for future developments.
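The idea of shrinking the candidate search space can be illustrated with a small sketch: candidates are first restricted to a mass window around the measured value, then optionally to entries carrying a given annotation. Everything in the example is hypothetical, including the field names `mono_mass` and `annotations`, the entries, their masses, and the annotation labels; the abstract does not specify PubChemLite's annotation categories or how MetFrag queries it.

```python
def ppm_window(mass, ppm):
    """Return the (low, high) mass bounds for a relative tolerance in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def select_candidates(database, measured_mass, ppm=5.0, required_annotation=None):
    """Filter database entries by mass window, then by annotation if requested."""
    low, high = ppm_window(measured_mass, ppm)
    hits = [entry for entry in database if low <= entry["mono_mass"] <= high]
    if required_annotation is not None:
        hits = [e for e in hits if required_annotation in e["annotations"]]
    return hits


# Made-up entries standing in for a large compound database.
database = [
    {"name": "entry_1", "mono_mass": 194.0804, "annotations": {"food", "drug"}},
    {"name": "entry_2", "mono_mass": 194.0810, "annotations": set()},
    {"name": "entry_3", "mono_mass": 194.0950, "annotations": {"food"}},
]

# Mass filter alone, then mass filter plus an annotation requirement.
by_mass = select_candidates(database, 194.0804)
annotated = select_candidates(database, 194.0804, required_annotation="food")
print([e["name"] for e in by_mass], [e["name"] for e in annotated])
```

Restricting to annotated entries trades recall for a shorter, more relevant candidate list, which is the kind of trade-off the abstract describes for exposomics workflows.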

Ntie-Kang, F.; Telukunta, K. K.; Fobofou, S. A. T.; Chukwudi Osamor, V.; Egieyeh, S. A.; Valli, M.; Djoumbou-Feunang, Y.; Sorokina, M.; Stork, C.; Mathai, N.; Zierep, P.; Chávez-Hernández, A. L.; Duran-Frigola, M.; Babiaka, S. B.; Tematio Fouedjou, R.; Eni, D. B.; Akame, S.; Arreyetta-Bawak, A. B.; Ebob, O. T.; Metuge, J. A.; Bekono, B. D.; Isa, M. A.; Onuku, R.; Shadrack, D. M.; Musyoka, T. M.; Patil, V. M.; van der Hooft, J. J. J.; da Silva Bolzani, V.; Medina-Franco, J. L.; Kirchmair, J.; Weber, T.; Tastan Bishop, Ö.; Medema, M. H.; Wessjohann, L. A.; Ludwig-Müller, J.; Computational Applications in Secondary Metabolite Discovery (CAiSMD): an online workshop J. Cheminform. 13 64 (2021) DOI: 10.1186/s13321-021-00546-8

McEachran, A. D.; Mansouri, K.; Grulke, C.; Schymanski, E. L.; Ruttkies, C.; Williams, A. J.; “MS-Ready” structures for non-targeted high-resolution mass spectrometry screening studies J. Cheminform. 10 45 (2018) DOI: 10.1186/s13321-018-0299-2

Schymanski, E. L.; Ruttkies, C.; Krauss, M.; Brouard, C.; Kind, T.; Dührkop, K.; Allen, F.; Vaniya, A.; Verdegem, D.; Böcker, S.; Rousu, J.; Shen, H.; Tsugawa, H.; Sajed, T.; Fiehn, O.; Ghesquière, B.; Neumann, S.; Critical Assessment of Small Molecule Identification 2016: automated methods J. Cheminform. 9 22 (2017) DOI: 10.1186/s13321-017-0207-1

Ruttkies, C.; Schymanski, E. L.; Wolf, S.; Hollender, J.; Neumann, S.; MetFrag relaunched: incorporating strategies beyond in silico fragmentation J. Cheminform. 8 3 (2016) DOI: 10.1186/s13321-016-0115-9

Background: The in silico fragmenter MetFrag, launched in 2010, was one of the first approaches combining compound database searching and fragmentation prediction for small molecule identification from tandem mass spectrometry data. Since then many new approaches have evolved, as has MetFrag itself. This article details the latest developments to MetFrag and its use in small molecule identification since the original publication.

Results: MetFrag has gone through algorithmic and scoring refinements. New features include the retrieval of reference, data source and patent information via ChemSpider and PubChem web services, as well as InChIKey filtering to reduce candidate redundancy due to stereoisomerism. Candidates can be filtered or scored differently based on criteria like occurrence of certain elements and/or substructures prior to fragmentation, or presence in so-called “suspect lists”. Retention time information can now be calculated either within MetFrag with a sufficient amount of user-provided retention times, or incorporated separately as “user-defined scores” to be included in candidate ranking. The changes to MetFrag were evaluated on the original dataset as well as a dataset of 473 merged high resolution tandem mass spectra (HR-MS/MS) and compared with another open source in silico fragmenter, CFM-ID. Using HR-MS/MS information only, MetFrag2.2 and CFM-ID had 30 and 43 Top 1 ranks, respectively, using PubChem as a database. Including reference and retention information in MetFrag2.2 improved this to 420 and 336 Top 1 ranks with ChemSpider and PubChem (89 and 71 %), respectively, and even up to 343 Top 1 ranks (PubChem) when combining with CFM-ID. The optimal parameters and weights were verified using three additional datasets of 824 merged HR-MS/MS spectra in total. Further examples are given to demonstrate flexibility of the enhanced features.

Conclusions: In many cases additional information is available from the experimental context to add to small molecule identification, which is especially useful where the mass spectrum alone is not sufficient for candidate selection from a large number of candidates. The results achieved with MetFrag2.2 clearly show the benefit of considering this additional information. The new functions greatly enhance the chance of identification success and have been incorporated into a command line interface in a flexible way designed to be integrated into high throughput workflows. Feedback on the command line version of MetFrag2.2 available at http://c-ruttkies.github.io/MetFrag/ is welcome.
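Two of the ideas in the abstract above lend themselves to a short sketch: collapsing stereoisomers by InChIKey, and ranking candidates by a weighted combination of several score terms. An InChIKey's first block (the 14 characters before the first hyphen) encodes the molecular skeleton, so stereoisomers share it. The code below is an illustration under stated assumptions, not MetFrag's implementation: the function names, the equal weights, the score values, and the placeholder InChIKeys are all made up, and the weighted sum is only one plausible way to combine terms.

```python
def combined_score(terms, weights):
    """Weighted sum of score terms, each assumed to be scaled to [0, 1]."""
    return sum(weights[name] * terms[name] for name in weights)


def dedupe_by_first_block(candidates):
    """Keep the best-scoring candidate per InChIKey first block,
    returned in descending score order."""
    best = {}
    for cand in candidates:
        block = cand["inchikey"].split("-")[0]
        if block not in best or cand["score"] > best[block]["score"]:
            best[block] = cand
    return sorted(best.values(), key=lambda c: c["score"], reverse=True)


# Illustrative weights; the optimal values reported in the paper are not given here.
weights = {"fragmenter": 1.0, "references": 1.0, "retention": 1.0}

# Placeholder InChIKeys: the first two share a first block (stereoisomers).
raw = [
    {"inchikey": "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
     "terms": {"fragmenter": 0.9, "references": 0.2, "retention": 0.5}},
    {"inchikey": "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
     "terms": {"fragmenter": 0.9, "references": 0.1, "retention": 0.5}},
    {"inchikey": "DDDDDDDDDDDDDD-EEEEEEEESA-N",
     "terms": {"fragmenter": 0.6, "references": 1.0, "retention": 0.9}},
]

scored = [dict(c, score=combined_score(c["terms"], weights)) for c in raw]
ranked = dedupe_by_first_block(scored)
print([c["inchikey"] for c in ranked])
```

In this toy case the third candidate has the weakest fragmentation score but ranks first once reference and retention terms are included, mirroring the kind of improvement the abstract attributes to the additional information.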

Rausch, F.; Brandt, W.; Schicht, M.; Bräuer, L.; Paulsen, F.; Protein modeling and molecular dynamic studies of two new surfactant proteins J. Cheminform. 5 O2 (2013) DOI: 10.1186/1758-2946-5-S1-O2

Heym, P.-P.; Brandt, W.; Wessjohann, L. A.; Niclas, H.-J.; Virtual screening for plant PARP inhibitors – what can be learned from human PARP inhibitors? J. Cheminform. 4 O24 (2012) DOI: 10.1186/1758-2946-4-S1-O24

Brandt, W.; Kufka, J.; Schulze, D.; Schulze, E.; Rausch, F.; Wessjohann, L.; The membrane bound aromatic p-hydroxybenzoic acid oligoprenyltransferase (UbiA) - how iterative improvements lead to a realistic structure that offers new insights into functional aspects of prenyl transferases and terpene synthases J. Cheminform. 2 O20 (2010) DOI: 10.1186/1758-2946-2-S1-O20
