Publications
Flavor is the main factor driving consumers' acceptance of food products. However, tracking the biochemistry of flavor is a formidable challenge due to the complexity of food composition. Current methodologies for linking individual molecules to flavor in foods and beverages are expensive and time-consuming. Predictive models based on machine learning (ML) are emerging as an alternative to speed up this process. Nonetheless, the optimal approach to predicting the flavor features of molecules remains elusive. In this work we present FlavorMiner, an ML-based multilabel flavor predictor. FlavorMiner integrates different combinations of algorithms and mathematical representations, augmented with class balance strategies to address the inherent class imbalance of the input dataset. Notably, Random Forest and K-Nearest Neighbors combined with Extended Connectivity Fingerprint and RDKit molecular descriptors outperform other combinations in most cases. Resampling strategies surpass weight balance methods in mitigating the bias associated with class imbalance. FlavorMiner exhibits remarkable accuracy, with an average ROC AUC score of 0.88. The algorithm was used to analyze cocoa metabolomics data, demonstrating its potential to help extract valuable insights from intricate food metabolomics data. FlavorMiner can be used for flavor mining in any food product, drawing on a diverse training dataset that spans more than 934 distinct food products.
Scientific Contribution: FlavorMiner is an advanced machine learning (ML)-based tool designed to predict molecular flavor features with high accuracy and efficiency, addressing the complexity of food metabolomics. By leveraging robust algorithmic combinations paired with mathematical representations, FlavorMiner achieves high predictive performance. Applied to cocoa metabolomics, FlavorMiner demonstrated its capacity to extract meaningful insights, showcasing its versatility for flavor analysis across diverse food products. This study underscores the transformative potential of ML in accelerating flavor biochemistry research, offering a scalable solution for the food and beverage industry.
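A minimal Python sketch using only the standard library can illustrate the class-imbalance handling and the evaluation metric described above. It assumes one binary column per flavor label. It also assumes plain random oversampling, which is only one of several possible resampling strategies. The helper names (`random_oversample`, `roc_auc`, `mean_roc_auc`) are hypothetical and are not FlavorMiner's API. Averaging per-label ROC AUC values is one plausible reading of "average ROC AUC".

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """For one binary label, duplicate randomly chosen samples of each
    smaller class until every class matches the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a
    random negative (ties count as one half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_roc_auc(label_matrix, score_matrix):
    """Average the per-label ROC AUC of a multilabel task.
    Rows are molecules, columns are flavor labels (hypothetical layout)."""
    n_labels = len(label_matrix[0])
    aucs = []
    for j in range(n_labels):
        y = [row[j] for row in label_matrix]
        s = [row[j] for row in score_matrix]
        aucs.append(roc_auc(y, s))
    return sum(aucs) / n_labels
```

Apply oversampling only to the training split. Otherwise, duplicated molecules can leak into the evaluation set and inflate the AUC.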
Three previously undescribed azepino-indole alkaloids, named purpurascenines A−C (1−3), together with the new-to-nature 7-hydroxytryptophan (4) and two known compounds, adenosine (5) and riboflavin (6), were isolated from fruiting bodies of Cortinarius purpurascens Fr. (Cortinariaceae). The structures of 1−3 were elucidated based on spectroscopic analyses and ECD calculations. Furthermore, the biosynthesis of purpurascenine A (1) was investigated by in vivo experiments in which ¹³C-labeled sodium pyruvate, alanine, and sodium acetate were incubated with fruiting bodies of C. purpurascens. The incorporation of ¹³C into 1 was analyzed by 1D NMR and HRESIMS. With [3-¹³C]pyruvate, a dramatic enrichment of ¹³C was observed. Hence, a direct Pictet−Spengler reaction between α-keto acids and 7-hydroxytryptophan (4) is suggested as the biosynthetic route to purpurascenines A−C (1−3). Compound 1 exhibits no antiproliferative or cytotoxic effects against human prostate (PC-3), colorectal (HCT-116), and breast (MCF-7) cancer cells. An in silico docking study confirmed the hypothesis that purpurascenine A (1) could bind to the active site of the 5-HT2A serotonin receptor. A new functional 5-HT2A receptor activation assay showed no agonistic activity of 1. It did show some antagonistic effects against 5-HT-dependent 5-HT2A activation and likely antagonistic effects on the putative constitutive activity of the 5-HT2A receptor.
Mapping the chemical space of compounds to chemical structures remains a challenge in metabolomics. Untargeted liquid chromatography-mass spectrometry (LC–MS) now achieves high-throughput profiling of metabolites from complex biological resources. Despite these advances, only a small fraction of the detected metabolites can be annotated with confidence. Many novel computational methods and tools have been developed to enable chemical structure annotation of known and unknown compounds, such as in silico generated spectra and molecular networking. Here, we present an automated and reproducible Metabolome Annotation Workflow (MAW) for untargeted metabolomics data. MAW further facilitates and automates the complex annotation process by combining tandem mass spectrometry (MS2) input data pre-processing, spectral and compound database matching with computational classification, and in silico annotation. MAW takes LC-MS2 spectra as input and generates a list of putative candidates from spectral and compound databases. The databases are integrated via the R package Spectra and the metabolite annotation tool SIRIUS as part of the R segment of the workflow (MAW-R). The final candidate selection is performed using the cheminformatics tool RDKit in the Python segment (MAW-Py). Furthermore, each feature is assigned a chemical structure and can be imported into a chemical structure similarity network. MAW follows the FAIR (Findable, Accessible, Interoperable, Reusable) principles and is available as the Docker images maw-r and maw-py. The source code and documentation are available on GitHub (https://github.com/zmahnoor14/MAW). The performance of MAW was evaluated on two case studies. MAW can improve candidate ranking by integrating spectral databases with annotation tools such as SIRIUS, which contributes to an efficient candidate selection procedure. The results from MAW are also reproducible and traceable, in compliance with the FAIR guidelines.
Taken together, MAW could greatly facilitate automated metabolite characterization in diverse fields such as clinical metabolomics and natural product discovery.
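Spectral database matching of the kind MAW performs is commonly scored with a cosine similarity between MS2 peak lists. The sketch below illustrates a greedy cosine score with an absolute m/z tolerance, using only the standard library. It is not the scoring implemented in the R package Spectra or in SIRIUS. The function names and the 0.01 Da default tolerance are assumptions made for illustration.

```python
import math


def cosine_score(query, reference, tol=0.01):
    """Greedy cosine similarity between two centroided MS2 spectra, each a
    list of (mz, intensity) tuples. Peaks pair up when their m/z values
    differ by at most `tol` Da, and each peak is used at most once."""
    pairs = []
    for i, (mz_q, int_q) in enumerate(query):
        for j, (mz_r, int_r) in enumerate(reference):
            if abs(mz_q - mz_r) <= tol:
                pairs.append((int_q * int_r, i, j))
    # Assign the most intense peak pairs first.
    pairs.sort(reverse=True)
    used_q, used_r = set(), set()
    dot = 0.0
    for product, i, j in pairs:
        if i in used_q or j in used_r:
            continue
        used_q.add(i)
        used_r.add(j)
        dot += product
    norm_q = math.sqrt(sum(v * v for _, v in query))
    norm_r = math.sqrt(sum(v * v for _, v in reference))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)


def rank_candidates(query, library, tol=0.01):
    """Return (name, score) pairs for all library spectra, best match first."""
    scored = [(name, cosine_score(query, spec, tol)) for name, spec in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Greedy assignment keeps the sketch short. An optimal one-to-one peak assignment would need a matching algorithm such as the Hungarian method.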
Compound (or chemical) databases are an invaluable resource for many scientific disciplines. Exposomics researchers need to find and identify relevant chemicals that cover the entirety of potential (chemical and other) exposures over entire lifetimes. The largest chemical databases hold over 100 million chemicals, and these resources have broadly acknowledged knowledge gaps. Together, these factors make the task daunting, leaving researchers with too much, yet not enough, information to perform comprehensive exposomics research. Analytical technologies and computational mass spectrometry workflows continue to improve, databases are growing rapidly, and the research community increasingly demands high-throughput “big data” services. These trends present significant challenges for both data hosts and workflow developers. This article explores how to reduce candidate search spaces in non-target small molecule identification workflows while increasing content usability for environmental and exposomics analyses. The aim is to benefit from the growing size and information content of large compound databases while also increasing efficiency. These methods are explored using PubChem, the NORMAN Network Suspect List Exchange and the in silico fragmentation approach MetFrag. PubChemLite, a subset of the PubChem database relevant for exposomics, is presented as a database resource that can be (and has been) integrated into current workflows for high resolution mass spectrometry. Benchmarking datasets from earlier publications show how experimental knowledge and existing datasets can detect and fill gaps in compound databases. This progressively improves large resources such as PubChem and topic-specific subsets such as PubChemLite.
PubChemLite is a living collection: it is updated as annotation content in PubChem is updated and exported to allow direct integration into existing workflows such as MetFrag. The source code and files necessary to recreate or adjust it are jointly hosted by the research parties (see data availability statement). This effort shows that enhancing the FAIRness (Findability, Accessibility, Interoperability and Reusability) of open resources can mutually enhance several resources to the benefit of the whole community. The authors explicitly welcome additional community input on ideas for future developments.
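Non-target workflows usually start reducing the candidate search space with a mass window around the measured ion. The sketch below assumes an [M+H]+ ion and a 5 ppm tolerance. It stores candidate records as dicts with hypothetical `monoisotopic_mass` and `annotation_count` keys. This is not PubChemLite's file format or MetFrag's scoring. It only illustrates filtering by mass and then ordering the hits by annotation content.

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass_from_mh(mz):
    """Neutral monoisotopic mass from an observed [M+H]+ m/z value."""
    return mz - PROTON_MASS


def ppm_window(mass, ppm):
    """Lower and upper mass bounds for a tolerance in parts per million."""
    delta = mass * ppm * 1e-6
    return mass - delta, mass + delta


def filter_candidates(candidates, mz, ppm=5.0):
    """Keep candidates whose monoisotopic mass lies inside the ppm window
    around the neutral mass, then sort them by a hypothetical annotation
    count (more annotations first)."""
    low, high = ppm_window(neutral_mass_from_mh(mz), ppm)
    hits = [c for c in candidates if low <= c["monoisotopic_mass"] <= high]
    return sorted(hits, key=lambda c: c["annotation_count"], reverse=True)
```

Because the window scales with mass, a fixed ppm tolerance admits a wider absolute range for heavier ions. This is one reason a smaller, annotation-rich database shortens the candidate lists that reach in silico fragmentation.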
We report the major conclusions of the online open-access workshop “Computational Applications in Secondary Metabolite Discovery (CAiSMD)”, which took place from 08 to 10 March 2021. Invited speakers from academia and industry and about 200 registered participants from five continents (Africa, Asia, Europe, South America, and North America) took part in the workshop. The workshop highlighted the potential applications of computational methodologies in the search for secondary metabolites (SMs) or natural products (NPs) as potential drugs and drug leads. Over 3 days, the participants of this online workshop received an overview of modern computer-based approaches for exploring NP discovery in the “omics” age. The invited experts gave keynote lectures, trained participants in hands-on sessions, and held round table discussions. These were followed by oral presentations with much interaction between the speakers and the audience. Upon submission of an abstract, selected applicants (early-career scientists) were offered the opportunity to give oral presentations (15 min) and to present posters as flash presentations (5 min). The final program, available on the workshop website (https://caismd.indiayouth.info/), comprised 4 keynote lectures (KLs), 12 oral presentations (OPs), 2 round table discussions (RTDs), and 5 hands-on sessions (HSs). This meeting report also references internet resources for computational biology in the area of secondary metabolites. These resources are useful beyond the workshop topics and will constitute a long-term, valuable source for the community. The workshop concluded with an online survey for speakers and participants, with the goal of improving subsequent editions.
Chemical database searching has become a fixture in many non-targeted identification workflows based on high-resolution mass spectrometry (HRMS). However, the form of a chemical structure observed in HRMS does not always match the form stored in a database (e.g., the neutral form versus a salt; one component of a mixture rather than the mixture form used in a consumer product). Linking the form of a structure observed via HRMS to its related form(s) within a database enables the return of all relevant variants of a structure, as well as the related metadata, in a single query. A Konstanz Information Miner (KNIME) workflow has been developed to produce the structural representations observed using HRMS (“MS-Ready structures”) and link them to those stored in a database. These MS-Ready structures, and associated mappings to the full chemical representations, are surfaced via the US EPA’s CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard/). This article describes the workflow for the generation and linking of ~700,000 MS-Ready structures (derived from ~760,000 original structures). It also describes the download, search and export capabilities that support structure identification using HRMS. The importance of this form of structural representation for HRMS is demonstrated with several examples, including integration with the in silico fragmentation software application MetFrag. The structures, search, download and export functionality are all available through the CompTox Chemistry Dashboard, while the MetFrag implementation can be viewed at https://msbi.ipb-halle.de/MetFragBeta/.
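The component-splitting step of deriving an MS-Ready form, and the mapping back to the original records, can be sketched on SMILES strings. This is a deliberately simplified illustration and not the KNIME workflow. The counterion/solvent set is a tiny hypothetical list, and components are compared as raw strings. A real pipeline would canonicalize them with a toolkit such as RDKit. Charge neutralization is not attempted.

```python
# Hypothetical, minimal set of counterions/solvents to strip; a real rule
# set would be much larger and curated.
COUNTERIONS = {"[Na+]", "[K+]", "[Cl-]", "[Br-]", "[OH-]", "Cl", "O"}


def ms_ready_components(smiles):
    """Split a dot-disconnected SMILES into components, drop listed
    counterions/solvents, and return the unique remaining components."""
    parts = [p for p in smiles.split(".") if p]
    kept = [p for p in parts if p not in COUNTERIONS]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(kept))


def build_mapping(records):
    """Map each MS-Ready component to the IDs of every original record that
    contains it, so one lookup returns all related forms."""
    mapping = {}
    for record_id, smiles in records.items():
        for comp in ms_ready_components(smiles):
            mapping.setdefault(comp, []).append(record_id)
    return mapping
```

For example, a hydrochloride written as `CN.Cl` reduces to `CN`. It then maps to the same entry as the free base, which is the kind of one-query linkage the workflow provides.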
Background: The fourth round of the Critical Assessment of Small Molecule Identification (CASMI) Contest (www.casmi-contest.org) was held in 2016, with two new categories for automated methods. This article covers the 208 challenges in Categories 2 and 3 (without and with metadata, respectively). It spans the organization, participation, results and post-contest evaluation of CASMI 2016 through to perspectives for future contests and small molecule annotation/identification.
Results: The Input Output Kernel Regression (CSI:IOKR) machine learning approach performed best in “Category 2: Best Automatic Structural Identification—In Silico Fragmentation Only”, won by Team Brouard with 41% challenge wins. The winner of “Category 3: Best Automatic Structural Identification—Full Information” was Team Kind (MS-FINDER), with 76% challenge wins. The best methods achieved over 30% Top 1 ranks in Category 2, and all methods ranked the correct candidate in the Top 10 in around 50% of challenges. This success rate rose to 70% Top 1 ranks in Category 3, with the correct candidate in the Top 10 in over 80% of the challenges. The machine learning and chemistry-based approaches are shown to perform in complementary ways.
Conclusions: The improvement in (semi-)automated fragmentation methods for small molecule identification has been substantial. High rates of correct candidates in the Top 1 and Top 10 were achieved despite large candidate numbers. These results open up great possibilities for high-throughput annotation of untargeted analysis for “known unknowns”. As more high-quality training data become available, the improvements in machine learning methods will likely continue, but the alternative approaches still provide valuable complementary information. Improved integration of experimental context will further increase identification success for “real life” annotations. The true “unknown unknowns” remain to be evaluated in future CASMI contests.
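The Top 1 and Top 10 figures above are the fractions of challenges in which the correct structure ranked within the first k candidates. A minimal sketch of that bookkeeping follows. The optimistic tie handling in `rank_of_correct` is an assumption chosen for illustration and may differ from the official CASMI evaluation.

```python
def rank_of_correct(scores, correct_id):
    """1-based rank of the correct candidate when candidates are sorted by
    descending score. Tied candidates share the best rank (an optimistic
    convention assumed here). Returns None if the candidate is absent."""
    if correct_id not in scores:
        return None
    target = scores[correct_id]
    return 1 + sum(1 for score in scores.values() if score > target)


def top_k_rates(ranks, ks=(1, 10)):
    """Fraction of challenges whose correct candidate ranked within the top k.
    `ranks` holds one rank per challenge, or None when it was not retrieved."""
    if not ranks:
        raise ValueError("no challenges given")
    n = len(ranks)
    return {k: sum(1 for r in ranks if r is not None and r <= k) / n for k in ks}
```

Counting unretrieved candidates as misses, rather than dropping those challenges, keeps the denominator equal to the full challenge count.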
The chemical investigation of the chloroform extract of Hypericum lanceolatum, guided by ¹H NMR, ESIMS, and TLC profiles, led to the isolation of 11 new tricyclic acylphloroglucinol derivatives, named selancins A–I (1–9) and hyperselancins A and B (10 and 11). The known compound 3-O-geranylemodin (12) was also isolated and is reported here from a Hypericum species for the first time. Compounds 8 and 9 are the first examples of natural products with a 6-acyl-2,2-dimethylchroman-4-one core fused with a dimethylpyran unit. The new compounds 1–9 are rare acylphloroglucinol derivatives with two fused dimethylpyran units. Compounds 10 and 11 are derivatives of polycyclic polyprenylated acylphloroglucinols related to hyperforin, the active component of St. John’s wort. Their structures were elucidated by UV, IR, extensive 1D and 2D NMR experiments, HRESIMS, and comparison with literature data. The absolute configurations of 5, 8, 10, and 11 were determined by comparing experimental and calculated electronic circular dichroism spectra. Compounds 1 and 2 were synthesized regioselectively in two steps. The cytotoxicity of the crude extract and of compounds 1–6, 8, 9, and 12 against colon (HT-29) and prostate (PC-3) cancer cell lines was determined. The crude extract caused 88% growth inhibition at 50 μg/mL, whereas the compounds showed no significant growth inhibition up to a concentration of 10 mM. No anthelmintic activity was observed for the crude extract.
Pseudohygrophorones A¹² (1) and B¹² (2), the first naturally occurring alkyl cyclohexenones from a fungal source, and the recently reported hygrophorone B¹² (3) have been isolated from fruiting bodies of the basidiomycete Hygrophorus abieticola Krieglst. ex Gröger & Bresinsky. Their structures were assigned on the basis of extensive one- and two-dimensional NMR spectroscopic analysis as well as ESI-HRMS measurements. The absolute configuration of the three stereogenic centers in the diastereomeric compounds 1 and 2 was established with the aid of ³J(H,H) and ⁴J(H,H) coupling constants, NOE interactions, and conformational analysis in conjunction with quantum chemical CD calculations. It was concluded that pseudohygrophorone A¹² (1) is 4S,5S,6S configured, while pseudohygrophorone B¹² (2) was identified as the C-6 epimer of 1, corresponding to the absolute configuration 4S,5S,6R. In addition, the mass spectrometric fragmentation behavior of 1–3 obtained by the higher energy collisional dissociation method allows a clear distinction between the pseudohygrophorones (1 and 2) and hygrophorone B¹² (3). The isolated compounds 1–3 exhibited pronounced activity against phytopathogenic organisms.
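Vicinal ³J(H,H) couplings are usually related to H–C–C–H dihedral angles through a Karplus-type equation. This is one common way such couplings feed into a conformational analysis. The abstract does not state which relation was used. The sketch below therefore uses the generic form J = A cos²φ + B cos φ + C with illustrative coefficients. These coefficients are assumptions, not the parameters of the original study. The sketch covers ³J only, not ⁴J.

```python
import math


def karplus_3j(dihedral_deg, a=7.76, b=-1.10, c=1.40):
    """Predict a vicinal 3J(H,H) coupling in Hz from an H-C-C-H dihedral
    angle using J = a*cos^2(phi) + b*cos(phi) + c. The default
    coefficients are illustrative placeholders; published
    parameterizations differ by fragment and substitution pattern."""
    cos_phi = math.cos(math.radians(dihedral_deg))
    return a * cos_phi ** 2 + b * cos_phi + c


def best_matching_conformer(measured_hz, conformers):
    """Return the conformer name whose predicted couplings are closest, in
    the least-squares sense, to the measured ones. `conformers` maps a name
    to dihedral angles (degrees) listed in the same order as `measured_hz`."""
    def squared_error(angles):
        return sum((karplus_3j(angle) - j) ** 2 for angle, j in zip(angles, measured_hz))

    return min(conformers, key=lambda name: squared_error(conformers[name]))
```

A large coupling (around 10 Hz) points to an anti arrangement of the two protons, and a small one points to a gauche arrangement. This is why coupling constants can help distinguish C-6 epimers in combination with NOE data.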
The Chilean Sepedonium aff. chalcipori strain KSH 883, isolated from the endemic Boletus loyo Philippi, was studied in a polythetic approach based on chemical, molecular, and biological data. A taxonomic study of the strain using molecular data of the ITS, EF1-α, and RPB2 barcoding genes confirmed the position of the isolated strain within the S. chalcipori clade. It also suggested the separation of this clade into three different species. Two new linear 15-residue peptaibols, named chilenopeptins A (1) and B (2), together with the known peptaibols tylopeptins A (3) and B (4), were isolated from the semisolid culture of strain KSH 883. The structures of 1 and 2 were elucidated on the basis of HRESIMSⁿ experiments in conjunction with comprehensive 1D and 2D NMR analysis. Thus, the sequence of chilenopeptin A (1) was identified as Ac-Aib¹-Ser²-Trp³-Aib⁴-Pro⁵-Leu⁶-Aib⁷-Aib⁸-Gln⁹-Aib¹⁰-Aib¹¹-Gln¹²-Aib¹³-Leu¹⁴-Pheol¹⁵, while chilenopeptin B (2) differs from 1 by the replacement of Trp³ by Phe³. Additionally, the total synthesis of 1 and 2 was accomplished by a solid-phase approach, confirming the absolute configuration of all chiral amino acids as L. Both the chilenopeptins (1 and 2) and tylopeptins (3 and 4) were evaluated for their potential to inhibit the growth of phytopathogenic organisms.
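A sequence such as the one reported for chilenopeptin A can be cross-checked against HRESIMS data by summing residue compositions. The sketch below makes three assumptions. Amino acids are counted as in-chain residues (free amino acid minus H₂O). The C-terminal phenylalaninol (Pheol) is counted as the whole amino alcohol. The N-terminal acetyl is counted as a net C₂H₂O. These compositions are supplied for illustration, and the computed masses are not values reported by the authors.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
ATOMIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Assumed compositions: in-chain residues, whole C-terminal Pheol (C9H13NO),
# and the net gain from N-acetylation (C2H2O).
COMPOSITION = {
    "Ac": {"C": 2, "H": 2, "N": 0, "O": 1},
    "Aib": {"C": 4, "H": 7, "N": 1, "O": 1},
    "Ser": {"C": 3, "H": 5, "N": 1, "O": 2},
    "Trp": {"C": 11, "H": 10, "N": 2, "O": 1},
    "Phe": {"C": 9, "H": 9, "N": 1, "O": 1},
    "Pro": {"C": 5, "H": 7, "N": 1, "O": 1},
    "Leu": {"C": 6, "H": 11, "N": 1, "O": 1},
    "Gln": {"C": 5, "H": 8, "N": 2, "O": 2},
    "Pheol": {"C": 9, "H": 13, "N": 1, "O": 1},
}


def formula(units):
    """Sum the elemental composition of a list of building blocks."""
    total = {"C": 0, "H": 0, "N": 0, "O": 0}
    for unit in units:
        for element, count in COMPOSITION[unit].items():
            total[element] += count
    return total


def monoisotopic_mass(units):
    """Neutral monoisotopic mass (Da) of the assembled peptaibol."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula(units).items())


# Sequence as given in the abstract, with the N-terminal acetyl first.
CHILENOPEPTIN_A = [
    "Ac", "Aib", "Ser", "Trp", "Aib", "Pro", "Leu", "Aib", "Aib",
    "Gln", "Aib", "Aib", "Gln", "Aib", "Leu", "Pheol",
]
# Chilenopeptin B replaces Trp at position 3 (list index 3) with Phe.
CHILENOPEPTIN_B = ["Phe" if i == 3 else u for i, u in enumerate(CHILENOPEPTIN_A)]
```

The two peptaibols differ only at position 3. Their mass difference therefore equals the Trp/Phe residue difference (C₂HN, about 39.011 Da), independent of the other assumptions.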