Publications
Flavor is the main factor driving consumer acceptance of food products. However, tracking the biochemistry of flavor is a formidable challenge due to the complexity of food composition. Current methodologies for linking individual molecules to flavor in foods and beverages are expensive and time-consuming. Predictive models based on machine learning (ML) are emerging as an alternative to speed up this process. Nonetheless, the optimal approach to predict flavor features of molecules remains elusive. In this work we present FlavorMiner, an ML-based multilabel flavor predictor. FlavorMiner seamlessly integrates different combinations of algorithms and mathematical representations, augmented with class balance strategies to address the inherent class imbalance of the input dataset. Notably, Random Forest and K-Nearest Neighbors combined with Extended Connectivity Fingerprint and RDKit molecular descriptors consistently outperform other combinations in most cases. Resampling strategies surpass weight balance methods in mitigating bias associated with class imbalance. FlavorMiner exhibits remarkable accuracy, with an average ROC AUC score of 0.88. This algorithm was used to analyze cocoa metabolomics data, unveiling its profound potential to help extract valuable insights from intricate food metabolomics data. FlavorMiner can be used for flavor mining in any food product, drawing from a diverse training dataset that spans over 934 distinct food products.
Scientific Contribution: FlavorMiner is an advanced machine learning (ML)-based tool designed to predict molecular flavor features with high accuracy and efficiency, addressing the complexity of food metabolomics. By leveraging robust algorithmic combinations paired with mathematical representations, FlavorMiner achieves high predictive performance. Applied to cocoa metabolomics, FlavorMiner demonstrated its capacity to extract meaningful insights, showcasing its versatility for flavor analysis across diverse food products. This study underscores the transformative potential of ML in accelerating flavor biochemistry research, offering a scalable solution for the food and beverage industry.
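The two evaluation ideas named in this abstract, resampling to counter class imbalance and ROC AUC averaged over flavor labels, can be illustrated with a minimal standard-library sketch. This is not FlavorMiner's code: the function names `roc_auc`, `macro_roc_auc`, and `random_oversample` are hypothetical, the toy labels and scores are invented for illustration, and plain random oversampling is assumed here as one simple example of a resampling strategy.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def macro_roc_auc(label_matrix, score_matrix):
    """Unweighted mean of per-label ROC AUC (rows: molecules, columns: labels)."""
    n_labels = len(label_matrix[0])
    aucs = [
        roc_auc([row[j] for row in label_matrix], [row[j] for row in score_matrix])
        for j in range(n_labels)
    ]
    return sum(aucs) / n_labels


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until both classes are equally large."""
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    minority_label = 1 if minority is pos else 0
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(rows) + extra, list(labels) + [minority_label] * len(extra)


# Toy data: four molecules, two hypothetical flavor labels.
y_true = [[0, 1], [0, 0], [1, 1], [1, 0]]
y_score = [[0.1, 0.9], [0.4, 0.2], [0.35, 0.8], [0.8, 0.3]]
print(f"macro ROC AUC: {macro_roc_auc(y_true, y_score):.3f}")
```

In a multilabel setting, resampling would typically be applied separately for each label before training that label's classifier; the sketch only shows the balancing step itself.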
Publications
Unspecific peroxygenases (UPOs) are fungal enzymes that attract significant attention for their ability to perform versatile oxyfunctionalization reactions using H2O2. Unlike other oxygenases, UPOs do not require additional reductive equivalents or electron transfer chains that complicate basic and applied research. Nevertheless, UPOs generally exhibit low to no heterologous production levels, and only four UPO structures have been determined to date by crystallography, limiting their usefulness and obstructing research. To overcome this bottleneck, we implemented a workflow that applies PROSS stability design to AlphaFold2 model structures of 10 unique and diverse UPOs, followed by signal peptide shuffling to enable heterologous production. Nine UPOs were functionally produced in Pichia pastoris, including the recalcitrant CciUPO and three UPOs derived from oomycetes, the first nonfungal UPOs to be experimentally characterized. We conclude that the high accuracy and reliability of new modeling and design workflows dramatically expand the pool of enzymes for basic and applied research.
Publications
Mapping the chemical space of compounds to chemical structures remains a challenge in metabolomics. Despite the advancements in untargeted liquid chromatography-mass spectrometry (LC–MS) to achieve a high-throughput profile of metabolites from complex biological resources, only a small fraction of these metabolites can be annotated with confidence. Many novel computational methods and tools, such as in silico generated spectra and molecular networking, have been developed to enable chemical structure annotation of known and unknown compounds. Here, we present an automated and reproducible Metabolome Annotation Workflow (MAW) for untargeted metabolomics data to further facilitate and automate the complex annotation by combining tandem mass spectrometry (MS2) input data pre-processing, spectral and compound database matching with computational classification, and in silico annotation. MAW takes the LC-MS2 spectra as input and generates a list of putative candidates from spectral and compound databases. The databases are integrated via the R package Spectra and the metabolite annotation tool SIRIUS as part of the R segment of the workflow (MAW-R). The final candidate selection is performed using the cheminformatics tool RDKit in the Python segment (MAW-Py). Furthermore, each feature is assigned a chemical structure and can be imported to a chemical structure similarity network. MAW follows the FAIR (Findable, Accessible, Interoperable, Reusable) principles and has been made available as the Docker images maw-r and maw-py. The source code and documentation are available on GitHub (https://github.com/zmahnoor14/MAW). The performance of MAW is evaluated on two case studies. MAW can improve candidate ranking by integrating spectral databases with annotation tools like SIRIUS, which contributes to an efficient candidate selection procedure. The results from MAW are also reproducible and traceable, compliant with the FAIR guidelines.
Taken together, MAW could greatly facilitate automated metabolite characterization in diverse fields such as clinical metabolomics and natural product discovery.
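Structure-similarity scoring of the kind that underlies candidate ranking and similarity networks can be sketched with the Tanimoto coefficient on fingerprint bit sets. This is an illustrative assumption, not MAW's implementation: MAW uses RDKit for candidate selection, whereas the sketch below uses plain Python sets, and the names `tanimoto` and `rank_candidates` as well as the toy fingerprints are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(a & b) / len(union)


def rank_candidates(query_fp, candidates):
    """Sort candidate structures by decreasing similarity to the query fingerprint.

    `candidates` maps a candidate name to its set of on-bit indices.
    """
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy fingerprints (invented bit indices, not derived from real molecules).
query = {1, 2, 3, 8}
library = {
    "candidate_A": {1, 2, 3, 8},
    "candidate_B": {2, 3, 4},
    "candidate_C": {10, 11},
}
for name, score in rank_candidates(query, library):
    print(f"{name}: {score:.2f}")
```

In a real workflow the bit sets would come from a cheminformatics toolkit rather than being written by hand, and a similarity threshold would decide which pairs become edges in the network.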
Publications
Unspecific peroxygenases (UPOs) perform oxy-functionalizations for a wide range of substrates utilizing H2O2 without the need for further reductive equivalents or electron transfer chains. Tailoring these promising enzymes toward industrial application was intensely pursued in the last decade with engineering campaigns addressing the heterologous expression, activity, stability, and improvements in chemo- and regioselectivity. One hitherto missing integral part was the targeted engineering of enantioselectivity for specific substrates with poor starting enantioselectivity. In this work, we present the engineering of the short-type MthUPO toward the enantiodivergent hydroxylation of the terpene model substrate, β-ionone. Guided by computational modeling, we designed a small smart library and screened it with a GC−MS setup. After two rounds of iterative protein evolution, the activity increased up to 17-fold and reached a regioselectivity of up to 99.6% for the 4-hydroxy-β-ionone. Enantiodivergent variants were identified with enantiomeric ratios of 96.6:3.4 (R) and 0.3:99.7 (S), respectively.
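The enantiomeric ratios reported above can be converted to enantiomeric excess with ee = |R − S| / (R + S) × 100. The short sketch below applies this standard relation to the two ratios from the abstract; the function name `enantiomeric_excess` is hypothetical.

```python
def enantiomeric_excess(r_part, s_part):
    """Return (ee in percent, major enantiomer) for an R:S enantiomeric ratio."""
    if r_part < 0 or s_part < 0 or r_part + s_part == 0:
        raise ValueError("ratio parts must be non-negative and not both zero")
    ee = abs(r_part - s_part) / (r_part + s_part) * 100.0
    if r_part > s_part:
        major = "R"
    elif s_part > r_part:
        major = "S"
    else:
        major = None  # racemic mixture
    return ee, major


# Enantiomeric ratios reported for the two enantiodivergent variants.
for r_part, s_part in [(96.6, 3.4), (0.3, 99.7)]:
    ee, major = enantiomeric_excess(r_part, s_part)
    print(f"er {r_part}:{s_part} -> {ee:.1f}% ee ({major})")
```

The two ratios correspond to roughly 93.2% ee for the (R)-selective variant and 99.4% ee for the (S)-selective variant.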
Publications
In recent years, the engineering of flexible loops to improve enzyme properties has gained attention in biocatalysis. Herein, we report a loop engineering strategy to improve the stability of the substrate access tunnels, which reveals the molecular mechanism between loops and tunnels. Based on the dynamic tunnel analysis of CYP116B3, five positions (A86, T91, M108, A109, T111) in loops B-B′ and B′-C that potentially affect how frequently the tunnels occur were selected and subjected to simultaneous saturation mutagenesis. Through screening, the best variant 8G8 (A86T/T91L/M108N/A109M/T111A) was identified, with considerably increased activity for the dealkylation of 7-ethoxycoumarin and the hydroxylation of naphthalene (134-fold and 9-fold, respectively). Molecular dynamics simulations showed that the reduced flexibility of loops B-B′ and B′-C was responsible for increasing the stability of the studied tunnel. The redesign of loops B-B′ and B′-C surrounding the tunnel entrance provides loop engineering with a powerful and likely general method to switch substrate/product transportation on and off.
Publications
Engineering proteins and enzymes with the desired functionality has broad applications in molecular biology, biotechnology, biomedical sciences, health, and medicine. The vastness of protein sequence space and all the possible proteins it represents can pose a considerable barrier for enzyme engineering campaigns through directed evolution and rational design. The nonlinear effects of coevolution between amino acids in protein sequences complicate this further. Data-driven models increasingly provide scientists with the computational tools to navigate through the largely undiscovered forest of protein variants and catch a glimpse of the rules and effects underlying the topology of sequence space. In this review, we outline a complete theoretical journey through the processes of protein engineering methods such as directed evolution and rational design and reflect on these strategies and data-driven hybrid strategies in the context of sequence space. We discuss crucial phenomena of residue coevolution, such as epistasis, and review the history of models created over the past decade, aiming to infer rules of protein evolution from data and use this knowledge to improve the prediction of the structure–function relationship of proteins. Data-driven models based on deep learning algorithms are among the most promising methods that can account for the nonlinear phenomena of sequence space to some degree. We also critically discuss the available models to predict evolutionary coupling and epistatic effects (classical and deep learning) in terms of their capabilities and limitations. Finally, we present our perspective on possible future directions for developing data-driven approaches and provide key orientation points and necessities for the future of the fast-evolving field of enzyme engineering.
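Epistasis, the nonlinear effect mentioned above, is commonly quantified for a pair of mutations as the deviation of the double mutant from the sum of the single-mutant effects on an additive scale (for example log-fitness or ΔΔG). The sketch below shows that common definition as an assumption, not as the review's own formalism; the function names and the numeric values are hypothetical.

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Epistasis of mutations A and B on an additive scale.

    Returns the observed double-mutant effect minus the sum of the
    single-mutant effects, all measured relative to the wild type.
    """
    observed = f_ab - f_wt
    expected_additive = (f_a - f_wt) + (f_b - f_wt)
    return observed - expected_additive


def classify_epistasis(eps, tol=1e-9):
    """Label the sign of an epistasis value, treating |eps| <= tol as additive."""
    if eps > tol:
        return "positive"
    if eps < -tol:
        return "negative"
    return "additive"


# Hypothetical log-scale activities: wild type, two single mutants, double mutant.
eps = pairwise_epistasis(f_wt=0.0, f_a=1.0, f_b=0.5, f_ab=2.5)
print(eps, classify_epistasis(eps))
```

A value of zero means the two mutations combine additively; a nonzero value is the part of the double-mutant effect that a purely additive model of sequence space cannot explain.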
Publications
Compound (or chemical) databases are an invaluable resource for many scientific disciplines. Exposomics researchers need to find and identify relevant chemicals that cover the entirety of potential (chemical and other) exposures over entire lifetimes. This daunting task, with over 100 million chemicals in the largest chemical databases, coupled with broadly acknowledged knowledge gaps in these resources, leaves researchers faced with too much—yet not enough—information at the same time to perform comprehensive exposomics research. Furthermore, the improvements in analytical technologies and computational mass spectrometry workflows coupled with the rapid growth in databases and increasing demand for high throughput “big data” services from the research community present significant challenges for both data hosts and workflow developers. This article explores how to reduce candidate search spaces in non-target small molecule identification workflows, while increasing content usability in the context of environmental and exposomics analyses, so as to profit from the increasing size and information content of large compound databases, while increasing efficiency at the same time. In this article, these methods are explored using PubChem, the NORMAN Network Suspect List Exchange and the in silico fragmentation approach MetFrag. A subset of the PubChem database relevant for exposomics, PubChemLite, is presented as a database resource that can be (and has been) integrated into current workflows for high resolution mass spectrometry. Benchmarking datasets from earlier publications are used to show how experimental knowledge and existing datasets can be used to detect and fill gaps in compound databases to progressively improve large resources such as PubChem, and topic-specific subsets such as PubChemLite. 
PubChemLite is a living collection, updating as annotation content in PubChem is updated, and exported to allow direct integration into existing workflows such as MetFrag. The source code and files necessary to recreate or adjust this are jointly hosted between the research parties (see data availability statement). This effort shows that enhancing the FAIRness (Findability, Accessibility, Interoperability and Reusability) of open resources can mutually enhance several resources for whole community benefit. The authors explicitly welcome additional community input on ideas for future developments.
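Reducing a candidate search space often starts with a mass window around the measured value. The sketch below filters a small in-memory candidate list by a relative mass tolerance in parts per million (ppm). It is a generic illustration and does not reproduce MetFrag or the PubChemLite export format; the function names, the `CANDIDATES` list, and the 5 ppm default are assumptions, and the monoisotopic masses are rounded to four decimals.

```python
def ppm_error(measured, theoretical):
    """Signed relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def filter_candidates(measured_mass, candidates, tol_ppm=5.0):
    """Keep candidates whose monoisotopic mass lies within +/- tol_ppm of the measurement."""
    return [
        c for c in candidates
        if abs(ppm_error(measured_mass, c["mass"])) <= tol_ppm
    ]


# Illustrative candidate list (neutral monoisotopic masses, rounded).
CANDIDATES = [
    {"name": "glucose", "formula": "C6H12O6", "mass": 180.0634},
    {"name": "theophylline", "formula": "C7H8N4O2", "mass": 180.0647},
    {"name": "caffeine", "formula": "C8H10N4O2", "mass": 194.0804},
]

for hit in filter_candidates(180.0635, CANDIDATES, tol_ppm=5.0):
    print(hit["name"], f"{ppm_error(180.0635, hit['mass']):+.2f} ppm")
```

Restricting the database itself to relevant compounds, as the article proposes with PubChemLite, shrinks the list before this step, so fewer near-isobaric candidates remain to be ranked.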
Publications
We report the major conclusions of the online open-access workshop “Computational Applications in Secondary Metabolite Discovery (CAiSMD)” that took place from 08 to 10 March 2021. Invited speakers from academia and industry and about 200 registered participants from five continents (Africa, Asia, Europe, South America, and North America) took part in the workshop. The workshop highlighted the potential applications of computational methodologies in the search for secondary metabolites (SMs) or natural products (NPs) as potential drugs and drug leads. Over 3 days, the participants of this online workshop received an overview of modern computer-based approaches for exploring NP discovery in the “omics” age. The invited experts gave keynote lectures, trained participants in hands-on sessions, and held round table discussions. This was followed by oral presentations with much interaction between the speakers and the audience. Selected applicants (early-career scientists) were offered the opportunity to give oral presentations (15 min) and present posters in the form of flash presentations (5 min) upon submission of an abstract. The final program available on the workshop website (https://caismd.indiayouth.info/) comprised 4 keynote lectures (KLs), 12 oral presentations (OPs), 2 round table discussions (RTDs), and 5 hands-on sessions (HSs). This meeting report also references internet resources for computational biology in the area of secondary metabolites that are of use outside of the workshop areas and will constitute a long-term valuable source for the community. The workshop concluded with an online survey form to be completed by speakers and participants with the goal of improving any subsequent editions.
Publications
Enzymatic hydroxylation of activated and nonactivated sp3-carbons attracts keen interest from the chemistry community as it is one of the most challenging tasks in organic synthesis. Nature provides a vast number of enzymes with an enormous catalytic versatility to fulfill this task. Given that those very different enzymes have a distinct specificity in substrate scope, selectivity, activity, stability, and catalytic cycle, it is interesting to outline similarities and differences. In this Review, we intend to delineate which enzymes possess considerable advantages within specific issues. Heterologous production, crystal structure availability, enzyme engineering potential, and substrate promiscuity are essential factors for the applicability of these biocatalysts.
Publications
Unspecific peroxygenases (UPOs) enable oxyfunctionalizations of a broad substrate range with unparalleled activities. Tailoring these enzymes for chemo- and regioselective transformations represents a grand challenge due to the difficulties in their heterologous production. Herein, we performed protein engineering in Saccharomyces cerevisiae using the MthUPO from Myceliophthora thermophila. More than 5300 transformants were screened. This protein engineering led to a significant reshaping of the active site as elucidated by computational modelling. The reshaping was responsible for the increased oxyfunctionalization activity, with kcat/Km values improved up to 16.5-fold for the model substrate 5-nitro-1,3-benzodioxole. Moreover, variants were identified with high chemo- and regioselectivities in the oxyfunctionalization of aromatic and benzylic carbons, respectively. The benzylic hydroxylation was demonstrated to proceed with enantioselectivities of up to 95% ee. The proposed evolutionary protocol and rationalization of the enhanced activities and selectivities acquired by MthUPO variants represent a step forward toward the use and implementation of UPOs in biocatalytic synthetic pathways of industrial interest.