- Ergebnisse als:
- Druckansicht
- Endnote (RIS)
- BibTeX
- Tabelle: CSV | HTML
Preprints
Preprints
Leitbild und Forschungsprofil
Molekulare Signalverarbeitung
Natur- und Wirkstoffchemie
Biochemie pflanzlicher Interaktionen
Stoffwechsel- und Zellbiologie
Unabhängige Nachwuchsgruppen
Program Center MetaCom
Publikationen
Gute Wissenschaftliche Praxis
Forschungsförderung
Netzwerke und Verbundprojekte
Symposien und Kolloquien
Alumni-Forschungsgruppen
Publikationen
Preprints
Protein engineering using directed evolution and (semi)rational design has emerged as a powerful strategy for optimizing and enhancing enzymes or proteins with desired properties. Integrating artificial intelligence methods has further enhanced and accelerated protein engineering through predictive models developed in data-driven strategies. However, the lack of explainability and interpretability in these models poses challenges. Explainable Artificial Intelligence addresses the interpretability and explainability of machine learning models, providing transparency and insights into predictive processes. Nonetheless, there is a growing need to incorporate explainable techniques in predicting protein properties in machine learning-assisted protein engineering. This work explores incorporating explainable artificial intelligence in predicting protein properties, emphasizing its role in trustworthiness and interpretability. It assesses different machine learning approaches, introduces diverse explainable methodologies, and proposes strategies for seamless integration, improving trust-worthiness. Practical cases demonstrate the explainable model’s effectiveness in identifying DNA binding proteins and optimizing Green Fluorescent Protein brightness. The study highlights the utility of explainable artificial intelligence in advancing computationally assisted protein design, fostering confidence in model reliability.
Preprints
Protein engineering through directed evolution and (semi-)rational approaches has been applied successfully to optimize protein properties for broad applications in molecular biology, biotechnology, and biomedicine. The potential of protein engineering is not yet fully realized due to the limited screening throughput hampering the efficient exploration of the vast protein sequence space. Data-driven strategies have emerged as a powerful tool to leverage protein engineering by providing a model of the sequence-fitness landscape that can exhaustively be explored in silico and capitalize on the high diversity potential offered by nature However, as both the quality and quantity of the inputted data determine the success of such approaches, the applicability of data-driven strategies is often limited due to sparse data. Here, we present a hybrid model that combines direct coupling analysis and machine learning techniques to enable data-driven protein engineering when only few labeled sequences are available. Our method achieves high performance in predicting a protein’s fitness based on its sequence regardless of the number of sequences-fitness pairs in the training dataset. Besides reducing the computational effort compared to state-of-the-art methods, it outperforms them for sparse data situations, i.e., 50 − 250 labeled sequences available for training. In essence, the developed method is auspicious for data-driven protein engineering, especially for protein engineers who have only access to a limited amount of data for sequence-fitness landscape modeling.