The IPB has once again been recognized for exemplary practice in personnel and organizational policy oriented toward equal opportunity. For the sixth time in a row, the institute receives the TOTAL E-QUALITY…
For 20 years, the Plant Science Student Conference (PSSC) has been organized by students of the two Leibniz institutes IPK and IPB, alternating annually between them. In the interview, Christina Wäsch…
Illig, A.-M.; Siedhoff, N. E.; Davari, M. D.; Schwaneberg, U. Evolutionary probability and stacked regressions enable data-driven protein engineering with minimized experimental effort. J. Chem. Inf. Model. 64, 6350-6360 (2024). DOI: 10.1021/acs.jcim.4c00704
Protein engineering through directed evolution and (semi)rational approaches is routinely applied to optimize protein properties for a broad range of applications in industry and academia. The multitude of possible variants, combined with limited screening throughput, hampers efficient protein engineering. Data-driven strategies have emerged as a powerful tool to model the protein fitness landscape that can be explored in silico, significantly accelerating protein engineering campaigns. However, such methods require a certain amount of data, which often cannot be provided, to generate a reliable model of the fitness landscape. Here, we introduce MERGE, a method that combines direct coupling analysis (DCA) and machine learning (ML). MERGE enables data-driven protein engineering when only limited data are available for training, typically ranging from 50 to 500 labeled sequences. Our method demonstrates remarkable performance in predicting a protein’s fitness value and rank based on its sequence across diverse proteins and properties. Notably, MERGE outperforms state-of-the-art methods when only small data sets are available for modeling, requiring fewer computational resources, and proving particularly promising for protein engineers who have access to limited amounts of data.