For 20 years, the Plant Science Student Conference (PSSC) has been organized by students of the two Leibniz institutes IPK and IPB, alternating annually between them. In this interview, Christina Wäsch (IPK) and Carolin Apel (IPB) explain,…
More than 600 guests came to the IPB on July 4 for the Lange Nacht, die Wissen schafft (Long Night of Science) to learn many new things and put their knowledge to the test at our science quiz course. Our program this year…
Illig, A.-M.; Siedhoff, N. E.; Davari, M. D.; Schwaneberg, U. Evolutionary probability and stacked regressions enable data-driven protein engineering with minimized experimental effort. J. Chem. Inf. Model. 64, 6350-6360 (2024). DOI: 10.1021/acs.jcim.4c00704
Protein engineering through directed evolution and (semi)rational approaches is routinely applied to optimize protein properties for a broad range of applications in industry and academia. The multitude of possible variants, combined with limited screening throughput, hampers efficient protein engineering. Data-driven strategies have emerged as a powerful tool to model the protein fitness landscape that can be explored in silico, significantly accelerating protein engineering campaigns. However, such methods require a certain amount of data, which often cannot be provided, to generate a reliable model of the fitness landscape. Here, we introduce MERGE, a method that combines direct coupling analysis (DCA) and machine learning (ML). MERGE enables data-driven protein engineering when only limited data are available for training, typically ranging from 50 to 500 labeled sequences. Our method demonstrates remarkable performance in predicting a protein’s fitness value and rank based on its sequence across diverse proteins and properties. Notably, MERGE outperforms state-of-the-art methods when only small data sets are available for modeling, requiring fewer computational resources, and proving particularly promising for protein engineers who have access to limited amounts of data.
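The general idea of pairing an evolutionary score with a small labeled data set can be illustrated with a toy sketch. This is not the MERGE implementation: it swaps DCA for a much simpler site-independent frequency profile (no pairwise couplings) and replaces the ML model with a one-variable least-squares fit. The function names, the alignment, and the fitness values are all made up for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def site_log_freqs(msa, pseudocount=1.0):
    """Per-position log frequency of each amino acid in a gap-free alignment."""
    total = len(msa) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(len(msa[0])):
        counts = Counter(seq[pos] for seq in msa)
        profile.append({aa: math.log((counts[aa] + pseudocount) / total)
                        for aa in AMINO_ACIDS})
    return profile

def evo_score(seq, profile):
    """Sum of per-site log frequencies: a crude stand-in for a DCA-style score."""
    return sum(profile[pos][aa] for pos, aa in enumerate(seq))

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical toy alignment of homologous sequences (unlabeled data).
msa = ["ACDA", "ACDA", "ACEA", "GCDA"]
# Hypothetical labeled variants: (sequence, measured fitness).
labeled = [("ACDA", 1.0), ("ACEA", 0.7), ("GCDA", 0.4)]

profile = site_log_freqs(msa)
scores = [evo_score(seq, profile) for seq, _ in labeled]
fitness = [fit for _, fit in labeled]

# Calibrate the unsupervised score against the few available measurements.
slope, intercept = fit_line(scores, fitness)

def predict(seq):
    """Predicted fitness of an unseen variant from its evolutionary score."""
    return slope * evo_score(seq, profile) + intercept

# Rank unseen candidates in silico before any screening.
candidates = ["ACDA", "GCEA", "WCDA"]
ranked = sorted(candidates, key=predict, reverse=True)
```

The design point is that the alignment supplies most of the signal without any labels, so only two parameters have to be learned from the measurements. That is one reason this kind of hybrid can work with very few labeled sequences.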