Abstract [eng] |
A long-standing goal of protein engineering is the design of proteins with novel or improved properties. Despite years of research aimed at improving the traditional approaches of rational design and directed evolution, their application remains hindered by high time, labor, and resource requirements. The ever-growing availability of biological data enables machine learning methods to solve protein engineering tasks that were either hard or impossible to solve with conventional tools. This study aimed to analyze malate dehydrogenase sequences generated by generative adversarial networks. The generated sequences recapitulate first- and second-order sequence statistics while substantially expanding the natural sequence space. Thirteen of 55 sequence variants sampled from the generative adversarial network's latent space retained biological activity. Guided changes of the latent space variables correlate with various protein sequence features, a capability directly applicable to protein engineering. |