ProteinMPNN: An Analysis of the Main Ideas

Source: 刀刀网

1. Background

Background: The protein sequence design problem is to find, given a protein backbone structure of interest, an amino acid sequence that will fold into this structure.

Physically based approaches such as Rosetta treat sequence design as an energy optimization problem, searching for the combination of amino acid identities and conformations that has the lowest energy for a given input structure.
In ProteinMPNN, the amino acid sequence at different positions can be coupled within a single chain or across multiple chains, which enables application to a wide range of current protein design challenges.
Recently, deep-learning approaches have shown promise in rapidly generating candidate amino acid sequences given monomeric protein backbones, without the need for compute-intensive explicit consideration of side-chain rotameric states. However, the methods described thus far do not apply to the full range of current protein design challenges and have not been extensively validated experimentally.

2. Comparison Experiment

Tool: an MPNN (message-passing neural network) with three encoder layers, three decoder layers, and 128 hidden dimensions, which predicts protein sequences autoregressively from the N terminus to the C terminus.
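The N-to-C autoregressive decoding named above can be illustrated with a toy loop. This is a minimal sketch, not the ProteinMPNN decoder: `toy_logits` is a hypothetical stand-in for the trained network (a real model would also condition on the encoded backbone), and `decode_n_to_c`, the temperature, and the seed are assumptions chosen only for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def toy_logits(position, prefix):
    """Hypothetical stand-in for a trained decoder.

    A real model would score residues from the encoded backbone and the
    residues decoded so far; here we return deterministic pseudo-random
    scores so that the loop can run.
    """
    rng = random.Random(f"{position}:{prefix}")
    return [rng.uniform(-1.0, 1.0) for _ in AMINO_ACIDS]


def softmax(logits, temperature):
    """Turn scores into probabilities; a lower temperature is greedier."""
    peak = max(logits)
    weights = [math.exp((x - peak) / temperature) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def decode_n_to_c(length, logits_fn, temperature=0.1, seed=0):
    """Sample one residue at a time from the N terminus to the C terminus."""
    rng = random.Random(seed)
    sequence = ""
    for position in range(length):
        # Each step conditions on everything decoded so far (the prefix).
        probs = softmax(logits_fn(position, sequence), temperature)
        sequence += rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]
    return sequence
```

Calling `decode_n_to_c(12, toy_logits)` returns a 12-residue string. The point of the sketch is only the control flow: the residue at each position is drawn from a distribution that depends on the residues already placed before it.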
Input: protein backbone features, namely distances between Cα atoms, relative Cα-Cα-Cα frame orientations and rotations, and backbone dihedral angles.
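Two of these features, Cα-Cα distances and backbone dihedral angles, are simple to compute from coordinates. The following is a minimal sketch using only the standard library; the function names and the dictionary layout of a residue are assumptions for illustration, and the frame orientation and rotation features are not covered here.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (radians) about the p1-p2 bond."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    axis = tuple(x / norm for x in b1)
    # Remove the components along the bond axis, then measure the angle
    # between what is left of b0 and b2.
    s0, s2 = dot(b0, axis), dot(b2, axis)
    v = tuple(x - s0 * a for x, a in zip(b0, axis))
    w = tuple(x - s2 * a for x, a in zip(b2, axis))
    return math.atan2(dot(cross(axis, v), w), dot(v, w))


def phi_psi(prev_res, res, next_res):
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    phi = dihedral(prev_res["C"], res["N"], res["CA"], res["C"])
    psi = dihedral(res["N"], res["CA"], res["C"], next_res["N"])
    return phi, psi


def ca_distance_matrix(ca_coords):
    """All pairwise Calpha-Calpha distances (math.dist needs Python 3.8+)."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]
```

Each residue is assumed to be a dictionary mapping atom names such as `"N"`, `"CA"`, and `"C"` to `(x, y, z)` tuples in ångströms.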

3. Main Experiment

Input: protein backbone features (from PDB files), plus distances between N, Cα, C, O, and a virtual Cβ placed based on the other backbone atoms, as additional input features.
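A virtual Cβ can be placed from N, Cα, and C alone, after which the inter-atom distances follow directly. The sketch below is an illustration under stated assumptions: the three coefficients are fixed ideal-geometry constants of the kind used in public ProteinMPNN-style code and should be verified against the reference implementation, and taking all 5 × 5 atom pairs between two residues is one reading of "distances between N, Cα, C, O, and a virtual Cβ". The function names and residue layout are hypothetical.

```python
import math

BACKBONE_ATOMS = ("N", "CA", "C", "O", "CB")


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def virtual_cb(n, ca, c):
    """Place an idealized Cbeta from the N, Calpha, and C positions.

    The coefficients are assumed ideal-geometry constants (see lead-in).
    """
    b = sub(ca, n)        # N -> CA bond vector
    c_vec = sub(c, ca)    # CA -> C bond vector
    a = cross(b, c_vec)   # normal to the N-CA-C plane
    return tuple(
        -0.58273431 * ai + 0.56802827 * bi - 0.54067466 * ci + cai
        for ai, bi, ci, cai in zip(a, b, c_vec, ca)
    )


def with_virtual_cb(residue):
    """Return a copy of the residue with a virtual 'CB' entry added."""
    out = dict(residue)
    out["CB"] = virtual_cb(residue["N"], residue["CA"], residue["C"])
    return out


def pair_distances(res_i, res_j):
    """All 5 x 5 = 25 inter-atom distances between two residues."""
    return {
        (a, b): math.dist(res_i[a], res_j[b])
        for a in BACKBONE_ATOMS
        for b in BACKBONE_ATOMS
    }
```

For a residue with near-ideal backbone geometry, the virtual Cβ lands roughly 1.5 Å from Cα, which is a quick sanity check on the coefficients.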
