How to remember the hydrophobic amino acids

Central to protein biology is the understanding of how structural elements give rise to observed function. The surfeit of protein structural data enables development of computational methods to systematically derive rules governing structural-functional relationships. However, performance of these methods depends critically on the choice of protein structural representation. Most current methods rely on features that are manually selected based on knowledge about protein structures. These are often general-purpose but not optimized for the specific application of interest. In this paper, we present a general framework that applies 3D convolutional neural network (3DCNN) technology to structure-based protein analysis. The framework automatically extracts task-specific features from the raw atom distribution, driven by supervised labels. As a pilot study, we use our network to analyze local protein microenvironments surrounding the 20 amino acids, and predict the amino acids most compatible with environments within a protein structure. To further validate the power of our method, we construct two amino acid substitution matrices from the prediction statistics and use them to predict effects of mutations in T4 lysozyme structures.

Our deep 3DCNN achieves a two-fold increase in prediction accuracy compared to models that employ conventional hand-engineered features and successfully recapitulates known information about similar and different microenvironments. Models built from our predictions and substitution matrices achieve an 85% accuracy predicting outcomes of the T4 lysozyme mutation variants. Our substitution matrices contain rich information relevant to mutation analysis compared to well-established substitution matrices. Finally, we present a visualization method to inspect the individual contributions of each atom to the classification decisions.

Conclusions: End-to-end trained deep learning networks consistently outperform methods using hand-engineered features, suggesting that the 3DCNN framework is well suited for analysis of protein microenvironments and may be useful for other protein structural analyses.
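
To make the idea of feeding a network the "raw atom distribution" more concrete, here is a minimal sketch of one way a local microenvironment could be turned into a 3D grid that a 3DCNN can consume. It is an illustration only, not the authors' implementation: the function name `voxelize`, the `CHANNELS` mapping, the 20 Å box, the 1 Å voxel size and the choice of one channel per element (C, N, O, S) are all assumptions made for this example.

```python
import numpy as np

# Hypothetical channel assignment: one grid channel per element type.
CHANNELS = {"C": 0, "N": 1, "O": 2, "S": 3}


def voxelize(coords, elements, center, box_size=20.0, voxel_size=1.0):
    """Count atoms per voxel in a cubic box around `center`.

    Returns an array of shape (channels, n, n, n), where n is the number
    of voxels along each edge. Box and voxel sizes are assumed values.
    """
    n = int(round(box_size / voxel_size))
    grid = np.zeros((len(CHANNELS), n, n, n), dtype=np.float32)

    # Shift coordinates so the lower corner of the box sits at the origin.
    corner = np.asarray(center, dtype=float) - box_size / 2.0
    shifted = np.asarray(coords, dtype=float) - corner
    idx = np.floor(shifted / voxel_size).astype(int)

    for (i, j, k), elem in zip(idx, elements):
        ch = CHANNELS.get(elem)
        inside = 0 <= i < n and 0 <= j < n and 0 <= k < n
        # Atoms of unlisted elements or outside the box are ignored.
        if ch is not None and inside:
            grid[ch, i, j, k] += 1.0
    return grid


# Toy input: two atoms near the center and one far outside the box.
coords = [(0.2, 0.1, -0.3), (1.5, 0.0, 0.0), (30.0, 0.0, 0.0)]
elements = ["C", "N", "O"]
grid = voxelize(coords, elements, center=(0.0, 0.0, 0.0))
print(grid.shape, grid.sum())
```

A grid like this plays the role that hand-engineered features play in conventional methods: the network receives only atom positions and types, and any task-specific features have to be learned from the supervised labels.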