Identifying Conserved Regions from Multiple Protein Structures
and Its Application in Improving Ab Initio Prediction


Experience from the Critical Assessment of Techniques for Protein Structure Prediction (CASP) suggests that the conserved regions shared by multiple predicted structures (called decoys or models) of a given protein can be exploited for protein structure prediction. Most previous studies identified conserved regions with the help of alignment information. When alignment information is unavailable, identifying conserved regions remains difficult.


Building on our previous work on approximating the bottleneck distance, we proposed a formal definition of conserved regions and designed an O(m^2*n^2*log n)-time algorithm that extracts the maximum set of conserved regions from m decoys of a protein with n residues. Using this algorithm, we first investigated whether the conserved regions of ab initio decoys are similar to their counterparts in the native structure. For 16 of 25 TBM (template-based modeling) CASP7 targets, our method simultaneously identifies over 70% of the native-like regions and filters out over 90% of the non-native-like regions. For 10 of 12 FM (free modeling) CASP7 targets, it recovers more than half of the native-like regions while filtering out over 80% of the non-native-like regions. We then investigated whether these conserved regions improve protein structure prediction. For 10 of 12 FM CASP7 targets, our method improves the accuracy of ROSETTA. In particular, for four targets, using the identified conserved regions raised prediction quality from meaningless (TM-score < 0.4) to meaningful (TM-score > 0.4).
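The exact definition and the O(m^2*n^2*log n) algorithm are given in the paper and are not reproduced here. As a rough illustration of the idea only, the sketch below uses a simplified, hypothetical criterion: a contiguous segment is "conserved" if, for every pair of decoys, the largest per-residue deviation after superposing that segment is at most a threshold. It assumes each decoy is an n x 3 NumPy array of C-alpha coordinates, and the names `kabsch_max_deviation`, `is_conserved`, `conserved_segments` and the defaults `threshold=2.0` (Angstrom) and `min_len=5` are illustrative assumptions. In place of the paper's bottleneck-distance approximation it swaps in the RMSD-optimal Kabsch superposition. Because the bottleneck distance minimizes the maximum deviation over all rigid motions, the deviation under any one superposition is an upper bound on it, so segments accepted here also meet the bottleneck criterion, but some truly conserved segments may be missed. The greedy scan costs O(m^2*n^3) and does not compute the paper's maximum set.

```python
import numpy as np


def kabsch_max_deviation(P, Q):
    """Superpose Q onto P with the RMSD-optimal (Kabsch) rotation and
    return the largest per-residue distance. This is an upper bound on
    the bottleneck (min over rigid motions of max deviation) distance."""
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    H = Qc.T @ Pc                                  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    Q_rot = Qc @ R.T                               # rotate row vectors
    return float(np.max(np.linalg.norm(Pc - Q_rot, axis=1)))


def is_conserved(decoys, i, j, threshold):
    """Hypothetical criterion: residues i..j-1 agree in every decoy pair."""
    seg = [X[i:j] for X in decoys]
    for a in range(len(seg)):
        for b in range(a + 1, len(seg)):
            if kabsch_max_deviation(seg[a], seg[b]) > threshold:
                return False
    return True


def conserved_segments(decoys, threshold=2.0, min_len=5):
    """Greedy scan: from each start residue, extend the segment while it
    stays conserved; report it unless it lies inside an earlier one."""
    n = decoys[0].shape[0]
    segments = []
    last_end = 0
    for i in range(n - min_len + 1):
        j = i + min_len
        if not is_conserved(decoys, i, j, threshold):
            continue
        while j < n and is_conserved(decoys, i, j + 1, threshold):
            j += 1
        if j > last_end:  # skip segments contained in a reported one
            segments.append((i, j))
            last_end = j
    return segments
```

For example, given three rigidly moved copies of one structure in which residues 15 onward of one copy are displaced by 20 Angstrom in random directions, `conserved_segments` reports only the segment covering residues 0 to 14, i.e. `(0, 15)` in half-open notation.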


Experimental results show that our definition of conserved regions is effective and that most identified conserved regions are similar to the corresponding regions in the native structures. Moreover, when coupled with an iterative strategy, the identified conserved regions can improve the quality of the final generated structures.


Download software package ApproxSub0425.tar.gz