Protein Science, Vol 1, Issue 3 409-417, Copyright © 1992 by Cold Spring Harbor Laboratory Press
ARTICLE
U. HOBOHM, M. SCHARF, R. SCHNEIDER and C. SANDER
European Molecular Biology Laboratory, Meyerhofstrasse 1, D-6900 Heidelberg, Germany
The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the data base, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to extract from the data base representative sets of protein chains with maximum coverage and minimum redundancy. The first algorithm focuses on optimizing a particular property of the selected proteins and works by successive selection of proteins from an ordered list and exclusion of all neighbors of each selected protein. The other algorithm aims at maximizing the size of the selected set and works by successive thinning out of clusters of similar proteins. Both algorithms relate to problems in graph theory and are generally applicable to other data bases in which criteria of similarity can be defined. The largest nonredundant set extracted from the current release of the Protein Data Bank has 155 protein chains. In this set, no two proteins have sequence similarity higher than a certain cutoff (30% identical residues for aligned subsequences longer than 80 residues), yet all structurally unique protein families are represented. Periodically updated lists of representative data sets are available by electronic mail from the file server "netserv@embl-heidelberg.de". The selection may be useful in statistical approaches to protein folding as well as in the analysis and documentation of the known spectrum of three-dimensional protein structures.
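The two selection strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the generic `similar` predicate, and the rule "remove the item with the most remaining neighbors" used for the thinning step are assumptions made here for illustration, since the abstract does not give those details.

```python
def build_neighbors(items, similar):
    """Map each item to the set of other items it is similar to.

    `similar` is a hypothetical user-supplied predicate, e.g. a test of
    sequence identity against a chosen cutoff.
    """
    items = list(items)
    neighbors = {x: set() for x in items}
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if similar(a, b):
                neighbors[a].add(b)
                neighbors[b].add(a)
    return neighbors


def select_from_ordered_list(ordered, neighbors):
    """First strategy: walk a list ordered by some preferred property,
    keep each item not yet excluded, and exclude all of its neighbors."""
    excluded = set()
    selected = []
    for item in ordered:
        if item in excluded:
            continue
        selected.append(item)
        excluded.update(neighbors[item])
    return selected


def thin_out_clusters(neighbors):
    """Second strategy (one possible reading): repeatedly remove the item
    with the most remaining neighbors until no similar pair is left."""
    remaining = {k: set(v) for k, v in neighbors.items()}
    while remaining:
        worst = max(remaining, key=lambda k: len(remaining[k]))
        if not remaining[worst]:
            break  # no similar pairs remain
        for other in remaining.pop(worst):
            remaining[other].discard(worst)
    return list(remaining)


# Toy example: "H" is similar to "A", "B" and "C", which are mutually dissimilar.
pairs = {("H", "A"), ("H", "B"), ("H", "C")}
nb = build_neighbors("HABC", lambda a, b: (a, b) in pairs or (b, a) in pairs)

first = select_from_ordered_list(["H", "A", "B", "C"], nb)   # ['H']
second = thin_out_clusters(nb)                               # ['A', 'B', 'C']
```

In this toy case the ordered-list strategy keeps only the top-ranked item `H`, because selecting it excludes all three of its neighbors, whereas the thinning strategy drops `H` and keeps the three mutually dissimilar items. This mirrors the trade-off stated in the abstract between optimizing a property of the selected proteins and maximizing the size of the selected set.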