Protein Science

Protein Science, Vol 1, Issue 3 409-417, Copyright © 1992 by Cold Spring Harbor Laboratory Press


ARTICLE

Selection of representative protein data sets

U. HOBOHM, M. SCHARF, R. SCHNEIDER and C. SANDER
European Molecular Biology Laboratory, Meyerhofstrasse 1, D-6900 Heidelberg, Germany

The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the data base, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to extract from the data base representative sets of protein chains with maximum coverage and minimum redundancy. The first algorithm focuses on optimizing a particular property of the selected proteins and works by successive selection of proteins from an ordered list and exclusion of all neighbors of each selected protein. The second algorithm aims at maximizing the size of the selected set and works by successive thinning out of clusters of similar proteins. Both algorithms are generally applicable to other data bases in which a criterion of similarity can be defined, and both relate to problems in graph theory. The largest nonredundant set extracted from the current release of the Protein Data Bank has 155 protein chains. In this set, no two proteins have sequence similarity higher than a certain cutoff (30% identical residues for aligned subsequences longer than 80 residues), yet all structurally unique protein families are represented. Periodically updated lists of representative data sets are available by electronic mail from the file server "netserv@embl-heidelberg.de". The selection may be useful in statistical approaches to protein folding as well as in the analysis and documentation of the known spectrum of three-dimensional protein structures.
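The abstract describes the two selection strategies only at a high level. The sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes pairwise alignments have already been computed and are given as (identity fraction, aligned length). It uses a simplified neighbor test read directly from the abstract's cutoff (more than 30% identity over an aligned stretch longer than 80 residues). For the second strategy, it reads "successive thinning out of clusters" as repeatedly removing the chain with the most remaining neighbors. All function names (`is_neighbor`, `build_graph`, `select_by_order`, `thin_clusters`) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified neighbor test based on the abstract's cutoff:
# two chains count as redundant "neighbors" if they share more than 30%
# identical residues over an aligned stretch longer than 80 residues.
def is_neighbor(identity: float, aligned_length: int) -> bool:
    return identity > 0.30 and aligned_length > 80


def build_graph(chains: List[str],
                pairs: List[Tuple[str, str, float, int]]) -> Dict[str, Set[str]]:
    """Similarity graph: one node per chain, an edge between each neighbor pair."""
    graph: Dict[str, Set[str]] = {c: set() for c in chains}
    for a, b, identity, length in pairs:
        if is_neighbor(identity, length):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def select_by_order(ranked: List[str], graph: Dict[str, Set[str]]) -> List[str]:
    """Strategy 1 (sketch): walk a list ordered by some preferred property
    (the ordering criterion is up to the caller), keep each chain that is
    not yet excluded, and exclude all of its neighbors."""
    selected: List[str] = []
    excluded: Set[str] = set()
    for chain in ranked:
        if chain in excluded:
            continue
        selected.append(chain)
        excluded |= graph[chain]  # drop every neighbor of the kept chain
    return selected


def thin_clusters(graph: Dict[str, Set[str]]) -> Set[str]:
    """Strategy 2 (assumed reading): repeatedly delete the chain with the
    most remaining neighbors until no edges are left."""
    g = {v: set(nbrs) for v, nbrs in graph.items()}  # copy; input is not mutated
    while True:
        worst = max(g, key=lambda v: len(g[v]), default=None)
        if worst is None or not g[worst]:
            return set(g)  # no edges remain: survivors are pairwise non-redundant
        for nbr in g.pop(worst):
            g[nbr].discard(worst)  # remove the deleted chain from its neighbors


if __name__ == "__main__":
    chains = ["A", "B", "C", "D", "E"]
    # Made-up toy alignments: (chain1, chain2, identity fraction, aligned length)
    pairs = [("A", "B", 0.95, 120), ("B", "C", 0.40, 100),
             ("A", "C", 0.25, 150), ("D", "E", 0.50, 60)]
    graph = build_graph(chains, pairs)
    print(select_by_order(["B", "A", "C", "D", "E"], graph))  # ['B', 'D', 'E']
    print(sorted(thin_clusters(graph)))                       # ['A', 'C', 'D', 'E']
```

In graph terms, both procedures return an independent set of the similarity graph, meaning no two kept chains are neighbors. Finding a maximum independent set is NP-hard in general, which is why greedy heuristics like these are used. The toy run shows the trade-off the abstract describes. The ordered-list strategy keeps the top-ranked chain "B" at the cost of both of its neighbors, while thinning removes only the highly connected "B" and yields a larger set.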