PhD Thesis
Illuminating the Dark Proteome
2017
Key information
Published in
01/06/2017
Abstract
Molecular models of a protein’s structure can give detailed insight into the mechanisms underlying its function, especially when viewed in combination with sequence features. In principle, 3D structural models are now available for many proteins; in practice, however, it is often difficult to find all appropriate models and view them alongside sequence features. We therefore developed Aquaria, a new web resource that provides 46 million precalculated structural models derived from sequence-to-structure homology. This is 10 times more than is currently available from other resources, and it yields at least one matching structure for 87% of Swiss-Prot proteins, with a median of 35 structures per protein. Using Aquaria, we surveyed the known, or visible, proteome. Its complement, the ‘unknown’ or ‘dark’ proteome (i.e., regions of proteins that remain stubbornly inaccessible to both experimental structure determination and modeling), was scanned, stored, and indexed in the Dark Proteome Database. Using these systems, we performed the most recent structural modeling study, covering 546,000 proteins across many organisms, and found that 44–54% of the proteome in eukaryotes and viruses is dark, compared with only 14% in archaea and bacteria. Surprisingly, most of the dark proteome could not be accounted for by conventional explanations, such as intrinsic disorder, transmembrane regions, or compositional bias. Nearly half of the dark proteome consisted of dark proteins, in which the entire sequence lacked similarity to any known structure. Dark proteins fulfill a wide variety of functions, but a subset showed distinct and largely unexpected features, such as association with secretion, specific tissues, the endoplasmic reticulum, disulfide bonding, and proteolytic cleavage. Dark proteins also had short sequences, low evolutionary reuse, and few known interactions with other proteins. This thesis also suggests the existence of transmembrane regions that current prediction methods fail to detect.
Taken together, our work suggests several new directions for research in structural and computational biology. It should help focus future efforts to shed light on the remaining dark proteome, potentially revealing molecular processes of life that are currently unknown.
Publication details
Authors in the community:
Nelson Perdigão
ist31928
Supervisors of this institution:
Agostinho Cláudio da Rosa
ist11812
Fields of Science and Technology (FOS)
Electrical engineering, electronic engineering, information engineering
Keywords
- Big Data
- Databases
- Homology
- Proteins
- Structure
Publication language (ISO code)
eng - English
Rights type:
Embargo lifted
Date available:
12/01/2017
Institution name
Instituto Superior Técnico