Doctoral Thesis

Illuminating the Dark Proteome

Nelson Ricardo Perdigão Pereira, 2017

Key information

Authors:

Nelson Ricardo Perdigão Pereira (Nelson Perdigão)

Supervisors:

Agostinho Cláudio da Rosa

Published on

06/01/2017

Abstract

Molecular models of a protein’s structure can give detailed insight into the mechanisms underlying its function, especially when viewed in combination with sequence features. In theory, 3D structural models are now available for many proteins; in practice, however, it is often difficult to find all appropriate models and view them together with sequence features. We therefore developed Aquaria, a new web resource that provides 46 million pre-calculated structural models based on sequence-to-structure homology, 10 times more than currently available from other resources. This results in at least one matching structure for 87% of Swiss-Prot proteins and a median of 35 structures per protein. Using Aquaria, we surveyed the known, or visible, proteome. Its complement, the ‘unknown’ or ‘dark’ proteome, i.e., regions of proteins that remain stubbornly inaccessible to both experimental structure determination and modeling, was scanned, stored and indexed in the Dark Proteome Database. Using these systems, we performed the most recent structural modeling study, covering 546,000 proteins across many organisms, and found that 44–54% of the proteome in eukaryotes and viruses is dark, compared with only 14% for archaea and bacteria. Surprisingly, most of the dark proteome could not be accounted for by conventional explanations, such as intrinsic disorder, transmembrane regions or compositional bias. Nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. Dark proteins fulfill a wide variety of functions, but a subset showed distinct and largely unexpected features, such as association with secretion, specific tissues, the endoplasmic reticulum, disulfide bonding, and proteolytic cleavage. Dark proteins also had short sequence length, low evolutionary reuse, and few known interactions with other proteins. This thesis also suggests the existence of transmembrane regions that go undetected by current prediction methods.
Our work therefore suggests several new directions for research in structural and computational biology. It should help focus future research efforts on shedding light on the remaining dark proteome, potentially revealing molecular processes of life that are currently unknown.
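The abstract's notion of darkness can be illustrated with a minimal sketch. It assumes, purely for illustration, that each structural match is given as a 1-based inclusive residue range on the query sequence. The function names (`merge_intervals`, `dark_fraction`, `is_dark_protein`) are hypothetical and are not taken from Aquaria or the Dark Proteome Database. The thesis's actual pipeline is not described in this record.

```python
from typing import List, Tuple

# Assumed representation: 1-based, inclusive residue range covered by a match.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent residue ranges."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous range instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def dark_fraction(length: int, modeled: List[Interval]) -> float:
    """Fraction of residues not covered by any structural match."""
    if length <= 0:
        raise ValueError("length must be positive")
    covered = 0
    for start, end in merge_intervals(modeled):
        # Clip each merged range to the bounds of the sequence.
        start, end = max(start, 1), min(end, length)
        if start <= end:
            covered += end - start + 1
    return (length - covered) / length


def is_dark_protein(length: int, modeled: List[Interval]) -> bool:
    """A dark protein, in the abstract's sense: no residue has a match."""
    return dark_fraction(length, modeled) == 1.0
```

For example, a 100-residue protein with matches covering residues 1-30 and 20-50 would have a dark fraction of 0.5 under this sketch, and a protein with no matches at all would count as a dark protein.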

Publication details

Scientific domain (FOS)

electrical-engineering-electronic-engineering-information-engineering - Electrical Engineering, Electronic Engineering, Information Engineering

Keywords

  • Big Data
  • Databases
  • Homology
  • Proteins
  • Structure

Publication language (ISO code)

eng - English

Publication access:

Embargo lifted

Embargo end date:

01/12/2017

Institution name

Instituto Superior Técnico