Master's Dissertation

Sparse Transformers for High Order Epistasis Detection

Miguel Ângelo da Silva Graça, 2022

Key information

Authors:

Miguel Ângelo da Silva Graça

Supervisors:

Leonel Augusto Pires Seabra de Sousa; Aleksandar Ilic

Published on

23/11/2022

Abstract

Genome-Wide Association Studies (GWAS) aim to identify relations between Single Nucleotide Polymorphisms (SNPs) and the manifestation of certain diseases, an important challenge in biomedicine. However, most genetic diseases are explained not only by the effects of individual SNPs but also by the interactions between several SNPs, known as epistasis. Detecting high-order epistasis is computationally very demanding, because the number of SNP combinations to evaluate grows exponentially. Recently, deep learning has emerged as a possible solution for genomic prediction, but the black-box nature of neural networks and their lack of explainability are drawbacks yet to be solved. In this dissertation, a new framework for interpreting neural networks for epistasis detection is presented. Using sparse transformers, a technique not yet employed for epistasis detection, SNPs can be assigned attention scores that quantify their relevance for predicting a phenotype. The new methodology is tested on IPUs, recent massively parallel processors aimed at machine learning workloads and efficient processing of sparse data. The results on simulated datasets show that the proposed framework outperforms state-of-the-art methods for explainability, identifying SNP interactions in various epistasis scenarios. Furthermore, training on IPUs provides higher performance than on GPUs and TPUs, achieving speedups of up to 2.79x. Finally, the proposed framework is validated on a real breast cancer dataset, identifying second- to fifth-order interactions in the top 40% most relevant SNPs.

Publication details

Scientific domain (FOS)

electrical-engineering-electronic-engineering-information-engineering - Electrical Engineering, Electronic Engineering, Information Engineering

Publication language (ISO code)

eng - English

Publication access:

Embargo lifted

Embargo end date:

30/08/2023

Institution name

Instituto Superior Técnico