Master's Thesis

Sparse Transformers for High Order Epistasis Detection

Miguel Ângelo da Silva Graça, 2022

Key information

Authors:

Miguel Ângelo da Silva Graça

Supervisors:

Leonel Augusto Pires Seabra de Sousa; Aleksandar Ilic

Published on

11/23/2022

Abstract

Genome-Wide Association Studies (GWAS) aim to identify relations between Single Nucleotide Polymorphisms (SNPs) and the manifestation of certain diseases, an important challenge in biomedicine. However, most genetic diseases are explained not only by the effects of individual SNPs but also by interactions between several SNPs, known as epistasis. Detecting high-order epistasis is computationally very demanding because the number of SNP combinations to evaluate grows exponentially with the interaction order. Recently, deep learning has emerged as a possible solution for genomic prediction, but the black-box nature of neural networks and their lack of explainability remain unsolved drawbacks. This dissertation presents a new framework for interpreting neural networks for epistasis detection. Using sparse transformers, a technique not previously employed for epistasis detection, SNPs are assigned attention scores that quantify their relevance for predicting a phenotype. The methodology is tested on IPUs, recent massively parallel processors aimed at machine learning workloads and efficient processing of sparse data. Results on simulated datasets show that the proposed framework outperforms state-of-the-art explainability methods, identifying SNP interactions in various epistasis scenarios. Furthermore, training on IPUs is faster than on GPUs and TPUs, with speedups of up to 2.79x. Finally, the framework is validated on a real breast cancer dataset, identifying second- to fifth-order interactions among the top 40% most relevant SNPs.
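
To give a sense of why exhaustive high-order search is costly, the short sketch below counts how many distinct SNP sets exist for each interaction order. It is only an illustration: the dataset size of 1000 SNPs and the function name `count_snp_combinations` are assumptions made here, not values or code from the thesis.

```python
from math import comb


def count_snp_combinations(num_snps: int, order: int) -> int:
    """Return the number of distinct SNP sets of size `order`
    that can be drawn from `num_snps` SNPs (binomial coefficient)."""
    return comb(num_snps, order)


if __name__ == "__main__":
    num_snps = 1000  # hypothetical dataset size, chosen for illustration only
    for order in range(2, 6):  # second to fifth order interactions
        total = count_snp_combinations(num_snps, order)
        print(f"order {order}: {total} candidate SNP sets")
```

Each extra order multiplies the number of candidate sets by a factor of several hundred at this dataset size, which is why approaches that rank SNPs by relevance rather than enumerating every combination are attractive.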

Publication details

Fields of Science and Technology (FOS)

Electrical engineering, electronic engineering, information engineering

Publication language (ISO code)

eng - English

Rights type:

Embargo lifted

Date available:

08/30/2023

Institution name

Instituto Superior Técnico