Master's Dissertation

Classification of microcytic anaemias using machine learning methods

Beatriz Neves Leitão, 2021

Key information

Authors:

Beatriz Neves Leitão

Supervisors:

Maria Paula Duarte Faustino Gonçalves; Susana de Almeida Mendes Vinga Martins

Published on

10/11/2021

Abstract

The prevalence of anaemia in the world population is 24.8%. Proper discrimination between microcytic anaemias is essential to provide the right treatment and genetic counselling. Because the most reliable methods for diagnosing thalassemias and iron deficiency anaemia (IDA), some of the most common microcytic anaemias, are expensive and time-consuming, many indexes have been developed over the years. These indexes, however, have not proven to be 100% accurate. In this thesis, haematological data and diagnoses from a sample of 390 individuals from the Portuguese population were used to train and test different machine learning algorithms. The objective was to develop a binary classifier, specifically adapted to the Portuguese population, to discriminate β-thalassemia carriers from IDA patients. In addition, a multi-class classifier capable of distinguishing between β-thalassemia carriers, α-thalassemia carriers, IDA patients, and healthy subjects was developed. So as not to compromise the main objective, a quick and accessible diagnosis, the classifiers were based only on information obtained from a complete blood count, one of the most common laboratory tests in medicine. The binary classifiers did not surpass the most reliable index for the Portuguese population, the red cell distribution width index (RDWI), which presented a median accuracy of 95.4%, but the random forest algorithm matched it. This algorithm performed excellently in both binary and multi-class classification; in the latter it achieved promising results, with a median accuracy of 93.0%.
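As an illustration of the kind of index the abstract refers to, the sketch below computes RDWI in its commonly published form, RDWI = MCV × RDW / RBC, from three complete blood count values. The cutoff of 220 is the value usually cited in the literature and is an assumption here; the thesis may use a different, population-specific threshold. The function names and the sample values are hypothetical and are not taken from the thesis data.

```python
# Minimal sketch of an RDWI-based discriminator between beta-thalassemia
# carriers and IDA patients. Not the thesis implementation.

# Assumed cutoff: the commonly cited value, not confirmed by the abstract.
RDWI_CUTOFF = 220.0


def rdwi(mcv_fl: float, rdw_percent: float, rbc_millions_per_ul: float) -> float:
    """Return RDWI = MCV (fL) * RDW (%) / RBC (10^6 cells per microlitre)."""
    if rbc_millions_per_ul <= 0:
        raise ValueError("RBC count must be positive")
    return mcv_fl * rdw_percent / rbc_millions_per_ul


def classify_by_rdwi(
    mcv_fl: float,
    rdw_percent: float,
    rbc_millions_per_ul: float,
    cutoff: float = RDWI_CUTOFF,
) -> str:
    """Label a microcytic sample: values below the cutoff suggest a
    beta-thalassemia carrier, values at or above it suggest IDA."""
    value = rdwi(mcv_fl, rdw_percent, rbc_millions_per_ul)
    return "beta-thalassemia carrier" if value < cutoff else "IDA"


# Hypothetical blood counts, for illustration only.
print(classify_by_rdwi(mcv_fl=62.0, rdw_percent=15.0, rbc_millions_per_ul=5.8))
print(classify_by_rdwi(mcv_fl=70.0, rdw_percent=18.0, rbc_millions_per_ul=4.0))
```

A single-threshold index like this uses only three blood count parameters, whereas a classifier such as random forest can combine all of them, which is the comparison the abstract reports.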

Publication details

Scientific domain (FOS)

industrial-biotechnology - Industrial Biotechnology

Publication language (ISO code)

eng - English

Access to publication:

Embargo lifted

Embargo end date:

19/09/2022

Institution name

Instituto Superior Técnico