Master's Thesis

Classification of microcytic anaemias using machine learning methods

Beatriz Neves Leitão, 2021

Key information

Authors:

Beatriz Neves Leitão

Supervisors:

Maria Paula Duarte Faustino Gonçalves; Susana de Almeida Mendes Vinga Martins

Published in

11/10/2021

Abstract

The prevalence of anaemia in the world population is 24.8%. Proper discrimination between microcytic anaemias is essential to provide the right treatment and genetic counselling. Because the most reliable methods for diagnosing thalassemias and iron deficiency anaemia (IDA), some of the most common microcytic anaemias, are expensive and time-consuming, many indexes have been developed over the years. These indexes, however, have not proven to be 100% accurate. In this thesis, haematological data from a sample of 390 individuals from the Portuguese population, together with their diagnoses, were used to train and test different machine learning algorithms. The objective was to develop a binary classifier, specifically adapted to the Portuguese population, to discriminate β-thalassemia carriers from IDA patients. In addition, a multi-class classifier capable of distinguishing between β-thalassemia carriers, α-thalassemia carriers, IDA patients, and healthy subjects was developed. To preserve the main objective, a quick and accessible diagnosis, the classifiers were based only on information obtained from a complete blood count test, one of the most common laboratory tests in medicine. None of the binary classifiers surpassed the most reliable index for the Portuguese population, the RDWI (red cell distribution width index), which presented a median accuracy of 95.4%, but the random forest algorithm matched it. This algorithm showed excellent performance in both the binary and the multi-class classification; in the latter it achieved promising results, with a median accuracy of 93.0%.
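The RDWI benchmark mentioned above is commonly defined in the literature as RDWI = MCV × RDW / RBC, using three values from a complete blood count. The sketch below computes it and estimates a median accuracy over repeated random test subsets. It is a minimal illustration under stated assumptions, not the method used in the thesis: the cutoff of 220 (a widely cited threshold, with lower values suggesting β-thalassemia trait), the function names, and the resampling protocol are all hypothetical choices for this example.

```python
import random
import statistics

# Assumption: RDWI = MCV (fL) x RDW (%) / RBC (10^12/L), the commonly cited definition.
def rdwi(mcv: float, rdw: float, rbc: float) -> float:
    """Red cell distribution width index from three complete blood count values."""
    if rbc <= 0:
        raise ValueError("RBC count must be positive")
    return mcv * rdw / rbc

# Assumption: a literature cutoff of 220; the thesis may use a different,
# population-specific threshold.
RDWI_CUTOFF = 220.0

def classify(mcv: float, rdw: float, rbc: float, cutoff: float = RDWI_CUTOFF) -> str:
    """Hypothetical rule: RDWI below the cutoff -> beta-thalassemia carrier, otherwise IDA."""
    return "beta-thal" if rdwi(mcv, rdw, rbc) < cutoff else "IDA"

def median_accuracy(samples, n_repeats: int = 100, test_fraction: float = 0.3,
                    seed: int = 0) -> float:
    """Median accuracy of the RDWI rule over repeated random test subsets.

    samples: non-empty list of (mcv, rdw, rbc, label) tuples.
    The index needs no training; a trained model such as a random forest
    would instead be fitted on the complementary training split each repeat.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    rng = random.Random(seed)
    n_test = max(1, int(len(samples) * test_fraction))
    accuracies = []
    for _ in range(n_repeats):
        test = rng.sample(samples, n_test)  # draw a test subset without replacement
        correct = sum(classify(m, r, c) == label for m, r, c, label in test)
        accuracies.append(correct / n_test)
    return statistics.median(accuracies)
```

For example, with illustrative values (not patient data), `rdwi(65, 14, 5.5)` is about 165.5, so `classify(65, 14, 5.5)` returns `"beta-thal"`, while `rdwi(70, 18, 4.0)` is 315.0 and is classified as `"IDA"`. Reporting the median rather than the mean over repeats makes the summary less sensitive to a few unusually easy or hard splits.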

Publication details

Fields of Science and Technology (FOS)

Industrial Biotechnology

Publication language (ISO code)

eng - English

Rights type:

Embargo lifted

Date available:

09/19/2022

Institution name

Instituto Superior Técnico