Master's Dissertation
Diving into Gender Translation Bias for the Portuguese Language - A Comparative Analysis of Commercial Machine Translation Systems, General-Purpose LLMs, and Non-Commercial Translation-Specific Models
2025
Key information
Published on
29/06/2025
Abstract
Bias in Machine Translation (MT) models is a growing concern, particularly when translating into languages with more extensive grammatical gender marking, where gender assumptions may be necessary. In such cases, MT models can perpetuate and even amplify stereotypes, reinforcing harmful patterns of societal discrimination. This work aims to address a gap in current research by investigating gender bias in English-to-Portuguese translation. In the first stage, we run experiments comparing commercial MT systems, general-purpose LLMs, and non-commercial translation-specific models across different contexts and dimensions of gender bias. We evaluate how gender bias manifests in both single- and multi-sentence contexts and assess whether sentence sentiment affects gender assignment. Additionally, we compare the Portuguese results with those for other Romance languages (French, Spanish, Italian). Our findings show that commercial MT systems still produce the least biased translations, although significant biases persist, particularly in inter-sentence contexts. Moreover, although these systems have improved, translations into the other Romance languages still exhibit greater gender bias than translations into Portuguese. In the second stage, we explore bias mitigation strategies, particularly fine-tuning. We adapt an existing model using a small, gender-balanced dataset and demonstrate that fine-tuning is a promising and efficient approach for mitigating bias. This work advances the state of the art by offering a detailed evaluation of the current gender bias landscape in the English-Portuguese language pair and underscores the need for more equitable language technologies.
Publication details
Community authors:
Sofia Seabra Bonifácio
ist192559
Scientific domain (FOS)
electrical-engineering-electronic-engineering-information-engineering - Electrical Engineering, Electronic Engineering, Information Engineering
Publication language (ISO code)
eng - English
Access to the publication:
Embargoed access
Embargo end date:
29/03/2026
Institution name
Instituto Superior Técnico