Master's Thesis

Automatic Hate Speech Detection in Portuguese Social Media Text

Bernardo Cunha Matos2022

Key information

Authors:

Bernardo Cunha Matos (Bernardo Cunha Matos)

Supervisors:

Paula Cristina Quaresma da Fonseca Carvalho (Paula Cristina Quaresma da Fonseca Carvalho); Ricardo Daniel Santos Faro Marques Ribeiro

Published in

11/18/2022

Abstract

Online Hate Speech (HS) has been growing dramatically on social media and its uncontrolled spread has motivated researchers to develop a diversity of methods for its automated detection. However, the detection of online HS in Portuguese still merits further research. To fill this gap, we explored different models that proved to be successful in the literature to address this task. In particular, we have explored models that use the BERT architecture. Beyond testing single-task models we also explored multitask models that use the information on other related categories to learn HS. To better capture the semantics of this type of texts, we developed HateBERTimbau, a retrained version of BERTimbau more directed to social media language including potential HS targeting African descent, Roma, and LGBTQI+ communities. The performed experiments were based on CO-HATE and FIGHT, corpora of social media messages posted by the Portuguese online community that were labelled regarding the presence of HS among other categories. The results achieved show the importance of considering the annotator's agreement on the data used to develop HS detection models. Comparing different subsets of data used for the training of the models it was shown that, in general, a higher agreement on the data leads to better results. HATEBERTimbau consistently outperformed BERTimbau on both datasets confirming that further pre-training of BERTimbau was a successful strategy to obtain a language model more suitable for online HS detection in Portuguese. The implementation of target-specific models, and multitask learning have shown potential in obtaining better results.

Publication details

Authors in the community:

Supervisors of this institution:

Fields of Science and Technology (FOS)

electrical-engineering-electronic-engineering-information-engineering - Electrical engineering, electronic engineering, information engineering

Publication language (ISO code)

eng - English

Rights type:

Embargo lifted

Date available:

09/01/2023

Institution name

Instituto Superior Técnico