Master's Dissertation
Automatic Hate Speech Detection in Portuguese Social Media Text
2022
Key information
Authors:
Advisors:
Published on
18/11/2022
Abstract
Online Hate Speech (HS) has grown dramatically on social media, and its uncontrolled spread has motivated researchers to develop a variety of methods for its automated detection. However, the detection of online HS in Portuguese still merits further research. To fill this gap, we explored different models that have proved successful in the literature for this task, in particular models based on the BERT architecture. Beyond testing single-task models, we also explored multitask models that use information on other related categories to learn HS. To better capture the semantics of this type of text, we developed HateBERTimbau, a retrained version of BERTimbau more directed to social media language, including potential HS targeting the African-descent, Roma, and LGBTQI+ communities. The experiments were based on CO-HATE and FIGHT, corpora of social media messages posted by the Portuguese online community that were labelled for the presence of HS, among other categories. The results show the importance of considering annotator agreement on the data used to develop HS detection models. A comparison of the different subsets of data used to train the models showed that, in general, higher agreement on the data leads to better results. HateBERTimbau consistently outperformed BERTimbau on both datasets, confirming that further pre-training of BERTimbau was a successful strategy for obtaining a language model more suitable for online HS detection in Portuguese. Target-specific models and multitask learning have shown potential for obtaining better results.
Publication details
Community authors:
Bernardo Cunha Matos
ist189419
Advisors from this institution:
Scientific domain (FOS)
- Electrical Engineering, Electronic Engineering, Information Engineering
Publication language (ISO code)
- English
Access to the publication:
Embargo lifted
Embargo end date:
01/09/2023
Institution name
Instituto Superior Técnico