Master's Dissertation
Open-domain Conversational Agent based on Pre-Trained Transformers for Human-Robot Interaction
2021
Key information
Authors:
Advisors:
Published on
22/11/2021
Abstract
In recent years, Machine Learning (ML) and Natural Language Processing (NLP) have seen many breakthroughs, such as generative pre-trained transformers (GPTs) and attention mechanisms that learn contextual relationships between words in a text. These breakthroughs opened up several new possibilities for Human-Robot Interaction (e.g. the creation of an open-domain chatbot). However, a substantial share of the research and available data is in English, causing low-resource languages to be overlooked. This thesis explored the problem through two options: (i) translating the sentences before and after they pass through a model fine-tuned on an English-based dataset, and (ii) translating the English-based dataset to Portuguese and then fine-tuning the model on it. Given adequate training data and a good choice of generation method, it was demonstrated that DialoGPT (dialogue generative pre-trained transformer), a tunable neural conversational response generation model, could learn the basic skills needed to conduct a dialogue. For the language models as well as the baseline methods, two sources of evaluation were used: (i) text generation metrics based on uncertainty (i.e. perplexity) and on similarity between sentences (i.e. BLEU, METEOR and ROUGE), and (ii) human evaluation of the sentences. Finally, it was shown that machine translation (MT) can be used to obtain a chatbot that converses fluently in Portuguese. Translating the sentences before and after the modified DialoGPT model, fine-tuned on the Daily Dialogue dataset, led to the best results.
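As an illustration of option (i) in the abstract, the following is a minimal sketch of a translate, generate, translate loop. It is not the thesis implementation: the function `make_translated_chatbot`, the callables `pt_to_en`, `generate_en` and `en_to_pt`, and the toy stand-ins are all hypothetical names introduced here, and a real system would plug in an MT service and a fine-tuned dialogue model in their place.

```python
from typing import Callable, List

# Hypothetical interfaces (assumptions, not from the thesis):
# a translator maps one sentence to another language, and a generator
# maps the English dialogue history to an English reply.
Translator = Callable[[str], str]
Generator = Callable[[List[str]], str]


def make_translated_chatbot(
    pt_to_en: Translator, generate_en: Generator, en_to_pt: Translator
) -> Callable[[str], str]:
    """Wrap an English-only dialogue model so it converses in Portuguese."""
    history_en: List[str] = []  # dialogue history is kept in English

    def reply(user_pt: str) -> str:
        user_en = pt_to_en(user_pt)          # translate before the model
        history_en.append(user_en)
        answer_en = generate_en(history_en)  # English-only generation
        history_en.append(answer_en)
        return en_to_pt(answer_en)           # translate after the model

    return reply


# Toy stand-ins so the sketch runs without any external model or service.
_PT_EN = {"olá": "hello"}
_EN_PT = {"hi there": "olá para ti"}


def toy_pt_to_en(text: str) -> str:
    return _PT_EN.get(text.lower(), text)


def toy_en_to_pt(text: str) -> str:
    return _EN_PT.get(text.lower(), text)


def toy_generate(history: List[str]) -> str:
    return "hi there" if history[-1] == "hello" else "sorry?"


chatbot = make_translated_chatbot(toy_pt_to_en, toy_generate, toy_en_to_pt)
```

Keeping the history in English means the dialogue model only ever sees the language it was fine-tuned on; the trade-off is that translation errors in either direction carry over into the conversation.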
Publication details
Authors from the community:
Mariana Fidalgo Fernandes
ist187074
Advisors from this institution:
José Santos Victor
ist12760
Plinio Moreno Lopez
ist31838
Scientific domain (FOS)
electrical-engineering-electronic-engineering-information-engineering - Electrical Engineering, Electronic Engineering and Information Engineering
Publication language (ISO code)
eng - English
Access to the publication:
Embargo lifted
Embargo end date:
19/09/2022
Institution name
Instituto Superior Técnico