Master's Dissertation

Open-domain Conversational Agent based on Pre-Trained Transformers for Human-Robot Interaction

Mariana Fidalgo Fernandes, 2021

Key information

Authors:

Mariana Fidalgo Fernandes (Mariana Fidalgo Fernandes)

Advisors:

José Alberto Rosado dos Santos Victor (José Santos Victor); Plinio Moreno Lopez (Plinio Moreno Lopez)

Published on

22/11/2021

Abstract

Over the past years, many breakthroughs have occurred in Machine Learning (ML) and Natural Language Processing (NLP), such as generative pre-trained transformers (GPTs) and attention mechanisms that learn contextual relationships between words in a text. These breakthroughs opened several new possibilities for Human-Robot Interaction (e.g. the creation of an open-domain chatbot). However, a substantial share of the research and available data is in English, causing low-resource languages to be overlooked. This thesis explored this problem with two options: (i) translating the sentences before and after using the model fine-tuned on an English-based dataset, and (ii) translating the English-based dataset to Portuguese and then fine-tuning the model on it. Given adequate training data and a good choice of generation method, it was demonstrated that DialoGPT (dialogue generative pre-trained transformer), a tunable neural conversational answer generation model, could learn the basic skills to conduct a dialogue. For the language models as well as the baseline methods, two sources of evaluation were used: (i) metrics for text generation based on uncertainty (i.e. perplexity) and on similarity between sentences (i.e. BLEU, METEOR and ROUGE), and (ii) human evaluation of the sentences. Finally, it was shown that it is possible to resort to machine translation (MT) to obtain a chatbot that speaks fluent Portuguese. Translating the sentences before and after the modified DialoGPT model, using the Daily Dialogue dataset, led to the best results.

Publication details

Authors from the community:

Advisors from this institution:

Scientific domain (FOS)

electrical-engineering-electronic-engineering-information-engineering - Electrical Engineering, Electronic Engineering, Information Engineering

Publication language (ISO code)

eng - English

Access to the publication:

Embargo lifted

Embargo end date:

19/09/2022

Institution name

Instituto Superior Técnico