Using subtitles to deal with Out-of-Domain interactions

Daniel Filipe Nunes Magarreiro

Key information

Authors:

Daniel Filipe Nunes Magarreiro

Advisors:

Francisco António Chaves Saraiva de Melo; Maria Luísa Torres Ribeiro Marques da Silva Coheur

Published on

04/11/2014

Abstract

People who interact with dialogue systems often pose questions that the system cannot handle, known as Out of Domain (OOD) interactions. Recently, a corpus of interactions extracted from movie subtitles, called Subtle, was built and used in Say Something Smart (SSS), a chatbot that deals with OOD interactions. When faced with a user question, SSS queries its knowledge base using Lucene, an information retrieval system, which returns a set of possible answers. Previously, SSS used two simple metrics to choose one answer from this set: the similarity between the user input and the questions in the knowledge base, and the frequency of the answer. Although simple, this approach does not always give good results, especially for more unusual questions. In this work we develop two alternative ways of selecting an answer from a set of possible ones, with the goal of improving the overall results of SSS: one combines several measures linearly, and the other uses the Learning to Rank paradigm. In our evaluations we obtained 61.67% plausible answers using a combination of four measures. We also obtained promising results with the Learning to Rank approach, which achieved 35 percentage points more suitable answers than the linear combination of measures. Finally, we conducted an evaluation comparing the current version of SSS with the previous one and found that the current version answers many more user requests suitably, especially for Portuguese.
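As an illustration only, answer selection by a linear combination of measures could be sketched as follows. The sketch uses just the two measures named in the abstract (question similarity and answer frequency). The Jaccard word overlap, the weights `w_sim` and `w_freq`, and the function names are assumptions made for this example, not the measures or values used in SSS, and the candidate list stands in for whatever Lucene would return.

```python
from collections import Counter


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two strings (an assumed stand-in measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_answer(user_input, candidates, w_sim=0.7, w_freq=0.3):
    """Pick the answer with the highest weighted sum of the two measures.

    candidates: list of (question, answer) pairs, e.g. retrieved from an index.
    w_sim, w_freq: hypothetical weights chosen for illustration.
    """
    if not candidates:
        return None
    # Frequency of each answer within the candidate set, normalised to [0, 1].
    freq = Counter(answer for _, answer in candidates)
    max_freq = max(freq.values())
    best_answer, best_score = None, -1.0
    for question, answer in candidates:
        score = (w_sim * jaccard(user_input, question)
                 + w_freq * freq[answer] / max_freq)
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer


candidates = [
    ("how are you", "Fine."),
    ("how are you doing", "Fine."),
    ("where are you", "Here."),
]
print(select_answer("how are you", candidates))
```

Adding a further measure under this scheme amounts to adding one more weighted term to `score`. A Learning to Rank approach would instead learn the scoring function from labelled examples rather than fixing the weights by hand.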