Master's Dissertation
PROLEGIS – Intelligent Search in Legislation Databases
2016
Key information
Authors:
Supervisors:
Published on
31/05/2016
Abstract
Portuguese legislation, like that of other countries, is not published in an organized way, whether by topic or by concept. Instead, it is organized by a numbering system that follows the order of publication. For ordinary citizens, and even for researchers, searching for information about a subject or a specific problem is a hard and complex task. Categorizing legal texts, besides requiring specialized labour, would take a great amount of time due to the quantity of published documents. The purpose of this work is to evaluate the possibility of automatically assigning a category to these legislative documents using Machine Learning algorithms. The focus is on the supervised domain; nevertheless, an unsupervised clustering analysis is also explored. Multiple supervised classification algorithms are tested on a set of pre-classified documents in order to compare their classification performance. Support Vector Machines, K-Nearest Neighbours, Multinomial Naive Bayes and Decision Trees were used individually and, in an attempt to improve the results, in conjunction with various feature pre-processing techniques. Latent Semantic Indexing, feature selection with different metrics, and stemming were analysed.
Publication details
Community authors:
Hugo Miguel de Jesus Lopes
ist167603
Supervisors from this institution:
Carlos Alberto Pinto Ferreira
ist11342
Luís Manuel Marques Custódio
ist13279
Scientific Domain (FOS)
electrical-engineering-electronic-engineering-information-engineering - Electrical Engineering, Electronic Engineering, Information Engineering
Publication language (ISO code)
eng - English
Access to the publication:
Embargo lifted
Embargo end date:
08/04/2017
Institution name
Instituto Superior Técnico