Master's Thesis
PROLEGIS – Intelligent Search in Legislation Databases
2016
Key information
Published in
05/31/2016
Abstract
Portuguese legislation, like that of other countries, is not published in an organized way, whether by topic or by concept. Instead, it is organized by a numbering system that follows the order of publication. For ordinary citizens, and even for researchers, searching for information about a subject or a specific problem is a hard and complex task. Categorizing legal texts, besides requiring specialized labour, would take a great amount of time because of the quantity of published documents. The purpose of this work is to evaluate the possibility of automatically assigning a category to these legislative documents using Machine Learning algorithms. The focus is on the supervised domain; nevertheless, an unsupervised clustering analysis is also explored. Multiple supervised classification algorithms are tested on a set of pre-classified documents in order to compare their classification performance. Support Vector Machines, K-Nearest Neighbours, Multinomial Naive Bayes and Decision Trees were used individually and, in an attempt to improve the results, in conjunction with various feature pre-processing techniques. Latent Semantic Indexing, feature selection with different metrics, and stemming were analysed.
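As an illustration of the supervised setting described in the abstract, the following is a minimal sketch of one of the named algorithms, Multinomial Naive Bayes, written in plain Python. It is not the thesis implementation: the toy documents, the category names ("tax", "health"), the whitespace tokenizer, and the function names `train_multinomial_nb` and `predict` are all assumptions made for the example, and no stemming, feature selection, or Latent Semantic Indexing is applied.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit log priors and Laplace-smoothed log likelihoods per class."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> token frequencies
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # assumption: naive whitespace tokenizer
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"vocab": vocab, "log_prior": {}, "log_likelihood": {}}
    n_docs = len(docs)
    for label, n_label in class_doc_counts.items():
        model["log_prior"][label] = math.log(n_label / n_docs)
        # Denominator: tokens seen in this class plus alpha for every vocabulary word.
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            word: math.log((word_counts[label][word] + alpha) / total)
            for word in vocab
        }
    return model


def predict(model, text):
    """Return the class with the highest posterior log score."""
    # Tokens never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    scores = {}
    for label, log_prior in model["log_prior"].items():
        log_likelihood = model["log_likelihood"][label]
        scores[label] = log_prior + sum(log_likelihood[t] for t in tokens)
    return max(scores, key=scores.get)


# Hypothetical pre-classified documents, standing in for labelled legislation.
train_docs = [
    "income tax rate for companies",
    "tax deduction for families",
    "hospital staff and public health",
    "health service vaccination plan",
]
train_labels = ["tax", "tax", "health", "health"]

model = train_multinomial_nb(train_docs, train_labels)
print(predict(model, "new tax rate"))
print(predict(model, "public health plan"))
```

A comparative evaluation of the kind the abstract mentions would train each classifier on the same pre-classified set and score it on held-out documents; this sketch only shows the training and prediction steps for a single classifier.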
Publication details
Authors in the community:
Hugo Miguel de Jesus Lopes
ist167603
Supervisors of this institution:
Carlos Alberto Pinto Ferreira
ist11342
Luís Manuel Marques Custódio
ist13279
Fields of Science and Technology (FOS)
electrical-engineering-electronic-engineering-information-engineering - Electrical engineering, electronic engineering, information engineering
Publication language (ISO code)
eng - English
Rights type:
Embargo lifted
Date available:
04/08/2017
Institution name
Instituto Superior Técnico