Master's Dissertation

Standardization of Portuguese Addresses: Transformer-Based Architecture

Fátima Agostinho Napoleão, 2025

Key information

Authors:

Fátima Agostinho Napoleão

Advisors:

João Luís Gustavo de Matos; Alberto Manuel Rodrigues da Silva

Published on

26/05/2025

Abstract

Address standardization is crucial for many applications, including geocoding, logistics, navigation, and data management. However, inconsistent and incomplete address data pose significant challenges to the accuracy and efficiency of address processing systems. Traditional rule-based and heuristic models often struggle with these challenges because of their limited capacity to capture the complex patterns and variations found in real-world address data. This dissertation first develops a detailed understanding of the problem and reviews the existing literature, covering the key concepts related to the topic. It then proposes an innovative deep learning approach to standardization: a Seq2Seq model with an attention mechanism, adapted for Portuguese addresses. The methodology involves implementing a deep learning architecture capable of learning the complex relationships between address components. Unlike traditional models that rely on predefined rules or patterns, the proposed model learns directly from the data, allowing it to adapt to the diverse and evolving nature of address formats. Because standardized Portuguese address data was lacking, it was necessary to use non-standardized data and standardize it as far as possible. However, the available data was insufficient to cover all cases, resulting in an accuracy of 71.4%. The findings suggest that, with a more extensive dataset, accuracy could exceed 90%.

Publication details

Scientific domain (FOS)

electrical-engineering-electronic-engineering-information-engineering - Electrical Engineering, Electronic Engineering, Information Engineering

Publication language (ISO code)

eng - English

Access to the publication:

Embargoed access

Embargo end date:

29/03/2026

Institution name

Instituto Superior Técnico