Master's Thesis
Standardization of Portuguese Addresses: Transformer-Based Architecture
2025
Key information
Authors:
Supervisors:
Published in
05/26/2025
Abstract
Standardizing addresses is crucial for many applications, including geocoding, logistics, navigation, and data management. However, inconsistent and incomplete address data significantly reduce the accuracy and efficiency of address processing systems. Traditional rule-based and heuristic models often struggle with such data because they have limited capacity to capture the complex patterns and variations found in real-world addresses. This dissertation first examines the problem in detail and reviews the existing literature, covering the main concepts related to the topic. It then proposes a deep learning approach to standardization: a Seq2Seq model with an attention mechanism, adapted for Portuguese addresses. The methodology involves implementing a deep learning architecture capable of learning the relationships between address components. Unlike traditional models that rely on predefined rules or patterns, the proposed model learns directly from the data, which allows it to adapt to diverse and evolving address formats. Because no dataset of standardized Portuguese addresses was available, non-standardized data had to be used and standardized as far as possible. The available data were insufficient to cover all cases, and the model reached an accuracy of 71.4%. The findings suggest that, with a more extensive dataset, the accuracy could exceed 90%.
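As a rough illustration of the attention mechanism mentioned in the abstract, the sketch below computes one attention step in plain Python. It is not the thesis implementation: the dot-product scoring function, the helper names `softmax` and `attend`, and the toy vectors are all assumptions made for this example, since the abstract does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """One attention step with dot-product scoring (an assumed scoring function).

    Returns the attention weights over the input tokens and the context
    vector, i.e. the weighted sum of the encoder states.
    """
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical encoder states for three input tokens of an address,
# e.g. "R.", "Augusta", "12"; the numbers are illustrative only.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
decoder_state = [0.0, 2.0]

weights, context = attend(decoder_state, encoder_states)
```

In a Seq2Seq model, a step like this would let the decoder focus on the input token most relevant to the address component it is currently generating; here the second token receives the largest weight because its encoder state is most similar to the decoder state.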
Publication details
Authors in the community:
Fátima Agostinho Napoleão
ist191605
Supervisors of this institution:
João Luís Gustavo de Matos
ist13346
Fields of Science and Technology (FOS)
electrical-engineering-electronic-engineering-information-engineering - Electrical engineering, electronic engineering, information engineering
Publication language (ISO code)
eng - English
Rights type:
Embargoed access
Date available:
03/29/2026
Institution name
Instituto Superior Técnico