Master's Thesis

Efficient Speech Recognition on the Edge with Binary Neural Networks

Frederico Maria De Almeida Santos Roque, 2024

Key information

Authors:

Frederico Maria De Almeida Santos Roque

Supervisors:

Pedro Filipe Zeferino Aidos Tomás (Pedro Filipe Zeferino Tomás); Nuno Filipe Simões Santos Moraes da Silva Neves (Nuno Filipe Simões Santos Moraes Neves)

Published on

December 6, 2024

Abstract

Automatic speech recognition has improved substantially over the last decade thanks to deep neural networks and the introduction of Transformers and their derivatives. However, the ever-increasing complexity of new models makes them harder to run on low-end hardware such as edge devices. Off-loading neural network computation to the cloud tends to be both faster and more energy efficient, even after accounting for the cost of transmitting data over the internet. This is problematic for applications that require low latency and low energy consumption. Binary neural networks are an extreme form of quantization that aims to massively reduce memory and computational requirements while remaining competitive with full-precision models in accuracy. They have been shown to provide a $38\times$ size reduction and a $58\times$ speedup on ImageNet, making it viable to run state-of-the-art models on the CPU instead of a GPU or a dedicated accelerator. However, state-of-the-art work on these networks focuses mainly on deep convolutional networks, which have fallen out of favor since the introduction of Transformers. Hence, this thesis aims to minimize the computational requirements of running a binary neural network for automatic speech recognition on low-end devices by binarizing a state-of-the-art model and optimizing its inference latency and memory consumption. This work proposes Binary Conformer, which required binarizing a model of significantly higher complexity than previously attempted, for a task outside the scope of most current research.
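To illustrate why binarization reduces both memory and computation, the following is a minimal sketch of the general technique, not the thesis's implementation. It assumes the common scheme in which real values are mapped to +1 or -1 by their sign, packed one bit per value, and multiplied using XNOR followed by a population count. The function names `binarize`, `pack_bits`, and `binary_dot` are hypothetical and chosen only for this example.

```python
def binarize(values):
    """Map each real value to +1 or -1 by its sign (0 is mapped to +1 here)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack +1/-1 values into one integer, one bit per value (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask   # bit is 1 where the two signs agree
    matches = bin(xnor).count("1")     # population count
    return 2 * matches - n             # agreements minus disagreements


# Illustrative values only; they do not come from the thesis.
weights = binarize([0.7, -1.2, 0.1, -0.4])
activations = binarize([-0.3, -0.8, 0.5, 0.9])

reference = sum(w * a for w, a in zip(weights, activations))
fast = binary_dot(pack_bits(weights), pack_bits(activations), len(weights))
print(reference, fast)
```

Each value occupies a single bit instead of a 32-bit float, and the multiply-accumulate loop is replaced by bitwise operations on packed words. Real implementations operate on fixed-width machine words and typically add scaling factors, which this sketch omits.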

Publication details

Fields of Science and Technology (FOS)

Electrical engineering, electronic engineering, information engineering

Publication language (ISO code)

por - Portuguese

Rights type:

Embargo lifted

Date available:

September 29, 2025

Institution name

Instituto Superior Técnico