Ultra-low power approximate processing-in-memory acceleration for deep neural networks
Today, a large number of applications depend on deep neural networks (DNNs) to process data and
perform complex tasks under strict power and latency constraints. Processing-in-memory (PIM)
platforms are therefore actively explored as a promising approach to improve the throughput and
energy efficiency of DNN computing systems. Several PIM architectures adopt resistive non-volatile
memories as their main unit to build crossbar-based accelerators for DNN inference. However, these
structures suffer from several drawbacks, such as limited reliability, low accuracy, the large power
consumption and area of ADCs/DACs, and high write energy.
On the other hand, an increasing number of embedded devices for neural network acceleration
have adopted hardware approximations as they enable an efficient trade-off between computational
resources, energy efficiency, and network accuracy. Both approximate computing and mixed-signal in-
memory acceleration are promising paradigms for meeting the computational requirements of DNN
inference without accuracy loss.
In this thesis, I tackle these problems by pursuing both directions: hardware optimization through
in-memory computation, and HW/SW co-design optimization through approximate computing. The
main goal of this thesis is to offer an efficient PIM architecture based on an emerging memory
technology that integrates the presented approximate computing paradigms and allows for in-memory-
specific approximate computing optimizations.
I start by presenting an efficient crossbar design and implementation intended for PIM accelera-
tion of artificial neural networks based on ferroelectric field-effect transistor (FeFET) technology.
The novel mixed-signal blocks presented in this work reduce device-to-device variation and are
optimized for low area, low power, and high throughput. In addition, I illustrate the operation and
programmability of the crossbar, which adopts bit-decomposition techniques for MAC operations.
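The bit-decomposition principle behind these MAC operations can be sketched in software: each multi-bit weight is split into binary bit-planes, each bit-plane contributes a cheap binary partial sum, and a digital shift-add recombines the exact result. The function below is an illustrative model of this idea only, not the FeFET circuit itself; the function name and parameters are hypothetical.

```python
def bit_decomposed_mac(inputs, weights, w_bits=4):
    """Compute sum(x * w) by decomposing each weight into bit-planes and
    accumulating shifted partial sums -- the bit-serial principle a
    crossbar can exploit (illustrative sketch, not the FeFET circuit)."""
    acc = 0
    for b in range(w_bits):
        # Partial MAC using only bit b of every weight (0 or 1),
        # analogous to activating one crossbar bit-plane.
        partial = sum(x * ((w >> b) & 1) for x, w in zip(inputs, weights))
        acc += partial << b  # weight the partial sum by 2^b
    return acc
```

Because each partial sum involves only single-bit weights, precision is recovered entirely in the digital shift-add rather than in multi-bit analog cells.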
Afterwards, I construct a new mixed-signal in-memory architecture based on the bit-decomposition
of MAC operations, building on the crossbar macro presented above. Compared to the state of the
art, this system architecture provides a high level of parallelism while using only 3-bit ADCs and
eliminating the need for any DAC. In addition, it offers flexibility and very high utilization
efficiency for varying tasks and loads. Simulations demonstrate that this architecture
outperforms state-of-the-art efficiencies, reaching 36.5 TOPS/W, and delivers 2.05 TOPS with 8-bit
activation and 4-bit weight precision in an area of 4.9 mm² using 22 nm FDSOI technology. With
binary operation, it achieves 1169 TOPS/W and over 261 TOPS/W/mm² at the system level.
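How bit-serial operation eliminates the DACs while a small ADC digitizes each column can be illustrated with a simple behavioral model. The sketch below assumes binary weights (one crossbar bit-plane), an ideal analog column sum, and a saturating 3-bit ADC; the names and the saturation behavior are assumptions for illustration, not the taped-out design.

```python
def dac_free_mac(x, w_bin, x_bits=8, adc_bits=3):
    """Behavioral sketch of DAC-free bit-serial readout: activations are
    applied one bit-plane at a time (binary wordlines need no DAC), each
    analog column sum is digitized by a small ADC, and a digital
    shift-add recombines the result. Idealized model only."""
    full_scale = 2 ** adc_bits - 1  # a 3-bit ADC resolves 8 levels (0..7)
    acc = 0
    for b in range(x_bits):
        xb = [(xi >> b) & 1 for xi in x]                  # binary input bit-plane
        col = sum(xi * wi for xi, wi in zip(xb, w_bin))   # ideal analog column sum
        code = min(col, full_scale)                       # ADC saturation
        acc += code << b                                  # shift-add recombination
    return acc
```

As long as the column sum stays within the ADC's range, the recombined result equals the exact dot product; saturation is where approximation enters.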
Then, I complement this architecture with several approximation techniques that can be integrated
into, but are not limited to, the presented architecture. I first introduce a framework for
kernel-wise optimization of quantization and approximation. Then I pursue a digital approximation
technique based on truncation. For its application to the presented architecture, I propose seven
different approximations with a high degree of freedom.
I explore the design space of these approximations and evaluate their effect on area and power as
well as network accuracy loss.
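As a rough illustration of truncation-based digital approximation, the sketch below zeroes the lowest bits of each activation before the multiply-accumulate, trading a bounded error for narrower arithmetic. It is a hypothetical example of the general idea, not one of the seven concrete schemes proposed in the thesis.

```python
def truncated_mac(x, w, t_bits=2):
    """Digital approximation by truncation: drop the t lowest bits of
    each activation before the multiply-accumulate, reducing adder and
    multiplier width at the cost of a bounded accuracy loss.
    Hypothetical sketch of the technique, not a thesis scheme."""
    acc = 0
    for xi, wi in zip(x, w):
        xt = (xi >> t_bits) << t_bits  # zero out the truncated LSBs
        acc += xt * wi
    return acc
```

With t_bits=0 the result is exact; each additional truncated bit roughly halves the arithmetic resolution, which is the kind of area/power vs. accuracy trade-off explored in the design space above.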