
Ultra-low power approximate processing-in-memory acceleration for deep neural networks

Format: Book

Today, a large number of applications depend on deep neural networks (DNNs) to process data and perform complex tasks under tight power and latency constraints. Processing-in-memory (PIM) platforms are therefore actively explored as a promising approach to improve the throughput and energy efficiency of DNN computing systems. Several PIM architectures adopt resistive non-volatile memories as their main unit to build crossbar-based accelerators for DNN inference. However, these structures suffer from several drawbacks, such as limited reliability, low accuracy, large ADC/DAC power consumption and area, and high write energy. On the other hand, an increasing number of embedded devices for neural network acceleration have adopted hardware approximations, as they enable an efficient trade-off between computational resources, energy efficiency, and network accuracy. Both approximate computing and mixed-signal in-memory accelerators are promising paradigms for meeting the computational requirements of DNN inference without accuracy loss. In this thesis, I tackle these problems by pursuing both directions: hardware optimization using in-memory computation, and HW/SW co-design optimization using approximate computing. The main goal of this thesis is to offer an efficient PIM architecture based on emerging memory technology that integrates several of the presented approximate compute paradigms and allows for in-memory-specific approximate computing optimizations.

I start by presenting an efficient crossbar design and implementation intended for PIM acceleration of artificial neural networks based on ferroelectric field-effect transistor (FeFET) technology. The novel mixed-signal blocks presented in this work reduce device-to-device variation and are optimized for low area, low power, and high throughput. In addition, I illustrate the operation and programmability of the crossbar, which adopts bit-decomposition techniques for MAC operations.
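The bit-decomposition idea can be illustrated with a small numerical sketch (my own illustration, not code from the thesis): multi-bit activations and weights are split into bit planes, each plane pair is reduced with a binary dot product (the operation a crossbar column performs in memory), and the partial sums are recombined with shift-and-add.

```python
import numpy as np

def bit_decomposed_mac(activations, weights, act_bits=8, w_bits=4):
    """Illustrative bit-serial MAC: unsigned multi-bit operands are
    decomposed into bit planes, each plane pair is reduced with a
    binary dot product (the in-memory crossbar operation), and the
    results are recombined with shift-and-add."""
    acc = 0
    for i in range(act_bits):
        a_plane = (activations >> i) & 1              # activation bit plane i
        for j in range(w_bits):
            w_plane = (weights >> j) & 1              # weight bit plane j
            binary_dot = int(np.dot(a_plane, w_plane))  # crossbar-style binary MAC
            acc += binary_dot << (i + j)              # shift-and-add recombination
    return acc

# 8-bit activations, 4-bit weights, as in the reported configuration
a = np.array([3, 200, 17, 90], dtype=np.int64)
w = np.array([5, 2, 7, 11], dtype=np.int64)
assert bit_decomposed_mac(a, w) == int(np.dot(a, w))
```

Because each plane pair only needs a binary dot product, the analog readout can get by with low-resolution ADCs and no DACs; the shift-and-add recombination is done digitally.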
Afterwards, I construct a new mixed-signal in-memory architecture based on the bit-decomposition of MAC operations, built around the previously mentioned macro. Compared to the state of the art, this system architecture provides a high level of parallelism while using only 3-bit ADCs and eliminating the need for any DAC. In addition, it provides flexibility and very high utilization efficiency across varying tasks and loads. Simulations demonstrate that this architecture outperforms state-of-the-art efficiencies with 36.5 TOPS/W and can pack 2.05 TOPS with 8-bit activation and 4-bit weight precision in an area of 4.9 mm² using 22 nm FDSOI technology. Employing binary operation, it achieves a performance of 1169 TOPS/W and over 261 TOPS/W/mm² at the system level.

Then, I complement this architecture by presenting several approximation techniques that can be integrated into, but are not limited to, the presented architecture. I first introduce a framework for kernel-wise optimization in quantization and approximation. Then I pursue a digital approximation technique based on truncation. For the application of digital approximation to the presented architecture, I propose seven different approximations with a high degree of freedom. I explore the design space of these approximations and evaluate their effect on area and power, as well as on network accuracy loss.
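A hypothetical sketch of the truncation idea (the seven concrete approximation variants are specific to the thesis and not reproduced here): dropping a few least-significant bits of the operands shortens the digital adders and accumulators, at the cost of an error that stays within an analytically known bound.

```python
import numpy as np

def truncate_lsbs(x, n_trunc):
    """Truncation approximation: zero out the n least-significant bits,
    so downstream arithmetic can use narrower datapaths."""
    return (x >> n_trunc) << n_trunc

rng = np.random.default_rng(0)
acts = rng.integers(0, 256, size=64)   # 8-bit unsigned activations
wts = rng.integers(0, 16, size=64)     # 4-bit unsigned weights

exact = int(np.dot(acts, wts))
approx = int(np.dot(truncate_lsbs(acts, 2), wts))  # truncate 2 activation LSBs

# Each truncated activation loses at most 2**2 - 1 = 3, so the MAC error
# is non-negative and bounded by 3 * sum(wts).
error = exact - approx
assert 0 <= error <= 3 * int(wts.sum())
```

Sweeping `n_trunc` per layer (or per kernel, in the spirit of the kernel-wise framework) is one way to trade datapath area and power against network accuracy loss.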


Language(s): English

ISBN: 978-3-9597419-7-2

Publisher: RPTU Rheinland-Pfälzische Technische Universität Kaiserslautern Landau

Publication date: 30.06.2024

Pages: 168

Author(s): Taha Soliman

