Efficient hardware acceleration of recurrent neural networks

Product form: Book / Binding: flexible (Paperback)

Abstract: Each innovation revolution arrives faster than the one before it. The Agricultural Revolution happened approximately 10,000 years ago. The Scientific Revolution, which brought significant advances in the natural sciences, happened 500 years ago. Less than 200 years after the Industrial Revolution, which brought machines to replace manual labor and revolutionized production, we are witnessing a new revolution. According to Dr. Michio Kaku, professor of theoretical physics at the City College of New York, futurist, and popularizer of science, "We are witnessing one of the greatest revolutions in all of human history – a revolution driven by artificial intelligence and the Internet of Things."

Artificial Intelligence (AI) is a machine-based approach to mimicking human reasoning that solves problems adaptively and thereby minimizes human involvement. The term AI was coined in 1956, but AI has become popular only recently, owing to the deep learning breakthrough. Deep learning, also known as deep neural learning or Deep Neural Network (DNN), is a hierarchical composition of artificial neurons connected in layers, whose problem-solving capability increases as more layers create a deeper structure. Deep learning-based AI surpasses human capabilities in many applications, creating a paradigm shift in virtually every sector of the tech industry and enabling decision support systems and intelligent search systems that complement and augment human abilities.

However, the AI revolution and the progress in deploying DNNs would not have been possible without the evolution of computers, namely improvements in computing power and storage. At the dawn of the computer industry, nobody knew where this new technology would take us. "I think there is a world market for maybe five computers." - Thomas Watson, president of IBM, 1943.
Ken Olsen, a prominent computer industry pioneer, was quoted in 1977 as saying that "There is no reason for any individual to have a computer in their home." However, in less than 50 years, the rapid progress of technology following Moore’s law has enabled computers to evolve in line with the boldest visions of futurists. Recently, this progress has brought computers to the next leap in innovation, the Internet of Things (IoT). The IoT can be seen as another evolutionary step on the long path from vacuum-tube machines occupying an entire building, to mainframes the size of a room, to personal computers on every desk, and finally to interrelated computing devices communicating over a network, which are available in our pockets as cellphones or tablets, as wearable devices, and as ubiquitous sensors.

Deploying revolutionary AI on IoT devices places unprecedented demands on processing speed, power consumption, and energy efficiency. One of the most promising and rapidly developing platforms that can meet these requirements is the Field Programmable Gate Array (FPGA). FPGAs are semiconductor devices based on a matrix of configurable logic blocks connected via programmable interconnects. According to Manoj Roge, vice president of product planning and business development at Achronix, FPGAs are undergoing a paradigm shift: they are in the third era of programmable logic, moving from use only as glue logic or for prototyping to independent compute engines for data acceleration. Today’s FPGAs push the 500 MHz performance barrier. FPGAs have become a compelling proposition for almost any design owing to an unprecedented increase in logic density and a host of other features.

A perfect storm is a term that refers, by analogy, to an unusually severe storm resulting from a rare combination of meteorological phenomena.
By this analogy, we are currently in the middle of a perfect storm of technology and innovation that brings together (1) revolutionary artificial intelligence with beyond-human recognition capabilities, (2) the ubiquitous IoT with unprecedented speed, power, and energy requirements, and (3) FPGAs emerging as an independent computing platform with a unique combination of flexibility and efficiency.

Deployment of DNNs has high computational and storage requirements. One of the most challenging neural networks to implement efficiently is the Long Short-Term Memory (LSTM) network, which achieves high accuracy in many sequence recognition applications such as optical character recognition, speech recognition, and forecasting. The main goal of this research has been to design efficient hardware architectures of LSTM networks for applications requiring high throughput, low power, and energy efficiency. Despite the advantages of FPGAs, CPUs and GPUs are very often preferred because their development process is faster and easier. The thesis presents a holistic design space exploration methodology and an automatic framework to facilitate fast and efficient implementation of DNNs on FPGAs. As an additional contribution, this work presents a low-power, energy-efficient solution with real-time capabilities for digitizing historical documents. In the context of communication standards for IoT, the research targets the design of a critical component of a hardware architecture for error correction codes, one that enables high reliability and is suitable for high-speed, low-latency wireless communication.

The novel contributions presented in this thesis are bundled into five topics:
• The first hardware architecture and Pareto-frontier analysis of bidirectional LSTM.
• The first hardware architecture and Pareto-frontier analysis of multidimensional LSTM.
• A cross-layer design space exploration methodology and a framework for automatic co-design and implementation of DNNs and hardware architectures on FPGAs.
• The first heterogeneous architecture for a low-power, real-time, and energy-efficient device for highly accurate end-to-end transcription of historical documents.
• The first hardware architecture for a high-speed and low-latency Non-Binary Low-Density Parity-Check (NB-LDPC) check node for Galois Field GF(256).

This item belongs to the following series

Forschungsberichte Mikroelektronik

Language(s): English

ISBN: 978-3-9597418-7-3 / 978-3959741873 / 9783959741873

Publisher: RPTU Rheinland-Pfälzische Technische Universität Kaiserslautern Landau

Publication date: 20.02.2023

Pages: 170

Edition: 1

Author(s): Vladimir Rybalkin

50.00 € incl. VAT
free shipping

Available, delivery time 10-15 working days