
Streaming Architectures for Extreme Energy Efficiency in High-Performance Computing

Product form: Book

The end of Moore’s law and the breakdown of Dennard scaling have prompted a paradigm shift in the way we approach computer architecture design. Performance at low power has become the key ingredient in achieving high utilization of the available hardware, in order to mitigate the effect of limited clock frequency and to overcome dark silicon. The von Neumann bottleneck is one of the key challenges in this field: instruction fetches compete with data accesses for memory bandwidth. The same bottleneck applies to a processor's instruction pipeline, where load-store and control instructions compete with compute instructions for issue slots. A popular way to overcome this bottleneck is to implement dedicated accelerators for a specific problem, an approach that has grown ever more popular with the recent rise of machine learning. It is based on the observation that, all other things being equal, specialization in hardware always wins. However, the complementary conclusion also holds: the lack of general programmability limits an accelerator to a specific problem. In a time of fast-moving algorithms, today’s hardware accelerator cannot compute tomorrow’s algorithm.

General-purpose processors have evolved to mitigate the von Neumann bottleneck as well. One example is the CISC-to-RISC translation in modern processors, which can act as an instruction compression scheme. Similarly, SIMD and SIMT paradigms offer a fixed increase in computations per instruction, while Cray-style vectorization offers a more dynamic and potentially higher increase. Among the algorithms that lend themselves particularly well to such acceleration is the class of data-oblivious algorithms. These algorithms have control flow that does not depend on the data being processed, and they comprise many relevant algorithms from linear algebra, machine learning, and scientific computing.
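To make the notion of a data-oblivious algorithm concrete, here is a minimal Python sketch, an illustration of our own rather than code from the thesis. A dot product records which elements it reads and in what order; that trace depends only on the vector length, so it is identical for any input values. A data-dependent search with an early exit is shown for contrast. The function names and the `(array, index)` trace format are hypothetical.

```python
"""Sketch: the memory-access trace of a data-oblivious kernel depends only on
the problem size, never on the values being processed."""

from typing import List, Tuple

Access = Tuple[str, int]  # hypothetical trace entry: (array name, element index)


def dot_product_trace(x: List[float], y: List[float]) -> Tuple[float, List[Access]]:
    """Data-oblivious: loop bounds and read order depend only on len(x)."""
    acc = 0.0
    trace: List[Access] = []
    for i in range(len(x)):
        trace.append(("x", i))
        trace.append(("y", i))
        acc += x[i] * y[i]
    return acc, trace


def find_first_negative_trace(x: List[float]) -> Tuple[int, List[Access]]:
    """Data-dependent counterexample: the early exit depends on the values read."""
    trace: List[Access] = []
    for i, v in enumerate(x):
        trace.append(("x", i))
        if v < 0:
            return i, trace
    return -1, trace


if __name__ == "__main__":
    _, t1 = dot_product_trace([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
    _, t2 = dot_product_trace([-7.0, 0.0, 9.5], [1.0, 1.0, 1.0])
    print(t1 == t2)  # True: same access pattern for different data
```

Because the whole access sequence of the first kernel is known before any data arrives, it could in principle be produced by dedicated address-generation logic instead of by load instructions; the second kernel's trace cannot be precomputed that way.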
This thesis develops the concept of hardware address generation and direct memory streaming as a method to mitigate the von Neumann bottleneck. It applies the concept to in-order single-issue processors, allowing them to achieve full utilization of their compute resources; introduces pseudo-dual-issue execution with dedicated compute hardware loops; and distills these extensions into an architectural template for high-performance computers capable of concentrating a significant part of their energy footprint in the arithmetic units.
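The following Python sketch models hardware address generation and direct memory streaming in software, under assumptions of our own. A stream is described by a base address and one (bound, stride) pair per loop dimension; this is a hypothetical configuration format, not the interface used in the thesis. Because operands arrive from the streams, the compute loop contains no loads or address arithmetic, only the multiply-accumulate. The hypothetical `fpu_utilization` helper works through the issue-slot argument: if, as assumed here, a plain single-issue loop spends five instructions per iteration (two loads, two address increments, one branch) alongside one fused multiply-add, at most 1/6 of the issue slots carry arithmetic, whereas a streamed loop repeated by a hardware loop approaches 1.

```python
"""Hypothetical software model of an affine address generator that feeds
operands directly from memory into a compute loop."""

from dataclasses import dataclass
from itertools import product
from typing import Iterator, List, Sequence


@dataclass
class AffineStream:
    """Assumed configuration: base byte address plus one (bound, stride) pair
    per loop dimension, innermost dimension first. Strides are in bytes."""
    base: int
    bounds: Sequence[int]
    strides: Sequence[int]

    def addresses(self) -> Iterator[int]:
        # Walk the loop nest with the outermost dimension varying slowest.
        ranges = [range(b) for b in reversed(self.bounds)]
        for idx in product(*ranges):
            # idx is ordered outermost-first, so pair it with reversed strides.
            yield self.base + sum(i * s for i, s in zip(idx, reversed(self.strides)))


def streamed_dot(memory: List[float], a: AffineStream, b: AffineStream,
                 n: int, elem_size: int = 8) -> float:
    """Compute loop holds only the multiply-accumulate; operands are popped
    from the two streams instead of being loaded explicitly."""
    sa, sb = a.addresses(), b.addresses()
    acc = 0.0
    for _ in range(n):  # stands in for a hardware loop repeating one instruction
        acc += memory[next(sa) // elem_size] * memory[next(sb) // elem_size]
    return acc


def fpu_utilization(compute_instrs: int, overhead_instrs: int) -> float:
    """Fraction of issue slots doing arithmetic, assuming one instruction
    issues per cycle on a single-issue core."""
    return compute_instrs / (compute_instrs + overhead_instrs)


if __name__ == "__main__":
    mem = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # vector A at 0, B at byte 32
    a = AffineStream(base=0, bounds=[4], strides=[8])
    b = AffineStream(base=32, bounds=[4], strides=[8])
    print(streamed_dot(mem, a, b, 4))  # 70.0
    print(fpu_utilization(1, 5), fpu_utilization(1, 0))
```

A two-dimensional stream such as `AffineStream(0, [2, 2], [16, 8])` walks a row-major 2x2 matrix of 8-byte elements column by column, which illustrates why a few bound and stride registers can cover the regular access patterns of data-oblivious kernels.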


Language(s): English

ISBN: 978-3-86628-725-9 / 978-3866287259 / 9783866287259

Publisher: Hartung-Gorre

Publication date: 20 October 2021

Pages: 312

Edition: 1

Edited by Andreas Schenk, Mathieu Luisier, Bernd Witzigmann, Huang Qiuting
Author(s): Fabian Thomas Schuiki

€64.00 incl. VAT
Free shipping

In stock: delivery time 1-3 working days
