A Dependable Middleware for Enhancing the Fault Tolerance of Distributed Computations in Grid Environments

Product form: Book

Grid computing envisions the sharing of compute, storage, network and software resources across multi-institutional virtual organizations (VOs) to provide more effective solutions for important scientific, engineering and business problems. With the growing adoption of Grid infrastructures in science and industry, fault tolerance and self-healing are becoming increasingly important. The more resources and components are involved, the more complicated and error-prone the system becomes. High failure rates are a major concern for long-running applications in particular. Many scientific applications need to run for days or weeks. For example, a simulation of gamma-ray bursts, an astrophysical phenomenon, requires over 100 days of runtime on a one PFlop/s machine. Running such an application requires a large supercomputer or a Grid. Unfortunately, these systems are very error-prone. A single node failure usually aborts the entire application. Thus, efficient support for fault tolerance is essential. The Migol middleware, which is the main contribution of this thesis, addresses the fault tolerance of long-running applications as found in many sciences, e.g. in astrophysics or the life sciences. Migol supports applications in performing complex tasks such as resource allocation, monitoring, checkpointing and, if necessary, recovery.

Language(s): English

ISBN: 978-3-8322-9124-2 / 978-3832291242 / 9783832291242

Publisher: Shaker

Publication date: 18 May 2010

Pages: 238

Edition: 1

Author(s): André Luckow

49.80 € incl. VAT
free shipping

Available - delivery time 10-15 working days
