NovaCat - NSU Libraries Catalog

Location Call # Volume Status
 E-BOOK      
Author Sorin, Daniel J.
Title Fault tolerant computer architecture / Daniel J. Sorin.
OCLC 200904CAC005
ISBN 9781598299540 (electronic bk.)
9781598299533 (pbk.)
ISBN/ISSN 10.2200/S00192ED1V01Y200904CAC005 doi
Publisher San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool Publishers, [2009]
©2009
Description 1 electronic text (xii, 103 pages : illustrations) : digital file.
LC Subject heading/s Fault-tolerant computing.
Self-stabilization (Computer science)
Computer architecture.
SUBJECT Fault tolerance (or fault tolerant)
Reliability.
Dependability.
Computer architecture.
Error detection.
Error recovery.
Fault diagnosis.
Self-repair.
Autonomous.
Dynamic verification.
System details note Mode of access: World Wide Web.
System requirements: Adobe Acrobat reader.
Bibliography Includes bibliographical references.
Contents Introduction -- Goals of this book -- Faults, errors, and failures -- Masking -- Duration of faults and errors -- Underlying physical phenomena -- Trends leading to increased fault rates -- Smaller devices and hotter chips -- More devices per processor -- More complicated designs -- Error models -- Error type -- Error duration -- Number of simultaneous errors -- Fault tolerance metrics -- Availability -- Reliability -- Mean time to failure -- Mean time between failures -- Failures in time -- Architectural vulnerability factor -- The rest of this book -- References -- Error detection -- General concepts -- Physical redundancy -- Temporal redundancy -- Information redundancy -- The end-to-end argument -- Microprocessor cores -- Functional units -- Register files -- Tightly lockstepped redundant cores -- Redundant multithreading without lockstepping -- Dynamic verification of invariants -- High-level anomaly detection -- Using software to detect hardware errors -- Error detection tailored to specific fault models -- Caches and memory -- Error code implementation -- Beyond EDCs -- Detecting errors in content addressable memories -- Detecting errors in addressing -- Multiprocessor memory systems -- Dynamic verification of cache coherence -- Dynamic verification of memory consistency -- Interconnection networks -- Conclusions -- References -- Error recovery -- General concepts -- Forward error recovery -- Backward error recovery -- Comparing the performance of FER and BER -- Microprocessor cores -- FER for cores -- BER for cores -- Single-core memory systems -- FER for caches and memory -- BER for caches and memory -- Issues unique to multiprocessors -- What state to save for the recovery point -- Which algorithm to use for saving the recovery point -- Where to save the recovery point -- How to restore the recovery point state -- Software-implemented BER -- Conclusions -- References -- Diagnosis -- General concepts -- The benefits of diagnosis -- System model implications -- Built-in self-test -- Microprocessor core -- Using periodic BIST -- Diagnosing during normal execution -- Caches and memory -- Multiprocessors -- Conclusions -- References -- Self-repair -- General concepts -- Microprocessor cores -- Superscalar cores -- Simple cores -- Caches and memory -- Multiprocessors -- Core replacement -- Using the scheduler to hide faulty functional units -- Sharing resources across cores -- Self-repair of noncore components -- Conclusions -- References -- The future -- Adoption by industry -- Future relationships between fault tolerance and other fields -- Power and temperature -- Security -- Static design verification -- Fault vulnerability reduction -- Tolerating software bugs -- References.
Restrictions Abstract freely available; full-text restricted to subscribers or individual document purchasers.
Access may be restricted to authorized users only.
Unlimited user license access
NOTE Compendex.
Google scholar.
Google book search.
Abstract For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes of this book are to explore the key ideas in fault-tolerant computer architecture and to present the current state-of-the-art--over approximately the past 10 years--in academia and industry.
NOTE INSPEC.
Additional physical form available note Also available in print.
General note Part of: Synthesis digital library of engineering and computer science.
Title from PDF t.p. (viewed on June 4, 2009).
Series from website.
Permanent link back to this item
https://novacat.nova.edu:446/record=b2328814~S13
