George Reis will present his preFPO on Friday, January 18th at 10:00am in Room 402. The members of his committee are David August, advisor; Niraj Jha (ELE) and Shubu Mukherjee (Intel), readers; and David Walker and Li-Shiuan Peh (ELE), nonreaders. Everyone is invited to attend his talk. His abstract follows.

-----------------------------------------------------------

Software Modulated Fault Tolerance

In recent decades, microprocessor performance has increased exponentially, due in large part to the smaller and faster transistors enabled by improved fabrication technology. While such transistors yield performance gains, their smaller size and sheer number make chips more susceptible to transient faults. Because process and material advances are often insufficient to mitigate the effect of these faults, designers frequently introduce redundant hardware or software to detect or recover from them. Regardless of the method used to add reliability, these techniques incur significant and unnecessary performance penalties because they uniformly protect the entire application. They do not account for the fact that different program regions vary in their resilience to transient faults. This overprotection wastes resources and reduces performance.

To maximize fault coverage while minimizing the performance impact, this thesis exploits the key insight that not all faults in an unprotected application cause an incorrect answer, and not all parts of a program respond the same way to reliability techniques. First, this thesis demonstrates that vulnerability and performance responses vary across an application, and it identifies the regions that are most susceptible to faults as well as those that are inexpensive to protect. Second, this thesis advocates software and hybrid approaches to fault tolerance in order to exploit the synergy between high-level information and specific redundancy techniques.
Third, this thesis demonstrates how to exploit this non-uniformity via Software-Modulated Fault Tolerance (SMFT). SMFT leverages high-level reliability and performance information to direct reliability choices at fine granularity, making the most efficient use of processor resources for a given application. This thesis shows the effectiveness of SMFT through two specific implementations: one that uses both performance and reliability profiles to achieve effective trade-offs, and a second that uses only code information to direct reliability choices.
Melissa M Lawson