[talks] G Reis preFPO

Melissa M Lawson mml at CS.Princeton.EDU
Tue Jan 15 09:18:32 EST 2008

George Reis will present his preFPO on Friday, January 18th at 10:00am in Room 402.  
The members of his committee are David August, advisor; Niraj Jha (ELE) and Shubu
Mukherjee (Intel), readers; David Walker and Li-Shiuan Peh (ELE), nonreaders.  Everyone
is invited to attend his talk.  His abstract follows below.

Software Modulated Fault Tolerance

In recent decades, microprocessor performance has been increasing exponentially, due in
large part to smaller and faster transistors enabled by improved fabrication technology.
While such transistors yield performance enhancements, their smaller size and sheer number
make chips more susceptible to transient faults.  Designers frequently introduce redundant
hardware or software to detect or recover from these faults because process and material
advances are often insufficient to mitigate their effect.

Regardless of the methods used for adding reliability, these techniques incur unnecessary
and significant performance penalties by uniformly protecting the entire application.
They do not take into account the varying resilience to transient faults of different
program regions.  This overprotection leads to wasted resources that reduce performance.

To maximize fault coverage while minimizing the performance impact, this thesis takes
advantage of the key insight that not all faults in an unprotected application will cause
an incorrect answer and not all parts of the program respond the same way to reliability
techniques.  First, this thesis demonstrates the varying vulnerability and performance
responses of an application and identifies those regions which are most susceptible to
faults as well as those which are inexpensive to protect.  Second, this thesis advocates
the use of software and hybrid approaches to fault tolerance to enable the synergy of
high-level information with specific redundancy techniques.  Third, this thesis
demonstrates how to exploit this non-uniformity via Software-Modulated
Fault Tolerance (SMFT).  SMFT leverages reliability and performance information at a high
level and directs the reliability choices at fine granularities to provide the most
efficient use of processor resources for an application.  This thesis shows the
effectiveness of SMFT via two specific implementations: one that utilizes both performance
and reliability profiles to achieve effective trade-offs, and a second that uses only code
information to direct the reliability choices.
