[talks] E Raman preFPO

22 May 2008

      Easwaran Raman will present his preFPO on Thursday, May 29, at 2 PM, in
room 402.  The members of his committee are:  David August, advisor; 
Teresa Johnson (HP) and Doug Clark, readers; David Walker and Vivek 
Pai, nonreaders.  Everyone is welcome to attend his talk. His abstract 
follows below.
--------------------------------

Title: Extracting Iteration-Level Parallelism from Loops with Inter-Iteration Dependences

Abstract
---------------
The continuing exponential growth in transistor density and the
limitations in using the additional transistors to boost uniprocessor
performance have caused processor manufacturers to pack multiple
processor cores onto a single chip. These processors, known as chip
multiprocessors (CMPs) or multicore processors, do not ensure that
single-threaded applications continue to run faster unless those
applications are parallelized. Automatic parallelization by compilers
has a key role to play in utilizing the CMP architecture to improve
single-threaded application performance.

One of the most common compiler techniques to automatically
parallelize loops is DOALL parallelization.  DOALL extracts
iteration-level parallelism by executing different iterations of the
loop concurrently. The advantage of DOALL lies in its scalability with
the iteration count of the loop, but its applicability is severely
limited by the presence of inter-iteration or loop-carried
dependences.  My thesis proposes two new techniques to extract
iteration-level parallelism from loops with loop-carried dependences.
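
As a rough illustration of the distinction drawn above (not code from the
thesis), the following Python sketch contrasts a loop whose iterations are
independent with one that has a loop-carried dependence. The helper name
`doall` and the sample data are assumptions made for this example only.

```python
from concurrent.futures import ThreadPoolExecutor

def doall(body, n, workers=4):
    """Run body(i) for i in range(n), spreading iterations across threads.

    Hypothetical helper: only valid when no iteration reads a value
    written by another iteration.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in iteration order.
        return list(pool.map(body, range(n)))

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Independent iterations: each result depends only on data[i], so DOALL applies.
squares = doall(lambda i: data[i] * data[i], len(data))

# Loop-carried dependence: iteration i needs `total` from iteration i - 1,
# so the iterations cannot simply be handed to different threads.
total = 0
prefix = []
for x in data:
    total += x
    prefix.append(total)
```

The second loop is the kind of loop that plain DOALL cannot handle.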

The first technique, known as Speculative Iteration Chunk Execution
(Spice), uses value speculation to break loop-carried dependences,
enabling speculative execution of chunks of iterations in parallel.
Unlike most parallelization techniques based on value speculation,
Spice speculates only a few values, using a software-based value
predictor, to unlock parallelism.
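
The general idea of speculating a single value to split a loop into chunks
can be sketched as follows. This is a simplified illustration under stated
assumptions, not the Spice implementation: the list layout (`next_of`,
`weight`), the `END` sentinel, and the way the prediction is supplied are
all hypothetical. A list traversal is split in two by guessing which node
the second chunk starts at; the guess is validated when the first chunk
finishes.

```python
from concurrent.futures import ThreadPoolExecutor

END = -1  # hypothetical sentinel marking the end of the list

def walk(next_of, weight, start, stop):
    """Sum weights from node `start` until reaching `stop` or the list end.

    Returns (partial_sum, node_where_the_walk_stopped).
    """
    total, node = 0, start
    while node != END and node != stop:
        total += weight[node]
        node = next_of[node]
    return total, node

def speculative_sum(next_of, weight, head, predicted):
    """Sum the list in two chunks, speculating that `predicted` is on the list."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(walk, next_of, weight, head, predicted)
        second = pool.submit(walk, next_of, weight, predicted, END)
        (sum1, reached), (sum2, _) = first.result(), second.result()
    if reached == predicted:
        # Speculation was correct: the two chunks cover the list exactly once.
        return sum1 + sum2, True
    # Misspeculation: the first chunk ran to the end, so its sum is already
    # complete and the second chunk's work is discarded.
    return sum1, False

weight = [10, 20, 30, 40, 50, 60]
good_list = [1, 2, 3, 4, 5, END]   # 0 -> 1 -> 2 -> 3 -> 4 -> 5
stale_list = [1, 2, 4, 4, 5, END]  # node 3 has been unlinked: 0 -> 1 -> 2 -> 4 -> 5

hit = speculative_sum(good_list, weight, head=0, predicted=3)
miss = speculative_sum(stale_list, weight, head=0, predicted=3)
```

Here only one value (the chunk's starting node) is speculated, and a wrong
guess costs wasted work rather than a wrong answer.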

The second technique is known as Parallel-Stage Decoupled Software
Pipelining (PS-DSWP). PS-DSWP extracts both pipelined parallelism and
iteration-level parallelism from loops, thereby generalizing two
well-known parallelization techniques, DOALL and Decoupled Software
Pipelining (DSWP).  PS-DSWP partitions a loop into a sequence of
pipeline stages that are executed concurrently by different threads,
resulting in pipelined parallelism.  Iteration-level parallelism is
extracted by allowing those stages that do not have any loop-carried
dependences, known as parallel stages, to be executed by multiple
threads concurrently in a DOALL-like fashion. The applicability and
effectiveness of PS-DSWP are further enhanced by using speculation to
remove certain dependences.  Both Spice and PS-DSWP are implemented
in the VELOCITY compiler and evaluated on an Itanium 2 CMP simulator.
The results demonstrate the ability of these techniques to
significantly outperform prior techniques on several important loops
from real applications.
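
The pipeline structure described above can be sketched in Python as a
sequential stage feeding a replicated parallel stage through a queue. This
is a minimal sketch, not the VELOCITY compiler's output: the names
`ps_dswp_sketch`, `next_of`, and `work` are assumptions, and Python threads
are used only to show the structure, not to demonstrate speedup.

```python
import queue
import threading

END = -1  # hypothetical sentinel marking the end of the list

def ps_dswp_sketch(next_of, head, work, workers=3):
    """Two-stage pipeline: a sequential traversal feeding a parallel stage."""
    q = queue.Queue()
    results = {}
    lock = threading.Lock()

    def sequential_stage():
        # The pointer chase is a loop-carried dependence, so one thread runs it.
        node = head
        while node != END:
            q.put(node)
            node = next_of[node]
        for _ in range(workers):
            q.put(None)  # one shutdown marker per parallel-stage thread

    def parallel_stage():
        # No loop-carried dependence here, so several threads share the
        # iterations in a DOALL-like fashion.
        while True:
            node = q.get()
            if node is None:
                break
            value = work(node)
            with lock:
                results[node] = value

    threads = [threading.Thread(target=sequential_stage)]
    threads += [threading.Thread(target=parallel_stage) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

next_of = [1, 2, 3, END]  # 0 -> 1 -> 2 -> 3
squares_by_node = ps_dswp_sketch(next_of, head=0, work=lambda n: n * n)
```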

Melissa M Lawson