Easwaran Raman will present his preFPO on Thursday, May 29, at 2 PM in room 402. The members of his committee are: David August, advisor; Teresa Johnson (HP) and Doug Clark, readers; David Walker and Vivek Pai, nonreaders. Everyone is welcome to attend his talk. His abstract follows below.

--------------------------------

Title: Extracting Iteration-Level Parallelism from Loops with Inter-Iteration Dependences

Abstract
---------------

The continuing exponential growth in transistor density, combined with the limits on using the additional transistors to boost uniprocessor performance, has caused processor manufacturers to pack multiple processor cores onto a single chip. These processors, known as chip multiprocessors (CMPs) or multicore processors, do not ensure that single-threaded applications continue to run faster unless those applications are parallelized. Automatic parallelization by compilers therefore has a key role to play in exploiting the CMP architecture to improve the performance of single-threaded applications.

One of the most common compiler techniques for automatically parallelizing loops is DOALL parallelization. DOALL extracts iteration-level parallelism by executing different iterations of a loop concurrently. The advantage of DOALL lies in its scalability with the iteration count of the loop, but its applicability is severely limited by the presence of inter-iteration, or loop-carried, dependences.

My thesis proposes two new techniques to extract iteration-level parallelism from loops with loop-carried dependences. The first technique, Speculative Iteration Chunk Execution (Spice), uses value speculation to break loop-carried dependences, enabling chunks of iterations to execute speculatively in parallel. Unlike most parallelization techniques based on value speculation, Spice uses a software-based value predictor and speculates only a few values to unlock parallelism. The second technique is Parallel-Stage Decoupled Software Pipelining (PS-DSWP).
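The Spice idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the VELOCITY implementation. It makes four assumptions. The loop sums a linked list, whose `node = node.next` update is the loop-carried dependence. The software value predictor is modeled as remembering every k-th node from an earlier traversal and predicting those nodes as chunk starts. Each chunk is checked in order: it must end exactly where the next chunk was predicted to begin. Any chunks after a misspeculation are discarded. All names (`Node`, `record_chunk_starts`, `run_chunk`, `spice_sum`) are invented for the example. Under CPython's global interpreter lock, the threads show the structure rather than deliver a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)  # nodes are compared by identity, not by value
class Node:
    value: int
    next: Optional["Node"] = None


def build_list(values) -> Optional[Node]:
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def record_chunk_starts(head, chunk_size) -> List[Node]:
    """Hypothetical software value predictor: remember every chunk_size-th node
    seen during an earlier traversal and predict them as future chunk starts."""
    starts, node, i = [], head, 0
    while node is not None:
        if i > 0 and i % chunk_size == 0:
            starts.append(node)
        node = node.next
        i += 1
    return starts


def run_chunk(start, stop) -> Tuple[int, Optional[Node]]:
    """Run loop iterations from `start` until `stop` (exclusive) or the end of the list."""
    total, node = 0, start
    while node is not None and node is not stop:
        total += node.value
        node = node.next
    return total, node


def spice_sum(head, predicted_starts) -> int:
    """Sum the list, running one chunk per predicted start concurrently."""
    starts = [head] + predicted_starts  # chunk 0 starts at the real head
    stops = predicted_starts + [None]   # chunk i should stop where chunk i+1 starts
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_chunk, starts, stops))
    total = 0
    for (partial, end), stop in zip(results, stops):
        total += partial  # this chunk's start was validated by the previous chunk
        if end is not stop:
            # Misspeculation: the chunk never reached the predicted start of the next
            # chunk, so it ran on to the end of the list itself; squash later chunks.
            break
    return total


head = build_list(range(10))
predicted = record_chunk_starts(head, 3)
print(spice_sum(head, predicted))  # 45
n2 = head.next.next
n2.next = n2.next.next             # delete the node holding 3; the prediction is now stale
print(spice_sum(head, predicted))  # 42
```

Note that a correct prediction lets all chunks contribute, and a stale one is caught by the end-of-chunk check. In this simplified model, recovery is cheap because the last validated chunk simply runs on to the end of the list.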
PS-DSWP extracts both pipelined parallelism and iteration-level parallelism from loops, thereby generalizing two well-known parallelization techniques, DOALL and Decoupled Software Pipelining (DSWP). PS-DSWP partitions a loop into a sequence of pipeline stages that are executed concurrently by different threads, yielding pipelined parallelism. It extracts iteration-level parallelism by allowing stages that have no loop-carried dependences, known as parallel stages, to be executed concurrently by multiple threads in a DOALL-like fashion. The applicability and effectiveness of PS-DSWP are further enhanced by using speculation to remove certain dependences.

Both techniques are implemented in the VELOCITY compiler and evaluated on an Itanium 2 CMP simulator. The results demonstrate that these techniques significantly outperform prior techniques on several important loops from real applications.
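The PS-DSWP structure described above can be sketched as a three-stage pipeline. This is a minimal, hypothetical illustration, not the VELOCITY implementation, and it omits speculation. It assumes the following partition:

- a sequential first stage that carries the loop's iteration order;
- a parallel middle stage, free of loop-carried dependences, replicated across `n_workers` threads;
- a sequential last stage that commits results in the original iteration order.

Python threads and `queue.Queue` stand in for cores and inter-core queues. The names `ps_dswp`, `work` and `n_workers` are invented for the example.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the iterations


def ps_dswp(items, work, n_workers=3):
    q12 = queue.Queue()  # stage 1 -> parallel stage 2
    q23 = queue.Queue()  # parallel stage 2 -> stage 3
    out = []

    def stage1():
        # Sequential stage: produces iterations in order, tagged with their index.
        for i, x in enumerate(items):
            q12.put((i, x))
        for _ in range(n_workers):
            q12.put(DONE)  # one sentinel per parallel-stage worker

    def stage2():
        # Parallel stage: no loop-carried dependence, so any worker may run any iteration.
        while (msg := q12.get()) is not DONE:
            i, x = msg
            q23.put((i, work(x)))
        q23.put(DONE)

    def stage3():
        # Sequential stage: reorders results and commits them in iteration order.
        pending, next_i, finished = {}, 0, 0
        while finished < n_workers:
            msg = q23.get()
            if msg is DONE:
                finished += 1
                continue
            i, y = msg
            pending[i] = y
            while next_i in pending:
                out.append(pending.pop(next_i))
                next_i += 1

    threads = [threading.Thread(target=stage1), threading.Thread(target=stage3)]
    threads += [threading.Thread(target=stage2) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


print(ps_dswp(range(6), lambda x: x * x))  # [0, 1, 4, 9, 16, 25]
```

Plain DSWP corresponds to `n_workers=1`, where every stage has a single thread. A loop with no sequential work at all degenerates toward DOALL, in which the replicated middle stage does everything.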