Matt Bridges will present his preFPO on Monday, August 13 at 2 PM in Room 402. The members of his committee are: David August, advisor; Kai Li and Dan Lavery (Intel), readers; Margaret Martonosi (ELE) and Doug Clark, nonreaders. Everyone is invited to attend his talk. His abstract follows below.

------------------------------

Title: Unlocking the Potential of Automatic Parallelization

Abstract:

Multiprocessor systems, particularly chip multiprocessors, have emerged as the predominant organization for future microprocessors. Systems with 4 cores are already shipping and a future with 32 or more cores is easily conceivable. Unfortunately, multiple cores do not directly result in improved application performance, particularly for legacy applications. Consequently, parallelizing applications to execute on multiple cores is essential.

Parallel programming models and languages could be used to create multithreaded applications. However, moving to a parallel programming model only increases the complexity and cost involved in software development. To address these costs, automatic thread extraction techniques, such as TLS and DSWP, have been explored. Unfortunately, the amount of parallelism that has been automatically extracted is generally insufficient to keep many cores busy. First, many important loops are not parallelized because the compiler lacks the necessary scope to apply the optimization. Second, the sequential programming model forces programmers to define a single legal application outcome, rather than allowing for a range of legal outcomes, leading to conservative dependences that prevent parallelization.

This dissertation explores the performance of key existing, state-of-the-art methodologies in automatic thread extraction. The potential of these methodologies is significantly increased by combining them with an expanded optimization scope, which facilitates the optimization of large loops, leading to parallelism at higher levels in the application.
Finally, natural, simple extensions to the sequential programming model are presented to remove the limitations that can prevent the extraction of scalable parallelism. Through a case study of several applications, including all C benchmarks in the SPEC CINT2000 suite, this dissertation shows how scalable parallelism can be extracted. For the SPEC CINT2000 suite, our experience demonstrates that, by changing only 56 source code lines, all of the applications were parallelizable by automatic thread extraction. This process, constrained by the limits of modern optimizing compilers, yielded a speedup of 540%.