Matt Bridges will present his preFPO on Monday, August 13, at 2 PM in Room 402. The
members of his committee are: David August, advisor; Kai Li and Dan Lavery (Intel),
readers; Margaret Martonosi (ELE) and Doug Clark, nonreaders. Everyone is
invited to attend his talk. His abstract follows.
------------------------------
Title: Unlocking the Potential of Automatic Parallelization
Abstract:
Multiprocessor systems, particularly chip multiprocessors, have
emerged as the predominant organization for future microprocessors.
Systems with four cores are already shipping, and a future with 32 or
more cores is easily conceivable. Unfortunately, additional cores do
not by themselves improve application performance, particularly for
legacy applications. Consequently, parallelizing applications to
execute on multiple cores is essential.
Parallel programming models and languages could be used to create
multithreaded applications. However, moving to a parallel programming
model increases the complexity and cost of software development. To
avoid these costs, automatic thread extraction techniques, such as
thread-level speculation (TLS) and decoupled software pipelining
(DSWP), have been explored.
Unfortunately, the amount of parallelism extracted automatically to
date is generally insufficient to keep many cores busy. First, many
important loops are not parallelized because the compiler lacks the
scope necessary to apply the optimization. Second, the sequential
programming model forces programmers to specify a single legal
application outcome rather than a range of legal outcomes, which
introduces conservative dependences that prevent parallelization.
This dissertation evaluates the performance of key existing,
state-of-the-art methodologies for automatic thread extraction. It
shows that their potential increases significantly when they are
combined with an expanded optimization scope, which enables the
optimization of large loops and thereby exposes parallelism at higher
levels of the application. Finally, it presents simple, natural
extensions to the sequential programming model that remove limitations
which can prevent the extraction of scalable parallelism.
Through a case study of several applications, including all C
benchmarks in the SPEC CINT2000 suite, this dissertation shows how
scalable parallelism can be extracted. For the SPEC CINT2000 suite,
our experience demonstrates that changing only 56 lines of source code
makes all of the applications parallelizable by automatic thread
extraction. This process, constrained by the limits of modern
optimizing compilers, yielded a speedup of 540%.