[talks] N Vachharajani preFPO

Tue Sep 4 15:58:48 EDT 2007

Neil Vachharajani will present his preFPO on Friday, September 7 at 1:30pm in Room 402.
The members of his committee are:  David August, advisor; Sharad Malik (ELE) and 
Scott Mahlke (U Michigan), readers; Andrew Appel and Li-Shiuan Peh (ELE), nonreaders. 
Everyone is invited to attend his talk.  His abstract follows below.
-----------------------------------
INTELLIGENT PIPELINED MULTITHREADING SPECULATION

In recent years, microprocessor manufacturers have shifted their focus
from single-core to multicore processors.  To avoid burdening
programmers with the responsibility of parallelizing their
applications, some researchers have advocated automatic thread
extraction.  Within the scientific computing domain, automatic
parallelization techniques have been successful, but in the
general-purpose computing domain few, if any, techniques have achieved
comparable success.

Despite this, recent progress hints at mechanisms to unlock
parallelism from general-purpose applications.  In particular, two
promising proposals exist in the literature.  The first, a group of
techniques loosely classified as thread-level speculation (TLS),
attempts to adapt techniques successful in the scientific domain, such
as DOALL and DOACROSS parallelization, to the general-purpose domain
by using speculation to overcome complex control flow and data access
patterns not easily analyzed statically.  The second, a
non-speculative technique called Decoupled Software Pipelining (DSWP),
partitions loops into long-running, fine-grained threads organized
into a pipeline (pipelined multithreading, or PMT).  DSWP effectively
extends the reach of conventional software pipelining to codes with
complex control flow and variable-latency operations.
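
The pipeline organization described above can be pictured with a small
hand-written sketch.  It is not output of the DSWP compiler; it only
assumes a linked-list loop split into two stages, where the pointer
chase (the loop-carried dependence) stays in the first thread and the
per-node work moves to the second, with values flowing one way through
a bounded queue.  The names Node, work, and pipelined are hypothetical,
and in CPython the global interpreter lock means this shows the
structure rather than any speedup.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream between stages


class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build_list(values):
    """Build a singly linked list holding the given values in order."""
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def work(x):
    # Stand-in for the loop body; an assumption made for illustration.
    return x * x + 1


def sequential(head):
    """Original single-threaded loop: traversal and work interleaved."""
    total = 0
    node = head
    while node is not None:
        total += work(node.value)
        node = node.next
    return total


def pipelined(head, queue_size=32):
    """Two-stage pipeline: stage 1 traverses, stage 2 computes."""
    q = queue.Queue(maxsize=queue_size)  # decouples the two stages
    result = []

    def traverse():
        # Stage 1 owns the pointer-chasing recurrence and never waits
        # on stage 2 unless the queue is full.
        node = head
        while node is not None:
            q.put(node.value)
            node = node.next
        q.put(SENTINEL)

    def compute():
        # Stage 2 consumes values; nothing flows back to stage 1.
        total = 0
        while True:
            item = q.get()
            if item is SENTINEL:
                break
            total += work(item)
        result.append(total)

    threads = [threading.Thread(target=traverse),
               threading.Thread(target=compute)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

Because dependences cross threads in one direction only, the queue
latency is paid while filling the pipeline rather than on every
iteration, which is the contrast with DOACROSS drawn in the next
paragraph.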

Unfortunately, both techniques suffer from key limitations.  TLS
techniques either overspeculate, in an attempt to transform a loop
into a DOALL loop, or realize little parallelism in practice because
DOACROSS parallelization puts core-to-core communication latency on
the critical path.  DSWP avoids these
pitfalls with its pipeline organization and decoupled execution using
inter-core communication queues.  However, its non-speculative nature
and the restrictions needed to ensure a pipeline organization prevent
DSWP from achieving balanced parallelism on many key application
loops.

This dissertation advances automatic parallelization of
general-purpose applications with two key contributions.  First, we
propose extending pipelined multithreaded execution with intelligent
speculation.  Rather than speculating all loop-carried dependences to
transform loops into DOALL loops, we propose speculating only key
predictable dependences that inhibit balanced, pipelined execution.
We demonstrate that this technique is effective with an automatic
compiler implementation of Speculative DSWP.  Second, to support
decoupled speculative execution, this dissertation explores extending
a multicore architecture's memory subsystem with versioning.  The
proposed memory systems resemble those present in TLS architectures,
but provide efficient execution in the presence of large transactions,
many simultaneous outstanding transactions, and eager data forwarding
between uncommitted transactions.  In addition to supporting usage
patterns exhibited by speculative pipelined multithreading, the
proposed memory system facilitates existing and future speculative
threading techniques.
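
As a rough illustration of what memory versioning with eager
forwarding could mean, the toy software model below keeps one write
buffer per uncommitted version.  It is only a sketch under stated
assumptions, not the hardware design proposed in the dissertation: the
class VersionedMemory, its method names, and the rule that versions
are ordered by integer id and commit oldest first are all hypothetical.

```python
class VersionedMemory:
    """Toy model of versioned memory; all semantics here are assumed."""

    def __init__(self):
        self.committed = {}  # address -> value, non-speculative state
        self.buffers = {}    # version id -> {address: value}, uncommitted

    def begin(self, vid):
        """Open a new speculative version with an empty write buffer."""
        self.buffers[vid] = {}

    def write(self, vid, addr, value):
        """Buffer a speculative write inside version vid."""
        self.buffers[vid][addr] = value

    def read(self, vid, addr, default=0):
        # Check this version, then earlier uncommitted versions from
        # newest to oldest, so data is forwarded before the producer
        # commits.  Later versions are never visible to earlier ones.
        visible = sorted((v for v in self.buffers if v <= vid),
                         reverse=True)
        for v in visible:
            if addr in self.buffers[v]:
                return self.buffers[v][addr]
        return self.committed.get(addr, default)

    def commit(self, vid):
        """Make the oldest outstanding version non-speculative."""
        if vid != min(self.buffers):
            raise RuntimeError("versions must commit oldest first")
        self.committed.update(self.buffers.pop(vid))

    def rollback(self, vid):
        """On misspeculation, discard vid and every later version."""
        for v in [v for v in self.buffers if v >= vid]:
            del self.buffers[v]
```

A later version reading a value written by an earlier, still
uncommitted version is the forwarding case; rolling back a version
also discards its successors because they may have consumed its data.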