Jack Tzu-Han Hung will present his research seminar/general exam on
Tuesday May 12 at 10AM
in Room 402. the members of his committee are: David
August, advisor, Margaret Martonosi,
and Doug Clark. Everyone is invited to attend his talk, and
those faculty wishing to remain for the
oral exam following are welcome to do so. His abstract and
reading list follow below.
-----------------------------
Abstract:
Recently, the multicore systems have emerged as the dominate and promising
architecture in the computing area. Unfortunately, a parallelizing compiler is
essential to extract the potential parallelism from the existing sequential
programs and utilize the power of multicore systems. Independent multi-threading
(e.g., DOALL) and pipelined multi-threading (e.g., decouple software
pipelining, DSWP) parallelization techniques have been researched extensively.
However, each has its own advantages and limitations. On one hand, DOALL
exploits data parallelism and scales well to the number of cores; however, it
requires the lack of loop-carried dependencies and thus suffers from its
applicability. On the other hand, DSWP handles the program dependencies via the
use of communication queue and makes it applicable to more programs; however,
the scalability of this approach is limited by the number of stages in the
pipeline, which is usually far less than the number of cores.
In this work, we propose two techniques to enhance the scalability of DSWP.
First, instead of being inner-loop-blind, as DSWP does, we examine the
structure of inner loops in the pipeline and exploit the DOALL opportunities if
applicable. This improves the scalability of DSWP since the additional cores
could be used for executing DOALL inner loops. Second, we optimize the
communication for loop live-in and live-out variables by performing pre- and
post-partitioning analyses. The post-partitioning analysis leverages the
existing liveness analysis and is applied for each partition, instead of the
entire loop. The pre-partitioning analysis takes the communication cost into
consideration and could achieve better partitioning. Our initial results show a
speedup up to 5.4x for 052.alvinn and ECLAT.
Reading List:
[Text Books]
* Modern Compiler Implementation in ML, by A. W. Appel, Cambridge University
Press, 1998.
* Compilers: Principles, Techniques, and Tools (2nd Edition) by Alfred V. Aho,
Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, Addison Wesley, 2006.
[Papers]
* Thies, W., Chandrasekhar, V., and Amarasinghe, S. 2007. A Practical Approach
to Exploiting Coarse-Grained Pipeline Parallelism in C Programs. In Proceedings
of the 40th Annual IEEE/ACM international Symposium on Microarchitecture
(December 01 - 05, 2007).
* Gordon, M. I., Thies, W., and Amarasinghe, S. 2006. Exploiting coarse-grained
task, data, and pipeline parallelism in stream programs. In Proceedings of the
12th international Conference on Architectural Support For Programming
Languages and Operating Systems (San Jose, California, USA, October 21 - 25,
2006).
* Hongtao Zhong, Mojtaba Mehrara, Steve Lieberman, Scott Mahlke. "
Uncovering Hidden Loop Level Parallelism in Sequential Applications". The
14th International Symposium on High-Performance Computer Architecture (HPCA),
February, 2008.
* Guilherme Ottoni, Ram Rangan, Adam Stoler, and David I. August.
Automatic Thread Extraction with Decoupled Software Pipelining. In Proceedings
of the 38th IEEE/ACM International Symposium on Microarchitecture (MICRO),
November 2005.
* EaswaranRaman, Guilherme Ottoni, Arun Raman, Matthew Bridges, and David I.
August. Parallel-Stage Decoupled Software Pipelining. In Proceedings of the
2008 International Symposium on Code Generation and Optimization (CGO), April
2008.
* Neil Vachharajani, Ram Rangan, Easwaran Raman, Matthew J. Bridges, Guilherme
Ottoni and David I. August. Speculative Decoupled Software Pipelining. In
Proceedings of the 16th International Conference on Parallel Architecture and
Compilation Techniques (PACT), September 2007.
* Guilherme Ottoni and David I. August. Global Multi-Threaded Instruction
Scheduling: Technique and Initial Results. In Proceedings of the Sixth Workshop
on Explicitly Parallel Instruction Computer Architectures and Compiler
Technology (EPIC), March 2007.
* J. R. B. Davies. Parallel loop constructs for multiprocessors. Master’s
thesis, Department
of Computer Science, University of Illinois, Urbana, IL, May 1981.
* J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph
and its use in optimization. ACM Transactions on Programming Languages and
Systems, 9:319–349, July 1987.