[talks] J Hung general exam

Melissa Lawson mml at CS.Princeton.EDU
Tue May 5 09:27:10 EDT 2009

Jack Tzu-Han Hung will present his research seminar/general exam on Tuesday May 12 at 10AM

in Room 402.  the members of his committee are:  David August, advisor, Margaret

and Doug Clark.  Everyone is invited to attend his talk, and those faculty wishing to
remain for the 

oral exam following are welcome to do so.  His abstract and reading list follow below.



Recently, the multicore systems have emerged as the dominate and promising architecture in
the computing area. Unfortunately, a parallelizing compiler is essential to extract the
potential parallelism from the existing sequential programs and utilize the power of
multicore systems. Independent multi-threading (e.g., DOALL) and pipelined multi-threading
(e.g., decouple software pipelining, DSWP) parallelization techniques have been researched
extensively. However, each has its own advantages and limitations. On one hand, DOALL
exploits data parallelism and scales well to the number of cores; however, it requires the
lack of loop-carried dependencies and thus suffers from its applicability. On the other
hand, DSWP handles the program dependencies via the use of communication queue and makes
it applicable to more programs; however, the scalability of this approach is limited by
the number of stages in the pipeline, which is usually far less than the number of cores.

In this work, we propose two techniques to enhance the scalability of DSWP. First, instead
of being inner-loop-blind, as DSWP does, we examine the structure of inner loops in the
pipeline and exploit the DOALL opportunities if applicable. This improves the scalability
of DSWP since the additional cores could be used for executing DOALL inner loops. Second,
we optimize the communication for loop live-in and live-out variables by performing pre-
and post-partitioning analyses. The post-partitioning analysis leverages the existing
liveness analysis and is applied for each partition, instead of the entire loop. The
pre-partitioning analysis takes the communication cost into consideration and could
achieve better partitioning. Our initial results show a speedup up to 5.4x for 052.alvinn
and ECLAT.

Reading List:

[Text Books]

* Modern Compiler Implementation in ML, by A. W. Appel, Cambridge University Press, 1998.
* Compilers: Principles, Techniques, and Tools (2nd Edition) by Alfred V. Aho, Monica S.
Lam, Ravi Sethi, and Jeffrey D. Ullman, Addison Wesley, 2006.


* Thies, W., Chandrasekhar, V., and Amarasinghe, S. 2007. A Practical Approach to
Exploiting Coarse-Grained Pipeline Parallelism in C Programs. In Proceedings of the 40th
Annual IEEE/ACM international Symposium on Microarchitecture (December 01 - 05, 2007).

* Gordon, M. I., Thies, W., and Amarasinghe, S. 2006. Exploiting coarse-grained task,
data, and pipeline parallelism in stream programs. In Proceedings of the 12th
international Conference on Architectural Support For Programming Languages and Operating
Systems (San Jose, California, USA, October 21 - 25, 2006).

* Hongtao Zhong, Mojtaba Mehrara, Steve Lieberman, Scott Mahlke. " Uncovering Hidden Loop
Level Parallelism in Sequential Applications". The 14th International Symposium on
High-Performance Computer Architecture (HPCA), February, 2008.

*  Guilherme Ottoni, Ram Rangan, Adam Stoler, and David I. August. Automatic Thread
Extraction with Decoupled Software Pipelining. In Proceedings of the 38th IEEE/ACM
International Symposium on Microarchitecture (MICRO), November 2005. 

* EaswaranRaman, Guilherme Ottoni, Arun Raman, Matthew Bridges, and David I. August.
Parallel-Stage Decoupled Software Pipelining. In Proceedings of the 2008 International
Symposium on Code Generation and Optimization (CGO), April 2008. 

* Neil Vachharajani, Ram Rangan, Easwaran Raman, Matthew J. Bridges, Guilherme Ottoni and
David I. August. Speculative Decoupled Software Pipelining. In Proceedings of the 16th
International Conference on Parallel Architecture and Compilation Techniques (PACT),
September 2007.

* Guilherme Ottoni and David I. August. Global Multi-Threaded Instruction Scheduling:
Technique and Initial Results. In Proceedings of the Sixth Workshop on Explicitly Parallel
Instruction Computer Architectures and Compiler Technology (EPIC), March 2007. 

* J. R. B. Davies. Parallel loop constructs for multiprocessors. Master's thesis,
of Computer Science, University of Illinois, Urbana, IL, May 1981.

* J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its
use in optimization. ACM Transactions on Programming Languages and Systems, 9:319-349,
July 1987.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cs.princeton.edu/pipermail/talks/attachments/20090505/f4f519e3/attachment-0001.html>

More information about the talks mailing list