Jack Tzu-Han Hung will present his research seminar/general exam on Tuesday May 12 at 10AM

in Room 402.  the members of his committee are:  David August, advisor, Margaret Martonosi,

and Doug Clark.  Everyone is invited to attend his talk, and those faculty wishing to remain for the

oral exam following are welcome to do so.  His abstract and reading list follow below.

-----------------------------

Abstract:

Recently, the multicore systems have emerged as the dominate and promising architecture in the computing area. Unfortunately, a parallelizing compiler is essential to extract the potential parallelism from the existing sequential programs and utilize the power of multicore systems. Independent multi-threading (e.g., DOALL) and pipelined multi-threading (e.g., decouple software pipelining, DSWP) parallelization techniques have been researched extensively. However, each has its own advantages and limitations. On one hand, DOALL exploits data parallelism and scales well to the number of cores; however, it requires the lack of loop-carried dependencies and thus suffers from its applicability. On the other hand, DSWP handles the program dependencies via the use of communication queue and makes it applicable to more programs; however, the scalability of this approach is limited by the number of stages in the pipeline, which is usually far less than the number of cores.

In this work, we propose two techniques to enhance the scalability of DSWP. First, instead of being inner-loop-blind, as DSWP does, we examine the structure of inner loops in the pipeline and exploit the DOALL opportunities if applicable. This improves the scalability of DSWP since the additional cores could be used for executing DOALL inner loops. Second, we optimize the communication for loop live-in and live-out variables by performing pre- and post-partitioning analyses. The post-partitioning analysis leverages the existing liveness analysis and is applied for each partition, instead of the entire loop. The pre-partitioning analysis takes the communication cost into consideration and could achieve better partitioning. Our initial results show a speedup up to 5.4x for 052.alvinn and ECLAT.

Reading List:

[Text Books]

* Modern Compiler Implementation in ML, by A. W. Appel, Cambridge University Press, 1998.
* Compilers: Principles, Techniques, and Tools (2nd Edition) by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, Addison Wesley, 2006.

[Papers]

* Thies, W., Chandrasekhar, V., and Amarasinghe, S. 2007. A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs. In Proceedings of the 40th Annual IEEE/ACM international Symposium on Microarchitecture (December 01 - 05, 2007).

* Gordon, M. I., Thies, W., and Amarasinghe, S. 2006. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. In Proceedings of the 12th international Conference on Architectural Support For Programming Languages and Operating Systems (San Jose, California, USA, October 21 - 25, 2006).

* Hongtao Zhong, Mojtaba Mehrara, Steve Lieberman, Scott Mahlke. " Uncovering Hidden Loop Level Parallelism in Sequential Applications". The 14th International Symposium on High-Performance Computer Architecture (HPCA), February, 2008.

*  Guilherme Ottoni, Ram Rangan, Adam Stoler, and David I. August. Automatic Thread Extraction with Decoupled Software Pipelining. In Proceedings of the 38th IEEE/ACM International Symposium on Microarchitecture (MICRO), November 2005.

* EaswaranRaman, Guilherme Ottoni, Arun Raman, Matthew Bridges, and David I. August. Parallel-Stage Decoupled Software Pipelining. In Proceedings of the 2008 International Symposium on Code Generation and Optimization (CGO), April 2008.

* Neil Vachharajani, Ram Rangan, Easwaran Raman, Matthew J. Bridges, Guilherme Ottoni and David I. August. Speculative Decoupled Software Pipelining. In Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT), September 2007.

* Guilherme Ottoni and David I. August. Global Multi-Threaded Instruction Scheduling: Technique and Initial Results. In Proceedings of the Sixth Workshop on Explicitly Parallel Instruction Computer Architectures and Compiler Technology (EPIC), March 2007.

* J. R. B. Davies. Parallel loop constructs for multiprocessors. Master’s thesis, Department
of Computer Science, University of Illinois, Urbana, IL, May 1981.

* J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 9:319–349, July 1987.