[talks] J Hung generals
mml at CS.Princeton.EDU
Fri Jan 15 07:59:21 EST 2010
Jack Tzu-Han Hung will present his research seminar/general exam on Wednesday January 20
9AM in room 402. The members of his committee are David August (advisor), Doug Clark, and
Everyone is invited to attend his talk, and those faculty wishing to remain for the oral
exam are welcome to do
so. His abstract and reading list follow below.
Recently, multicore systems have emerged as the dominant
architecture in computing. Unfortunately, sequential programs
do not benefit much from the new architecture. To
keep up to the decades-old performance growth trend, some
propose new programming languages or extensions for explicitly
parallel programming [9, 2, 5, 3, 10, 8]. This approach
leaves great burden to the programmers, including
code rewriting effort and concerns for correctness and performance.
An alternative is to use compilation techniques to extract
the implicit parallelism from traditional, sequential
programs. Most works focus on parallelizing the hottest
loop in the program [12, 6, 4]; a few previous works attempt
to exploit hybrid parallelism from several independent
loops . Hence, without considering the loop nest
as a whole, these techniques leave a significant amount of
This work aims to extract various types of parallelism
existing in a loop nest, using a dynamic programmingbased
framework. Specifically, with a bottom-up traversal
of the loop nest, every loop is examined to see if data or
pipeline parallelism can be extracted and is assessed to see
if the parallelization strategy is beneficial. The framework
produces the final parallelized code that provides the optimal
combination of the parallelization techniques. With
manual parallelization, we experiment with six benchmarks,
selected from different domains, including streaming applications
(StreamIt benchmark) and data-intensive
applications (SPECfp and MineBench). Our initial results
show a (geo-mean) speedup of 3.35x (on an 8-core machine),
while prior approaches (i.e., parallelizing one single
loop or independent loops) only produce a speedup of 2.85x.
 J. R. B. Davies. Parallel loop constructs for multiprocessors.
Master's thesis, Department of Computer Science, University
of Illinois, Urbana, IL, May 1981.
 M. I. Gordon, W. Thies, and S. Amarasinghe. Exploiting
coarse-grained task, data, and pipeline parallelism in stream
programs. In ASPLOS-XII: Proceedings of the 12th international
conference on Architectural support for programming
languages and operating systems, pages 151-162, New York,
NY, USA, 2006. ACM.
 J. Gummaraju, J. Coburn, Y. Turner, and M. Rosenblum.
Streamware: programming general-purpose multicore processors
using streams. In ASPLOS XIII: Proceedings of the 13th
international conference on Architectural support for programming
languages and operating systems, pages 297-307,
New York, NY, USA, 2008. ACM.
 J. Huang, A. Raman, Y. Zhang, T. B. Jablin, T.-H. Hung, and
D. I. August. Decoupled software pipelining creates parallelization
opportunities. In Proceedings of the 2010 International
Symposium on Code Generation and Optimization,
 M. Kudlur and S. Mahlke. Orchestrating the execution of
stream programs on multicore platforms. In PLDI '08: Proceedings
of the 2008 ACM SIGPLAN conference on Programming
language design and implementation, pages 114-124,
New York, NY, USA, 2008. ACM.
 G. Ottoni, R. Rangan, A. Stoler, and D. I. August. Automatic
thread extraction with decoupled software pipelining. In Proceedings
of the 38th Annual IEEE/ACM International Symposium
on Microarchitecture, pages 105-116, November 2005.
 E. Raman, G. Ottoni, A. Raman, M. Bridges, and D. I. August.
Parallel-stage decoupled software pipelining. In Proceedings
of the 2008 International Symposium on Code Generation and
Optimization, April 2008.
 W. Thies, V. Chandrasekhar, and S. Amarasinghe. A practical
approach to exploiting coarse-grained pipeline parallelism
in C programs. In Proceedings of the 40th Annual
IEEE/ACM International Symposium on Microarchitecture,
 W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt:
A language for streaming applications. In Proceedings of
the 12th International Conference on Compiler Construction,
 A. Udupa, R. Govindarajan, and M. J. Thazhuthaveetil. Synergistic
execution of stream programs on multicores with accelerators.
In LCTES '09: Proceedings of the 2009 ACM
SIGPLAN/SIGBED conference on Languages, compilers, and
tools for embedded systems, pages 99-108, New York, NY,
USA, 2009. ACM.
 H. Zhong, S. A. Lieberman, and S. A. Mahlke. Extending
multicore architectures to exploit hybrid parallelism in singlethread
 H. Zhong, M. Mehrara, S. Lieberman, and S. Mahlke. Uncovering
hidden loop level parallelism in sequential applications.
In Proc. of the 14th International Symposium on High-
Performance Computer Architecture, 2008.
 L. Lamport. The parallel execution of DO loops. In
Communications of the ACM, 1974.
 J. Allen and K. Kennedy. Automatic loop interchange.
In Proc. of the 1984 SIGPLAN symposium on Compiler construction,
 W. Lee, R. Barua, M. Frank, D. Srikrishna, J. Babb, V. Sarkar,
and S. Amarasinghe. Space-time scheduling of instruction-level
parallelism on a raw machine. In Proc. of the eighth international
conference on Architectural support for programming languages and
operating systems, 1998.
 A. Lim and M. Lam. Maximizing parallelism and minimizing
synchronization with affine transforms. In the Proc. of the 24th
ACM SIGPLAN-SIGACT symposium on Principles of programming languages,
 A. Lim, G. Cheong, and M. Lam. An affine partitioning algorithm
to maximize parallelism and minimize communication. In the Proc. of
the 13th international conference on Supercomputing, 1999.
 Modern Compiler Implementation in ML, by A. W. Appel, Cambridge
University Press, 1998.
 Compilers: Principles, Techniques, and Tools (2nd Edition) by
Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman,
Addison Wesley, 2006.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the talks