Jack Tzu-Han Hung will present his research seminar/general exam on Wednesday, January 20, at 9AM in room 402. The members of his committee are David August (advisor), Doug Clark, and Margaret Martonosi. Everyone is invited to attend his talk, and those faculty wishing to remain for the oral exam are welcome to do so. His abstract and reading list follow below.

--------------------------------------------------------------

(Abstract)

Recently, multicore systems have emerged as the dominant architecture in computing. Unfortunately, sequential programs benefit little from this new architecture. To keep up with the decades-old performance growth trend, some propose new programming languages or language extensions for explicitly parallel programming [9, 2, 5, 3, 10, 8]. This approach places a great burden on programmers, who must rewrite their code and reason about its correctness and performance. An alternative is to use compilation techniques to extract the implicit parallelism in traditional, sequential programs. Most prior work focuses on parallelizing the hottest loop in the program [12, 6, 4]; a few works attempt to exploit hybrid parallelism from several independent loops [11]. Because these techniques do not consider the loop nest as a whole, they leave a significant amount of parallelism unexploited. This work aims to extract the various types of parallelism present in a loop nest, using a dynamic-programming-based framework. Specifically, in a bottom-up traversal of the loop nest, every loop is examined to determine whether data or pipeline parallelism can be extracted, and each candidate parallelization strategy is assessed for profitability. The framework produces final parallelized code that embodies the optimal combination of the parallelization techniques. Using manual parallelization, we experiment with six benchmarks selected from different domains, including streaming applications (StreamIt benchmarks) and data-intensive applications (SPECfp and MineBench).
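As a rough illustration of the bottom-up dynamic program described above (not the talk's actual algorithm), the sketch below chooses, for each loop in the nest, the cheapest of leaving it sequential with its children parallelized, running it as a data-parallel (DOALL) loop, or running it as a pipeline. The `Loop` fields, the idealized timing model, and the two parallelization options are all hypothetical simplifications.

```python
from dataclasses import dataclass, field

@dataclass
class Loop:
    """A node in the loop-nest tree (fields are hypothetical)."""
    name: str
    seq_time: float               # profiled sequential execution time
    data_parallel: bool = False   # a DOALL-style transform is legal
    pipelinable: bool = False     # a pipeline-style transform is legal
    pipe_stages: int = 2          # balanced stages if pipelined
    children: list = field(default_factory=list)

def best_plan(loop: Loop, cores: int):
    """Bottom-up DP: return (time, plan) minimizing the running time of `loop`."""
    # Option 1: keep this loop sequential, but use the best plan for each child.
    child_results = [best_plan(c, cores) for c in loop.children]
    child_time = sum(t for t, _ in child_results)
    body_time = loop.seq_time - sum(c.seq_time for c in loop.children)
    options = [(body_time + child_time, ("SEQ", [p for _, p in child_results]))]
    # Option 2: data parallelism splits iterations across all cores (idealized).
    if loop.data_parallel:
        options.append((loop.seq_time / cores, ("DOALL", loop.name)))
    # Option 3: pipelining is bounded by the slowest stage (assumed balanced).
    if loop.pipelinable:
        stages = min(loop.pipe_stages, cores)
        options.append((loop.seq_time / stages, ("PIPE", loop.name)))
    return min(options, key=lambda o: o[0])

# Toy nest: a pipelinable outer loop containing a data-parallel inner loop.
inner = Loop("inner", seq_time=8.0, data_parallel=True)
outer = Loop("outer", seq_time=10.0, pipelinable=True, pipe_stages=2,
             children=[inner])
time, plan = best_plan(outer, cores=8)
```

In this toy model the DP keeps the outer loop sequential and DOALLs the inner loop (time 2 + 8/8 = 3), beating a two-stage pipeline of the outer loop alone (time 10/2 = 5), which is the kind of combination that single-loop approaches miss.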
Our initial results show a geometric-mean speedup of 3.35x on an 8-core machine, while prior approaches (parallelizing a single hot loop or independent loops) produce a speedup of only 2.85x.

(Reading List)

[Papers]

[1] J. R. B. Davies. Parallel loop constructs for multiprocessors. Master's thesis, Department of Computer Science, University of Illinois, Urbana, IL, May 1981.
[2] M. I. Gordon, W. Thies, and S. Amarasinghe. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. In ASPLOS-XII: Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 151-162, 2006.
[3] J. Gummaraju, J. Coburn, Y. Turner, and M. Rosenblum. Streamware: Programming general-purpose multicore processors using streams. In ASPLOS-XIII: Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 297-307, 2008.
[4] J. Huang, A. Raman, Y. Zhang, T. B. Jablin, T.-H. Hung, and D. I. August. Decoupled software pipelining creates parallelization opportunities. In Proceedings of the 2010 International Symposium on Code Generation and Optimization, April 2010.
[5] M. Kudlur and S. Mahlke. Orchestrating the execution of stream programs on multicore platforms. In PLDI '08: Proceedings of the 2008 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 114-124, 2008.
[6] G. Ottoni, R. Rangan, A. Stoler, and D. I. August. Automatic thread extraction with decoupled software pipelining. In Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture, pages 105-116, November 2005.
[7] E. Raman, G. Ottoni, A. Raman, M. Bridges, and D. I. August. Parallel-stage decoupled software pipelining. In Proceedings of the 2008 International Symposium on Code Generation and Optimization, April 2008.
[8] W. Thies, V. Chandrasekhar, and S. Amarasinghe. A practical approach to exploiting coarse-grained pipeline parallelism in C programs. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, December 2007.
[9] W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A language for streaming applications. In Proceedings of the 12th International Conference on Compiler Construction, 2002.
[10] A. Udupa, R. Govindarajan, and M. J. Thazhuthaveetil. Synergistic execution of stream programs on multicores with accelerators. In LCTES '09: Proceedings of the 2009 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, pages 99-108, 2009.
[11] H. Zhong, S. A. Lieberman, and S. A. Mahlke. Extending multicore architectures to exploit hybrid parallelism in single-thread applications. 2007.
[12] H. Zhong, M. Mehrara, S. Lieberman, and S. Mahlke. Uncovering hidden loop level parallelism in sequential applications. In Proceedings of the 14th International Symposium on High-Performance Computer Architecture, 2008.
[13] L. Lamport. The parallel execution of DO loops. Communications of the ACM, 1974.
[14] J. Allen and K. Kennedy. Automatic loop interchange. In Proceedings of the 1984 SIGPLAN Symposium on Compiler Construction, 1984.
[15] W. Lee, R. Barua, M. Frank, D. Srikrishna, J. Babb, V. Sarkar, and S. Amarasinghe. Space-time scheduling of instruction-level parallelism on a Raw machine. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, 1998.
[16] A. Lim and M. Lam. Maximizing parallelism and minimizing synchronization with affine transforms. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1997.
[17] A. Lim, G. Cheong, and M. Lam. An affine partitioning algorithm to maximize parallelism and minimize communication. In Proceedings of the 13th International Conference on Supercomputing, 1999.
[Textbooks]

[18] A. W. Appel. Modern Compiler Implementation in ML. Cambridge University Press, 1998.
[19] A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools (2nd Edition). Addison Wesley, 2006.
Melissa Lawson