<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">


<head>

<meta http-equiv=Content-Type content="text/html; charset=us-ascii">

<meta name=Generator content="Microsoft Word 12 (filtered medium)">

<style>

<!--

 /* Font Definitions */

 @font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

 /* Style Definitions */

 p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        margin-bottom:.0001pt;

        font-size:12.0pt;

        font-family:"Times New Roman","serif";}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:purple;

        text-decoration:underline;}

span.EmailStyle17

        {mso-style-type:personal-reply;

        font-family:"Arial","sans-serif";

        color:blue;

        font-weight:normal;

        font-style:normal;

        text-decoration:none none;}

.MsoChpDefault

        {mso-style-type:export-only;}

@page Section1

        {size:8.5in 11.0in;

        margin:1.0in 1.0in 1.0in 1.0in;}

div.Section1

        {page:Section1;}

-->

</style>

<!--[if gte mso 9]><xml>

 <o:shapedefaults v:ext="edit" spidmax="1026" />

</xml><![endif]--><!--[if gte mso 9]><xml>

 <o:shapelayout v:ext="edit">

  <o:idmap v:ext="edit" data="1" />

 </o:shapelayout></xml><![endif]-->

</head>


<body lang=EN-US link=blue vlink=purple>


<div class=Section1>


<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";

color:blue'>Jack Tzu-Han Hung will present his research seminar/general exam on

Wednesday January 20 at <o:p></o:p></span></p>


<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";

color:blue'>9AM in room 402.  The members of his committee are David

August (advisor), Doug Clark, and Margaret Martonosi.  <o:p></o:p></span></p>


<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";

color:blue'>Everyone is invited to attend his talk, and those faculty wishing

to remain for the oral exam are welcome to do <o:p></o:p></span></p>


<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";

color:blue'>so.  His abstract and reading list follow below.<o:p></o:p></span></p>


<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";

color:blue'>--------------------------------------------------------------<o:p></o:p></span></p>


<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Arial","sans-serif";

color:blue'><o:p> </o:p></span></p>


<div>


<p class=MsoNormal> (Abstract)<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>Recently, multicore systems have emerged as the dominant<br>

architecture in computing. Unfortunately, sequential programs<br>

do not benefit much from the new architecture. To<br>

keep up to the decades-old performance growth trend, some<br>

propose new programming languages or extensions for explicitly<br>

parallel programming [9, 2, 5, 3, 10, 8]. This approach<br>

leaves great burden to the programmers, including<br>

code rewriting effort and concerns for correctness and performance.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>An alternative is to use compilation techniques to extract<br>

the implicit parallelism from traditional, sequential<br>

programs. Most works focus on parallelizing the hottest<br>

loop in the program [12, 6, 4]; a few previous works attempt<br>

to exploit hybrid parallelism from several independent<br>

loops [11]. Hence, without considering the loop nest<br>

as a whole, these techniques leave a significant amount of<br>

parallelism unexploited.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>This work aims to extract various types of parallelism<br>

existing in a loop nest, using a dynamic programmingbased<br>

framework. Specifically, with a bottom-up traversal<br>

of the loop nest, every loop is examined to see if data or<br>

pipeline parallelism can be extracted and is assessed to see<br>

if the parallelization strategy is beneficial. The framework<br>

produces the final parallelized code that provides the optimal<br>

combination of the parallelization techniques. With<br>

manual parallelization, we experiment with six benchmarks,<br>

selected from different domains, including streaming applications<br>

(StreamIt benchmark) and data-intensive<br>

applications (SPECfp and MineBench). Our initial results<br>

show a (geo-mean) speedup of 3.35x (on an 8-core machine),<br>

while prior approaches (i.e., parallelizing one single<br>

loop or independent loops) only produce a speedup of 2.85x.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>(Reading List)<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[Papers]<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[1] J. R. B. Davies. Parallel loop constructs for

multiprocessors.<br>

Master’s thesis, Department of Computer Science, University<br>

of Illinois, Urbana, IL, May 1981.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[2] M. I. Gordon, W. Thies, and S. Amarasinghe. Exploiting<br>

coarse-grained task, data, and pipeline parallelism in stream<br>

programs. In ASPLOS-XII: Proceedings of the 12th international<br>

conference on Architectural support for programming<br>

languages and operating systems, pages 151–162, New York,<br>

NY, USA, 2006. ACM.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[3] J. Gummaraju, J. Coburn, Y. Turner, and M. Rosenblum.<br>

Streamware: programming general-purpose multicore processors<br>

using streams. In ASPLOS XIII: Proceedings of the 13th<br>

international conference on Architectural support for programming<br>

languages and operating systems, pages 297–307,<br>

New York, NY, USA, 2008. ACM.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[4] J. Huang, A. Raman, Y. Zhang, T. B. Jablin, T.-H. Hung,

and<br>

D. I. August. Decoupled software pipelining creates parallelization<br>

opportunities. In Proceedings of the 2010 International<br>

Symposium on Code Generation and Optimization,<br>

April 2010.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[5] M. Kudlur and S. Mahlke. Orchestrating the execution of<br>

stream programs on multicore platforms. In PLDI ’08: Proceedings<br>

of the 2008 ACM SIGPLAN conference on Programming<br>

language design and implementation, pages 114–124,<br>

New York, NY, USA, 2008. ACM.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[6] G. Ottoni, R. Rangan, A. Stoler, and D. I. August.

Automatic<br>

thread extraction with decoupled software pipelining. In Proceedings<br>

of the 38th Annual IEEE/ACM International Symposium<br>

on Microarchitecture, pages 105–116, November 2005.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[7] E. Raman, G. Ottoni, A. Raman, M. Bridges, and D. I.

August.<br>

Parallel-stage decoupled software pipelining. In Proceedings<br>

of the 2008 International Symposium on Code Generation and<br>

Optimization, April 2008.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[8] W. Thies, V. Chandrasekhar, and S. Amarasinghe. A

practical<br>

approach to exploiting coarse-grained pipeline parallelism<br>

in C programs. In Proceedings of the 40th Annual<br>

IEEE/ACM International Symposium on Microarchitecture,<br>

December 2007.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[9] W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt:<br>

A language for streaming applications. In Proceedings of<br>

the 12th International Conference on Compiler Construction,<br>

2002.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[10] A. Udupa, R. Govindarajan, and M. J. Thazhuthaveetil.

Synergistic<br>

execution of stream programs on multicores with accelerators.<br>

In LCTES ’09: Proceedings of the 2009 ACM<br>

SIGPLAN/SIGBED conference on Languages, compilers, and<br>

tools for embedded systems, pages 99–108, New York, NY,<br>

USA, 2009. ACM.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[11] H. Zhong, S. A. Lieberman, and S. A. Mahlke. Extending<br>

multicore architectures to exploit hybrid parallelism in singlethread<br>

applications. 2007.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[12] H. Zhong, M. Mehrara, S. Lieberman, and S. Mahlke.

Uncovering<br>

hidden loop level parallelism in sequential applications.<br>

In Proc. of the 14th International Symposium on High-<br>

Performance Computer Architecture, 2008.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[13] L. Lamport. The parallel execution of DO loops. In <br>

Communications of the ACM, 1974.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[14] J. Allen and K. Kennedy. Automatic loop interchange. <br>

In Proc. of the 1984 SIGPLAN symposium on Compiler construction, <br>

1984.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[15] W. Lee, R. Barua, M. Frank, D. Srikrishna, J. Babb, V.

Sarkar, <br>

and S. Amarasinghe. Space-time scheduling of instruction-level <br>

parallelism on a raw machine. In Proc. of the eighth international <br>

conference on Architectural support for programming languages and <br>

operating systems, 1998.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[16] A. Lim and M. Lam. Maximizing parallelism and

minimizing <br>

synchronization with affine transforms. In the Proc. of the 24th <br>

ACM SIGPLAN-SIGACT symposium on Principles of programming languages, <br>

1997.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[17] A. Lim, G. Cheong, and M. Lam. An affine partitioning

algorithm <br>

to maximize parallelism and minimize communication. In the Proc. of <br>

the 13th international conference on Supercomputing, 1999.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[Text Books]<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[18] Modern Compiler Implementation in ML, by A. W. Appel,

Cambridge <br>

University Press, 1998.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


<div>


<p class=MsoNormal>[19] Compilers: Principles, Techniques, and Tools (2nd

Edition) by <br>

Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, <br>

Addison Wesley, 2006.<o:p></o:p></p>


</div>


<div>


<p class=MsoNormal> <o:p></o:p></p>


</div>


</div>


</body>


</html>