Adrian Soviani will present his preFPO on Monday, May 17 at 4PM in Room 402. The members of his committee are: J.P. Singh, advisor; Kai Li and David August, readers; Brian Kernighan and Ken Steiglitz, nonreaders. Everyone is invited to attend his talk. His abstract follows below.

---------------------------------------------

Building scalable parallel applications remains a difficult task due to implementation complexity, limited performance portability, and the difficulty of understanding hidden costs and bottlenecks. Specific machine architectures and software layers further influence application efficiency and performance transparency, making optimization a time-consuming burden. The parallel programming model has a great impact on addressing these issues and reaching a good tradeoff between implementation effort and application efficiency.

This thesis presents a hybrid SPMD - Coarse Grain Dataflow programming model (CGD) that describes data and task parallelism at a high level. CGD applications are specified as dependencies between computation modules and data distributions, while communication and synchronization are added automatically and optimized for specific architectures. We claim the CGD model can make application development and design space exploration simpler compared to message passing, while providing similar or better performance.

Results on the 128-CPU SGI Altix 4700 show our optimized CGD FT is 27% faster than the original NPB 2.3 MPI implementation, the optimized CGD stencil has a 41% advantage over handwritten MPI, and the CGD Barnes-Hut 1M-particle benchmark is 15% faster than the pthreads Splash2 implementation.

CGD takes advantage of its dataflow semantics and explicit distribution rules to automatically schedule computations and to insert, aggregate, and overlap communication, simplifying application development and optimization.
For example, the CGD NPB FT implementation requires 85 lines of dataflow and 700 lines of sequential C++ code, while the original MPI implementation has 1260 lines of code and exhibits poorer performance portability.