Yushan Su will present her FPO "Making Neural Network Models More Efficient" on Wednesday, May 10, 2023 at 10am in Friend 202. 

The members of her committee are as follows: Examiners: Kai Li (adviser), Wyatt Lloyd, and Karthik Narasimhan; Readers: Ravi Netravali and Olga Troyanskaya.

Please see the title and abstract below. All are welcome to attend.

Complex machine learning tasks typically require large neural network models. However, training and inference with such models require substantial compute power and large memory footprints, and incur significant cost. My thesis studies methods to make neural networks efficient at a relatively low cost.

First, we explore how to utilize CPU servers for training and inference. CPU servers are more readily available, have larger memories, and cost less than GPUs or other hardware accelerators; however, they are much less efficient for training and inference. My thesis studies how to design efficient software kernels for sparse neural networks, allowing unstructured pruning to speed up training and inference with minimal accuracy loss. Our evaluation shows that our sparse kernels achieve 6.4x-20.0x speedups at different sparsities over the commonly used Intel MKL sparse library.
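For readers unfamiliar with the setup, the following is a minimal NumPy/SciPy sketch of the general idea behind unstructured pruning and sparse CPU inference: small-magnitude weights are zeroed out, the resulting matrix is stored in a compressed sparse format, and a sparse kernel performs the multiplication. The pruning function, sparsity level, and use of SciPy's generic CSR kernel are illustrative assumptions, not the optimized kernels developed in the thesis.

```python
import numpy as np
from scipy.sparse import csr_matrix

def prune_unstructured(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights until the target sparsity is reached.
    (Illustrative only; not the thesis's pruning procedure.)"""
    k = int(weights.size * sparsity)
    threshold = np.partition(np.abs(weights).ravel(), k)[k]
    mask = np.abs(weights) >= threshold
    return weights * mask

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)  # dense layer weights
x = rng.standard_normal((1024, 64)).astype(np.float32)    # a batch of activations

# After unstructured pruning, most entries are zero, so the weights can be
# stored in a compressed sparse row (CSR) format and multiplied with a sparse kernel.
W_pruned = prune_unstructured(W, sparsity=0.9)
W_sparse = csr_matrix(W_pruned)

y_dense = W_pruned @ x   # dense matmul (baseline)
y_sparse = W_sparse @ x  # sparse matmul (what an optimized CPU kernel accelerates)

assert np.allclose(y_dense, y_sparse, atol=1e-4)
```

At high sparsity the sparse multiplication touches only the nonzero weights, which is where purpose-built CPU kernels can recover much of the efficiency lost to pruning-unfriendly dense libraries.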

Second, we study how to achieve high-throughput inference for large models. We explore ways to combine data multiplexing with pruning and find that combining the two approaches can achieve better throughput than using either approach alone for a given accuracy loss. We then propose an automatic method to find parameters for such combinations that maximize inference throughput given an accuracy loss budget. We show that the proposed combination can improve Transformer model throughput by 7.5x-29.5x at different accuracy thresholds, higher than using either approach alone.
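As a rough intuition for data multiplexing, the toy PyTorch sketch below mixes several inputs into a single representation, runs the backbone once, and then demultiplexes the shared output back into per-input predictions. The module names, the simple linear mixing, and the backbone are illustrative assumptions; the actual multiplexing/demultiplexing architecture, its combination with pruning, and the automatic parameter search are not shown.

```python
import torch
import torch.nn as nn

class MultiplexedModel(nn.Module):
    """Toy illustration of data multiplexing: N inputs are mixed into one
    representation, processed by the backbone in a single forward pass, then
    demultiplexed into N outputs. Illustrative only."""
    def __init__(self, backbone, hidden_dim, num_instances):
        super().__init__()
        self.backbone = backbone
        self.num_instances = num_instances
        # One learned projection per multiplexed instance, for mixing and unmixing.
        self.mux = nn.ModuleList(nn.Linear(hidden_dim, hidden_dim) for _ in range(num_instances))
        self.demux = nn.ModuleList(nn.Linear(hidden_dim, hidden_dim) for _ in range(num_instances))

    def forward(self, xs):
        # xs: list of num_instances tensors, each of shape (batch, hidden_dim)
        mixed = sum(proj(x) for proj, x in zip(self.mux, xs)) / self.num_instances
        shared = self.backbone(mixed)  # one backbone pass serves all N inputs
        return [proj(shared) for proj in self.demux]

backbone = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))
model = MultiplexedModel(backbone, hidden_dim=256, num_instances=4)
xs = [torch.randn(8, 256) for _ in range(4)]
ys = model(xs)  # 4 outputs from a single backbone pass
print([y.shape for y in ys])
```

Because one backbone pass now serves several inputs, throughput rises; pruning the backbone shrinks the cost of that shared pass further, which is the interaction the thesis's automatic parameter search exploits under an accuracy budget.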