CS Colloquium Speaker

Speaker: Zhen Dong, University of California, Berkeley
Date: Tuesday, February 25
Time: 12:30pm EST
Location: CS 105
Host: Kai Li
Event page: https://www.cs.princeton.edu/events/make-ai-more-accessible-and-run-faster
Register for live-stream online here: https://princeton.zoom.us/webinar/register/WN_yrwYu8XfT2-UxV-2RVKGvg

Title: Make AI More Accessible and Run Faster

Abstract: LLMs and diffusion models have achieved great success in recent years. However, many AI models, particularly those with state-of-the-art performance, incur high computational cost and memory footprints. This impedes the development of pervasive AI in scenarios lacking sufficient computational resources (e.g., IoT devices, lunar rovers), requiring ultra-fast inference (e.g., AI4Science), or demanding real-time interaction under constrained computation (e.g., AR/VR, Embodied AI). Model compression (quantization, pruning, distillation, etc.) and hardware-software co-design are promising approaches to achieving efficient AI, which makes AI more accessible and faster to run.

In this talk, I will first introduce my work on 1) mixed-precision quantization based on Hessian analysis (HAWQ v1/v2, ZeroQ, Q-BERT) and 2) hardware-software co-design (HAWQ v3, CoDeNet, HAO). Then I will talk about my ongoing and future work in the era of LLMs and GenAI, including SqueezeLLM, Q-Diffusion, efficient AI agent systems, advanced CoT distillation, efficient deep thinking for OpenAI o1 and DeepSeek-R1, etc. My research vision is that efficient AI is becoming indispensable both at the edge, where increasingly powerful sensors with diverse modalities generate huge volumes of local data, and in the cloud, where reducing costs is essential to bridge the gap between inference scaling laws and Moore's law for hardware.

Bio: Dr. Zhen Dong is currently a postdoc at UC Berkeley. He obtained his Ph.D. at Berkeley AI Research, advised by Prof. Kurt Keutzer, and received his B.E. from Peking University before coming to Berkeley. His research focuses on efficient AI, model compression, hardware-software co-design, and AI systems. He has received the Berkeley University Fellowship and the SenseTime Scholarship, and has published over 10 papers as first or co-first author at top AI conferences. He won the Best Paper Award at the AAAI Practical-DL Workshop and is also a winner of the DAC 2024 PhD Forum and the CVPR 2024 Doctoral Consortium.