Andrew Or
General Exam Presentation
Wednesday, May 16, 2018 at 10:00 am - CS302

Title: Quality-aware cluster scheduling

Abstract: Modern data analytics is increasingly concerned with quick response times. With the recent growth of data far exceeding the rate at which hardware has advanced, many applications have begun to explore options that return intermediate results early. For example, approximate query processing systems begin returning inexact answers before having processed the entire input data, and machine learning practitioners often stop the training process early once they are sufficiently satisfied with the training error.

However, existing cluster schedulers are not well equipped for these kinds of applications. In particular, state-of-the-art scheduling solutions are primarily focused on resource fairness when making decisions, treating the applications running on the cluster as black boxes. This approach, though widely popular, often wastes a significant amount of resources on applications that have been running for a long time and are no longer making much progress due to diminishing marginal utility.

This talk presents an alternative scheduling approach that makes decisions based on application utility. The intuition behind this approach is that prioritizing fresh applications can lead to cluster-wide utility improvement. We have implemented such a scheduler in Spark and will discuss its effectiveness in the context of approximate query processing and machine learning.

Barbara A. Mooring
Interim Graduate Coordinator
Computer Science Department
Princeton University