[talks] Andrew Or General Exam Presentation Wednesday, May 16, 2018 at 10:00 am - CS302

14 May 2018

Title: Quality-aware cluster scheduling

Abstract:

Modern data analytics is increasingly concerned with quick response times. With the recent
growth of data far exceeding the rate at which hardware has advanced, many applications have
begun to explore options that return intermediate results early. For example, approximate query
processing systems begin returning inexact answers before having processed the entire input
data, and machine learning practitioners often stop the training process early once they are
sufficiently satisfied with the training error.

However, existing cluster schedulers are not well equipped for these kinds of applications. In
particular, state-of-the-art scheduling solutions are primarily focused on resource fairness when
making decisions, treating the applications running on the cluster as black boxes. This
approach, though widely popular, often wastes a significant amount of resources on applications
that have been running for a long time and are no longer making much progress due to
diminishing marginal utility.

This talk presents an alternative scheduling approach that makes decisions based on
application utility. The intuition behind this approach is that prioritizing newly started
applications, which have not yet reached the point of diminishing marginal utility, can lead
to cluster-wide utility improvement. We have implemented such a scheduler in Spark and
will discuss its effectiveness in the context of approximate query processing and machine
learning.

Barbara A. Mooring
Interim Graduate Coordinator
Computer Science Department
Princeton University