Andrea Wynn will present her MSE talk "Learning Human-Like Representations to Enable Learning Human Values" today (April 16) at 10:00 AM in CS 301.

Advisor: Prof. Thomas Griffiths; Reader: Prof. Benjamin Eysenbach

Abstract: How can we build AI systems that can learn any set of individual human values both quickly and safely, avoiding causing harm or violating societal standards for acceptable behavior during the learning process? We argue that representational alignment between humans and AI agents facilitates learning human values quickly and safely, an important step towards value alignment in AI. Making AI systems learn human-like representations of the world has many known benefits, including improving generalization, robustness to domain shifts, and few-shot learning performance. We propose that this kind of representational alignment between machine learning (ML) models and humans can also support safely learning and exploring human values. We focus on ethics as one aspect of human values and train ML agents using a variety of methods in a multi-armed bandit setting, where rewards reflect human value judgments over the chosen action. We use a synthetic experiment to demonstrate that agents' representational alignment with the environment bounds their safe learning performance. We then repeat this procedure in a realistic setting, using textual action descriptions and similarity judgments collected from humans and a variety of language models, to show that the results generalize and are model-agnostic when grounded in an ethically relevant context.

CS Grad Calendar: https://calendar.google.com/calendar/event?action=TEMPLATE&tmeid=MDQybnM4NWg0NXJ2bW50cjUyNWRsZ3E4OGQgYWNnMDc5YmxzbzRtczNza2tmZThwa2lyb2dAZw&tmsrc=acg079blso4ms3skkfe8pkirog%40group.calendar.google.com
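For readers unfamiliar with the setup described in the abstract above, the following is a minimal Python sketch (not the speaker's code) of a multi-armed bandit whose rewards stand in for human value judgments, and an agent that generalizes across arms through a similarity matrix over its own arm representations. The similarity-weighted value estimate, the noise model, and the notion of an "unsafe" pull (choosing an arm whose true value falls below a threshold) are illustrative assumptions, not the authors' method.

# Sketch: representational alignment vs. safe learning in a bandit.
# All parameter choices below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_arms, horizon, unsafe_threshold = 20, 200, 0.3

# Ground-truth "environment" representation of each arm and the values it implies.
env_features = rng.normal(size=(n_arms, 5))
true_values = 1 / (1 + np.exp(-env_features @ rng.normal(size=5)))  # in (0, 1)

def run_agent(alignment: float) -> int:
    """Run one agent whose representation mixes the true features (weight =
    `alignment`) with noise; return the number of unsafe pulls."""
    agent_features = alignment * env_features + (1 - alignment) * rng.normal(size=env_features.shape)
    # Similarity between arms under the agent's own representation.
    sim = np.exp(-np.linalg.norm(agent_features[:, None] - agent_features[None, :], axis=-1))
    reward_sums = np.zeros(n_arms)
    pull_counts = np.zeros(n_arms)
    unsafe = 0
    for _ in range(horizon):
        # Similarity-weighted value estimate lets observations about one arm inform its neighbours.
        est = (sim @ reward_sums + 0.5) / (sim @ pull_counts + 1.0)
        arm = int(np.argmax(est + 0.1 * rng.normal(size=n_arms)))  # noisy greedy choice
        reward = true_values[arm] + 0.05 * rng.normal()
        reward_sums[arm] += reward
        pull_counts[arm] += 1
        unsafe += int(true_values[arm] < unsafe_threshold)
    return unsafe

for alignment in (1.0, 0.5, 0.0):
    print(f"alignment={alignment:.1f}  unsafe pulls={run_agent(alignment)}")

In this toy version, lowering the alignment parameter degrades how well observed rewards transfer between similar arms, which is the intuition behind the abstract's claim that representational alignment with the environment bounds safe learning performance.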
Malinda Huang will present her MSE talk "CONFINE: Conformal Prediction for Interpretable Neural Networks" Friday, April 19 at 10:30 AM in CS 301.

Advisor: Prof. Niraj Jha; Reader: Prof. Olga Troyanskaya

Abstract: Deep neural networks exhibit remarkable performance, yet their black-box nature limits their utility in fields like healthcare where interpretability is crucial. Existing explainability approaches often sacrifice accuracy and lack quantifiable prediction uncertainty. In this study, we introduce Conformal Prediction for Interpretable Neural Networks (CONFINE), a versatile approach that generates prediction sets with statistically robust uncertainty estimates instead of point predictions, to enhance model transparency and reliability. CONFINE not only provides example-based explanations and confidence levels for individual predictions but also boosts accuracy by up to 3.6%. We define a new metric, correct efficiency, to evaluate the proportion of prediction sets that contain exactly the correct label and show that CONFINE achieves correct efficiency of up to 3.26% higher than the original accuracy, matching or exceeding prior methods. Adaptable to any pre-trained classifier, CONFINE has proven effective across tasks from image classification to language understanding, marking a significant advance towards transparent and trustworthy deep learning applications in critical domains.

CS Grad Calendar: https://calendar.google.com/calendar/event?action=TEMPLATE&tmeid=Mzlua2x1YTlwYm12bWU0OXZ2YmVxOGlqNzQgYWNnMDc5YmxzbzRtczNza2tmZThwa2lyb2dAZw&tmsrc=acg079blso4ms3skkfe8pkirog%40group.calendar.google.com
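As background for this abstract, the following is a minimal sketch of standard split-conformal prediction for a classifier, plus the "correct efficiency" metric the abstract defines (the fraction of prediction sets that contain exactly the correct label). It is not CONFINE itself; the classifier, the nonconformity score, the dataset, and the miscoverage level alpha are all assumptions for illustration.

# Sketch: split-conformal prediction sets and correct efficiency.
# Illustrative assumptions: logistic regression, 1 - p(true class) score, alpha = 0.1.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

alpha = 0.1  # assumed miscoverage level (90% target coverage)

X, y = load_digits(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

clf = LogisticRegression(max_iter=2000).fit(X_train, y_train)

# Nonconformity score on the calibration set: 1 - softmax probability of the true class.
cal_probs = clf.predict_proba(X_cal)
cal_scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

# Conformal quantile with the finite-sample correction.
n = len(cal_scores)
q_level = np.ceil((n + 1) * (1 - alpha)) / n
q_hat = np.quantile(cal_scores, q_level, method="higher")

# Prediction set for each test point: every class whose score falls below the quantile.
test_probs = clf.predict_proba(X_test)
pred_sets = (1.0 - test_probs) <= q_hat  # boolean matrix: rows = examples, cols = classes

coverage = pred_sets[np.arange(len(y_test)), y_test].mean()
# Correct efficiency: fraction of prediction sets that are exactly {true label}.
correct_efficiency = np.mean(
    (pred_sets.sum(axis=1) == 1) & pred_sets[np.arange(len(y_test)), y_test]
)
print(f"coverage={coverage:.3f}  correct efficiency={correct_efficiency:.3f}")

The split-conformal guarantee is that the prediction sets cover the true label with probability at least 1 - alpha, regardless of the underlying classifier; correct efficiency then measures how often a set is both valid and as small as possible.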