Andrea Wynn will present her MSE talk “Learning Human-Like Representations to Enable Learning Human Values” today (April 16) at 10:00 AM in CS 301.


Advisor: Prof. Thomas Griffiths; Reader: Prof. Benjamin Eysenbach


Abstract:

How can we build AI systems that learn any set of individual human values both quickly and safely, without causing harm or violating societal standards for acceptable behavior during the learning process? We argue that representational alignment between humans and AI agents facilitates learning human values quickly and safely, an important step towards value alignment in AI. Training AI systems to learn human-like representations of the world has many known benefits, including improved generalization, robustness to domain shifts, and few-shot learning performance. We propose that this kind of representational alignment between machine learning (ML) models and humans can also support safely learning and exploring human values. We focus on ethics as one aspect of human values and train ML agents using a variety of methods in a multi-armed bandit setting, where rewards reflect human value judgments over the chosen action. We use a synthetic experiment to demonstrate that agents' representational alignment with the environment bounds their safe learning performance. We then repeat this procedure in a realistic setting, using textual action descriptions and similarity judgments collected from humans and a variety of language models, to show that the results generalize and are model-agnostic when grounded in an ethically relevant context.


CS Grad Calendar:

https://calendar.google.com/calendar/event?action=TEMPLATE&tmeid=MDQybnM4NWg0NXJ2bW50cjUyNWRsZ3E4OGQgYWNnMDc5YmxzbzRtczNza2tmZThwa2lyb2dAZw&tmsrc=acg079blso4ms3skkfe8pkirog%40group.calendar.google.com