Princeton AI Alignment and Safety Seminar (PASS)

Catastrophic misalignment of large language models

Paul Christiano, Alignment Research Center

Tuesday, March 19
2:00–3:00 pm

The talk will be livestreamed.

Abstract: I’ll discuss two possible paths by which AI systems could be so misaligned that they attempt to deceive and disempower their human operators. I’ll review the current state of evidence about these risks, what we might hope to learn over the next few years, and how we could become confident that the risk is adequately managed.

Bio: I run the Alignment Research Center. I previously ran the language model alignment team at OpenAI, and before that received my PhD from the theory group at UC Berkeley. You may be interested in my writing about alignment, my blog, my academic publications, or fun and games. I am an advisor and board member at METR, an external advisor to the UK AI Safety Institute, and a trustee of Anthropic’s Long-Term Benefit Trust.


Stay informed and receive seminar reminders by joining our mailing list: https://tinyurl.com/pass-mailing

Organized by: Princeton Language and Intelligence