Huihan Li will present her MSE talk "Re-evaluating Conversational Question Answering" on April 21 at 3:30pm via Zoom.

Zoom link: https://princeton.zoom.us/j/92763452837

Huihan's committee is as follows: Danqi Chen (adviser) and Karthik Narasimhan (reader).

All are welcome to attend.

Abstract: 
Conversational question answering aims to provide natural-language answers to users in information-seeking conversations. Existing conversational QA benchmarks compare models against pre-collected human-human conversations, using ground-truth answers provided in the conversational history. It remains unclear whether we can rely on this static evaluation for model development and whether current systems generalize well to real-world human-machine conversations. In this work, we conduct the first large-scale human evaluation of state-of-the-art conversational QA systems, where human evaluators converse with models and judge the correctness of their answers. We find that the distribution of human-machine conversations differs drastically from that of human-human conversations, and that human evaluation and gold-history evaluation disagree on model ranking. We further investigate how to improve automatic evaluations and propose a question rewriting mechanism based on predicted history, which correlates better with human judgments. Finally, we demonstrate that training conversational question answering systems on their own predictions benefits the models under the new automatic evaluation protocol.