
Prachi Sinha will present her MSE talk "Augmenting Concept-Labeled Datasets for Interpretability Using Counterfactual Generation" on Monday, April 22 at 12:30 PM in CS 402.

Advisor: Vikram Ramaswamy
Reader: Olga Russakovsky

Abstract: Concept-based interpretability methods use a pre-defined set of human-understandable concepts to explain a model's output. These explanations are obtained by "probing" the model with a concept-labeled dataset and learning how the concepts can be linearly combined to predict the model's outputs. However, current methods have several limitations. Explanations depend heavily on the probe dataset, and they may reflect correlations between concepts and outputs rather than causation. To address these issues, we augment COCO, a concept-labeled dataset, with counterfactual images in which particular semantic concepts have been removed. Using these counterfactual images, we attempt to improve the accuracy of explanations for a scene classification model and to learn the causal relationships between concepts and outputs more directly. We find that explanation accuracy improves in certain cases but is generally still limited by how well the probe dataset aligns with the model, and that counterfactual images reveal problems with assigning weights to concepts when those concepts are strongly correlated with one another.

CS Grad Calendar: https://calendar.google.com/calendar/event?action=TEMPLATE&tmeid=NzljZTNoZm1ibWc1NGU4ZmtodjJkdHRuMmggYWNnMDc5YmxzbzRtczNza2tmZThwa2lyb2dAZw&tmsrc=acg079blso4ms3skkfe8pkirog%40group.calendar.google.com
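For readers unfamiliar with linear concept probes, the following toy sketch illustrates the idea described in the abstract: fitting a linear combination of concept labels to a model's outputs on a probe set, and adding counterfactual examples with one concept removed. It is not the author's method. Everything here is a hypothetical assumption: the "model" (`model_score`, `TRUE_WEIGHTS`) is an exactly linear function of two made-up concepts ("bed" and "lamp") rather than a real scene classifier, the probe is a plain least-squares fit via NumPy's `np.linalg.lstsq` (a real pipeline might use a regularized or logistic fit), and counterfactuals are represented only by their edited concept labels (`remove_concept`), not by actual edited images.

```python
import numpy as np

# Hypothetical stand-in for a scene classifier's score for one class.
# Assumption: the score depends only on which concepts are present, and it
# truly depends on "bed" but not on "lamp".
TRUE_WEIGHTS = np.array([2.0, 0.0])  # concept order: [bed, lamp]
TRUE_BIAS = 0.5


def model_score(concepts: np.ndarray) -> np.ndarray:
    """Toy model output for a batch of binary concept vectors."""
    return concepts @ TRUE_WEIGHTS + TRUE_BIAS


def fit_linear_explanation(concepts: np.ndarray, scores: np.ndarray):
    """Least-squares fit of scores ~ concepts @ w + b; returns (w, b).

    For a rank-deficient design, lstsq returns the minimum-norm solution.
    """
    design = np.hstack([concepts, np.ones((concepts.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return coef[:-1], coef[-1]


def remove_concept(concepts: np.ndarray, k: int) -> np.ndarray:
    """Concept labels of counterfactuals: concept k removed where present."""
    edited = concepts[concepts[:, k] == 1].copy()
    edited[:, k] = 0
    return edited


# Probe set in which bed and lamp always co-occur (perfectly correlated).
probe = np.array([[1, 1]] * 50 + [[0, 0]] * 50, dtype=float)
w_obs, b_obs = fit_linear_explanation(probe, model_score(probe))

# Augment with counterfactuals that remove "bed" from every image containing it.
counterfactual = remove_concept(probe, k=0)
augmented = np.vstack([probe, counterfactual])
w_aug, b_aug = fit_linear_explanation(augmented, model_score(augmented))

print("observational weights:", np.round(w_obs, 3))
print("augmented weights:    ", np.round(w_aug, 3))
```

On the observational probe set, the two concepts cannot be told apart, so the fit splits the credit evenly between them (weights of roughly 1 each). Once the counterfactual rows are added, the weights become identifiable and recover the toy model's true dependence (roughly 2 for bed and 0 for lamp). This clean recovery depends on the toy model being exactly linear in the concepts. The abstract reports that, with a real classifier, the gains are more limited.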