Jihoon Chung will present his General Exam "Enabling Detailed Action Recognition Evaluation Through Video Dataset Augmentation" on Wednesday, May 17, 2023 at 10:15 AM in CS 302.
Committee Members: Olga Russakovsky (advisor), Jia Deng, Felix Heide
Abstract:
Human action recognition is the task of automatically classifying a video according to the action of the person depicted in the video. Despite the overall high accuracy on current benchmarks, it is well-known in the video understanding community that human action recognition models suffer from background bias, i.e., over-relying on scene cues in making their predictions. However, it is difficult to quantify this effect using existing evaluation frameworks. We introduce the Human-centric Analysis Toolkit (HAT), which enables the evaluation of learned background bias without the need for new manual video annotation. It does so by automatically generating synthetically manipulated videos and leveraging the recent advances in image segmentation and video inpainting. Using HAT we perform an extensive analysis of 74 action recognition models trained on the Kinetics dataset. We confirm that all these models focus more on the scene background than on the human motion; further, we demonstrate that certain model design decisions (such as training with fewer frames per video or using dense as opposed to uniform temporal sampling) appear to worsen the background bias. We open-source HAT to enable the community to design more robust and generalizable human action recognition models.
Everyone is invited to attend the talk, and those faculty wishing to remain for the oral exam following are welcome to do so.