
Colin Wang will present his master's thesis, "Evaluating Multimodal Models for Chart Understanding," on April 21, 2025 at 2pm in the AI Lab (41 William Street) Conference Room (Room 274). The members of his committee are as follows: Danqi Chen (Adviser) and Sanjeev Arora (Reader). All are welcome to attend. Please see the title and abstract below.

Title: Evaluating Multimodal Models for Chart Understanding

Abstract: Chart understanding and reasoning play a pivotal role when applying Multimodal Large Language Models (MLLMs) to real-world tasks such as analyzing scientific papers or financial reports. In this work, we propose CharXiv, a comprehensive evaluation suite of 2,323 natural, challenging, and diverse charts from arXiv papers. CharXiv includes two types of questions: 1) descriptive questions that examine basic chart elements and 2) reasoning questions that require synthesizing information across complex visual elements in the chart. To ensure quality, all charts and questions are handpicked, curated, and verified by human experts. Our results reveal a substantial, previously underestimated gap between the reasoning skills of the strongest proprietary model and the strongest open-source model. All models lag far behind human performance of 80.5%, underscoring weaknesses in the chart understanding capabilities of existing MLLMs. Since its release, CharXiv has gained significant real-world adoption. It has been used internally by organizations such as OpenAI and Anthropic, and serves as an official benchmark for models including QwenVL, InternVL, and the Doubao series. CharXiv has also become a standard testbed for academic research in visual reasoning and has inspired the development of more challenging and visually diverse evaluations in the AI community.
In this talk, I will present the design and curation process of CharXiv, share key findings and failure modes from model evaluations, and reflect on CharXiv's ongoing impact within the broader research community.