Madelyne Xiao will present her General Exam, "Machine Learning Approaches for Automated Misinformation Detection," on Tuesday, May 16, 2023 at 3:00 PM in Sherrerd 008 and via Zoom.

Zoom link: https://princeton.zoom.us/my/madelyne.x 

Committee Members: Jonathan Mayer (advisor), Arvind Narayanan, Andrew Guess

Abstract:
The spread of misinformation, disinformation, and false news on social media platforms in recent years has triggered a corresponding proliferation of automated misinformation detection methods. We present a systematic review and analysis of existing machine learning-driven approaches to the automated detection of misinformation. Across 248 well-cited papers in the field, we develop a taxonomy of five different information “scopes” for which numerous automated methods have been developed: claims, news articles, social media posts, users/authors, and whole websites. 

Having established this framework, we identify errors in ML-driven misinformation detection methods, with a focus on errors in corpus curation and model development. In particular, we find that methods relying solely on textual features suffer from a lack of contextual information, such as reader response and source reputation. As a result, these methods are especially susceptible to text-specific dependencies (e.g., narrowly defined subject domains and stylistic idiosyncrasies) that interfere with actual signals of misinformative content. Meanwhile, methods that consider extra-textual features (user behavior, social media platform-specific UI elements, knowledge networks) suffer from errors associated with the "curse of dimensionality": poor feature selection and mismanagement of data proxies. We support these findings with three replication studies, whose results highlight serious issues of data availability, reproducibility, and method generalizability within the existing misinformation detection literature.

In view of the results of our literature analysis and replication studies, we conclude that existing ML-based methods for automated misinformation detection fall far short of what might reasonably be expected of methods deployable in a real-world setting. Accordingly, we make recommendations for proactive misinformation interventions beyond automated ML detection methods.

Reading List:
https://docs.google.com/document/d/1J80hwND6Jiw29ohiuEizjT7AchNNOtZFLH2ySTE_p1U/edit 

Everyone is invited to attend the talk, and faculty wishing to remain for the oral exam that follows are welcome to do so.


Louis Riehl
Graduate Administrator
Computer Science Department, CS213
Princeton University
(609) 258-8014