Please note: Ryan's FPO will also be available via Zoom.  Details below.

https://princeton.zoom.us/j/96477756137?pwd=a0pPdjA1TkE2MzNHRHNkQkswWFBtdz09
Meeting ID: 964 7775 6137
Passcode: 277734


Ryan Amos will present his FPO "Consumer Protection on the Web with Longitudinal Web Crawls and Analysis" on March 17, 2022, at 3pm in CS 402.
The members of his committee are as follows:
Examiners: Edward Felten (adviser), Prateek Mittal (adviser), and Arvind Narayanan;
Readers: Jonathan Mayer and Joe Calandrino (Federal Trade Commission)

A copy of his thesis is available upon request.  Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend his talk.

Abstract:
The World Wide Web has brought with it new consumer protection hazards, such as deceptive reviews and online tracking. While many academics have studied consumer protection on the web at specific points in time, we approach this problem from a longitudinal perspective, exploring how consumers' rights to privacy and to be informed have been impacted by the web. Our work highlights the key role that longitudinal analysis and longitudinal data collection (data collected over repeated, time-spaced passes) play in the study of consumer protection issues.

We investigate consumer protection issues on the web through longitudinal studies in two landscapes: website privacy policies and reviews on Yelp. We approach both problems by making automated, repeated visits to the websites of interest to collect large-scale datasets. In our study of privacy policies, we aggregate the Internet Archive's crawls to perform longitudinal collection, and in our online reviews study, we crawl the data ourselves. We collected 1M privacy policies spanning 22 years and 12.5M reviews over 11 months.

We use our data to study the evolution of privacy policies, raising concerns about the rights to privacy and to be informed. We find gaps in the disclosure of privacy-related practices. We show that readability has declined over the long term, with policies doubling in length and becoming more complex. We show disparities between website-reported and independently observed tracking. In our study of online reviews, we raise concerns about the right to be informed. We present the first study of "reclassification," wherein a platform changes its filtering decision for a review. We find that reviews routinely move between Yelp's two main classifier classes ("Recommended" and "Not Recommended"), with up to five reclassifications of a single review. We identify demographic disparities in review prevalence and filtering decisions.

By showing phenomena that cannot be studied without longitudinal data collection and analysis, we emphasize the importance of longitudinal study for consumer protection issues online. Through our software and data releases, our work helps lay the groundwork for future research on these issues.