Samyak Gupta will present his General Exam, "Attacking Federated Natural Language Modeling and Building Tools for Optimizing Cognitive Models," on Friday, April 15, 2022 at 1:00 PM in CS 302 and via Zoom.

Zoom link: https://princeton.zoom.us/my/samyakg

Committee Members: Kai Li (advisor), Danqi Chen, Jon Cohen

Abstract:

In this talk, I will present two distinct areas of my work.

In the first half of my talk, I describe an application related to privacy in federated learning. Federated learning allows distributed users to collaboratively train a model while keeping each user’s data private. As a result, it is actively being considered for privacy-sensitive applications such as virtual mobile keyboards and the analysis of electronic health records in hospitals. A growing body of recent work suggests that a clever eavesdropper can effectively reconstruct image data from the gradients transmitted during federated learning. However, little progress has been made on the recovery of text data. Moreover, the existing literature does not establish any metrics that can be used to quantify the severity of text data leakage.

I present the first attack that can reconstruct sentences of private text data during the training of a neural language model in federated learning. Our proposed method builds on beam search for text generation: it recovers a set of words from the gradients and leverages the latent memorization of text by large language models. We experiment with a GPT-2 model on several large datasets and demonstrate that the attack can successfully recover private text data during training.
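
As a rough illustration of the first step mentioned above (recovering a set of words from gradients), here is a toy sketch in Python. It is not the attack presented in the talk: the one-layer model, the squared-error loss, the tiny vocabulary, and the function names embedding_gradient and recover_token_set are all assumptions made for this example. It only shows the general idea that, in a model with a separate input embedding table, rows of the embedding gradient are non-zero only for tokens that appeared in the private batch, so an eavesdropper who sees the gradient can read off the set of words. Recovering the word order (the beam search step) is not shown.

```python
import random


def embedding_gradient(token_ids, embeddings, target):
    """Gradient of 0.5 * ||sum_t E[t] - target||^2 with respect to E.

    Toy stand-in for a language model (an assumption for this sketch):
    the "hidden state" is just the sum of the input token embeddings.
    """
    dim = len(target)
    # Forward pass: sum the embedding rows of the tokens in the private batch.
    hidden = [sum(embeddings[t][j] for t in token_ids) for j in range(dim)]
    residual = [hidden[j] - target[j] for j in range(dim)]
    # Backward pass: only rows that were looked up receive a gradient.
    grad = [[0.0] * dim for _ in embeddings]
    for t in token_ids:
        for j in range(dim):
            grad[t][j] += residual[j]
    return grad


def recover_token_set(grad, tol=1e-12):
    """Return the ids whose embedding-gradient row is non-zero."""
    return {i for i, row in enumerate(grad) if any(abs(g) > tol for g in row)}


rng = random.Random(0)
vocab = ["the", "patient", "was", "given", "aspirin", "keyboard", "model"]
dim = 4
embeddings = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in vocab]
target = [rng.uniform(-1, 1) for _ in range(dim)]

# The private text held by one user (hypothetical example sentence).
private_sentence = ["the", "patient", "was", "given", "aspirin"]
token_ids = [vocab.index(w) for w in private_sentence]

# What the eavesdropper observes, and what can be read off from it.
grad = embedding_gradient(token_ids, embeddings, target)
recovered = {vocab[i] for i in recover_token_set(grad)}
print(sorted(recovered))
```

In this toy setting the recovered set matches the words of the private sentence, but without their order; words that never appeared in the batch ("keyboard", "model") have all-zero gradient rows.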

Reading List:

https://docs.google.com/document/d/1t6BU6TflwycaOe7XECeUk7Tln19Aui9JqVOLhxk6uZA/edit?usp=sharing

Everyone is invited to attend the talk, and faculty who wish to remain for the oral exam that follows are welcome to do so.

Louis Riehl
Graduate Administrator
Computer Science Department, CS213
Princeton University
(609) 258-8014