Anirudh Ajith will present his MSE talk "Towards Optimizing In-Context Learning and Ensuring Data Transparency in Large Language Models" on Friday, April 26, 2024, at 1pm in CS 402.

Adviser: Karthik Narasimhan
Reader: Danqi Chen

All are welcome to attend.

Abstract:

Large language models have emerged as a significant development in the fields of natural language processing and artificial intelligence. Trained primarily on large corpora of text scraped from webpages and books, these models show emergent abilities that enable strong (and even human-level) performance on a wide range of tasks.


These impressive abilities are typically elicited using "in-context learning" (ICL) prompts that include instructions, annotated demonstrations and unsolved test examples. Recent work has shown that LLM performance remains sensitive to the precise details of the demonstrations, instructions and even formatting used in these prompts. While the optimal selection of demonstrations has been explored, instruction choice remains understudied, with existing analyses confined to narrow subsets of models and tasks, which limits the generalizability of their insights. The first part of this thesis introduces the InstructEval suite for the systematic evaluation of instruction selection methods for ICL. We use InstructEval, which includes 13 open-source LLMs of varying scales from four model families and covers nine tasks across three categories, to evaluate seven popular instruction selection methods over five metrics relevant to ICL. Our experiments reveal that curated manually-written instructions, or simple instructions without any task-specific descriptions, often elicit better overall ICL performance than automatic instruction-induction methods, pointing to a lack of generalizability among the latter.
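As a concrete illustration, the sketch below shows how an ICL prompt of the kind InstructEval evaluates can be assembled from the three components described above (a minimal Python sketch with made-up instruction text and examples, not code from the thesis):

    def build_icl_prompt(instruction, demonstrations, test_input):
        """Join an instruction, annotated demonstrations, and an unsolved test example."""
        parts = [instruction.strip()] if instruction else []
        for x, y in demonstrations:
            parts.append(f"Input: {x}\nOutput: {y}")
        parts.append(f"Input: {test_input}\nOutput:")
        return "\n\n".join(parts)

    prompt = build_icl_prompt(
        instruction="Classify the sentiment of each review as positive or negative.",
        demonstrations=[("A delightful film.", "positive"),
                        ("Two hours I will never get back.", "negative")],
        test_input="The plot was thin but the acting was superb.",
    )
    # The model's completion after the final "Output:" is read off as its prediction;
    # instruction selection methods differ only in how the instruction string is chosen.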


As the scale of data used to train contemporary LLMs has grown into the trillions of tokens, model developers have become increasingly reluctant to disclose their data sources. Simultaneously, there have been growing concerns that academic benchmarks used to evaluate LLMs may have been compromised by the leakage of test data into pretraining corpora. Concerns also exist that these LLMs may be trained on sensitive personal information or copyrighted content that could be problematic to generate post-deployment. In the second part of this thesis, we study the pretraining data detection problem: given a piece of text and black-box access to an LLM, can we determine whether the model was trained on that text? We introduce the WikiMIA benchmark to facilitate this study and propose a new detection method called Min-K% Prob that, unlike prior work, does not require reference models, additional training, or any knowledge of a model's pretraining data distribution. In addition to showing that Min-K% Prob outperforms baselines on WikiMIA, we also demonstrate its utility for detecting the leakage of benchmark data and copyrighted content.
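For intuition, here is a rough sketch of how a Min-K% Prob style score could be computed with the Hugging Face transformers library. This is an illustrative approximation under assumed defaults (a placeholder "gpt2" model and k = 20%), not the released implementation:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def min_k_percent_prob(text, model, tokenizer, k=0.2):
        """Average log-probability of the k% least likely tokens in `text`."""
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits
        # Log-probability the model assigns to each actual next token.
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        token_log_probs = log_probs.gather(1, ids[0, 1:].unsqueeze(-1)).squeeze(-1)
        # Average only the k% lowest-probability tokens.
        n_keep = max(1, int(k * token_log_probs.numel()))
        lowest = torch.topk(token_log_probs, n_keep, largest=False).values
        return lowest.mean().item()

    tokenizer = AutoTokenizer.from_pretrained("gpt2")           # placeholder model
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    score = min_k_percent_prob("Some candidate passage ...", model, tokenizer)
    # Higher (less negative) scores suggest the passage is more likely to have been
    # seen during pretraining; a decision threshold would be calibrated on WikiMIA.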