Jiatong Yu will present her MSE thesis "On the Impossibility of Retrain Equivalence in Machine Unlearning" on Friday, April 24th, 2026 in CS 301 at 10:00am.

Advisor: Sanjeev Arora

Reader: Kai Li

All are welcome to attend.

Abstract:

{\em Machine unlearning} seeks to selectively remove the ``influence'' of specific training data on a model's outputs. The ideal goal is {\em Retrain Equivalence}---behavior identical to a model trained from scratch on only the retained data. This goal was originally formulated for models trained on {\em i.i.d.}\ data batches, but modern pipelines often involve multi-stage training, with each stage having a distinct data distribution and objective. Examples include LLM finetuning for alignment, reasoning ability, etc. Building on prior work that has shown retrain-equivalent behavior can be ill-defined due to forgeable datasets in mini-batch training, we study a complementary source of fragility that arises in multi-stage training. In an overparameterized linear regression model, we prove an exponential lower bound on {\em path-dependent divergence} under a family of unlearning objectives, unifying purely local gradient ascent and weakly local algorithms with retain-set regularization. Models trained on the same stages in different orders can be driven exponentially far apart during unlearning, and we show that this pointwise divergence implies violation of the distributional $(\varepsilon,\delta)$-certified unlearning definition. We then empirically measure the same kind of path dependence in LLM post-training across Llama and Qwen models (1B--14B) with gradient ascent, NPO, and SimNPO algorithms. Models finetuned via different orderings of identical training stages diverge in behavior during unlearning, with the degradation in GSM8K accuracy after unlearning varying by over $20\%$ across paths. We also observe that some learning paths consistently produce models that unlearn slowly while preserving higher retained utility---a phenomenon we term the \emph{recency effect}---and that the fate of probability mass during unlearning (e.g., paraphrasing vs.\ alternative concepts) is likewise path-dependent.
Taken together, these results quantitatively demonstrate that local, path-oblivious unlearning in multi-stage pipelines can be highly sensitive to training history, adding a multi-stage perspective to the impossibility results and motivating the search for new unlearning desiderata and evaluations.