Adaptivity, Privacy, and Efficiency
Explicit regularization (via adding a convex regularizer) is the
mainstay of (distribution-free) statistically efficient learning algorithms. [Kalai,
Vempala 2002], following [Hannan 1957], present an alternative framework
(FTPL, Follow the Perturbed Leader) for sequential prediction, based on randomly perturbing the history of
losses incurred on previous predictions. While these perturbation-based algorithms admit
computationally efficient implementations, their statistical efficiency (vis-à-vis
regularization) is, beyond a few simple settings, not well understood.
Building on recent progress towards this understanding, we present three
compelling cases for the use of perturbation methods in online learning, each
improving upon the state of the art. The theme unifying the three applications is
the use of structured noise for the perturbations, an element not
employed in previous analyses.
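To make the framework concrete, here is a minimal sketch (in Python) of FTPL for prediction with expert advice. It is illustrative only: the exponential perturbation law and the scale parameter eta are assumptions made for this sketch, not the tuned choices from the works cited above.

import numpy as np

def ftpl_expert_advice(loss_matrix, eta, rng=None):
    # Follow the Perturbed Leader for the experts setting: at each round,
    # play the expert whose perturbed cumulative loss so far is smallest.
    rng = np.random.default_rng() if rng is None else rng
    T, n = loss_matrix.shape            # T rounds, n experts
    cumulative = np.zeros(n)            # cumulative losses observed so far
    total_loss = 0.0
    for t in range(T):
        noise = rng.exponential(scale=eta, size=n)    # i.i.d. perturbation
        choice = int(np.argmin(cumulative - noise))   # the perturbed leader
        total_loss += loss_matrix[t, choice]
        cumulative += loss_matrix[t]                  # full-information feedback
    return total_loss

# Usage: 1000 rounds, 10 experts, Bernoulli losses; eta on the order of
# sqrt(T) is a common scaling for such perturbations.
rng = np.random.default_rng(0)
losses = rng.binomial(1, 0.5, size=(1000, 10)).astype(float)
print(ftpl_expert_advice(losses, eta=np.sqrt(1000.0), rng=rng))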
Adapting to the Inherent Structure
For the problem of prediction with expert advice, when the loss vectors are restricted to some unknown d-dimensional subspace (a setting fondly referred to as low-rank experts), we demonstrate that a natural variant of FTPL achieves a regret of d^0.75 √T, crucially independent of the number of experts. This improves on the previous best bound of d √T, achieved by regularization-based schemes due to [Hazan, Koren, Livni, Mansour 2016].
Privacy for Free
We design differentially private algorithms for online linear optimization, in
both the full-information and bandit settings, with optimal √T regret bounds. In the
full-information setting, our results demonstrate that differential
privacy may be ensured for free -- in particular, the regret bounds scale as √T + 1/ε. Notably, this phenomenon
had not been observed previously, even in the (weaker) setting of statistical learning.
This separation of the extra (additive) regret suffered to ensure privacy from
the non-private regret bound is made possible by reinterpreting the
noise-addition schemes used to guarantee privacy as random perturbations of the objective.
For bandit linear optimization, and, as a special case, for non-stochastic
multi-armed bandits, the proposed algorithm achieves a regret of √T/ε,
improving on the previously best-known bound of T^0.75/ε.
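To illustrate this reinterpretation, here is a heavily simplified sketch (in Python) for the full-information setting over the probability simplex: the Laplace noise added to the cumulative losses both masks each individual loss vector (privacy) and plays the role of the FTPL perturbation (regret). The noise scale and the use of fresh per-round noise are assumptions made for this sketch; practical constructions keep the cumulative noise much smaller via tree-based aggregation of the loss sequence (in the spirit of the Dwork et al. continual-observation counters listed below), which is omitted here.

import numpy as np

def private_ftpl_full_info(loss_vectors, eps, rng=None):
    # Simplified private online linear optimization over the simplex:
    # perturb the cumulative loss with Laplace noise and play the
    # minimizing vertex.  The same noise serves privacy and regret.
    rng = np.random.default_rng() if rng is None else rng
    T, n = loss_vectors.shape
    cumulative = np.zeros(n)
    plays = []
    for t in range(T):
        noisy_cumulative = cumulative + rng.laplace(scale=1.0 / eps, size=n)
        x = np.zeros(n)
        x[int(np.argmin(noisy_cumulative))] = 1.0   # perturbed leader (a vertex)
        plays.append(x)
        cumulative += loss_vectors[t]               # full-information feedback
    return plays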
Efficient Algorithms for Contextual Learning
We devise the first algorithm for transductive contextual learning that enjoys optimal √T regret guarantees while admitting a computationally efficient implementation at run-time. In contrast, previous works either obtain sub-optimal regret bounds or require explicit enumeration of the policy class at every iteration. Contextual learning (with partial feedback) is a strict, far-reaching generalization of the Markov Decision Process model for policy-based reinforcement learning: it offers the same sample complexity bounds without placing any assumptions on the underlying state dynamics (for example, conditionally Markovian evolution of the states).
The first and third results are joint work with my advisor, Elad Hazan (Princeton). The second result is joint work with Naman Agarwal (Princeton).
Reading List
Texts
Bubeck, Cesa-Bianchi: Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems.
Hazan: Introduction to Online Convex Optimization.
Shalev-Shwartz: Online Learning and Online Convex Optimization.
Szepesvári: Algorithms for Reinforcement Learning.
Papers
Contextual Learning
Dudik, Hsu, Kale, Karampatziakis, Langford, Reyzin, Zhang: Efficient optimal learning for contextual bandits. UAI 2011
Kakade, Kalai: From batch to transductive online learning. NIPS 2005
Langford, Zhang: The epoch-greedy algorithm for multi-armed bandits with side information. NIPS 2008
Rakhlin, Sridharan: BISTRO: An efficient relaxation-based method for contextual bandits. ICML 2016
Syrgkanis, Krishnamurthy, Schapire: Efficient Algorithms for Adversarial Contextual Learning. ICML 2016
Imitation Learning
Chang, Krishnamurthy, Agarwal, Daumé III, Langford: Learning to search better than your teacher. ICML 2015
Ross, Bagnell: Efficient Reductions for Imitation Learning. AISTATS 2010
Ross, Gordon, Bagnell: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. AISTATS 2011
Learning with Partial Information
Bubeck, Cesa-Bianchi, Kakade, Mannor, Srebro, Williamson: Towards Minimax Policies for Online Linear Optimization with Bandit Feedback. COLT 2012
Dani, Hayes, Kakade: The Price of Bandit Information for Online Optimization. NIPS 2007
Neu, Bartók: Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits. JMLR 2016
Differentially Private Online Learning
Dwork, Naor, Pitassi, Rothblum: Differential privacy under continual observation. STOC 2010
Jain, Kothari, Thakurta: Differentially Private Online Learning. COLT 2012
Smith, Thakurta: (Nearly) optimal algorithms for private online learning in full-information and bandit settings. NIPS 2013
Perturbation Schemes in Statistics
Abernethy, Lee, Sinha, Tewari: Online Linear Optimization via Smoothing. COLT 2014
Hazan, Jaakkola: On the Partition Function and Random Maximum A-Posteriori Perturbations. ICML 2012
No-Regret Algorithms in Structured Settings
Garber, Hazan, Ma: Online Learning of Eigenvectors. ICML 2015
Hazan, Koren, Livni, Mansour: Online Learning with Low-Rank Experts. COLT 2016