[talks] Karan Singh will present his generals exam Monday, May 15, 2017 at 1pm in CS 401

Fri May 5 09:37:18 EDT 2017

Karan Singh will present his generals exam Monday, May 15 , 2017 at 1pm in CS 401. 

The members of his committee are: Elad Hazan (Advisor), Sanjeev Arora, and Ran Raz 

Everyone is invited to attend his talk, and those faculty wishing to remain for the oral exam following are welcome to do so. His abstract and reading list follow below. 
Random Perturbations in Online Learning 

Adaptivity, Privacy, and Efficiency 

Explicit regularization (via adding a convex regularizer) is the mainstay of (distribution-free) statistically efficient learning algorithms. [Kalai, Vempala 2002] , also [Hannan 1957] , present an alternative framework (FTPL) for sequential prediction, based on randomly perturbing the history of losses on previous predictions. While these perturbation-based algorithms admit computationally efficient implementations, discounting some simple settings, their statistical efficiency is not well understood (vis-a-vis regularization). Building on some recent progress towards this understanding, we present three compelling cases for the use of perturbation methods for online learning that improve upon the state-of-the-art. The theme unifying the three applications is the utilization of structured noise for perturbations, an element hence so far not employed by the previous analyses. 

Adapting to the Inherent Structure 

For the problem of expert advice, when the loss vectors are restricted to some unknown d-dimensional subspace (fondly referred to as low-rank experts ), we demonstrate that a natural variant of FTPL achieves a regret of d^0.75√T , crucially independent of the number of experts. This improves on the previous best of d√T achieved by regularization-based schemes due to [Hazan, Koren, Livni, Mansour 2016] . 

Privacy for Free 

We design differentially private algorithms for the problem of online linear optimization in the full information and bandit settings with optimal √T regret bounds . In the full-information setting, our results demonstrate that differential privacy may be ensured for free -- in particular, the regret bounds scale as √T+1/ε . Notably, this phenomenon was unacknowledged even in the (weaker) setting of statistical learning. This separation of the extra (additive) regret suffered to ensure privacy from the non-private regret bound is permitted through a reinterpretation of privacy-aimed noise addition schemes as random perturbations of the objective. For bandit linear optimization, and as a special case, for non-stochastic multi-armed bandits, the proposed algorithm achieves a regret of √T/ε, while the previously best-known regret bound was T^0.75/ε. 

Efficient Algorithms for Contextual Learning 

We devise the first algorithm for transductive contextual learning that enjoys optimal √T regret guarantees while admitting computationally efficient implementations at run-time. In contrast, previous works either obtain sub-optimal regret bounds or require explicit enumeration of the policy class on every iteration. Contextual learning (with partial feedback) is a strict, far-reaching generalization of the Markov Decision Process model for policy-based reinforcement learning. Contextual learning offers the same sample complexity bounds without placing any assumption on the underlying state dynamics (for example, conditional Markovian evolution of states). 

The first and third results are due to collaborations with my advisor Elad Hazan (Princeton). The second result is joint work with Naman Agarwal (Princeton). 

Reading List 

Texts 
Bubeck, Cesa-Bianchi : Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. 
Hazan : Introduction to Online Convex Optimization. 
Shalev-Shwartz : Online Learning and Online Convex Optimization. 

Szepesvári: Algorithms for Reinforcement Learning. 

Papers 

Contextual Learning 

Dudik, Hsu, Kale, Karampatziakis, Langford, Reyzin, Zhang : Efficient optimal learning for contextual bandits. UAI 2011 

Kakade, Kalai: From batch to transductive online learning. NIPS 2005 
Langford, Zhang : The epoch-greedy algorithm for multi-armed bandits with side information. NIPS 2008 
Rakhlin, Sridharan : BISTRO: An efficient relaxation-based method for contextual bandits. ICML 2016 
Syrgkanis, Krishnamurthy, Schapire : Efficient Algorithms for Adversarial Contextual Learning. ICML 2016 

Imitation Learning 

Chang, Krishnamurthy, Agarwal, Daume, Langford : Learning to search better than your teacher. ICML 2015 

Ross, Bagnell: Efficient Reductions for Imitation Learning. AISTATS 2010 

Ross, Gordon, Bagnell : A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. AISTATS 2011 

Learning with Partial Information 

Bubeck, Cesa-Bianchi, Kakade, Mannor, Srebro, Williamson : Towards Minimax Policies for Online Linear Optimization with Bandit Feedback. COLT 2012 
Dani, Hayes, Kakade: The Price of Bandit Information for Online Optimization. NIPS 2007 

Neu, Bartók : Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits. JMLR 2016 

Differentially Private Online Learning 

Dwork, Naor, Pitassi, Rothblum : Differential privacy under continual observation. STOC 2010 

Jain, Kothari, Thakurta : Differentially Private Online Learning. COLT 2012 

Smith, Thakurta : (Nearly) optimal algorithms for private online learning in full-information and bandit settings. NIPS 2013 

Perturbation Schemes in Statistics 

Abernethy, Lee, Sinha, Tewari : Online Linear Optimization via Smoothing. COLT 2014 

Hazan, Jaakkola : On the Partition Function and Random Maximum A-Posteriori Perturbations. ICML 2012 

No-Regret Algorithms in Structured Settings 

Garber, Hazan, Ma : Online Learning of Eigenvectors. ICML 2015 

Hazan, Koren, Livni, Mansour: Online Learning with Low-Rank Experts. COLT 2016 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cs.princeton.edu/pipermail/talks/attachments/20170505/dcbffd9b/attachment-0001.html>