[talks] Karan Singh will present his generals exam Monday, May 15, 2017 at 1pm in CS 401
ngotsis at CS.Princeton.EDU
Fri May 5 09:37:18 EDT 2017
Karan Singh will present his generals exam Monday, May 15 , 2017 at 1pm in CS 401.
The members of his committee are: Elad Hazan (Advisor), Sanjeev Arora, and Ran Raz
Everyone is invited to attend his talk, and those faculty wishing to remain for the oral exam following are welcome to do so. His abstract and reading list follow below.
Random Perturbations in Online Learning
Adaptivity, Privacy, and Efficiency
Explicit regularization (via adding a convex regularizer) is the mainstay of (distribution-free) statistically efficient learning algorithms. [Kalai, Vempala 2002] , also [Hannan 1957] , present an alternative framework (FTPL) for sequential prediction, based on randomly perturbing the history of losses on previous predictions. While these perturbation-based algorithms admit computationally efficient implementations, discounting some simple settings, their statistical efficiency is not well understood (vis-a-vis regularization). Building on some recent progress towards this understanding, we present three compelling cases for the use of perturbation methods for online learning that improve upon the state-of-the-art. The theme unifying the three applications is the utilization of structured noise for perturbations, an element hence so far not employed by the previous analyses.
Adapting to the Inherent Structure
For the problem of expert advice, when the loss vectors are restricted to some unknown d-dimensional subspace (fondly referred to as low-rank experts ), we demonstrate that a natural variant of FTPL achieves a regret of d^0.75√T , crucially independent of the number of experts. This improves on the previous best of d√T achieved by regularization-based schemes due to [Hazan, Koren, Livni, Mansour 2016] .
Privacy for Free
We design differentially private algorithms for the problem of online linear optimization in the full information and bandit settings with optimal √T regret bounds . In the full-information setting, our results demonstrate that differential privacy may be ensured for free -- in particular, the regret bounds scale as √T+1/ε . Notably, this phenomenon was unacknowledged even in the (weaker) setting of statistical learning. This separation of the extra (additive) regret suffered to ensure privacy from the non-private regret bound is permitted through a reinterpretation of privacy-aimed noise addition schemes as random perturbations of the objective. For bandit linear optimization, and as a special case, for non-stochastic multi-armed bandits, the proposed algorithm achieves a regret of √T/ε, while the previously best-known regret bound was T^0.75/ε.
Efficient Algorithms for Contextual Learning
We devise the first algorithm for transductive contextual learning that enjoys optimal √T regret guarantees while admitting computationally efficient implementations at run-time. In contrast, previous works either obtain sub-optimal regret bounds or require explicit enumeration of the policy class on every iteration. Contextual learning (with partial feedback) is a strict, far-reaching generalization of the Markov Decision Process model for policy-based reinforcement learning. Contextual learning offers the same sample complexity bounds without placing any assumption on the underlying state dynamics (for example, conditional Markovian evolution of states).
The first and third results are due to collaborations with my advisor Elad Hazan (Princeton). The second result is joint work with Naman Agarwal (Princeton).
Bubeck, Cesa-Bianchi : Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems.
Hazan : Introduction to Online Convex Optimization.
Shalev-Shwartz : Online Learning and Online Convex Optimization.
Szepesvári: Algorithms for Reinforcement Learning.
Dudik, Hsu, Kale, Karampatziakis, Langford, Reyzin, Zhang : Efficient optimal learning for contextual bandits. UAI 2011
Kakade, Kalai: From batch to transductive online learning. NIPS 2005
Langford, Zhang : The epoch-greedy algorithm for multi-armed bandits with side information. NIPS 2008
Rakhlin, Sridharan : BISTRO: An efficient relaxation-based method for contextual bandits. ICML 2016
Syrgkanis, Krishnamurthy, Schapire : Efficient Algorithms for Adversarial Contextual Learning. ICML 2016
Chang, Krishnamurthy, Agarwal, Daume, Langford : Learning to search better than your teacher. ICML 2015
Ross, Bagnell: Efficient Reductions for Imitation Learning. AISTATS 2010
Ross, Gordon, Bagnell : A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. AISTATS 2011
Learning with Partial Information
Bubeck, Cesa-Bianchi, Kakade, Mannor, Srebro, Williamson : Towards Minimax Policies for Online Linear Optimization with Bandit Feedback. COLT 2012
Dani, Hayes, Kakade: The Price of Bandit Information for Online Optimization. NIPS 2007
Neu, Bartók : Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits. JMLR 2016
Differentially Private Online Learning
Dwork, Naor, Pitassi, Rothblum : Differential privacy under continual observation. STOC 2010
Jain, Kothari, Thakurta : Differentially Private Online Learning. COLT 2012
Smith, Thakurta : (Nearly) optimal algorithms for private online learning in full-information and bandit settings. NIPS 2013
Perturbation Schemes in Statistics
Abernethy, Lee, Sinha, Tewari : Online Linear Optimization via Smoothing. COLT 2014
Hazan, Jaakkola : On the Partition Function and Random Maximum A-Posteriori Perturbations. ICML 2012
No-Regret Algorithms in Structured Settings
Garber, Hazan, Ma : Online Learning of Eigenvectors. ICML 2015
Hazan, Koren, Livni, Mansour: Online Learning with Low-Rank Experts. COLT 2016
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the talks