Orestis Plevrakis will present his Pre-FPO talk "New Exploration and Identification schemes for Linear Dynamics: the cases of Convex costs and Censored Data" on Wednesday, June 9, 2021 at 2PM via zoom.
Orestis Plevrakis will present his Pre-FPO talk "New Exploration and Identification schemes for Linear Dynamics: the cases of Convex costs and Censored Data" on Wednesday, June 9, 2021 at 2PM via Zoom. Zoom link: https://princeton.zoom.us/j/99277010909 Committee: Examiners: Sanjeev Arora (advisor), Mark Braverman, and Chi Jin (ECE) Readers: Elad Hazan and Jason D. Lee (ECE) All are welcome to attend. Title: "New Exploration and Identification schemes for Linear Dynamics: the cases of Convex costs and Censored Data" Abstract: Adaptive control, i.e., learning to control an unknown linear dynamical system, has a long history in statistics and control theory. However, sharp non-asymptotic bounds were only recently obtained, and are outputs of an ongoing research program aiming to apply tools from learning theory and optimization, to control theory. A common feature of all the major results of the program has been the learning-part of the proposed algorithms, i.e., the exploration strategy. All these algorithms apply the so-called "certainty-equivalence", where the learner spends the first steps applying random controls (e.g. Gaussian noise), then estimates the dynamics via least-squares, and thereafter optimizes the control-policy using these estimates. Even though this is the simplest exploration strategy, it is optimal for the classical linear quadratic regulator (LQR). In this thesis, I study 1) the problem of adaptive control when the cost can be any convex function (thus generalizing LQR), and 2) the problem of learning a linear dynamical system (LDS) from "censored" observations, i.e., the state is observed only when it falls inside some set (e.g., the state can be the position of an object and the set can be the camera-frame). For the first problem, certainty-equivalence is suboptimal. For the second, the least-squares solution is not even consistent. The contribution of the thesis is the development of new computationally and statistically efficient algorithms for both problems. Specifically, 1. For adaptive control with convex costs, we consider the objective of regret with respect to the benchmark of stabilizing linear control policies. Leveraging ideas from convex geometry, we design an exploration strategy that achieves optimal regret, in terms of the dependence on the time-horizon. Our result improves upon the previous best known bounds, which correspond to algorithms applying certainty-equivalence. 2. In the problem of learning an LDS from censored observations, the learner observes the state $x_t \in \R^d$ if and only if $x_t$ belongs to some set $S_t \subseteq \R^d$. This setting was first considered by Lee and Maddala (1985), and Zeger and Brookmeyer (1986). We develop the first computationally and statistically efficient algorithm for learning the system, assuming only membership-oracle access to the sets $S_t$ (which can be arbitrarily complex sets, e.g., non-convex). Our algorithm, Stochastic Online Newton with Switching Gradients, is a novel second-order method that builds on the Online Newton Step of Hazan et al. (2007).
participants (1)
-
jfarquer@cs.princeton.edu