Orestis Plevrakis will present his FPO "New Exploration and Identification schemes for Linear Dynamics: the cases of Convex costs and Censored Data" on Thursday, July 22, 2021 at 1PM via Zoom.

Zoom link: https://princeton.zoom.us/j/92035947506

The members of Orestis’ committee are as follows: Sanjeev Arora (Adviser); Readers: Chi Jin and Jason Lee; Examiners: Mark Braverman, Elad Hazan and Sanjeev Arora.

A copy of his thesis is available upon request; please email gradinfo@cs.princeton.edu.

Everyone is invited to attend his talk.

The abstract is as follows:

Adaptive control, i.e., learning to control an unknown linear dynamical system, has a long history in statistics and control theory. However, sharp non-asymptotic bounds were obtained only recently, as outputs of an ongoing research program that applies tools from learning theory and optimization to control theory. A common feature of all the major results of this program has been the learning part of the proposed algorithms, i.e., the exploration strategy. All of these algorithms apply so-called "certainty equivalence": the learner spends the first steps applying random controls (e.g., Gaussian noise), then estimates the dynamics via least squares, and thereafter optimizes the control policy using these estimates. Even though this is the simplest exploration strategy, it is optimal for the classical linear quadratic regulator (LQR). In this thesis, I study 1) the problem of adaptive control when the cost can be any convex function (thus generalizing LQR), and 2) the problem of learning a linear dynamical system (LDS) from "censored" observations, i.e., the state is observed only when it falls inside some set (e.g., the state can be the position of an object and the set the camera frame). For the first problem, certainty equivalence is suboptimal. For the second, the least-squares solution is not even consistent. The contribution of this thesis is the development of new computationally and statistically efficient algorithms for both problems. Specifically,
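As a minimal illustration (not from the thesis) of the identification step inside certainty equivalence: drive the system with random Gaussian controls, record the trajectory, and recover the dynamics by least squares. The matrices `A` and `B`, the horizon, and the noise level below are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# True (unknown) dynamics: x_{t+1} = A x_t + B u_t + w_t  (placeholder values)
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])

T = 5000
xs, us = [np.zeros(2)], []
for t in range(T):
    u = rng.normal(size=1)            # exploration: i.i.d. Gaussian controls
    w = 0.1 * rng.normal(size=2)      # process noise
    xs.append(A @ xs[-1] + B @ u + w)
    us.append(u)

# Least-squares estimate of [A B] from the regression x_{t+1} ~ [x_t; u_t]
Z = np.hstack([np.array(xs[:-1]), np.array(us)])   # regressors, shape (T, 3)
Y = np.array(xs[1:])                               # targets,    shape (T, 2)
theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
A_hat, B_hat = theta.T[:, :2], theta.T[:, 2:]

err = max(np.abs(A_hat - A).max(), np.abs(B_hat - B).max())
print(f"max estimation error: {err:.4f}")
```

With fully observed states and persistently exciting Gaussian inputs, this estimate converges to the true dynamics as the horizon grows.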

1. For online control with convex costs, we consider the objective of regret against the class of disturbance-feedback controllers. Leveraging ideas from convex geometry, we design an exploration strategy that achieves optimal regret, improving upon the previous best known bounds, which correspond to algorithms applying certainty equivalence.

2. The problem of learning an LDS from censored observations was first considered by Lee and Maddala (1985) and Zeger and Brookmeyer (1986). We develop a new second-order method, Stochastic Online Newton with Switching Gradients, building on the Online Newton Step of Hazan et al. (2007). This is the first computationally and statistically efficient algorithm for this problem.
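As a toy illustration (again, not the thesis's construction) of why naive least squares is inconsistent under censoring: simulate a scalar system, keep only consecutive pairs in which both states land inside the observation set, and regress. The AR coefficient and the observation set `x >= 0` are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
a_true, T = 0.5, 200_000

# Scalar LDS: x_{t+1} = a x_t + w_t with standard Gaussian noise.
x = np.zeros(T + 1)
w = rng.normal(size=T)
for t in range(T):
    x[t + 1] = a_true * x[t] + w[t]

# Censoring: the state is observed only when it is nonnegative
# (a stand-in for "the object is inside the camera frame").
observed = x >= 0.0

# Naive least squares on consecutive pairs where BOTH states were observed.
keep = observed[:-1] & observed[1:]
xt, xt1 = x[:-1][keep], x[1:][keep]
a_hat = (xt @ xt1) / (xt @ xt)

print(f"true a = {a_true}, naive LS on observed pairs = {a_hat:.3f}")
```

The bias does not vanish with more data: conditioning on the next state landing inside the set makes the surviving noise correlated with the regressor, so the naive estimator is inconsistent.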