Seyed Sobhan Mir Yoosefi will present his General Exam on May 2nd, 2019 at 10:45am in CS 401.

The members of his committee are as follows: Yoram Singer (adviser), Robert Schapire, and Matt Weinberg.

Everyone is invited to attend his talk, and those faculty wishing to remain for the oral exam that follows are welcome to do so.  His abstract and reading list appear below.

Title: Reinforcement Learning with General Constraints

Abstract: 

In some applications of reinforcement learning, it is difficult to capture the intended goal by designing a scalar reward function alone. In such tasks, it may be more natural to specify both a reward and constraints. Constraints can enforce safety criteria, shape the agent's behavior, or encourage exploration and diversity. Prior work on this topic is mostly limited to specific types of constraints and typically aims at only one of these goals.

We introduce a new formulation of the problem that handles constraints in general form: we are given an MDP in which, instead of receiving a scalar reward for our actions, we receive a vector-valued measurement. The goal is then to find a policy whose expected vector measurement lies in a given convex target set. Given access to a planning oracle for the standard scalar, unconstrained problem, we give an algorithm that solves this general problem. This allows us to pursue more than one of the goals that previous algorithms addressed individually; for example, we can satisfy safety constraints while simultaneously encouraging exploration and diversity.
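As a rough illustration of this kind of formulation, here is a toy Frank-Wolfe-style sketch (the setup and all names are hypothetical, not taken from the talk): each of a few base policies has a known expected measurement vector, the planning oracle maximizes a linear scalarization of the measurement, and we steer the mixture policy's average measurement toward a convex target set (here, a box) by repeatedly planning against the gradient of the squared distance to the set.

```python
import numpy as np

# Hypothetical toy setup: three base policies with known expected
# measurement vectors. The convex hull of these vectors is the set of
# measurements achievable by mixture policies.
measurements = np.array([
    [0.0, 2.0],
    [2.0, 0.0],
    [2.0, 2.0],
])

def planning_oracle(w):
    """Scalar, unconstrained planner: best base policy for reward w . mu."""
    return measurements[np.argmax(measurements @ w)]

def project_to_box(x, lo=0.8, hi=1.2):
    """Euclidean projection onto the convex target set (here, a box)."""
    return np.clip(x, lo, hi)

# Frank-Wolfe-style loop: the gradient of 0.5 * dist(x, C)^2 is
# x - proj_C(x); planning against minus the gradient pulls the running
# average measurement (i.e., the mixture policy) toward the target set.
x = measurements[0].copy()
for t in range(1, 500):
    grad = x - project_to_box(x)
    mu = planning_oracle(-grad)
    x = x + (mu - x) / (t + 1)   # running average over the mixture

dist = np.linalg.norm(x - project_to_box(x))
```

After a few hundred iterations the average measurement sits close to the target box; the distance shrinks at roughly an O(1/t) rate, mirroring the general pattern in oracle-based approaches such as Abernethy and Wang (2017) from the reading list.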

Reading List:
https://docs.google.com/document/d/1gyK7CSo3Fh99ABH8O6yNYtPY2U_DHW5wUcQ-iXT4dY4/edit?usp=sharing

Text Books:
  • Sutton, Richard S., and Andrew G. Barto. Reinforcement learning: An introduction. MIT press, 2018.
    • Chapters: 2-10, 13
  • Shalev-Shwartz, Shai, and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge university press, 2014.
    • Chapters: 2-6, 9, 10, 12, 14
Papers:
  • Syed, Umar, and Robert E. Schapire. "A game-theoretic approach to apprenticeship learning." Advances in neural information processing systems. 2008.
  • Blackwell, David. "An analog of the minimax theorem for vector payoffs." Pacific Journal of Mathematics 6.1 (1956): 1-8.
  • Abernethy, Jacob, Peter L. Bartlett, and Elad Hazan. "Blackwell approachability and no-regret learning are equivalent." Proceedings of the 24th Annual Conference on Learning Theory. 2011.
  • Hazan, Elad, Sham M. Kakade, Karan Singh, and Abby Van Soest. "Provably efficient maximum entropy exploration." arXiv preprint arXiv:1812.02690 (2018).
  • Le, Hoang M., Cameron Voloshin, and Yisong Yue. "Batch Policy Learning under Constraints." arXiv preprint arXiv:1903.08738 (2019).
  • Tessler, Chen, Daniel J. Mankowitz, and Shie Mannor. "Reward constrained policy optimization." arXiv preprint arXiv:1805.11074 (2018).
  • Agrawal, Shipra, and Nikhil R. Devanur. "Bandits with concave rewards and convex knapsacks." Proceedings of the fifteenth ACM conference on Economics and computation. ACM, 2014.
  • Mannor, Shie, and John N. Tsitsiklis. "Online learning with constraints." International Conference on Computational Learning Theory. Springer, Berlin, Heidelberg, 2006.
  • Abernethy, Jacob D., and Jun-Kun Wang. "On Frank-Wolfe and equilibrium computation." Advances in Neural Information Processing Systems. 2017.