Title: Reinforcement Learning with General Constraints


In some applications of RL, designing a scalar reward function alone is not enough to achieve the intended goal. In such tasks, it can be more appropriate to specify constraints alongside the reward. These constraints can help meet safety criteria, shape the agent's behavior, or encourage exploration and diversity. Previous work on this topic is mostly limited to one specific type of constraint and aims to achieve only one of these goals.

We introduce a new formulation of the problem that handles constraints in a general form: we are given an MDP in which, instead of a scalar reward, each action yields a vector of measurements. The goal is to find a policy whose expected measurement vector lies in a given convex target set. We show that, given access to a planning oracle for the standard scalar-reward problem without constraints, there is an algorithm that solves this general problem. This makes it possible to pursue several of the goals that previous algorithms addressed separately; for example, we can meet safety constraints while also encouraging exploration and diversity.
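
As a rough sketch, the feasibility problem described above can be written as follows. The symbols (z, gamma, C, lambda), the use of a discounted sum, and the linear scalarization are assumptions made here for illustration; the text itself does not fix a notation or say how the oracle is queried.

```latex
% Notation assumed for illustration only:
%   z(s,a) \in \mathbb{R}^d : vector measurement received for action a in state s
%   \gamma \in [0,1)        : discount factor (an assumption; other horizons are possible)
%   \mathcal{C} \subseteq \mathbb{R}^d : the given convex target set
\[
  \bar{z}(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, z(s_t, a_t)\right] \in \mathbb{R}^d,
  \qquad
  \text{find } \pi \ \text{ such that } \ \bar{z}(\pi) \in \mathcal{C}.
\]
% One plausible way to pose a scalar, unconstrained problem for the planning oracle:
% pick a direction \lambda and use the scalarized reward
\[
  r_{\lambda}(s,a) \;=\; \langle \lambda,\, z(s,a) \rangle,
  \qquad \lambda \in \mathbb{R}^d .
\]
```

Under these assumptions, each oracle call solves an ordinary scalar-reward planning problem for one choice of lambda, and the constraint set enters only through how the directions are chosen.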

