Syed, Umar, and Robert E. Schapire. "A game-theoretic approach to apprenticeship learning." Advances in Neural Information Processing Systems. 2008.
Blackwell, David. "An analog of the minimax theorem for vector payoffs." Pacific Journal of Mathematics 6.1 (1956): 1-8.
Abernethy, Jacob, Peter L. Bartlett, and Elad Hazan. "Blackwell approachability and no-regret learning are equivalent." Proceedings of the 24th Annual Conference on Learning Theory. 2011.
Hazan, Elad, Sham M. Kakade, Karan Singh, and Abby Van Soest. "Provably efficient maximum entropy exploration." arXiv preprint arXiv:1812.02690 (2018).
Le, Hoang M., Cameron Voloshin, and Yisong Yue. "Batch policy learning under constraints." arXiv preprint arXiv:1903.08738 (2019).
Tessler, Chen, Daniel J. Mankowitz, and Shie Mannor. "Reward constrained policy optimization." arXiv preprint arXiv:1805.11074 (2018).
Agrawal, Shipra, and Nikhil R. Devanur. "Bandits with concave rewards and convex knapsacks." Proceedings of the fifteenth ACM conference on Economics and computation. ACM, 2014.
Mannor, Shie, and John N. Tsitsiklis. "Online learning with constraints." International Conference on Computational Learning Theory. Springer, Berlin, Heidelberg, 2006.
Abernethy, Jacob D., and Jun-Kun Wang. "On Frank-Wolfe and equilibrium computation." Advances in Neural Information Processing Systems. 2017.