Niranjani Prasad will present her Pre-FPO, "Methods for Reinforcement Learning in Clinical Decision Support," on Friday, February 21, 2020 at 1pm in CS 402.

The members of her committee are as follows: Barbara Engelhardt (advisor); Readers: Ryan Adams and Finale Doshi-Velez (Harvard); Examiners: Sebastian Seung, Mengdi Wang (ORFE), and Barbara Engelhardt.

Everyone is invited to attend her talk. The abstract follows:

The management of routine interventions such as mechanical ventilation and sedation, or the ordering of blood tests, constitutes a major part of the care of critically ill patients. Timely and proportionate interventions are crucial to improving patient outcomes and reducing costs, but these procedures are often poorly understood, particularly for heterogeneous patient populations, and clinical opinion on the best protocols varies.

In my thesis, I focus on the development of a clinician-in-the-loop decision support system for weaning patients from mechanical ventilation, one that leverages the available information in the data-intensive ICU setting to predict time to extubation readiness and provide personalized recommendations for sedation and ventilator support. To this end, I employ off-policy reinforcement learning algorithms to learn an optimal sequence of treatment actions from suboptimal historical ICU data. I model patient admissions as Markov decision processes (MDPs), developing tailored representations of the patient state, action space, and reward function, and learn treatment policies using Fitted Q-iteration. I demonstrate that this framework shows promise in recommending protocols with positive clinical outcomes, in terms of managing patient stability and reintubation rates, when assessed against current practice.
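For those less familiar with the method, the sketch below gives a rough sense of batch Fitted Q-iteration: Q-values are fit to logged (state, action, reward, next state) transitions by repeatedly regressing on bootstrapped targets. The data layout, hyperparameters, and the choice of extremely randomized trees as the regressor are illustrative assumptions for this sketch, not details of the thesis.

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, n_actions, n_iters=50, gamma=0.99):
    """Batch FQI; transitions is a list of (state, action, reward, next_state, done)."""
    states = np.array([t[0] for t in transitions])
    actions = np.array([t[1] for t in transitions])
    rewards = np.array([t[2] for t in transitions], dtype=float)
    next_states = np.array([t[3] for t in transitions])
    done = np.array([t[4] for t in transitions], dtype=float)

    X = np.column_stack([states, actions])   # regress Q on (state, action) pairs
    targets = rewards.copy()                 # first iteration: Q ~ immediate reward
    model = None
    for _ in range(n_iters):
        model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
        # Bootstrapped target: r + gamma * max_a' Q(s', a'), zeroed at terminal states
        q_next = np.column_stack([
            model.predict(np.column_stack([next_states,
                                           np.full(len(next_states), a)]))
            for a in range(n_actions)
        ])
        targets = rewards + gamma * (1.0 - done) * q_next.max(axis=1)
    return model  # greedy policy: argmax over a of model.predict([s, a])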

The second part of my thesis is directed towards the problem of effective reward design when applying reinforcement learning to clinical decision-making tasks. A key impediment to reinforcement learning in practice is distilling multiple (and often conflicting) clinical imperatives into a single scalar feedback signal. To tackle this, I develop methods based on Pareto optimality to solve MDPs given a vector-valued reward function. I illustrate these methods in the context of optimizing the ordering of laboratory tests for patients in critical care.
Extending this further, I introduce a mechanism for using available clinical data to restrict the candidate set of convex combinations of elements of the reward function to those that yield a scalar reward which (a) reflects what we implicitly know from the data about reasonable behaviour for the task, and (b) allows for robust off-policy evaluation. Together, these help ensure that we learn policies we can trust enough to implement in high-risk settings.
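As a rough illustration of this reward-design step, the sketch below scalarizes a vector-valued reward with convex weights and screens candidate weightings against two data-driven criteria: agreement of the induced policy with observed clinician behaviour, and the effective sample size of the importance weights used in off-policy evaluation. The specific criteria, thresholds, and helper callables are assumptions chosen for illustration, not the mechanism actually used in the thesis.

import numpy as np

def scalarize(reward_vec, w):
    """Collapse a vector-valued reward into one scalar via convex weights w (w >= 0, sum(w) = 1)."""
    return float(np.dot(w, reward_vec))

def effective_sample_size(importance_weights):
    """ESS of per-trajectory importance weights; a low ESS means off-policy estimates
    for this candidate reward are dominated by a handful of trajectories."""
    w = np.asarray(importance_weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

def admissible_weights(candidates, behaviour_agreement, importance_weights_for,
                       min_agreement=0.8, min_ess=100.0):
    """Keep a candidate convex weighting only if the policy it induces agrees often
    enough with observed clinician actions, and its importance weights retain enough
    effective sample size for robust off-policy evaluation. `behaviour_agreement`
    and `importance_weights_for` are placeholder callables; thresholds are illustrative."""
    keep = []
    for w in candidates:
        if (behaviour_agreement(w) >= min_agreement
                and effective_sample_size(importance_weights_for(w)) >= min_ess):
            keep.append(w)
    return keep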