*This was a hybrid event with in-person attendance in Levine 512 and virtual attendance…
Offline reinforcement learning (RL), which uses pre-collected, reusable offline data without further environment interaction, enables sample-efficient, scalable, and practical decision-making. However, most of the existing literature (1) focuses on improving algorithms that maximize the expected cumulative reward, and (2) assumes that the reward function is given. This limits the applicability of offline RL in many realistic settings: safety or risk constraints often need to be satisfied, and the reward function is often difficult to specify. In this talk, we will explore how we can (1) train a broad class of risk-sensitive agents using purely risk-neutral offline data while provably preventing out-of-distribution extrapolation, and (2) bootstrap offline RL with flexible forms of expert demonstrations, significantly expanding the scope of valid supervision for offline policy learning. With these advances, we aim to bring offline RL closer to real-world applications.