Exploration Scavenging
Learning from Logged Interventions
High Confidence Policy Improvement
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Toward the understanding of partial-monitoring games
Safe RL
Off-policy Model-based Learning under Unknown Factored Dynamics
Policy Search for RL
Deep Reinforcement Learning
Large Scale Ranking Problem: some theoretical and algorithmic issues