PhiBE-Q-Learning: Bridging Off-Policy Reinforcement Learning and Continuous-Time Control


This is a companion discussion topic for the original entry at https://arxiv.org/abs/2606.21925