Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition

Published in ICLR, 2025