Federated Q-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost
Published in ICLR, 2025