microsoft Jan 28, 2026

Diagnosing instability in production-scale agent reinforcement learning