Engineering and algorithmic interventions for multimodal post-training at Microsoft scale
Aditya Challapally leads post-training research and infrastructure for Copilot agent capabilities that process millions of multimodal interactions. This post builds on the diagnostics from "Diagnosing instability in production-scale agent reinforcement learning" with the engineeri…