microsoft | Feb 27, 2026 | Engineering and algorithmic interventions for multimodal post-training at Microsoft scale | Tags: llm, reinforcement-learning, post-training, multimodal-agents, policy-gradient, agentic-systems, curriculum-learning